Select2Plan: Training-Free ICL-Based Planning Through VQA and Memory Retrieval / Buoso, Davide; Robinson, Luke; Averta, Giuseppe; Torr, Philip; Franzmeyer, Tim; De Martini, Daniele. - In: IEEE ROBOTICS AND AUTOMATION LETTERS. - ISSN 2377-3766. - 10:11(2025), pp. 11267-11274. [10.1109/lra.2025.3606790]
Select2Plan: Training-Free ICL-Based Planning Through VQA and Memory Retrieval
Davide Buoso; Giuseppe Averta
2025
Abstract
We introduce Select2Plan (S2P), a novel training-free framework for high-level robot planning that leverages off-the-shelf Vision-Language Models (VLMs) for autonomous navigation. Unlike most learning-based approaches that require extensive task-specific training and large-scale data collection, S2P overcomes the need for fine-tuning by adapting inputs to align with the VLM's pretraining data. Our method achieves this through a combination of structured Visual Question Answering (VQA) to ground action selection on the image, and In-Context Learning (ICL) to exploit knowledge drawn from relevant examples from a memory bank of (visually) annotated data, which can include diverse, in-the-wild sources. We demonstrate S2P's flexibility by evaluating it in both First-Person View (FPV) and Third-Person View (TPV) navigation. S2P improves the performance of a baseline VLM by 40% in TPV and surpasses end-to-end trained models by approximately 24% in FPV when tasked with navigating towards unseen objects in novel scenes. These results highlight the adaptability, simplicity, and effectiveness of our training-free approach, demonstrating that the use of pre-trained VLMs with structured memory retrieval enables robust high-level robot planning without costly task-specific training. Our experiments also show that retrieving samples from heterogeneous data sources, including online videos of different robots or humans walking, is highly beneficial for navigation. Notably, our method effectively generalizes to novel scenarios, requiring only a handful of demonstrations. Project Page: lambdavi.github.io/select2plan
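The abstract describes retrieving relevant annotated examples from a memory bank to build an In-Context Learning prompt. A minimal sketch of that retrieve-then-prompt pattern is below; the function names, the cosine-similarity retrieval, and the (observation, action) example format are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def retrieve_examples(query_emb, memory_embs, k=3):
    """Return indices of the k memory-bank entries most similar to the
    query embedding, ranked by cosine similarity (assumed retrieval metric)."""
    q = query_emb / np.linalg.norm(query_emb)
    m = memory_embs / np.linalg.norm(memory_embs, axis=1, keepdims=True)
    sims = m @ q  # cosine similarity of each memory entry to the query
    return np.argsort(-sims)[:k]

def build_icl_prompt(query_description, examples):
    """Prepend retrieved (observation, action) demonstrations to the query,
    forming a few-shot prompt for the VLM (format is hypothetical)."""
    shots = "\n".join(f"Observation: {obs}\nAction: {act}" for obs, act in examples)
    return f"{shots}\nObservation: {query_description}\nAction:"
```

In a full pipeline, `query_emb` and `memory_embs` would come from a visual encoder over the current view and the annotated memory bank, and the resulting prompt would be sent to the off-the-shelf VLM together with the annotated image.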
File | Tipologia (Type) | Licenza (License) | Dimensione (Size) | Formato (Format)
---|---|---|---|---
Select2Plan_Training-Free_ICL-Based_Planning_Through_VQA_and_Memory_Retrieval.pdf (restricted access) | 2a. Post-print editorial version / Version of Record | Non-public (private/restricted access) | 7.56 MB | Adobe PDF
select2plan_buoso_arxiv_2025.pdf (open access) | 2. Post-print / Author's Accepted Manuscript | Public (all rights reserved) | 3.24 MB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/3003790