Show or Tell? A Benchmark to Evaluate Visual and Textual Prompts in Semantic Segmentation / Rosi, Gabriele; Cermelli, Fabio. - (2025), pp. 4153-4163. (Paper presented at the Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), held in Nashville, TN (USA), 11-15 June 2025) [10.1109/cvprw67362.2025.00399].
Show or Tell? A Benchmark to Evaluate Visual and Textual Prompts in Semantic Segmentation
Rosi, Gabriele; Cermelli, Fabio
2025
Abstract
Prompt engineering has shown remarkable success with large language models, yet its systematic exploration in computer vision remains limited. In semantic segmentation, both textual and visual prompts offer distinct advantages: textual prompts, through open-vocabulary methods, allow segmentation of arbitrary categories, while visual reference prompts provide intuitive reference examples. However, existing benchmarks evaluate these modalities in isolation, without direct comparison under identical conditions. We present Show or Tell (SoT), a novel benchmark specifically designed to evaluate both visual and textual prompts for semantic segmentation across 14 datasets spanning 7 diverse domains (common scenes, urban, food, waste, parts, tools, and land-cover). We evaluate 5 open-vocabulary methods and 4 visual reference prompt approaches, adapting the latter to handle multi-class segmentation through a confidence-based mask merging strategy. Our extensive experiments reveal that open-vocabulary methods excel with common concepts easily described by text but struggle with complex domains like tools, while visual reference prompt methods achieve good average results but exhibit high variability depending on the input prompt. Through comprehensive quantitative and qualitative analysis, we identify the strengths and weaknesses of both prompting modalities, providing valuable insights to guide future research in vision foundation models for segmentation tasks. Code is available at https://github.com/FocoosAI/ShowOrTell.
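The confidence-based mask merging strategy mentioned in the abstract is not spelled out on this page, so the following is a minimal sketch of the general idea, assuming the common formulation: per-class confidence maps (e.g., one per visual reference prompt) are combined by assigning each pixel to its highest-scoring class. The function name `merge_masks_by_confidence`, the `threshold` parameter, and the ignore label 255 are hypothetical illustration choices, not the authors' implementation; see the linked repository for the actual code.

```python
import numpy as np


def merge_masks_by_confidence(confidence_maps, threshold=0.5, ignore_label=255):
    """Merge per-class confidence maps into one multi-class label map.

    confidence_maps: dict {class_id: (H, W) float array in [0, 1]},
        e.g. one map per class produced by a visual reference prompt method.
    Each pixel is assigned the class with the highest confidence; pixels
    whose best score falls below `threshold` are set to `ignore_label`.
    (Sketch only: threshold and ignore label are assumed conventions.)
    """
    class_ids = np.array(sorted(confidence_maps))
    stacked = np.stack([confidence_maps[c] for c in class_ids])  # (C, H, W)
    winner = stacked.argmax(axis=0)    # per-pixel index of the best class
    best_score = stacked.max(axis=0)   # per-pixel best confidence
    labels = class_ids[winner]         # map indices back to class ids
    labels[best_score < threshold] = ignore_label
    return labels


if __name__ == "__main__":
    # Tiny demo with two hypothetical classes on a 2x3 image.
    rng = np.random.default_rng(0)
    maps = {3: rng.random((2, 3)), 7: rng.random((2, 3))}
    print(merge_masks_by_confidence(maps))
```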
| File | Type | License | Size | Format | Access |
|---|---|---|---|---|---|
| Rosi_Show_or_Tell_A_Benchmark_To_Evaluate_Visual_and_Textual_CVPRW_2025_paper.pdf | 2. Post-print / Author's Accepted Manuscript | Public - All rights reserved | 6.07 MB | Adobe PDF | Open access (View/Open) |
| Show_or_Tell_A_Benchmark_to_Evaluate_Visual_and_Textual_Prompts_in_Semantic_Segmentation.pdf | 2a. Post-print, editorial version / Version of Record | Non-public - Private/restricted access | 5.96 MB | Adobe PDF | Restricted access (View/Open; Request a copy) |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/3003861