
Promptable image segmentation: a survey of guided input techniques / Nejabat, Hadi; D'Asaro, Federico; Pecora, Alessandro Emmanuel; Monopoli, Tommaso; Bottino, Andrea. - In: FOUNDATIONS AND TRENDS IN COMPUTER GRAPHICS AND VISION. - ISSN 1572-2740. - ELECTRONIC. - 18:1(2026), pp. 1-139. [10.1108/FTCGV-03-2026-001]

Promptable image segmentation: a survey of guided input techniques

Hadi Nejabat;Federico D'Asaro;Alessandro Emmanuel Pecora;Andrea Bottino
2026

Abstract

Prompt-based image segmentation has revolutionized computer vision by enabling models to adapt their behavior to auxiliary guidance. In the context of image segmentation, the term prompt broadly refers to any auxiliary input, such as clicks, boxes, scribbles, support sets, or free-form text, that guides a model's segmentation behavior. These inputs act as task-specific signals that allow models to adapt to different contexts and objectives. This survey categorizes promptable image segmentation into five primary areas: interactive segmentation, referring segmentation, few-shot semantic segmentation, open-vocabulary segmentation, and foundation models. The authors explore how different prompting strategies improve segmentation performance while enabling few-shot learning and reducing reliance on extensive labeled datasets. The discussion highlights the role of foundation models in advancing segmentation capabilities by integrating the separate components of these complex models and leveraging multimodal interactions. By synthesizing state-of-the-art techniques, this study provides a structured taxonomy, identifies key challenges in multimodal fusion and generalization, and outlines future directions for developing more intelligent and adaptable segmentation systems.
Files in this record:

SEG_paper_NOWP_Manuscript-2.pdf (restricted access)
Type: 2. Post-print / Author's Accepted Manuscript
License: Non-public - Private/restricted access
Size: 15.2 MB
Format: Adobe PDF
SEG_paper_openaccess.pdf (open access)
Type: 1. Preprint / submitted version [pre-review]
License: Public - All rights reserved
Size: 14.96 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/3009210