Promptable image segmentation: a survey of guided input techniques / Nejabat, Hadi; D'Asaro, Federico; Pecora, Alessandro Emmanuel; Monopoli, Tommaso; Bottino, Andrea. - In: FOUNDATIONS AND TRENDS IN COMPUTER GRAPHICS AND VISION. - ISSN 1572-2740. - ELETTRONICO. - 18:1(2026), pp. 1-139. [10.1108/FTCGV-03-2026-001]
Promptable image segmentation: a survey of guided input techniques
Hadi Nejabat; Federico D'Asaro; Alessandro Emmanuel Pecora; Andrea Bottino
2026
Abstract
Prompt-based image segmentation has transformed computer vision by making segmentation more adaptive and efficient. In this context, the term prompt broadly refers to any auxiliary input, such as clicks, boxes, scribbles, support sets, or free-form text, that guides a model's segmentation behavior. These inputs act as task-specific signals that let models adapt to different contexts and objectives. This survey categorizes promptable image segmentation into five primary areas: interactive segmentation, referring segmentation, few-shot semantic segmentation, open-vocabulary segmentation, and foundation models. The authors explore how different prompting strategies improve segmentation performance while enabling few-shot learning and reducing reliance on extensive labeled datasets. The discussion highlights the role of foundation models in advancing segmentation capabilities by integrating separate components of these complex models and leveraging multimodal interactions. By synthesizing state-of-the-art techniques, this study provides a structured taxonomy, identifies key challenges in multimodal fusion and generalization, and outlines future directions for developing more intelligent and adaptable segmentation systems.
| File | Type | License | Size | Format |
|---|---|---|---|---|
| SEG_paper_NOWP_Manuscript-2.pdf (restricted access) | 2. Post-print / Author's Accepted Manuscript | Not public - private/restricted access | 15.2 MB | Adobe PDF |
| SEG_paper_openaccess.pdf (open access) | 1. Preprint / submitted version (pre-review) | Public - All rights reserved | 14.96 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/3009210
