Recent transformer-based architectures have shown impressive results in the field of image segmentation. Thanks to their flexibility, they obtain outstanding performance in multiple segmentation tasks, such as semantic and panoptic, under a single unified framework. To achieve such impressive performance, these architectures employ intensive operations and require substantial computational resources, which are often not available, especially on edge devices. To fill this gap, we propose Prototype-based Efficient MaskFormer (PEM), an efficient transformer-based architecture that can operate in multiple segmentation tasks. PEM proposes a novel prototype-based cross-attention which leverages the redundancy of visual features to restrict the computation and improve the efficiency without harming the performance. In addition, PEM introduces an efficient multi-scale feature pyramid network, capable of extracting features that have high semantic content in an efficient way, thanks to the combination of deformable convolutions and context-based self-modulation. We benchmark the proposed PEM architecture on two tasks, semantic and panoptic segmentation, evaluated on two different datasets, Cityscapes and ADE20K. PEM demonstrates outstanding performance on every task and dataset, outperforming task-specific architectures while being comparable and even better than computationally-expensive baselines.

PEM: Prototype-Based Efficient MaskFormer for Image Segmentation / Cavagnero, Niccolo; Rosi, Gabriele; Cuttano, Claudia; Pistilli, Francesca; Ciccone, Marco; Averta, Giuseppe; Cermelli, Fabio. - ELETTRONICO. - 34:(2024), pp. 15804-15813. (Intervento presentato al convegno Conference on Computer Vision and Pattern Recognition (CVPR) tenutosi a Seattle WA (USA) nel 16-22 June 2024) [10.1109/cvpr52733.2024.01496].

PEM: Prototype-Based Efficient MaskFormer for Image Segmentation

Cavagnero, Niccolo;Rosi, Gabriele;Cuttano, Claudia;Pistilli, Francesca;Ciccone, Marco;Averta, Giuseppe;Cermelli, Fabio
2024

Abstract

Recent transformer-based architectures have shown impressive results in the field of image segmentation. Thanks to their flexibility, they obtain outstanding performance in multiple segmentation tasks, such as semantic and panoptic, under a single unified framework. To achieve such impressive performance, these architectures employ intensive operations and require substantial computational resources, which are often not available, especially on edge devices. To fill this gap, we propose Prototype-based Efficient MaskFormer (PEM), an efficient transformer-based architecture that can operate in multiple segmentation tasks. PEM proposes a novel prototype-based cross-attention which leverages the redundancy of visual features to restrict the computation and improve the efficiency without harming the performance. In addition, PEM introduces an efficient multi-scale feature pyramid network, capable of extracting features that have high semantic content in an efficient way, thanks to the combination of deformable convolutions and context-based self-modulation. We benchmark the proposed PEM architecture on two tasks, semantic and panoptic segmentation, evaluated on two different datasets, Cityscapes and ADE20K. PEM demonstrates outstanding performance on every task and dataset, outperforming task-specific architectures while being comparable and even better than computationally-expensive baselines.
2024
979-8-3503-5300-6
File in questo prodotto:
File Dimensione Formato  
Cavagnero_PEM_Prototype-based_Efficient_MaskFormer_for_Image_Segmentation_CVPR_2024_paper.pdf

accesso aperto

Tipologia: 2. Post-print / Author's Accepted Manuscript
Licenza: PUBBLICO - Tutti i diritti riservati
Dimensione 3.12 MB
Formato Adobe PDF
3.12 MB Adobe PDF Visualizza/Apri
PEM_Prototype-Based_Efficient_MaskFormer_for_Image_Segmentation.pdf

non disponibili

Tipologia: 2a Post-print versione editoriale / Version of Record
Licenza: Non Pubblico - Accesso privato/ristretto
Dimensione 2.54 MB
Formato Adobe PDF
2.54 MB Adobe PDF   Visualizza/Apri   Richiedi una copia
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11583/2993117