Paolicelli, Valerio; Tavera, Antonio; Masone, Carlo; Berton, Gabriele Moreno; Caputo, Barbara. Learning Semantics for Visual Place Recognition Through Multi-scale Attention. Vol. 2 (2022), pp. 454-466. (Paper presented at the International Conference on Image Analysis and Processing, held in Lecce, May 23–27, 2022.) [10.1007/978-3-031-06430-2_38]
Learning Semantics for Visual Place Recognition Through Multi-scale Attention
Valerio Paolicelli; Antonio Tavera; Carlo Masone; Gabriele Berton; Barbara Caputo
2022
Abstract
In this paper we address the task of visual place recognition (VPR), where the goal is to retrieve the correct GPS coordinates of a given query image against a huge geotagged gallery. While recent works have shown that building descriptors incorporating semantic and appearance information is beneficial, current state-of-the-art methods opt for a top-down definition of the significant semantic content. Here we present the first VPR algorithm (code and dataset are available at https://github.com/valeriopaolicelli/SegVPR) that learns robust global embeddings from both the visual appearance and the semantic content of the data, with the segmentation process being dynamically guided by the recognition of places through a multi-scale attention module. Experiments on various scenarios validate this new approach and demonstrate its performance against state-of-the-art methods. Finally, we propose the first synthetic world dataset suited for both place recognition and segmentation tasks.
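To make the abstract's core idea concrete, below is a minimal sketch (not the authors' implementation; see the linked repository for that) of how appearance and semantic feature maps can be fused at multiple scales through attention and pooled into a single global descriptor for retrieval. All module names, channel sizes, and scale choices here are illustrative assumptions.

```python
# Minimal sketch of multi-scale attentive fusion of appearance and semantic
# features for place recognition. Hypothetical names and dimensions throughout;
# this is NOT the SegVPR code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleAttentionFusion(nn.Module):
    """Weights appearance features by attention maps computed jointly from
    appearance and semantic features, at several spatial scales, then pools
    each scale to a vector and concatenates them into one global descriptor."""

    def __init__(self, channels: int = 256, scales=(1.0, 0.5, 0.25)):
        super().__init__()
        self.scales = scales
        # One attention head per scale: [appearance ; semantics] -> spatial weights.
        self.attn = nn.ModuleList(
            nn.Conv2d(2 * channels, 1, kernel_size=1) for _ in scales
        )

    def forward(self, appearance: torch.Tensor, semantics: torch.Tensor) -> torch.Tensor:
        # appearance, semantics: (B, C, H, W) feature maps, e.g. from a shared backbone.
        pooled = []
        for scale, attn in zip(self.scales, self.attn):
            if scale != 1.0:
                a = F.interpolate(appearance, scale_factor=scale,
                                  mode="bilinear", align_corners=False)
                s = F.interpolate(semantics, scale_factor=scale,
                                  mode="bilinear", align_corners=False)
            else:
                a, s = appearance, semantics
            # Attention map conditioned on both modalities: (B, 1, h, w).
            w = torch.sigmoid(attn(torch.cat([a, s], dim=1)))
            # Attention-weighted global average pooling at this scale: (B, C).
            pooled.append((a * w).flatten(2).mean(dim=2))
        # Concatenate per-scale descriptors and L2-normalize for retrieval.
        return F.normalize(torch.cat(pooled, dim=1), dim=1)

# Usage: descriptor = MultiScaleAttentionFusion()(app_feats, sem_feats)
```

In such a design the attention maps, because they are trained with the retrieval objective, let the place-recognition loss decide which semantic regions matter, rather than fixing the relevant classes by hand, which is the contrast with top-down approaches drawn in the abstract.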
File | Access | Type | License | Size | Format
---|---|---|---|---|---
Paolicelli2022_Chapter_LearningSemanticsForVisualPlac.pdf | Restricted access | 2a Post-print editorial version / Version of Record | Non-public - Private/restricted access | 2.48 MB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/2970160