
Learning Semantics for Visual Place Recognition Through Multi-scale Attention / Paolicelli, Valerio; Tavera, Antonio; Masone, Carlo; Berton, Gabriele Moreno; Caputo, Barbara. - Vol. 2 (2022), pp. 454-466. (Paper presented at the International Conference on Image Analysis and Processing, held in Lecce, May 23-27, 2022) [10.1007/978-3-031-06430-2_38].

Learning Semantics for Visual Place Recognition Through Multi-scale Attention

Valerio Paolicelli;Antonio Tavera;Carlo Masone;Gabriele Berton;Barbara Caputo
2022

Abstract

In this paper we address the task of visual place recognition (VPR), where the goal is to retrieve the correct GPS coordinates of a given query image by matching it against a huge geotagged gallery. While recent works have shown that building descriptors incorporating semantic and appearance information is beneficial, current state-of-the-art methods opt for a top-down definition of the significant semantic content. Here we present the first VPR algorithm (code and dataset are available at: https://github.com/valeriopaolicelli/SegVPR) that learns robust global embeddings from both the visual appearance and the semantic content of the data, with the segmentation process being dynamically guided by the recognition of places through a multi-scale attention module. Experiments on various scenarios validate this new approach and demonstrate its performance against state-of-the-art methods. Finally, we propose the first synthetic world dataset suited for both place recognition and segmentation tasks.
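The abstract describes building a global descriptor by attending to multi-scale appearance features. As a rough illustration only (this is not the authors' SegVPR code; the function name, shapes, and the random stand-in attention scores are all hypothetical), the idea of weighting feature maps at several scales with spatial attention and pooling them into one L2-normalized embedding can be sketched in NumPy as:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_scale_attention_descriptor(feature_maps, attn_scores):
    """Sketch of attention-weighted multi-scale pooling.

    feature_maps: list of (C, H_i, W_i) arrays, one per scale.
    attn_scores:  list of (H_i, W_i) raw score maps (here random
                  stand-ins for an attention module's output).
    Returns a single L2-normalized global descriptor.
    """
    pooled = []
    for feats, scores in zip(feature_maps, attn_scores):
        c, h, w = feats.shape
        attn = softmax(scores.reshape(-1)).reshape(h, w)  # spatial attention map
        pooled.append((feats * attn).sum(axis=(1, 2)))    # attended pooling -> (C,)
    desc = np.concatenate(pooled)                         # fuse scales by concatenation
    return desc / np.linalg.norm(desc)                    # L2-normalize for retrieval

rng = np.random.default_rng(0)
shapes = [(64, 8, 8), (64, 4, 4)]                         # two hypothetical scales
fmaps = [rng.standard_normal(s) for s in shapes]
scores = [rng.standard_normal(s[1:]) for s in shapes]
d = multi_scale_attention_descriptor(fmaps, scores)
print(d.shape)  # (128,)
```

In an actual VPR pipeline such descriptors would be compared by cosine similarity (equivalently, Euclidean distance after L2 normalization) against the gallery embeddings; the real attention weights would be learned end-to-end rather than random.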
ISBN: 978-3-031-06429-6
ISBN: 978-3-031-06430-2
Files in this record:
File: Paolicelli2022_Chapter_LearningSemanticsForVisualPlac.pdf
Type: 2a Post-print, publisher's version / Version of Record
License: Non-public - Private/restricted access
Size: 2.48 MB
Format: Adobe PDF
Access: Restricted (a copy may be requested)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this record: https://hdl.handle.net/11583/2970160