Visual Place Recognition is a task that aims to predict the place of an image (called query) based solely on its visual features. This is typically done through image retrieval, where the query is matched to the most similar images from a large database of geotagged photos, using learned global descriptors. A major challenge in this task is recognizing places seen from different viewpoints. To overcome this limitation, we propose a new method, called EigenPlaces, to train our neural network on images from different point of views, which embeds viewpoint robustness into the learned global descriptors. The underlying idea is to cluster the training data so as to explicitly present the model with different views of the same points of interest. The selection of this points of interest is done without the need for extra supervision. We then present experiments on the most comprehensive set of datasets in literature, finding that EigenPlaces is able to outperform previous state of the art on the majority of datasets, while requiring 60% less GPU memory for training and using 50% smaller descriptors. The code and trained models for EigenPlaces are available at https://github.com/gmberton/EigenPlaces, while results with any other baseline can be computed with the codebase at https://github.com/gmberton/auto_VPR.

EigenPlaces: Training Viewpoint Robust Models for Visual Place Recognition / Berton, Gabriele; Trivigno, Gabriele; Caputo, Barbara; Masone, Carlo. - ELETTRONICO. - (2023), pp. 11046-11056. (Intervento presentato al convegno IEEE International Conference on Computer Vision (ICCV) tenutosi a Paris (FRA) nel 01-06 October 2023) [10.1109/ICCV51070.2023.01017].

EigenPlaces: Training Viewpoint Robust Models for Visual Place Recognition

Gabriele Berton;Gabriele Trivigno;Barbara Caputo;Carlo Masone
2023

Abstract

Visual Place Recognition is a task that aims to predict the place of an image (called query) based solely on its visual features. This is typically done through image retrieval, where the query is matched to the most similar images from a large database of geotagged photos, using learned global descriptors. A major challenge in this task is recognizing places seen from different viewpoints. To overcome this limitation, we propose a new method, called EigenPlaces, to train our neural network on images from different point of views, which embeds viewpoint robustness into the learned global descriptors. The underlying idea is to cluster the training data so as to explicitly present the model with different views of the same points of interest. The selection of this points of interest is done without the need for extra supervision. We then present experiments on the most comprehensive set of datasets in literature, finding that EigenPlaces is able to outperform previous state of the art on the majority of datasets, while requiring 60% less GPU memory for training and using 50% smaller descriptors. The code and trained models for EigenPlaces are available at https://github.com/gmberton/EigenPlaces, while results with any other baseline can be computed with the codebase at https://github.com/gmberton/auto_VPR.
2023
979-8-3503-0718-4
File in questo prodotto:
File Dimensione Formato  
2308.10832.pdf

accesso aperto

Tipologia: 2. Post-print / Author's Accepted Manuscript
Licenza: PUBBLICO - Tutti i diritti riservati
Dimensione 6.29 MB
Formato Adobe PDF
6.29 MB Adobe PDF Visualizza/Apri
EigenPlaces_Training_Viewpoint_Robust_Models_for_Visual_Place_Recognition.pdf

non disponibili

Tipologia: 2a Post-print versione editoriale / Version of Record
Licenza: Non Pubblico - Accesso privato/ristretto
Dimensione 2.74 MB
Formato Adobe PDF
2.74 MB Adobe PDF   Visualizza/Apri   Richiedi una copia
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11583/2982368