Visual Place Recognition is a task that aims to predict the place of an image (called query) based solely on its visual features. This is typically done through image retrieval, where the query is matched to the most similar images from a large database of geotagged photos, using learned global descriptors. A major challenge in this task is recognizing places seen from different viewpoints. To overcome this limitation, we propose a new method, called EigenPlaces, to train our neural network on images from different point of views, which embeds viewpoint robustness into the learned global descriptors. The underlying idea is to cluster the training data so as to explicitly present the model with different views of the same points of interest. The selection of this points of interest is done without the need for extra supervision. We then present experiments on the most comprehensive set of datasets in literature, finding that EigenPlaces is able to outperform previous state of the art on the majority of datasets, while requiring 60% less GPU memory for training and using 50% smaller descriptors. The code and trained models for EigenPlaces are available at https://github.com/gmberton/EigenPlaces, while results with any other baseline can be computed with the codebase at https://github.com/gmberton/auto_VPR.
EigenPlaces: Training Viewpoint Robust Models for Visual Place Recognition / Berton, Gabriele; Trivigno, Gabriele; Caputo, Barbara; Masone, Carlo. - ELETTRONICO. - (2023), pp. 11046-11056. (Intervento presentato al convegno IEEE International Conference on Computer Vision (ICCV) tenutosi a Paris (FRA) nel 01-06 October 2023) [10.1109/ICCV51070.2023.01017].
EigenPlaces: Training Viewpoint Robust Models for Visual Place Recognition
Gabriele Berton;Gabriele Trivigno;Barbara Caputo;Carlo Masone
2023
Abstract
Visual Place Recognition is a task that aims to predict the place of an image (called query) based solely on its visual features. This is typically done through image retrieval, where the query is matched to the most similar images from a large database of geotagged photos, using learned global descriptors. A major challenge in this task is recognizing places seen from different viewpoints. To overcome this limitation, we propose a new method, called EigenPlaces, to train our neural network on images from different point of views, which embeds viewpoint robustness into the learned global descriptors. The underlying idea is to cluster the training data so as to explicitly present the model with different views of the same points of interest. The selection of this points of interest is done without the need for extra supervision. We then present experiments on the most comprehensive set of datasets in literature, finding that EigenPlaces is able to outperform previous state of the art on the majority of datasets, while requiring 60% less GPU memory for training and using 50% smaller descriptors. The code and trained models for EigenPlaces are available at https://github.com/gmberton/EigenPlaces, while results with any other baseline can be computed with the codebase at https://github.com/gmberton/auto_VPR.File | Dimensione | Formato | |
---|---|---|---|
2308.10832.pdf
accesso aperto
Tipologia:
2. Post-print / Author's Accepted Manuscript
Licenza:
Pubblico - Tutti i diritti riservati
Dimensione
6.29 MB
Formato
Adobe PDF
|
6.29 MB | Adobe PDF | Visualizza/Apri |
EigenPlaces_Training_Viewpoint_Robust_Models_for_Visual_Place_Recognition.pdf
accesso riservato
Tipologia:
2a Post-print versione editoriale / Version of Record
Licenza:
Non Pubblico - Accesso privato/ristretto
Dimensione
2.74 MB
Formato
Adobe PDF
|
2.74 MB | Adobe PDF | Visualizza/Apri Richiedi una copia |
Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.
https://hdl.handle.net/11583/2982368