Test Time Adaptation for Egocentric Vision / Peirone, Simone Alberto; Planamente, Mirco; Caputo, Barbara; Averta, Giuseppe. - ELECTRONIC. - 3486:(2023), pp. 42-47. (Paper presented at the Ital-IA 2023 Thematic Workshops, held in Pisa, 29-30 May 2023).

Test Time Adaptation for Egocentric Vision

Simone Alberto Peirone; Mirco Planamente; Barbara Caputo; Giuseppe Averta
2023

Abstract

In the last few years, the technological advancement of wearable cameras has led to a growing interest in egocentric (first-person) vision, thanks to its ability to capture activities from the user's perspective, with applications in a variety of tasks, from human-object interaction to action prediction and anticipation. On the other hand, continuous head movements, variations in lighting conditions, and differences in the way humans complete the same task represent sources of bias that strengthen the coupling between the model's predictions and the training domain, affecting its ability to generalize to new environments. Several Domain Adaptation (DA) techniques have been proposed to make models more robust. Among these, Unsupervised Domain Adaptation (UDA) combines labeled source data and unlabeled target data to close the gap between domains. However, real-world applications require more flexibility, as target samples are often scarce, unrepresentative, or even private. Test Time Adaptation (TTA) is a viable solution to these issues: adaptation is performed directly at test time, under the simple assumption that input samples provide clues about the actual distribution of the target domain that can be used to improve predictions. With TTA, the model undergoes multiple adaptation steps at test time, minimizing an adaptation loss on target data and updating its normalization statistics. This work provides a comparative analysis of multiple adaptation techniques on the EPIC-Kitchens dataset. Experiments show strong accuracy improvements over the unadapted baselines, suggesting that TTA effectively improves model performance in dynamic environments.
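To make the adaptation mechanism concrete, the sketch below shows a minimal TENT-style test-time adaptation loop for a PyTorch classifier: BatchNorm layers re-estimate their normalization statistics on each unlabeled test batch, while only their affine parameters are updated by minimizing the entropy of the predictions. This is an illustrative assumption, not the exact procedure or the video architecture evaluated in the paper, and the helper names (configure_model, entropy, adapt_and_predict) are hypothetical.

import torch
import torch.nn as nn

def configure_model(model: nn.Module) -> list:
    # Freeze all weights, then enable only the BatchNorm affine parameters
    # and let BN layers use per-batch statistics instead of the source ones.
    model.train()                  # BN in training mode -> batch statistics
    model.requires_grad_(False)
    params = []
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.requires_grad_(True)          # adapt only scale and shift
            m.track_running_stats = False   # discard source running stats
            m.running_mean = None
            m.running_var = None
            params += list(m.parameters())
    return params

def entropy(logits: torch.Tensor) -> torch.Tensor:
    # Mean Shannon entropy of the softmax predictions (the adaptation loss).
    log_probs = logits.log_softmax(dim=1)
    return -(log_probs.exp() * log_probs).sum(dim=1).mean()

def adapt_and_predict(model, optimizer, x, steps=1):
    # A few adaptation steps on one unlabeled test batch, then a prediction.
    for _ in range(steps):
        loss = entropy(model(x))
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    with torch.no_grad():
        return model(x)

Under the same assumptions, a typical usage would build the optimizer over the returned parameters, e.g. optimizer = torch.optim.SGD(configure_model(model), lr=1e-3, momentum=0.9), and call adapt_and_predict on each incoming test batch so the model keeps adapting as the target distribution drifts.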
Files in this record:
File: 159.pdf
Access: open access
Type: 2a Post-print editorial version / Version of Record
License: Creative Commons
Size: 514.44 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2982600