Test Time Adaptation for Egocentric Vision / Peirone, Simone Alberto; Planamente, Mirco; Caputo, Barbara; Averta, Giuseppe. - Electronic. - 3486 (2023), pp. 42-47. (Paper presented at the Ital-IA 2023 Thematic Workshops, held in Pisa, 29-30 May 2023).
Test Time Adaptation for Egocentric Vision
Simone Alberto Peirone; Mirco Planamente; Barbara Caputo; Giuseppe Averta
2023
Abstract
In recent years, the technological advancement of wearable cameras has led to a growing interest in egocentric (first-person) vision, thanks to its ability to capture activities from the user's perspective, with applications in a variety of tasks, from human-object interaction to action prediction and anticipation. At the same time, continuous head movement, varying lighting conditions and differences in the way humans complete the same task introduce biases that tighten the coupling between a model's predictions and its training domain, hindering generalization to new environments. Several Domain Adaptation (DA) techniques have been proposed to make models more robust. Among these, Unsupervised Domain Adaptation (UDA) combines labeled source data and unlabeled target data to close the gap between domains. However, real-world applications demand more flexibility, as target samples are often scarce, unrepresentative or even private. Test Time Adaptation (TTA) offers a viable solution to these issues: adaptation is performed directly at test time, under the simple assumption that input samples provide clues about the actual distribution of the target domain that can be used to improve predictions. With TTA, models undergo multiple adaptation steps at test time by minimizing an adaptation loss on target data and updating normalization statistics. This work provides a comparative analysis of multiple adaptation techniques on the EPIC-Kitchens dataset. Experiments show strong accuracy improvements over the unadapted baselines, suggesting that TTA effectively improves model performance in dynamic environments.
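The adaptation procedure described in the abstract (minimizing an unsupervised loss on target batches while updating normalization statistics) can be made concrete with an entropy-minimization loop in the style of TENT (Wang et al., 2021), one widely used TTA method that matches this description. The following is a minimal PyTorch sketch, not the paper's exact setup: the model, optimizer settings, and data loader are placeholder assumptions for illustration.

```python
# Minimal sketch of TENT-style test-time adaptation (an assumption, not
# necessarily the exact method evaluated in the paper): only the batch-norm
# affine parameters are updated by minimizing prediction entropy, while
# normalization statistics are re-estimated from the incoming target batches.
import torch
import torch.nn as nn


def configure_model(model: nn.Module) -> list[torch.nn.Parameter]:
    """Freeze everything except batch-norm affine parameters."""
    model.train()  # use batch statistics instead of stored running stats
    params = []
    for module in model.modules():
        if isinstance(module, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            # Drop stored statistics so normalization follows the target data.
            module.track_running_stats = False
            module.running_mean = None
            module.running_var = None
            if module.affine:
                module.weight.requires_grad_(True)
                module.bias.requires_grad_(True)
                params += [module.weight, module.bias]
        else:
            for p in module.parameters(recurse=False):
                p.requires_grad_(False)
    return params


def entropy(logits: torch.Tensor) -> torch.Tensor:
    """Mean Shannon entropy of the softmax predictions."""
    probs = logits.softmax(dim=1)
    return -(probs * logits.log_softmax(dim=1)).sum(dim=1).mean()


@torch.enable_grad()
def adapt_step(model: nn.Module, x: torch.Tensor,
               optimizer: torch.optim.Optimizer) -> torch.Tensor:
    """One adaptation step on an unlabeled target batch."""
    loss = entropy(model(x))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return model(x).detach()  # predictions after the update


# Hypothetical usage: `model` and `target_loader` are assumed to exist.
# params = configure_model(model)
# optimizer = torch.optim.SGD(params, lr=1e-3)
# for clips, _ in target_loader:       # target labels are never used
#     preds = adapt_step(model, clips, optimizer)
```

Note that the loop touches no labels: this is what separates TTA from UDA, which needs joint access to labeled source and unlabeled target data during training.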
| File | Type | License | Size | Format |
|---|---|---|---|---|
| 159.pdf (open access) | 2a Post-print editorial version / Version of Record | Creative Commons | 514.44 kB | Adobe PDF |
https://hdl.handle.net/11583/2982600