Self-Supervised Joint Encoding of Motion and Appearance for First Person Action Recognition

Planamente, Mirco; Bottino, Andrea; Caputo, Barbara

doi:10.1109/ICPR48806.2021.9411972

Wearable cameras are becoming increasingly popular in several applications, raising the research community's interest in approaches for recognizing actions from the first-person point of view. An open challenge in egocentric action recognition is that videos lack detailed information about the main actor's pose and, when focusing on manipulation tasks, tend to record only parts of the movement. The amount of information about the action itself is therefore limited, which makes understanding the manipulated objects and their context crucial. Many previous works addressed this issue with two-stream architectures, where one stream models the appearance of the objects involved in the action and the other extracts motion features from optical flow. In this paper, we argue that learning features jointly from these two information channels better captures the spatio-temporal correlations between them. To this end, we propose a single-stream architecture that does so through an added self-supervised block, which uses a pretext motion prediction task to intertwine motion and appearance knowledge. Experiments on several publicly available databases support the effectiveness of our approach.

Self-Supervised Joint Encoding of Motion and Appearance for First Person Action Recognition / Planamente, Mirco; Bottino, Andrea; Caputo, Barbara. - PRINT. - (2021), pp. 8751-8758. (Paper presented at the 25th International Conference on Pattern Recognition, ICPR 2020, held in Milan, 10-15 January 2021) [10.1109/ICPR48806.2021.9411972].

Year of publication

2021

Appears in types

4.1 Contribution in conference proceedings

Files in this item:

File	Access	Type	License	Size	Format
ICPR2020.pdf	Open access	2. Post-print / Author's Accepted Manuscript	Public - All rights reserved	3.55 MB	Adobe PDF
Self-Supervised_Joint_Encoding_of_Motion_and_Appearance_for_First_Person_Action_Recognition.pdf	Restricted access	2a. Post-print, publisher's version / Version of Record	Non-public - Private/restricted access	6.85 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2848499

PORTO @ Archivio Istituzionale della Ricerca
