Recognizing human emotions from videos requires a deep understanding of the underlying multimodal sources, including images, audio, and text. Since the input data sources are highly variable across different modality combinations, leveraging multiple modalities often requires ad hoc fusion networks. To predict the emotional arousal of a person reacting to a given video clip we present ViPER, a multimodal architecture leveraging a modality-agnostic transformer-based model to combine video frames, audio recordings, and textual annotations. Specifically, it relies on a modality-agnostic late fusion network which makes ViPER easily adaptable to different modalities. The experiments carried out on the Hume-Reaction datasets of the MuSe-Reaction challenge confirm the effectiveness of the proposed approach.
ViPER: Video-based Perceiver for Emotion Recognition / Vaiani, Lorenzo; LA QUATRA, Moreno; Cagliero, Luca; Garza, Paolo. - ELETTRONICO. - (2022), pp. 67-73. (Intervento presentato al convegno Multimodal Sentiment Analysis Challenge (MuSe 2022) tenutosi a Lisbon (PT) nel October 10-15, 2022) [10.1145/3551876.3554806].
ViPER: Video-based Perceiver for Emotion Recognition
Lorenzo Vaiani;Moreno La Quatra;Luca Cagliero;Paolo Garza
2022
Abstract
Recognizing human emotions from videos requires a deep understanding of the underlying multimodal sources, including images, audio, and text. Since the input data sources are highly variable across different modality combinations, leveraging multiple modalities often requires ad hoc fusion networks. To predict the emotional arousal of a person reacting to a given video clip we present ViPER, a multimodal architecture leveraging a modality-agnostic transformer-based model to combine video frames, audio recordings, and textual annotations. Specifically, it relies on a modality-agnostic late fusion network which makes ViPER easily adaptable to different modalities. The experiments carried out on the Hume-Reaction datasets of the MuSe-Reaction challenge confirm the effectiveness of the proposed approach.File | Dimensione | Formato | |
---|---|---|---|
MuSe2022_reaction_challenge.pdf
accesso aperto
Descrizione: post-print authors version
Tipologia:
2. Post-print / Author's Accepted Manuscript
Licenza:
Pubblico - Tutti i diritti riservati
Dimensione
815.2 kB
Formato
Adobe PDF
|
815.2 kB | Adobe PDF | Visualizza/Apri |
3551876.3554806.pdf
accesso aperto
Tipologia:
2a Post-print versione editoriale / Version of Record
Licenza:
Pubblico - Tutti i diritti riservati
Dimensione
1.25 MB
Formato
Adobe PDF
|
1.25 MB | Adobe PDF | Visualizza/Apri |
Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.
https://hdl.handle.net/11583/2971099