Emotion Recognition from Videos Using Multimodal Large Language Models

Vaiani, Lorenzo; Cagliero, Luca; Garza, Paolo

doi:10.3390/fi16070247

The diffusion of Multimodal Large Language Models (MLLMs) has opened new research directions in video content understanding and classification. Emotion recognition from videos aims to automatically detect human emotions such as anxiety and fear. It requires in-depth processing of multiple data modalities, including acoustic and visual streams. State-of-the-art approaches leverage transformer-based architectures to combine multimodal sources. However, the impressive performance of MLLMs in content retrieval and generation offers new opportunities to extend the capabilities of existing emotion recognizers. This paper evaluates MLLMs on the emotion recognition task in a zero-shot learning setting. Furthermore, it presents an extension of a state-of-the-art architecture based on MLLM content reformulation. The results on the Hume-Reaction benchmark show that MLLMs still do not outperform the state of the art in average performance. Notably, however, they are more effective than traditional transformers at recognizing emotions whose intensity deviates from the average of the samples.
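The abstract compares average performance on Hume-Reaction with performance on samples whose emotion intensity deviates from the average. The following is a minimal sketch of how such a two-way evaluation could be computed. It rests on several assumptions that the abstract does not state:

- Each sample is a vector of per-emotion intensity scores.
- The score is the Pearson correlation averaged over emotion dimensions.
- Samples are split by an absolute distance from the mean.

The function names `pearson`, `mean_pearson` and `split_by_deviation`, the zero-variance convention, and the `threshold` parameter are all hypothetical and chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple

# Assumed layout: one vector of per-emotion intensities per video sample.
Vector = Sequence[float]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long score lists.

    Returns 0.0 when either list has zero variance (a convention chosen
    for this sketch, not taken from the paper)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least 2 items")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)


def mean_pearson(y_true: List[Vector], y_pred: List[Vector]) -> float:
    """Average the per-emotion Pearson correlations over all dimensions."""
    dims = len(y_true[0])
    per_dim = [
        pearson([s[d] for s in y_true], [s[d] for s in y_pred])
        for d in range(dims)
    ]
    return sum(per_dim) / dims


def split_by_deviation(
    y_true: List[Vector], dim: int, threshold: float
) -> Tuple[List[int], List[int]]:
    """Split sample indices by how far the true intensity of emotion `dim`
    lies from its mean: within `threshold` (near) or beyond it (far)."""
    values = [s[dim] for s in y_true]
    mu = sum(values) / len(values)
    near = [i for i, v in enumerate(values) if abs(v - mu) <= threshold]
    far = [i for i, v in enumerate(values) if abs(v - mu) > threshold]
    return near, far


if __name__ == "__main__":
    # Toy data with two hypothetical emotion dimensions.
    y_true = [[0.1, 0.9], [0.5, 0.5], [0.9, 0.1]]
    y_pred = [[0.2, 0.1], [0.6, 0.5], [1.0, 0.9]]
    print(mean_pearson(y_true, y_pred))        # close to 0.0: r = 1 and r = -1
    print(split_by_deviation(y_true, 0, 0.3))  # ([1], [0, 2])
```

In the toy data, the sample at index 1 sits exactly at the mean of dimension 0 and the other two lie 0.4 away from it. With `threshold=0.3`, only index 1 counts as "near". Evaluating a model separately on the "near" and "far" index sets is one way to contrast average-case behaviour with behaviour on atypical intensities.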

Emotion Recognition from Videos Using Multimodal Large Language Models / Vaiani, Lorenzo; Cagliero, Luca; Garza, Paolo. - In: FUTURE INTERNET. - ISSN 1999-5903. - 16:7(2024). [10.3390/fi16070247]

Year: 2024
DOI: https://dx.doi.org/10.3390/fi16070247
Journal: FUTURE INTERNET
Publication type: 1.1 Journal article

Files in this record:

futureinternet-16-00247-v2.pdf (open access). Type: 2a Post-print, publisher's version / Version of Record. License: Creative Commons. Size: 2.94 MB. Format: Adobe PDF.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2990937

PORTO @ Institutional Research Archive