Predicting Video Memorability Using a Model Pretrained with Natural Language Supervision / Agarla, Mirko; Celona, Luigi; Schettini, Raimondo. - 1:(In press). (Paper presented at the MediaEval Multimedia Benchmark Workshop held in Bergen (NOR), January 13–15, 2023).

Predicting Video Memorability Using a Model Pretrained with Natural Language Supervision

Agarla, Mirko; Celona, Luigi; Schettini, Raimondo
In press

Abstract

Video memorability prediction aims to quantify how well a given video will be remembered over time. The main attributes driving memorability are not yet fully understood, and many methods in the literature rely on features extracted from content recognition models. In this paper, we demonstrate that features extracted from a model trained with natural language supervision are effective for estimating video memorability. The proposed method exploits a Vision Transformer pretrained with Contrastive Language-Image Pretraining (CLIP) to encode video frames. A temporal attention mechanism then selects and aggregates relevant frame representations into a video-level feature vector. Finally, a multi-layer perceptron maps the video-level features to a memorability score. We test several types of encoding and temporal aggregation modules and submit our best solution to the MediaEval 2022 Predicting Media Memorability task. We achieve a correlation of 0.707 on subtask 1 (i.e., the Memento10k dataset). On subtask 2, we obtain a Pearson correlation of 0.487 when training on Memento10k and testing on VideoMem, and of 0.529 when training on VideoMem and testing on Memento10k.
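As a concrete illustration of the pipeline described in the abstract, the following is a minimal PyTorch sketch of the two trainable stages (temporal attention pooling and the MLP regressor). It assumes frame embeddings have already been extracted with a CLIP ViT image encoder (e.g., 512-dimensional vectors, one per sampled frame); the module names, hidden sizes, and attention formulation are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TemporalAttentionPooling(nn.Module):
    """Learns a relevance weight per frame and aggregates frame features
    into a single video-level vector (illustrative formulation)."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # hypothetical per-frame scoring layer

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, num_frames, dim) CLIP frame embeddings
        weights = torch.softmax(self.score(frames), dim=1)  # (B, T, 1)
        return (weights * frames).sum(dim=1)                # (B, dim)

class MemorabilityHead(nn.Module):
    """Temporal attention pooling followed by an MLP that maps the
    video-level feature vector to a scalar memorability score."""
    def __init__(self, dim: int = 512, hidden: int = 256):
        super().__init__()
        self.pool = TemporalAttentionPooling(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        video_feat = self.pool(frames)            # (B, dim) video-level feature
        return self.mlp(video_feat).squeeze(-1)   # (B,) memorability scores

# Usage with dummy CLIP embeddings: 8 videos, 16 sampled frames, 512-d each.
frame_embeddings = torch.randn(8, 16, 512)
model = MemorabilityHead()
scores = model(frame_embeddings)  # shape: (8,)
```

This separation between frame encoding and a lightweight regression head follows the three stages named in the abstract; the other aggregation modules the authors report testing could be swapped in for the pooling layer.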
Files in this record:

File: paper2382.pdf
Access: open access
Type: 2a Post-print / Version of Record
License: Creative Commons
Size: 4.97 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2982306