
Zero-Shot Content-Based Crossmodal Recommendation System / D'Asaro, Federico; De Luca, Sara; Bongiovanni, Lorenzo; Rizzo, Giuseppe; Papadopoulos, Symeon; Schinas, Manos; Koutlis, Christos. - In: EXPERT SYSTEMS WITH APPLICATIONS. - ISSN 0957-4174. - 258:(2024). [10.1016/j.eswa.2024.125108]

Zero-Shot Content-Based Crossmodal Recommendation System

D'Asaro, Federico; Rizzo, Giuseppe
2024

Abstract

Information Recommendation (IR) systems are conventionally designed to operate within a single modality at a time, such as Text2Text or Image2Image. Cross-modality, in contrast, aims to facilitate a versatile recommendation experience across different modalities, such as Text2Image. In recent years, significant strides have been made in developing neural recommender models that are multimodal and capable of generalizing across a broad spectrum of domains in a zero-shot manner, thanks to the robust representation capabilities of neural networks. These architectures enable the generation of embeddings for assets (i.e., content uploaded on a platform by users), providing a concise representation of their semantics and allowing comparisons through similarity ranking. In this paper, we present ZCCR, a Zero-shot Content-based Crossmodal Recommendation System that leverages knowledge from large-scale pretrained Vision-Language Models (VLMs) such as CLIP and ALBEF to redefine the recommendation task as a zero-shot retrieval task, eliminating the need for labeled data or prior knowledge about the recommended content. Furthermore, ZCCR performs crossmodal similarity search on an optimized index, such as FAISS, to improve the speed of recommendation. The goal is to recommend to the user assets that are similar to those they have previously uploaded, commonly referred to as their user profile. Within the user profile, we identify "areas of interest": groups of assets associated with specific user interests, such as cooking, sports, or cars. To identify these areas of interest and construct the search query for the retrieval operation, ZCCR employs an innovative use of Agglomerative Clustering, which groups a user's past assets by similarity without requiring prior knowledge of the number of clusters. Once the areas of interest, or clusters, are identified, each cluster centroid is used as the search seed to find similar assets. Experimental results demonstrate the efficiency of the selected components in terms of search time and retrieval performance on modified MSCOCO and FLICKR30k datasets tailored for the recommendation task. Furthermore, ZCCR outperforms both a baseline tagging system (BT) and a more advanced tagging system that uses a Large Language Model (LLM) to extract embeddings from tags. The results show that, even against the latter, embeddings extracted directly from raw assets yield superior outcomes compared to relying on intermediate tags generated by other tools. The code implementation to reproduce all experiments and results shown in this paper is provided at the following link: ZCCR-experiments.
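The abstract describes a concrete pipeline: embed the user's uploaded assets with a pretrained VLM, group the profile with Agglomerative Clustering to find areas of interest, and use each cluster centroid as the seed of a cross-modal FAISS search. The Python sketch below illustrates that flow under stated assumptions; it is reconstructed from the abstract alone and is not the authors' released implementation (see the ZCCR-experiments link). The CLIP checkpoint, the cosine distance threshold, and all function names are illustrative choices.

```python
# Minimal sketch of the pipeline described in the abstract, NOT the authors'
# released code. Assumptions: the CLIP checkpoint, the cosine distance
# threshold, and the source/shape of the candidate embeddings.

import numpy as np
import torch
import faiss
from PIL import Image
from sklearn.cluster import AgglomerativeClustering
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


def embed_images(paths):
    """L2-normalized CLIP image embeddings for the user's uploaded assets."""
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)
    return feats.cpu().numpy().astype("float32")


def areas_of_interest(profile_embeddings, distance_threshold=0.5):
    """Group a user's past assets into clusters ("areas of interest") without
    fixing the number of clusters; return one normalized centroid per cluster."""
    clustering = AgglomerativeClustering(
        n_clusters=None,                         # number of clusters not known a priori
        distance_threshold=distance_threshold,   # assumed value, needs tuning
        metric="cosine",                         # sklearn >= 1.2 (older versions use `affinity`)
        linkage="average",
    )
    labels = clustering.fit_predict(profile_embeddings)
    centroids = []
    for label in np.unique(labels):
        c = profile_embeddings[labels == label].mean(axis=0)
        centroids.append(c / np.linalg.norm(c))  # re-normalize so inner product == cosine
    return np.stack(centroids).astype("float32")


def recommend(centroids, candidate_embeddings, top_k=10):
    """Similarity search with FAISS: candidates may be text or image embeddings,
    as long as they come from the same VLM space (float32, unit-normalized)."""
    index = faiss.IndexFlatIP(candidate_embeddings.shape[1])
    index.add(candidate_embeddings)
    scores, ids = index.search(centroids, top_k)
    return scores, ids  # one ranked list of candidate ids per area of interest
```

Because the centroids and the candidates live in the same VLM embedding space, the same centroid can be searched against an index of text embeddings or of image embeddings, which is what makes the recommendation cross-modal; on unit-normalized vectors, FAISS inner-product search is equivalent to cosine-similarity ranking. The index type and hyperparameters used in the paper may differ.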
File in this record:
1-s2.0-S0957417424019754-main.pdf (open access)
Type: 2a Post-print editorial version / Version of Record
License: Creative Commons
Size: 5.34 MB
Format: Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2992321