Seeing the Abstract: Translating the Abstract Language for Vision Language Models

Talon, Davide; Girella, Federico; Liu, Ziyue; Cristani, Marco; Wang, Yiming

doi:10.1109/cvpr52734.2025.00864

Natural language goes beyond dryly describing visual content. It contains rich abstract concepts to express feeling, creativity and properties that cannot be directly perceived. Yet, current research in Vision Language Models (VLMs) has not shed light on abstract-oriented language. Our research breaks new ground by uncovering its wide presence and underestimated value through extensive analysis. In particular, we focus our investigation on the fashion domain, a highly representative field rich in abstract expressions. By analyzing recent large-scale multimodal fashion datasets, we find that abstract terms have a dominant presence, rivaling the concrete ones, that they provide novel information, and that they are useful in the retrieval task. However, a critical challenge emerges: current general-purpose or fashion-specific VLMs are pre-trained on databases that lack sufficient abstract words in their text corpora, which hinders their ability to effectively represent abstract-oriented language. We propose a training-free and model-agnostic method, Abstract-to-Concrete Translator (ACT), to shift abstract representations towards well-represented concrete ones in the VLM latent space, using pre-trained models and existing multimodal databases. On the text-to-image retrieval task, despite being training-free, ACT outperforms the fine-tuned VLMs in both same- and cross-dataset settings, demonstrating its effectiveness and strong generalization capability. Moreover, the improvement introduced by ACT is consistent across various VLMs, making it a plug-and-play solution.

Seeing the Abstract: Translating the Abstract Language for Vision Language Models / Talon, D., Girella, F., Liu, Z., Cristani, M., Wang, Y. - (2025), pp. 9253-9262. (IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville (USA), 10-17 June 2025) [10.1109/cvpr52734.2025.00864].


Year of publication: 2025
ISBN: 979-8-3315-4364-8
Publication type: 4.1 Conference proceedings contribution

Files in this item:

File	Size	Format
Seeing_the_Abstract_Translating_the_Abstract_Language_for_Vision_Language_Models.pdf (restricted access; type: 2a Post-print, publisher's version / Version of Record; license: not public, private/restricted access)	2.17 MB	Adobe PDF


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/3011882

PORTO @ Archivio Istituzionale della Ricerca
