
Investigating player perception of environment-aware embodied conversational agents enabled by vision language models / Fiorenza, Jacopo; Thawonmas, Ruck; Calandra, Davide; Lamberti, Fabrizio. - ELECTRONIC. - (In press). (2026 IEEE 5th International Conference on Intelligent Reality (ICIR 2026), Pisa, Italy, June 25-26, 2026).

Investigating player perception of environment-aware embodied conversational agents enabled by vision language models

Fiorenza, Jacopo; Calandra, Davide; Lamberti, Fabrizio
In press

Abstract

Recent advancements in generative AI have enabled the integration of Large Language Models (LLMs) into Virtual Environments (VEs). This integration has been particularly useful for creating Embodied Conversational Agents (ECAs) capable of engaging in meaningful interactions. However, such ECAs often lack environmental awareness, which may limit interaction and conversation quality. A potential solution is to use Vision Language Models (VLMs), which could empower ECAs with structured environmental knowledge; however, the use of VLMs for such applications remains severely underexplored. This paper explores how VLMs can be employed to integrate environmental awareness into ECAs and investigates their impact on player perception. To this end, an architecture leveraging multi-image inference and scene graph generation was designed. A within-subjects user study was then conducted in a virtual reality game, comparing ECAs with and without environment-aware capabilities. The evaluation considered several dimensions, including perceived knowledge, intelligence, factuality, willingness for future interaction, and conversational abilities.
Files in this record:
2026158714.pdf (open access)
Description: Pre-print accepted version
Type: 2. Post-print / Author's Accepted Manuscript
License: Public - All rights reserved
Size: 1.28 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/3009647