Recent advancements in generative AI have enabled the integration of Large Language Models (LLMs) into Virtual Environments (VEs). This integration has been particularly useful for creating Embodied Conversational Agents (ECAs) capable of engaging in meaningful interactions. However, such ECAs often lack environmental awareness, which may limit interaction and conversation quality. A potential solution could be to use Vision Language Models (VLMs), which could empower ECAs with structured environmental knowledge. However, the use of VLMs for such applications is severely underexplored. This paper explores how VLMs can be employed to integrate environmental awareness into ECAs and investigates their impact on player perception. To this end, an architecture leveraging multi-image inference and scene graph generation was designed. A within-subject user study was then conducted in a virtual reality game, comparing ECAs with and without environment-aware capabilities. The evaluation considered several dimensions, including perceived knowledge, intelligence, factuality, willingness for future interaction, and conversational abilities.
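The paper's actual architecture is not reproduced in this record, but the core idea it describes — using VLM-derived scene graphs as structured environmental knowledge for an ECA — can be sketched as follows. All names and the example triples are illustrative assumptions (a minimal stand-in for what multi-image VLM inference might produce), not the authors' implementation.

```python
# Hedged sketch: grounding an ECA's system prompt in a scene graph.
# The (subject, predicate, object) triples stand in for hypothetical
# VLM output obtained via multi-image inference over the environment.
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """One edge of the scene graph: subject --predicate--> object."""
    subject: str
    predicate: str
    obj: str


def build_scene_graph(triples):
    """Index triples by subject so each entity lists its outgoing edges."""
    graph = {}
    for t in triples:
        graph.setdefault(t.subject, []).append((t.predicate, t.obj))
    return graph


def graph_to_prompt(graph):
    """Serialize the scene graph as natural-language facts for the LLM."""
    lines = []
    for subject, edges in graph.items():
        for predicate, obj in edges:
            lines.append(f"- The {subject} {predicate} the {obj}.")
    return ("You are an agent in a virtual environment.\n"
            "Known facts about your surroundings:\n" + "\n".join(lines))


# Example observations (hypothetical output for a VR game scene).
observations = [
    Relation("sword", "is lying on", "table"),
    Relation("table", "is next to", "fireplace"),
]
prompt = graph_to_prompt(build_scene_graph(observations))
```

The serialized `prompt` would then be injected into the ECA's conversational context, so that answers about the surroundings are grounded in observed facts rather than hallucinated.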
Investigating player perception of environment-aware embodied conversational agents enabled by vision language models / Fiorenza, Jacopo; Thawonmas, Ruck; Calandra, Davide; Lamberti, Fabrizio. - ELECTRONIC. - (In press). (2026 IEEE 5th International Conference on Intelligent Reality (ICIR 2026), Pisa (IT), June 25-26, 2026).
| File | Description | Type | License | Size | Format |
|---|---|---|---|---|---|
| 2026158714.pdf (open access) | Pre-print accepted version | 2. Post-print / Author's Accepted Manuscript | Public - All rights reserved | 1.28 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/3009647
