From Words to Emotions: Evaluating Text-to-Motion Body Language for Believable Emotions in Virtual Humans / Calzolari, Stefano; Annicchiarico, Ciro; Strada, Francesco; Bottino, Andrea. - PRINT. - (2025). (Paper presented at the International Conference on eXtended Reality (XR Salento) 2025, held in Otranto, Italy).

From Words to Emotions: Evaluating Text-to-Motion Body Language for Believable Emotions in Virtual Humans

Calzolari, Stefano; Annicchiarico, Ciro; Strada, Francesco; Bottino, Andrea
2025

Abstract

Virtual humans are computer-generated characters designed for believable interaction across extended reality applications such as gaming, training, and therapy. Their believability, which contributes to more immersive and engaging virtual experiences, is significantly enhanced by emotions conveyed through facial expressions, vocal prosody, and expressive body language. Currently, creating emotional animations focused on body language involves labor-intensive manual processes or costly motion-capture techniques. Text-to-motion synthesis leverages artificial intelligence to generate animations from textual prompts, offering a promising alternative that simplifies animation workflows. However, existing models often prioritize physical realism over emotional expressiveness, an aspect that remains underexplored. This study evaluates the emotional believability of animations generated by four text-to-motion models (LADiff, MDM, T2MGPT, Muse Animate). A user study with 39 participants assessed basic emotions portrayed during common actions, revealing that emotions such as anger and sadness were the most recognizable from body language alone, while surprise and disgust posed greater challenges. These results align with previous body language research, highlighting the strengths and limitations of AI-generated emotional animations and offering insights for enhancing virtual human expressiveness.
Files in this item:

File: paper_final.pdf (restricted access)
Type: 2. Post-print / Author's Accepted Manuscript
License: Not public - Private/restricted access
Size: 2.63 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/3000652