Audio-visual scenes were collected in a medium-sized reverberant conference hall through in-field 3rd-order ambisonics impulse response recordings and 360-degree stereoscopic videos. The visual scenes included cues of the room and the location of the sound sources, without lip-sync-related cues. Speech intelligibility tests based on seven audio-visual scenes were administered inside an immersive virtual 3D environment reproduced through a spherical 16-speaker array synched with a head-mounted display. Forty normal-hearing subjects were engaged to test the effects on speech intelligibility of a talker in front of the listener and amplified by two lateral symmetrical loudspeakers, in the case of (i) different listener-to-talker distances, (ii) one-talker noise at various azimuth angles around the listener, (iii) high reverberation with –5 dB signal-to-noise ratio, (iv) self-motion, and (v) visual cues. We conducted tests in four configurations, that is, audio-visual and audio-only, both with self-motion and in the static condition. The static audio-only tests scored the highest speech intelligibility, followed by a tie between audio-visual with self-motion and in the static condition. Speech intelligibility decreased as the target-to-listener distance increased in all the noisy scenes. Additionally, speech intelligibility increased when the noise azimuth was at 120° compared to both 180° and 0° , with the talker at approximately 8 m from the listener. The advantage of the spatial separation of the noise signal in reverberation is evident in the case of the audio-visual with self-motion test. This suggests a spatial release from masking in the presence of reverberation, one-talker-interfering noise and within an more ecological scene.

Speech intelligibility in reverberation based on audio-visual scenes recordings reproduced in a 3D virtual environment / Guastamacchia, Angela; Riente, Fabrizio; Shtrepi, Louena; Puglisi, Giuseppina Emma; Pellerey, Franco; Astolfi, Arianna. - In: BUILDING AND ENVIRONMENT. - ISSN 0360-1323. - 258:(2024), pp. 1-13. [10.1016/j.buildenv.2024.111554]

Speech intelligibility in reverberation based on audio-visual scenes recordings reproduced in a 3D virtual environment

Guastamacchia, Angela;Riente, Fabrizio;Shtrepi, Louena;Puglisi, Giuseppina Emma;Pellerey, Franco;Astolfi, Arianna
2024

Abstract

Audio-visual scenes were collected in a medium-sized reverberant conference hall through in-field 3rd-order ambisonics impulse response recordings and 360-degree stereoscopic videos. The visual scenes included cues of the room and the location of the sound sources, without lip-sync-related cues. Speech intelligibility tests based on seven audio-visual scenes were administered inside an immersive virtual 3D environment reproduced through a spherical 16-speaker array synched with a head-mounted display. Forty normal-hearing subjects were engaged to test the effects on speech intelligibility of a talker in front of the listener and amplified by two lateral symmetrical loudspeakers, in the case of (i) different listener-to-talker distances, (ii) one-talker noise at various azimuth angles around the listener, (iii) high reverberation with –5 dB signal-to-noise ratio, (iv) self-motion, and (v) visual cues. We conducted tests in four configurations, that is, audio-visual and audio-only, both with self-motion and in the static condition. The static audio-only tests scored the highest speech intelligibility, followed by a tie between audio-visual with self-motion and in the static condition. Speech intelligibility decreased as the target-to-listener distance increased in all the noisy scenes. Additionally, speech intelligibility increased when the noise azimuth was at 120° compared to both 180° and 0° , with the talker at approximately 8 m from the listener. The advantage of the spatial separation of the noise signal in reverberation is evident in the case of the audio-visual with self-motion test. This suggests a spatial release from masking in the presence of reverberation, one-talker-interfering noise and within an more ecological scene.
File in questo prodotto:
File Dimensione Formato  
SI-B&E.pdf

accesso aperto

Tipologia: 2a Post-print versione editoriale / Version of Record
Licenza: Creative commons
Dimensione 3.08 MB
Formato Adobe PDF
3.08 MB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11583/2988843