Multimodal interaction is a promising way to improve accessibility and inclusivity in Educational Music Production (EMP) environments. This work reports on the ongoing design, development, and feasibility evaluation of a microcontroller-based embedded system that extends standard audio interfaces with multimodal interaction capabilities. The system comprises two main elements: an external stereo Analog-to-Digital Converter (ADC), the PCM1803AEVM, which digitizes analog audio signals from standard audio interfaces; and a NUCLEO-H723ZG evaluation board, based on an STM32 microcontroller, which receives the digital audio stream, manages buffering, and transmits the data over Ethernet. Once connected to a Local Area Network (LAN), the system operates as a self-contained server hosting an onboard, browser-accessible Web Audio Interface (WAI), which lets the user perform core production tasks through voice or facial commands in addition to standard manual controls. A functional evaluation with six users assessed the accuracy of the multimodal commands across diverse vocal and facial profiles. Results indicate promising recognition accuracy, supporting the case for further validation in representative EMP scenarios.
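
The abstract states that the board "manages buffering" between the digitized audio stream and the Ethernet link, but does not say how. As a purely illustrative sketch, one common arrangement is a bounded queue of fixed-size stereo blocks: the capture side appends frames, and the network side removes complete blocks for transmission. Everything below is an assumption made for illustration and is not taken from the paper: the class name `AudioBlockQueue`, the parameters `block_frames` and `max_blocks`, the drop-oldest overflow policy, and the use of Python instead of firmware code.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

Frame = Tuple[int, int]  # one stereo sample pair (left, right); hypothetical layout


class AudioBlockQueue:
    """Bounded FIFO of fixed-size stereo blocks (illustrative, not the authors' design)."""

    def __init__(self, block_frames: int, max_blocks: int) -> None:
        if block_frames <= 0 or max_blocks <= 0:
            raise ValueError("block_frames and max_blocks must be positive")
        self.block_frames = block_frames
        self._pending: List[Frame] = []  # frames that do not yet fill a block
        self._blocks: Deque[List[Frame]] = deque(maxlen=max_blocks)
        self.dropped_blocks = 0

    def push_frames(self, frames: List[Frame]) -> None:
        """Capture side: add frames and close a block each time one fills up."""
        for frame in frames:
            self._pending.append(frame)
            if len(self._pending) == self.block_frames:
                if len(self._blocks) == self._blocks.maxlen:
                    # A full deque discards its oldest entry on append.
                    self.dropped_blocks += 1
                self._blocks.append(self._pending)
                self._pending = []

    def pop_block(self) -> Optional[List[Frame]]:
        """Network side: return the oldest complete block, or None if none is ready."""
        return self._blocks.popleft() if self._blocks else None


if __name__ == "__main__":
    queue = AudioBlockQueue(block_frames=4, max_blocks=2)
    queue.push_frames([(i, -i) for i in range(10)])
    print(queue.pop_block(), queue.dropped_blocks)
```

Dropping the oldest block on overflow favors low latency over completeness, which is one plausible choice for live monitoring; a design that must not lose audio would instead block or signal back-pressure to the capture side.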

Toward Multimodal Audio Interfaces in Educational Music Production: A Microcontroller-Based Embedded System with Voice and Facial Control / Buccellato, Pietro; Rottondi, Cristina. - ELECTRONIC. - (2025), pp. 1-5. (2025 IEEE 6th International Symposium on the Internet of Sounds (IS2), L'Aquila (Italy), 29-31 October 2025) [10.1109/is264627.2025.11284585].

Toward Multimodal Audio Interfaces in Educational Music Production: A Microcontroller-Based Embedded System with Voice and Facial Control

Buccellato, Pietro; Rottondi, Cristina
2025

ISBN: 979-8-3315-7294-5
Files in this item:

Toward_Multimodal_Audio_Interfaces_in_Educational_Music_Production_A_Microcontroller-Based_Embedded_System_with_Voice_and_Facial_Control.pdf
Access: restricted
Type: 2a Post-print, publisher's version / Version of Record
License: Non-public - Private/restricted access
Size: 5.1 MB
Format: Adobe PDF

Soundy_Module___IS2 (2).pdf
Access: open
Type: 2. Post-print / Author's Accepted Manuscript
License: Public - All rights reserved
Size: 4.41 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/3006480