Browser-Based Conducting Gesture Recognition for Educational Orchestras: System-Level Accuracy and Latency Evaluation / Buccellato, P.; Rottondi, C. - ELECTRONIC. - (2026). (The 12th International Conference on Frontiers of Educational Technologies (ICFET 2026), Tokyo (Japan), 12-14 June 2026).

Browser-Based Conducting Gesture Recognition for Educational Orchestras: System-Level Accuracy and Latency Evaluation

Buccellato, Pietro; Rottondi, Cristina
2026

Abstract

Participation in orchestral contexts presents challenges for blind and visually impaired (BVI) musicians due to the inherently visual nature of conducting gestures. This work presents a browser-based, markerless conducting gesture recognition architecture integrated with an embedded Bluetooth Low Energy (BLE) device, aimed at enabling accessible gesture translation in educational orchestral settings. MediaPipe Hands is employed for real-time landmark extraction within a standard web application, while gesture classification is performed through a deterministic geometric and kinematic algorithmic approach applied directly to landmark trajectories. Recognized gestures are encoded and transmitted to a resource-constrained embedded node, forming a complete browser-to-embedded interaction pipeline. Recognition accuracy was evaluated using gestures performed by a professional orchestra conductor under realistic execution conditions. Results demonstrate very high accuracy for isolated metric and dynamic gestures and robust performance under bimanual execution. In addition, a hardware-timed methodology was introduced to measure physical end-to-end latency from gesture occurrence to BLE command reception at the embedded device. The measured mean latency of 125.61 ms is compatible with human sensorimotor response times and established conducting practice.
Files in this record:
File: ICFET2026_CT-1093.pdf

Access: restricted

Description: Accepted manuscript / camera-ready version submitted for the ICFET 2026 proceedings
Type: 2. Post-print / Author's Accepted Manuscript
License: Non-public - Private/restricted access
Size: 432.73 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/3012431