The evaluation of spoken language understanding (SLU) systems is often restricted to assessing their global performance or examining predefined subgroups of interest. However, a more detailed analysis at the subgroup level has the potential to uncover valuable insights into how speech system performance differs across various subgroups. In this work, we identify biased data subgroups and describe them at the level of user demographics, recording conditions, and speech targets. We propose a new task-, model-, and dataset-agnostic approach to detect significant intra- and cross-model performance gaps. We detect problematic data subgroups in SLU models by leveraging the notion of subgroup divergence. We also compare the outcomes of different SLU models on the same dataset and task at the subgroup level. We identify significant gaps in subgroup performance between models differing in size, architecture, or pre-training objectives, including multi-lingual and mono-lingual models, yet comparable to each other in overall performance. The results, obtained on two SLU models, four datasets, and three different tasks (intent classification, automatic speech recognition, and emotion recognition), confirm the effectiveness of the proposed approach in providing a nuanced SLU model assessment.
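The abstract's central tool is subgroup divergence. A minimal sketch of the idea, assuming (as in the DivExplorer line of work on which this notion builds) that the divergence of a subgroup is the difference between the subgroup's performance and the overall performance; the attribute names, records, and `subgroup_divergence` helper below are hypothetical illustrations, not the authors' implementation:

```python
# Minimal sketch of subgroup divergence: the gap between a subgroup's
# accuracy and the overall accuracy. Subgroups are conjunctions of
# metadata attribute=value items (e.g., gender=female AND age=old).
from itertools import combinations

# Hypothetical per-utterance evaluation records: metadata attributes
# plus a correctness flag (e.g., whether the intent was predicted right).
records = [
    {"gender": "female", "age": "young", "correct": 1},
    {"gender": "female", "age": "old",   "correct": 0},
    {"gender": "male",   "age": "young", "correct": 1},
    {"gender": "male",   "age": "old",   "correct": 1},
    {"gender": "female", "age": "old",   "correct": 0},
    {"gender": "male",   "age": "young", "correct": 1},
]

def accuracy(rows):
    return sum(r["correct"] for r in rows) / len(rows)

def subgroup_divergence(records, attrs, min_support=1):
    """Divergence of each subgroup with respect to overall accuracy."""
    overall = accuracy(records)
    results = {}
    for k in range(1, len(attrs) + 1):
        for combo in combinations(attrs, k):
            # Enumerate the attribute-value combinations present in the data.
            seen = {tuple(r[a] for a in combo) for r in records}
            for vals in seen:
                rows = [r for r in records
                        if all(r[a] == v for a, v in zip(combo, vals))]
                if len(rows) >= min_support:
                    name = ", ".join(f"{a}={v}"
                                     for a, v in zip(combo, vals))
                    results[name] = accuracy(rows) - overall
    return results

divs = subgroup_divergence(records, ["gender", "age"])
# Subgroups with the most negative divergence underperform the model average.
for name, d in sorted(divs.items(), key=lambda kv: kv[1]):
    print(f"{name}: {d:+.2f}")
```

In a cross-model comparison along the lines the abstract describes, the same enumeration would be run for each model, and subgroups whose divergence differs markedly between two models of similar overall performance would be flagged.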

Towards Comprehensive Subgroup Performance Analysis in Speech Models / Koudounas, Alkis; Pastor, Eliana; Attanasio, Giuseppe; Mazzia, Vittorio; Giollo, Manuel; Gueudre, Thomas; Reale, Elisa; Cagliero, Luca; Cumani, Sandro; de Alfaro, Luca; Baralis, Elena; Amberti, Daniele. - In: IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING. - ISSN 2329-9290. - Vol. 32 (2024), pp. 1468-1480. DOI: 10.1109/taslp.2024.3363447

Towards Comprehensive Subgroup Performance Analysis in Speech Models

Koudounas, Alkis; Pastor, Eliana; Cagliero, Luca; Cumani, Sandro; Baralis, Elena
2024

Files in this record:
File: Towards_Comprehensive_Subgroup_Performance_Analysis_in_Speech_Models.pdf (Adobe PDF, 2.43 MB)
Description: Towards Comprehensive Subgroup Performance Analysis in Speech Models
Type: 2a Post-print editorial version / Version of Record
License: Non-public - Private/restricted access (file not openly available)

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2986497