This paper presents an experimental implementation of a low-complexity speaker recognition algorithm working in the compressed speech domain. The goal is to perform speaker modeling and identification without decoding the speech bitstream to extract speaker-dependent features, thus saving important system resources, for instance in mobile devices. The compressed bitstream values of the widely used GSM AMR speech coding standard are studied to identify statistics that enable fair recognition after a few seconds of speech. Using Euclidean distance measures on elementary statistical values, such as the coefficient of variation and the skewness of nine standard GSM AMR parameters, delivers recognition accuracies close to 100% after about 20 seconds of active speech for a database of 14 speakers recorded in a normal room environment.
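The approach described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not name the nine GSM AMR parameters or say how the statistics are normalized or weighted, so the sketch assumes nine generic per-frame parameter streams, population statistics, and an unweighted Euclidean distance. The function names and the synthetic data generator are hypothetical, and the random values only stand in for quantities that a real system would parse from the AMR bitstream.

```python
import math
import random
from statistics import fmean, pstdev

N_PARAMS = 9  # the abstract mentions nine parameters; which ones is not stated here


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (mean must be nonzero)."""
    return pstdev(values) / fmean(values)


def skewness(values):
    """Population skewness: third central moment divided by sigma cubed."""
    mu = fmean(values)
    sigma = pstdev(values)
    return fmean([(v - mu) ** 3 for v in values]) / sigma ** 3


def feature_vector(frames):
    """Map a list of per-frame parameter tuples to [cv_0, skew_0, cv_1, skew_1, ...]."""
    features = []
    for column in zip(*frames):  # one column per bitstream parameter
        features.append(coefficient_of_variation(column))
        features.append(skewness(column))
    return features


def identify(test_features, models):
    """Return the enrolled speaker whose model is nearest in Euclidean distance."""
    return min(models, key=lambda name: math.dist(test_features, models[name]))


def synthetic_frames(rng, n_frames, skewed):
    """Hypothetical stand-in data; real input would come from parsed AMR frames."""
    frames = []
    for _ in range(n_frames):
        if skewed:
            frame = tuple(rng.expovariate(1.0 / 50.0) for _ in range(N_PARAMS))
        else:
            frame = tuple(rng.gauss(50.0, 5.0) for _ in range(N_PARAMS))
        frames.append(frame)
    return frames


if __name__ == "__main__":
    rng = random.Random(0)
    # 1000 frames of 20 ms each correspond to about 20 s of active speech.
    models = {
        "speaker_a": feature_vector(synthetic_frames(rng, 1000, skewed=False)),
        "speaker_b": feature_vector(synthetic_frames(rng, 1000, skewed=True)),
    }
    probe = feature_vector(synthetic_frames(rng, 1000, skewed=True))
    print(identify(probe, models))
```

Because both statistics are computed directly on the parameter values, enrollment and identification in this sketch need no speech decoding, which is the resource saving the abstract highlights.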
Low-Complexity Automatic Speaker Recognition in the Compressed GSM-AMR Domain / Petracca, Matteo; Servetti, Antonio; De Martin, Juan Carlos. - (2005), pp. 662-665. (Paper presented at the IEEE International Conference on Multimedia & Expo (ICME), held in Amsterdam, The Netherlands, 6-8 July 2005) [10.1109/ICME.2005.1521510].
Low-Complexity Automatic Speaker Recognition in the Compressed GSM-AMR Domain
PETRACCA, Matteo;SERVETTI, Antonio;DE MARTIN, JUAN CARLOS
2005
https://hdl.handle.net/11583/1410962