This paper presents an experimental implementation of a low-complexity speaker recognition algorithm working in the compressed speech domain. The goal is to perform speaker modeling and identification without decoding the speech bitstream to extract speaker-dependent features, thus saving important system resources, for instance, in mobile devices. The compressed bitstream values of the widely used GSM AMR speech coding standard are studied to identify statistics enabling fair recognition after a few seconds of speech. Using Euclidean distance measures on elementary statistical values, such as the coefficient of variation and skewness, of nine standard GSM AMR parameters delivers recognition accuracies close to 100% after about 20 seconds of active speech for a database of 14 speakers recorded in a normal room environment.
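
The matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the per-frame parameter values have already been read from the bitstream into tuples of numbers, that each speaker model is simply the vector of coefficient of variation and skewness for every parameter, and that the closest model by Euclidean distance wins. The function names, the population (rather than sample) statistics, and the toy data with two parameters instead of nine are all assumptions made for illustration.

```python
import math
from statistics import fmean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (mean assumed non-zero)."""
    return pstdev(values) / fmean(values)


def skewness(values):
    """Third standardized moment (population form; values assumed non-constant)."""
    mean = fmean(values)
    sd = pstdev(values)
    return fmean([(v - mean) ** 3 for v in values]) / sd ** 3


def feature_vector(frames):
    """Map a list of per-frame parameter tuples to [CV, skewness] per parameter."""
    columns = zip(*frames)  # one sequence of values per parameter
    features = []
    for column in columns:
        features.append(coefficient_of_variation(column))
        features.append(skewness(column))
    return features


def identify(models, frames):
    """Return the name of the enrolled model closest to the probe frames."""
    probe = feature_vector(frames)
    return min(models, key=lambda name: math.dist(models[name], probe))


# Hypothetical toy data: two parameters per frame, values invented for illustration.
alice_frames = [(12, 40), (15, 42), (11, 55), (30, 41), (14, 43)]
bob_frames = [(20, 10), (22, 30), (21, 12), (19, 11), (23, 13)]

models = {
    "alice": feature_vector(alice_frames),
    "bob": feature_vector(bob_frames),
}

probe_frames = [(13, 41), (14, 44), (12, 52), (28, 40), (15, 42)]
print(identify(models, probe_frames))
```

Because only running statistics of already-quantized parameters are needed, a scheme of this kind avoids decoding the speech signal, which is the source of the complexity savings the abstract refers to.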

Low-Complexity Automatic Speaker Recognition in the Compressed GSM-AMR Domain / Petracca, Matteo; Servetti, Antonio; De Martin, Juan Carlos. - (2005), pp. 662-665. (Paper presented at the IEEE International Conference on Multimedia & Expo (ICME), held in Amsterdam, The Netherlands, 6-8 July 2005) [10.1109/ICME.2005.1521510].

Low-Complexity Automatic Speaker Recognition in the Compressed GSM-AMR Domain

Petracca, Matteo; Servetti, Antonio; De Martin, Juan Carlos
2005

ISBN: 0780393317

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/1410962