Low-Complexity Automatic Speaker Recognition in the Compressed GSM-AMR Domain / Petracca, Matteo; Servetti, Antonio; DE MARTIN, JUAN CARLOS. - (2005), pp. 662-665. (Paper presented at the IEEE International Conference on Multimedia & Expo (ICME), held in Amsterdam, The Netherlands, 6-8 July 2005) [10.1109/ICME.2005.1521510].

Low-Complexity Automatic Speaker Recognition in the Compressed GSM-AMR Domain

PETRACCA, Matteo;SERVETTI, Antonio;DE MARTIN, JUAN CARLOS
2005

Abstract

This paper presents an experimental implementation of a low-complexity speaker recognition algorithm that works in the compressed speech domain. The goal is to perform speaker modeling and identification without decoding the speech bitstream to extract speaker-dependent features, thus saving significant system resources, for instance in mobile devices. The compressed bitstream values of the widely used GSM AMR speech coding standard are studied to identify statistics that enable fair recognition after a few seconds of speech. Using Euclidean distance measures on elementary statistical values, such as the coefficient of variation and the skewness, of nine standard GSM AMR parameters delivers recognition accuracies close to 100% after about 20 seconds of active speech for a database of 14 speakers recorded in a normal room environment.
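The approach summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the nine parameters, nor does it say how the two statistics are combined or normalized. The sketch therefore assumes that each frame is already available as a tuple of nine numeric parameter values read from the bitstream, that both statistics of every parameter are concatenated into one feature vector, and that identification picks the enrolled speaker at the smallest Euclidean distance. All function names are hypothetical, and `synthetic_frames` only generates stand-in data (gamma-distributed values whose shape differs per speaker) so that the example runs without real GSM AMR recordings.

```python
import math
import random
from statistics import fmean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (mean assumed non-zero)."""
    return pstdev(values) / fmean(values)


def skewness(values):
    """Population skewness: third central moment divided by sigma cubed."""
    mean = fmean(values)
    sigma = pstdev(values)  # assumed non-zero, i.e. the parameter is not constant
    return fmean([(v - mean) ** 3 for v in values]) / sigma ** 3


def feature_vector(frames):
    """Build [cv_1, skew_1, cv_2, skew_2, ...] from per-frame parameter tuples."""
    features = []
    for column in zip(*frames):  # one column per codec parameter
        features.append(coefficient_of_variation(column))
        features.append(skewness(column))
    return features


def identify(models, test_features):
    """Return the enrolled speaker whose feature vector is nearest (Euclidean)."""
    return min(models, key=lambda name: math.dist(models[name], test_features))


def synthetic_frames(rng, shape, n_frames=1000, n_params=9):
    """Stand-in data only: gamma-distributed values, NOT real codec parameters.

    1000 frames of 20 ms each correspond to about 20 seconds of speech.
    """
    return [
        tuple(rng.gammavariate(shape, 1.0) for _ in range(n_params))
        for _ in range(n_frames)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    # Enrollment: one feature vector per (synthetic) speaker.
    models = {
        "speaker_a": feature_vector(synthetic_frames(rng, shape=2.0)),
        "speaker_b": feature_vector(synthetic_frames(rng, shape=8.0)),
    }
    # Identification: a fresh utterance from the first synthetic speaker.
    test = feature_vector(synthetic_frames(rng, shape=2.0))
    print(identify(models, test))
```

Both statistics are dimensionless, which is one plausible reason they can be compared across parameters with a plain Euclidean distance, although any weighting or normalization used in the paper would have to be taken from the full text.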
Use this identifier to cite or link to this document: http://hdl.handle.net/11583/1410962