This paper describes the speaker identification (SID) system developed by the Patrol team for the first phase of the DARPA RATS (Robust Automatic Transcription of Speech) program, which seeks to advance state-of-the-art detection capabilities on audio from highly degraded communication channels. We present results using multiple SID systems differing mainly in the algorithm used for voice activity detection (VAD) and feature extraction. We show that (a) unsupervised VAD performs as well as supervised methods in terms of downstream SID performance, (b) noise-robust feature extraction methods such as CFCCs outperform MFCC front-ends on noisy audio, and (c) fusion of multiple systems provides a 24% relative improvement in EER compared to the single best system when using a novel SVM-based fusion algorithm that uses side information such as gender, language, and channel ID.

Developing A Speaker Identification System For The DARPA RATS Project / Plchot, O.; Matsoukas, S.; Matejka, P.; Dehak, N.; Ma, J.; Cumani, Sandro; Glembek, O.; Hermansky, H.; Mallidi, S.; Mesgarami, N.; Schwartz, R.; Soufifar, M.; Tan, Z.; Thomas, S.; Zhang, B.; Zhou, X. - (2013), pp. 6768-6772. (Paper presented at the ICASSP 2013 conference, held in Vancouver, Canada, 26-31 May 2013.)

Developing A Speaker Identification System For The DARPA RATS Project

CUMANI, SANDRO;
2013


Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2521502