Nonlinear i-vector transformations for PLDA-based speaker recognition

Cumani, Sandro; Laface, Pietro

doi:10.1109/TASLP.2017.2674966

This paper proposes to estimate parametric nonlinear transformations of i-vectors for speaker recognition systems based on Probabilistic Linear Discriminant Analysis (PLDA) classification. The Gaussian PLDA model assumes that the i-vectors are distributed according to the standard normal distribution. However, it has been shown that i-vectors are better modeled by, for example, heavy-tailed distributions, and that classification performance improves significantly when the i-vectors are whitened and length-normalized. In this work we propose to transform the i-vectors so that their distribution becomes more suitable for discriminating speakers with the PLDA model. This is done by means of a sequence of affine and nonlinear transformations whose parameters are obtained by Maximum Likelihood (ML) estimation on the development set. Another contribution of this work is the reduction of the mismatch between the development and evaluation i-vector length distributions by means of a scaling factor tuned for the estimated i-vector distribution, rather than by means of blind length normalization. Relative improvements between 7% and 14% in the Detection Cost Function were obtained with the proposed technique on the NIST SRE-2010 and SRE-2012 evaluation datasets, using both traditional GMM/UBM and hybrid DNN/GMM based systems.
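For orientation, the conventional preprocessing that the abstract contrasts against (whitening followed by blind length normalization) can be sketched as below. This is only an illustrative baseline, not the ML-estimated nonlinear transformation the paper proposes. As a simplifying assumption, whitening is reduced to per-dimension standardization (a diagonal covariance), and the function names and toy two-dimensional vectors are hypothetical.

```python
import math


def fit_diagonal_whitener(dev_vectors):
    """Estimate per-dimension mean and standard deviation on a development set.

    Assumption: a diagonal covariance is used in place of full whitening.
    """
    n = len(dev_vectors)
    dim = len(dev_vectors[0])
    mean = [sum(v[d] for v in dev_vectors) / n for d in range(dim)]
    std = []
    for d in range(dim):
        var = sum((v[d] - mean[d]) ** 2 for v in dev_vectors) / n
        # Guard against constant dimensions to avoid division by zero.
        std.append(math.sqrt(var) if var > 0.0 else 1.0)
    return mean, std


def whiten_and_length_normalize(vector, mean, std):
    """Center and scale a vector, then project it onto the unit sphere."""
    scaled = [(x - m) / s for x, m, s in zip(vector, mean, std)]
    norm = math.sqrt(sum(x * x for x in scaled))
    if norm == 0.0:
        return scaled  # a vector equal to the mean cannot be normalized
    return [x / norm for x in scaled]


# Toy "development set" of two-dimensional vectors (hypothetical data).
dev = [[1.0, 10.0], [3.0, 30.0], [5.0, 20.0], [7.0, 40.0]]
mean, std = fit_diagonal_whitener(dev)
normalized = [whiten_and_length_normalize(v, mean, std) for v in dev]
```

After this step every vector has unit Euclidean length regardless of its original length, which is the "blind" behavior the paper aims to replace with a scaling factor tuned to the estimated i-vector distribution.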

Nonlinear i-vector transformations for PLDA-based speaker recognition / Cumani, S., Laface, P. - In: IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING. - ISSN 2329-9290. - Print. - 25:4(2017), pp. 908-919. [10.1109/TASLP.2017.2674966]

Year of publication	2017
DOI	https://dx.doi.org/10.1109/TASLP.2017.2674966
Journal title	IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING
Publication type	1.1 Journal article

Files in this item:

File	Size	Format
Preprints-07864395.pdf	788.15 kB	Adobe PDF

Preprints-07864395.pdf: restricted access. Description: TALP-SAS. Type: 2a Post-print, publisher's version / Version of Record. License: Non-public, private/restricted access.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2665443

PORTO @ Archivio Istituzionale della Ricerca