Joint estimation of PLDA and nonlinear transformations of speaker vectors / Cumani, Sandro; Laface, Pietro. - In: IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING. - ISSN 2329-9290. - STAMPA. - 25:10(2017), pp. 1890-1900. [10.1109/TASLP.2017.2724198]

Joint estimation of PLDA and nonlinear transformations of speaker vectors

Cumani, Sandro; Laface, Pietro
2017

Abstract

The Gaussian probabilistic linear discriminant analysis (PLDA) model assumes Gaussian distributed priors for the latent variables that represent the speaker and channel factors. Assuming that each training i-vector belongs to a different speaker, as is usually done in i-vector extraction, i-vectors generated by a PLDA model can be considered independent and identically distributed with Gaussian distribution. Thus, we have recently proposed to transform the development i-vectors so that their distribution becomes more Gaussian-like. This is obtained by means of a sequence of affine and nonlinear transformations whose parameters are trained by maximum likelihood (ML) estimation on the development set. The evaluation i-vectors are then subjected to the same transformation. Although i-vector “gaussianization” has been shown to be effective, the original assumption is not satisfactory, because i-vectors extracted from segments of the same speaker are not independent. In this work, we show that the model can be improved by properly exploiting the information about the speaker labels, which was ignored in the previous model. In particular, a more effective PLDA model can be obtained by jointly estimating the PLDA parameters and the parameters of the nonlinear transformation of the i-vectors. In other words, while the goal of the previous approach was to “gaussianize” the distribution of the training i-vectors, the objective of this work is to embed the estimation of the nonlinear i-vector transformation in the PLDA model estimation. We will thus refer to this model as the nonlinear PLDA model. We show that this new approach provides a significant gain with respect to PLDA, and a small, yet consistent, improvement with respect to our former i-vector “gaussianization” approach, at no additional cost.
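
As a rough illustration of the maximum likelihood objective behind the earlier “gaussianization” step, the sketch below evaluates the log-likelihood of a set of vectors under the assumption that an affine map followed by an elementwise nonlinearity turns them into standard Gaussian samples. The abstract does not specify the nonlinearity, so the sinh-arcsinh function used here, the function names, and the parameters `A`, `b`, `delta`, and `eps` are all assumptions made for illustration. The sketch ignores speaker labels, so it corresponds only to the label-free objective and not to the joint nonlinear PLDA estimation proposed in the paper.

```python
import numpy as np

def sas(z, delta, eps):
    """Elementwise sinh-arcsinh map (assumed nonlinearity); identity for delta=1, eps=0."""
    return np.sinh(delta * np.arcsinh(z) + eps)

def log_abs_dsas(z, delta, eps):
    """Elementwise log |d sas(z) / dz|, valid for delta > 0."""
    u = delta * np.arcsinh(z) + eps
    return np.log(delta) + np.log(np.cosh(u)) - 0.5 * np.log1p(z ** 2)

def transformed_gaussian_loglik(X, A, b, delta, eps):
    """Average log-likelihood of the rows of X, assuming sas(A x + b) ~ N(0, I).

    Uses the change-of-variables formula:
    log p(x) = log N(f(x); 0, I) + log |det J_f(x)|.
    """
    Z = X @ A.T + b                      # affine step
    Y = sas(Z, delta, eps)               # nonlinear step
    d = X.shape[1]
    log_gauss = -0.5 * np.sum(Y ** 2, axis=1) - 0.5 * d * np.log(2 * np.pi)
    _, logdet_A = np.linalg.slogdet(A)   # log |det A| from the affine step
    log_jac = logdet_A + np.sum(log_abs_dsas(Z, delta, eps), axis=1)
    return np.mean(log_gauss + log_jac)
```

Maximizing this quantity over `A`, `b`, `delta`, and `eps` with a gradient-based optimizer would give one possible ML estimate of such a transformation; the same parameters would then be applied unchanged to the evaluation vectors.
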
Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2685014