Most current state-of-the-art text-independent speaker recognition systems are based on i-vectors and on probabilistic linear discriminant analysis (PLDA). PLDA assumes that the i-vectors of a trial are homogeneous, i.e., that they have been extracted by the same system. In other words, the enrollment and test i-vectors belong to the same class. However, it is sometimes important to score trials that include “heterogeneous” i-vectors, for instance, enrollment i-vectors extracted by an old system and test i-vectors extracted by a newer, more accurate system. In this paper, we introduce a PLDA model that is able to score heterogeneous i-vectors independently of their extraction approach, dimensions, and any other characteristics that make a set of i-vectors of the same speaker belong to different classes. The new model, which will be referred to as nonlinear tied-PLDA (NL-Tied-PLDA), is obtained by generalizing our recently proposed nonlinear PLDA approach, which jointly estimates the PLDA parameters and the parameters of a nonlinear transformation of the i-vectors. The generalization consists of estimating a class-dependent nonlinear transformation of the i-vectors, with the constraint that the transformed i-vectors of the same speaker share the same speaker factor. The resulting model is flexible and accurate, as assessed by the results of a set of experiments performed on the extended core NIST SRE 2012 evaluation. In particular, NL-Tied-PLDA provides better results on heterogeneous trials than those obtained on the corresponding homogeneous trials scored by the old system, and, in some configurations, it also reaches the accuracy of the new system. Similar results were obtained on the female-extended core NIST SRE 2010 telephone condition.
|Title:||Scoring heterogeneous speaker vectors using nonlinear transformations and tied PLDA models|
|Publication date:||2018|
|Digital Object Identifier (DOI):||10.1109/TASLP.2018.2806305|
|Appears in types:||1.1 Journal article|