The Distributions of Uncalibrated Speaker Verification Scores: A Generative Model for Domain Mismatch and Trial-Dependent Calibration / Cumani, S; Sarni, S. - In: IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING. - ISSN 2329-9290. - ELECTRONIC. - 31:(2023), pp. 2204-2219. [10.1109/TASLP.2023.3282096]
The Distributions of Uncalibrated Speaker Verification Scores: A Generative Model for Domain Mismatch and Trial-Dependent Calibration
Cumani, S; Sarni, S
2023
Abstract
Speaker verification systems that compute log-likelihood ratios (LLRs) between the same-speaker and different-speaker hypotheses allow for cost-effective decisions that depend only on prior information. Domain mismatch, inaccurate model assumptions, or the intrinsic nature of non-probabilistic classifiers often result in miscalibrated scores, and a re-calibration step is required to map the classifier outputs to well-calibrated LLRs. Standard calibration is based on Logistic Regression, often paired with quality measures to provide trial-dependent calibration transformations. More recently, generative methods have been proposed as an alternative to discriminative approaches; these methods, however, are not yet able to exploit additional side information. In this work we introduce a novel generative approach based on the analysis of the effects of speaker vector distribution mismatch on the distribution of verification scores for PLDA and PLDA-based classifiers. We show that target and non-target scores can be modeled by Variance-Gamma distributions, whose parameters represent effective between-class and within-class variability. This allows us to introduce utterance-dependent variability models that can incorporate both explicit quality measures, such as the utterance duration, and implicit measures, such as the norm of a speaker embedding. Experimental results on different test sets with different front-ends and classifiers show that the proposed approach improves both calibration and verification accuracy with respect to state-of-the-art calibration models.

File | Access | Type | License | Size | Format
---|---|---|---|---|---
Trans_ScoreCovCal_accepted.pdf | Open access | 2. Post-print / Author's Accepted Manuscript | Public - All rights reserved | 1.03 MB | Adobe PDF
The_Distributions_of_Uncalibrated_Speaker_Verification_Scores_A_Generative_Model_for_Domain_Mismatch_and_Trial-Dependent_Calibration.pdf | Not available | 2a. Post-print, publisher's version / Version of Record | Non-public - Private/restricted access | 2.29 MB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/2979926