The Distributions of Uncalibrated Speaker Verification Scores: A Generative Model for Domain Mismatch and Trial-Dependent Calibration / Cumani, S; Sarni, S. - In: IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING. - ISSN 2329-9290. - ELECTRONIC. - 31:(2023), pp. 2204-2219. [10.1109/TASLP.2023.3282096]

The Distributions of Uncalibrated Speaker Verification Scores: A Generative Model for Domain Mismatch and Trial-Dependent Calibration

Cumani, S; Sarni, S
2023

Abstract

Speaker verification systems that compute log-likelihood ratios (LLRs) between the same-speaker and different-speaker hypotheses allow for cost-effective decisions that depend only on prior information. Domain mismatch, inaccurate model assumptions, or the intrinsic nature of non-probabilistic classifiers often result in miscalibrated scores, so a re-calibration step is required to map the classifier outputs to well-calibrated LLRs. Standard calibration is based on Logistic Regression, often paired with quality measures to provide trial-dependent calibration transformations. More recently, generative methods have been proposed as an alternative to discriminative approaches; these methods, however, are not yet able to exploit additional side information. In this work we introduce a novel generative approach based on the analysis of the effects of speaker vector distribution mismatch on the distribution of verification scores for PLDA and PLDA-based classifiers. We show that target and non-target scores can be modeled by Variance-Gamma distributions, whose parameters represent effective between-class and within-class variability. This allows us to introduce utterance-dependent variability models that can incorporate both explicit quality measures, such as the utterance duration, and implicit measures, such as the norm of a speaker embedding. Experimental results on different test sets with different front-ends and classifiers show that the proposed approach improves both calibration and verification accuracy with respect to state-of-the-art calibration models.
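
To illustrate why well-calibrated LLRs allow decisions that depend only on prior information, the following is a minimal sketch of the standard Bayes decision rule, together with the affine score mapping that linear logistic regression calibration typically learns. It is not taken from the paper: the function names, the cost parameters `c_miss` and `c_fa`, and the calibration coefficients `a` and `b` are all hypothetical and chosen only for illustration.

```python
import math


def bayes_threshold(p_target, c_miss=1.0, c_fa=1.0):
    """LLR threshold that minimizes the expected cost.

    p_target: prior probability of a same-speaker trial (assumed known).
    c_miss, c_fa: hypothetical costs of a miss and of a false acceptance.
    """
    # Accept when LR * p_target * c_miss >= (1 - p_target) * c_fa,
    # i.e. when llr >= log(c_fa * (1 - p_target)) - log(c_miss * p_target).
    return math.log(c_fa * (1.0 - p_target)) - math.log(c_miss * p_target)


def decide(llr, p_target, c_miss=1.0, c_fa=1.0):
    """Return True (same speaker) if the calibrated LLR reaches the threshold."""
    return llr >= bayes_threshold(p_target, c_miss, c_fa)


def affine_calibrate(score, a, b):
    """Affine map from a raw score to an LLR.

    In practice a and b would be estimated by logistic regression on a
    labeled calibration set; here they are simply passed in.
    """
    return a * score + b


if __name__ == "__main__":
    raw_score = 12.0
    llr = affine_calibrate(raw_score, a=0.5, b=-1.0)
    print("calibrated LLR:", llr)
    print("accept at prior 0.01:", decide(llr, p_target=0.01))
```

The threshold depends only on the prior and the costs, never on the classifier, which is why miscalibrated scores must first be mapped to LLRs before such a rule can be applied.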
Files in this record:

Trans_ScoreCovCal_accepted.pdf
Access: open access
Type: 2. Post-print / Author's Accepted Manuscript
License: Public - All rights reserved
Size: 1.03 MB
Format: Adobe PDF

The_Distributions_of_Uncalibrated_Speaker_Verification_Scores_A_Generative_Model_for_Domain_Mismatch_and_Trial-Dependent_Calibration.pdf
Access: not available
Type: 2a. Post-print, publisher's version / Version of Record
License: Not public - Private/restricted access
Size: 2.29 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2979926