From adaptive score normalization to adaptive data normalization for speaker verification systems / Cumani, Sandro; Sarni, Salvatore. - ELECTRONIC. - (2023), pp. 5296-5300. (Paper presented at INTERSPEECH 2023, held in Dublin (IE), 20th - 24th August 2023) [10.21437/Interspeech.2023-266].

From adaptive score normalization to adaptive data normalization for speaker verification systems

Cumani, Sandro; Sarni, Salvatore
2023

Abstract

Domain and trial-dependent mismatch between training and evaluation data can severely affect the performance of speaker verification systems. It is usually addressed either at the embedding level, with methods that try to match the distributions of in-domain and out-of-domain data, or at the score level, by means of calibration and score normalization. In this work we propose an alternative to score normalization that leverages the adaptive cohort selection of Adaptive S-norm (AS-norm), but performs normalization at the embedding level rather than at the score level. Experimental results on SRE 2016 and SRE 2019 show that the proposed method outperforms other approaches in the presence of severe mismatch, and achieves similar performance in scenarios where score normalization is less important. Furthermore, in contrast with AS-norm, our approach allows the enrollment and test segments to be normalized independently, and has negligible computational cost at scoring time.

Index Terms: speaker recognition, score normalization, adaptive score normalization, speaker embeddings
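
For background, the score-level baseline named in the abstract, AS-norm, is commonly formulated as a symmetric normalization whose statistics come from the top-scoring cohort segments for each side of the trial. The sketch below illustrates that common formulation only; it is not the embedding-level method proposed in the paper, whose details the abstract does not give. Cosine scoring, the function names, the `top_n` default and the `eps` guard are all assumptions made for illustration.

```python
import numpy as np


def cosine_scores(x, cohort):
    """Cosine similarity between one embedding x (d,) and each row of cohort (n, d)."""
    x = x / np.linalg.norm(x)
    cohort = cohort / np.linalg.norm(cohort, axis=1, keepdims=True)
    return cohort @ x


def as_norm(enroll, test, cohort, top_n=200, eps=1e-12):
    """Adaptive S-norm of a single trial (illustrative sketch, cosine back-end assumed)."""
    # Raw trial score between the enrollment and test embeddings.
    raw = float(cosine_scores(enroll, test[None, :])[0])

    # Adaptive cohort selection: keep the top-N cohort scores for each side.
    n = min(top_n, cohort.shape[0])
    enroll_top = np.sort(cosine_scores(enroll, cohort))[-n:]
    test_top = np.sort(cosine_scores(test, cohort))[-n:]

    # Normalize the raw score with each side's cohort statistics and average.
    z_enroll = (raw - enroll_top.mean()) / (enroll_top.std() + eps)
    z_test = (raw - test_top.mean()) / (test_top.std() + eps)
    return 0.5 * (z_enroll + z_test)
```

In this formulation every enrollment and test embedding has to be scored against the cohort before a trial can be normalized, which is the scoring-time cost that the abstract contrasts with the proposed approach.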
Files in this item:

File: cumani23_interspeech.pdf
Access: open access
Type: 2a Post-print, publisher's version / Version of Record
License: Public - All rights reserved
Size: 305.75 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2979929