Speaker and Language Recognition Techniques / Cumani, Sandro. - (2012). [10.6092/polito/porto/2496928]
Speaker and Language Recognition Techniques
CUMANI, SANDRO
2012
Abstract
In this work we give an overview of state-of-the-art speaker and language recognition systems. We analyze techniques to extract and model features from the acoustic signal and to model the speech content by means of phonetic decoding. We then present state-of-the-art generative systems based on latent variable models and discriminative techniques based on Support Vector Machines (SVMs).

We also present the author's contributions to the field, which span the topics covered in this work. First, we propose an improvement to Neural Network training for speech decoding based on General-Purpose Graphics Processing Unit (GPGPU) computing. We also propose adaptations of latent variable models developed for speaker recognition to the field of language identification. A novel technique which enhances the generation of low-dimensional utterance representations for speaker verification is also presented. Finally, we give a detailed analysis of different training algorithms for SVM-based speaker verification and propose a novel discriminative framework for speaker verification, the Pairwise SVM approach, which allows for fast utterance testing and achieves very good recognition performance.

File | Size | Format | Type | License | Access
---|---|---|---|---|---
phd_thesis.pdf | 1.43 MB | Adobe PDF | Doctoral thesis | Creative Commons | Open access
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/2496928