
INTERSESSION VARIABILITY COMPENSATION FOR AUTOMATIC EXTRACTION OF INFORMATION FROM VOICE / Vair, C.; Colibro, D.; Laface, Pietro. - (2007).


Abstract

Disclosed herein is a method for compensating intersession variability for automatic extraction of information from an input voice signal representing an utterance of a speaker, comprising: processing the input voice signal to provide feature vectors each formed by acoustic features extracted from the input voice signal at a time frame; computing an intersession variability compensation feature vector; and computing compensated feature vectors based on the extracted feature vectors and the intersession variability compensation feature vector.

Computing an intersession variability compensation feature vector includes:
- creating a Universal Background Model (UBM) based on a training voice database, the UBM including a number of Gaussians and probabilistically modeling an acoustic model space;
- creating a voice recording database related to different speakers and containing, for each speaker, a number of voice recordings acquired under different conditions;
- computing an intersession variability subspace matrix (U) based on the voice recording database, the matrix (U) defining a transformation from the acoustic model space to an intersession variability subspace representing intersession variability for all the speakers;
- computing an intersession factor vector (xi) based on the intersession variability subspace matrix (U), the intersession factor vector representing the intersession variability of the input voice signal in the intersession variability subspace; and
- computing the intersession variability compensation feature vector based on the intersession variability subspace matrix (U), the intersession factor vector (xi) and the Universal Background Model (UBM).
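The compensation step described above can be sketched in code: given a UBM, a subspace matrix U mapping a low-dimensional intersession factor vector to offsets of the stacked Gaussian means, each frame is compensated by subtracting the posterior-weighted offset. This is a minimal illustrative sketch, not the patented implementation; all dimensions, variable names, and the random placeholder values (including the factor vector, which in practice would be estimated from the utterance) are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not from the patent text).
n_gauss, feat_dim, rank, n_frames = 4, 13, 2, 100

# Universal Background Model: diagonal-covariance GMM over the acoustic space.
weights = np.full(n_gauss, 1.0 / n_gauss)
means = rng.normal(size=(n_gauss, feat_dim))
variances = np.ones((n_gauss, feat_dim))

# Intersession variability subspace matrix U: maps the low-dimensional
# intersession factor vector to offsets of the stacked GMM means.
U = rng.normal(scale=0.1, size=(n_gauss * feat_dim, rank))

# Intersession factor vector for the input utterance (placeholder here;
# in practice it is estimated from the input signal given U and the UBM).
x_i = rng.normal(size=rank)

# Per-Gaussian compensation offsets in feature space.
offsets = (U @ x_i).reshape(n_gauss, feat_dim)

def gaussian_posteriors(frame):
    """Responsibility of each UBM Gaussian for one feature frame."""
    log_p = -0.5 * np.sum((frame - means) ** 2 / variances
                          + np.log(2 * np.pi * variances), axis=1)
    log_p += np.log(weights)
    log_p -= log_p.max()          # numerical stability
    p = np.exp(log_p)
    return p / p.sum()

# Feature vectors extracted from the input voice signal (placeholder data).
features = rng.normal(size=(n_frames, feat_dim))

# Compensated feature vectors: subtract the posterior-weighted offset,
# removing the estimated intersession variability from each frame.
compensated = np.array([f - gaussian_posteriors(f) @ offsets
                        for f in features])

print(compensated.shape)  # (100, 13)
```

Doing the subtraction in the feature domain, rather than on the speaker models, keeps the compensated features usable by any downstream recognizer.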

Use this identifier to cite or link to this item: http://hdl.handle.net/11583/2584424