A large vocabulary isolated word recognition system based on the hypothesize-and-test paradigm is described. The system has been, however, devised as a word hypothesizer for a continuous speech understanding system able to answer to queries put to a geographical database. Word preselection is achieved by segmenting and classifying the input signal in terms of broad phonetic classes. Due to low redundancy of this phonetic code for lexical access, to achieve high performance, a lattice of phonetic segments is generated, rather than a single sequence of hypotheses. It can be organized as a graph, and word hypothesization is obtained by matching this graph against the models of all vocabulary words. A word model is itself a phonetic representation made in terms of a graph accounting for deletion, substitution, and insertion errors. A modified Dynamic Programming (DP) matching procedure gives an efficient solution to this graph-to-graph matching problem. Hidden Markov Models (HMM's) of subword units are used as a more detailed knowledge in the verification step. The word candidates generated by the previous step are represented as sequences of diphone-like subword units, and the Viterbi algorithm is used for evaluating their likelihood. To reduce storage and computational costs, lexical knowledge is organized in a tree structure where the initial common subsequences of word descriptions are shared, and a beam-search strategy carries on the most promising paths only. The results show that a complexity reduction of about 73 percent can be achieved by using the two pass approach with respect to the direct approach, while the recognition accuracy remains comparable.

Lexical access to large vocabularies for speech recognition / L., Fissore; Laface, Pietro; G., Micca; R., Pieraccini. - In: IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING. - ISSN 0096-3518. - STAMPA. - 37:8(1989), pp. 1197-1213. [10.1109/29.31268]

Lexical access to large vocabularies for speech recognition

LAFACE, Pietro;
1989

Abstract

A large vocabulary isolated word recognition system based on the hypothesize-and-test paradigm is described. The system has been, however, devised as a word hypothesizer for a continuous speech understanding system able to answer to queries put to a geographical database. Word preselection is achieved by segmenting and classifying the input signal in terms of broad phonetic classes. Due to low redundancy of this phonetic code for lexical access, to achieve high performance, a lattice of phonetic segments is generated, rather than a single sequence of hypotheses. It can be organized as a graph, and word hypothesization is obtained by matching this graph against the models of all vocabulary words. A word model is itself a phonetic representation made in terms of a graph accounting for deletion, substitution, and insertion errors. A modified Dynamic Programming (DP) matching procedure gives an efficient solution to this graph-to-graph matching problem. Hidden Markov Models (HMM's) of subword units are used as a more detailed knowledge in the verification step. The word candidates generated by the previous step are represented as sequences of diphone-like subword units, and the Viterbi algorithm is used for evaluating their likelihood. To reduce storage and computational costs, lexical knowledge is organized in a tree structure where the initial common subsequences of word descriptions are shared, and a beam-search strategy carries on the most promising paths only. The results show that a complexity reduction of about 73 percent can be achieved by using the two pass approach with respect to the direct approach, while the recognition accuracy remains comparable.
File in questo prodotto:
Non ci sono file associati a questo prodotto.
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11583/2584380
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo