Abstract

Although stochastic models of speech signals (e.g., hidden Markov models and trigrams) have led to impressive improvements in speech recognition accuracy, it has been noted that these models have little relationship to speech production (Lee, 1989), and their recognition performance on some important tasks is far from perfect. However, there have been recent attempts to bridge the gap between speech production and speech recognition using models that are stochastic and yet make more reasonable assumptions about the mechanisms underlying speech production (Bakis, 1991; Deng, 1998; Hogden, 1998; Picone et al., 1999). One of these models, Multiple Observable Maximum Likelihood Continuity Mapping (MO-MALCOM), is described in this paper.

There are theoretical and experimental reasons to believe that MO-MALCOM learns a stochastic mapping between articulator positions and speech acoustics. Furthermore, MO-MALCOM can be combined with standard speech recognition algorithms to form a speech recognition approach based on a production model. Results of experiments related to MO-MALCOM are summarized, and some implications for theories of speech production and speech perception are discussed.

J. Hogden and P. Valdez. Bridging the gap between speech production and speech recognition. Presented at the 5th Seminar on Speech Production: Models and Data, Kloster Seeon, Germany, May 1-4, 2000. Los Alamos National Laboratory Technical Report LA-UR-00-1084.