Speech Sound Representation in Automatic Speech Recognition

Investigator: Esmeralda Uraga-Serratos
Supervisor: Thomas Hain

Speech recognition is concerned with obtaining the word sequence of an utterance from the corresponding acoustic speech signal. Currently, the most successful approaches to acoustic modelling in automatic speech recognition are based on hidden Markov models (HMMs). However, the standard HMM approach has several limitations when viewed as a model of human speech. One such limitation is that HMMs relate acoustic features directly to discrete symbols, such as phonemes or other subword units.
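The direct mapping from acoustic features to discrete subword symbols can be sketched as follows. This is a minimal illustrative toy, not the system under development: two hypothetical phoneme states with assumed Gaussian emission parameters, decoded with the standard Viterbi algorithm.

```python
import math

# Toy HMM: two hypothetical phoneme states over 1-D acoustic features.
# All parameter values below are illustrative assumptions.
states = ["aa", "s"]
means = {"aa": 0.0, "s": 5.0}   # assumed per-state Gaussian means
var = 1.0                        # shared unit variance (assumption)
trans = {("aa", "aa"): 0.8, ("aa", "s"): 0.2,
         ("s", "s"): 0.8, ("s", "aa"): 0.2}
init = {"aa": 0.5, "s": 0.5}

def log_emit(state, x):
    """Log-likelihood of acoustic feature x under the state's Gaussian."""
    return -0.5 * math.log(2 * math.pi * var) - (x - means[state]) ** 2 / (2 * var)

def viterbi(obs):
    """Most likely phoneme-state sequence for an observation sequence."""
    V = [{s: math.log(init[s]) + log_emit(s, obs[0]) for s in states}]
    back = []
    for x in obs[1:]:
        col, ptr = {}, {}
        for s in states:
            best = max(states, key=lambda p: V[-1][p] + math.log(trans[(p, s)]))
            col[s] = V[-1][best] + math.log(trans[(best, s)]) + log_emit(s, x)
            ptr[s] = best
        V.append(col)
        back.append(ptr)
    # Backtrace from the best final state.
    path = [max(states, key=lambda s: V[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

print(viterbi([0.1, -0.2, 4.8, 5.3]))  # → ['aa', 'aa', 's', 's']
```

Note that the decoder output is purely a sequence of discrete symbols: nothing in the model refers to the articulatory gestures that produced the signal, which is the gap the proposed representation scheme aims to address.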

The Motor Theory of Speech Perception, proposed originally by [Liberman, 1957] and extended by [Liberman and Mattingly, 1985], states that speech perception involves the motor system in a process of auditory-to-articulatory mapping, giving access to a phonetic code with motor properties. Several ideas from this theory inform the representation scheme being developed in this work.

To improve the performance of current speech recognition systems, and to model aspects of human speech production and perception within automatic speech recognition, a new framework for speech sound representation is being developed. This work will investigate the benefits of incorporating articulatory information and acoustic feedback mechanisms into speech recognition.

References
Liberman, A. M. (1957). Some results of research on speech perception. Journal of the Acoustical Society of America, 29:117-123.
Liberman, A. M. and Mattingly, I. G. (1985). The motor theory of speech perception revised. Cognition, 21:1-36.