Speech Recognition from Occluded Data

Principal Investigator: Martin Cooke, Co-investigator: Phil Green

The speech signal contains a great deal of redundancy (e.g. neighbouring time frames are highly correlated). This is fortunate, since listeners often have to cope with speech uttered against a background of other acoustic sources. Amongst other cues, listeners appear to use perceptual grouping to determine which parts of the signal originate from the same source. However, because of the redundancy of speech, it is not necessary to recover 100% of the speech signal from the mixture; indeed, complete recovery is probably not computationally practical given current algorithms. Hence, there is a need for recognition strategies which can handle partial input data, where "partial" means that reliable information is available only at certain known time-frequency points. Ongoing work is developing modified speech recognition algorithms to handle partial data. In addition, the problem of learning in the presence of partial data is currently being addressed.
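One common strategy for recognition with partial data is to marginalise out the unreliable (occluded) feature dimensions when evaluating acoustic model likelihoods. The sketch below illustrates this for a single diagonal-covariance Gaussian state distribution, where marginalisation reduces to simply omitting the unreliable dimensions from the likelihood sum; the function name and interface are hypothetical, for illustration only, and do not describe the project's actual implementation.

```python
import numpy as np

def marginal_log_likelihood(x, reliable, mean, var):
    """Log-likelihood of a diagonal-covariance Gaussian evaluated on a
    partial observation: only dimensions flagged in `reliable` are used.

    For a diagonal Gaussian, integrating out the unreliable dimensions
    simply drops them from the per-dimension sum.
    (Illustrative sketch, not the project's implementation.)
    """
    m = np.asarray(reliable, dtype=bool)
    d = x[m] - mean[m]
    return -0.5 * np.sum(np.log(2.0 * np.pi * var[m]) + d ** 2 / var[m])

# Example: a 3-dimensional frame with the middle dimension occluded.
x = np.array([0.0, 1.0, 2.0])
reliable = np.array([1, 0, 1])        # dimension 1 is masked out
mean = np.zeros(3)
var = np.ones(3)
ll = marginal_log_likelihood(x, reliable, mean, var)
```

With a full (all-reliable) mask this reduces to the ordinary Gaussian log-likelihood, so the same scoring routine serves both clean and occluded frames.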

Supported by a research grant from EPSRC.