Recognition of Distorted Speech

Investigator: Jeremy Goslin Supervisor: Martin Cooke

A number of recent studies have taken advantage of the work of Harvey Fletcher and his colleagues in the early twentieth century on the intelligibility of nonsense CVC (consonant-vowel-consonant) syllables passed through a variety of high- and low-pass filters. This work has resurfaced only recently, owing to Allen's 1994 paper [1], which drew attention to it and suggested that speech perception decisions within narrow frequency sub-bands may be made independently of one another. This also appears to tie in with the study of spectral redundancy by Warren et al. [2], in which speech was filtered through narrow spectral slits with bandwidths as small as 1/20 octave. That study found that speech remained remarkably intelligible through these filters, reaching a maximum of nearly 80% with a filter centre frequency (cf) of 1500 Hz.
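To give a sense of how narrow such a slit is, the sketch below computes the nominal edges of a fractional-octave band. It assumes the band is centred geometrically on the cf (so the cf is the geometric mean of the two edges); the function name is hypothetical, and the steep filter slopes used in the actual study are not modelled.

```python
import math


def octave_slit_edges(cf_hz: float, octave_fraction: float) -> tuple[float, float]:
    """Return (lower, upper) edges in Hz of a band `octave_fraction` octaves wide.

    Assumes geometric centring on cf_hz, so cf_hz is the geometric mean of
    the two edges and upper / lower == 2 ** octave_fraction.
    """
    half_width = octave_fraction / 2.0
    return cf_hz * 2.0 ** -half_width, cf_hz * 2.0 ** half_width


# Nominal 1/20-octave slit centred on 1500 Hz
lower, upper = octave_slit_edges(1500.0, 1 / 20)
print(f"{lower:.1f} Hz to {upper:.1f} Hz (width {upper - lower:.1f} Hz)")
```

For a 1/20-octave slit at 1500 Hz this gives a passband only about 52 Hz wide, running from roughly 1474 Hz to roughly 1526 Hz.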

Two recognition systems have recently been designed to take advantage of these findings, dividing the frequency spectrum into a number of sub-bands that are analysed by separate ANN/HMM recognisers [3,4]. However, both of these recognisers continue to make use of cross-frequency information in the recognition process. If recognition is possible using only very narrow frequency bands (say, the output of a single cochlear filter), perhaps information could instead be extracted by analysing the speech in the temporal domain. By submitting these representations to both human and artificial recognition, while removing a variety of features from the speech waveform, it is hoped that insight might be gained into how speech recognition differs across frequency regions.
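As a rough illustration of the two kinds of front end contrasted above, the sketch below (a) divides a frequency range into contiguous sub-bands and (b) extracts a slowly varying temporal envelope from a single narrow band. Both choices are assumptions made for illustration, not the methods of [3,4]: bands are spaced equally on the Glasberg and Moore ERB-number scale as a rough stand-in for cochlear filter spacing, and the envelope is obtained by full-wave rectification followed by a one-pole low-pass filter. The function names, the 100 Hz to 8 kHz range, the band count and the 30 Hz cutoff are all hypothetical. No band-pass filter is implemented; an amplitude-modulated 1500 Hz tone stands in for the output of one narrow band.

```python
import math

# Hypothetical front end: ERB-spaced sub-bands plus a temporal envelope for one band.


def hz_to_erb_number(f_hz: float) -> float:
    """Glasberg & Moore (1990) ERB-number scale."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)


def erb_number_to_hz(erb: float) -> float:
    """Inverse of hz_to_erb_number."""
    return (10.0 ** (erb / 21.4) - 1.0) * 1000.0 / 4.37


def subband_edges(f_lo: float, f_hi: float, n_bands: int) -> list[tuple[float, float]]:
    """Split [f_lo, f_hi] into n_bands contiguous bands of equal width on the ERB-number scale."""
    e_lo, e_hi = hz_to_erb_number(f_lo), hz_to_erb_number(f_hi)
    step = (e_hi - e_lo) / n_bands
    edges = [erb_number_to_hz(e_lo + i * step) for i in range(n_bands + 1)]
    return list(zip(edges[:-1], edges[1:]))


def temporal_envelope(band_signal: list[float], fs: float, cutoff_hz: float = 30.0) -> list[float]:
    """Full-wave rectify an already band-limited signal, then smooth it with a one-pole low-pass filter."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    envelope, state = [], 0.0
    for x in band_signal:
        state += alpha * (abs(x) - state)
        envelope.append(state)
    return envelope


# Usage: four ERB-spaced bands between 100 Hz and 8 kHz
bands = subband_edges(100.0, 8000.0, 4)
for lo, hi in bands:
    print(f"{lo:7.1f} to {hi:7.1f} Hz")

# Stand-in for the output of one narrow band: a 1500 Hz carrier
# amplitude-modulated at 4 Hz, one second long at 16 kHz.
FS = 16000.0
am_tone = [
    (1.0 + 0.8 * math.sin(2 * math.pi * 4.0 * n / FS)) * math.sin(2 * math.pi * 1500.0 * n / FS)
    for n in range(int(FS))
]
env = temporal_envelope(am_tone, FS)
```

The low cutoff keeps the slow amplitude fluctuations (here the 4 Hz modulation) and discards the 1500 Hz carrier fine structure, which is one simple way to represent what a single band carries over time.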


