
Recognition of Distorted Speech

Investigator: Jeremy Goslin
Supervisor: Martin Cooke

A number of recent studies have drawn on the work of Harvey Fletcher and his colleagues in the first half of the twentieth century on the intelligibility of nonsense CVC syllables passed through a variety of high- and low-pass filters. This work has resurfaced only recently, owing to Allen's 1994 paper [1], which drew attention to it and suggested that speech perception decisions within narrow frequency sub-bands may be made independently of each other. This is consistent with the study of spectral redundancy by Warren et al. [2], in which sentences were filtered through narrow spectral slits with bandwidths as small as 1/20 octave. Intelligibility through these slits remained remarkably high, reaching a maximum of nearly 80% at a filter centre frequency (CF) of 1500 Hz.
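Two small calculations make the figures in this paragraph concrete. The sketch below assumes that Allen's independence idea can be written as Fletcher's product rule for articulation errors (the full-band error probability is the product of independent band error probabilities), and that a 1/20-octave slit is geometrically centred on its CF. The function names `band_edges` and `combined_score` are hypothetical, chosen for illustration only.

```python
import math


def band_edges(cf, octave_fraction):
    """Lower and upper edges of a band `octave_fraction` octaves wide,
    geometrically centred on cf (so lower * upper == cf ** 2)."""
    half = octave_fraction / 2.0
    return cf * 2.0 ** (-half), cf * 2.0 ** half


def combined_score(band_scores):
    """Product rule under an assumed band-independence model: the listener
    errs only if every band errs, so full-band error = product of band errors."""
    error = 1.0
    for s in band_scores:
        error *= 1.0 - s  # probability that this band alone fails
    return 1.0 - error


if __name__ == "__main__":
    lo, hi = band_edges(1500.0, 1.0 / 20.0)
    print(f"1/20-octave slit at 1500 Hz: {lo:.1f}-{hi:.1f} Hz ({hi - lo:.1f} Hz wide)")
    print(combined_score([0.5, 0.5]))
```

A 1/20-octave slit at 1500 Hz therefore spans only about 52 Hz (roughly 1474 to 1526 Hz). Under the independence assumption, two bands that are each correct half the time would combine to a score of 0.75.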

Two recognition systems have recently been designed to exploit these findings, dividing the frequency spectrum into a number of sub-bands that are analysed by separate ANN/HMM recognisers [3,4]. However, both systems still make use of cross-frequency information in the recognition process. If recognition is possible using only very narrow frequency bands (say, the output of a single cochlear filter), useful information might instead be extracted by analysing the speech within each band in the temporal domain. By presenting such band-limited representations to both human listeners and artificial recognisers, while removing a variety of features from the speech waveform, it is hoped to gain insight into how speech recognition differs across frequency regions.
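The idea of working within a single narrow band and looking at its temporal structure can be illustrated with a minimal, standard-library-only sketch: one band-pass section centred on a chosen frequency, followed by full-wave rectification and smoothing to expose the band's temporal envelope. The filter design (the second-order band-pass from R. Bristow-Johnson's "Audio EQ Cookbook"), the 50 Hz envelope cutoff, and the names `bandpass_biquad`, `envelope` and `octave_fraction_q` are assumptions for illustration, not part of this project. A single second-order section has far shallower skirts than the filters used in slit studies such as [2] and is not a model of a cochlear filter.

```python
import math


def octave_fraction_q(n):
    """Q (centre frequency / bandwidth) of a geometrically centred 1/n-octave band."""
    r = 2.0 ** (1.0 / n)
    return math.sqrt(r) / (r - 1.0)


def bandpass_biquad(x, fs, cf, q):
    """Second-order band-pass (cookbook 'constant 0 dB peak gain' form):
    unity gain at cf, attenuating frequencies away from it."""
    w0 = 2.0 * math.pi * cf / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    # Coefficients normalised by a0; b1 is zero for this filter type.
    b0, b2 = alpha / a0, -alpha / a0
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    y = []
    x1 = x2 = y1 = y2 = 0.0  # filter state (previous inputs and outputs)
    for xn in x:
        yn = b0 * xn + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def envelope(x, fs, cutoff=50.0):
    """Temporal envelope: full-wave rectify, then a one-pole low-pass at `cutoff` Hz."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff / fs)
    env, state = [], 0.0
    for xn in x:
        state += a * (abs(xn) - state)
        env.append(state)
    return env


if __name__ == "__main__":
    fs = 16000
    q = octave_fraction_q(20)  # a 1/20-octave band, Q of about 28.9
    for f in (1500.0, 500.0):
        tone = [math.sin(2.0 * math.pi * f * n / fs) for n in range(fs // 2)]
        env = envelope(bandpass_biquad(tone, fs, 1500.0, q), fs)
        print(f"{f:.0f} Hz tone through 1500 Hz band: final envelope {env[-1]:.3f}")
```

For real speech the input would be a sampled waveform rather than a test tone; the resulting envelope is one example of a purely temporal, single-band representation that could be presented to listeners or to a recogniser.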

References

  1. J.B. Allen, 'How do humans process and recognize speech?', IEEE Transactions on Speech and Audio Processing, Vol. 2, No. 4, October 1994.
  2. R.M. Warren, K.R. Riener, J.A. Bashford and B.S. Brubaker, 'Spectral redundancy: Intelligibility of sentences heard through narrow spectral slits', Perception and Psychophysics, Vol. 57, No. 2, pp. 175-182, 1995.
  3. H. Bourlard and S. Dupont, 'A new ASR approach based on independent processing and recombination of partial frequency bands', to appear in Proc. ICSLP-96, 1996.
  4. H. Hermansky, S. Tibrewala and M. Pavel, 'Towards ASR on partially corrupted speech', to appear in Proc. ICSLP-96, 1996.