|RESPITE:Events : Meeting, Sep 2000:Presentations: Simpson and Meyer|
In the presence of background noise, the initial processing stage of most automatic speech recognisers produces a signal representation that is substantially degraded. One pre-processing strategy that has been shown to be more robust in noisy conditions than conventional approaches is the use of Amplitude Modulation (AM) maps. We show how this representation can be used to develop an algorithm, based on classic spectral subtraction techniques, which further enhances recognition scores in an additive noise environment. Spectral subtraction algorithms are usually implemented by first estimating the noise during non-speech periods and then subtracting this estimate from the segments that contain speech. This approach assumes that the noise does not change between the non-speech period and the speech that follows. By using AM maps, we were able to produce a reliable estimate of the corrupting noise spectrum as well as the observed speech spectrum for each frame of voiced speech. Consequently, the resulting algorithm should be more tolerant of temporal changes in the noise spectrum. Three methods were investigated: Generalised Spectral Subtraction (GSS), Modified Spectral Subtraction (MSS) and Non-linear Spectral Subtraction (NSS). To find the best scheme, digits were presented in four conditions: clean, white noise, time-varying wide-band noise and time-varying narrow-band noise. Recognition performance was measured with an 8-state left-to-right hidden Markov model, and each of the three strategies was compared with no spectral subtraction, linear spectral subtraction (LSS) and RASTA pre-processing. Overall, the optimum algorithm was NSS. Using LSS when the SNR is below 5 dB and NSS when it is between 5 and 15 dB could improve recognition performance, relative to AM maps alone, by the equivalent of up to a 5 dB increase in SNR.
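As an illustration of the per-frame idea and the SNR-dependent switch described above, the following is a minimal Python sketch, not the authors' implementation. It assumes that each voiced frame already comes with an observed magnitude spectrum and a noise magnitude spectrum (here plain lists; how they are obtained from AM maps is not shown). All function names, the spectral floor, the form of the non-linear oversubtraction factor, the way frame SNR is estimated, and the choice to leave frames above 15 dB untouched are assumptions made for the example; the abstract does not specify any of them.

```python
import math


def linear_subtract(observed, noise, floor=0.01):
    """Linear spectral subtraction for one frame: observed minus noise per bin.

    The result is floored at a small fraction of the observed magnitude so
    that no bin becomes negative (the floor value is an assumption).
    """
    return [max(s - n, floor * s) for s, n in zip(observed, noise)]


def nonlinear_subtract(observed, noise, max_alpha=3.0, floor=0.01):
    """Hypothetical non-linear variant: subtract more in bins with low SNR.

    The oversubtraction factor alpha falls from max_alpha towards 1 as the
    per-bin ratio observed/noise grows. This particular form is assumed.
    """
    out = []
    for s, n in zip(observed, noise):
        ratio = s / n if n > 0 else float("inf")
        alpha = 1.0 + (max_alpha - 1.0) / (1.0 + ratio)
        out.append(max(s - alpha * n, floor * s))
    return out


def frame_snr_db(observed, noise):
    """Rough frame SNR in dB, treating observed power as speech plus noise."""
    noise_power = sum(n * n for n in noise)
    speech_power = sum(s * s for s in observed) - noise_power
    if noise_power <= 0:
        return float("inf")
    if speech_power <= 0:
        return float("-inf")
    return 10.0 * math.log10(speech_power / noise_power)


def select_scheme(snr_db):
    """LSS below 5 dB, NSS from 5 to 15 dB; above 15 dB nothing (assumed)."""
    if snr_db < 5.0:
        return "LSS"
    if snr_db <= 15.0:
        return "NSS"
    return "none"


def enhance_frame(observed, noise):
    """Apply the scheme chosen for this frame's estimated SNR."""
    scheme = select_scheme(frame_snr_db(observed, noise))
    if scheme == "LSS":
        return linear_subtract(observed, noise)
    if scheme == "NSS":
        return nonlinear_subtract(observed, noise)
    return list(observed)
```

Because the noise spectrum is passed in per frame rather than fixed in advance, calling `enhance_frame` on successive frames lets the subtraction follow a noise spectrum that changes over time, which is the property the abstract attributes to the AM-map approach.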