Robust speech recognition with missing data

Investigator: Ljubomir Josifovski
Supervisor: Phil Green

Current state-of-the-art automatic speech recognition (ASR) technology fails badly when confronted with the wide variety of distortions that occur in environments that are less than completely controlled, i.e. anything other than a close-talking, noise-cancelling microphone with no background noise. The "missing data" approach to robust ASR accepts that some spectro-temporal regions will be dominated by noise and are therefore lost to subsequent processing. The problem of ASR then decomposes into two subproblems:

  • identification of the speech-dominated regions (the problem of separation)
  • ASR from this partial description of the speech (the problem of recognition)
Any technique that estimates the "noise" (such as the various spectral subtraction schemes), as well as computational auditory scene analysis (CASA) and blind source separation (BSS), can be used to address the first subproblem.
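
As an illustration of how a noise estimate can be turned into a description of the reliable regions, the following is a minimal sketch that builds a binary "mask" from a local signal-to-noise ratio. It assumes a linear-power spectrogram indexed as [frame][channel], and it assumes (hypothetically) that the first few frames contain noise only, so their average serves as the noise estimate. The names estimate_noise and snr_mask and the 0 dB threshold are illustrative choices, not part of the original work.

```python
import math
from typing import List

# Linear-power spectrogram, indexed as spec[frame][channel].
Spectrogram = List[List[float]]


def estimate_noise(spec: Spectrogram, n_frames: int) -> List[float]:
    """Hypothetical noise estimate: per-channel mean power of the first
    n_frames frames, which are assumed to contain no speech."""
    n_ch = len(spec[0])
    return [sum(frame[c] for frame in spec[:n_frames]) / n_frames
            for c in range(n_ch)]


def snr_mask(spec: Spectrogram, noise: List[float],
             threshold_db: float = 0.0) -> List[List[bool]]:
    """Mark a spectro-temporal cell as reliable (speech-dominated) when its
    estimated local SNR exceeds threshold_db; otherwise mark it missing."""
    floor = 1e-10  # avoids log of zero or negative power
    mask = []
    for frame in spec:
        row = []
        for x, n in zip(frame, noise):
            # Spectral-subtraction style speech estimate: observed minus noise.
            speech = max(x - n, floor)
            snr_db = 10.0 * math.log10(speech / max(n, floor))
            row.append(snr_db > threshold_db)
        mask.append(row)
    return mask
```

For example, with spec = [[1.0, 1.0], [1.0, 1.0], [10.0, 1.5]] and the noise estimated from the first two frames, only the cell holding 10.0 is marked reliable; the 1.5 cell has an estimated local SNR of about -3 dB and is treated as missing. A CASA or BSS front end would produce the same kind of mask by different means.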

We are investigating various techniques that address the second subproblem in the context of a standard HMM-based ASR system.
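
One well-known way to recognise from partial data in an HMM system is marginalisation: the state output likelihood is computed over the reliable feature components only, since integrating a diagonal-covariance Gaussian over an unreliable component gives 1. A variant, bounded marginalisation, uses the fact that for spectral energy features with additive noise the true speech value in an unreliable cell lies between 0 and the observed value, and integrates over that interval instead. The sketch below assumes each HMM state emits from a diagonal-covariance Gaussian mixture; the function names and data layout are hypothetical and it is not presented as the specific method used in this project.

```python
import math
from typing import List, Tuple

# Diagonal-covariance Gaussian mixture: list of (weight, means, variances).
Mixture = List[Tuple[float, List[float], List[float]]]


def log_gauss(x: float, mu: float, var: float) -> float:
    """Log density of a 1-D Gaussian N(mu, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def log_gauss_interval(lo: float, hi: float, mu: float, var: float) -> float:
    """Log of the probability mass of N(mu, var) on the interval [lo, hi]."""
    s = math.sqrt(2.0 * var)
    p = 0.5 * (math.erf((hi - mu) / s) - math.erf((lo - mu) / s))
    return math.log(max(p, 1e-300))  # floor avoids log(0)


def missing_data_loglik(x: List[float], reliable: List[bool],
                        mixture: Mixture, bounded: bool = False) -> float:
    """State output log-likelihood for frame x given a reliability mask.

    Reliable components use the Gaussian density. Unreliable components are
    either dropped (plain marginalisation, contributing log 1 = 0) or, if
    bounded=True, replaced by the probability mass on [0, x_i]."""
    comp_logs = []
    for w, means, variances in mixture:
        ll = math.log(w)
        for xi, r, mu, var in zip(x, reliable, means, variances):
            if r:
                ll += log_gauss(xi, mu, var)
            elif bounded:
                ll += log_gauss_interval(0.0, xi, mu, var)
        comp_logs.append(ll)
    # Log-sum-exp over mixture components for numerical stability.
    m = max(comp_logs)
    return m + math.log(sum(math.exp(c - m) for c in comp_logs))
```

In a decoder, a function like this would stand in for the usual state output probability, so the HMM topology and Viterbi search are left unchanged; only the treatment of the masked components differs. Bounded marginalisation always yields a likelihood no larger than plain marginalisation, because it multiplies in a probability mass that is at most 1.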