RESPITE: Events: Meeting, Jan 2001: Presentations: Jon Barker

Improving Soft Decisions in Missing Data ASR:
Using Harmonicity in Conjunction with Local SNR

Jon Barker, Martin Cooke and Phil Green

University of Sheffield, UK

The performance of Missing Data ASR systems depends largely on how reliably we can estimate the probability that each spectro-temporal 'pixel' is uncontaminated by noise. In the past we have based these probability estimates on simple local SNR estimates derived under a stationary noise assumption.
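The local SNR approach can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of the leading frames as the noise estimate, and the sigmoid mapping from SNR to probability (with its `slope` and `centre_db` parameters) are all assumptions chosen for illustration.

```python
import math

def soft_mask_from_local_snr(noisy_power, noise_frames=10, slope=0.5, centre_db=0.0):
    """Return a per-pixel probability that speech dominates the noise.

    noisy_power: list of frames, each a list of channel energies (power domain).
    Assumption: the first `noise_frames` frames contain noise only, and the
    noise is stationary, so their mean serves as the noise estimate throughout.
    """
    n_channels = len(noisy_power[0])
    lead = noisy_power[:noise_frames]
    noise = [sum(f[c] for f in lead) / len(lead) for c in range(n_channels)]

    mask = []
    for frame in noisy_power:
        row = []
        for c in range(n_channels):
            # Speech power estimated by subtracting the noise estimate (floored).
            speech = max(frame[c] - noise[c], 1e-10)
            snr_db = 10.0 * math.log10(speech / max(noise[c], 1e-10))
            # Clip so the exponential below cannot overflow.
            snr_db = max(-100.0, min(100.0, snr_db))
            # Assumed mapping: a sigmoid turns local SNR into a soft decision.
            row.append(1.0 / (1.0 + math.exp(-slope * (snr_db - centre_db))))
        mask.append(row)
    return mask

def discretise(mask, threshold=0.5):
    """Hard (0/1) counterpart of the soft mask, for comparison."""
    return [[1 if p > threshold else 0 for p in row] for row in mask]
```

A discrete mask discards how close each pixel's SNR was to the threshold, whereas the soft mask keeps that information available to the recogniser.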

In the current work we show how the probability that a pixel is uncontaminated can be better estimated by introducing harmonicity information derived from an autocorrelogram representation of the speech signal. The basic strategy can be described as follows:

As a refinement, the discrete voiced/unvoiced decision (i) can be replaced with a soft decision, which is then used as a weighting term to interpolate between (ii) and (iii).
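The soft refinement amounts to a linear interpolation, sketched below under stated assumptions. The names `p_voiced`, `p_if_voiced` and `p_if_unvoiced` are hypothetical: they stand in for the soft voicing decision and for whatever per-pixel estimates the two alternative branches of the strategy produce, which are not specified here.

```python
def combine_estimates(p_voiced, p_if_voiced, p_if_unvoiced):
    """Interpolate two per-pixel probability estimates for one frame.

    p_voiced:      soft voicing decision for the frame, in [0, 1] (assumed).
    p_if_voiced:   per-channel estimates to use if the frame is voiced.
    p_if_unvoiced: per-channel estimates to use if the frame is unvoiced.
    """
    return [
        p_voiced * a + (1.0 - p_voiced) * b
        for a, b in zip(p_if_voiced, p_if_unvoiced)
    ]
```

Setting `p_voiced` to exactly 0 or 1 recovers the discrete voiced/unvoiced switch, so the hard decision is a special case of the soft one.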

This simple strategy works well under the assumption that little of the speech utterance is dominated by harmonic noise. Refinements would be necessary if this were not the case.
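The reason for this assumption can be seen from a simple harmonicity measure. The sketch below is a simplification: it scores a single waveform frame by the peak of its normalised autocorrelation over candidate pitch lags, whereas an autocorrelogram would compute an autocorrelation separately in each channel of an auditory filterbank. The function name, the lag range and the test signals are illustrative assumptions.

```python
import math
import random

def harmonicity(frame, min_lag, max_lag):
    """Peak of the normalised autocorrelation over lags min_lag..max_lag.

    Values near 1 indicate a strongly periodic frame; values near 0 indicate
    little periodic structure at any of the candidate lags.
    """
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return 0.0
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        best = max(best, r / energy)
    return best

# A periodic signal (period 80 samples) and seeded Gaussian noise.
tone = [math.sin(2.0 * math.pi * n / 80) for n in range(400)]
rng = random.Random(0)
hiss = [rng.gauss(0.0, 1.0) for _ in range(400)]

print(harmonicity(tone, 40, 160))
print(harmonicity(hiss, 40, 160))
```

A measure of this kind responds to any periodic source, so on its own it cannot distinguish voiced speech from harmonic noise.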

Experiments with the Aurora 2000 database show that introducing harmonicity information in this way leads to consistent reductions in WER compared to a baseline system using local SNR estimates alone:
 
 

Aurora 2000, Test Set A, average over noise conditions: WER (%) by SNR (dB)

System                               -5      0      5     10     15     20  clean
Local SNR (Discrete)               83.8   56.6   34.0   17.2    8.5    4.1    1.2
Local SNR (Soft)                   69.7   41.2   20.1   10.1    5.7    3.4    1.5
Local SNR (Soft) + Harmonicity     66.6   36.4   16.9    8.3    4.3    2.5    1.4
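To make the comparison concrete, the relative WER reduction obtained by adding harmonicity to the soft local SNR system can be computed directly from the tabulated figures. The variable and function names below are illustrative only.

```python
snrs = ["-5", "0", "5", "10", "15", "20", "clean"]
soft = [69.7, 41.2, 20.1, 10.1, 5.7, 3.4, 1.5]   # soft local SNR, WER (%)
harm = [66.6, 36.4, 16.9, 8.3, 4.3, 2.5, 1.4]    # with harmonicity added, WER (%)

def relative_reduction(baseline, improved):
    """Relative WER reduction in percent of the baseline WER."""
    return 100.0 * (baseline - improved) / baseline

reductions = {s: relative_reduction(b, h) for s, b, h in zip(snrs, soft, harm)}

for s in snrs:
    print(f"{s:>5}: {reductions[s]:5.1f}% relative WER reduction")
```

On these figures the relative reduction is positive in every condition, ranging from roughly 4% at -5 dB to roughly 26% at 20 dB.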

We anticipate that further improvements may be gained through a more sophisticated treatment of harmonicity (e.g. Glotin and Berthommier, ICSLP 2000).


Jon Barker
Last modified: Mon Jan 29 15:59:06 GMT 2001