RESPITE: Annual Report 2000: Scientific Highlights: Identifying Reliable Information
In the first experiment, the (maximised) data likelihood for each phoneme was expressed as a linear combination of five likelihoods: the four single-subband likelihoods and the fullband data likelihood. Separate combination weights were estimated for each of the 27 phonemes.
In recognition these weights are estimated dynamically, but in the figure below they are estimated over the full Numbers95 connected-digits training data set.
The figure shows the estimated weights for each phoneme in clean speech. When the weights are reestimated with noise added to each frequency band in turn, the weights for the noisy band are reduced.
The same approach will be extended in future to use more accurate subband likelihood combination rules.
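As a rough illustration of the weighting scheme described above, the following Python sketch estimates combination weights for one phoneme by maximising the likelihood of a linear combination of stream likelihoods. The report does not say how the weights were estimated; the sketch assumes non-negative weights that sum to one and uses the standard EM update for mixture weights. The function names, the five-stream layout (four subbands plus fullband) and the synthetic likelihood values are all hypothetical.

```python
import numpy as np


def estimate_weights(likelihoods, n_iter=50):
    """Estimate combination weights for one phoneme (assumed EM procedure).

    likelihoods: array of shape (n_frames, n_streams) holding the
    non-negative likelihood of each frame under each stream model.
    Returns non-negative weights that sum to one.
    """
    n_frames, n_streams = likelihoods.shape
    weights = np.full(n_streams, 1.0 / n_streams)  # start from equal weights
    for _ in range(n_iter):
        # E-step: posterior responsibility of each stream for each frame.
        weighted = likelihoods * weights
        posteriors = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: the new weight is the average responsibility.
        weights = posteriors.mean(axis=0)
    return weights


def combined_likelihood(likelihoods, weights):
    """Linear combination of the stream likelihoods for each frame."""
    return likelihoods @ weights


# Synthetic data for illustration only: 200 frames, 5 streams
# (4 subbands + fullband); stream 2 is degraded in the "noisy" case.
rng = np.random.default_rng(0)
clean = rng.uniform(0.5, 1.0, size=(200, 5))
noisy = clean.copy()
noisy[:, 2] *= 0.1  # noise lowers the likelihoods of this band

w_clean = estimate_weights(clean)
w_noisy = estimate_weights(noisy)
```

Under these assumptions, the weight of the degraded stream falls when the weights are reestimated on the noisy data, which matches the qualitative behaviour reported for the figure.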
The 2-dimensional histogram below shows the distribution obtained when the observable is the harmonicity index (see Berthommier and Glotin, 1999).
The technique is quite general, and can, for example, also be successfully applied using a localisation cue (see Glotin et al., 1999).
The goal of this experiment was to extend Shannon's experiment (R.V. Shannon et al., Speech Recognition with Primarily Temporal Cues, Science, 1995). By varying spectral and temporal resolution, their study showed that two consonant features, voicing and manner, were preserved at very low spectral resolution, whereas information transmission for consonantal place of articulation increased with spectral resolution. By adding a temporal masker to Shannon's residual signal, we aimed to understand the transmission of residual temporal and spectral information.
We confirm the main findings of Shannon et al.:
Our experiment further suggests that:
This experiment supports the hypothesis that consonant identification is a complex process which can compensate for the reduction of temporal or spectral information by using residual information: consonant perception is a robust process which can make use of both spectral and temporal cues.
For further details of this work see Grosgeorges et al., 2000.