|RESPITE:Events : Meeting, Sep 2000:Presentations: Martin Heckman|
We present a method for labeling an audio-visual database and for setting up an audio-visual speech recognition system based on a hybrid Artificial Neural Network/Hidden Markov Model (ANN/HMM) approach. The multi-stage labeling process is demonstrated on a new audio-visual database recorded at the Institut de la Communication Parlée (ICP). The database was generated via transposition of the audio database NUMBERS95. For the labeling, a large subset of NUMBERS95 is first used for bootstrap training of an ANN, which is then employed to label the audio part of the audio-visual database. This initial labeling is further improved by re-adapting the ANN to the new database and repeating the labeling. The video labeling is then derived from the audio labeling. Tests at different signal-to-noise ratios (SNRs) are performed to demonstrate the effectiveness of the labeling process. Furthermore, we investigated ways to incorporate information from a large audio database into the final audio-visual recognition system. To develop our system we used the STRUT tool from the TCTS lab, which allows independent weights to be introduced for the audio and video posteriors.
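The abstract does not state how the weighted audio and video posteriors are combined. As an illustration only, the sketch below assumes a log-linear (weighted product) fusion of per-frame class posteriors, a common choice in multi-stream ANN/HMM systems. The function name combine_posteriors, its parameters, and the example values are hypothetical and are not part of STRUT.

```python
import math


def combine_posteriors(audio_post, video_post, audio_weight, video_weight):
    """Fuse per-class posteriors for one frame by a weighted product.

    Assumption (not from the abstract): the fused score for class k is
    P_a(k)**audio_weight * P_v(k)**video_weight, renormalized to sum to 1.
    """
    if len(audio_post) != len(video_post):
        raise ValueError("both streams must cover the same classes")

    eps = 1e-12  # floor that avoids log(0) for zero posteriors
    log_scores = [
        audio_weight * math.log(max(a, eps))
        + video_weight * math.log(max(v, eps))
        for a, v in zip(audio_post, video_post)
    ]

    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_scores)
    unnormalized = [math.exp(s - peak) for s in log_scores]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical posteriors over three classes for a single frame.
audio = [0.7, 0.2, 0.1]
video = [0.1, 0.6, 0.3]
fused = combine_posteriors(audio, video, audio_weight=0.5, video_weight=0.5)
```

Under this assumption, setting the video weight to zero reduces the fused output to the audio posteriors. Lowering the audio weight as the SNR drops shifts the decision toward the video stream.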