Several corpora (e.g. TIMIT, Resource Management) are now available to researchers in the field of automatic speech recognition (ASR). However, few of these contain material suitable for training or testing speech recognition systems designed to tackle the wider problems of hearing, that is, processing conversational speech in natural conditions, with the concomitant increase in overlapping speech material, the presence of other acoustic sources, the use of extra-linguistic cues, and so forth.
Our own requirement is for material containing multiple simultaneous sources, which we would use to test our auditory scene analysis algorithms. To address this deficit, we have collected both audio and video data from five speakers engaged in a co-operative activity in a constrained domain. Additional high-quality binaural and monaural recordings were made from a manikin and an omnidirectional microphone, respectively. Collection took place at ATR labs in Japan during September 1994. This corpus will be source-tagged and made available to the international speech community.
See the home page of the ShATR multi-simultaneous-speaker corpus.
Supported by grants from EPSRC, ATR and The Royal Academy of Engineering.