(N.B. The PASCAL challenge is now officially closed and the results are available here. Instructions and data remain online for the benefit of groups wishing to compare their algorithms with those that were submitted.)

Datasets

You are welcome to download the data, but if you are considering attempting the challenge, please email us so that we can monitor interest. If you use the data in any published research, please cite:
  • Barker, J., Vincent, E., Ma, N., Christensen, C. and Green, P. The PASCAL CHiME speech separation and recognition challenge. Computer Speech and Language (submitted).

All audio data is in stereo, 16 bit WAV format and is available at sampling rates of either 16 kHz or 48 kHz.
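Before processing a large download, it can help to confirm that each file matches the stated format. The following is a minimal sketch using Python's standard `wave` module; `check_chime_wav` is a hypothetical helper name, and it checks only the properties stated above (stereo, 16-bit samples, 16 kHz or 48 kHz).

```python
import wave

# Properties stated for the CHiME audio: stereo, 16-bit samples,
# sampled at either 16 kHz or 48 kHz.
EXPECTED_CHANNELS = 2
EXPECTED_SAMPWIDTH_BYTES = 2  # 16 bit
ALLOWED_RATES = (16000, 48000)


def check_chime_wav(path):
    """Hypothetical helper: return (channels, sample_width_bytes, rate, n_frames)
    if the file matches the stated format, otherwise raise ValueError."""
    with wave.open(path, "rb") as w:
        channels = w.getnchannels()
        width = w.getsampwidth()
        rate = w.getframerate()
        n_frames = w.getnframes()
    if channels != EXPECTED_CHANNELS:
        raise ValueError(f"{path}: expected stereo, got {channels} channel(s)")
    if width != EXPECTED_SAMPWIDTH_BYTES:
        raise ValueError(f"{path}: expected 16-bit samples, got {8 * width}-bit")
    if rate not in ALLOWED_RATES:
        raise ValueError(f"{path}: unexpected sampling rate {rate} Hz")
    return channels, width, rate, n_frames
```

The function raises rather than returning a flag so that a batch loop over many files stops at the first file that does not match.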


Training set

Reverberant utterances

17,000 reverberated Grid utterances. Note: we do not anticipate that participants will need the training set at the 48 kHz sampling rate, but it is provided for completeness.

16 kHz [1.5 GB], 48 kHz (part1 | part2 | part3) [4.5 GB total]

Noise background

6 hours of background noise data.

16 kHz [1.1 GB], 48 kHz (part1 | part2 | part3) [3.3 GB total]

Adaptation data

Training set utterances in noise at SNRs from -6 to 9 dB.

16 kHz [193 MB], 48 kHz [572 MB]

(Note: these adaptation utterances are segmented; unfortunately, it is not possible to provide them as continuous audio. If your separation system relies on continuous audio, please use the development test set as adaptation data.)


Development test set

Isolated utterances in noise:

600 utterances at 6 different SNRs:

16 kHz [327 MB], 48 kHz [970 MB]

Utterances without background noise ("clean"):

16 kHz [52 MB], 48 kHz [151 MB]

Background noise without utterances (i.e. utterance backgrounds):

16 kHz [327 MB], 48 kHz [970 MB]
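Because the development set provides the clean utterances and their backgrounds separately, one can measure the speech-to-background energy ratio of a mixture directly. The sketch below is a plain energy-ratio SNR in dB; it is an illustration only, since the organisers' exact SNR definition is not given on this page and may differ. `read_int16_samples` and `snr_db` are hypothetical helper names, and the code assumes the clean and background files for one utterance are time-aligned and of equal length.

```python
import math
import sys
import wave
from array import array


def read_int16_samples(path):
    """Read a 16-bit PCM WAV file as a flat list of ints
    (for stereo files, left/right samples stay interleaved)."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM")
        raw = w.readframes(w.getnframes())
    samples = array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":  # WAV sample data is little-endian
        samples.byteswap()
    return samples.tolist()


def snr_db(speech, noise):
    """Energy ratio of speech to noise in dB, summed over all samples
    (both channels of a stereo file contribute)."""
    if len(speech) != len(noise):
        raise ValueError("signals must be time-aligned and of equal length")
    p_speech = sum(float(x) * x for x in speech)
    p_noise = sum(float(x) * x for x in noise)
    if p_noise == 0:
        return math.inf
    return 10.0 * math.log10(p_speech / p_noise)
```

For example, `snr_db(read_int16_samples("clean.wav"), read_int16_samples("background.wav"))`, where the two file names are placeholders for a matching clean/background pair.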

Utterances in continuous audio:

16 kHz (part1 | part2) [2.4 GB total], 48 kHz (part1 | part2 | part3 | part4) [7.0 GB total]

Each background recording session has been split into 5-minute segments, and the files are named "CR_lounge_$Date_$Time.$SegmentNumber", where, for example, $SegmentNumber = "s1200" denotes a segment start time of 1200 seconds.
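The naming scheme above can be decoded mechanically. This is a minimal sketch, with `parse_segment_name` and `Segment` as hypothetical names; the regular expression only enforces the "CR_lounge_$Date_$Time.s$Start" shape and does not validate the digits as real dates or times. The optional ".wav" suffix is an assumption, in case names are read from files on disk.

```python
import re
from typing import NamedTuple

SEGMENT_LENGTH_S = 300  # 5-minute segments, as described above

# Matches names such as "CR_lounge_150310_1629.s2100". The optional ".wav"
# suffix is an assumption about on-disk file names.
_SEGMENT_RE = re.compile(
    r"^CR_lounge_(?P<date>\d+)_(?P<time>\d+)\.s(?P<start>\d+)(?:\.wav)?$"
)


class Segment(NamedTuple):
    date: str      # $Date field, kept as the raw digit string
    time: str      # $Time field, kept as the raw digit string
    start_s: int   # segment start time in seconds

    @property
    def nominal_end_s(self) -> int:
        """Start plus 5 minutes; the final segment of a session may be shorter."""
        return self.start_s + SEGMENT_LENGTH_S


def parse_segment_name(name: str) -> Segment:
    m = _SEGMENT_RE.match(name)
    if m is None:
        raise ValueError(f"not a CR_lounge segment name: {name!r}")
    return Segment(m["date"], m["time"], int(m["start"]))
```

For example, `parse_segment_name("CR_lounge_150310_1629.s2100")` gives a segment with `start_s == 2100` and `nominal_end_s == 2400`.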

Annotation files:

These indicate where in the embedded background recordings each Grid utterance is mixed in. The tar file contains all the annotation files for both 16 kHz and 48 kHz and for all SNR conditions:

The format is:

"Grid_Utt_ID Embedded_Segment_ID Start_Sample Length_Of_Utt"

E.g. "s10_bgakzn CR_lounge_150310_1629.s2100 5160840 62880"
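Each annotation line has four whitespace-separated fields, so parsing is straightforward. The sketch below parses a line and then cuts the annotated span out of a continuous-audio segment. `Annotation`, `parse_annotation_line` and `extract_utterance` are hypothetical names. The extraction step rests on two assumptions that the format above does not confirm: that each segment is stored as "<segment_id>.wav" in one directory, and that Start_Sample counts sample frames from the beginning of that segment file. The annotation file and the audio must also come from the same sampling rate.

```python
import wave
from pathlib import Path
from typing import NamedTuple


class Annotation(NamedTuple):
    utt_id: str        # Grid_Utt_ID, e.g. "s10_bgakzn"
    segment_id: str    # Embedded_Segment_ID, e.g. "CR_lounge_150310_1629.s2100"
    start_sample: int  # Start_Sample
    length: int        # Length_Of_Utt, in samples

    @property
    def end_sample(self) -> int:
        """One past the last sample of the utterance."""
        return self.start_sample + self.length


def parse_annotation_line(line: str) -> Annotation:
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    utt_id, segment_id, start, length = fields
    return Annotation(utt_id, segment_id, int(start), int(length))


def extract_utterance(ann: Annotation, audio_dir: str) -> bytes:
    """Return the raw PCM frames for one annotated utterance.

    ASSUMPTIONS: the segment is stored as '<audio_dir>/<segment_id>.wav',
    and start_sample is a frame offset from the start of that file.
    """
    path = Path(audio_dir) / f"{ann.segment_id}.wav"
    with wave.open(str(path), "rb") as w:
        w.setpos(ann.start_sample)        # seek to the first frame
        return w.readframes(ann.length)   # read Length_Of_Utt frames
```

Parsing the example line above gives `start_sample == 5160840`, `length == 62880` and `end_sample == 5223720`.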


Final test set

Isolated utterances in noise:

600 utterances at 6 different SNRs:

16 kHz [325 MB], 48 kHz [966 MB]

Utterances in continuous audio:

16 kHz (part1 | part2) [2.4 GB total], 48 kHz (part1 | part2 | part3 | part4) [7.0 GB total]

Annotation files:

Annotations for both the development and test sets: annotation_files_test_and_devel.tar.gz