The CHiME-5 data consist of 20 parties, each recorded in a different home.

To refer to these data in a publication, please cite:

The data have been split into training, development test, and evaluation test sets as follows.

Dataset  Parties  Speakers  Hours  Utterances
Train    16       32        40:33  79,980
Dev      2        8         4:27   7,440
Eval     2        8         5:12   11,028

The audio data and the transcriptions follow this directory structure:

├── audio
│  ├── dev
│  ├── eval
│  └── train
└── transcriptions
   ├── dev
   ├── eval
   └── train

Each audio/transcription directory has subdirectories for training, development, and evaluation sets. The evaluation data will be released later.
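As a minimal sketch of how this layout might be navigated, the helper below builds the audio and transcription directories for one subset. The function name `subset_dirs` and the root folder name `CHiME5` are illustrative assumptions, not part of the distribution.

```python
from pathlib import Path

# Subset names taken from the directory tree above.
SUBSETS = ("train", "dev", "eval")


def subset_dirs(root, subset):
    """Return the (audio, transcriptions) directories for one subset.

    `root` is wherever the data were unpacked (hypothetical here).
    """
    if subset not in SUBSETS:
        raise ValueError(f"unknown subset: {subset!r}")
    root = Path(root)
    return root / "audio" / subset, root / "transcriptions" / subset


# "CHiME5" is an assumed root folder name, used only for illustration.
audio_dir, trans_dir = subset_dirs("CHiME5", "dev")
print(audio_dir)
print(trans_dir)
```

Validating the subset name up front keeps a typo such as `"test"` from silently producing a path that does not exist.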


All audio data are distributed as WAV files with a sampling rate of 16 kHz. Each session consists of the recordings made by the binaural microphones worn by each participant (4 participants per session), and by 6 microphone arrays with 4 microphones each. Therefore, the total number of microphones per session is 32 (2 x 4 + 4 x 6). These WAV files are named as follows:


The following tables provide more detailed statistics and notes about each session:

Training sessions

Session ID  Participants (Bold=Male)  Duration  #Utts  Notes
S03         P09, P10, P11, P12        2:11:22   4,090  P11 dropped from min ~15 to ~30
S04         P09, P10, P11, P12        2:29:36   5,563
S05         P13, p14, p15, P16        2:31:44   4,939  U03 missing (crashed)
S06         P13, p14, p15, P16        2:30:06   5,097
S07         p17, P18, p19, P20        2:26:53   3,656
S17         p17, P18, p19, P20        2:32:16   5,892
S08         P21, P22, P23, P24        2:31:35   6,175
S16         P21, P22, P23, P24        2:32:19   5,004
S12         P33, P34, P35, p36        2:29:24   3,300  Last 15 minutes of U05 missing (Kinect was accidentally turned off)
S13         P33, P34, P35, p36        2:30:11   4,193
S19         p49, P50, P51, p52        2:32:38   4,292  P52 mic unreliable
S20         p49, P50, P51, p52        2:18:04   5,365
S18         p41, P42, p43, p44        2:42:23   4,907
S22         p41, P42, p43, p44        2:35:44   4,758  U03 missing
S23         p53, P54, P55, p56        2:58:43   7,054  Neighbour interrupts
S24         p53, P54, P55, p56        2:37:09   5,695  P54 mic unreliable, P53 disconnects for bathroom

Development sessions

Session ID  Participants (Bold=Male)  Duration  #Utts  Notes
S02         p05, P06, P07, p08        2:28:24   3,822
S09         p25, p26, p27, p28        1:59:21   3,618  U05 missing

Evaluation sessions

Session ID  Participants (Bold=Male)  Duration  #Utts  Notes
S01         p01, p02, P03, p04        2:39:04   5,797  No registration tone
S21         P45, P46, P47, p48        2:33:20   5,231
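
The per-session utterance counts can be cross-checked against the dataset summary table. The sketch below copies the counts from the session tables by hand (the dictionary names are illustrative, not part of the distribution) and confirms that each subset has the stated number of parties and utterances.

```python
# Utterance counts per session, copied from the session tables above.
UTTERANCES = {
    "train": {
        "S03": 4090, "S04": 5563, "S05": 4939, "S06": 5097,
        "S07": 3656, "S17": 5892, "S08": 6175, "S16": 5004,
        "S12": 3300, "S13": 4193, "S19": 4292, "S20": 5365,
        "S18": 4907, "S22": 4758, "S23": 7054, "S24": 5695,
    },
    "dev": {"S02": 3822, "S09": 3618},
    "eval": {"S01": 5797, "S21": 5231},
}

# (parties, utterances) per subset, copied from the summary table.
SUMMARY = {"train": (16, 79980), "dev": (2, 7440), "eval": (2, 11028)}


def totals(subset):
    """Return (number of sessions, total utterances) for one subset."""
    sessions = UTTERANCES[subset]
    return len(sessions), sum(sessions.values())


for subset, expected in SUMMARY.items():
    status = "ok" if totals(subset) == expected else "MISMATCH"
    print(subset, totals(subset), status)
```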


The transcriptions are provided in JSON format for each session as <session ID>.json. The JSON file includes the following pieces of information for each utterance:

The following is an example annotation of one utterance in a JSON file:

    {
        "end_time": {
            "original": "0:00:43.82",
            "U01": "0:00:43.85",
            "U02": "0:00:43.84",
            "U03": "0:00:43.83",
            "U04": "0:00:43.83",
            "U05": "0:00:43.82",
            "U06": "0:00:43.82",
            "P05": "0:00:43.82",
            "P06": "0:00:43.82",
            "P07": "0:00:43.82",
            "P08": "0:00:43.82"
        },
        "start_time": {
            "original": "0:00:40.60",
            "U01": "0:00:40.63",
            "U02": "0:00:40.62",
            "U03": "0:00:40.61",
            "U04": "0:00:40.61",
            "U05": "0:00:40.60",
            "U06": "0:00:40.60",
            "P05": "0:00:40.60",
            "P06": "0:00:40.60",
            "P07": "0:00:40.60",
            "P08": "0:00:40.60"
        },
        "words": "[laughs] It's the blue, I think. I think.",
        "speaker": "P05",
        "ref": "U02",
        "location": "kitchen",
        "session_id": "S02"
    }
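
The time stamps in such an annotation are strings of the form h:mm:ss.ss, so they need converting before any arithmetic. The following is a minimal sketch, assuming each session file holds a JSON list of utterance objects like the one above; the helper names `to_seconds` and `duration` are illustrative, and the embedded record is a shortened copy of the example.

```python
import json

# Shortened copy of the example annotation, wrapped in a list
# (the list wrapper is an assumption about the file layout).
SESSION_JSON = """
[
    {
        "end_time": {"original": "0:00:43.82", "U01": "0:00:43.85"},
        "start_time": {"original": "0:00:40.60", "U01": "0:00:40.63"},
        "words": "[laughs] It's the blue, I think. I think.",
        "speaker": "P05",
        "ref": "U02",
        "location": "kitchen",
        "session_id": "S02"
    }
]
"""


def to_seconds(stamp):
    """Convert an 'h:mm:ss.ss' time stamp to seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def duration(utterance, device="original"):
    """Length of an utterance in seconds, on the given device's clock."""
    start = to_seconds(utterance["start_time"][device])
    end = to_seconds(utterance["end_time"][device])
    return end - start


for utt in json.loads(SESSION_JSON):
    # Offset of the U01 start time relative to the original time stamp.
    offset = to_seconds(utt["start_time"]["U01"]) - to_seconds(utt["start_time"]["original"])
    print(utt["speaker"], round(duration(utt), 2), round(offset, 2))
```

Keeping the device name as a parameter lets the same function read the per-device time stamps (`U01` to `U06`, `P05` to `P08`) as well as the `original` one.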


All data are available under licence via the download page.