Lombard Grid is a bi-view audiovisual Lombard speech corpus that can be used to support joint computational-behavioral studies in speech perception. The corpus includes 54 talkers, with 100 utterances per talker (50 Lombard and 50 plain utterances). The dataset follows the same sentence format as the audiovisual Grid corpus and can thus be considered an extension of that corpus. The sentence sets used in the Lombard Grid corpus are unique, however, and do not appear in the Grid corpus.
It offers two synchronised views of the talkers (front and side) to facilitate analysis of speech from different angles. A bespoke head-mounted camera system was used to collect both front and profile views of the talkers (as shown in Figure 1).
Statistics: 54 talkers: 30 female talkers and 24 male talkers; 5,400 (audio, front video and side video) utterances (16,200 files in total): 50% Lombard utterances, 50% plain reference utterances.
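As a quick sanity check, the totals quoted above follow directly from the per-talker counts. The sketch below only restates numbers from this page; the variable names are illustrative.

```python
# Counts taken from the statistics line above.
female_talkers, male_talkers = 30, 24
utterances_per_talker = 100  # 50 Lombard + 50 plain
streams = 3                  # audio, front video, side video

talkers = female_talkers + male_talkers
utterances = talkers * utterances_per_talker
files = utterances * streams

print(talkers, utterances, files)
```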
License and Citation
License
The corpus is being made freely available for download under a Creative Commons Attribution 4.0 International license.
Citing the corpus
Najwa Alghamdi, Steve Maddock, Ricard Marxer, Jon Barker and Guy J. Brown,
A corpus of audio-visual Lombard speech with frontal and profile views,
The Journal of the Acoustical Society of America 143, EL523 (2018); https://doi.org/10.1121/1.5042758
Download
Some samples of the data
Talker | Audio | Front Video | Side Video |
Male | P, L | P, L | P, L |
Female | P, L | P, L | P, L |
* P = plain, L = Lombard
All data (IDs #2-#55, although see notes below)
Audio (651.4 MB) | Front Video (837.1 MB) | Side Video (870 MB) | Alignment (2 MB) | Metadata (62 kB) |
download | download | download | download | download |
Notes
Each talker in the corpus produced a unique sentence list, except for two pairs of talkers (#6 and #29, and #25 and #26); the two talkers in each pair read the same sentence list.
By talker
Format
Filename format:
SPKR_COND_UTTERANCE.wav|.mov - e.g., s8_p_sbbi9p.wav
*SPKR = s1 to s55
*COND = l or p, where l=> Lombard, p=> plain (i.e. non-Lombard)
*UTTERANCE = 6-character Grid utterance code, e.g. 'pgag6a' which means 'place green at g 6 again'
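The filename convention above can be parsed mechanically. The following is a minimal sketch, not part of the corpus distribution: the function and class names are hypothetical, and the word tables are an assumption based on the original Grid corpus grammar (including 'z' for the digit zero), since this page does not list them.

```python
import re
from typing import NamedTuple

# Word tables assumed from the original Grid corpus grammar
# (not listed on this page).
COMMANDS = {"b": "bin", "l": "lay", "p": "place", "s": "set"}
COLOURS = {"b": "blue", "g": "green", "r": "red", "w": "white"}
PREPOSITIONS = {"a": "at", "b": "by", "i": "in", "w": "with"}
ADVERBS = {"a": "again", "n": "now", "p": "please", "s": "soon"}
CONDITIONS = {"l": "Lombard", "p": "plain"}

# SPKR_COND_UTTERANCE.wav|.mov, e.g. s8_p_sbbi9p.wav
FILENAME_RE = re.compile(
    r"^s(?P<spkr>\d+)_(?P<cond>[lp])_(?P<utt>[a-z0-9]{6})\.(?P<ext>wav|mov)$"
)


class LombardGridFile(NamedTuple):
    speaker: int
    condition: str
    utterance: str
    extension: str


def parse_filename(name: str) -> LombardGridFile:
    """Split a corpus filename into speaker, condition, code, extension."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a Lombard Grid filename: {name!r}")
    return LombardGridFile(
        speaker=int(match["spkr"]),
        condition=CONDITIONS[match["cond"]],
        utterance=match["utt"],
        extension=match["ext"],
    )


def decode_utterance(code: str) -> str:
    """Expand a 6-character Grid utterance code into its sentence."""
    if len(code) != 6:
        raise ValueError(f"expected a 6-character code, got {code!r}")
    command, colour, preposition, letter, digit, adverb = code
    # Assumption: 'z' in the digit slot stands for "zero", as in Grid.
    digit_word = "zero" if digit == "z" else digit
    return " ".join([
        COMMANDS[command],
        COLOURS[colour],
        PREPOSITIONS[preposition],
        letter,
        digit_word,
        ADVERBS[adverb],
    ])


info = parse_filename("s8_p_sbbi9p.wav")
print(info)
print(decode_utterance("pgag6a"))
```

Applied to the example code from this page, decode_utterance("pgag6a") reproduces 'place green at g 6 again'. The speaker pattern accepts any integer rather than a fixed ID range, so it does not depend on which talker IDs are present in the download.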
Metadata format:
*SPKR = s1 to s55
*SESSION = 1 or 2
*INDEX = 1 to 10 for ordering of the recording blocks
*SUBINDEX = 1 to 10 for ordering of utterance in a 10-utterance block.
*COND = l or p, where l=> Lombard, p=> plain (i.e. non-Lombard)
*UTTERANCE = 6-character Grid utterance code, e.g. 'pgag6a' which means 'place green at g 6 again'
If a sentence is spoken incorrectly then the filename will be
*TRANS = the Grid utterance code for what was actually said.
Credits
The following people contributed to the planning, development, collection, and annotation of the Lombard Grid corpus: Najwa Alghamdi, Steve Maddock, Jon Barker, Ricard Marxer, and Guy Brown.
This research was funded by the UK Engineering and Physical Sciences Research Council (EPSRC project AV-COGHEAR, EP/M026981/1) and by the Saudi Ministry of Education, King Saud University.