The GRID audiovisual sentence corpus |
GRID is a large multitalker audiovisual sentence corpus to support joint computational-behavioral studies in speech perception. In brief, the corpus consists of high-quality audio and video (facial) recordings of 1000 sentences spoken by each of 34 talkers (18 male, 16 female). Sentences are of the form "put red at G9 now". The corpus, together with transcriptions, is freely available for research use. GRID is described in more detail in this paper.
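Every GRID sentence follows the same fixed word-slot pattern illustrated by "put red at G9 now": a command, a colour, a preposition, a letter, a digit, and an adverb. As a minimal illustrative sketch (not part of the corpus distribution), a sentence of this form can be split into its six slots; the function and field names here are hypothetical:

```python
import re

# GRID-form sentence: command, colour, preposition, letter, digit, adverb.
# The letter and digit are often written together ("G9"), so the pattern
# allows them to be joined or separated by whitespace.
SENTENCE_RE = re.compile(
    r"^(?P<command>\w+)\s+(?P<colour>\w+)\s+(?P<preposition>\w+)\s+"
    r"(?P<letter>[A-Za-z])\s*(?P<digit>\d)\s+(?P<adverb>\w+)$"
)

def parse_grid_sentence(sentence: str) -> dict:
    """Split a GRID-style sentence into its six slots."""
    m = SENTENCE_RE.match(sentence.strip())
    if m is None:
        raise ValueError(f"not a GRID-form sentence: {sentence!r}")
    return m.groupdict()
```

For example, `parse_grid_sentence("put red at G9 now")` yields the command `put`, colour `red`, preposition `at`, letter `G`, digit `9`, and adverb `now`.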
| talker | audio only | video (normal) | video (high) | transcriptions |
|---|---|---|---|---|
| male | download | download | download | download |
| female | download | download | download | download |
Audio, video and other associated information such as word transcriptions are available separately for each talker.
Audio files were scaled at collection time to have an absolute maximum amplitude of 1 and were downsampled to 25 kHz. These signals have been endpointed. In addition, the raw, original 50 kHz signals are included below.
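The two preprocessing steps described above (peak normalisation to an absolute maximum of 1, and 2:1 downsampling from 50 kHz to 25 kHz) can be sketched as follows. This is an illustrative NumPy sketch, not the pipeline actually used to prepare the corpus; the function names and the FIR filter design are assumptions:

```python
import numpy as np

def peak_normalize(x: np.ndarray) -> np.ndarray:
    """Scale a signal so its absolute maximum value is 1."""
    peak = np.max(np.abs(x))
    return x / peak if peak > 0 else x

def downsample_by_two(x: np.ndarray, taps: int = 101) -> np.ndarray:
    """Halve the sample rate (e.g. 50 kHz -> 25 kHz).

    Applies a simple windowed-sinc anti-aliasing low-pass with cutoff
    at the new Nyquist frequency (0.25 of the input rate), then keeps
    every second sample.
    """
    n = np.arange(taps) - (taps - 1) / 2
    h = 0.5 * np.sinc(0.5 * n) * np.hamming(taps)  # ideal LP * window
    filtered = np.convolve(x, h, mode="same")
    return filtered[::2]
```

In practice a library resampler (e.g. `scipy.signal.resample_poly`) would normally be preferred over a hand-rolled filter; the sketch only makes the two stated operations concrete.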
Video files are provided in two formats: normal quality (360x288; ~1kbit/s) and high quality (720x576; ~6kbit/s). Due to a technical oversight, video for talker 21 is not available.
This paper describes the motivation for GRID and the details of its collection. Behavioural results for subsets of the GRID corpus are described in Barker and Cooke (2007) and Cooke et al. (2008). In addition, GRID sentences were used in the making of the 1st Speech Separation Challenge.
The following people contributed to the planning, development, collection, annotation and subsequent web-release of GRID: Jon Barker, Martin Cooke, Stuart Cunningham and Xu Shao.
This work was supported by a grant from the University of Sheffield Research Fund.
Last update: 18th March 2013 by Jon Barker