This tool allows users to display speech signals and their associated spectrograms and transcriptions. Users can create new transcriptions, edit existing ones, and view reference transcriptions. Narrow, medium and broadband spectrographic displays are supported, as is playback of signal segments. A hierarchical transcription browser is incorporated.
Type 'slt' to launch the demo. When the window appears, use the load menu to load a speech signal file. Supported formats currently include .wav, .snd and .au sound files. The signal and its spectrogram will appear. The type and quality of the spectrogram can be altered via the controls (7-9) in the lower right of the tool. Narrowband, medium and broadband spectrograms can be produced, and the quality of the display can be varied. Initially, the spectrogram quality is set to rough (the fastest to compute), but the spectrogram is probably best viewed at good quality. The brightness/contrast of the spectrogram can be altered using the buttons at (9). Increasing the contrast also removes the lower-amplitude regions from the display, which can be useful when looking for speech-related events.
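As a rough illustration of what the spectrogram type and contrast controls trade off, the following Python sketch relates the analysis window duration to the approximate analysis bandwidth, and shows how raising a display floor hides low-amplitude regions. The window durations, the function names and the dB-to-grey mapping are assumptions made for illustration; they are not taken from the tool itself.

```python
# Hypothetical window durations in seconds for each display type.
# These are typical textbook-style values, not the tool's actual settings.
WINDOW_SECONDS = {"broadband": 0.004, "medium": 0.010, "narrowband": 0.030}


def analysis_params(band, sample_rate):
    """Return (window length in samples, approximate bandwidth in Hz)."""
    duration = WINDOW_SECONDS[band]
    n_samples = round(duration * sample_rate)
    # Rule of thumb: analysis bandwidth is roughly the reciprocal of the
    # window duration, so short windows give broad bands and vice versa.
    bandwidth_hz = 1.0 / duration
    return n_samples, bandwidth_hz


def to_grey(level_db, floor_db, peak_db=0.0):
    """Map a level in dB to a grey value in [0, 1].

    Levels at or below floor_db map to 0, so raising the floor (one way to
    model higher contrast) removes low-amplitude regions from the display.
    """
    scaled = (level_db - floor_db) / (peak_db - floor_db)
    return min(1.0, max(0.0, scaled))
```

With a 16 kHz signal, `analysis_params("broadband", 16000)` gives a 64-sample window with a bandwidth near 250 Hz, while the narrowband setting gives 480 samples and roughly 33 Hz. This is why harmonics are resolved in the narrowband display and formants stand out in the broadband one.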
If any existing transcriptions are available, the menu (4) will appear. It defaults to 'hide transcription', but the user can select any of the available transcriptions for this utterance to be displayed in the waveform window. Note that any transcriptions displayed here cannot be edited, and are for 'reference' or comparison purposes.
To listen to the signal, click the mouse anywhere in the signal or spectrogram windows. If the mouse click is between the cursors (shown as red vertical lines -- (1) above), only that segment will be heard. If a user transcription (as opposed to a reference transcription) is loaded, clicking between a pair of transcription markers will play just that segment. This can be useful during labelling.
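The playback rules above can be sketched as a small Python function that decides which stretch of signal a click selects. The function name, the argument layout, the precedence of transcription markers over cursors, and the fallback to the whole signal are all assumptions; the description above does not say how the tool resolves these cases.

```python
from bisect import bisect_right


def segment_to_play(click_t, duration, cursors=None, markers=None):
    """Return the (start, end) times in seconds selected by a click.

    cursors: optional pair of cursor times.
    markers: optional list of user transcription boundary times.
    """
    # Assumption: a loaded user transcription is checked first.
    if markers:
        ordered = sorted(markers)
        i = bisect_right(ordered, click_t)
        # The click lies between two markers only if it has a neighbour
        # on both sides.
        if 0 < i < len(ordered):
            return ordered[i - 1], ordered[i]
    if cursors:
        left, right = sorted(cursors)
        if left <= click_t <= right:
            return left, right
    # Assumption: any other click plays the whole signal.
    return 0.0, duration
```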
Use the control pad (6) to change which segment is displayed. The '-->' and '<--' buttons move the displayed portion of the signal left and right (by 40%).
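A minimal sketch of the scrolling arithmetic follows, assuming the 40% refers to the width of the currently displayed segment and that the view is clamped to the ends of the signal. Neither detail is stated above, and the function name is hypothetical.

```python
def scroll(view_start, view_end, direction, duration, fraction=0.4):
    """Shift the visible window by a fraction of its own width.

    direction is +1 to move later in time and -1 to move earlier.
    Returns the new (start, end) in seconds, clamped to [0, duration].
    """
    span = view_end - view_start
    new_start = view_start + direction * fraction * span
    # Keep the window inside the signal without changing its width.
    new_start = max(0.0, min(new_start, duration - span))
    return new_start, new_start + span
```

For example, a one-second view starting at 1.0 s moves to about 1.4 s after one step forward, so 60% of the previous view stays on screen and provides context.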
As you move the cursor over the displays, the time and frequency under the current mouse location are shown at the lower left of the display (5). The frequency readout varies when the mouse is over the spectrogram (whose upper frequency is set to 6 kHz). This can be useful for extracting measurements (e.g. of formant frequencies) from the spectrogram.
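The readout can be thought of as a linear mapping from mouse position to time and frequency. The sketch below assumes a linear frequency axis from 0 Hz to the 6 kHz upper limit mentioned above, and pixel coordinates with the origin at the top left of the spectrogram. Both are assumptions, as are the function and parameter names.

```python
MAX_FREQ_HZ = 6000.0  # upper frequency of the spectrogram, as stated above


def pixel_to_time_freq(x, y, width, height, view_start, view_end):
    """Convert a mouse position inside the spectrogram to (seconds, Hz).

    x, y: pixel position, with (0, 0) at the top-left corner (assumed).
    width, height: size of the spectrogram in pixels.
    view_start, view_end: times in seconds at the left and right edges.
    """
    time_s = view_start + (x / width) * (view_end - view_start)
    # Row 0 is the top of the display, i.e. the highest frequency.
    freq_hz = (1.0 - y / height) * MAX_FREQ_HZ
    return time_s, freq_hz
```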
Transcription is performed in the user transcription window (2). Transcriptions can be loaded via the file menu, or developed from scratch. To insert a new transcription marker, double-click near one of the two cursors. A new boundary marker and a '?' label will appear in the transcription window at the location of the cursor clicked. The new segment can be selected, moved, labelled or deleted:
- A label is selected by clicking near the boundary or label. The boundary will be highlighted.
- Deletion is performed by selecting the object to be deleted, then hitting the delete key.
- Labels are moved by dragging with the mouse.
- To add or modify a label, select the desired label in the label browser (3); double-clicking in the label browser then applies it to the selected transcription label. Any label in the browser can be selected, whether it is a fine phoneme label, a broad class or something else. This allows various types of spectrogram labelling.
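The editing operations listed above can be modelled with a small data structure. The Python sketch below is one possible representation, not the tool's own: the class names, method names and the selection tolerance are hypothetical, and only the '?' default label comes from the description.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare markers by identity, not by value
class Marker:
    time: float       # boundary position in seconds
    label: str = "?"  # new markers start with a '?' label


@dataclass
class Transcription:
    markers: list = field(default_factory=list)

    def insert(self, cursor_time):
        """Add a boundary at a cursor position and keep markers in time order."""
        marker = Marker(cursor_time)
        self.markers.append(marker)
        self.markers.sort(key=lambda m: m.time)
        return marker

    def select(self, click_time, tolerance=0.02):
        """Return the marker nearest the click, or None if none is close enough."""
        if not self.markers:
            return None
        nearest = min(self.markers, key=lambda m: abs(m.time - click_time))
        return nearest if abs(nearest.time - click_time) <= tolerance else None

    def move(self, marker, new_time):
        """Drag a marker to a new time."""
        marker.time = new_time
        self.markers.sort(key=lambda m: m.time)

    def relabel(self, marker, label):
        """Apply a label chosen in the label browser."""
        marker.label = label

    def delete(self, marker):
        """Remove the selected marker."""
        self.markers.remove(marker)
```

Keeping the markers sorted by time means adjacent pairs always delimit a segment, which is convenient for playing or labelling one segment at a time.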
Mystery sounds (those whose name starts with 'myst') disable the play commands.
- If you are new to spectrogram reading, start off by identifying voiced, unvoiced and silent regions of the signal. Use the top level of the label browser to apply these labels. Compare the cues available in the broadband and narrowband spectrogram displays. Choose an example to load for which reference transcriptions at the voiced-unvoiced-silence level are available (the annotation file will probably have the .suv extension), and reveal them once you have had a go yourself.
- How easy is it to spot word boundaries from the signal alone?
- Load an example which has phoneme labels, and compare the spectrographic display of these with those contained in any textbook on acoustic-phonetics (e.g. Ladefoged; Rabiner & Juang).
- Find the vowel portions and measure the frequencies of the first three formants. Compare these with tabulated values in the above texts.
- Explore co-articulation.
- Load one of the mystery spectrograms. In the current release, these are digit sequences containing 4 or 5 digits. Have a go at identifying them.
- You will get the most out of this tool if you have a suitable acoustic-phonetics text handy. Ladefoged's A Course in Phonetics or any speech technology book (e.g. Deller, Proakis & Hansen; Rabiner & Juang) will suffice.
- For other visualisations of speech signals, see the edited collection in Cooke, Beet & Crawford (1993). Visual Representations of Speech Signals. Wiley.
Produced by: Martin Cooke, based on a prototype by Stuart Wrigley
Release date: October 5th 1998
Permissions: This demonstration may be used and modified freely by anyone. It may be distributed in unmodified form.