Strands (Cooke, 1991) are time-frequency descriptions which attempt to capture the important features of speech signals in a form amenable to the application of auditory grouping principles (Darwin & Carlyon, 1995; Bregman, 1990). Each strand attempts to track a single spectral dominance through time. Such dominances can correspond to resolved harmonics, formants, or other relatively narrowband signal components. Strands can be converted back to sound ('resynthesised'), individually or collectively, as sketched below.
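Resynthesis of a single strand can be pictured as a sinusoid whose instantaneous frequency and amplitude follow the strand's track. The MATLAB fragment below illustrates the idea; the sample rate and the toy frequency/amplitude tracks are illustrative assumptions rather than the demo's internal interface:

    % Minimal sketch of strand resynthesis: a sinusoid whose frequency
    % and amplitude follow a strand's time-frequency track. The tracks
    % below are toy examples, not data from the demo.
    fs  = 16000;                    % assumed sample rate (Hz)
    t   = (0:fs-1)/fs;              % one second of samples
    f   = 110 + 20*sin(2*pi*3*t);   % toy frequency track (Hz)
    a   = 0.5*ones(size(t));        % toy amplitude track
    phi = 2*pi*cumsum(f)/fs;        % integrate frequency to get phase
    s   = a .* cos(phi);            % this strand's contribution
    % A full resynthesis sums such signals over all selected strands.
    soundsc(s, fs);

Summing such signals over all selected strands gives the composite resynthesis played out by the demo.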
Type 'strands' to launch the demo. When the window appears, use the load menu (1) to load a strands file. If the associated sound file exists, the 'play original' button (4) will be enabled. Once the strands are loaded, individual strands can be switched on or off by clicking on them. Clicking elsewhere in the display panel (2) plays the sound obtained by summing the resyntheses of the selected strands. Button (3) selects or deselects all displayed strands in one step.
Shift-clicking on an individual strand (on the Mac, at least; try the right mouse button on other platforms) plays that strand on its own.
- The starting point for strand production is a model of the auditory periphery based on gammatone filters (Patterson & Holdsworth, 1990). Because the bandwidth of the modelled auditory filters grows with centre frequency, low-order harmonics each dominate a separate filter and appear as resolved strands at low-mid frequencies, whereas at mid-high frequencies several harmonics fall within a single filter, so strands there follow broader spectral features such as formants. Load any strands file to observe this distinction; a rough calculation of where the crossover falls is sketched after these notes.
- One set of strand examples provided with the distribution comes from the "ru"-"li" stimuli generated by Chris Darwin (Darwin, 1981; Gardner et al, 1989) and examined computationally in Cooke (1991). These are synthetic syllables with four formants. In one condition, the formants are synthesised on the same F0 (110 Hz). In the other conditions, the second formant has a different fundamental from the other three (which remain on 110 Hz). The effect of this F0 difference can be seen by loading the associated strand files, which have names like ru112 (signifying an F2 on 112 Hz). By manually removing the F2 strand(s), you can hear "li" instead of "ru".
- A subset of the corpus used to evaluate speech separation in Cooke (1991) is available in the standard distribution. Relevant files have names starting with v0. File v0 itself is a single-source utterance, while the v0n* files contain v0 with added 'noise'. By manually attempting to group the speech strands, you can get an idea of the performance that might be achievable by an automatic system which groups strands; a crude illustration of such grouping is sketched below. [It is hoped that something similar to Cooke's original system will be implemented as a demo in the next release].
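The resolved/unresolved distinction noted above can be roughed out from a standard equivalent rectangular bandwidth (ERB) formula, ERB(f) = 24.7(4.37f/1000 + 1) Hz (Glasberg & Moore's, often used to set gammatone bandwidths). The "resolved if ERB < F0" test below is a back-of-envelope heuristic, not the computation performed by the demo:

    % Rough check of which harmonics of a 110 Hz fundamental are
    % resolved: compare the auditory filter bandwidth (ERB) at each
    % harmonic with the harmonic spacing. "erb < f0" is a crude
    % heuristic, not the criterion used in the demo.
    f0  = 110;
    h   = f0 * (1:40);                 % harmonic frequencies (Hz)
    erb = 24.7 * (4.37*h/1000 + 1);    % ERB at each harmonic (Hz)
    kmax = find(erb < f0, 1, 'last');  % highest crudely-resolved harmonic
    fprintf('harmonics up to %d (%.0f Hz) are resolved\n', kmax, kmax*f0);

With these numbers the crossover falls near the seventh harmonic (around 770 Hz for a 110 Hz fundamental), broadly consistent with the resolved strands visible at low frequencies in any loaded file.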
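To give a flavour of what an automatic grouping stage might do with strands, the following much-simplified harmonicity test keeps strands whose mean frequency lies near an integer multiple of a candidate F0. The tracks cell array, the tolerance and the candidate F0 are all illustrative assumptions, and the test is far simpler than the grouping machinery of Cooke (1991):

    % Crude harmonic grouping: keep strands whose mean frequency lies
    % within 5% of an integer multiple of a candidate F0. The 'tracks'
    % cell array is a toy stand-in for the demo's internal strand data.
    tracks = {110*ones(1,100), 223*ones(1,100), 331*ones(1,100), 500*ones(1,100)};
    f0  = 110;                          % candidate fundamental (Hz)
    tol = 0.05;                         % 5% mistuning tolerance
    inGroup = false(1, numel(tracks));
    for k = 1:numel(tracks)
        fm = mean(tracks{k});           % average frequency of strand k
        n  = round(fm / f0);            % nearest harmonic number
        inGroup(k) = n >= 1 && abs(fm - n*f0) <= tol*n*f0;
    end
    % Strands flagged in inGroup would be resynthesised together.
    disp(inGroup);

The original system used richer, time-varying grouping cues, so a sketch like this understates what a strand-based separator can achieve.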
- Bregman (1990). Auditory Scene Analysis. MIT Press.
- Cooke (1991). Modelling auditory processing and organisation. PhD Thesis. Published by Cambridge University Press, 1993.
- Darwin (1981). Q. Jnl. Exp. Psych, 33(1), 185-207.
- Darwin & Carlyon (1995). Auditory Grouping. In: Hearing, Academic Press, 387-424.
- Gardner et al (1989). JASA, 85(3), 1329-1337.
- Patterson & Holdsworth (1990). In: Adv. in Speech, Hearing & Language Proc., Vol. 3 (ed: Ainsworth), JAI Press.
- For other visualisations of speech signals, see the edited collection in Cooke, Beet & Crawford (1993). Visual Representations of Speech Signals. Wiley.
Produced by: Martin Cooke
Release date: June 22 1998
Permissions: This demonstration may be used and modified freely by anyone. It may be distributed in unmodified form.