Any periodic waveform can be constructed by adding together sinusoids at integer multiples of the fundamental frequency corresponding to the period, known as harmonics. Frequency selectivity in the auditory system separates out these Fourier components at least for the lower portion of the spectrum, yet we perceive a single complex tone whose pitch corresponds to the fundamental frequency and whose timbre depends principally on the relative strength of the harmonics. The theory of Auditory Scene Analysis (Bregman 1990) interprets this integrated percept as the fusion of simultaneous tones on the basis of their common fundamental frequency (i.e. their harmonicity) as well as the common onset (and offset) times introduced when the sinusoids are turned on simultaneously.
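The first sentence above can be sketched in a few lines of Python. This is not the demonstration's own code. The function name `harmonic_complex`, the 16 kHz sample rate, the equal component amplitudes and the sine starting phase are all assumptions made for illustration.

```python
import math

def harmonic_complex(f0, num_harmonics, duration, sample_rate=16000, amplitudes=None):
    """Sum sinusoids at integer multiples of f0 (hypothetical helper).

    Equal amplitudes and sine phase are assumed unless amplitudes are given;
    changing the relative amplitudes changes the timbre, not the period.
    """
    if amplitudes is None:
        amplitudes = [1.0] * num_harmonics
    n_samples = int(round(duration * sample_rate))
    samples = []
    for i in range(n_samples):
        t = i / sample_rate
        # Harmonic k lies at k * f0, for k = 1 (the fundamental) upwards.
        value = sum(a * math.sin(2 * math.pi * k * f0 * t)
                    for k, a in enumerate(amplitudes, start=1))
        samples.append(value / len(amplitudes))  # crude normalisation
    return samples

# Six harmonics of a 200 Hz fundamental, lasting 100 ms.
tone = harmonic_complex(f0=200.0, num_harmonics=6, duration=0.1)
period = round(16000 / 200.0)  # 80 samples per fundamental period
```

Because every component completes a whole number of cycles in 1/f0 seconds, the summed waveform repeats with the period of the fundamental, whatever the relative amplitudes.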
This naturally leads to the question of what happens if these cues of harmonicity and common onset are degraded or absent. In broad terms, shifting the frequency of one harmonic -- i.e. detuning it -- will cause it to be segregated from the rest of the complex, and heard as a separate tone; shifting it in time so that it starts earlier or later than the remaining tones will similarly inhibit fusion. These manipulations are illustrated in the figure below, where the red harmonic can be made to stand out from the remaining blue harmonics by moving it in either or both of the dimensions indicated by the small arrows.
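The two manipulations can be made concrete with a short Python sketch. It is an illustration under stated assumptions rather than the demonstration's implementation: the name `mistuned_complex`, the 16 kHz sample rate, equal amplitudes, sine starting phase, and the choice that all components end together are hypothetical. Mistuning is expressed as a percentage of the harmonic's nominal frequency, and a negative onset offset means the shifted harmonic starts before the others.

```python
import math

def mistuned_complex(f0, num_harmonics, mistuned_harmonic, mistuning_percent,
                     onset_offset=0.0, duration=0.4, sample_rate=16000):
    """Return (samples, component_frequencies) for a hypothetical stimulus.

    onset_offset is in seconds; negative values start the mistuned harmonic
    earlier than the rest. All components are assumed to end together.
    """
    freqs = [k * f0 for k in range(1, num_harmonics + 1)]
    in_range = 1 <= mistuned_harmonic <= num_harmonics
    if in_range:
        # Shift one harmonic in frequency by the given percentage.
        freqs[mistuned_harmonic - 1] *= 1.0 + mistuning_percent / 100.0

    # Delay the rest of the complex if the target has to lead it.
    lead = max(0.0, -onset_offset)
    starts = [int(round(lead * sample_rate))] * num_harmonics
    if in_range:
        starts[mistuned_harmonic - 1] = int(round((lead + onset_offset) * sample_rate))

    total = int(round((lead + duration) * sample_rate))
    samples = []
    for i in range(total):
        value = 0.0
        for f, s in zip(freqs, starts):
            if i >= s:  # a component is silent until its own onset
                value += math.sin(2 * math.pi * f * (i - s) / sample_rate)
        samples.append(value / num_harmonics)
    return samples, freqs

# 4th harmonic of 200 Hz raised by 3% (to about 824 Hz), starting 50 ms early.
samples, freqs = mistuned_complex(200.0, 6, 4, 3.0, onset_offset=-0.05, duration=0.1)
```

In this sketch, asking for a harmonic number above the top of the complex leaves every component at its harmonic frequency, so no detuning is produced.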
Rather than simply asking subjects if they hear one sound or two, a more sensitive measure of integration can be obtained by asking subjects to match the pitch of the residual complex to a strictly-harmonic tone. Very small mistunings of one harmonic leave the complex whole but produce a measurable shift in the overall pitch; we might also expect that once the mistuned harmonic is perceived as a separate element, it will make no contribution to the pitch of the residual complex.
This situation was investigated by Moore et al. (1985). They found that although subjects were aware of a detuned 4th harmonic at mistunings of 1% or less, it continued to have a measurable effect on the pitch of the residual complex out to about 8% mistuning, with a maximum effect around 3%. Many other experiments have been performed with these or similar stimuli. In particular, Ciocca & Darwin (1993) found that starting the mistuned harmonic significantly earlier than the remaining harmonics could remove it completely from the pitch effect, but only if the portion of the harmonic that coincided with the complex was 'organized' as a continuation of the earlier-starting tone (an organization they were able to defeat by encouraging alternative groupings).
Darwin & Carlyon (1995) interpret this gradual removal of the mistuned harmonic from the grouped complex, as measured from the overall pitch, as evidence that grouping is not an 'all-or-nothing' effect, and that different aspects of auditory perception, such as the number of objects and the pitch of each one, may use separate versions of the organization of energy into sources.
Launch the demonstration with the command 'detuning'. Clicking anywhere in the graph (1) results in the delivery of a stimulus consisting of repetitions of a detuned-harmonic complex followed by a purely-harmonic complex; the horizontal co-ordinate at the clicked point determines the degree of mistuning of the single harmonic in the first tone, and the vertical co-ordinate controls the shift applied to the entire matching tone. The task is to match the pitch of the residual harmonic complex (the lower pitch when two are heard) to the second tone by moving up and down at a particular degree of mistuning.
Other stimulus parameters are controlled by the popup menus and sliders on the right-hand side of the window. The number of repetitions is governed by the top popup menu (2). The second popup (3) specifies which harmonic in the complex is mistuned (where larger numbers indicate increasingly high-frequency components). The third menu (4) selects the total number of harmonics in both detuned and matching complexes, which are always a contiguous set starting from the fundamental. Note that if the harmonic indicated for detuning is higher than the top of the complex, no detuning will be heard.
Control (5) allows the fundamental pitch of the complex to be varied between 0 and 1000 Hz, either by moving the slider or by typing a value directly into the box. A similar control (6) sets the duration of each tone, which can vary between 0 and 1000 ms. Finally, control (7) can cause the detuned harmonic to be started up to 200 ms before or after the remainder of the complex. Negative values correspond to the detuned harmonic starting before the rest of the complex.
After some practice at matching the pitches, you may wish to record your responses. After each stimulus presentation, pressing 'p' will result in a red circle (8) appearing at the co-ordinates corresponding to the parameters of that stimulus. After recording a number of such matches, compare your responses with a schematic approximation to Moore et al.'s results (9).
The Darwin & Carlyon chapter mentioned above provides a good description of the basic phenomena as well as arguing their particular interpretation in terms of multiple grouping mechanisms.
Produced by: Dan Ellis
Based on: "streamer" demo by Martin Cooke
Release date: July 16 1998
Permissions: This demonstration may be used and modified freely by anyone. It may be distributed in unmodified form.