Autocorrelation



Introduction

Pitch is defined as that attribute of auditory sensation in terms of which sounds may be ordered on a musical scale (American Standards Association). There are a number of theories of pitch perception, and these have given rise to computational models which implement them. These models have three stages:

  1. peripheral processing
  2. feature analysis
  3. pitch determination

Pitch perception models that use temporal information require some mechanism for identifying periodicities in the signal for use in the feature analysis stage. This demonstration illustrates one such mechanism, the autocorrelation function.

The usual method for deciding if a signal is periodic and then estimating its period is the autocorrelation function:
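
The equation is not reproduced in this text. As an assumption, not necessarily the exact expression used on the original page, a commonly used discrete-time form for a segment of N samples is shown below. Here x(n) denotes the sampled signal x(t), k is the lag in samples, and K is the largest lag examined; the symbols N, k and K are introduced here for illustration only.

```latex
r(k) = \sum_{n=0}^{N-1-k} x(n)\, x(n+k), \qquad k = 0, 1, \ldots, K
```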

Essentially, the signal x(t) is correlated with a time-lagged version of itself: the signal and its lagged copy are multiplied point by point and the products are summed. To obtain a useful set of results, the autocorrelation function is computed over a range of lag values.
It is an important property of the autocorrelation function that, for a periodic signal, it is itself periodic with the same period. For periodic signals the function attains a maximum at sample lags of 0, ±P, ±2P, etc., where P is the period of the signal.
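
The peak-at-the-period property can be checked with a short sketch. This is an illustrative example only, not the code used by the demonstration. The function names, the test tone and the lag search range are all assumptions chosen for the example.

```python
import math


def autocorrelation(x, max_lag):
    """Return r[k] = sum over n of x[n] * x[n + k], for k = 0 .. max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def estimate_period(x, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the largest autocorrelation.

    Lag 0 is excluded by min_lag because the function always peaks there.
    """
    r = autocorrelation(x, max_lag)
    return max(range(min_lag, max_lag + 1), key=lambda k: r[k])


# Hypothetical test signal: a 100 Hz sine sampled at 8 kHz,
# so the true period is P = 80 samples.
fs = 8000
f0 = 100
signal = [math.sin(2 * math.pi * f0 * i / fs) for i in range(800)]

period = estimate_period(signal, min_lag=20, max_lag=200)
print("period (samples):", period)
print("pitch estimate (Hz):", fs / period)
```

Because the sum is taken over a finite segment, fewer products contribute at larger lags, so the peaks at 2P, 3P and so on are slightly lower than the peak at P. This is why picking the largest peak in the search range recovers P and not one of its multiples in this example.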

One major limitation of the autocorrelation function is that it can retain too much of the information present in the signal. In speech, numerous peaks present in the autocorrelation function are due to damped oscillations of the vocal tract response. If these peaks happen to be bigger than the peaks due to periodicity, the simple procedure of taking the lag of the largest peak as the period will fail.

Therefore, the signal needs to be pre-processed in some way to make the periodicity more prominent while suppressing other features which may cause distracting peaks. Such pre-processing techniques are sometimes called spectrum flatteners. Many techniques have been proposed, but centre clipping [2] appears to be the best suited to this situation.

Centre clipping works by removing the low-amplitude part of the waveform. Let Amax be the maximum amplitude of the signal and CL be the clipping level. CL is a fixed percentage of Amax (say 30%). Therefore, the output from the centre clipper is as follows:

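y(n) = x(n) - CL    [x(n) > CL]
y(n) = 0            [|x(n)| <= CL]
y(n) = x(n) + CL    [x(n) < -CL]

i.e., for samples whose magnitude is at or below the clipping level, the output is zero; for samples above the clipping level, the output is equal to the input minus the clipping level; and for negative samples below -CL, the output is equal to the input plus the clipping level. See [1, page 151] and [2] for more information.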
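
A minimal sketch of a centre clipper follows. It assumes the symmetric form of the clipper described in [1], in which negative excursions beyond -CL are treated in the same way as positive excursions beyond CL. The function name and the `fraction` parameter are hypothetical and are not taken from the demonstration's own code.

```python
def centre_clip(x, fraction=0.3):
    """Centre clip a list of samples.

    The clipping level CL is `fraction` times the maximum absolute
    amplitude Amax. Samples with magnitude at or below CL become zero;
    the remainder are shifted towards zero by CL.
    """
    if not x:
        return []
    a_max = max(abs(s) for s in x)
    cl = fraction * a_max
    y = []
    for s in x:
        if s > cl:
            y.append(s - cl)   # positive excursion above CL
        elif s < -cl:
            y.append(s + cl)   # negative excursion below -CL (assumed symmetric)
        else:
            y.append(0.0)      # centre region is removed
    return y


# Example: with Amax = 1.0 and a 30% clipping level, CL = 0.3.
clipped = centre_clip([1.0, 0.2, -0.5, -1.0], fraction=0.3)
print(clipped)
```

The clipped signal would then be passed to the autocorrelation function in place of the original samples.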

The demonstration

 

Type 'auto' to launch the demo. When the window appears, use the load menu (1) to load either a tone, a noise or a sound file. The signal can be played by clicking anywhere within the signal axes. Once the signal has been loaded and displayed, a set of cursors appears. These can be moved about in order to select various parts of the signal. They can be linked together if necessary (2). By default, the segment of the signal contained within the cursors is played after movement of the cursors. This can be turned off if desired (2). The use of centre clipping (see above) and the clipping level can also be set in (2).

When the left-hand cursor is moved, the autocorrelation plot will refresh, showing the current autocorrelation function output. An overall estimate of the signal's pitch, its pitch contour, can be displayed by clicking the button at the bottom right of the demo (3). This can be automatically updated when the window type or size is changed (2).
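
One way a pitch contour of this kind could be computed is to apply the autocorrelation peak-picking procedure to successive frames of the signal. The sketch below is an assumption about the general approach, not a description of how the demonstration itself computes the contour. The function names, the 50 to 500 Hz search range, the frame length and the synthetic test signal are all hypothetical.

```python
import math


def frame_pitch(frame, fs, f_min=50.0, f_max=500.0):
    """Estimate the pitch (Hz) of one frame from its largest autocorrelation peak."""
    min_lag = int(fs / f_max)
    max_lag = min(int(fs / f_min), len(frame) - 1)
    best_lag, best_val = min_lag, float("-inf")
    for k in range(min_lag, max_lag + 1):
        # Autocorrelation at lag k over the samples available in this frame.
        r = sum(frame[i] * frame[i + k] for i in range(len(frame) - k))
        if r > best_val:
            best_lag, best_val = k, r
    return fs / best_lag


def pitch_contour(x, fs, frame_len, hop):
    """Return one pitch estimate per frame, stepping through x by `hop` samples."""
    return [frame_pitch(x[start:start + frame_len], fs)
            for start in range(0, len(x) - frame_len + 1, hop)]


# Hypothetical test signal: 100 Hz for the first 800 samples, then 200 Hz.
fs = 8000
signal = [math.sin(2 * math.pi * (100 if i < 800 else 200) * i / fs)
          for i in range(1600)]

contour = pitch_contour(signal, fs, frame_len=400, hop=400)
print(contour)
```

On real speech, frames without clear periodicity (for example fricatives) still produce a largest peak somewhere in the search range, which is one source of spurious values in a contour computed this way.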

The zoom panel (4) allows the signal (and the pitch contour if present) to be zoomed in on.

Things to investigate

  1. When you place the cursors over a vowel, what shape is the autocorrelation function output?
    Now place the cursors over a fricative, for example, an /s/ sound. What shape is the output now?
    How can your observations be explained?
  2. Display the pitch contour. Whilst slowly tracking along the signal with the left cursor, study how increases and decreases in the pitch shown on the pitch contour relate to changes in the autocorrelation function output.
    Are there any discontinuities (spurious peaks) in the pitch contour? What is causing these?
  3. What effect does turning centre clipping on have?
  4. Does altering the window type alter the plots? If so, why?

References

[1] Rabiner, L.R. and Schafer, R.W., "Digital Processing of Speech Signals". Prentice-Hall, 1978.

[2] Sondhi, M.M., "New Methods of Pitch Extraction". IEEE Trans. Audio and Electroacoustics, Vol. AU-16, No.2, pp.262-266, June 1968.

Further reading

For a frequency-domain method of pitch estimation, see also the pipeline processing demonstration (pipeline).


Credits

Produced by: Stuart N Wrigley

Release date: January 20 1999

Permissions: This demonstration may be used and modified freely by anyone. It may be distributed in unmodified form.