The RESPITE CASA Toolkit Project

Multisource Decoding Demonstration


The following recipe demonstrates the multisource decoding technique for robust ASR, applied to the Aurora 2.0 database. (It assumes you already have Aurora 2.0 installed on your system.) The scripts below are those that were used to generate the published results; see Barker, Cooke and Ellis (2001), cited in Step 6.

Step 1: Downloading CTK

(If you already have CTK installed, you may jump to step 3.)
  1. Download the latest version of CTK, and unpack it.
  2. Set the environment variable $CTKROOT to the path of the `CTK' directory that is created when the tar file is unpacked.

Step 2: Installing CTK

  1. Follow the instructions in $CTKROOT/INSTALL.

Step 3: Generating the feature data.

(If you have already generated rate32_d feature data for Aurora 2.0 then you may jump to step 4.)

This step generates the features for the Aurora 2.0 Test Set A. The features are based on the output of a 32-channel gammatone filterbank and include temporal derivatives. The necessary scripts and support files are contained in the package CTK_AURORA - Part 1, which can be downloaded below:

  1. Download and unpack the tar file - a directory called CTK_AURORA will be created.
  2. Set the environment variable $CTK_AURORAROOT to be the path of this directory.
  3. Set another environment variable, $AURORAROOT, to the path of the top-level directory of your Aurora installation.
  4. Change directory to $CTK_AURORAROOT/scripts.
  5. Run the script $CTK_AURORAROOT/scripts/do_make_rate32_d.
The script will now produce the feature data for the whole of Test Set A. The data will appear in the directory $CTK_AURORAROOT/data/rate32_d/testa, which is created automatically. Note that the complete test set occupies roughly 1.2 gigabytes of disk space. If the disk where $CTK_AURORAROOT is located does not have enough free space, then before running the script create $CTK_AURORAROOT/data as a symbolic link to a directory on a device with plenty of space. The script will take several hours to run.
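
The disk-space precaution above can be sketched in Python. This is an illustrative helper, not part of CTK or the CTK_AURORA package: the function name link_data_dir and the big_disk_dir argument are hypothetical, and the 1.2 GB threshold is simply the rough figure quoted above.

```python
import os
import shutil

# Rough size of the Test Set A feature data, as quoted in the text (about 1.2 GB).
REQUIRED_BYTES = int(1.2 * 1024**3)


def link_data_dir(ctk_aurora_root, big_disk_dir, required_bytes=REQUIRED_BYTES):
    """Create <ctk_aurora_root>/data as a symbolic link to big_disk_dir.

    Raises RuntimeError if big_disk_dir has less than required_bytes free,
    and FileExistsError if a 'data' entry already exists (it is never replaced).
    Returns the path of the new link.
    """
    free = shutil.disk_usage(big_disk_dir).free
    if free < required_bytes:
        raise RuntimeError(
            f"only {free} bytes free in {big_disk_dir}; about {required_bytes} needed"
        )
    link = os.path.join(ctk_aurora_root, "data")
    if os.path.lexists(link):
        raise FileExistsError(f"{link} already exists; not replacing it")
    os.symlink(os.path.abspath(big_disk_dir), link)
    return link
```

For example, link_data_dir(os.environ["CTK_AURORAROOT"], "/scratch/aurora_data") would be called once, before running do_make_rate32_d. The /scratch path is only a placeholder for a directory on a device with enough room.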

Step 4: Setting up and testing the clean speech models.

This step installs HMMs that have been trained on rate32_d data and tests them using traditional ASR techniques. The necessary scripts, model files and other support files are contained in the package, CTK_AURORA - Part 2, which can be downloaded below:
  1. Download the tar file and copy it to the directory above $CTK_AURORAROOT (i.e. $CTK_AURORAROOT/..).
  2. Unpack the tar file. The tar file contains model files, label files and transcription files that will be installed in directories under $CTK_AURORAROOT.
  3. Change directory to $CTK_AURORAROOT/scripts.
  4. Make sure $CTKROOT is set correctly.
  5. Type: setenv CTKWORK $CTK_AURORAROOT
  6. Type: test_trad_asr 1 clean
    There will be a short pause while the HMM definitions are read; after this, recognition output should start appearing on stdout. The script is set up to use the 3-mixture models and should produce around 97.0% accuracy on clean data. With the 7-mixture models the result will be closer to 99.0%.
If you examine the directory $CTK_AURORAROOT/models, you should find that it contains both the 3- and 7-mixture models.
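
The recognition scripts rely on $CTKROOT and $CTKWORK being set, so a quick check before typing the commands can save a failed run. The following is a minimal sketch in Python and is not part of CTK. The name check_env_dirs is hypothetical, and treating each variable as the path of an existing directory is an assumption based on how the variables are described in this recipe.

```python
import os


def check_env_dirs(names=("CTKROOT", "CTK_AURORAROOT", "CTKWORK"), env=None):
    """Return a dict mapping each problematic variable name to a short reason.

    A variable is reported if it is unset or empty, or if its value is not
    an existing directory. An empty dict means every variable looks usable.
    """
    env = os.environ if env is None else env
    problems = {}
    for name in names:
        value = env.get(name)
        if not value:
            problems[name] = "not set"
        elif not os.path.isdir(value):
            problems[name] = f"not a directory: {value}"
    return problems
```

Calling check_env_dirs() with no arguments inspects the current environment. Anything it returns should be fixed with setenv before running test_trad_asr.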

Step 5: Generating the `degree of voicing' data.

The multisource decoding technique employs a voicing decision to help segregate the speech data. This step computes and stores the degree of voicing data for each utterance in the test set before the recognition experiments are run.

The necessary scripts are contained in the package, CTK_AURORA_MS - Part 1, which can be downloaded below:

  1. Download the tar file and copy it to the directory above $CTK_AURORAROOT (i.e. $CTK_AURORAROOT/..).
  2. Unpack the tar file. The tar file contains scripts that will be installed in $CTK_AURORAROOT/scripts/.
  3. Change directory to $CTK_AURORAROOT/scripts.
  4. Make sure $CTKROOT is set correctly.
  5. Make a directory called $CTK_AURORAROOT/data/harmonicity. This is where the voicing data will be stored; it requires roughly 34 Mb of disk space.
  6. To start generating the data, type:
    do_make_voicing_data
    This is a computationally intensive task and the script will take a long time to run. (Note that the CTK script above computes degree of voicing as a by-product of an auditory pitch tracking model. There are far cheaper techniques that could be employed for this step and that would probably be equally effective.) Remember to check that you have sufficient storage space before running the script!
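
Creating the harmonicity directory and checking the storage space can be done in one go. The sketch below is in Python and is not part of the CTK_AURORA_MS package. The name make_harmonicity_dir is hypothetical, and the "34 Mb" quoted above is read as 34 megabytes, which is an assumption.

```python
import os
import shutil

# Rough size of the voicing data quoted in the text, read here as 34 megabytes
# (an assumption about the unit "Mb").
VOICING_BYTES = 34 * 1024**2


def make_harmonicity_dir(ctk_aurora_root, required_bytes=VOICING_BYTES):
    """Create <ctk_aurora_root>/data/harmonicity and check the free space there.

    The directory is created if missing (an existing one is left untouched).
    Raises RuntimeError if fewer than required_bytes are free on that device.
    Returns the path of the directory.
    """
    path = os.path.join(ctk_aurora_root, "data", "harmonicity")
    os.makedirs(path, exist_ok=True)  # works whether 'data' is a real directory or a symlink
    free = shutil.disk_usage(path).free
    if free < required_bytes:
        raise RuntimeError(f"only {free} bytes free at {path}; about {required_bytes} needed")
    return path
```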

Step 6: Multisource Decoding versus Missing Data ASR.

This step provides the scripts necessary to run the recognition experiments using both the multisource decoding technique and the missing data baseline system (as described in Barker, Cooke and Ellis (2001)). The scripts are contained in the package CTK_AURORA_MS - Part 2, which can be downloaded below:
  1. Download the tar file and copy it to the directory above $CTK_AURORAROOT (i.e. $CTK_AURORAROOT/..).
  2. Unpack the tar file. The tar file contains scripts that will be installed in $CTK_AURORAROOT/scripts/.
  3. Change directory to $CTK_AURORAROOT/scripts.
  4. Set the environment variable CTKWORK:
    setenv CTKWORK $CTK_AURORAROOT
  5. Make sure $CTKROOT is set correctly.
  6. To run the baseline missing data system on the test set, use the script test_baseline_md_asr, e.g.:
    test_baseline_md_asr 1 clean
    To run the multisource decoding system on the test set, use the script test_multisource_asr, e.g.:
    test_multisource_asr 1 clean
    Read the text at the top of the scripts to see the syntax.
Once a script is started there will be a short delay while the HMM definitions are read and other setup completes. Eventually recognition output will appear on stdout. The scripts use the 7-mixture models. The multisource decoding test will run a little slower than the baseline missing data test.
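
Since both tests write their recognition output to stdout, it may be convenient to capture each run in a file so that the two systems can be compared afterwards. The following Python sketch shows one way to do this. It is not part of the CTK packages, and the function name run_and_log and any log file names are hypothetical.

```python
import subprocess


def run_and_log(command, log_path, cwd=None):
    """Run command (a list of strings) and write its stdout and stderr to log_path.

    The command inherits the current environment, so variables such as
    CTKROOT and CTKWORK must already be set. Returns the exit code.
    """
    with open(log_path, "w") as log:
        result = subprocess.run(command, cwd=cwd, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode
```

A call might look like run_and_log([scripts_dir + "/test_baseline_md_asr", "1", "clean"], "baseline_1_clean.log", cwd=scripts_dir), where scripts_dir holds the path $CTK_AURORAROOT/scripts. The arguments "1 clean" are simply the example from the step above; the text at the top of each script gives the actual syntax.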


These pages are maintained by Jon Barker, jon@dcs.shef.ac.uk
Last modified: Tue Feb 10 15:36:36 GMT 2004