|RESPITE: The CASA Toolkit Page: Documentation: Block Library Index:HMMDecoderStandard|
The HMMDecoderStandard block performs HMM Viterbi decoding on a stream of input feature vectors given a set of HMM models.
As a by-product, the decoder outputs a stream of state-likelihood frames (out1). Each frame consists of the likelihood of each model state having generated the corresponding input feature frame. Within these frames the state likelihoods occur in the same order in which the states are defined in the HMM definition file.
The operation of the decoder is specified by the following set of parameters:
The HMM_FILE parameter specifies the name of a file that associates HMM definitions with HMM NAMEs. This file can have one of two possible formats, depending on whether the HMMs are stored in a single file or are stored separately:
The TRANSCRIPTION parameter is a string that gives the correct transcription for the utterance to be recognised. The transcription must be encoded as a sequence of single-character labels, corresponding to the LABELs of the correct sequence of models.
The LABEL_FILE parameter specifies the name of a file which associates HMM NAMEs with HMM LABELs. Whereas each HMM must have a unique NAME, several HMMs can share the same LABEL. For example, there may be both a male and a female version of the digit one, with NAMEs "one_m" and "one_f", both having the LABEL "1".
Each line of the file defines a separate LABEL. The LABEL occurs as the first character on the line and is followed by the NAME of each HMM that shares this LABEL, e.g.:
1 one_m one_f
2 two_m two_f
S sil sp
etc.
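To illustrate the format, the following Python sketch reads label-file text into a NAME-to-LABEL mapping. It is not part of CTK: the function name `parse_label_file` is hypothetical, and the sketch assumes that fields are separated by whitespace and that a NAME may appear on only one line.

```python
def parse_label_file(text):
    """Map each HMM NAME to its single-character LABEL.

    Hypothetical helper, not part of CTK. Assumes each non-blank line holds
    a LABEL followed by whitespace-separated HMM NAMEs.
    """
    name_to_label = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        label, names = fields[0], fields[1:]
        for name in names:
            # Each HMM NAME is unique, so it can carry only one LABEL.
            if name in name_to_label:
                raise ValueError(f"HMM NAME {name!r} listed more than once")
            name_to_label[name] = label
    return name_to_label


example = """1 one_m one_f
2 two_m two_f
S sil sp
"""
mapping = parse_label_file(example)
```

Here `mapping["one_m"]` and `mapping["one_f"]` both resolve to the LABEL "1", which is the many-to-one relationship described above.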
The GRAMMAR_FILE parameter specifies the name of a file containing the grammar to be applied to the set of models.
The GRAMMAR_FILE specifies the grammar in terms of the NAMEs of the individual HMMs. The format is the same as that used in version 1.x of HTK. For more details, see the HTK version 1.x documentation.
If no GRAMMAR_FILE is specified, then all the models are placed in a simple loop grammar, i.e. any model can follow any other model.
The SILENCE parameter is a string composed of the labels for all the models that are to be regarded as silence. These labels will be removed from the transcription and the recognition hypothesis before the recognition statistics are calculated, e.g. if the SILENCE parameter is set to "s" and the recognition output is "s1s2s" then this will be treated as "12" when scoring the correctness and accuracy.
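The silence removal described above can be sketched in a few lines of Python. This is an illustration only, not CTK code; `strip_silence` is a hypothetical name, and the sketch assumes that every character of the SILENCE string is treated as a separate silence label.

```python
def strip_silence(labels, silence):
    """Remove every label in `labels` that appears in the `silence` string.

    Hypothetical helper: both arguments are strings of single-character labels.
    """
    return "".join(ch for ch in labels if ch not in silence)


# The example from the text: SILENCE="s", recognition output "s1s2s".
hypothesis = strip_silence("s1s2s", "s")
```

The same function would be applied to both the transcription and the hypothesis before they are compared.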
The HAS_DELTAS parameter is a boolean switch that, when turned on, informs the decoder that the input data contains 'delta' parameters, i.e. if the input data is a vector of size 64, then the first 32 elements are treated as static features and the second 32 elements are the corresponding delta features. This switch only makes a difference when the decoder is using a probability calculation in which delta features are handled differently from static features. For example, when using bounded marginalisation the bounds constraint is only applied to missing static features, not missing delta features.
The decoder will normally use the deltas if they are supplied. However, if the USE_DELTAS switch is set to FALSE then deltas will be ignored. If left unset then USE_DELTAS will take the value of HAS_DELTAS i.e. they are used if present. (Note, it is an error to have HAS_DELTAS as FALSE and USE_DELTAS as TRUE.)
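The interaction between HAS_DELTAS and USE_DELTAS can be summarised with a short Python sketch. It is not taken from the CTK source: the function names are hypothetical, `None` stands in for an unset USE_DELTAS, and the requirement that a frame with deltas has even length is an assumption based on the 64-element example.

```python
def resolve_use_deltas(has_deltas, use_deltas=None):
    """Return the effective USE_DELTAS value (hypothetical helper).

    An unset USE_DELTAS (None) takes the value of HAS_DELTAS.
    HAS_DELTAS=False with USE_DELTAS=True is rejected as an error.
    """
    if use_deltas is None:
        use_deltas = has_deltas
    if use_deltas and not has_deltas:
        raise ValueError("USE_DELTAS is TRUE but HAS_DELTAS is FALSE")
    return use_deltas


def split_frame(frame, has_deltas):
    """Split a feature frame into (static, delta) halves.

    Assumption: when deltas are present, the first half of the frame holds
    the static features and the second half the corresponding deltas.
    """
    if not has_deltas:
        return list(frame), []
    if len(frame) % 2 != 0:
        raise ValueError("a frame with deltas must have even length")
    half = len(frame) // 2
    return list(frame[:half]), list(frame[half:])


static, delta = split_frame(list(range(64)), has_deltas=True)
```

With a 64-element frame, `static` holds elements 0 to 31 and `delta` holds elements 32 to 63.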
The LOG_FILE parameter specifies the name of an optional recognition log file to which the recognition statistics will be sent. If the file does not already exist it will be created. If it does exist then the statistics will be appended to it. If no LOG_FILE parameter is specified, or if LOG_FILE is set to the empty string (i.e. LOG_FILE=""), then the recognition statistics will be sent to stdout.
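The destination rules for the recognition statistics can be sketched as follows. This is an illustrative Python function, not CTK code; `write_statistics` is a hypothetical name, and `None` is used to represent an unspecified LOG_FILE.

```python
import sys


def write_statistics(text, log_file=None):
    """Send recognition statistics to LOG_FILE, or to stdout if it is unset.

    Hypothetical helper: an unset (None) or empty log_file means stdout.
    Opening in append mode creates the file if it does not already exist
    and otherwise adds to the end of it.
    """
    if not log_file:
        sys.stdout.write(text)
        return
    with open(log_file, "a") as handle:
        handle.write(text)
```

Because the file is opened in append mode, successive runs that name the same LOG_FILE accumulate their statistics in one file.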
The LOG_FILE_2 parameter specifies the name of an additional recognition log file to which detailed per utterance information about the results of the decoding will be sent. This file is in XML format and a corresponding DTD file can be found in $CTKROOT/src. If LOG_FILE_2 is not set then the additional log file will not be generated.
If OUTPUT_CONFUSIONS is set to true then the recognition statistics will include a confusion matrix.
The decoder can produce approximate N-best lists. The NBEST parameter determines the size of the N-best list to produce. By default NBEST is set to 1 and only the highest scoring hypothesis is considered.
The N-best lists are computed using the approximate lattice N-best algorithm (see Schwartz and Austin, ICASSP '91 for details).
The WORD_PENALTY is added to the score of a token as it passes out of the final state of a model. By default this penalty is set to 0.0, but if the recogniser is making excessive insertion errors then the recognition accuracy can sometimes be improved by setting the penalty to a positive value. It has been found that this penalty can greatly improve results when performing missing data recognition (see next section). The appropriate value to use is best determined empirically.
MAX_APPROX is a boolean parameter that can be set to true to offer a small increase in speed in the probability calculation when using multiple-mixture models. If this is set to true then, rather than summing the likelihood contributions of each Gaussian mixture component, the overall likelihood is estimated by taking it to be the likelihood of the component with the biggest (i.e. the maximum) likelihood. This approximation is normally very close, as there is typically a difference of several orders of magnitude between the likelihoods of the most likely component and even the second most likely.
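The difference between the full sum and the max approximation can be seen in a small Python sketch. It is an illustration under stated assumptions rather than the CTK implementation: the function names are hypothetical, the Gaussians are assumed to have diagonal covariance, the calculation is done in the log domain, and each component's likelihood is assumed to include its mixture weight.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at the vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def state_log_likelihood(x, mixtures, max_approx=False):
    """Log likelihood of x for a state with Gaussian mixture components.

    `mixtures` is a list of (weight, mean, var) tuples (assumed layout).
    With max_approx=True only the best weighted component is used.
    """
    terms = [math.log(w) + log_gaussian(x, m, v) for w, m, v in mixtures]
    best = max(terms)
    if max_approx:
        return best
    # Full sum over components, computed stably as a log-sum-exp.
    return best + math.log(sum(math.exp(t - best) for t in terms))


# Two well-separated components; the observation lies close to the first.
mixtures = [
    (0.6, [0.0, 0.0], [1.0, 1.0]),
    (0.4, [5.0, 5.0], [1.0, 1.0]),
]
x = [0.1, -0.2]
exact = state_log_likelihood(x, mixtures)
approx = state_log_likelihood(x, mixtures, max_approx=True)
```

In this example the second component contributes almost nothing to the sum, so `approx` is only marginally smaller than `exact`. The approximation can never exceed the exact value, because it drops non-negative terms from the sum.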
The HYPOTHESIS FILTER parameter is a string containing a regular expression that is used as a 'hypothesis filter'. When the filter is used, the decoder will reject any hypotheses which match the regular expression and will scan down a 50-best list to find the highest ranking hypothesis that does not match the filter. If the list contains no compliant hypotheses, the decoder reverts to accepting the originally selected best hypothesis.
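The filtering rule can be sketched in Python as follows. This is not the CTK implementation: `apply_hypothesis_filter` is a hypothetical name, the hypotheses are assumed to be label strings ranked best first, and "match" is assumed to mean that the expression matches anywhere in the hypothesis (`re.search`), which the text above does not specify.

```python
import re


def apply_hypothesis_filter(nbest, pattern, depth=50):
    """Return the best-ranked hypothesis that does not match `pattern`.

    Hypothetical helper. `nbest` is a list of hypothesis strings ranked
    best first; only the first `depth` entries are scanned. If every
    scanned hypothesis matches, the original best hypothesis is returned.
    """
    if not pattern:
        return nbest[0]  # an empty filter rejects nothing
    regex = re.compile(pattern)
    for hypothesis in nbest[:depth]:
        if not regex.search(hypothesis):
            return hypothesis
    return nbest[0]


# Example filter: reject any hypothesis with a label repeated twice in a row.
chosen = apply_hypothesis_filter(["11", "12", "21"], r"(.)\1")
```

Here the top hypothesis "11" is rejected by the filter, so the second-ranked hypothesis "12" is selected instead.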
The FIRST_TOKEN is a string parameter that supplies the label name of a forced first token, i.e. when this parameter is set the decoding is forced to start with the given model. This is typically used to force a decoding to start with the silence model, e.g. FIRST_TOKEN="S".
If the FIRST_TOKEN parameter is not set then the decoding can start with any token.
The FINAL_TOKEN is a string parameter that supplies the label name of a forced final token, i.e. when this parameter is set the decoding is forced to end with the given model. This is typically used to force a decoding to end with the silence model, e.g. FINAL_TOKEN="S".
If the FINAL_TOKEN parameter is not set then the decoding can end with any token.
If the STATE_PATH switch is set to true then the decoder will record the state path that the winning hypothesis has taken through each model. The frame by frame state occupancy will be output to LOG_FILE_2. Note, recording this information requires some computational overhead, so if it is not required the STATE_PATH switch should be turned off.
DUMP_PARAMETERS is a boolean parameter that, if set to TRUE, causes a record of the settings of the decoder parameters to be written at the end of the log file. By default DUMP_PARAMETERS is FALSE.
|Inputs||Meaning||Sample||1-D frame||2-D frame|
|out2||state max mixture label|
|Parameter||Type||Default||Description|
|LOG_FILE||String||-||Name of an optional log file|
|LOG_FILE_2||String||-||Name of additional detailed log file|
|WORD_PENALTY||Float||0.0||The creation penalty|
|HMM_FILE||String||-||Name of the HMM file list|
|GRAMMAR_FILE||String||-||File storing the grammar|
|LABEL_FILE||String||-||File storing HMM NAME-> HMM LABEL mapping|
|FIRST_TOKEN||String||-||Label of a fixed first token|
|FINAL_TOKEN||String||-||Label of a fixed final token|
|TRANSCRIPTION||String||-||The correct transcription|
|SILENCE||String||""||The silence label(s)|
|MAX_APPROX||Boolean||False||Use max mixture approximation|
|NBEST||Int||1||Return best N hypotheses|
|STATE_PATH||Boolean||False||Record HMM state path|
|HAS_DELTAS||Boolean||False||Models have delta parameters|
|USE_DELTAS||Boolean||-||Use delta parameters (takes the value of HAS_DELTAS if unset)|
|HYPOTHESIS FILTER||String||""||Regular expression for filtering hypotheses|
|OUTPUT_CONFUSIONS||Boolean||False||Output confusion matrix|
|DUMP_PARAMETERS||Boolean||False||Write parameters to log file|