M4 speech recognition
HTK has all the tools needed to train a speech recognition system from a flat start. However, it has some limitations. Most notably, release 3.2 does not include a decoder that is efficient enough to perform large vocabulary continuous speech recognition (LVCSR). As a result, only bigram language models (LMs) can be used with HTK's Viterbi decoder.
The DUcoder is a more efficient stack decoder that is better suited to LVCSR. It uses the libraries from an earlier release of HTK (version 2.0), so many of the file types and models created by HTK 3.2 can be used directly by the DUcoder. It can decode with n-gram language models and has many pruning options that enable quicker searches. At present, no attempt has been made to port the DUcoder to HTK 3.2. An upgrade would most likely require the changes that were made to the HTK 2.0 core libraries to be applied to version 3.2. The difference in versions means that many of the newer HTK features are not compatible with the DUcoder. For example, the DUcoder does not automatically recognise PLP feature types. While this incompatibility may be circumvented by changing the feature type label inside each file, other problems are not so easily solved. Table 1 compares HTK's Viterbi decoder (HVite) and the DUcoder.
|Feature||HVite||DUcoder|
|PLP features||yes||no (incompatible file types, easily fixed)|
|MLLR trained models||yes||no (incompatible file types)|
|Cross word context dependent decoding||yes||no (not implemented)|
|Word internal context dependent decoding||yes||yes|
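The PLP workaround mentioned above (changing the feature type label inside each file) can be sketched as follows. This is a minimal illustration, not a tested recipe for the DUcoder. It assumes the standard HTK parameter file layout: a 12-byte header holding nSamples, sampPeriod, sampSize and parmKind, stored big-endian, with the base feature kind in the low six bits of parmKind. The choice of MFCC as the replacement label is an assumption, and the function names are hypothetical.

```python
import struct

# HTK parameter file header: nSamples (int32), sampPeriod (int32),
# sampSize (int16), parmKind (int16). Big-endian is assumed here.
HEADER = struct.Struct(">iihh")

BASE_MASK = 0o77  # low six bits = base kind; higher bits = qualifiers (_E, _D, _A, ...)
PLP = 11          # HTK base kind code for PLP
MFCC = 6          # HTK base kind code for MFCC (assumed replacement label)


def relabel_header(header: bytes, old_base: int = PLP, new_base: int = MFCC) -> bytes:
    """Return a copy of the header with the base kind swapped, qualifiers kept."""
    n_samples, samp_period, samp_size, parm_kind = HEADER.unpack(header)
    if parm_kind & BASE_MASK != old_base:
        return header  # not the kind we are looking for; leave untouched
    new_kind = (parm_kind & ~BASE_MASK) | new_base
    return HEADER.pack(n_samples, samp_period, samp_size, new_kind)


def relabel_file(path: str) -> None:
    """Rewrite the header of one feature file in place; the data is not modified."""
    with open(path, "r+b") as f:
        header = f.read(HEADER.size)
        if len(header) != HEADER.size:
            raise ValueError(f"{path}: too short to contain an HTK header")
        f.seek(0)
        f.write(relabel_header(header))
```

Only the label changes; the feature vectors themselves are still PLP coefficients, so the acoustic models must have been trained on the same features. Work on copies of the files, since the edit is made in place.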