Speaker Verification using Support Vector Machines

Investigator: Vincent Wan Supervisor: Steve Renals

Current state-of-the-art speaker verification systems are based on discriminatively trained generative models. In these systems, discrimination is achieved at the frame level: each frame of data is scored separately and the scores are then combined to compute the score for the whole utterance. A better approach is to apply discrimination to the sequence as a whole rather than to its constituent parts. This type of discrimination may be achieved using the support vector machine (SVM).
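The frame-level scheme described above can be sketched as follows. This is a minimal illustration only: the single-Gaussian target and background models and the averaging rule are assumptions for the sketch, not the actual systems evaluated in the thesis.

```python
import math

def gauss_logpdf(x, mu, var):
    # Log-density of a 1-D Gaussian; real systems use multivariate GMMs.
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def utterance_score(frames, target, background):
    # Score each frame independently, then combine the per-frame
    # log-likelihood ratios by averaging over the whole utterance.
    llrs = [gauss_logpdf(x, *target) - gauss_logpdf(x, *background)
            for x in frames]
    return sum(llrs) / len(llrs)

frames = [0.1, 0.3, -0.2, 0.4]
score = utterance_score(frames, target=(0.2, 1.0), background=(0.0, 1.0))
# A positive score favours the target speaker; the decision is taken
# by comparing the combined score against a threshold.
```

Note that the classifier never sees the utterance as a single object; the sequence enters only through the combination of independent frame scores.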

In this thesis we develop the techniques required to make SVMs work well on speaker verification. The main focus of attention is on the kernel function. We investigate the polynomial kernel for classifying frames of data one at a time and sequence kernels to achieve sequence-level discrimination.

The polynomial kernel is less widely used than, for example, the radial basis function kernel. We examine the properties of the polynomial kernel in relation to a polynomial classifier. In doing so we develop a spherical normalisation technique that can improve the performance of the polynomial kernel. Spherical normalisation may be seen as a preconditioning step that is completely general and not restricted to polynomial kernels. We apply the technique to several sequence kernel functions and find that spherical normalisation benefits these kernels as well.
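The preconditioning idea can be sketched with a simplified variant: normalising the kernel in feature space projects every point onto the unit sphere of the induced space, so that k'(x, x) = 1 for all x. This is one standard form of kernel normalisation used here for illustration; the thesis's spherical normalisation is a related but distinct construction.

```python
import math

def poly_kernel(x, y, degree=3):
    # Inhomogeneous polynomial kernel (1 + x.y)^d.
    return (1.0 + sum(a * b for a, b in zip(x, y))) ** degree

def normalised_kernel(k, x, y):
    # Cosine-style normalisation: divide by the norms in feature space,
    # mapping every point onto the unit sphere of the induced space.
    return k(x, y) / math.sqrt(k(x, x) * k(y, y))

x, y = [1.0, 2.0], [0.5, -1.0]
value = normalised_kernel(poly_kernel, x, y)
```

Because the normalised kernel is itself a valid kernel, the same wrapper can be applied to any kernel function, including sequence kernels.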

The sequence kernels are derived from generative models. These kernels use a generative model to map a variable-length sequence to a fixed-length vector. By representing the entire sequence as a single vector, the SVM can discriminate between whole sequences directly. We study the pair hidden Markov model with conditional symmetric independence constraints for use as a kernel function, and the set of score-space kernels, which includes the Fisher kernel, for deriving non-linear transformations from a sequence to a fixed-length vector using any parametric generative model.
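The Fisher kernel's sequence-to-vector map can be sketched for a toy generative model: the score is the gradient of the sequence log-likelihood with respect to the model parameters, which has a fixed dimension regardless of sequence length. The 1-D Gaussian model below is an illustrative assumption; score-space kernels in practice use GMMs or HMMs, and often whiten the scores by the inverse Fisher information.

```python
def fisher_score(frames, mu, var):
    # Gradient of the sequence log-likelihood under a 1-D Gaussian
    # with respect to (mu, var): a fixed 2-D vector for any length.
    d_mu = sum((x - mu) / var for x in frames)
    d_var = sum(-0.5 / var + (x - mu) ** 2 / (2 * var ** 2) for x in frames)
    return [d_mu, d_var]

def linear_kernel(u, v):
    # The SVM then compares whole sequences via an inner product
    # of their score vectors.
    return sum(a * b for a, b in zip(u, v))

short = fisher_score([0.5, 1.5], mu=1.0, var=1.0)
longer = fisher_score([0.2, 0.8, 1.1, 1.9], mu=1.0, var=1.0)
# Both sequences, despite different lengths, map to 2-dimensional vectors.
```

Because the map is defined by differentiating the log-likelihood, it applies to any parametric generative model, which is what makes the score-space construction general.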

Experimentally, a support vector machine combined with a sequence kernel and spherical normalisation can outperform current state-of-the-art classifiers.