Ph.D. Opportunities in SpandH

These paragraphs outline some of the research topics in the group suitable for research students.

Modelling the visual Lombard effect

Supervisor: Jon Barker

In noisy environments, talkers subconsciously alter their speaking style to make their speech more easily heard against the background noise; this is known as the 'Lombard' effect. The acoustic differences between normal speech and Lombard speech have been studied in detail, but there has been surprisingly little study of the visual differences, i.e. changes in the pattern of lip, jaw and face movements. An understanding of the visual aspect of Lombard speech is necessary to inform the design of improved audio-visual automatic speech recognition (AV-ASR) systems.

The project will make use of the Speech and Hearing group's audio-visual speech recording facilities. A set of audio-visual Lombard speech recordings will be made, by asking subjects to read prompts while wearing headphones delivering noise at a variety of levels. Detailed 2-D visual information will be extracted by using artificial markers, and employing existing video-based marker tracking techniques. The data will be used to train noise-level dependent viseme models, which will be evaluated by incorporation into existing AV-ASR systems.
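
To make the idea of noise-level dependent viseme models concrete, the sketch below fits a separate diagonal Gaussian to 2-D lip features for each (viseme, noise level) pair and classifies a new frame by log-likelihood. The feature choice (lip width and height in pixels), the viseme labels and all data are invented for illustration; a real AV-ASR system would use richer features and temporal models.

```python
import math

def train_viseme_models(samples):
    """Fit a diagonal Gaussian per (viseme, noise level) from 2-D lip features.

    `samples` maps (viseme, noise_db) -> list of feature vectors, e.g.
    [lip_width, lip_height] in pixels. All names are illustrative.
    """
    models = {}
    for key, feats in samples.items():
        n = len(feats)
        dim = len(feats[0])
        mean = [sum(f[d] for f in feats) / n for d in range(dim)]
        # Floor the variance so a degenerate dimension cannot blow up scoring.
        var = [max(sum((f[d] - mean[d]) ** 2 for f in feats) / n, 1e-6)
               for d in range(dim)]
        models[key] = (mean, var)
    return models

def log_likelihood(model, feat):
    """Diagonal-Gaussian log-likelihood of one feature vector."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(feat, mean, var))

def classify(models, feat, noise_db):
    """Pick the most likely viseme given the features and the known noise level."""
    candidates = {k: m for k, m in models.items() if k[1] == noise_db}
    return max(candidates, key=lambda k: log_likelihood(candidates[k], feat))[0]
```

Conditioning the models on noise level, as here, is one simple way to let the recogniser exploit the Lombard-induced changes in articulation rather than being degraded by them.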

Clinical Applications of Speech Technology

Supervisor: Phil Green or Roger Moore

The projects in CAST have shown that Speech Technology can be used effectively in Speech Training and Assistive Technology. There are a number of possibilities for Ph.D. projects in the CAST area:

  • More effective visual feedback for speech and language therapy
  • Automatic judgement of speech intelligibility and consistency
  • Computer-based tools to help therapists diagnose speech disorders
  • e-inclusion applications

This work involves collaboration with clinicians and assistive technology specialists.

Information Access from Spoken Language

Supervisor: Yoshi Gotoh

We have a well-established research effort in the general area of accessing information from spoken language, particularly from broadcast speech. This research includes work in spoken document retrieval, automatic punctuation and the identification of named entities. We have focussed on the use of trainable, statistical models, typically finite state models (e.g. Hidden Markov Models, HMMs).
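
As a flavour of the finite state models mentioned above, the following is a minimal Viterbi decoder for a discrete HMM, of the kind used to tag words with named-entity labels. The states, probabilities and example below are invented for illustration; a real system would estimate them from annotated broadcast transcripts.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Viterbi decoding for a discrete HMM: the most likely tag sequence.

    `log_start`, `log_trans` and `log_emit` are dicts of log-probabilities;
    events absent from the dicts are treated as impossible.
    """
    NEG = float('-inf')
    # Initialise with start and emission scores for the first observation.
    V = [{s: log_start.get(s, NEG) + log_emit.get((s, obs[0]), NEG)
          for s in states}]
    back = []
    for t in range(1, len(obs)):
        col, ptr = {}, {}
        for s in states:
            # Best predecessor state for s at time t.
            best_prev, best = None, NEG
            for p in states:
                score = V[t - 1][p] + log_trans.get((p, s), NEG)
                if score > best:
                    best_prev, best = p, score
            col[s] = best + log_emit.get((s, obs[t]), NEG)
            ptr[s] = best_prev
        V.append(col)
        back.append(ptr)
    # Trace back from the best final state.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))
```

The appeal of such models for information access is that the same trainable machinery transfers across tasks: only the state inventory and the training data change between, say, named-entity identification and automatic punctuation.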

Future research will continue these themes. There is a wide range of potential PhD projects, which may focus on new models (statistical translation models, maximum entropy models), new tasks (summarization, question answering), or the incorporation of further acoustic information (prosody).

Projects of particular interest include:

  • Statistical machine translation algorithms for summarization;
  • Maximum entropy framework for the incorporation of prosodic features in information extraction and summarization;
  • Information Access from spontaneous speech (e.g. meetings);
  • Statistical approaches to question answering;
  • Information extraction for telephone conferencing.

Language Modelling

Supervisor: Yoshi Gotoh

Potential language modelling projects include:

  • Variable word rate and trigger models;
  • Language models based on latent semantic analysis;
  • Maximum entropy models.
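
As a baseline against which such projects are typically measured, the sketch below trains a standard bigram language model and scores sentences with simple linear interpolation between bigram and unigram estimates. The corpus, the interpolation weight and the sentence markers are illustrative; trigger, latent semantic analysis and maximum entropy models all aim to improve on exactly this kind of estimate.

```python
import math
from collections import Counter

def train_bigram_lm(corpus):
    """Collect unigram and bigram counts over tokenised sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        tokens = ['<s>'] + sent + ['</s>']
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def log_prob(sent, unigrams, bigrams, lam=0.8):
    """Interpolated bigram log-probability: lam * P(w|h) + (1 - lam) * P(w)."""
    total = sum(unigrams.values())
    tokens = ['<s>'] + sent + ['</s>']
    lp = 0.0
    for h, w in zip(tokens, tokens[1:]):
        # Maximum-likelihood bigram estimate, backed off by the unigram.
        p_bi = bigrams[(h, w)] / unigrams[h] if unigrams[h] else 0.0
        p_uni = unigrams[w] / total
        lp += math.log(lam * p_bi + (1 - lam) * p_uni)
    return lp
```

Trigger models, for instance, would replace the fixed history `h` with longer-range cues, while a maximum entropy model would combine such overlapping features in a single discriminatively trained distribution.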