Ph.D. Opportunities in SpandH

These paragraphs outline some of the research topics in the group suitable for research students.
Modelling the visual Lombard effect
Supervisor: Jon Barker
In noisy environments, talkers subconsciously alter their speaking style to make their speech more easily heard against the noise background - this is known as the 'Lombard' effect. The acoustic differences between normal speech and Lombard speech have been studied in detail, but there has been surprisingly little study of the visual differences, i.e. changes in the patterns of lip, jaw and face movement. An understanding of the visual aspect of Lombard speech is necessary to inform the design of improved audio-visual automatic speech recognition (AV-ASR) systems.
The project will make use of the Speech and Hearing group's audio-visual speech recording facilities. A set of audio-visual Lombard speech recordings will be made, by asking subjects to read prompts while wearing headphones delivering noise at a variety of levels. Detailed 2-D visual information will be extracted by using artificial markers, and employing existing video-based marker tracking techniques. The data will be used to train noise-level dependent viseme models, which will be evaluated by incorporation into existing AV-ASR systems.
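As background, the idea of noise-level dependent viseme models can be sketched very simply: fit a separate statistical model of the visual features for each (viseme, noise condition) pair, then recognise by selecting the model that best fits an observed feature vector under the known noise level. The sketch below uses toy 2-D marker features and diagonal Gaussians; all data, viseme labels and noise conditions are hypothetical, not taken from the project.

```python
import numpy as np

# Toy sketch (all data and names hypothetical): model each viseme's
# lip-marker features with a separate diagonal Gaussian per noise
# condition, so recognition can select the model matching the
# current noise level.

rng = np.random.default_rng(0)

def fit_gaussian(samples):
    """Fit a diagonal-covariance Gaussian to (n, d) feature vectors."""
    mu = samples.mean(axis=0)
    var = samples.var(axis=0) + 1e-6  # variance floor avoids zeros
    return mu, var

def log_likelihood(x, mu, var):
    """Log-density of x under a diagonal Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

# Simulated 2-D marker features (e.g. lip width, jaw opening) for two
# visemes under two noise levels; Lombard speech tends to exaggerate
# the articulation, shifting the 'noisy' distributions.
data = {
    ("aa", "quiet"): rng.normal([1.0, 2.0], 0.1, (50, 2)),
    ("aa", "noisy"): rng.normal([1.4, 2.6], 0.1, (50, 2)),
    ("m",  "quiet"): rng.normal([0.3, 0.2], 0.1, (50, 2)),
    ("m",  "noisy"): rng.normal([0.4, 0.3], 0.1, (50, 2)),
}
models = {key: fit_gaussian(x) for key, x in data.items()}

def classify(x, noise_level):
    """Pick the viseme whose model for this noise level fits best."""
    candidates = [k for k in models if k[1] == noise_level]
    return max(candidates, key=lambda k: log_likelihood(x, *models[k]))[0]

print(classify(np.array([1.45, 2.55]), "noisy"))  # → aa
```

A real system would of course use sequence models over many visual features rather than a per-frame Gaussian classifier, but the condition-dependent model selection illustrated here is the core idea.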
Multitask Learning in Speech Technology
Supervisor: Phil Green
People do not learn to recognise speech in isolation: at the same time they are learning to recognise the speaker, the speaker's mood, the speaker's intention, and so on. They are also learning to deal with corruption in the acoustics due to other sound sources. These tasks are mutually beneficial - knowing who is speaking, for instance, makes it easier to understand the speech. Multitask learning techniques attempt to replicate this by defining an internal representation which is shared between tasks. Work in SPandH has shown good results for recognition in noise where the additional task is speech enhancement.
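The shared-representation idea can be sketched in a few lines: a hidden layer feeds two task heads, one for recognition (classification) and one for enhancement (regression), and training minimises a joint objective so that both tasks shape the shared layer. The network sizes, weights and data below are purely illustrative, not the configuration used in the group's work.

```python
import numpy as np

# Minimal multitask sketch (hypothetical shapes and data): one shared
# hidden layer feeds a phone-recognition head and a speech-enhancement
# head; a weighted sum of the two task losses forms the joint objective.

rng = np.random.default_rng(1)
n_in, n_hid, n_phones = 13, 32, 5   # e.g. an MFCC frame in, phone classes out

W_shared = rng.normal(0, 0.1, (n_in, n_hid))
W_recog = rng.normal(0, 0.1, (n_hid, n_phones))
W_enh = rng.normal(0, 0.1, (n_hid, n_in))  # predicts the clean features

def forward(x):
    h = np.tanh(x @ W_shared)             # shared internal representation
    logits = h @ W_recog                  # recognition head
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                  # softmax over phone classes
    clean_est = h @ W_enh                 # enhancement head
    return probs, clean_est

def multitask_loss(x, phone_id, clean_target, lam=0.5):
    probs, clean_est = forward(x)
    ce = -np.log(probs[phone_id] + 1e-12)          # recognition loss
    mse = np.mean((clean_est - clean_target) ** 2)  # enhancement loss
    return ce + lam * mse                           # joint objective

noisy_frame = rng.normal(0, 1, n_in)
clean_frame = noisy_frame - rng.normal(0, 0.1, n_in)
loss = multitask_loss(noisy_frame, phone_id=2, clean_target=clean_frame)
print(round(loss, 3))
```

Backpropagating this joint loss updates `W_shared` with gradients from both heads, which is what lets the enhancement task regularise the representation used for recognition.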
Clinical Applications of Speech Technology
Supervisor: Phil Green or Roger Moore
The projects in CAST have shown that Speech Technology can be used effectively in Speech Training and Assistive Technology. There are a number of possibilities for Ph.D. projects in the CAST area:
Information Access from Spoken Language
Supervisor: Yoshi Gotoh
We have a well-established research effort in the general area of accessing information from spoken language, particularly from broadcast speech. This research has included work on spoken document retrieval, automatic punctuation and identification of named entities. We have focussed on the use of trainable, statistical models, typically finite state models (e.g. hidden Markov models, HMMs).
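As an illustration of the finite-state approach, named entity identification can be cast as HMM tagging: states are entity labels, words are observations, and the Viterbi algorithm recovers the most likely label sequence. The tag set, vocabulary and hand-set probabilities below are toy assumptions for the sketch, not the models used in the group's research.

```python
import numpy as np

# Toy HMM named-entity tagger (hand-set probabilities, hypothetical
# tags and words), decoded with the Viterbi algorithm.

states = ["O", "PER"]            # outside vs. person-name tokens
words = ["said", "mr", "smith"]
start = np.log([0.8, 0.2])       # P(first state)
trans = np.log([[0.7, 0.3],      # P(next state | O)
                [0.4, 0.6]])     # P(next state | PER)
emit = np.log([[0.8, 0.1, 0.1],  # P(word | O)
               [0.1, 0.4, 0.5]]) # P(word | PER)

def viterbi(obs):
    """Most likely state sequence for a list of word indices."""
    T, N = len(obs), len(states)
    delta = np.full((T, N), -np.inf)    # best log-prob ending in each state
    back = np.zeros((T, N), dtype=int)  # backpointers
    delta[0] = start + emit[:, obs[0]]
    for t in range(1, T):
        for j in range(N):
            scores = delta[t - 1] + trans[:, j]
            back[t, j] = np.argmax(scores)
            delta[t, j] = scores[back[t, j]] + emit[j, obs[t]]
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(back[t, path[-1]])
    return [states[s] for s in reversed(path)]

sentence = [words.index(w) for w in ["said", "mr", "smith"]]
print(viterbi(sentence))  # → ['O', 'PER', 'PER']
```

Real taggers of this kind condition on richer features (word identity, capitalisation, word classes) and are trained from annotated broadcast transcripts, but the decoding machinery is exactly this.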
Future research will continue these themes. There is a wide range of potential PhD projects, which may focus on new models (statistical translation models, maximum entropy models), new tasks (summarization, question answering), or the incorporation of further acoustic information (prosody).
Projects of particular interest include:
Language Modelling
Supervisor: Yoshi Gotoh
Potential language modelling projects include: