11 November 2009
Matthew Robertson
Speech and Hearing Group, Department of Computer Science, University of Sheffield, UK
Modelling the performance of hearing impaired listeners using an auditory model and speech in noise test
Hearing impairment affects almost nine million people in the UK. Although hearing aids have been available for some time, many go unused. People cite many reasons for not using them, but a common one is a lack of benefit in fluctuating noise environments, such as bars or restaurants. This project aims to use a model of the human auditory periphery as a front end for an automatic speech recognition (ASR) system, allowing us to model the performance of human listeners on a speech-in-noise task. The auditory model could then be "broken" to match a specific hearing-impaired listener, and used to tune hearing aids to better suit that individual's impairment.
One reason why hearing-impaired listeners may report problems listening to speech in modulated noise is a reduced ability to take advantage of dips in the background noise. Normal-hearing listeners can use these dips to improve speech recognition in fluctuating noise backgrounds. A possible explanation for this lack of "glimpsing" is a reduced ability to use temporal fine structure (TFS) information. TFS comprises the rapid oscillations within a signal, as distinct from the slower fluctuations of its envelope, and has been shown to be of benefit when listening in fluctuating noise. We are currently using TFS information to generate missing-data masks for use with the ASR system. We will report studies which use these masks for recognition, as well as attempts to model the effect of hearing impairment on the availability of glimpses.
Slides (ppt),
Listen Again (mp3)
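The missing-data masks mentioned in the abstract can be illustrated with a minimal sketch: a binary mask marks the spectro-temporal regions ("glimpses") where the local signal-to-noise ratio exceeds a threshold. The 0 dB criterion and the toy power values below are illustrative assumptions, not the project's actual parameters.

```python
import numpy as np

def glimpse_mask(speech_power, noise_power, threshold_db=0.0):
    """Binary missing-data mask: 1 where local SNR exceeds threshold_db.

    speech_power, noise_power: arrays of local power (frequency x time),
    e.g. from a gammatone filterbank or STFT analysis.
    """
    eps = 1e-12  # guard against division by zero / log of zero
    local_snr_db = 10.0 * np.log10((speech_power + eps) / (noise_power + eps))
    return (local_snr_db > threshold_db).astype(int)

# Toy example: 2 frequency channels x 3 time frames
speech = np.array([[4.0, 1.0, 9.0],
                   [0.5, 2.0, 0.1]])
noise = np.array([[1.0, 1.0, 1.0],
                  [1.0, 1.0, 1.0]])
mask = glimpse_mask(speech, noise)
# Glimpses occur only where speech power exceeds the noise power
```

In a missing-data recogniser, regions where the mask is 1 are treated as reliable evidence and the rest are marginalised or imputed.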
5 November 2009
Francisco Lacerda
Phonetics Lab, Department of Linguistics, Stockholm University
An ecological model of early language acquisition
The early stages of the infant's language acquisition process will be discussed in terms of a model of lexical emergence based on the acoustic characteristics of infant-directed speech and the ecological settings of typical adult-infant interaction. The model's initial assumption is that the potential amount of sensory information available to the infant in its ecological setting is so vast that the probability of random repetition of a given pattern is vanishingly small, given an underlying uniform random distribution. The model proposes that the infant's linguistic referential function initially emerges from the co-occurrence of a very limited number of repetitions of an acoustic pattern along with other correlated sensory information. Further lexical and grammatical categories can, at least to some extent, be derived by recursive application of essentially the same principle.
Slides (pdf)
21 October 2009
Steven Greenberg
Silicon Speech, Santa Venetia, CA, USA
Time Perspective in Spoken Language Processing
This presentation discusses how the "hidden" dimension of time may help decode spoken language in the brain. A perceptual study, using time-compressed speech and silent gaps, is used to illustrate how the concept of "decoding time" provides potential insight into the neural mechanisms involved in going from sound to meaning. Hierarchical Oscillatory Networks could be used to model the extremely complex process of speech decoding (and ultimately comprehension).
Slides (ppt),
Sounds and Figures (zip)
21 October 2009
Maurizio Filippone
Department of Computer Science, University of Sheffield, UK
The Probabilistic Approach in Data Modeling
In this short tutorial, we will review some basic concepts about probabilistic methods for data modeling. The data modeling process aims at building a description of some observable quantities of a system. The final goal of data modeling is to gain some understanding of the system under study and/or to predict future observations. Probabilistic methods provide a parametric description of the system, in the sense that the observations and all (or some) of the parameters are treated as random variables. Therefore, they are assigned probability distributions and can be analyzed by means of statistical tools. Such a framework provides a principled way of dealing with many crucial tasks in data modeling, such as model selection, prediction with confidence intervals, and statistical testing, among others. We will briefly discuss the advantages of this approach, along with its computational cost, which represents one of the main issues when dealing with very complex systems. Finally, we will show a simple application of a probabilistic method to a speech analysis problem.
Slides (pdf),
Listen Again (mp3)
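As a concrete illustration of the ideas in the abstract (not the example used in the talk), here is a minimal conjugate Bayesian sketch: observations are modelled as Gaussian with known noise variance, a Gaussian prior is placed on the unknown mean, and the resulting posterior yields both a point estimate and a credible interval. The data and variances are made up for illustration.

```python
import math

def gaussian_posterior(data, noise_var, prior_mean=0.0, prior_var=100.0):
    """Posterior over the mean of a Gaussian with known noise variance.

    Conjugate update: Gaussian prior x Gaussian likelihood -> Gaussian posterior.
    """
    n = len(data)
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + sum(data) / noise_var)
    return post_mean, post_var

data = [2.1, 1.9, 2.3, 2.0, 1.7]
post_mean, post_var = gaussian_posterior(data, noise_var=0.25)

# 95% credible interval for the unknown mean
half_width = 1.96 * math.sqrt(post_var)
interval = (post_mean - half_width, post_mean + half_width)
```

The same machinery scales up to the model-selection and interval-prediction tasks mentioned above, though for richer models the posterior is rarely available in closed form, which is where the computational cost arises.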
14 October 2009
Jim Hieronymus
NASA Ames Research Center, Mountain View, CA
Exploiting Chinese Character Models to Improve Speech Recognition Performance
The Chinese language is based on characters which are syllabic in nature. Since languages have syllabotactic rules which govern the construction of syllables and their allowed sequences, Chinese character sequence models can be used as a first-level approximation of allowed syllable sequences. N-gram character sequence models were trained on 4.3 billion characters. Characters are used as a first-level recognition unit with multiple pronunciations per character. For comparison, the CUHTK Mandarin word-based system was used to recognize words, which were then converted to character sequences. The character-only system's one-best error rates were slightly worse than those of the word-based system. However, combining the two systems using log-linear combination gives better results than either system separately: an equally weighted combination gave consistent absolute character error rate (CER) reductions of 0.1-0.2% over the word-based standard system.
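The log-linear combination described above can be sketched as follows. The hypothesis scores are invented for illustration; in practice they would be lattice or N-best log-probabilities from the character- and word-based systems.

```python
def log_linear_combine(scores_a, scores_b, weight=0.5):
    """Combine two systems' hypothesis log-scores log-linearly.

    combined = weight * log P_a + (1 - weight) * log P_b,
    computed per shared hypothesis; the best hypothesis is the argmax.
    """
    shared = scores_a.keys() & scores_b.keys()
    combined = {h: weight * scores_a[h] + (1.0 - weight) * scores_b[h]
                for h in shared}
    return max(combined, key=combined.get), combined

# Hypothetical character-sequence hypotheses with log-probabilities
char_system = {"你好": -1.2, "尼好": -0.9}
word_system = {"你好": -0.8, "尼好": -1.5}
best, combined = log_linear_combine(char_system, word_system)  # equal weights
```

With equal weights, a hypothesis must score well under both models to win, which is how the combination can outperform either system alone.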
14 October 2009
Jim Hieronymus
NASA Ames Research Center, Mountain View, CA
Spoken Dialogue Systems for Space and Lunar Exploration
Building spoken dialogue systems for space applications requires systems which are flexible, portable to new applications, robust to noise, and able to discriminate between speech intended for the system and conversations with other astronauts and systems. Our systems are built to be flexible by using general typed unification grammars for the language models, which can be specialized using example data. These are designed so that most sensible ways of expressing a request are correctly recognized semantically. The language models are tuned with extensive user feedback and data, if available. The International Space Station and the EVA suits are noisy (76 and 70 dB SPL respectively). This noise is best minimized by using active noise-canceling microphones, which permit accurate speech recognition. Finally, open-microphone speech recognition is important for hands-free, always-available operation. The EVITA system has been tested in prototype lunar space suits in the Arizona desert. Using an active noise-canceling head-mounted microphone in a pressurized suit, the lowest word error rate was 3.6% for a 2500-word vocabulary. A short clip of the surface suit spoken dialogue system being used in a field test will be shown.
8 April 2009
Jaydip Ray
Consultant ENT Surgeon, Sheffield Teaching Hospitals & Sheffield Children's Hospital, Sheffield, UK
Modern otology and its links with the scientific community
Otology is a rapidly advancing and expanding super-speciality within ENT surgery which screens and treats patients of all age groups, from newborns to the elderly. A significant proportion of its caseload consists of medical conditions such as hearing and balance disorders or allergies, whilst the remainder is purely surgical. The speciality has embraced modern technology (fibreoptics, lasers, electronics, computer science, speech recognition, speech processing, radioisotopes, imaging) and other biosciences like genetics and immunology to move ahead. It also uses various implanted devices for cosmetic and functional rehabilitation of structure and function.
One of the best examples of this harmonious working of medicine, surgery and computer science is in ongoing developments in implanted hearing devices, such as cochlear implants, and in image-guided surgery.
This talk aims to provide a broad overview of the work we do and to explore areas of common interest.
29 January 2009
Mark Wibrow
University of Sheffield, UK
Acoustic Cues for Sarcasm? How Interesting
Just under 10% of human utterances are ironic. Irony is a type of figurative language which may pose problems for machine dialog systems, as the intended meaning of an ironic utterance is not the same as its literal meaning.
This talk will focus on a particular type of irony: sarcasm. English speakers have an intuitive idea of a sarcastic tone of voice, yet analyses of sarcasm stemming from pragmatics, social psychology and neuropsychology, whilst acknowledging the use of this tone in speech, focus on the use of other cues to recognise the conflict between intended and literal meanings, which are integrated under the influence of a "theory of mind" (the ability to infer the attitudes and beliefs of others).
Some recent (and not so recent) research on English sarcasm will be discussed, in order to establish whether humans can and do use a sarcastic tone of voice to supplement the identification of sarcastic utterances, and whether this tone has systematic acoustic properties which could be exploited in machine dialog systems, in both spoken language comprehension and production.
Slides (pdf)