Rogier van Dalen is a Research Associate in Speech Recognition at the University of Cambridge, where he also did his PhD. His research interest is applying machine learning techniques to speech recognition. Host: charles.fox@sheffield.ac.uk

Max Little

MIT

The Parkinson's Voice Initiative

Neurological disorders such as Parkinson's destroy the ability to move; over 6 million people worldwide have the disease, but there is no cure. Until we have a cure, and indeed in order to find one, we need objective tests. Unfortunately, there are no biomarkers (e.g. blood tests). Current objective symptom tests for Parkinson's are expensive, time-consuming, and logistically difficult, so they are mostly not done outside trials. What is exciting, though, is that voice is affected as much by Parkinson's as limb movements, so we have developed the technology to test for symptoms using voice recordings alone. This could enable some radical breakthroughs, because voice-based tests are as accurate as clinical tests, but additionally they can be administered remotely, and patients can do the tests themselves. They are also fast (taking less than 30 seconds) and ultra low cost (they don't involve expert staff time), so they are massively scalable. Host: charles.fox@sheffield.ac.uk

Richard Smith

Smith Watkins Ltd

Good Vibrations! The Physics of Brass Instrument Design

A talk with demonstrations showing how scientific methods can help the design of musical instruments and at the same time demolish some of the myths perpetuated by musicians.

Of particular interest to speech researchers working on vocal tract and oral cavity modelling, who would like to see how their assumptions and models compare to those used in instrument design. Richard is currently working on ideas about how instrument acoustics interact with the oral cavity, and how ultra-high notes can be produced. His company designed the custom instruments used at the recent Royal Wedding and Royal Jubilee.

More details at http://www.smithwatkins.com/ (see the "library" section for publications and scientific overview articles).

'Unconventional, maybe; eccentric, perhaps; but then few scientists in their field can claim to have charted new territories of knowledge like Richard Smith.'--Yorkshire Post

Richard Smith wrote a doctoral thesis on trumpet acoustics before joining Boosey and Hawkes, where he worked for 12 years as chief designer and technical manager responsible for the world famous Besson brass range, including the original trumpets used by Derek Watkins and John Wallace, trombones for Roy Williams and Don Lusher, and the cornets used by most brass and military bands. Richard’s research work into acoustics, testing and development of brass instruments has been widely publicised in the scientific literature and on TV and radio, and he has travelled in Europe, the United States and Japan, testing instruments with top professional symphonic and session players, and presenting papers at international conferences on acoustics and instrument design. In 2000, Richard's cornet 'The Soloist' was awarded Millennium Product Status by the U.K. Design Council, recognising its enduring place among the best of British design, creativity and innovation, as 'a brass cornet with a unique system of interchangeable leadpipes, providing several instruments in one body that match changing playing conditions and genres.' These awards were granted to only 1,000 British products and services, deemed to be challenging existing conventions and solving key problems in an environmentally and ethically sound manner, as judged by a panel of judges drawn from design, business, science and the arts. In 2008, Richard was made an Honorary Fellow of the College of Science and Engineering (University of Edinburgh), in recognition of his collaborative work within the School of Physics on the measurement and understanding of the acoustics of brass instruments. He continues to maintain a close association with Edinburgh University, and is furthering the training of the next generation of British brass instrument designers and makers through a series of apprenticeship schemes. 
Richard moved to North Yorkshire in 2005 and in 2010, he celebrated, with Derek, 25 years of designing and building specialist brass instruments. Host: charles.fox@sheffield.ac.uk

Dan Stowell

Queen Mary, University of London

Tracking multiple intermittent sources in noise: inferring a mixture of Markov renewal processes

Consider the sound of birdsong, or footsteps. They are intermittent sounds, having as much structure in the gaps between events as in the events themselves. And often there's more than one bird, or more than one person - so the sound is a mixture of intermittent sources. Standard tracking techniques (e.g. Markov models, autoregressive models) are a poor fit to such situations. We describe a simple signal model (the Markov renewal process (MRP)) for these intermittent data, and introduce a novel inference technique that can infer the presence of multiple MRPs even in heavy noise. We illustrate the technique via a simulation of auditory streaming phenomena, and an experiment to track a mixture of singing birds. Host: charles.fox@sheffield.ac.uk

Steve Renals

University of Edinburgh

(Deep) neural nets in speech recognition

In this talk I'll present some of our recent work in using deep neural networks (DNNs) for speech recognition. Amongst other things the talk will include:

- a discussion of the similarities and differences between the recently discovered deep neural network approaches, and the neural network approaches used for speech recognition in the 80s, 90s, and 00s;

- MLAN, an approach to incorporate out-of-domain data using posterior features;

- supervised and unsupervised ways to make use of multilingual acoustic training data;

- comparison of tandem (DNN outputs used as features) and hybrid (DNN outputs used directly as probability estimates) approaches, and their combinations.

The talk will include results of experiments on Globalphone, BBC broadcasts, and TED talks.

This is joint work with Peter Bell, Arnab Ghoshal, and Pawel Swietojanski. Host: charles.fox@sheffield.ac.uk

Oscar Saz

Carnegie Mellon University

Speech recognition and evaluation in the presence of severe phonological errors

In this talk, I will address some of the work I carried out on the recognition and evaluation of a corpus of speech from children with cognitive disorders. These speakers present such heavy phonological errors in their speech, due to learning delays, that their lexicons are completely different from the normal pronunciation of words. I will address how to automatically learn new dictionaries to improve recognition rates for these speakers, and different methods to detect these pronunciation errors, with the goal of developing computer-assisted speech therapy tools. Finally, I will make a comparison with more recent work on language learning tools for non-native speakers and how, in this case, it might be more necessary to focus on the comprehension level than on the phonological level. Host: charles.fox@sheffield.ac.uk

Marcelo Rivolta

BMS, Sheffield University

Repairing the ear with stem cells

The presentation will discuss recent advances using stem cells in the search for a treatment for hearing loss. A method has been developed to generate ear sensory cells from human embryonic stem cells (hESCs). By exposing hESCs in a dish to the chemical signals that induce the formation of the ear in vivo, we have generated ear stem cells that can produce sensory hair cell-like cells and auditory neurons. We have taken the hESC-derived ear stem cells and explored whether they could repair a deaf ear in a gerbil model of auditory neuropathy (that is, when the cochlear nerve is damaged). When hESC-derived ear cells were transplanted into cochleae that had lost their auditory neurons, the cells survived, engrafted and differentiated. Moreover, they sent projections making connections with the hair cells and with the brain. More remarkably still, they elicited a functional recovery. When the ear of the transplanted animals was stimulated with sound, we could record brain activity using a test called Auditory Brainstem-evoked Responses (ABR). The significance of this work and future steps will also be reviewed. Host: charles.fox@sheffield.ac.uk

Chris Mitchell

Audio Analytic Ltd

Sound Recognition in Physical Security Applications

Audio Analytic is a start-up company based in Cambridge, UK, that primarily sells sound recognition software into the physical security marketplace: Aggression, Gunshot, Car Alarm and Glass Break detection. Applying sound recognition techniques within the physical security marketplace has a number of distinct technological, process and system-level challenges. These challenges, although specific to the discipline of machine listening, are experienced in the general sense when applying a new concept to a marketplace. The talk, presented by Dr. Christopher Mitchell, CEO & Founder of Audio Analytic, will discuss the company's experiences in applying cutting-edge research in sound recognition to industrial applications, the lessons learnt, and how to effectively and successfully transfer lab technology to industrial use. Host: charles.fox@sheffield.ac.uk

Department of Computer Science

SpandH Seminar Abstracts

Ke Chen

Extracting Speaker Specific Information with a Deep Neural Architecture

Keiichi Tokuda

Flexible speech synthesis in karaoke, anime, smart phones, video games, digital signage, TV and radio programs, etc.

Stephen Cox

Read my Lips: Reflections on nearly Ten Years of Research at the University of East Anglia in Automatic Lip Reading

Stuart Green

Opportunities for applied speech technologies in film and broadcast

Raymond Ng and Erfan Loweimi

The USFD Systems for the IWSLT 2014 Evaluation (Ng)

Phase information in speech recognition (Loweimi)

Tobias May

A monaural cocktail-party processor: Speech segregation in background noise

Michael I Mandel

Detailed models for understanding speech in noise

Oscar Saz, Charles Fox and Heidi Christensen

Natural Speech Technology

Tim Jurgens

Auditory models for better rehabilitative devices

Patrick Naylor

Acoustic Signal Processing and Applications to Speech Dereverberation

Jeff Adams

Speech & NLP at Amazon: Unique Challenges, Unique Resources

Angela Josupeit

Modeling of Speech Localization in a Multitalker Environment using Binaural and Harmonic Cues

Pete Howell

Screening school-aged children for risk of stuttering and other speech disorders

Alexa Wright

Conversation Piece: Speech technology in art

Rogier van Dalen

Efficient segmental features for speech recognition