Workshop Programme

This programme may be subject to minor changes. The detailed programme follows below.

9:00 Welcome
9:10 Keynote 1: Michiel Bacchiani (Google)
10:00 Break
10:20 Overview of the 4th CHiME Challenge
10:50 Poster session 1
12:20 Lunch
13:30 Keynote 2: Björn Schuller (University of Passau)
14:20 Oral session
15:40 Break
16:00 Poster session 2
17:30 Closing

Detailed Programme


  • The 4th International Workshop on Speech Processing in Everyday Environments
    Emmanuel Vincent (Inria, France), Shinji Watanabe (Mitsubishi Electric Research Labs, USA), Jon Barker and Ricard Marxer (University of Sheffield, UK)

Keynote 1

Session Chair: Ralf Schlüter (RWTH Aachen University, Germany)
  • Google speech processing from mobile to farfield [View Slides] [View Video]
    Michiel Bacchiani (Google)
    Abstract Recent years have shown a large-scale adoption of speech recognition by the public, in particular around mobile devices. Google, with its Android operating system, has integrated speech recognition as a key input modality. The century of speech that our systems process each day shows how popular speech processing has become. This talk will briefly describe some of the history and highlight some of the technical challenges we faced.
    More recently, home farfield devices, as popularized by Amazon Echo, have resulted in a major research emphasis on speech processing in such conditions. This talk will describe the Google research effort that underpins the upcoming Google Home devices. It will describe how our neural network technology is capable of processing multi-channel data and implicitly learns how to localize and beamform the incoming signal. We show three distinct approaches to implementing this. The first uses factored raw waveform processing in the input layers. The second uses processing of the complex FFT signal in the input layer. The third uses an adaptive filtering approach.

Overview of the 4th CHiME Challenge

  • Datasets, tasks, baselines and results [View Slides] [View Video]
    Emmanuel Vincent (Inria, France), Shinji Watanabe (Mitsubishi Electric Research Labs, USA), Jon Barker and Ricard Marxer (University of Sheffield, UK)

Poster session 1

Session Chair: Athanasios Mouchtaris (FORTH-ICS and University of Crete, Greece)

Keynote 2

Session Chair: Rahim Saeidi (Aalto University, Finland)
  • Computational paralinguistics in everyday environments [View Slides]
    Björn Schuller (University of Passau)
    Abstract An increasingly long list of states and traits of speakers is being targeted for automatic recognition by computers, including their age, emotion, health condition, or personality. However, hardly any of these have been encountered in “everyday” usage by the broad consumer mass up to now. This is certainly also owed to robustness issues, which shall be discussed here. Traditionally, these comprise speech enhancement, feature enhancement, feature space adaptation, or matched conditions training – mainly to cope with additive or convolutional noise. In addition, a number of further robustness issues mark this field of speech analysis, including interdependence of states and traits, potential subjectivity in the labels, phonetic content variation in the acoustic analysis, varying language and erroneous speech recognition in the linguistic analysis, and diversity of the cultural background of speakers. Finally, a number of hardly tackled issues remain, such as the analysis of multiple speakers or analysis in far-field conditions with multiple microphones. In the talk, an overview of these challenges and existing solutions is given. Then, required future research efforts will be named to help Computational Paralinguistics’ massive launch into next-generation dialogue systems and many other applications.

Oral session

Session Chair: Marc Delcroix (NTT, Japan)

Poster session 2

Session Chair: Xiong Xiao (Nanyang Technological University, Singapore)