AMIDA: Augmented Multi-party Interaction with Distance Access

AMIDA is a new European Commission-funded project that continues the research begun under AMI.

AMIDA will develop and expand this research vision, aiming to better understand, and build new support for, human communication at a distance.

The ground-breaking research that we shall undertake in AMIDA will span several traditionally separate disciplines, including:

  • Qualitative human analysis and human factors;
  • Audio-video processing, including unconstrained speech recognition and natural scene analysis;
  • Multimodal structure and content analysis, including the modelling of individuals and groups, through the joint processing of multiple (multimodal) information channels (audio, visual, slides, handwriting, and white board activity);
  • HCI, application prototyping, evaluation, and system integration.

The AMIDA research will build directly upon the recognised achievements and large multimodal corpora resulting from AMI, which are becoming a standard reference in the area of multimodal processing.

AMIDA represents a very challenging shift in emphasis from meeting recordings to live meetings with remote participants, using affordable commodity sensors (such as webcams and inexpensive microphones), and targeting the development of advanced videoconferencing systems featuring new functionalities such as (1) filtering, searching and browsing; (2) remote monitoring; (3) interactive accelerated playback; (4) meeting support; and (5) shared context and presence.

While addressing additional scientific challenges (such as real-time processing and the processing of lower-quality audio and visual signals), AMIDA also increases the technology-transfer potential through genuine integration of the AMIDA industrial partners, who collaborate on common prototypes and applications. Finally, through its Community of Interest, the AMI Consortium will also actively engage beyond the consortium members to spread awareness and knowledge among vendors, futurists and other research laboratories in the field.

A major output of WP4, and the project as a whole, is the development of a state-of-the-art online, real-time automatic speech recogniser. This technology has been further harnessed to produce the world's first fully functioning web-based automatic speech recogniser (webASR). I wrote and maintain all aspects of the web interface. It is free and easy to use: simply upload an audio file and, after a short processing period, the transcript will be available in a range of formats including PDF.
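The upload-then-transcribe workflow could in principle be scripted. The sketch below is purely illustrative: webASR's actual endpoints and field names are not documented here, so the URL and field names are hypothetical assumptions, and nothing is actually submitted.

```python
# Illustrative sketch only: the endpoint URL and field names below are
# hypothetical assumptions, NOT the real webASR API.
import mimetypes
import os

UPLOAD_URL = "https://example.org/webasr/upload"  # hypothetical endpoint


def build_submission(audio_path, output_format="pdf"):
    """Assemble the metadata a multipart upload to a webASR-like service
    might need: file name, guessed MIME type, and desired transcript
    format (the service offers several, including PDF)."""
    mime, _ = mimetypes.guess_type(audio_path)
    return {
        "url": UPLOAD_URL,
        "filename": os.path.basename(audio_path),
        "mime_type": mime or "application/octet-stream",
        "format": output_format,  # e.g. "pdf" or "txt"
    }


if __name__ == "__main__":
    print(build_submission("meeting.wav"))
```

In practice one would POST this as a multipart form and poll for the finished transcript; those details depend on the service's real interface.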

Members from Sheffield include Thomas Hain, Phil Green, Roger Moore, Asmaa El Hannani, Vincent Wan and Stuart Wrigley.