Automatic Speech Summarization for Mobile Messaging

Research Student: Costis Koumpis
Supervisor: Steve Renals

As the emphasis in cellular networks shifts from voice-only communication to a rich combination of content-based applications and services, speech recognition can provide access to several types of information on portable devices, including mobile phones and personal digital assistants.

Voicemail is a common part of any office solution today. Sophisticated personal voicemail systems for the small office can be built using low-cost computers and modern voiceband modems. Although several advances in voicemail retrieval schemes have been reported, the limitations of the old paradigm remain: on receipt of a notification, users of voicemail systems have to call their answering machine and download or listen to their messages, whether in original or compressed form. To overcome these limitations, the following interdependent issues have to be resolved:

  • how to access the vast volume of background knowledge that is needed to interpret an arbitrary, even simple, spoken message
  • how to make it instantly and securely available to the recipient

This project deals with an alternative architecture for voicemail data retrieval on the move. It comprises three separate components: a speech recognizer, a text summarizer and a WAP Push Service initiator over SMS, enabling mobile users to receive text summaries of their voicemail in real time without an explicit request. The motivation lies in the fact that such systems can reduce delays in important decision-making, as they are suitable for capturing and distributing information quickly, regardless of one's location and without human intervention. Additionally, the system under development offers uninterrupted information flow in noisy places (crowded streets, train stations, airports) or in so-called 'mobile phone free' environments (conference/meeting rooms, hospitals), better message management (visual listing and indexing of messages) and lower cost of receiving calls while roaming abroad.
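
The three-component architecture can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the names `Word`, `recognize`, `summarize` and `push`, the per-word importance score, and the 160-character budget are hypothetical and are not taken from the project. The recognizer and the push initiator are left as placeholders; only the summarizer stage is filled in, as a simple selection of the highest-scoring words that fit the budget.

```python
from dataclasses import dataclass
from typing import List

SMS_LIMIT = 160  # assumed character budget for one summary (illustrative only)


@dataclass
class Word:
    text: str
    score: float  # hypothetical importance score, e.g. from a classifier


def recognize(audio: bytes) -> List[Word]:
    """Placeholder for the speech recognizer stage (hypothetical interface)."""
    raise NotImplementedError("plug in a real recognizer here")


def summarize(words: List[Word], limit: int = SMS_LIMIT) -> str:
    """Keep the highest-scoring words that fit in `limit` characters, in spoken order."""
    # Visit word positions from most to least important.
    ranked = sorted(range(len(words)), key=lambda i: words[i].score, reverse=True)
    kept, used = [], 0
    for i in ranked:
        cost = len(words[i].text) + (1 if kept else 0)  # +1 for a separating space
        if used + cost <= limit:
            kept.append(i)
            used += cost
    # Restore the original word order so the summary stays readable.
    return " ".join(words[i].text for i in sorted(kept))


def push(summary: str, recipient: str) -> dict:
    """Placeholder for the push initiator; it only builds a message record."""
    return {"to": recipient, "body": summary}
```

With a real recognizer in place, the stages would be chained as `push(summarize(recognize(audio)), recipient)`, so that a summary is delivered without the user asking for it.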

Our approach to automatic summarization of voicemail has been extended by relating the prosodic features present in a message to the content of that message. We use the Parcel feature subset selection algorithm to evaluate which of the many, often correlated, lexical and prosodic features are potentially optimal as classifier inputs for voicemail summarization. Parcel minimizes the management of classifier performance data, facilitates the comparison of a large number of classifiers, and allows clear visual comparisons and sensitivity analysis.
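
The kind of visual comparison mentioned above can be illustrated with a small sketch. It is not Parcel itself, whose internals are not described here. It assumes that each candidate classifier (for example, one trained on a particular feature subset) is summarized by a single operating point in ROC space, and it keeps only the points on the upper convex hull, since a point below the hull is outperformed by some combination of the others. The function names and the sample points in the usage note are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_upper_hull(points: List[Point]) -> List[Point]:
    """Return the upper convex hull of ROC points, from (0, 0) to (1, 1)."""
    # The trivial classifiers (always negative, always positive) are always available.
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull: List[Point] = []
    for p in pts:
        # Drop the last hull point while it lies on or below the segment hull[-2] -> p.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull
```

For instance, given the operating points `(0.1, 0.6)`, `(0.3, 0.5)` and `(0.4, 0.9)`, the second point falls below the segment joining the other two and is discarded, so only the classifiers behind the first and third points remain candidates.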

For further information, see the Speech Summarization using Prosodic Information project.