S3L: Statistical Summarization of Spoken Language

Funded by EPSRC (GR/R42405) from 15 December 2001 to 14 June 2005

Investigators: Steve Renals and Yoshi Gotoh
Research Associate: Heidi Christensen
Research Student: BalaKrishna Kolluru
Industrial Collaborators: BBC Research and Development Department; SoftSound


The main aim of the proposed research is the automatic summarization of broadcast speech. We plan to adopt a statistical approach to the problem, including the development of new models and algorithms for summarization, an investigation of the utility of prosodic features, and the construction and evaluation of demonstration systems.

This project is primarily concerned with developing methods for the non-extractive summarization of spoken language using trainable statistical models. Although rule-based approaches have had some success, they have tended to be domain-specific and typically require a large amount of effort to encode the domain knowledge as a template or script. Statistical methods have the potential to remove the bottleneck of manually encoding domain knowledge, and to increase the generality of summarization systems. Furthermore, we are specifically concerned with spoken language, which is more casual and less grammatical than text. We believe that statistical methods are well suited to this situation, particularly given the presence of speech recognition errors. Recent research in areas such as named entity identification has indicated that the relatively simple methods that have proved so successful in speech recognition may also be applied to more demanding language processing tasks. A key scientific question that this project will address is whether such simple models can be extended to still more complex tasks, such as summarization.

The main specific objectives are the development, implementation and evaluation of the following techniques for broadcast speech:

  1. Extractive summarization;
  2. Direct generative summarization using language model approaches;
  3. Content/style models for non-extractive summarization;
  4. Multi-document summarization;
  5. Incorporation of prosodic features using maximum entropy models.

A final objective is the construction of demonstration systems employing these techniques.
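
To make the first objective concrete, the following is a minimal sketch of extractive summarization: it scores each sentence of a transcript and returns the top-scoring ones in their original order. The scoring rule (mean TF-IDF weight, with each sentence treated as a document) is a common baseline chosen here purely for illustration and is an assumption, not the method developed in this project. The function names `tokenize` and `extractive_summary` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def extractive_summary(sentences, num_sentences=1):
    """Return the highest-scoring sentences, kept in their original order.

    Assumption for illustration: a sentence is scored by the mean TF-IDF
    weight of its tokens, with every sentence treated as one "document"
    when computing the IDF statistics.
    """
    tokenized = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Document frequency: number of sentences that contain each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))

    def score(tokens):
        if not tokens:
            return 0.0
        tf = Counter(tokens)
        total = sum(count * math.log(n / df[word]) for word, count in tf.items())
        return total / len(tokens)

    # Rank sentence indices by score, then restore document order.
    ranked = sorted(range(n), key=lambda i: score(tokenized[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return [sentences[i] for i in chosen]
```

Called on a list of sentence strings (for example, segmented speech recognition output), `extractive_summary(sentences, 2)` returns the two sentences whose words are least shared with the rest of the transcript. A sketch like this ignores recognition errors and prosody, which are exactly the issues the later objectives are meant to address.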