SWAG Overview

Summary

This working group will address points of common interest in the creation and dissemination of spoken-word digital audio archives. The final result will be a white paper describing the current state of the art in the field and recommending the areas in which future research should be concentrated to ensure progress. The working group was established through the cooperation of the European Union Network of Excellence for Digital Libraries (DELOS) and the United States National Science Foundation.

Introduction

Our diverse cultures rely increasingly on audio and video resources. We need to chart a steady course for the preservation of these resources and to determine the most effective ways to access the rich content embedded in them. For example, though our nations possess enormous collections of spoken-word materials, much of this material will decay or remain inaccessible to the public unless we act to establish a preservation and access path. Our aim is to forge agreement on topics vital to preservation and access so that, as technology changes, we will be able to rely on our collections to understand and preserve these vital components of our cultural heritage. We also need to focus research support on areas of preservation and access that we believe will yield the greatest benefits across many intersecting disciplines.

Much of our collective aural heritage from the 20th century, and most TV and radio broadcasts, remains in analog audio format. We confront significant issues in:

There have been a variety of projects, in Europe and the United States, concerned with collecting, indexing and searching audio collections. These include: digital archiving projects such as the Northwestern University digital audio archive for U.S. Supreme Court oral arguments and the EU-supported Euromedia project; spoken document retrieval projects such as the THISL and OLIVE projects in Europe and the Informedia project in the US; and evaluation programmes such as the TREC Spoken Document Retrieval track.

Building on this platform, it is now possible to discuss an agenda for future research, standardization and development of best practice in this field. To support this goal the European Network of Excellence for Digital Libraries (DELOS, supported by the European Union Information Society Technologies programme) and the US National Science Foundation (Digital Libraries programme) have collaborated in establishing a working group whose remit covers the creation and dissemination of spoken-word digital audio archives. This working group consists of 12 researchers who are actively working on different aspects of spoken-word digital audio collections. The field is at an intersection of several areas, and this is reflected in the diversity of the working group members, who come from computer science, political science, libraries and archives, broadcasting, engineering, language technology and history. The working group will have two meetings. The first took place at Northwestern University, Evanston IL, USA in June 2002. The second will take place in Europe, probably in September 2002, and there may be a third meeting in early 2003.

The main objective of the working group is the production of a white paper describing the current state of the art in Spoken Word Digital Audio Collections and recommending the areas in which future research should be concentrated to ensure progress.

Topics

The working group will focus on four principal areas: Storage and Standards; Structuring, Browsing and Search; Multilinguality; and Intellectual Property.

Storage and Standards

At its most basic level, the issue of storage and standards is concerned with best practice for capturing spoken-word audio now held in analog format (eg magnetic tape) and storing it in one or more digital formats. Properly recorded and stored under ideal conditions, magnetic tape should last for generations. However, the mechanical equipment needed to play back analog audio may disappear as manufacturers abandon the format in response to changes in market forces. How best to preserve this heritage in digital format? Specific audio storage issues include sample rate and compression. A faithful archive of digital audio should use high sample rates (eg the CD rate of 44.1 kHz) and lossless compression (eg the Shorten software) so that the original audio can be recovered exactly. However, this approach results in large storage requirements, and is impractical for transmission across most networks or for storage on portable devices. Lossy audio compression (eg MP3, RealAudio, Ogg Vorbis) has much lower storage and transmission requirements, but at the price of some (possibly imperceptible) loss of quality.
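
The storage trade-off above can be made concrete with a little arithmetic. The following is a minimal sketch, not a recommendation: it assumes uncompressed CD-format audio (44.1 kHz, 16-bit, stereo) and, purely for illustration, a lossy stream at 128 kbit/s. The function names and the 128 kbit/s figure are assumptions introduced here.

```python
def pcm_bytes_per_hour(sample_rate_hz=44_100, bits_per_sample=16, channels=2):
    """Bytes needed to store one hour of uncompressed PCM audio."""
    bytes_per_second = sample_rate_hz * (bits_per_sample // 8) * channels
    return bytes_per_second * 3600


def bitrate_bytes_per_hour(kbit_per_second):
    """Bytes needed to store one hour of audio encoded at a constant bitrate."""
    bytes_per_second = kbit_per_second * 1000 // 8
    return bytes_per_second * 3600


# Uncompressed CD-format audio versus an illustrative (assumed) 128 kbit/s
# lossy encoding of the same hour of material.
cd_hour = pcm_bytes_per_hour()            # 635_040_000 bytes
lossy_hour = bitrate_bytes_per_hour(128)  # 57_600_000 bytes

print(f"CD-format PCM : {cd_hour / 1e6:.1f} MB per hour")
print(f"128 kbit/s    : {lossy_hour / 1e6:.1f} MB per hour")
print(f"ratio         : {cd_hour / lossy_hour:.1f}x")
```

Lossless compression would fall somewhere between these two figures; the actual saving depends on the material and the codec, so no ratio is assumed here.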

To make this discussion concrete, total European holdings of audio recordings are estimated at 50 to 100 million hours, and 95% of this material (all the tape and vinyl) needs transfer or digitisation in order to be preserved or made accessible. A major issue in preservation is cost. Most archives do not understand their costs. A 'total cost of ownership' model is needed in order to make legitimate comparisons between various preservation strategies and outcomes (ie on-demand transfers vs bulk transfers; use of discrete 'carriers' eg CDs vs datatape and mass-storage robotics; creation of online access and electronic delivery; documentation to support online access and delivery). In order to discuss costs and benefits meaningfully, the value of archive material has to be assessed in terms of the population and purpose served by the archive.

An audio archive is of little use without some associated metadata. At its simplest this may include track identifiers, packaging and labelling information, and so on. More sophisticated metadata, which may be obtained automatically using rapidly developing computer speech and language technology, includes word-level transcriptions, speaker information, and content information such as references to names and numbers, emotion, and intentions. A variety of (inter-related) metadata standards are under development, such as SMIL, annotation graphs and MPEG-7.

Structuring, Browsing and Search

The digital audio collections we are discussing are vast, and automatic assistance in navigating and searching them is of great importance. To take an example from the US, in June 1972, President Richard Nixon conspired to cover up White House involvement in the Watergate break-in during a 90-minute conversation with his aides. Researchers may wish to listen to the entire conversation. Teachers, students and citizens may only be interested in the five minutes or so known as the "smoking-gun" evidence. Of course, we can determine the 'jewels' in these collections and highlight them. However, in most cases listeners will want to mark up or annotate such holdings to suit their own needs. At a basic level, this may simply involve identifying time marks in audio streams, so that users may create personal audio collections. More advanced approaches may use techniques such as query-based searching, query-by-example (or filtering), topic discovery, and speaker identification.
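
The basic case of time marks in an audio stream can be sketched as a small data structure. This is an illustrative sketch only: the `TimeMark` class, the `marks_overlapping` helper, the recording identifier and the time offsets are all hypothetical and do not correspond to any existing archive or standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeMark:
    """A user-defined annotation on a span of one recording (hypothetical schema)."""
    recording_id: str
    start_s: float  # offset from the start of the recording, in seconds
    end_s: float
    note: str

    def __post_init__(self):
        if not 0 <= self.start_s < self.end_s:
            raise ValueError("need 0 <= start_s < end_s")

    @property
    def duration_s(self):
        return self.end_s - self.start_s


def marks_overlapping(marks, recording_id, start_s, end_s):
    """Return the marks on one recording that overlap the window [start_s, end_s)."""
    return [
        m for m in marks
        if m.recording_id == recording_id and m.start_s < end_s and m.end_s > start_s
    ]


# A personal collection over a 90-minute recording; the offsets are invented.
collection = [
    TimeMark("conversation-001", 600.0, 900.0, "five-minute excerpt"),
    TimeMark("conversation-001", 3000.0, 3060.0, "closing remarks"),
]

# Which marks fall within the first half hour?
first_half_hour = marks_overlapping(collection, "conversation-001", 0.0, 1800.0)
for mark in first_half_hour:
    print(mark.note, mark.duration_s)
```

Because each mark stores only offsets and a note, such a collection can be kept and shared separately from the audio itself.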

Many of these advanced navigation and search operations require a word-level transcript of the spoken audio: current spoken document retrieval systems operate using automatically generated transcripts. Even though state-of-the-art speech recognition systems still have high word error rates (25% and above for conversational speech, worse when acoustic conditions are unfavourable), these approaches have been rather successful. However, important issues, such as the identification of non-speech events (eg crowd noise in actuality recordings) or of features such as emotion, have barely been addressed. Richer levels of browsing, involving topic detection and tracking, summarization, and question answering, are also desirable. Personalization is another issue, where users may wish to define their own annotations on an archive (and possibly share such annotations).
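
For readers unfamiliar with the measure, word error rate is conventionally the number of word substitutions, deletions and insertions needed to turn the recognizer output into the reference transcript, divided by the number of words in the reference. The sketch below computes it with a standard edit-distance recurrence. The function name and the whitespace tokenization are simplifying assumptions; evaluation tools typically also normalize case and punctuation.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between two transcripts, divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words consumed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("a b c d", "a x c d"))  # 0.25
```

Since insertions count as errors, the rate can exceed 100% when the hypothesis is much longer than the reference.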

Multilinguality

One of the virtues of preserving digital audio collections is the preservation of language heritage. This raises the important issue of how archives in many different languages and dialects should be dealt with. Of particular importance are multilingual speech recognition, machine translation and cross-language retrieval and browsing. Machine translation matters because a diverse set of audio sources on a particular topic may be rendered much more useful if translations are available, or can be automatically generated on demand.

Intellectual Property

Intellectual property issues sometimes serve as a roadblock to digital library efforts. Many national collections only allow on-premises access. Most if not all broadcast collections have no public access. One solution is to select projects from materials that are either out of copyright or were never subject to copyright restrictions. However, much of our cultural heritage (eg broadcast material) is subject to copyright. With conversion of audio archives to mass storage and electronic distribution, the technology will support much wider access, but the rights issues must first be properly identified, and proposals then formulated for how such wider access could be made 'legal'. It may also be possible to enlist large collection holders into digital library projects by offering a low- or no-cost preservation solution for materials that may otherwise only decay while awaiting a buyer or user.

Conclusion

In outlining these possible topics, we are struck by the enormous strides we have taken in the last seven years or so to bring rich audio collections on-line. There is much to be gained by cooperating with fellow investigators to encourage the productive interchange of knowledge and to find common ground so that our efforts will reach the greatest possible audience at the lowest possible cost, greatest ease and utility, and the highest standards we can bring to bear.


Steve Renals
Last modified: Tue Sep 10 18:18:33 BST 2002