THematic Indexing of Spoken Language
BBC News Retrieval Demonstrator
- The BBC news retrieval demonstrator contains over 1000 hours of BBC radio and TV news and current affairs material collected over a period of two years.
- The recording, decoding and indexing procedure is fully automatic.
- The demonstrator is installed on the BBC intranet and is currently being evaluated by the BBC Information & Archives department. Initial feedback has been encouraging and is being incorporated in an enhanced system.
- Recordings of BBC radio and TV news shows have been made over the last two years covering 3 hours of news output per day. The archive contains over 1000 hours of material, and this is increasing at 5 hours per day after extending coverage to current affairs programmes.
- The recorded material is decoded into text using the Abbot large vocabulary continuous speech recognition system. Decoding is performed in real time on a daily basis. The text files produced by Abbot are then fed to the thislIR indexing and retrieval system.
- The text files produced by Abbot for each news broadcast are segmented automatically into a set of documents.
- Each document in the archive is then indexed by the thislIR information retrieval system, after which the updated index is available to users.
- The index created by thislIR has been incorporated in a web-style interface (see below).
- Top panel: the user enters a text query about some news item of interest (a prototype spoken query interface has also been produced). The user can also restrict the search by date and programme before submitting the query to thislIR.
- Middle panel: during retrieval thislIR works in a similar way to a web search engine, and produces a list of the news clips which scored highest in response to the query. The programme containing each clip, the clip's duration and its transmission date are also shown.
- Bottom panel: the user can now listen to the selected clip and view the speech recognition output - complete with errors! If the clip does not contain the required material, the user can try another clip from the list or modify the original query and start again.
- The demonstrator has generated much enthusiasm within the BBC and work on its development will continue after the end of the project.
- Versions of the demonstrator tailored towards the needs of BBC Monitoring and the BBC Natural History Unit are being developed.
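
The segmentation, indexing and retrieval steps described above can be illustrated with a minimal sketch. It is not the thislIR implementation: the overlapping word-count windows, the TF-IDF style scoring, and the names `Clip`, `segment` and `Index` are all assumptions made for illustration, since the page does not say how documents are cut or ranked.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Clip:
    """One retrievable document cut from a broadcast transcript (hypothetical type)."""
    programme: str
    broadcast: date
    start_word: int
    words: list


def segment(programme, broadcast, transcript, window=30, step=15):
    """Cut a transcript into overlapping fixed-length clips (assumed scheme)."""
    words = transcript.lower().split()
    clips = []
    if not words:
        return clips
    start = 0
    while True:
        clips.append(Clip(programme, broadcast, start, words[start:start + window]))
        if start + window >= len(words):  # last window reaches the end
            break
        start += step
    return clips


class Index:
    """Toy inverted index: term -> {clip id: term frequency}."""

    def __init__(self):
        self.clips = []
        self.postings = defaultdict(dict)

    def add(self, clip):
        clip_id = len(self.clips)
        self.clips.append(clip)
        for term, tf in Counter(clip.words).items():
            self.postings[term][clip_id] = tf

    def search(self, query, programme=None, since=None, until=None, top=5):
        """Return up to `top` (score, clip) pairs, best first, after filtering."""
        scores = Counter()
        n = len(self.clips)
        for term in query.lower().split():
            hits = self.postings.get(term, {})
            if not hits:
                continue
            idf = math.log(1 + n / len(hits))  # rarer terms weigh more
            for clip_id, tf in hits.items():
                scores[clip_id] += tf * idf
        results = []
        for clip_id, score in scores.most_common():
            clip = self.clips[clip_id]
            if programme and clip.programme != programme:
                continue
            if since and clip.broadcast < since:
                continue
            if until and clip.broadcast > until:
                continue
            results.append((score, clip))
        return results[:top]
```

In this sketch each day's recogniser output would be passed through `segment` and every resulting clip added to one `Index`. A query then returns the highest-scoring clips, optionally restricted to a programme or a date range, which mirrors the top and middle panels of the interface. Overlapping windows are used so that a story straddling a window boundary still falls wholly inside at least one clip.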