Evaluation of information retrieval (IR) systems is an expensive and labour-intensive task. This is because the evaluation metrics used by the IR community require a judgment of the relevance of each document returned in response to each query, and these judgments can only be made manually.
Fortunately, the Text REtrieval Conference (TREC), organized by the US National Institute of Standards and Technology (NIST), provides these resources in the form of shared test collections, queries, and relevance judgments.
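To illustrate why these judgments are essential, the following minimal sketch (not part of the THISL system; the function names and toy data are purely illustrative) computes mean average precision (MAP), a standard TREC retrieval metric, which is defined entirely in terms of per-document relevance judgments.

```python
def average_precision(ranking, relevant):
    """Average precision for one query: the mean of precision@k
    taken at each rank k where a judged-relevant document appears."""
    hits = 0
    precision_sum = 0.0
    for k, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(rankings, judgments):
    """MAP over a query set; `rankings` maps each query to its ranked
    document list, `judgments` to its manually judged relevant set."""
    return sum(average_precision(rankings[q], judgments[q])
               for q in rankings) / len(rankings)

# Hypothetical data: two queries with manually judged relevance sets.
rankings = {"q1": ["d3", "d1", "d7"], "q2": ["d2", "d9"]}
judgments = {"q1": {"d1", "d7"}, "q2": {"d9"}}
print(mean_average_precision(rankings, judgments))  # ~0.54
```

Without a human-supplied relevant set for every query, none of these quantities can be computed, which is why pre-judged TREC collections make rigorous evaluation affordable.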
Participation in TREC has enabled the THISL news retrieval system to be evaluated rigorously against rival systems.
THISL participated in the Spoken Document Retrieval (SDR) track at TREC-8.
The evaluation corpus consisted of 902 US news broadcasts (500 hours of audio data) collected over a five-month period.
The evaluation task was to retrieve ranked lists of documents (news stories) relevant to a set of 50 test queries.
The SDR track consisted of two main test conditions. In the Story Boundary Known condition, the news broadcasts were segmented manually into individual news stories, with non-relevant material such as commercials excluded. By contrast, the Story Boundary Unknown condition reflected the more realistic situation in which the retrieval system itself had to segment each broadcast into news stories automatically.
The THISL system performed highly competitively in both conditions, particularly in the challenging Story Boundary Unknown condition, which only a few groups attempted. Experience gained with the BBC Demonstrator made an important contribution to this result.