ASA in listeners and machines

Issues in evaluating CASA systems

wide range of goals
- polyphonic music transcription: adherence to score
- robust ASR: word error rate as a function of target-to-noise ratio
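
The robust-ASR measure above can be sketched as follows. This is a minimal illustration, not a standard scoring tool: it assumes whitespace-tokenised transcripts, equal cost for substitutions, insertions and deletions, and pooling of errors over all utterances at each target-to-noise ratio. The names `edit_distance` and `wer_by_condition`, and the layout of the input dictionary, are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two word lists (unit costs)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn ref[:i] into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def wer_by_condition(transcripts):
    """Map each target-to-noise ratio (dB) to a pooled word error rate.

    `transcripts` is assumed to map a ratio in dB to a list of
    (reference, hypothesis) string pairs.
    """
    curve = {}
    for ratio_db, pairs in transcripts.items():
        errors = 0
        words = 0
        for reference, hypothesis in pairs:
            ref = reference.split()
            hyp = hypothesis.split()
            errors += edit_distance(ref, hyp)
            words += len(ref)
        if words == 0:
            raise ValueError("no reference words at %r dB" % (ratio_db,))
        curve[ratio_db] = errors / words
    return curve


# Made-up transcripts, for illustration only.
example = {
    0: [("one two three", "one too three"), ("four five", "four five")],
    10: [("one two three", "one two three")],
}
curve = wer_by_condition(example)
```

Pooling errors before dividing weights long utterances more heavily than averaging per-utterance rates would; either choice is defensible, but it should be stated when results are compared.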
no formal evaluation framework (cf. DARPA, MUC), but some 'standard' corpora
- double vowel set
- Cooke (1991) 100 mixture set
as psychoacoustic models or as engineering artefacts?
- traditionally the former, but difficult to take beyond double vowels in practice
- is CASA
  - the computational implementation of (convincing, but unproven) ASA; or
  - any computational search for objects in the auditory scene?
as speech enhancement preprocessors
- seductive, but problematic
lately: modify the recogniser itself to cope (missing data ASR)
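
One way a recogniser can be modified to cope is to score only the feature components judged reliable. The sketch below assumes the marginalisation form of missing data ASR with a single diagonal-covariance Gaussian per class; the outline does not say which variant is meant, and a real recogniser would apply this inside HMM state likelihoods. The function name, the mask convention (True = reliable) and all numbers are illustrative assumptions.

```python
import math


def masked_log_likelihood(features, mask, means, variances):
    """Log-likelihood of a diagonal Gaussian over reliable components only.

    Components whose mask entry is False are marginalised out: each one
    integrates to 1 and so contributes nothing to the log-likelihood.
    """
    total = 0.0
    for x, reliable, mu, var in zip(features, mask, means, variances):
        if not reliable:
            continue
        total += -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
    return total


# Two hypothetical classes; the second feature is swamped by noise.
class_a = {"means": [0.0, 0.0], "variances": [1.0, 1.0]}
class_b = {"means": [3.0, 3.0], "variances": [1.0, 1.0]}
observation = [0.2, 9.0]
all_reliable = [True, True]
noise_mask = [True, False]  # assumed to come from a CASA front end


def score(model, mask):
    return masked_log_likelihood(
        observation, mask, model["means"], model["variances"]
    )


# Using every component, the corrupted value drags the decision to class B.
full_prefers_b = score(class_b, all_reliable) > score(class_a, all_reliable)
# Ignoring the unreliable component restores class A.
masked_prefers_a = score(class_a, noise_mask) > score(class_b, noise_mask)
```

The mask is where CASA enters: the scene analysis decides which components belong to the target, and the recogniser handles the rest, rather than a preprocessor resynthesising cleaned speech.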