Even at its best, current automatic speech recognition (ASR) performance is an order of magnitude below human performance. A new statistical framework is needed that incorporates knowledge sources in a combined knowledge-based and data-driven paradigm. The project is part of a joint international effort to develop the next generation of speech technology, knowledge-rich speech processing, and will focus on speech signal processing.
The full system will be applied to information retrieval tasks on the RUNDKAST database, an audio database of Norwegian broadcast news shows. For comparison, a baseline HMM system will be implemented in addition to the knowledge-rich system.
The project will consist of three interconnected activities:
1. Front-end development.
The purpose of the ASR front end is to extract all necessary information for the task of discriminating sounds, words and utterances in a manner that is maximally robust to irrelevant variations. We will investigate and develop a set of analysis and detection algorithms based on knowledge of human speech production, perception and cognitive processing.
2. Statistical framework.
In contrast to current systems, the proposed front end will produce a stream of temporally asynchronous and statistically dependent observations. This will necessitate establishing a different statistical framework for bottom-up verification, evaluation and combination of hypotheses from front-end observations to sentence hypotheses.
3. Spoken information retrieval.
Vast amounts of information are stored in audio and multimedia archives worldwide. Most of this spoken information is not transcribed, and thus not text-searchable. Speech recognition is a means either of automatically transcribing spoken audio or of directly searching audio files by keyword. In this activity, the new algorithms will be tested and benchmarked against conventional technology for the tasks of transcription and information retrieval on the RUNDKAST database.