Improving Accuracy in Automatic Speech Recognition Systems by Model Adaptation Techniques

School of Electrical Engineering (Sähkötekniikan korkeakoulu) | Master's thesis
Ask about the availability of the thesis by sending an email to the Aalto University Learning Centre.
Signal Processing
Degree programme: TLT - Master’s Programme in Communications Engineering
The performance of speech recognition systems that convert speech to text is still an issue in large-vocabulary continuous speech recognition tasks. The main source of poor performance in such systems is the mismatch between training and testing conditions. Automatic speech recognition (ASR) systems have been shown to perform better when trained for a specific user and application. Because training both the acoustic model and the language model requires a large amount of data, adaptation methods are used instead: they improve the recognition accuracy of the baseline system while needing much less data to adjust its parameters. The acoustic and language models are adapted to make the ASR system more speaker-dependent, noise-robust and context-dependent. In speaker adaptation, the goal is to reduce the mismatch between the user's vocal characteristics and the generic acoustic model; this, together with adaptation to noise, concerns the acoustic model. In addition, we use language model adaptation techniques to change the parameters (combination probabilities) of the grammar model, giving more weight to the word sequences that are most relevant to the task at hand. In this work, unsupervised acoustic model adaptation has been implemented using linear vocal tract length normalization (VTLN) and constrained maximum likelihood linear regression (cMLLR). VTLN warps the frequency axis to compensate for the speaker's formant positions, while cMLLR estimates a linear transform of the model parameters that can be applied equivalently in feature space. We show that overall performance increases with either of these two methods. Using cMLLR gave a relative word error rate (WER) reduction of 9.44%. For language model adaptation, a linear interpolation of the generic and task-specific models has been implemented. The perplexity of the adapted language model was 14.47% lower (relative) than that of the generic model. Perplexity gives an approximate indication of ASR performance, although it is not directly proportional to it. Both acoustic and language model adaptation were shown to improve the performance of the ASR system.
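Linear VTLN can be pictured as a piecewise-linear warp of the frequency axis (for example, of the filterbank centre frequencies) controlled by one warp factor per speaker. The sketch below is illustrative and rests on assumptions the abstract does not state: a single break frequency `f_break`, the warp factor `alpha` applied below it, and a second linear segment that maps the Nyquist frequency onto itself. In unsupervised adaptation the factor is typically chosen by a maximum-likelihood grid search against first-pass transcripts, represented here by a hypothetical `score` callback.

```python
from typing import Callable, Iterable


def warp_frequency(f: float, alpha: float, f_nyquist: float, f_break: float) -> float:
    """Piecewise-linear VTLN warp (illustrative single-break-point variant).

    Below f_break the frequency is scaled by alpha; above it, a second
    linear segment joins (f_break, alpha * f_break) to (f_nyquist, f_nyquist),
    so the warped axis still ends at the Nyquist frequency.
    """
    if not 0.0 <= f <= f_nyquist:
        raise ValueError("frequency outside [0, Nyquist]")
    if alpha <= 0.0 or not 0.0 < f_break < f_nyquist or alpha * f_break >= f_nyquist:
        raise ValueError("need alpha > 0, 0 < f_break < Nyquist and alpha * f_break < Nyquist")
    if f <= f_break:
        return alpha * f
    slope = (f_nyquist - alpha * f_break) / (f_nyquist - f_break)
    return alpha * f_break + slope * (f - f_break)


def select_warp_factor(score: Callable[[float], float], grid: Iterable[float]) -> float:
    """Return the candidate warp factor with the highest score.

    `score` is a hypothetical callback: in unsupervised VTLN it would extract
    features warped with the candidate alpha and return their log-likelihood
    under the acoustic model, aligned to first-pass recognition output.
    """
    return max(grid, key=score)


if __name__ == "__main__":
    grid = [round(0.80 + 0.02 * i, 2) for i in range(21)]  # 0.80, 0.82, ..., 1.20
    # Toy score peaking at alpha = 1.06, standing in for a real likelihood.
    best = select_warp_factor(lambda a: -(a - 1.06) ** 2, grid)
    print(best, warp_frequency(1000.0, best, 8000.0, 6000.0))
```

Toolkits differ in whether they scale by `alpha` or `1/alpha` and in how many break points they use, so this shows the shape of the technique rather than any specific implementation.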
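Constrained MLLR uses one affine transform for both the means and the covariances of the Gaussians, which makes it equivalent to transforming each feature vector as x' = A x + b and adding the log-Jacobian log|det A| to the acoustic log-likelihood. The pure-Python sketch below shows only how an already estimated transform is applied and scored against a single diagonal-covariance Gaussian. Estimating `A` and `b` from adaptation data (usually an iterative, row-by-row EM update) is not shown, and the function names and toy values are illustrative assumptions.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def apply_cmllr(x: Sequence[float], A: Matrix, b: Sequence[float]) -> List[float]:
    """Feature-space form of cMLLR: x' = A x + b."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i for row, b_i in zip(A, b)]


def log_abs_det(A: Matrix) -> float:
    """log|det A| via Gaussian elimination with partial pivoting."""
    n = len(A)
    m = [[float(v) for v in row] for row in A]  # work on a copy
    log_det = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("singular transform")
        m[col], m[pivot] = m[pivot], m[col]  # row swap flips the sign only
        log_det += math.log(abs(m[col][col]))
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return log_det


def diag_gaussian_log_likelihood(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """log N(x; mean, diag(var))."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - mi) ** 2 / v
                      for xi, mi, v in zip(x, mean, var))


def cmllr_log_likelihood(x, A, b, mean, var) -> float:
    """Score the transformed feature and add the Jacobian term log|det A|."""
    return diag_gaussian_log_likelihood(apply_cmllr(x, A, b), mean, var) + log_abs_det(A)


if __name__ == "__main__":
    A = [[1.1, 0.05], [0.0, 0.9]]  # hypothetical speaker transform
    b = [-0.2, 0.1]
    print(cmllr_log_likelihood([0.4, -0.3], A, b, mean=[0.0, 0.0], var=[1.0, 1.0]))
```

Because the same `A` and `b` serve every Gaussian, the transform can be applied once to the features before decoding, which is what makes cMLLR convenient for unsupervised, per-speaker adaptation.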
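The language model adaptation step combines the two models as P(w|h) = λ P_specific(w|h) + (1 − λ) P_generic(w|h). The sketch below assumes each model has already assigned a probability to every token of a held-out text (the numbers are hypothetical), estimates λ with the standard EM update for a two-component mixture, and reports perplexity, exp(−(1/N) Σ log P(w_i|h_i)), together with a helper for relative reductions like the ones quoted in the abstract.

```python
import math
from typing import Sequence


def interpolate(p_specific: float, p_generic: float, lam: float) -> float:
    """P(w|h) = lam * P_specific(w|h) + (1 - lam) * P_generic(w|h)."""
    return lam * p_specific + (1.0 - lam) * p_generic


def perplexity(probs: Sequence[float]) -> float:
    """exp(-(1/N) * sum(log P(w_i | h_i))) over N evaluation tokens."""
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))


def estimate_lambda(p_specific: Sequence[float], p_generic: Sequence[float],
                    lam: float = 0.5, iterations: int = 100) -> float:
    """EM estimate of the interpolation weight on held-out data.

    Each iteration sets lam to the average posterior probability that a
    held-out token was generated by the specific model.
    """
    for _ in range(iterations):
        posteriors = [lam * ps / interpolate(ps, pg, lam)
                      for ps, pg in zip(p_specific, p_generic)]
        lam = sum(posteriors) / len(posteriors)
    return lam


def relative_reduction(baseline: float, adapted: float) -> float:
    """Relative reduction in percent (usable for perplexity or WER)."""
    return 100.0 * (baseline - adapted) / baseline


if __name__ == "__main__":
    # Hypothetical per-token probabilities from each model on a held-out text.
    p_spec = [0.2, 0.05, 0.3, 0.01]
    p_gen = [0.05, 0.1, 0.02, 0.04]
    lam = estimate_lambda(p_spec, p_gen)
    adapted = [interpolate(ps, pg, lam) for ps, pg in zip(p_spec, p_gen)]
    ppl_gen, ppl_adapted = perplexity(p_gen), perplexity(adapted)
    print(f"lambda={lam:.3f} generic={ppl_gen:.2f} adapted={ppl_adapted:.2f} "
          f"reduction={relative_reduction(ppl_gen, ppl_adapted):.2f}%")
```

For brevity the toy example tunes λ and measures perplexity on the same tokens; in practice λ would be tuned on a development set and perplexity reported on a separate test set.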
Kurimo, Mikko
Thesis advisor: Milhorat, Pierrick; Boudy, Jérôme
Keywords: speech recognition, speaker adaptation, language model adaptation, hidden Markov models