Improving Accuracy in Automatic Speech Recognition Systems by Model Adaptation Techniques

dc.contributor Aalto-yliopisto fi
dc.contributor Aalto University en
dc.contributor.advisor Milhorat, Pierrick
dc.contributor.advisor Boudy, Jérôme
dc.contributor.author Mirzaei, Saeideh
dc.date.accessioned 2015-06-24T11:36:06Z
dc.date.available 2015-06-24T11:36:06Z
dc.date.issued 2015-06-10
dc.identifier.uri https://aaltodoc.aalto.fi/handle/123456789/16853
dc.description.abstract The accuracy of speech recognition systems in converting speech to text is still an issue in large vocabulary continuous speech recognition tasks. The major source of poor performance in such systems is the mismatch between the training and testing conditions. ASR systems have been shown to perform better when trained for a specific user and application. Because training the acoustic and language models requires a large amount of data, adaptation methods are used to improve the recognition accuracy of the baseline system while needing much less data to adjust the parameters. The acoustic and language models are adapted to make ASR systems more speaker dependent, noise robust and context dependent. For speaker adaptation, the goal is to reduce the mismatch between the user's vocal characteristics and the generic acoustic model. This, along with adaptation to noise, concerns the acoustic model. In addition, language model adaptation techniques are used to change the parameters (combination probabilities) of the grammar model, giving more weight to the word sequences that are more relevant to the task in progress. In this work, unsupervised acoustic model adaptation has been implemented using linear VTLN and constrained MLLR (cMLLR). VTLN changes the speaker's formant positions, and MLLR deals with model parameters in the feature space. We show that overall performance increases with either of these two methods. The relative WER reduction with cMLLR was 9.44%. For language model adaptation, a linear interpolation of the generic and specific models has been implemented. The perplexity of the adapted language model improved by a relative 14.47% compared to the generic model. Perplexity gives an approximate indication of ASR performance, although the two are not directly proportional. Both acoustic and language model adaptation were found to improve the performance of the ASR system. en
dc.format.extent 55
dc.language.iso en en
dc.title Improving Accuracy in Automatic Speech Recognition Systems by Model Adaptation Techniques en
dc.type G2 Pro gradu, diplomityö en
dc.contributor.school Sähkötekniikan korkeakoulu fi
dc.subject.keyword speech recognition en
dc.subject.keyword speaker adaptation en
dc.subject.keyword language model adaptation en
dc.subject.keyword hidden markov models en
dc.identifier.urn URN:NBN:fi:aalto-201506303498
dc.programme.major Signal Processing fi
dc.programme.mcode S3013 fi
dc.type.ontasot Master's thesis en
dc.type.ontasot Diplomityö fi
dc.contributor.supervisor Kurimo, Mikko
dc.programme TLT - Master’s Programme in Communications Engineering fi
dc.location P1 fi
local.aalto.openaccess no
dc.rights.accesslevel closedAccess
local.aalto.idinssi 51906
dc.type.publication masterThesis
dc.type.okm G2 Pro gradu, diplomityö
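
The abstract mentions linear interpolation of a generic and a specific language model, evaluated by perplexity and reported as a relative improvement. The following is a minimal sketch of that idea, not the thesis implementation: it assumes toy unigram models, and the probabilities, the weight `lam`, the sample text, and all function names are hypothetical choices made for illustration only.

```python
import math


def interpolate(p_generic, p_specific, lam):
    """Linear interpolation: P(w) = lam * P_specific(w) + (1 - lam) * P_generic(w)."""
    vocab = set(p_generic) | set(p_specific)
    return {
        w: lam * p_specific.get(w, 0.0) + (1.0 - lam) * p_generic.get(w, 0.0)
        for w in vocab
    }


def perplexity(model, words):
    """Perplexity of a unigram model on a word sequence (base-2 formulation)."""
    log_sum = sum(math.log2(model[w]) for w in words)
    return 2.0 ** (-log_sum / len(words))


def relative_reduction(baseline, adapted):
    """Relative reduction in percent, as used for WER or perplexity figures."""
    return 100.0 * (baseline - adapted) / baseline


# Hypothetical toy distributions (not data from the thesis).
generic = {"the": 0.5, "call": 0.1, "weather": 0.4}
specific = {"the": 0.3, "call": 0.6, "weather": 0.1}
adapted = interpolate(generic, specific, lam=0.5)

# Hypothetical in-domain test text.
text = ["call", "the", "call"]
ppl_generic = perplexity(generic, text)
ppl_adapted = perplexity(adapted, text)
print(f"generic: {ppl_generic:.2f}, adapted: {ppl_adapted:.2f}")
print(f"relative reduction: {relative_reduction(ppl_generic, ppl_adapted):.2f}%")
```

In practice the interpolation weight would typically be tuned on held-out in-domain text, and the models would be n-gram models rather than unigrams; the sketch only shows how the interpolated probabilities and the relative perplexity change are computed.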


Files in this item

There are no open access files associated with this item.
