Improving Accuracy in Automatic Speech Recognition Systems by Model Adaptation Techniques

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.advisor: Milhorat, Pierrick
dc.contributor.advisor: Boudy, Jérôme
dc.contributor.author: Mirzaei, Saeideh
dc.contributor.school: Sähkötekniikan korkeakoulu [fi]
dc.contributor.supervisor: Kurimo, Mikko
dc.date.accessioned: 2015-06-24T11:36:06Z
dc.date.available: 2015-06-24T11:36:06Z
dc.date.issued: 2015-06-10
dc.description.abstract: The accuracy of automatic speech recognition (ASR) systems in converting speech to text remains a problem in large vocabulary continuous speech recognition tasks. The major source of poor performance in such systems is the mismatch between training conditions and testing conditions. ASR systems have been shown to perform better when trained for a specific user and application. Because training acoustic and language models requires large amounts of data, adaptation methods are used to improve the recognition accuracy of a baseline system while requiring much less data to adjust its parameters. The acoustic and language models are adapted to make ASR systems more speaker dependent, noise robust and context dependent. For speaker adaptation, the goal is to reduce the mismatch between the user's vocal characteristics and the generic acoustic model. Speaker adaptation and noise adaptation both concern the acoustic model. In addition, we use language model adaptation techniques to change the parameters (combination probabilities) of the grammar model, giving more weight to the word sequences that are most relevant to the task in progress. In this work, unsupervised acoustic model adaptation has been implemented using linear VTLN and constrained MLLR (cMLLR). VTLN shifts the speaker's formant positions, and constrained MLLR operates on the model parameters through a transform in feature space. We show that either method improves overall performance. Using cMLLR gave a relative word error rate (WER) reduction of 9.44%. For language model adaptation, a linear interpolation of the generic and specific models has been implemented. The perplexity of the adapted language model improved by 14.47% relative to the generic model. Perplexity is an approximate indicator of ASR performance, although the two are not directly proportional. Both acoustic and language model adaptation were found to improve the performance of the ASR system. [en]
dc.format.extent: 55
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/16853
dc.identifier.urn: URN:NBN:fi:aalto-201506303498
dc.language.iso: en [en]
dc.location: P1 [fi]
dc.programme: TLT - Master’s Programme in Communications Engineering [fi]
dc.programme.major: Signal Processing [fi]
dc.programme.mcode: S3013 [fi]
dc.rights.accesslevel: closedAccess
dc.subject.keyword: speech recognition [en]
dc.subject.keyword: speaker adaptation [en]
dc.subject.keyword: language model adaptation [en]
dc.subject.keyword: hidden markov models [en]
dc.title: Improving Accuracy in Automatic Speech Recognition Systems by Model Adaptation Techniques [en]
dc.type: G2 Pro gradu, diplomityö [en]
dc.type.okm: G2 Pro gradu, diplomityö
dc.type.ontasot: Master's thesis [en]
dc.type.ontasot: Diplomityö [fi]
dc.type.publication: masterThesis
local.aalto.idinssi: 51906
local.aalto.openaccess: no
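
As a rough illustration of the language model adaptation described in the abstract, the following Python sketch linearly interpolates two toy models and compares their perplexities. It is not the thesis implementation: the unigram simplification, the toy probabilities, the interpolation weight `lam`, and all function names are assumptions made for this example only.

```python
import math

def interpolate(p_generic, p_specific, lam):
    """Linear interpolation: P(w) = lam * P_specific(w) + (1 - lam) * P_generic(w)."""
    vocab = set(p_generic) | set(p_specific)
    return {
        w: lam * p_specific.get(w, 0.0) + (1.0 - lam) * p_generic.get(w, 0.0)
        for w in vocab
    }

def perplexity(model, words):
    """Perplexity of a unigram model on a word sequence: 2 ** (-mean log2 probability)."""
    log_sum = sum(math.log2(model[w]) for w in words)
    return 2.0 ** (-log_sum / len(words))

def relative_reduction(baseline, adapted):
    """Relative reduction in percent; the same formula applies to WER or perplexity."""
    return 100.0 * (baseline - adapted) / baseline

# Toy unigram distributions (hypothetical values, not taken from the thesis).
generic = {"the": 0.5, "call": 0.1, "weather": 0.4}
specific = {"the": 0.3, "call": 0.6, "weather": 0.1}

# Hypothetical in-domain test text and interpolation weight.
test_words = ["call", "the", "call"]
adapted = interpolate(generic, specific, lam=0.5)

ppl_generic = perplexity(generic, test_words)
ppl_adapted = perplexity(adapted, test_words)
print(f"generic perplexity: {ppl_generic:.2f}")
print(f"adapted perplexity: {ppl_adapted:.2f}")
print(f"relative improvement: {relative_reduction(ppl_generic, ppl_adapted):.2f}%")
```

In practice the interpolation weight would typically be tuned on held-out in-domain text, and the models would be n-gram models rather than unigrams; both choices are simplified here to keep the sketch short.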