Confidence Scoring and Speaker Adaptation in Mobile Automatic Speech Recognition Applications

dc.contributor Aalto-yliopisto fi
dc.contributor Aalto University en
dc.contributor.advisor Varjokallio, Matti
dc.contributor.advisor Hämäläinen, Leo
dc.contributor.author Abbas, Muhammad
dc.date.accessioned 2017-02-24T10:54:10Z
dc.date.available 2017-02-24T10:54:10Z
dc.date.issued 2017-01-23
dc.identifier.uri https://aaltodoc.aalto.fi/handle/123456789/24706
dc.description.abstract The speakers of a language are highly diverse in speaker-specific characteristics such as dialect and speaking style, so the quality of spoken content varies notably from one individual to another. This diversity causes problems for Automatic Speech Recognition (ASR) systems. An ASR system should therefore be able to assess its hypothesised results. This can be done by computing a confidence measure for each recognition result and comparing it with a specified threshold. The measure, referred to as the confidence score, indicates how reliable a particular recognition result is for the given speech. Ideally, a system should perform optimally regardless of the input speaker's characteristics. However, most systems are inflexible and non-adaptive, so their speaker adaptability can be improved. To achieve both goals, a solid criterion is needed to evaluate the quality of spoken content, and the system should also be made robust and adaptive to new speakers. Using speech data and corpora provided by Devoca Oy, this thesis implements a confidence score based on posterior probabilities to examine the quality of the output. Furthermore, two speaker adaptation algorithms, Maximum Likelihood Linear Regression (MLLR) and Maximum a Posteriori (MAP), are applied to a GMM-HMM system and their results are compared. Experiments show that MAP adaptation yields a 2% to 25% improvement in the word error rates of the semi-continuous model, and it is recommended for use in the commercial product. The results of the other methods are also reported. In addition, a word graph is suggested as the method for obtaining the posterior probabilities. Since the confidence score guarantees no comparable improvement in the results, it is proposed as an optional feature of the system. en
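The abstract recommends MAP adaptation but does not state its update equations. The sketch below is a minimal illustration, assuming a generic diagonal-covariance GMM (not the thesis's semi-continuous model with its shared codebook) and the common relevance-factor form of MAP mean adaptation: component posteriors are computed once from the prior model, weights and variances are left unchanged, and each mean moves toward the speaker's data in proportion to how much of that data the component explains. The function names and the relevance factor `tau` are hypothetical choices for illustration, not taken from the thesis.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def responsibilities(x, weights, means, variances):
    """Posterior probability of each mixture component given frame x."""
    logs = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def map_adapt_means(frames, weights, means, variances, tau=10.0):
    """One pass of relevance-factor MAP adaptation of GMM means.

    new_mean_k = (tau * prior_mean_k + sum_t gamma_tk * x_t) / (tau + sum_t gamma_tk)
    Weights and variances are kept fixed (hypothetical simplification).
    """
    dim = len(means[0])
    occ = [0.0] * len(means)               # soft frame counts per component
    first = [[0.0] * dim for _ in means]   # posterior-weighted sums of frames
    for x in frames:
        for k, g in enumerate(responsibilities(x, weights, means, variances)):
            occ[k] += g
            for d in range(dim):
                first[k][d] += g * x[d]
    return [
        [(tau * means[k][d] + first[k][d]) / (tau + occ[k]) for d in range(dim)]
        for k in range(len(means))
    ]
```

For example, with a two-component 1-D prior at means -2 and 2 and twenty adaptation frames at 3.0, the second mean moves to about 2.67 while the first stays near -2, because almost none of the data is assigned to it. A small `tau` trusts the speaker's data more; a large `tau` keeps the speaker-independent prior.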
dc.format.extent 64 + 8
dc.format.mimetype application/pdf en
dc.language.iso en en
dc.title Confidence Scoring and Speaker Adaptation in Mobile Automatic Speech Recognition Applications en
dc.type G2 Master's thesis en
dc.contributor.school School of Electrical Engineering en
dc.subject.keyword acoustic model en
dc.subject.keyword ASR en
dc.subject.keyword confidence score en
dc.subject.keyword MAP en
dc.subject.keyword MLLR en
dc.subject.keyword speaker adaptation en
dc.identifier.urn URN:NBN:fi:aalto-201702242589
dc.programme.major Signal, Speech and Language Processing fi
dc.programme.mcode ELEC3031 fi
dc.type.ontasot Master's thesis en
dc.type.ontasot Diplomityö fi
dc.contributor.supervisor Kurimo, Mikko
dc.programme CCIS - Master’s Programme in Computer, Communication and Information Sciences (TS2013) fi
dc.ethesisid Aalto 7867
dc.location P1 fi
