Confidence Scoring and Speaker Adaptation in Mobile Automatic Speech Recognition Applications
Loading...
URL
Journal Title
Journal ISSN
Volume Title
Sähkötekniikan korkeakoulu |
Master's thesis
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for Your own personal use. Commercial use is prohibited.
Authors
Date
2017-01-23
Department
Major/Subject
Signal, Speech and Language Processing
Mcode
ELEC3031
Degree programme
CCIS - Master’s Programme in Computer, Communication and Information Sciences (TS2013)
Language
en
Pages
64 + 8
Series
Abstract
Generally, the user group of a language is remarkably diverse in terms of speaker-specific characteristics such as dialect and speaking style. Hence, quality of spoken content varies notably from one individual to another. This diversity causes problems for Automatic Speech Recognition systems. An Automatic Speech Recognition system should be able to assess the hypothesised results. This can be done by evaluating a confidence measure on the recognition results and comparing the resulting measure to a specified threshold. This threshold value, referred to as confidence score, informs how reliable a particular recognition result is for the given speech. A system should perform optimally irrespective of input speaker characteristics. However, most systems are inflexible and non-adaptive and thus, speaker adaptability can be improved. For achieving these purposes, a solid criterion is required to evaluate the quality of spoken content and the system should be made robust and adaptive towards new speakers as well. This thesis implements a confidence score using posterior probabilities to examine the quality of the output, based on the speech data and corpora provided by Devoca Oy. Furthermore, speaker adaptation algorithms: Maximum Likelihood Linear Regression and Maximum a Posteriori are applied on a GMM-HMM system and their results are compared. Experiments show that Maximum a Posteriori adaptation brings 2% to 25% improvement in word error rates of semi-continuous model and is recommended for use in the commercial product. The results of other methods are also reported. In addition, word graph is suggested as the method for obtaining posterior probabilities. Since it guarantees no such improvement in the results, the confidence score is proposed as an optional feature for the system.Description
Supervisor
Kurimo, MikkoThesis advisor
Varjokallio, MattiHämälainen, Leo
Keywords
acoustic model, ASR, confidence score, MAP, MLLR, speaker adaptation