Continuous Unsupervised Topic Adaptation for Morph-based Speech Recognition

School of Electrical Engineering | Doctoral thesis (article-based) | Defence date: 2017-02-17
Pages
99 + app. 83
Aalto University publication series DOCTORAL DISSERTATIONS, 10/2017
Modern automatic speech recognition (ASR) systems are speaker-independent and designed to recognize large-vocabulary continuous speech. The key components of an ASR system are the acoustic model, language model, lexicon and decoder. A constant challenge for an ASR system is adapting over time to changing topics and to newly introduced names and words. Enabling continuous topic adaptation requires finding new relevant text sources for adapting the language model and identifying words that need new or modified pronunciation rules. This thesis studies unsupervised methods that enable continuous topic adaptation for a Finnish morph-based ASR system. Based on first-pass ASR output, topically and temporally relevant text data is retrieved from a collection of pre-indexed Web texts. Adapting the background language model with the best-matching texts improves recognition accuracy, including the accuracy for foreign names and acronyms, one of the focus areas of this thesis. Further improvement is achieved by identifying foreign names and acronyms in the retrieved texts and generating adapted pronunciation rules for them. In statistical morph-based ASR, words are sometimes oversegmented. To make the mapping of adapted pronunciation rules easier and more reliable, oversegmented foreign names and acronyms are restored to their base forms. Morpheme restoration also improves recognition accuracy slightly. The thesis also explores user feedback as a means of ongoing lexicon adaptation. Based on user corrections of ASR output, optimal pronunciation rules for misrecognized words are recovered using forced alignment and Viterbi decoding. A collection of recovered pronunciation rules can then be used to recognize new speech data. Experiments showed minor improvements in the recognition of foreign names with user-feedback-based lexicon adaptation.
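The retrieval-based adaptation step described in the abstract can be illustrated with a small sketch. It is not the thesis implementation: the TF-IDF cosine ranking, the unigram models, the linear interpolation weight `lam`, and all function names and example texts are assumptions chosen only to make the idea concrete (a real system would work on morph-based n-gram models and a large indexed Web collection).

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenization; a morph-based system would segment into morphs instead.
    return text.lower().split()


def tfidf_vectors(docs):
    # Build smoothed IDF weights and one sparse TF-IDF vector per tokenized document.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    idf = {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}
    vectors = [{w: tf * idf[w] for w, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a, b):
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(hypothesis, collection, top_k=1):
    # Rank collection texts by similarity to the first-pass ASR hypothesis.
    docs = [tokenize(d) for d in collection]
    vectors, idf = tfidf_vectors(docs)
    query = {w: tf * idf.get(w, 0.0)
             for w, tf in Counter(tokenize(hypothesis)).items()}
    ranked = sorted(range(len(docs)),
                    key=lambda i: cosine(query, vectors[i]), reverse=True)
    return [collection[i] for i in ranked[:top_k]]


def unigram_model(texts):
    # Maximum-likelihood unigram probabilities (no smoothing, for brevity).
    counts = Counter(w for t in texts for w in tokenize(t))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def interpolate(background, adapted, lam=0.5):
    # Linear interpolation: P(w) = (1 - lam) * P_bg(w) + lam * P_ad(w).
    vocab = set(background) | set(adapted)
    return {w: (1 - lam) * background.get(w, 0.0) + lam * adapted.get(w, 0.0)
            for w in vocab}


# Hypothetical toy data: a tiny "Web collection" and an error-containing first pass.
collection = [
    "the parliament voted on the budget today",
    "the team won the hockey match last night",
    "heavy snow is expected in the north",
]
hypothesis = "hockey team one the match"

retrieved = retrieve(hypothesis, collection, top_k=1)
background = unigram_model(collection)
adapted = interpolate(background, unigram_model(retrieved), lam=0.5)
```

In this toy setup the adapted model assigns more probability to topic words such as "hockey" than the background model does, which is the effect the second recognition pass would exploit.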
Supervising professor
Kurimo, Mikko, Prof., Aalto University, Department of Signal Processing and Acoustics, Finland
Keywords
morph-based speech recognition, retrieval-based language model adaptation, lexicon adaptation, user feedback based adaptation, foreign proper name detection, morph restoration
Other note
  • [Publication 1]: André Mansikkaniemi and Mikko Kurimo. Unsupervised Vocabulary Adaptation for Morph-based Language Models. In NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT, pages 37–40, Montréal, Canada, June 2012.
  • [Publication 2]: André Mansikkaniemi and Mikko Kurimo. Adaptation of morph-based Speech Recognition for Foreign Entity Names. In Fifth International Conference Human Language Technologies - The Baltic Perspective, pages 129–137, Tartu, Estonia, October 2012.
    DOI: 10.3233/978-1-61499-133-5-129
  • [Publication 3]: André Mansikkaniemi and Mikko Kurimo. Unsupervised Topic Adaptation for Morph-based Speech Recognition. In Interspeech 2013, pages 2693–2697, Lyon, France, September 2013.
  • [Publication 4]: André Mansikkaniemi and Mikko Kurimo. Adaptation of Morph-Based Speech Recognition for Foreign Names and Acronyms. IEEE/ACM Transactions on Audio, Speech, and Language Processing, pages 941–950, vol. 23, no. 5, May 2015.
    DOI: 10.1109/TASLP.2015.2414818
  • [Publication 5]: André Mansikkaniemi and Mikko Kurimo. Unsupervised and User Feedback Based Lexicon Adaptation for Foreign Names and Acronyms. In Third International Conference on Statistical Language and Speech Processing, SLSP 2015, volume 9449, pages 197–206, Budapest, Hungary, November 2015.
    DOI: 10.1007/978-3-319-25789-1_19
  • [Publication 6]: Mikko Kurimo, Seppo Enarvi, Ottokar Tilk, Matti Varjokallio, André Mansikkaniemi and Tanel Alumäe. Modeling Underresourced Languages for Speech Recognition. Language Resources and Evaluation, pages 1–27, February 2016.
    DOI: 10.1007/s10579-016-9336-9