Finnish language modeling and ASR with Deep Transformer Models
School of Science |
Master's thesis
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for your own personal use. Commercial use is prohibited.
Author
Date
2020-08-18
Department
Major/Subject
Machine Learning, Data Science and Artificial Intelligence
Mcode
SCI3044
Degree programme
Master’s Programme in Computer, Communication and Information Sciences
Language
en
Pages
74
Series
Abstract
Transformers have taken centre stage in most NLP applications since LSTMs were established as the leading approach to sequence modeling and transduction problems such as language modeling and speech recognition. Recently, BERT- and Transformer-XL-based architectures demonstrated the efficacy of Transformer models as language models pre-trained on large corpora. It is important to note that these strides have mostly been made with the English language, for which abundant resources already exist and which is morphologically simpler in comparison to an agglutinative language like Finnish. An important question then arises about the usage and performance of these Transformer models for Finnish. In this thesis, we take an important natural language processing task, automatic speech recognition (ASR), and compare the architectures mentioned above for Finnish. First, we perform an intrinsic evaluation task, language modeling, to understand how well suited the Transformer architectures are to Finnish. Next, for an extrinsic evaluation, we employ these models as language models in an ASR system and analyze their performance. Our Transformer models performed exceptionally well in both the language modeling and the ASR task. Transformer-XL achieves 29% (absolute) better perplexity and 3% (absolute) better WER in ASR than our previous best LSTM-based approach. We also introduce a novel three-pass decoding scheme which improves the ASR performance by 8%. In this thesis, we also (i) formulate an alpha smoothing framework to use the non-autoregressive BERT language model for an ASR task, and (ii) explore sub-word units with Transformer-XL for an agglutinative language like Finnish.
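As a rough illustration of the extrinsic evaluation described above (a language model used to re-rank ASR output), the following minimal Python sketch rescores an N-best list by interpolating acoustic and LM scores. All names, scores, and the weight `lm_weight` are hypothetical placeholders; the weight is a generic stand-in, not the thesis's alpha smoothing formulation or its actual three-pass decoder.

```python
# Hypothetical sketch of N-best rescoring with a language model.
# Scores and hypotheses are illustrative, not values from the thesis.
from dataclasses import dataclass

@dataclass
class Hypothesis:
    text: str
    acoustic_score: float  # log-likelihood from the acoustic model
    lm_score: float        # log-probability from the (Transformer) LM

def rescore(hypotheses, lm_weight=0.5):
    """Pick the hypothesis maximizing a weighted sum of acoustic and LM scores."""
    return max(hypotheses,
               key=lambda h: h.acoustic_score + lm_weight * h.lm_score)

if __name__ == "__main__":
    nbest = [
        Hypothesis("puhe tunnistettiin oikein", -12.3, -8.1),
        Hypothesis("puhe tunnistettiin oikein kin", -11.9, -15.4),
    ]
    print(rescore(nbest).text)
```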
Supervisor
Kurimo, Mikko
Thesis advisor
Rouhe, Aku
Grönroos, Stig-Arne
Keywords
language modeling, automatic speech recognition, Transformer, BERT, Transformer-XL