A Mobile App For Practicing Finnish Pronunciation Using Wav2vec 2.0
School of Science | Master's thesis
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for your own personal use. Commercial use is prohibited.
Machine Learning, Data Science and Artificial Intelligence (Macadamia)
Master’s Programme in Computer, Communication and Information Sciences
Abstract
As Finland attracts more foreign talent, demand is growing for self-study tools that help second language (L2) speakers learn Finnish with proper feedback. However, little Finnish L2 data is available, especially for adult beginners. Moreover, since adult L2 learners in Finland are mostly busy studying or working, such a tool must let them practice anytime, anywhere. This thesis addresses these issues by developing a mobile app with which beginner Finnish L2 learners can practice their pronunciation. The app evaluates users' speech samples, gives feedback on their pronunciation, and then provides instructions in the form of text, photos, audio, and videos to help them improve. Because of the limited resources available, this work explores the capability of the wav2vec 2.0 model for this application. We trained our models on a corpus of native Finnish speakers and used them to provide pronunciation feedback on L2 samples without any L2 training data. The results show that the models detect about 60% of the phoneme-level mispronunciations identified by a native Finnish listener (Recall). By adding regularization, selecting training datasets, and using a smaller model size, we achieved a comparable Recall of approximately 63% with a slightly lower Precision of around 29%. Compared to the state-of-the-art model in Finnish Automatic Speech Recognition, this trade-off resulted in a significantly faster response time.
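To make the Recall and Precision figures above concrete, the following is a minimal sketch of how such scores could be computed from per-phoneme mispronunciation flags. The function name `detection_scores` and the toy label lists are hypothetical and for illustration only. They are not taken from the thesis, and the thesis may define or compute these metrics differently.

```python
from typing import Sequence, Tuple


def detection_scores(
    reference: Sequence[bool], predicted: Sequence[bool]
) -> Tuple[float, float]:
    """Return (precision, recall) for per-phoneme mispronunciation flags.

    True means "this phoneme was mispronounced". `reference` holds the
    listener's judgements, `predicted` holds the model's judgements.
    """
    if len(reference) != len(predicted):
        raise ValueError("label sequences must have equal length")

    pairs = list(zip(reference, predicted))
    true_pos = sum(r and p for r, p in pairs)         # flagged by both
    false_pos = sum((not r) and p for r, p in pairs)  # flagged by model only
    false_neg = sum(r and (not p) for r, p in pairs)  # missed by model

    flagged = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Hypothetical toy labels for eight phonemes (not data from the thesis).
listener = [True, False, False, True, False, True, False, False]
model = [True, True, False, False, True, True, False, True]

precision, recall = detection_scores(listener, model)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the model flags more phonemes than the listener does, so it recovers most of the listener's mispronunciations (higher Recall) at the cost of more false alarms (lower Precision), which is the same kind of trade-off the abstract reports.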
Thesis advisor: Voskoboinik, Ekaterina
mispronunciation detection and diagnosis, mobile app, low-resource, wav2vec 2.0, end-to-end