A Mobile App For Practicing Finnish Pronunciation Using Wav2vec 2.0

School of Science | Master's thesis
Date
2023-05-15
Major/Subject
Machine Learning, Data Science and Artificial Intelligence (Macadamia)
Mcode
SCI3044
Degree programme
Master’s Programme in Computer, Communication and Information Sciences
Language
en
Pages
65+3
Abstract
As Finland attracts more foreign talent, demand is growing for self-study tools that help second language (L2) speakers learn Finnish with proper feedback. However, little L2 Finnish data is available, especially for adult beginners. Moreover, since adult L2 learners in Finland are mostly busy studying or working, such a tool must let them practice anytime, anywhere. This thesis addresses these issues by developing a mobile app with which beginner L2 learners of Finnish can practice their pronunciation. The app evaluates the users' speech samples, gives feedback on their pronunciation, and then provides instructions in the form of text, photos, audio, and videos to help them improve. Because of the limited resources available, this work explores how well the wav2vec 2.0 model suits the application. We trained our models on a corpus of native Finnish speakers and used them to provide pronunciation feedback on L2 samples without any L2 training data. The results show that, with a native Finnish listener's judgments as the reference, the models detect about 60% of phoneme-level mispronunciations (recall). By adding regularization, selecting the training datasets, and using a smaller model size, we achieved a comparable recall of approximately 63% with a slightly lower precision of around 29%. Compared to the state-of-the-art model in Finnish automatic speech recognition, this trade-off resulted in a significantly faster response time.
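The recall and precision figures quoted in the abstract can be illustrated with a minimal sketch. It assumes that each phoneme carries a binary reference label from the listener (mispronounced or not) and a binary model prediction. The function name `precision_recall` and the toy labels are hypothetical and are not taken from the thesis, whose evaluation procedure may differ.

```python
from typing import Sequence, Tuple


def precision_recall(
    reference: Sequence[bool], predicted: Sequence[bool]
) -> Tuple[float, float]:
    """Return (precision, recall) for phoneme-level mispronunciation flags.

    True means "mispronounced". `reference` holds the listener's labels and
    `predicted` holds the model's flags, aligned phoneme by phoneme.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")

    pairs = list(zip(reference, predicted))
    true_pos = sum(1 for r, p in pairs if r and p)        # correctly flagged
    false_pos = sum(1 for r, p in pairs if not r and p)   # flagged but fine
    false_neg = sum(1 for r, p in pairs if r and not p)   # missed errors

    flagged = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Toy data (hypothetical): 8 phonemes, 3 of them mispronounced.
reference = [True, False, False, True, False, True, False, False]
predicted = [True, True, False, False, True, True, True, False]

precision, recall = precision_recall(reference, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the model flags five phonemes, two of them correctly, and misses one of the three real errors. High recall with low precision, as reported in the abstract, corresponds to a model that catches most mispronunciations but also flags many correctly pronounced phonemes.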
Description
Supervisor
Kurimo, Mikko
Thesis advisor
Voskoboinik, Ekaterina
Grosz, Tamas
Keywords
mispronunciation detection and diagnosis, mobile app, low-resource, wav2vec 2.0, end-to-end