Self-Supervised Learning for Colloquial Finnish

School of Science (Perustieteiden korkeakoulu) | Master's thesis

Date

2024-01-22

Department

Major/Subject

Machine Learning, Data Science and Artificial Intelligence

Mcode

SCI3044

Degree programme

Master’s Programme in Computer, Communication and Information Sciences

Language

en

Pages

53

Series

Abstract

Self-supervised learning has produced strong results in automatic speech recognition (ASR) because it lets models learn speech representations directly from raw, untranscribed audio. The aim of this work was to build a self-supervised speech model for colloquial Finnish and to compare pretraining from scratch with continued pretraining. To this end, three models following the base Wav2Vec 2.0 architecture were pretrained: one from scratch, and two by continuing the pretraining of an existing model, a monolingual Finnish model in one case and a multilingual model in the other. The model pretrained from scratch and the model continued from the monolingual Finnish model performed similarly, and both gave promising results. The model continued from the monolingual Finnish model was slightly better, reaching a word error rate of 28.1% and a character error rate of 7.5%.
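
The two metrics reported in the abstract, word error rate (WER) and character error rate (CER), are conventionally defined as the edit distance between the reference transcript and the recognizer output, divided by the reference length in words or characters. The Python sketch below illustrates that standard definition only. The record does not say which scoring tool or text normalization the thesis used, so the function names, the decision to count spaces as characters, and the example sentences are all assumptions made for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences
    (minimum number of substitutions, insertions and deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER: word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def character_error_rate(reference, hypothesis):
    """CER: character-level edit distance divided by the reference length.
    Assumption: spaces are counted as characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Made-up example, not taken from the thesis data: a colloquial reference
# and a hypothesis in which one word was recognized in its standard form.
reference = "mä oon menossa kotiin"
hypothesis = "mä olen menossa kotiin"

wer = word_error_rate(reference, hypothesis)       # 1 wrong word out of 4
cer = character_error_rate(reference, hypothesis)
print(f"WER = {wer:.3f}, CER = {cer:.3f}")
```

Because a single wrong character makes the whole word count as an error, WER is normally much higher than CER on the same output, which is consistent with the gap between the 28.1% and 7.5% figures above.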

Description

Supervisor

Kurimo, Mikko

Thesis advisor

Getman, Yaroslav

Keywords

machine learning, automatic speech recognition, self-supervised learning, Wav2Vec 2.0
