Augmentation, Oversampling and Curriculum Learning for Small Imbalanced Speech Data

School of Electrical Engineering | Master's thesis

Mcode

ELEC3049

Language

en

Pages

81+10

Abstract

Automatic Speech Recognition (ASR) systems have seen remarkable breakthroughs in recent years, which have in turn fostered the development of ASR-supported Automatic Speaking Assessment (ASA) systems. However, their advancement faces two main challenges, data scarcity and data imbalance, especially in languages such as Finnish and Finland Swedish. This thesis explores methods that alleviate these two challenges when training ASR and ASA systems for second language (L2) speakers. Such systems can be found in applications such as language learning apps and language proficiency tests. Training these ASR systems requires transcribed L2 speech data, which is scarce in most languages. Training ASA systems additionally requires proficiency scores, which are very expensive to obtain. It is therefore important to make the most of existing datasets. This study works with an L2 Finnish dataset and an L2 Finland Swedish dataset, both of which are small (approx. 14 hours or less) and imbalanced: intermediate proficiency levels are well represented, while beginner and advanced levels have very few samples. To address these two problems, four methods were explored: 1) audio augmentation, 2) augmentation using Text-To-Speech (TTS) synthesisers, 3) oversampling with augmentation, and 4) class-wise curriculum learning. For improving ASR performance on L2 speech, audio augmentation is shown to be effective, while augmentation with TTS synthesisers has a positive impact mainly on speech of lower proficiency. For ASA training, audio augmentation alone does not yield significant improvement, but combining it with oversampling leads to the best results. Lastly, class-wise curriculum learning proved less effective than the other methods in our experiments.
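
The abstract does not describe how oversampling and augmentation were combined. As a rough illustration of the general idea only, the sketch below balances proficiency classes by repeatedly drawing utterances from under-represented classes and perturbing each duplicate so it is not an exact copy. It is not the thesis's implementation: the additive white-noise augmentation, the SNR range, the function names (`add_noise`, `oversample_with_augmentation`) and the class labels are all hypothetical choices made for this example.

```python
from collections import Counter

import numpy as np


def add_noise(wave: np.ndarray, snr_db: float, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical augmentation: add white Gaussian noise at a given SNR in dB."""
    signal_power = np.mean(wave ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=wave.shape)
    return wave + noise


def oversample_with_augmentation(waves, labels, rng, snr_range=(10.0, 30.0)):
    """Duplicate utterances of minority classes until every class matches the
    size of the largest one; each duplicate gets a freshly sampled noise level.

    The original utterances are kept unchanged at the start of the output.
    """
    counts = Counter(labels)
    target = max(counts.values())
    out_waves, out_labels = list(waves), list(labels)
    for cls, n in counts.items():
        # Indices of all utterances belonging to this proficiency class.
        idx = [i for i, y in enumerate(labels) if y == cls]
        for _ in range(target - n):
            i = int(rng.choice(idx))
            snr = rng.uniform(*snr_range)
            out_waves.append(add_noise(waves[i], snr, rng))
            out_labels.append(cls)
    return out_waves, out_labels


# Usage with toy data: 1-second random "waveforms" at an assumed 16 kHz rate.
rng = np.random.default_rng(0)
waves = [rng.normal(size=16000) for _ in range(7)]
labels = ["intermediate"] * 5 + ["beginner", "advanced"]
bal_waves, bal_labels = oversample_with_augmentation(waves, labels, rng)
print(Counter(bal_labels))  # each proficiency level now has 5 utterances
```

Perturbing the duplicates, rather than copying them verbatim, is meant to reduce the risk that a model simply memorises the few minority-class utterances; whether this helps in practice depends on the data and the chosen augmentation.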

Supervisor

Kurimo, Mikko

Thesis advisor

Voskoboinik, Ekaterina
Al-Ghezi, Ragheb