Semi-supervised learning using pseudo-labels: A case study in Northern Sámi ASR
School of Electrical Engineering
Master's thesis
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for your own personal use. Commercial use is prohibited.
Language
en
Pages
50
Abstract
While self-supervised learning has advanced automatic speech recognition (ASR) for low-resource languages, limited labeled data remains a bottleneck that often prevents reasonable downstream performance. Semi-supervised learning approaches have therefore gained prominence as data-efficient methods that can help even when supervised data are scarce. This work investigates the usefulness of semi-supervised learning, specifically pseudo-labeling with self-supervised speech models, for Northern Sámi, an under-resourced agglutinative language. Within a teacher–student framework, pseudo-labels are incorporated using two strategies: (a) filtering them through WER-based agreement between the models, and (b) using all pseudo-labels generated by a teacher model. Results show that pseudo-labels from the larger teacher foundation model can improve the downstream performance of a smaller one without additional labeled data, yielding 33.30–3.84% relative character error rate reduction (CERR) across three out-of-distribution test sets. The smaller student model is evaluated in a supervised setting (20 hours of labeled data) and in semi-supervised settings with 20, 26 and 75 hours of pseudo-labeled data. Notably, the student model can benefit as much from pseudo-labels as from labeled data, or even more. Further gains are achieved when annotated and pseudo-labeled data are combined in a two-stage fine-tuning process rather than mixed in a single training phase.
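Strategy (a) above keeps a teacher-generated transcript only when the student's own hypothesis for the same utterance agrees with it closely enough in WER. A minimal sketch of that filtering step is shown below; the function names, the data layout, and the 0.3 threshold are illustrative assumptions, not details taken from the thesis.

```python
# Sketch of WER-based agreement filtering of pseudo-labels (strategy (a)).
# An utterance's teacher transcript is kept as a pseudo-label only if the
# student model's hypothesis agrees with it below a WER threshold.

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

def filter_pseudo_labels(triples, max_wer=0.3):
    """Given (utterance_id, teacher_hyp, student_hyp) triples, return the
    (utterance_id, teacher_hyp) pairs whose teacher/student WER agreement
    is within the threshold."""
    return [(uid, teacher) for uid, teacher, student in triples
            if wer(teacher, student) <= max_wer]
```

Strategy (b) corresponds to skipping this filter and training the student on every teacher transcript; the two-stage variant reported in the abstract would then fine-tune first on one data source and afterwards on the other, rather than pooling them.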
Supervisor
Kurimo, Mikko
Thesis advisors
Grósz, Tamás
Getman, Yaroslav