What happens in continued pre-training? Analysis of self-supervised speech models with continued pre-training for colloquial Finnish ASR


Access rights

openAccess
publishedVersion


A4 Article in conference proceedings

Date

2024


Language

en

Pages

5

Series

Interspeech 2024, pp. 5043-5047, Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Abstract

The advancement of self-supervised learning has enabled the rapid development of highly accurate speech recognition models, such as wav2vec 2.0, for many languages. While high-resourced languages like English benefit from purely monolingual models, other, less-resourced ones must build upon multilingual foundations. In this work, we investigate various strategies to specialize models for the colloquial Finnish language and demonstrate that continued pre-training of available multilingual models is the best solution. Furthermore, we investigate the success of the pre-training procedure by examining the learned quantized representations and show how the continued pre-training improved the discovered latent codeword groups.
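The kind of analysis described in the abstract can be reproduced in spirit with the Hugging Face transformers API. The sketch below is not the authors' code: the checkpoint name, the placeholder audio, and the variable names are assumptions. It loads a multilingual wav2vec 2.0 checkpoint of the sort used as a starting point for continued pre-training and reads out the discrete codeword indices chosen by its Gumbel quantizer, the quantity one would compare before and after continued pre-training.

# Sketch only: inspect which quantizer codewords a wav2vec 2.0 checkpoint assigns
# to an utterance. The checkpoint name and the random "audio" are placeholders.
import torch
from collections import Counter
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForPreTraining

checkpoint = "facebook/wav2vec2-xls-r-300m"  # assumed multilingual starting point
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(checkpoint)
model = Wav2Vec2ForPreTraining.from_pretrained(checkpoint).eval()

# Placeholder: one second of noise at 16 kHz; real colloquial Finnish speech would go here.
waveform = torch.randn(16_000)
inputs = feature_extractor(waveform.numpy(), sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    # The quantizer operates on the layer-normalised CNN encoder features,
    # which the base model returns as `extract_features`.
    encoder_out = model.wav2vec2(inputs.input_values)
    extract_features = encoder_out.extract_features            # (1, T, conv_dim)

    quantizer = model.quantizer                                 # Gumbel vector quantizer
    logits = quantizer.weight_proj(extract_features)            # (1, T, groups * codewords)
    logits = logits.view(1, -1, quantizer.num_groups, quantizer.num_vars)
    codeword_ids = logits.argmax(dim=-1)                        # (1, T, groups)

# Count how often each codeword is used in the first group, e.g. to compare
# codeword usage statistics between checkpoints.
usage = Counter(codeword_ids[0, :, 0].tolist())
print(f"{len(usage)} distinct codewords used in group 0 out of {quantizer.num_vars}")

Applied to both the original multilingual checkpoint and a continued-pre-trained one, the same snippet would expose how the discovered latent codeword groups change, which is the comparison the paper's analysis is concerned with.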

Description

Publisher Copyright: © 2024 International Speech Communication Association. All rights reserved.

Keywords

ASR, continued pre-training, quantized representations, wav2vec2

Citation

Getman, Y, Grósz, T & Kurimo, M 2024, 'What happens in continued pre-training? Analysis of self-supervised speech models with continued pre-training for colloquial Finnish ASR', in Interspeech 2024, Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, International Speech Communication Association (ISCA), pp. 5043-5047, Interspeech, Kos Island, Greece, 01/09/2024. https://doi.org/10.21437/Interspeech.2024-476