Investigating wav2vec2 context representations and the effects of fine-tuning, a case-study of a Finnish model
Access rights
openAccess
publishedVersion
A4 Article in a conference publication
This publication is imported from Aalto University research portal.
View publication in the Research portal
View/Open full text file from the Research portal
Other link related to publication
Date
2023-08-20
Language
en
Pages
5
Series
Proceedings of Interspeech 2023, pp. 196-200, Interspeech
Abstract
Self-supervised speech models, such as wav2vec2, have become extremely popular in the past few years. Their main appeal is that, after pre-training on a large amount of audio, they require only a small amount of supervised finetuning data to achieve outstanding results. Despite their immense success, very little is understood about the pre-trained models and how finetuning changes them. In this work, we take the first steps towards a better understanding of wav2vec2 systems using model interpretation tools such as visualization and latent embedding clustering. Through our analysis, we gain new insights into the abilities of the pre-trained networks and the effect that finetuning has on them. We demonstrate that the clusters learned by the pre-trained model are just as important a factor as the supervised training data distribution in determining the accuracy of the finetuned system, which could aid us in selecting the most suitable pre-trained model for the supervised data.
Citation
Grosz, T, Getman, Y, Al-Ghezi, R, Rouhe, A & Kurimo, M 2023, Investigating wav2vec2 context representations and the effects of fine-tuning, a case-study of a Finnish model. in Proceedings of Interspeech 2023. Interspeech, International Speech Communication Association (ISCA), pp. 196-200, Interspeech, Dublin, Ireland, 20/08/2023. https://doi.org/10.21437/interspeech.2023-837