Low Resource Comparison of Attention-based and Hybrid ASR Exploiting wav2vec 2.0
Access rights
Open access
A4 Article in conference proceedings
This publication is imported from Aalto University research portal.
View publication in the Research portal
View/Open full text file from the Research portal
Other link related to publication
Date
2022
Language
en
Pages
5
3543-3547
Series
Proceedings of Interspeech'22, Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Abstract
Low resource speech recognition can potentially benefit a lot from exploiting a pretrained model such as wav2vec 2.0. These pretrained models have learned useful representations in an unsupervised or self-supervised task, often leveraging a very large corpus of untranscribed speech. The pretrained models can then be used in various ways. In this work we compare two approaches which exploit wav2vec 2.0: an attention-based end-to-end model (AED), where the wav2vec 2.0 model is used in the model encoder, and a hybrid hidden Markov model (HMM/DNN) speech recognition system, where the wav2vec 2.0 model is used in the acoustic model. These approaches are compared in a very difficult Northern Sámi task, as well as an easier, simulated low resource task in Finnish. We find that the wav2vec 2.0 AED models can learn a working attention mechanism, but are still outperformed by wav2vec 2.0 HMM/DNN systems. Our best wav2vec 2.0 HMM/DNN recipe on 20 hours is competitive with an HMM/DNN system trained on 1600 hours.
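To make the AED approach concrete, the following is a minimal sketch (not the authors' recipe) of using a pretrained wav2vec 2.0 model as the encoder of an attention-based encoder-decoder, built with PyTorch and the Hugging Face transformers library; the checkpoint name, decoder size, and vocabulary are placeholder assumptions for illustration only.

```python
# Illustrative sketch: wav2vec 2.0 as the encoder of an attention-based
# encoder-decoder (AED) ASR model. Checkpoint and hyperparameters are assumptions.
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

class Wav2Vec2AED(nn.Module):
    def __init__(self, vocab_size, d_model=1024, nhead=8, dec_layers=2):
        super().__init__()
        # Pretrained wav2vec 2.0 serves as the (fine-tunable) encoder.
        self.encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-large-xlsr-53")
        self.embed = nn.Embedding(vocab_size, d_model)
        dec_layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=dec_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, input_values, target_tokens):
        # Encoder: raw 16 kHz waveform -> contextual representations (B, T, 1024).
        enc = self.encoder(input_values).last_hidden_state
        # Decoder attends over the wav2vec 2.0 representations with a causal mask.
        tgt = self.embed(target_tokens)
        causal_mask = torch.triu(
            torch.full((tgt.size(1), tgt.size(1)), float("-inf")), diagonal=1
        )
        dec = self.decoder(tgt, enc, tgt_mask=causal_mask)
        return self.out(dec)  # (B, U, vocab_size) token logits

# Usage: one second of dummy audio and a short dummy target sequence.
model = Wav2Vec2AED(vocab_size=500)
logits = model(torch.randn(1, 16000), torch.randint(0, 500, (1, 12)))
```

In the hybrid HMM/DNN variant described in the abstract, the same wav2vec 2.0 representations would instead feed a DNN acoustic model that produces per-frame state posteriors for a conventional HMM decoder, rather than an attention decoder.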
Description
Funding Information: We are grateful for the Academy of Finland project funding, grant numbers 337073 and 345790. We acknowledge the computational resources provided by the Aalto Science-IT project. Publisher Copyright: Copyright © 2022 ISCA.
Keywords
low resource, speech recognition, wav2vec 2.0
Citation
Rouhe, A., Virkkunen, A., Leinonen, J. & Kurimo, M. 2022, 'Low Resource Comparison of Attention-based and Hybrid ASR Exploiting wav2vec 2.0', in Proceedings of Interspeech'22, Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, International Speech Communication Association (ISCA), pp. 3543-3547, Interspeech, Incheon, Korea, Republic of, 18/09/2022. https://doi.org/10.21437/Interspeech.2022-11318