Low Resource Comparison of Attention-based and Hybrid ASR Exploiting wav2vec 2.0

Access rights

openAccess

A4 Article in conference proceedings

Date

2022

Language

en

Pages

3543-3547 (5 pages)

Series

Proceedings of Interspeech'22, Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Abstract

Low resource speech recognition can potentially benefit a lot from exploiting a pretrained model such as wav2vec 2.0. These pretrained models have learned useful representations in an unsupervised or self-supervised task, often leveraging a very large corpus of untranscribed speech. The pretrained models can then be used in various ways. In this work we compare two approaches which exploit wav2vec 2.0: an attention-based end-to-end model (AED), where the wav2vec 2.0 model is used in the model encoder, and a hybrid hidden Markov model (HMM/DNN) speech recognition system, where the wav2vec 2.0 model is used in the acoustic model. These approaches are compared in a very difficult Northern Sámi task, as well as an easier, simulated low resource task in Finnish. We find that the wav2vec 2.0 AED models can learn a working attention mechanism, but are still outperformed by wav2vec 2.0 HMM/DNN systems. Our best wav2vec 2.0 HMM/DNN recipe on 20 hours is competitive with an HMM/DNN system trained on 1600 hours.

Description

Funding Information: We are grateful for the Academy of Finland project funding, numbers: 337073, 345790. We acknowledge the computational resources provided by the Aalto Science-IT project.

Publisher Copyright: Copyright © 2022 ISCA.

Keywords

low resource, speech recognition, wav2vec 2.0

Citation

Rouhe, A, Virkkunen, A, Leinonen, J & Kurimo, M 2022, Low Resource Comparison of Attention-based and Hybrid ASR Exploiting wav2vec 2.0. In Proceedings of Interspeech'22. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, International Speech Communication Association (ISCA), pp. 3543-3547, Interspeech, Incheon, Korea, Republic of, 18/09/2022. https://doi.org/10.21437/Interspeech.2022-11318