Finnish ASR with deep transformer models

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.author: Jain, Abhilash [en_US]
dc.contributor.author: Rouhe, Aku [en_US]
dc.contributor.author: Grönroos, Stig Arne [en_US]
dc.contributor.author: Kurimo, Mikko [en_US]
dc.contributor.department: Speech Recognition [en_US]
dc.contributor.department: Dept Signal Process and Acoust [en_US]
dc.date.accessioned: 2021-01-25T10:18:17Z
dc.date.available: 2021-01-25T10:18:17Z
dc.date.issued: 2020 [en_US]
dc.description: openaire: EC/H2020/780069/EU//MeMAD
dc.description.abstract: Recently, BERT and Transformer-XL based architectures have achieved strong results in a range of NLP applications. In this paper, we explore Transformer architectures (BERT and Transformer-XL) as language models for a Finnish ASR task with different rescoring schemes. We achieve strong results in both an intrinsic and an extrinsic task with Transformer-XL, achieving 29% better perplexity and 3% better WER than our previous best LSTM-based approach. We also introduce a novel three-pass decoding scheme which improves the ASR performance by 8%. To the best of our knowledge, this is also the first work (i) to formulate an alpha smoothing framework to use the non-autoregressive BERT language model for an ASR task, and (ii) to explore sub-word units with Transformer-XL for an agglutinative language like Finnish. [en]
dc.description.version: Peer reviewed [en]
dc.format.extent: 5
dc.format.extent: 3630-3634
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: Jain, A., Rouhe, A., Grönroos, S. A. & Kurimo, M. 2020, 'Finnish ASR with deep transformer models', in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, vol. 2020-October, Interspeech, International Speech Communication Association (ISCA), pp. 3630-3634, Interspeech, Shanghai, China, 25/10/2020. https://doi.org/10.21437/Interspeech.2020-1784 [en]
dc.identifier.doi: 10.21437/Interspeech.2020-1784 [en_US]
dc.identifier.issn: 2308-457X
dc.identifier.other: PURE UUID: d9c019ce-f31a-42f4-9cb8-e519b2b39c32 [en_US]
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/d9c019ce-f31a-42f4-9cb8-e519b2b39c32 [en_US]
dc.identifier.other: PURE LINK: http://www.scopus.com/inward/record.url?scp=85098184485&partnerID=8YFLogxK [en_US]
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/55066357/Finnish_ASR_with_Deep_Transformer_Models.pdf [en_US]
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/102267
dc.identifier.urn: URN:NBN:fi:aalto-202101251577
dc.language.iso: en [en]
dc.publisher: International Speech Communication Association
dc.relation: info:eu-repo/grantAgreement/EC/H2020/780069/EU//MeMAD [en_US]
dc.relation.ispartof: Interspeech [en]
dc.relation.ispartofseries: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH [en]
dc.relation.ispartofseries: Volume 2020-October [en]
dc.relation.ispartofseries: Interspeech [en]
dc.rights: openAccess [en]
dc.subject.keyword: BERT [en_US]
dc.subject.keyword: Language modeling [en_US]
dc.subject.keyword: Speech recognition [en_US]
dc.subject.keyword: Transformer-XL [en_US]
dc.subject.keyword: Transformers [en_US]
dc.title: Finnish ASR with deep transformer models [en]
dc.type: Conference article in proceedings [fi]
dc.type.version: publishedVersion