An Equal Data Setting for Attention-Based Encoder-Decoder and HMM/DNN Models: A Case Study in Finnish ASR

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.author: Rouhe, Aku [en_US]
dc.contributor.author: Van Camp, Astrid [en_US]
dc.contributor.author: Singh, Mittul [en_US]
dc.contributor.author: Van Hamme, Hugo [en_US]
dc.contributor.author: Kurimo, Mikko [en_US]
dc.contributor.department: Department of Signal Processing and Acoustics [en]
dc.contributor.editor: Karpov, Alexey [en_US]
dc.contributor.editor: Potapova, Rodmonga [en_US]
dc.contributor.groupauthor: Speech Recognition [en]
dc.contributor.organization: KU Leuven [en_US]
dc.date.accessioned: 2021-10-20T06:17:27Z
dc.date.available: 2021-10-20T06:17:27Z
dc.date.issued: 2021 [en_US]
dc.description: | openaire: EC/H2020/780069/EU//MeMAD
dc.description.abstract: Standard end-to-end training of attention-based ASR models only uses transcribed speech. If they are compared to HMM/DNN systems, which additionally leverage a large corpus of text-only data and expert-crafted lexica, the differences in modeling cannot be disentangled from differences in data. We propose an experimental setup, where only transcribed speech is used to train both model types. To highlight the difference that text-only data can make, we use Finnish, where an expert-crafted lexicon is not needed. With 1500h equal data, we find that both ASR paradigms perform similarly, but adding text data quickly improves the HMM/DNN system. On a smaller 160h subset we find that HMM/DNN models outperform AED models. [en]
dc.description.version: Peer reviewed [en]
dc.format.extent: 12
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: Rouhe, A, Van Camp, A, Singh, M, Van Hamme, H & Kurimo, M 2021, An Equal Data Setting for Attention-Based Encoder-Decoder and HMM/DNN Models: A Case Study in Finnish ASR. in A Karpov & R Potapova (eds), Speech and Computer - 23rd International Conference, SPECOM 2021, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 12997 LNAI, Springer, pp. 602-613, International Conference on Speech and Computer, Virtual, Online, 27/09/2021. https://doi.org/10.1007/978-3-030-87802-3_54 [en]
dc.identifier.doi: 10.1007/978-3-030-87802-3_54 [en_US]
dc.identifier.isbn: 9783030878016
dc.identifier.issn: 0302-9743
dc.identifier.issn: 1611-3349
dc.identifier.other: PURE UUID: 53a7261a-503e-4a5a-9683-849902529a59 [en_US]
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/53a7261a-503e-4a5a-9683-849902529a59 [en_US]
dc.identifier.other: PURE LINK: http://www.scopus.com/inward/record.url?scp=85116359931&partnerID=8YFLogxK
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/74623755/E2E_vs_HMM_SPE_Com.pdf [en_US]
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/110467
dc.identifier.urn: URN:NBN:fi:aalto-202110209650
dc.language.iso: en [en]
dc.relation: info:eu-repo/grantAgreement/EC/H2020/780069/EU//MeMAD [en_US]
dc.relation.ispartof: International Conference on Speech and Computer [en]
dc.relation.ispartofseries: Speech and Computer - 23rd International Conference, SPECOM 2021, Proceedings [en]
dc.relation.ispartofseries: pp. 602-613 [en]
dc.relation.ispartofseries: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) ; Volume 12997 LNAI [en]
dc.rights: openAccess [en]
dc.subject.keyword: Attention-based Encoder-Decoder [en_US]
dc.subject.keyword: Equal data [en_US]
dc.subject.keyword: HMM/DNN [en_US]
dc.title: An Equal Data Setting for Attention-Based Encoder-Decoder and HMM/DNN Models: A Case Study in Finnish ASR [en]
dc.type: A4 Artikkeli konferenssijulkaisussa [fi]
dc.type.version: acceptedVersion
