New data, benchmark and baseline for L2 speaking assessment for low-resource languages

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.author: Kurimo, Mikko [en_US]
dc.contributor.author: Getman, Yaroslav [en_US]
dc.contributor.author: Voskoboinik, Ekaterina [en_US]
dc.contributor.author: Al-Ghezi, Ragheb [en_US]
dc.contributor.author: Kallio, Heini [en_US]
dc.contributor.author: Kuronen, Mikko [en_US]
dc.contributor.author: von Zansen, Anna [en_US]
dc.contributor.author: Hilden, Raili [en_US]
dc.contributor.author: Kronholm, Sirkku [en_US]
dc.contributor.author: Huhta, Ari [en_US]
dc.contributor.author: Lindén, Krister [en_US]
dc.contributor.department: Department of Information and Communications Engineering [en]
dc.contributor.groupauthor: Speech Recognition [en]
dc.contributor.organization: Speech Recognition [en_US]
dc.contributor.organization: University of Jyväskylä [en_US]
dc.contributor.organization: University of Helsinki [en_US]
dc.date.accessioned: 2023-11-29T09:50:27Z
dc.date.available: 2023-11-29T09:50:27Z
dc.date.issued: 2023 [en_US]
dc.description: Workshop on Speech and Language Technology in Education: SLaTE; Conference date: 18-08-2023 through 20-08-2023
dc.description.abstract: The development of large multilingual speech models provides the possibility to construct high-quality speech technology even for low-resource languages. In this paper, we present the speech data of L2 learners of Finnish and Finland Swedish that we have recently collected for training and evaluation of automatic speech recognition (ASR) and speaking assessment (ASA). It includes over 4000 recordings by over 300 students per language in short read-aloud and free-form tasks. The recordings have been manually transcribed and assessed for pronunciation, fluency, range, accuracy, task achievement, and a holistic proficiency level. We also present an ASR and ASA benchmarking setup we have constructed using this data and include results from our baseline systems built by fine-tuning a self-supervised multilingual model for the target language. In addition to benchmarking, our baseline system can be used by L2 students and teachers for online self-training and evaluation of oral proficiency. [en]
dc.description.version: Peer reviewed [en]
dc.format.extent: 5
dc.format.extent: 166-170
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: Kurimo, M, Getman, Y, Voskoboinik, E, Al-Ghezi, R, Kallio, H, Kuronen, M, von Zansen, A, Hilden, R, Kronholm, S, Huhta, A & Lindén, K 2023, New data, benchmark and baseline for L2 speaking assessment for low-resource languages. In Proceedings of 9th Workshop on Speech and Language Technology in Education (SLaTE). International Speech Communication Association (ISCA), pp. 166-170, Workshop on Speech and Language Technology in Education, Dublin, Ireland, 18/08/2023. https://doi.org/10.21437/SLaTE.2023-32 [en]
dc.identifier.doi: 10.21437/SLaTE.2023-32 [en_US]
dc.identifier.issn: 2311-4975
dc.identifier.other: PURE UUID: cae10a3b-fefc-43cc-a539-3bb08c8e8d04 [en_US]
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/cae10a3b-fefc-43cc-a539-3bb08c8e8d04 [en_US]
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/128663755/kurimo23_slate.pdf [en_US]
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/124674
dc.identifier.urn: URN:NBN:fi:aalto-202311297013
dc.language.iso: en [en]
dc.publisher: ISCA - International Speech Communication Association
dc.relation.ispartof: Workshop on Speech and Language Technology in Education [en]
dc.relation.ispartofseries: Proceedings of 9th Workshop on Speech and Language Technology in Education (SLaTE) [en]
dc.relation.ispartofseries: ISCA International Workshop on Speech and Language Technology in Education [en]
dc.rights: openAccess [en]
dc.subject.keyword: Educational sciences [en_US]
dc.subject.keyword: suullinen kielitaito [en_US]
dc.subject.keyword: kielitaidon arviointi [en_US]
dc.subject.keyword: oral language skills [en_US]
dc.subject.keyword: language assessment [en_US]
dc.subject.keyword: Electronic [en_US]
dc.subject.keyword: automation and communications engineering [en_US]
dc.subject.keyword: electronics [en_US]
dc.subject.keyword: puheentunnistus [en_US]
dc.subject.keyword: automaattinen puheen arviointi [en_US]
dc.subject.keyword: automatic speech recognition [en_US]
dc.subject.keyword: automatic speaking assessment [en_US]
dc.title: New data, benchmark and baseline for L2 speaking assessment for low-resource languages [en]
dc.type: Conference article in proceedings [fi]
dc.type.version: publishedVersion