New data, benchmark and baseline for L2 speaking assessment for low-resource languages

Conference article in proceedings
This publication is imported from Aalto University research portal.
Date
2023
Language
en
Pages
5
166-170
Series
Proceedings of 9th Workshop on Speech and Language Technology in Education (SLaTE), ISCA International Workshop on Speech and Language Technology in Education
Abstract
The development of large multilingual speech models makes it possible to build high-quality speech technology even for low-resource languages. In this paper, we present speech data from L2 learners of Finnish and Finland Swedish that we recently collected for training and evaluating automatic speech recognition (ASR) and automatic speaking assessment (ASA) systems. The dataset includes over 4000 recordings by over 300 students per language in short read-aloud and free-form tasks. The recordings have been manually transcribed and assessed for pronunciation, fluency, range, accuracy, task achievement, and a holistic proficiency level. We also present an ASR and ASA benchmarking setup constructed using this data and include results from our baseline systems, built by fine-tuning a self-supervised multilingual model for the target language. In addition to benchmarking, our baseline system can be used by L2 students and teachers for online self-training and evaluation of oral proficiency.
Description
Workshop on Speech and Language Technology in Education: SLaTE; Conference date: 18-08-2023 through 20-08-2023
Keywords
Educational sciences, oral language skills, language assessment, Electronic, automation and communications engineering, electronics, automatic speech recognition, automatic speaking assessment
Citation
Kurimo, M, Getman, Y, Voskoboinik, E, Al-Ghezi, R, Kallio, H, Kuronen, M, von Zansen, A, Hilden, R, Kronholm, S, Huhta, A & Lindén, K 2023, New data, benchmark and baseline for L2 speaking assessment for low-resource languages. in Proceedings of 9th Workshop on Speech and Language Technology in Education (SLaTE). ISCA International Workshop on Speech and Language Technology in Education, International Speech Communication Association (ISCA), pp. 166-170, Workshop on Speech and Language Technology in Education, Dublin, Ireland, 18/08/2023. https://doi.org/10.21437/SLaTE.2023-32