Exploring ordinal classification and data augmentation techniques for spoken language assessment
School of Electrical Engineering | Master's thesis
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for your own personal use. Commercial use is prohibited.
Language: en
Pages: 58
Abstract
The rapid growth of language learning, especially of English, has increased the demand for Computer-Assisted Language Learning (CALL) applications. One of the most challenging tasks in this area is the automated assessment of spontaneous second-language (L2) speech. Current spoken language assessment (SLA) models have two common limitations: (i) they ignore the ordinal structure of the proficiency scores, and (ii) the datasets are usually imbalanced, which biases predictions towards frequent categories. This thesis addresses both limitations by applying the CORN ordinal loss (Conditional Ordinal Regression for Neural Networks) to text-based SLA models and combining it with data augmentation. Using the Speak & Improve Corpus, with 300 hours of transcribed and rated L2 English speech, scoring is reformulated as ordinal classification over proficiency categories instead of standard cross-entropy classification. Data imbalance is mitigated through sentence-level manipulations and BERT-based masked language model augmentation. Probability calibration is additionally applied on the development set to rebalance predictions toward the under-represented classes. Evaluated against the baseline models of the Speak & Improve Challenge, the proposed approach reduces Root Mean Square Error from 0.468 to 0.411 and increases the Pearson Correlation Coefficient from 0.726 to 0.782. The resulting system placed 2nd in the 2025 Closed Track despite using a comparatively simple text-based architecture, highlighting the effectiveness of ordinal modeling for SLA.
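To make the ordinal reformulation concrete, the sketch below shows the general CORN idea (Shi et al.): a model emits K-1 conditional logits, where logit k models P(y > k | y > k-1), and each binary task is trained only on the examples that satisfy its condition. This is a minimal pure-Python illustration of the published technique, not the thesis's actual implementation; the function names and list-based inputs are assumptions for the example.

```python
import math

def corn_loss(logits, labels, num_classes):
    """CORN loss for ordinal labels 0..num_classes-1.

    logits: per-example lists of num_classes-1 raw scores, where
            score k models P(y > k | y > k-1).
    labels: integer ordinal labels.
    """
    total, count = 0.0, 0
    for k in range(num_classes - 1):
        for z, y in zip(logits, labels):
            if y >= k:  # conditional subset: only examples with y > k-1
                target = 1.0 if y > k else 0.0
                p = 1.0 / (1.0 + math.exp(-z[k]))  # sigmoid of logit k
                # binary cross-entropy on the indicator [y > k]
                total += -(target * math.log(p) + (1 - target) * math.log(1 - p))
                count += 1
    return total / count

def corn_predict(z):
    """Predicted rank = number of cumulative probs P(y > k) above 0.5.

    P(y > k) is the product of the conditional sigmoids up to k, so it
    is non-increasing in k and counting thresholds gives the rank.
    """
    p, rank = 1.0, 0
    for zk in z:
        p *= 1.0 / (1.0 + math.exp(-zk))
        if p > 0.5:
            rank += 1
    return rank
```

With confident, correct logits the loss is near zero, and the predicted rank recovers the ordinal label; this rank-consistency property is what distinguishes the ordinal formulation from plain cross-entropy over unordered classes.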
Supervisor
Kurimo, Mikko

Thesis advisors
Grósz, Tamás
Phan, Nhan