Exploring ordinal classification and data augmentation techniques for spoken language assessment

School of Electrical Engineering | Master's thesis

Language

en

Pages

58

Abstract

The rapid growth of language learning, especially of English, has increased the demand for Computer-Assisted Language Learning (CALL) applications. One of the most challenging tasks in this area is the automated assessment of spontaneous second-language (L2) speech. Current spoken language assessment (SLA) models share two common limitations: (i) they ignore the ordinal structure of the proficiency scores, and (ii) they are trained on datasets that are usually imbalanced, which biases predictions towards frequent categories. This thesis addresses both limitations by applying the CORN (Conditional Ordinal Regression for Neural Networks) ordinal loss to text-based SLA models and combining it with data augmentation. Using the Speak & Improve Corpus, which contains 300 hours of transcribed and rated L2 English speech, scoring is reformulated as ordinal classification over proficiency categories rather than being trained with a cross-entropy objective. Data imbalance is mitigated through sentence-level manipulations and BERT-based masked language model augmentation. Probability calibration is additionally applied on the development set to shift predictions toward the under-represented classes. Evaluated against the baseline models of the Speak & Improve Challenge, the proposed approach reduces Root Mean Square Error (RMSE) from 0.468 to 0.411 and increases the Pearson Correlation Coefficient (PCC) from 0.726 to 0.782. The resulting system placed 2nd in the 2025 Closed Track despite using a comparatively simple text-based architecture, highlighting the effectiveness of ordinal modeling for SLA.
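
For readers unfamiliar with CORN, the following is a minimal, dependency-free sketch of the general idea, not the thesis implementation. It assumes K ordered proficiency categories encoded as integers 0..K-1 and a model that emits K-1 logits per response; the function names `corn_loss` and `corn_predict` are hypothetical. Binary task k is trained only on examples whose label is at least k and estimates P(y > k | y > k-1); chaining these conditional probabilities gives P(y > k).

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def corn_loss(logits, labels, num_classes):
    """Illustrative CORN-style loss (hypothetical helper).

    logits: list of rows, each with num_classes - 1 floats.
    labels: list of integer ranks in 0..num_classes - 1.
    """
    total, count = 0.0, 0
    for k in range(num_classes - 1):
        for row, y in zip(logits, labels):
            if y < k:
                continue  # task k only uses examples with label >= k
            target = 1.0 if y > k else 0.0
            # Clamp to avoid log(0) for extreme logits.
            p = min(max(sigmoid(row[k]), 1e-12), 1.0 - 1e-12)
            total -= target * math.log(p) + (1.0 - target) * math.log(1.0 - p)
            count += 1
    # Average binary cross-entropy over all examples actually used.
    return total / count


def corn_predict(row):
    """Predicted rank: number of thresholds k with P(y > k) > 0.5."""
    cumulative, rank = 1.0, 0
    for z in row:
        cumulative *= sigmoid(z)  # chain rule over conditional probabilities
        if cumulative > 0.5:
            rank += 1
    return rank
```

Because the cumulative products can only decrease, the predicted ranks are consistent by construction: a response cannot be predicted to exceed threshold k+1 without also exceeding threshold k, which a plain cross-entropy classifier does not guarantee.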

Supervisor

Kurimo, Mikko

Thesis advisor

Grósz, Tamás
Phan, Nhan
