Leveraging Uncertainty for Finnish L2 Speech Scoring with LLMs


Access rights

openAccess
CC BY-NC-ND
publishedVersion

A4 Article in conference proceedings

Language

en

Pages

9

Series

The Workshop on Automatic Assessment of Atypical Speech (AAAS-2025). Proceedings of the Workshop

Abstract

Automatic speech assessment (ASA) supports learning but often requires extensive data, which is scarce for languages with fewer learners. Recent research shows that Large Language Models (LLMs) can generalize to new tasks with minimal training data using in-context learning (ICL). We find LLMs effective in estimating the proficiency of individuals learning Finnish as a second language (L2) when given a few examples of human expert grading. The proficiency grades produced by the model, when evaluating verbatim transcripts from an automatic speech recognition (ASR) system, agree with human ratings at a level comparable to the agreement between the human raters. Our experiments reveal that adding more grading demonstrations in ICL improves the model’s accuracy but, counterintuitively, increases its uncertainty when selecting an appropriate proficiency level. We show that this uncertainty can be leveraged further by creating soft labels: instead of assigning the most probable level (hard label), we aggregate the model’s confidence across all possible levels, resulting in noticeable performance improvements. Further analysis reveals that the sources of model uncertainty differ across ICL settings. In zero-shot, uncertainty stems from intrinsic response properties, such as proficiency level. In few-shot, it is driven by the relationship between the sample and the demonstrations.
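
The contrast between hard and soft labels described above can be illustrated with a minimal sketch. The abstract does not state the exact aggregation rule, so the probability-weighted mean used here is an assumption, as are the six-level scale, the example probabilities, and the function names. Shannon entropy is included only as one common way to quantify the model's uncertainty over levels; it is not necessarily the measure used in the paper.

```python
import math

def hard_label(levels, probs):
    """Return the single most probable proficiency level (argmax)."""
    return max(zip(levels, probs), key=lambda pair: pair[1])[0]

def soft_label(levels, probs):
    """Aggregate confidence across all levels.

    Assumption: aggregation is a probability-weighted mean of the levels.
    """
    return sum(level * p for level, p in zip(levels, probs))

def entropy(probs):
    """Shannon entropy in nats; higher values mean a more uncertain model."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical scale and hypothetical model confidences for one response.
levels = [1, 2, 3, 4, 5, 6]
probs = [0.0, 0.1, 0.4, 0.5, 0.0, 0.0]

print("hard:", hard_label(levels, probs))
print("soft:", round(soft_label(levels, probs), 2))
print("entropy:", round(entropy(probs), 3))
```

In this made-up case the hard label discards the substantial probability mass on level 3, while the soft label lands between levels 3 and 4, which is the kind of information the soft-label approach is meant to retain.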

Citation

Voskoboinik, E, Phan, N, Grósz, T & Kurimo, M 2025, Leveraging Uncertainty for Finnish L2 Speech Scoring with LLMs. in The Workshop on Automatic Assessment of Atypical Speech (AAAS-2025). Proceedings of the Workshop. University of Tartu Library, Workshop on Automatic Assessment of Atypical Speech, Tallinn, Estonia, 05/03/2025. <https://hdl.handle.net/10062/107137>