Direct Repair Optimization: Training Small Language Models For Educational Program Repair Improves Feedback
Access rights
openAccess
CC BY
publishedVersion
A4 Article in conference proceedings
This publication is imported from Aalto University research portal.
Unless otherwise stated, all rights belong to the author. You may download, display, and print this publication for your own personal use. Commercial use is prohibited.
Language
en
Series
Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025), pp. 564–581
Abstract
Locally deployed Small Language Models (SLMs) offer a promising solution for providing timely and effective programming feedback to students learning to code. However, SLMs often produce misleading or hallucinated feedback, limiting their reliability in educational settings. Current approaches for improving SLM feedback rely on existing human annotations or LLM-generated feedback. This paper addresses a fundamental challenge: can we improve SLMs' feedback capabilities without relying on human or LLM-generated annotations? We demonstrate that training SLMs on the proxy task of program repair is sufficient to enhance their ability to generate high-quality feedback. To this end, we introduce Direct Repair Optimization (DRO), a self-supervised online reinforcement learning strategy that trains language models to reason about how to efficiently fix students' programs. Our experiments, using DRO to fine-tune LLaMA-3.1-3B and Qwen-2.5-3B on a large-scale dataset of Python submissions from real students, show substantial improvements on downstream feedback tasks. We release our code to support further research in educational feedback and highlight promising directions for future work.
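The abstract does not detail DRO's training objective, but a self-supervised reward for program repair can be built without any human or LLM annotations: unit tests supply the correctness signal, and an edit-size term favors minimal, "efficient" fixes. The sketch below is a hypothetical illustration of such a reward (the function names `repair_reward` and `make_test` and the exact scoring formula are assumptions, not the paper's method):

```python
import difflib

def repair_reward(buggy_src, repaired_src, tests):
    """Hypothetical repair reward (not the paper's exact objective):
    fraction of unit tests the repaired program passes, scaled by its
    textual similarity to the buggy original so that minimal repairs
    score higher than wholesale rewrites."""
    pass_rate = sum(1 for t in tests if t(repaired_src)) / len(tests)
    similarity = difflib.SequenceMatcher(None, buggy_src, repaired_src).ratio()
    return pass_rate * similarity

def make_test(fn_name, args, expected):
    """Build a self-supervised test: exec a candidate source string
    and check a single call against its expected value."""
    def run(src):
        ns = {}
        try:
            exec(src, ns)
            return ns[fn_name](*args) == expected
        except Exception:
            return False  # crashing or malformed code fails the test
    return run

# A student's buggy submission and a candidate model repair.
buggy = "def add(a, b):\n    return a - b\n"
fixed = "def add(a, b):\n    return a + b\n"
tests = [make_test("add", (2, 3), 5), make_test("add", (0, 0), 0)]
```

A reinforcement learning loop would sample candidate repairs from the SLM and use such a scalar reward as the training signal; here, `repair_reward(buggy, fixed, tests)` exceeds `repair_reward(buggy, buggy, tests)`, so the policy is pushed toward the minimal correct fix.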
Citation
Koutcheme, C, Dainese, N & Hellas, A 2025, 'Direct Repair Optimization: Training Small Language Models For Educational Program Repair Improves Feedback'. in E Kochmar, B Alhafni, M Bexte, J Burstein, A Horbach, R Laarmann-Quante, A Tack, V Yaneva & Z Yuan (eds), Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025). Association for Computational Linguistics, pp. 564–581, Workshop on Innovative Use of NLP for Building Educational Applications, Vienna, Austria, 31/07/2025. <https://aclanthology.org/2025.bea-1.41/>