Evaluating Language Models for Generating and Judging Programming Feedback
Access rights
openAccess
CC BY
publishedVersion
URL
Journal Title
Journal ISSN
Volume Title
A4 Article in conference proceedings
This publication is imported from the Aalto University research portal.
View publication in the Research portal (opens in new window)
View/Open full text file from the Research portal (opens in new window)
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for your own personal use. Commercial use is prohibited.
Authors
Koutcheme, Charles
Dainese, Nicola
Sarsa, Sami
Hellas, Arto
Leinonen, Juho
Ashraf, Syed
Denny, Paul
Date
Department
Major/Subject
Mcode
Degree programme
Language
en
Pages
7
Series
SIGCSE TS 2025 - Proceedings of the 56th ACM Technical Symposium on Computer Science Education, Volume 1, pp. 624-630
Abstract
The emergence of large language models (LLMs) has transformed research and practice across a wide range of domains. Within the computing education research (CER) domain, LLMs have garnered significant attention, particularly in the context of learning programming. Much of the work on LLMs in CER, however, has focused on applying and evaluating proprietary models. In this article, we evaluate the efficiency of open-source LLMs in generating high-quality feedback for programming assignments and judging the quality of programming feedback, contrasting the results with proprietary models. Our evaluations on a dataset of students’ submissions to introductory Python programming exercises suggest that state-of-the-art open-source LLMs are nearly on par with proprietary models in both generating and assessing programming feedback. Additionally, we demonstrate the efficiency of smaller LLMs in these tasks and highlight the wide range of LLMs accessible, even for free, to educators and practitioners.
Description
Publisher Copyright: © 2025 Copyright held by the owner/author(s).
Other note
Citation
Koutcheme, C, Dainese, N, Sarsa, S, Hellas, A, Leinonen, J, Ashraf, S & Denny, P 2025, Evaluating Language Models for Generating and Judging Programming Feedback. in SIGCSE TS 2025 - Proceedings of the 56th ACM Technical Symposium on Computer Science Education. vol. 1, ACM, pp. 624-630, ACM Technical Symposium on Computer Science Education, Pittsburgh, Pennsylvania, United States, 26/02/2025. https://doi.org/10.1145/3641554.3701791