Complementary repair: Enhancing small open-source large language models for repairing introductory student code with prompt diversity

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.author: Yang, Jinglin
dc.contributor.school: Perustieteiden korkeakoulu [fi]
dc.contributor.school: School of Science [en]
dc.contributor.supervisor: Hellas, Arto
dc.date.accessioned: 2025-05-20T17:01:22Z
dc.date.available: 2025-05-20T17:01:22Z
dc.date.issued: 2025-04-28
dc.description.abstract: Automated Program Repair (APR) could enhance introductory programming education by efficiently repairing errors in student code. Beyond simply providing solutions, APR can offer partial repairs as hints, aid instructors in identifying errors, and serve as a basis for generating automated feedback. Moreover, repaired student code can offer a more personalized and effective reference for post-submission learning. Recent advancements in Large Language Models (LLMs) have demonstrated impressive capabilities in various tasks, including code repair. However, the effectiveness of LLM-based repair is highly dependent on prompt design, which can significantly influence the quality of the generated solutions. Prior work has generally focused on improving prompt design, such as selecting and incorporating structurally similar reference code into the prompt, or applying advanced prompting strategies such as Chain-of-Thought (CoT) to increase repair accuracy. However, few studies have explored how to leverage the variability in prompt design to enhance overall repair accuracy for buggy code. To address this limitation, we propose ComplementaryRepair, a conversation-based APR framework that leverages diverse prompts and open-source LLMs trained on code, such as DeepSeek-Coder, to improve repair accuracy and computational efficiency. We demonstrate that combining different components (e.g., error messages, test case results, reference code) into the prompt triggers divergent repair behaviors in LLMs, leading to complementary repair: for a given set of buggy code submissions, one prompt configuration may be more likely to generate a correct repair for one subset of the errors, while a different prompt configuration may be more effective for another subset. We evaluate the performance of ComplementaryRepair on two datasets from the National University of Singapore and the University of Dublin.
Results indicate that, with our settings, ComplementaryRepair successfully repairs up to 98% of buggy submissions in the Singapore dataset and up to 95% in the Dublin dataset, requiring at most 18 repair attempts per buggy code submission. These findings demonstrate the potential of prompt diversity in optimizing APR for introductory programming assignments. [en]
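The complementary-repair idea in the abstract, trying a sequence of differently composed prompts and accepting the first candidate repair that passes the tests, can be sketched as follows. This is a minimal illustrative sketch, not the thesis's actual implementation: the function names, the callback-based interfaces, and the attempt budget are all assumptions introduced here.

```python
# Hypothetical sketch of complementary repair: cycle through diverse prompt
# configurations (e.g., with/without error messages, test results, or
# reference code) and return the first candidate repair that passes the
# tests. All names below are illustrative, not from the thesis.
from typing import Callable, Iterable, Optional

def complementary_repair(
    buggy_code: str,
    prompt_configs: Iterable[Callable[[str], str]],
    generate: Callable[[str], str],
    passes_tests: Callable[[str], bool],
    attempts_per_config: int = 3,
) -> Optional[str]:
    """Try each prompt configuration in turn; different configurations
    may succeed on different subsets of errors (complementarity)."""
    for build_prompt in prompt_configs:
        prompt = build_prompt(buggy_code)
        for _ in range(attempts_per_config):
            candidate = generate(prompt)
            if passes_tests(candidate):
                return candidate
    return None  # no configuration produced a passing repair
```

With, say, six prompt configurations and three attempts each, the total budget would be 18 attempts per buggy submission, matching the upper bound reported in the abstract; the `generate` callback would wrap a call to a code LLM such as DeepSeek-Coder.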
dc.format.extent: 54
dc.format.mimetype: application/pdf [en]
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/135603
dc.identifier.urn: URN:NBN:fi:aalto-202505203868
dc.language.iso: en [en]
dc.programme: Master's Programme in Computer, Communication and Information Sciences [en]
dc.programme.major: Computer Science [en]
dc.subject.keyword: large language models [en]
dc.subject.keyword: prompt engineering [en]
dc.subject.keyword: automated program repair [en]
dc.subject.keyword: programming assignment [en]
dc.subject.keyword: computer science education [en]
dc.subject.keyword: LLM quantization [en]
dc.title: Complementary repair: Enhancing small open-source large language models for repairing introductory student code with prompt diversity [en]
dc.type: G2 Pro gradu, diplomityö [fi]
dc.type.ontasot: Master's thesis [en]
dc.type.ontasot: Diplomityö [fi]
local.aalto.electroniconly: yes
local.aalto.openaccess: yes

Files

Original bundle
Name: master_Yang_Jinglin_2025.pdf
Size: 1.53 MB
Format: Adobe Portable Document Format