Complementary repair: Enhancing small open-source large language models for repairing introductory student code with prompt diversity


School of Science | Master's thesis


Language

en

Pages

54

Abstract

Automated Program Repair (APR) could enhance introductory programming education by repairing errors in student code efficiently. Beyond simply providing solutions, APR can offer partial repairs as hints, aid instructors in identifying errors, and serve as a basis for generating automated feedback. Moreover, repaired student code can offer a more personalized and effective reference for post-submission learning. Recent advancements in Large Language Models (LLMs) have demonstrated impressive capabilities in various tasks, including code repair. However, the effectiveness of LLM-based repair is highly dependent on the design of prompts, which can significantly influence the quality of the generated solutions. Prior work has generally focused on improving prompt design, such as selecting and incorporating structurally similar reference code into the prompt, or applying advanced prompting strategies such as Chain-of-Thought (CoT) to increase repair accuracy. However, few studies have explored how to leverage the variability in prompt design to enhance overall repair accuracy for buggy code. To address this limitation, we propose ComplementaryRepair, a conversation-based APR framework that leverages diverse prompts and open-source LLMs trained on code, such as DeepSeek-Coder, to improve repair accuracy and computational efficiency. We demonstrate that combining different components (e.g., error messages, test case results, reference code) into the prompt triggers divergent repairing behaviors in LLMs, leading to complementary repair. That is, for a given set of buggy code submissions, one prompt configuration may be more likely to generate a correct repair for a subset of the errors, while a different prompt configuration may be more effective for another subset. We evaluate the performance of ComplementaryRepair on two datasets from the National University of Singapore and the University of Dublin.
Results indicate that, with our settings, ComplementaryRepair successfully repairs up to 98% of buggy submissions in the Singapore dataset and up to 95% in the Dublin dataset, requiring at most 18 repair attempts per buggy code. These findings demonstrate the potential of prompt diversity in optimizing APR for introductory programming assignments.
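The complementary-repair idea described above can be sketched in a few lines: enumerate prompt variants from subsets of optional components (error message, test results, reference code) and try each until a candidate passes the tests. This is a minimal illustrative sketch only; the component names, prompt wording, and repair loop are hypothetical and not taken from the thesis implementation.

```python
from itertools import combinations

# Hypothetical prompt components; the real framework's components and
# wording are assumptions for illustration.
COMPONENTS = {
    "error_message": "Error message: <insert compiler/interpreter output>",
    "test_results": "Failing tests: <insert inputs and expected outputs>",
    "reference_code": "Similar correct submission: <insert reference code>",
}

def build_prompts(buggy_code: str) -> list[str]:
    """Build one prompt per subset of optional components (2^3 = 8 variants)."""
    prompts = []
    names = list(COMPONENTS)
    for r in range(len(names) + 1):
        for subset in combinations(names, r):
            parts = ["Repair the following student program:", buggy_code]
            parts += [COMPONENTS[n] for n in subset]
            prompts.append("\n".join(parts))
    return prompts

def complementary_repair(buggy_code, ask_llm, passes_tests, max_attempts=18):
    """Try diverse prompts until a candidate repair passes all tests.

    `ask_llm` and `passes_tests` are caller-supplied callables; the
    18-attempt cap mirrors the budget reported in the abstract.
    """
    for attempt, prompt in enumerate(build_prompts(buggy_code), start=1):
        if attempt > max_attempts:
            break
        candidate = ask_llm(prompt)
        if passes_tests(candidate):
            return candidate
    return None  # no prompt configuration produced a passing repair
```

Because different subsets expose different context to the model, a submission that one configuration fails to repair may succeed under another, which is the complementarity the abstract refers to.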

Supervisor

Hellas, Arto
