Evaluating Distance Measures for Program Repair

dc.contributor: Aalto-yliopisto (fi)
dc.contributor: Aalto University (en)
dc.contributor.author: Koutcheme, Charles (en_US)
dc.contributor.author: Sarsa, Sami (en_US)
dc.contributor.author: Leinonen, Juho (en_US)
dc.contributor.author: Haaranen, Lassi (en_US)
dc.contributor.author: Hellas, Arto (en_US)
dc.contributor.department: Department of Computer Science (en)
dc.contributor.groupauthor: Lecturer Hellas Arto group (en)
dc.contributor.groupauthor: Lecturer Haaranen Lassi group (en)
dc.contributor.groupauthor: Computer Science Lecturers (en)
dc.contributor.groupauthor: Computer Science - Computing education research and educational technology (CER) (en)
dc.contributor.organization: Lecturer Hellas Arto group (en_US)
dc.contributor.organization: University of Auckland (en_US)
dc.date.accessioned: 2023-09-20T06:24:35Z
dc.date.available: 2023-09-20T06:24:35Z
dc.date.issued: 2023-09-10 (en_US)
dc.description.abstract (en):
Background and Context: Struggling with programming assignments while learning to program is a common phenomenon in programming courses around the world. Supporting struggling students is a common theme in Computing Education Research (CER), where a wide variety of support methods have been created and evaluated. An important stream of research here focuses on program repair, where methods for automatically fixing erroneous code are used to support students as they debug their code. Work in this area has so far assessed the performance of the methods by evaluating the closeness of the proposed fixes to the original erroneous code. These evaluations have mainly relied on edit distance measures such as the sequence edit distance, and there is a lack of research on which distance measure is the most appropriate.
Objectives: To provide insight into measures for quantifying the distance between erroneous code written by a student and a proposed fix. We conduct the evaluation in an introductory programming context, where insight into the distance measures can help in choosing a suitable metric that informs which fixes should be suggested to novices.
Method: A team of five experts annotated a subset of the Dublin dataset, creating solutions for over a thousand erroneous programs written by students. We evaluated how the prominent edit distance measures from the CER literature compare against measures used in Natural Language Processing (NLP) tasks for retrieving the experts' solutions from a pool of proposed solutions. We also evaluated how the expert-generated solutions compare against the solutions proposed by common program repair algorithms. The annotated dataset and the evaluation code are published as part of the work.
Findings: Our results highlight that the ROUGE score, classically used for evaluating automatic summarization, performs well as an evaluation and selection metric for program repair. We also highlight the practical utility of NLP metrics, which allow easier interpretation and comparison of the performance of repair techniques than the classic methods used in the CER literature.
Implications: Our study highlights the variety of distance metrics used for comparing source code. We find issues with the classically used distance measures that can be addressed by using NLP metrics. Based on our findings, we recommend including NLP metrics, and in particular the ROUGE metric, in evaluations of new program repair methodologies. We also suggest incorporating NLP metrics into other areas where source code is compared, including plagiarism detection.
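The two families of metrics the abstract contrasts can be sketched side by side. The following is an illustrative sketch only, not the paper's implementation: a token-level sequence edit distance (Levenshtein) and a ROUGE-1 F1 unigram-overlap score, applied to a hypothetical student program and candidate repair of our own invention.

```python
# Hedged sketch: sequence edit distance vs. ROUGE-1 on tokenized programs.
# The toy programs and function names below are illustrative assumptions.
from collections import Counter

def edit_distance(a, b):
    """Levenshtein distance between two token sequences (one-row DP)."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                      # delete a[i-1]
                        dp[j - 1] + 1,                  # insert b[j-1]
                        prev + (a[i - 1] != b[j - 1]))  # substitute
            prev = cur
    return dp[n]

def rouge1_f1(reference, candidate):
    """ROUGE-1: F1 score over unigram (token) overlap."""
    ref, cand = Counter(reference), Counter(candidate)
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# A student's buggy program and a candidate repair, as token lists.
buggy = "def mean ( xs ) : return sum ( xs ) / 0".split()
fixed = "def mean ( xs ) : return sum ( xs ) / len ( xs )".split()

print(edit_distance(buggy, fixed))        # 4 token edits
print(round(rouge1_f1(fixed, buggy), 3))  # 0.828
```

A repair needing fewer edits keeps more of the student's own code; ROUGE expresses the same closeness on a bounded 0-to-1 scale, which is what makes scores directly comparable across repair techniques, one of the practical advantages the abstract attributes to NLP metrics.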
dc.description.version: Peer reviewed (en)
dc.format.extent: 495–507
dc.format.mimetype: application/pdf (en_US)
dc.identifier.citation: Koutcheme, C, Sarsa, S, Leinonen, J, Haaranen, L & Hellas, A 2023, Evaluating Distance Measures for Program Repair. In ICER '23: Proceedings of the 2023 ACM Conference on International Computing Education Research - Volume 1. ACM, pp. 495–507, ACM Conference on International Computing Education Research, Chicago, Illinois, United States, 08/08/2023. https://doi.org/10.1145/3568813.3600130 (en)
dc.identifier.doi: 10.1145/3568813.3600130 (en_US)
dc.identifier.isbn: 978-1-4503-9976-0
dc.identifier.other: PURE UUID: c386639f-6769-4fec-9c63-b5e8dd3ba72d (en_US)
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/c386639f-6769-4fec-9c63-b5e8dd3ba72d (en_US)
dc.identifier.other: PURE LINK: http://www.scopus.com/inward/record.url?scp=85174226198&partnerID=8YFLogxK (en_US)
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/122216612/Evaluating_Distance_Measures_for_Program_Repair.pdf (en_US)
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/123659
dc.identifier.urn: URN:NBN:fi:aalto-202309206017
dc.language.iso: en (en)
dc.relation.ispartof: ACM Conference on International Computing Education Research (en)
dc.relation.ispartofseries: ICER '23: Proceedings of the 2023 ACM Conference on International Computing Education Research - Volume 1 (en)
dc.rights: openAccess (en)
dc.title: Evaluating Distance Measures for Program Repair (en)
dc.type: A4 Artikkeli konferenssijulkaisussa [A4 Article in conference proceedings] (fi)
dc.type.version: publishedVersion