Exploring the Responses of Large Language Models to Beginner Programmers’ Help Requests

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.author: Hellas, Arto [en_US]
dc.contributor.author: Leinonen, Juho [en_US]
dc.contributor.author: Sarsa, Sami [en_US]
dc.contributor.author: Koutcheme, Charles [en_US]
dc.contributor.author: Kujanpää, Lilja [en_US]
dc.contributor.author: Sorva, Juha [en_US]
dc.contributor.department: Department of Computer Science [en]
dc.contributor.groupauthor: Computer Science Lecturers [en]
dc.contributor.groupauthor: Computer Science - Computing education research and educational technology (CER) [en]
dc.contributor.groupauthor: Lecturer Hellas Arto group [en]
dc.contributor.groupauthor: Computer Science - Computing Systems (ComputingSystems) [en]
dc.contributor.groupauthor: Lecturer Sorva Juha group [en]
dc.contributor.organization: Lecturer Hellas Arto group [en_US]
dc.contributor.organization: Aalto University [en_US]
dc.contributor.organization: University of Auckland [en_US]
dc.date.accessioned: 2023-09-20T06:24:06Z
dc.date.available: 2023-09-20T06:24:06Z
dc.date.issued: 2023-09-10 [en_US]
dc.description.abstract: Background and Context: Over the past year, large language models (LLMs) have taken the world by storm. In computing education, like in other walks of life, many opportunities and threats have emerged as a consequence. Objectives: In this article, we explore such opportunities and threats in a specific area: responding to student programmers' help requests. More specifically, we assess how good LLMs are at identifying issues in problematic code that students request help on. Method: We collected a sample of help requests and code from an online programming course. We then prompted two different LLMs (OpenAI Codex and GPT-3.5) to identify and explain the issues in the students' code and assessed the LLM-generated answers both quantitatively and qualitatively. Findings: GPT-3.5 outperforms Codex in most respects. Both LLMs frequently find at least one actual issue in each student program (GPT-3.5 in 90% of the cases). Neither LLM excels at finding all the issues (GPT-3.5 finding them 57% of the time). False positives are common (40% chance for GPT-3.5). The advice that the LLMs provide on the issues is often sensible. The LLMs perform better on issues involving program logic rather than on output formatting. Model solutions are frequently provided even when the LLM is prompted not to. LLM responses to prompts in a non-English language are only slightly worse than responses to English prompts. Implications: Our results continue to highlight the utility of LLMs in programming education. At the same time, the results highlight the unreliability of LLMs: LLMs make some of the same mistakes that students do, perhaps especially when formatting output as required by automated assessment systems. Our study informs teachers interested in using LLMs as well as future efforts to customize LLMs for the needs of programming education. [en]
dc.description.version: Peer reviewed [en]
dc.format.extent: 93–105
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: Hellas, A, Leinonen, J, Sarsa, S, Koutcheme, C, Kujanpää, L & Sorva, J 2023, Exploring the Responses of Large Language Models to Beginner Programmers’ Help Requests. in ICER '23: Proceedings of the 2023 ACM Conference on International Computing Education Research - Volume 1. ACM, pp. 93–105, ACM Conference on International Computing Education Research, Chicago, Illinois, United States, 08/08/2023. https://doi.org/10.1145/3568813.3600139 [en]
dc.identifier.doi: 10.1145/3568813.3600139 [en_US]
dc.identifier.isbn: 978-1-4503-9976-0
dc.identifier.other: PURE UUID: 4abb7eb3-76b1-4375-9819-44735d074ecb [en_US]
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/4abb7eb3-76b1-4375-9819-44735d074ecb [en_US]
dc.identifier.other: PURE LINK: http://www.scopus.com/inward/record.url?scp=85170200108&partnerID=8YFLogxK [en_US]
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/122216038/Exploring_the_Responses_of_Large_Language_Models_to_Beginner_Programmers_Help_Requests.pdf [en_US]
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/123650
dc.identifier.urn: URN:NBN:fi:aalto-202309206008
dc.language.iso: en [en]
dc.relation.ispartof: ACM Conference on International Computing Education Research [en]
dc.relation.ispartofseries: ICER '23: Proceedings of the 2023 ACM Conference on International Computing Education Research - Volume 1 [en]
dc.rights: openAccess [en]
dc.title: Exploring the Responses of Large Language Models to Beginner Programmers’ Help Requests [en]
dc.type: A4 Artikkeli konferenssijulkaisussa (A4 Article in conference proceedings) [fi]
dc.type.version: publishedVersion
