Solving Proof Block Problems Using Large Language Models

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.author: Poulsen, Seth [en_US]
dc.contributor.author: Sarsa, Sami [en_US]
dc.contributor.author: Prather, James [en_US]
dc.contributor.author: Leinonen, Juho [en_US]
dc.contributor.author: Becker, Brett A. [en_US]
dc.contributor.author: Hellas, Arto [en_US]
dc.contributor.author: Denny, Paul [en_US]
dc.contributor.author: Reeves, Brent N. [en_US]
dc.contributor.department: Department of Computer Science [en]
dc.contributor.groupauthor: Lecturer Hellas Arto group [en]
dc.contributor.groupauthor: Computer Science Lecturers [en]
dc.contributor.groupauthor: Computer Science - Computing education research and educational technology (CER) - Research area [en]
dc.contributor.organization: Utah State University [en_US]
dc.contributor.organization: Department of Computer Science [en_US]
dc.contributor.organization: Abilene Christian University [en_US]
dc.contributor.organization: University College Dublin [en_US]
dc.contributor.organization: University of Auckland [en_US]
dc.date.accessioned: 2024-05-15T07:55:57Z
dc.date.available: 2024-05-15T07:55:57Z
dc.date.issued: 2024-03-07 [en_US]
dc.description: Publisher Copyright: © 2024 Owner/Author.
dc.description.abstract: Large language models (LLMs) have recently taken many fields, including computer science, by storm. Most recent work on LLMs in computing education has shown that they are capable of solving most introductory programming (CS1) exercises, exam questions, Parsons problems, and several other types of exercises and questions. Some work has investigated the ability of LLMs to solve CS2 problems as well. However, it remains unclear how well LLMs fare against more advanced upper-division coursework, such as proofs in algorithms courses. After all, while known to be proficient in many programming tasks, LLMs have been shown to have more difficulties in forming mathematical proofs. In this paper, we investigate the ability of LLMs to solve mathematical proofs by using Proof Blocks, a tool previously shown to efficaciously teach proofs to students. Our results show that GPT-3.5 is almost completely unable to provide correct solutions (11.4%), while GPT-4 shows a significant increase in correctness (64.8%). However, even given this improvement, current models still struggle to correctly order lines in a proof. It remains an open question whether this is a temporary situation or if LLMs will continue to struggle to solve these types of exercises in the future. [en]
dc.description.version: Peer reviewed [en]
dc.format.extent: 7
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: Poulsen, S, Sarsa, S, Prather, J, Leinonen, J, Becker, B A, Hellas, A, Denny, P & Reeves, B N 2024, Solving Proof Block Problems Using Large Language Models. In SIGCSE 2024 - Proceedings of the 55th ACM Technical Symposium on Computer Science Education. ACM, pp. 1063-1069, ACM Technical Symposium on Computer Science Education, Portland, United States, 20/03/2024. https://doi.org/10.1145/3626252.3630928 [en]
dc.identifier.doi: 10.1145/3626252.3630928 [en_US]
dc.identifier.isbn: 979-8-4007-0423-9
dc.identifier.other: PURE UUID: e589f41f-e85b-484b-8a8b-8581e20de0e3 [en_US]
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/e589f41f-e85b-484b-8a8b-8581e20de0e3 [en_US]
dc.identifier.other: PURE LINK: http://www.scopus.com/inward/record.url?scp=85185719497&partnerID=8YFLogxK
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/145820976/SCI_Poulsen_etal_SIGCSE_2024.pdf [en_US]
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/127761
dc.identifier.urn: URN:NBN:fi:aalto-202405153375
dc.language.iso: en [en]
dc.relation.ispartof: ACM Technical Symposium on Computer Science Education [en]
dc.relation.ispartofseries: SIGCSE 2024 - Proceedings of the 55th ACM Technical Symposium on Computer Science Education [en]
dc.relation.ispartofseries: pp. 1063-1069 [en]
dc.rights: openAccess [en]
dc.subject.keyword: ai [en_US]
dc.subject.keyword: algorithms [en_US]
dc.subject.keyword: artificial intelligence [en_US]
dc.subject.keyword: chatgpt [en_US]
dc.subject.keyword: code generation [en_US]
dc.subject.keyword: generative ai [en_US]
dc.subject.keyword: gpt-3 [en_US]
dc.subject.keyword: gpt-4 [en_US]
dc.subject.keyword: large language models [en_US]
dc.subject.keyword: openai [en_US]
dc.subject.keyword: proof blocks [en_US]
dc.subject.keyword: proofs [en_US]
dc.title: Solving Proof Block Problems Using Large Language Models [en]
dc.type: A4 Artikkeli konferenssijulkaisussa (A4 Article in conference proceedings) [fi]
dc.type.version: publishedVersion
