Solving Proof Block Problems Using Large Language Models
dc.contributor | Aalto-yliopisto | fi |
dc.contributor | Aalto University | en |
dc.contributor.author | Poulsen, Seth | en_US |
dc.contributor.author | Sarsa, Sami | en_US |
dc.contributor.author | Prather, James | en_US |
dc.contributor.author | Leinonen, Juho | en_US |
dc.contributor.author | Becker, Brett A. | en_US |
dc.contributor.author | Hellas, Arto | en_US |
dc.contributor.author | Denny, Paul | en_US |
dc.contributor.author | Reeves, Brent N. | en_US |
dc.contributor.department | Department of Computer Science | en |
dc.contributor.groupauthor | Lecturer Hellas Arto group | en |
dc.contributor.groupauthor | Computer Science Lecturers | en |
dc.contributor.groupauthor | Computer Science - Computing education research and educational technology (CER) - Research area | en |
dc.contributor.organization | Utah State University | en_US |
dc.contributor.organization | Department of Computer Science | en_US |
dc.contributor.organization | Abilene Christian University | en_US |
dc.contributor.organization | University College Dublin | en_US |
dc.contributor.organization | University of Auckland | en_US |
dc.date.accessioned | 2024-05-15T07:55:57Z | |
dc.date.available | 2024-05-15T07:55:57Z | |
dc.date.issued | 2024-03-07 | en_US |
dc.description | Publisher Copyright: © 2024 Owner/Author. | |
dc.description.abstract | Large language models (LLMs) have recently taken many fields, including computer science, by storm. Most recent work on LLMs in computing education has shown that they are capable of solving most introductory programming (CS1) exercises, exam questions, Parsons problems, and several other types of exercises and questions. Some work has investigated the ability of LLMs to solve CS2 problems as well. However, it remains unclear how well LLMs fare against more advanced upper-division coursework, such as proofs in algorithms courses. After all, while known to be proficient in many programming tasks, LLMs have been shown to have more difficulty forming mathematical proofs. In this paper, we investigate the ability of LLMs to solve mathematical proofs by using Proof Blocks, a tool previously shown to teach proofs to students effectively. Our results show that GPT-3.5 is almost completely unable to provide correct solutions (11.4%), while GPT-4 shows a significant increase in correctness (64.8%). However, even given this improvement, current models still struggle to correctly order lines in a proof. It remains an open question whether this is a temporary situation or whether LLMs will continue to struggle to solve these types of exercises in the future. | en |
dc.description.version | Peer reviewed | en |
dc.format.extent | 7 | |
dc.format.mimetype | application/pdf | en_US |
dc.identifier.citation | Poulsen, S, Sarsa, S, Prather, J, Leinonen, J, Becker, B A, Hellas, A, Denny, P & Reeves, B N 2024, Solving Proof Block Problems Using Large Language Models. In SIGCSE 2024 - Proceedings of the 55th ACM Technical Symposium on Computer Science Education. ACM, pp. 1063-1069, ACM Technical Symposium on Computer Science Education, Portland, United States, 20/03/2024. https://doi.org/10.1145/3626252.3630928 | en |
dc.identifier.doi | 10.1145/3626252.3630928 | en_US |
dc.identifier.isbn | 979-8-4007-0423-9 | |
dc.identifier.other | PURE UUID: e589f41f-e85b-484b-8a8b-8581e20de0e3 | en_US |
dc.identifier.other | PURE ITEMURL: https://research.aalto.fi/en/publications/e589f41f-e85b-484b-8a8b-8581e20de0e3 | en_US |
dc.identifier.other | PURE LINK: http://www.scopus.com/inward/record.url?scp=85185719497&partnerID=8YFLogxK | |
dc.identifier.other | PURE FILEURL: https://research.aalto.fi/files/145820976/SCI_Poulsen_etal_SIGCSE_2024.pdf | en_US |
dc.identifier.uri | https://aaltodoc.aalto.fi/handle/123456789/127761 | |
dc.identifier.urn | URN:NBN:fi:aalto-202405153375 | |
dc.language.iso | en | en |
dc.relation.ispartof | ACM Technical Symposium on Computer Science Education | en |
dc.relation.ispartofseries | SIGCSE 2024 - Proceedings of the 55th ACM Technical Symposium on Computer Science Education | en |
dc.relation.ispartofseries | pp. 1063-1069 | en |
dc.rights | openAccess | en |
dc.subject.keyword | ai | en_US |
dc.subject.keyword | algorithms | en_US |
dc.subject.keyword | artificial intelligence | en_US |
dc.subject.keyword | chatgpt | en_US |
dc.subject.keyword | code generation | en_US |
dc.subject.keyword | generative ai | en_US |
dc.subject.keyword | gpt-3 | en_US |
dc.subject.keyword | gpt-4 | en_US |
dc.subject.keyword | large language models | en_US |
dc.subject.keyword | openai | en_US |
dc.subject.keyword | proof blocks | en_US |
dc.subject.keyword | proofs | en_US |
dc.title | Solving Proof Block Problems Using Large Language Models | en |
dc.type | A4 Article in conference proceedings | en |
dc.type.version | publishedVersion |