Automatic Algorithm Recognition Based on Programming Schemas and Beacons - A Supervised Machine Learning Classification Approach

 |  Login

Show simple item record

dc.contributor Aalto-yliopisto fi
dc.contributor Aalto University en
dc.contributor.advisor Korhonen, Ari, Dr., Aalto University, Finland
dc.contributor.author Taherkhani, Ahmad
dc.date.accessioned 2013-03-01T09:30:15Z
dc.date.available 2013-03-01T09:30:15Z
dc.date.issued 2013
dc.identifier.isbn 978-952-60-4990-8 (electronic)
dc.identifier.isbn 978-952-60-4989-2 (printed)
dc.identifier.issn 1799-4942 (electronic)
dc.identifier.issn 1799-4934 (printed)
dc.identifier.issn 1799-4934 (ISSN-L)
dc.identifier.uri https://aaltodoc.aalto.fi/handle/123456789/8787
dc.description.abstract In this thesis, we present techniques to recognize basic algorithms covered in computer science education from source code. The techniques use various software metrics, language constructs and other characteristics of source code, as well as the concept of schemas and beacons from program comprehension models. Schemas are high level programming knowledge with detailed knowledge abstracted out. Beacons are statements that imply specific structures in a program. Moreover, roles of variables constitute an important part of the techniques. Roles are concepts that describe the behavior and usage of variables in a program. They have originally been introduced to help novices learn programming. We discuss two methods for algorithm recognition. The first one is a classification method based on a supervised machine learning technique. It uses the vectors of characteristics and beacons automatically computed from the algorithm implementations of a training set to learn what characteristics and beacons can best describe each algorithm. Based on these observed instance-class pairs, the system assigns a class to each new input algorithm implementation according to its characteristics and beacons. We use the C4.5 algorithm to generate a decision tree that performs the task. In the second method, the schema detection method, algorithms are defined as schemas that exist in the knowledge base of the system. To identify an algorithm, the method searches the source code to detect schemas that correspond to those predefined schemas. Moreover, we present a method that combines these two methods: it first applies the schema detection method to extract algorithmic schemas from the given program and then proceeds to the classification method applied to the schema parts only. This enhances the reliability of the classification method, as the characteristics and beacons are computed only from the algorithm implementation code, instead of the whole given program. We discuss several empirical studies conducted to evaluate the performance of the methods. Some results are as follows: evaluated by leave-one-out cross-validation, the estimated classification accuracy for sorting algorithms is 98,1%, for searching, heap, basic tree traversal and graph algorithms 97,3% and for the combined method (on sorting algorithms and their variations from real student submissions) 97,0%. For the schema detection method, the accuracy is 88,3% and 94,1%, respectively. In addition, we present a study for categorizing student-implemented sorting algorithms and their variations in order to find problematic solutions that would allow us to give feedback on them. We also explain how these variations can be automatically recognized. en
dc.format.extent 117 + app. 137
dc.format.mimetype application/pdf
dc.language.iso en en
dc.publisher Aalto University en
dc.publisher Aalto-yliopisto fi
dc.relation.ispartofseries Aalto University publication series DOCTORAL DISSERTATIONS en
dc.relation.ispartofseries 17/2013
dc.relation.haspart [Publication 1]: Ahmad Taherkhani, Ari Korhonen and Lauri Malmi. Recognizing Algorithms Using Language Constructs, Software Metrics and Roles of Variables: An Experiment with Sorting Algorithms. The Computer Journal, Volume 54 Issue 7, pages 1049–1066, June 2011.
dc.relation.haspart [Publication 2]: Ahmad Taherkhani. Using Decision Tree Classifiers in Source Code Analysis to Recognize Algorithms: An Experiment with Sorting Algorithms.The Computer Journal, Volume 54 Issue 11, pages 1845–1860, October 2011.
dc.relation.haspart [Publication 3]: Ahmad Taherkhani, Ari Korhonen and Lauri Malmi. Categorizing Variations of Student-Implemented Sorting Algorithms. Computer Science Education, Volume 22 Issue 2, pages 109–138, June 2012.
dc.relation.haspart [Publication 4]: Ahmad Taherkhani, Ari Korhonen and Lauri Malmi. Automatic Recognition of Students’ Sorting Algorithm Implementations in a Data Structures and Algorithms Course. In Proceedings of the 12th Koli Calling International Conference on Computing Education Research, Tahko, Finland,10 pages, 15–18 November 2012, to appear.
dc.relation.haspart [Publication 5]: Ahmad Taherkhani. Automatic Algorithm Recognition Based on Programming Schemas. In Proceedings of the 23th Annual Workshop on the Psychology of Programming Interest Group (PPIG ’11), University of York, UK, 12 pages, 6–8 September 2011.
dc.relation.haspart [Publication 6]: Ahmad Taherkhani and Lauri Malmi. Beacon- and Schema-Based Method for Recognizing Algorithms from Students’ Source Code. Submitted to Journal of Educational Data Mining, 23 pages, submitted in June 2012.
dc.relation.haspart [Publication 7]: Ahmad Taherkhani. Schema Detection and Beacon-Based Classification for Algorithm Recognition. In Proceedings of the 24th Annual Workshop on the Psychology of Programming Interest Group (PPIG ’12), London Metropolitan University, UK, 12 pages, 21–23 November 2012.
dc.subject.other Computer science en
dc.title Automatic Algorithm Recognition Based on Programming Schemas and Beacons - A Supervised Machine Learning Classification Approach en
dc.type G5 Artikkeliväitöskirja fi
dc.contributor.school Perustieteiden korkeakoulu fi
dc.contributor.school School of Science en
dc.contributor.department Tietotekniikan laitos fi
dc.contributor.department Department of Computer Science and Engineering en
dc.subject.keyword algorithm recognition en
dc.subject.keyword schema detection en
dc.subject.keyword beacons en
dc.subject.keyword roles of variables en
dc.subject.keyword program comprehension en
dc.subject.keyword automated assessment en
dc.identifier.urn URN:ISBN:978-952-60-4990-8
dc.type.dcmitype text en
dc.type.ontasot Doctoral dissertation (article-based) en
dc.type.ontasot Väitöskirja (artikkeli) fi
dc.contributor.supervisor Malmi, Lauri, Prof., Aalto University, Finland
dc.opn Salakoski, Tapio, Prof. University of Turku, Finland
dc.contributor.lab Learning + Technology Group en
dc.rev Sajaniemi, Jorma, Prof., University of Eastern Finland, Finland
dc.rev Johnson, Colin, Dr., University of Kent, United Kingdom
dc.date.defence 2013-03-08


Files in this item

This item appears in the following Collection(s)

Show simple item record

Search archive


Advanced Search

article-iconSubmit a publication

Browse

My Account