Evaluating the Use of Retrieval-augmented Generation for Enhancing Online Courses
School of Science
Master's thesis
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for Your own personal use. Commercial use is prohibited.
Authors
Date
2024-11-24
Department
Major/Subject
Security and Cloud Computing
Mcode
Degree programme
Master's Programme in Security and Cloud Computing
Language
en
Pages
91
Series
Abstract
Providing sufficient and adequate teaching assistance to students in online programming courses requires substantial resources, especially given growing enrolment numbers. To address the problem of scalable course assistance, we developed a chatbot specific to the Web Software Development (WSD) course at Aalto, using a novel technology called retrieval-augmented generation (RAG), which harnesses large language models (LLMs) and augments the generated answer with search results from an external data source: in our case, the course material, embedded and stored in a vector database. Our evaluation includes a benchmark in which we compare the faithfulness and relevancy of answers generated by 54 different configurations, determined by the LLM, the embedding model, the chunk size and number of chunks, and the retrieval mode. The 28 questions used were mainly collected from participants of the WSD course. The findings suggest that, in the context of this experiment, larger chunk sizes work better, a vector-only retrieval mode produces better results, the choice of LLM in itself had a mild effect on answer quality, and text-embedding-3-large and all-MiniLM-v6 performed significantly better than RoBERTa. Furthermore, we conducted an in-person user survey (N = 14), in which students were required to work on course tasks with the assistance of our chatbot and of a search functionality. The goal was to assess user satisfaction and search performance with RAG compared against a search functionality. The findings suggest that users perceive both assistants as useful or highly useful, and that the bot produces factually correct results. The preference for a specific assistant, as well as performance, depended on various factors, including the exercise type.
Description
Supervisor
Hellas, Arto
Thesis advisor
Kann, Viggo
Koutcheme, Charles
Keywords
AI, artificial intelligence, education, generative AI, large language models, retrieval-augmented generation