Evaluating the Use of Retrieval-augmented Generation for Enhancing Online Courses

School of Science | Master's thesis

Date

2024-11-24

Major/Subject

Security and Cloud Computing

Degree programme

Master's Programme in Security and Cloud Computing

Language

en

Pages

91

Abstract

Providing sufficient and adequate teaching assistance to students in online programming courses requires substantial resources, especially given growing enrolment numbers. To address the problem of scalable course assistance, we developed a chatbot specific to the Web Software Development (WSD) course at Aalto using retrieval-augmented generation (RAG), a novel technique that harnesses large language models (LLMs) and augments the generated answer with search results from an external data source: in our case, the course material, embedded and stored in a vector database. Our evaluation includes a benchmark in which we compare the faithfulness and relevancy of answers generated by 54 different configurations, each determined by the LLM, the embedding model, the chunk size, the number of chunks, and the retrieval mode. The 28 questions used were mainly collected from participants in the WSD course. The findings suggest that, in the context of this experiment, larger chunk sizes work better, a vector-only retrieval mode produces better results, the choice of LLM by itself had only a mild effect on answer quality, and text-embedding-3-large and all-MiniLM-v6 performed significantly better than RoBERTa. Furthermore, we conducted an in-person user survey (N = 14) in which students were required to work on course tasks with the assistance of our chatbot and of a search functionality. The goal was to assess user satisfaction with RAG and search performance using RAG, each compared against the search functionality. The findings suggest that users perceive both assistants as useful or highly useful and that the bot produces factually correct results. Preference for a specific assistant, as well as performance, depended on various factors, including the exercise type.
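
For readers unfamiliar with the retrieval step that the abstract describes, the following is a minimal sketch and not the thesis's implementation. It assumes a toy bag-of-words count vector in place of a real embedding model, an in-memory list in place of a vector database, and hypothetical function names (chunk_text, embed, retrieve, build_prompt). The chunk_size and top_k parameters correspond loosely to the chunk size and number of chunks varied in the benchmark.

```python
import math
from collections import Counter


def chunk_text(text: str, chunk_size: int) -> list[str]:
    """Split text into chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]


def embed(text: str) -> Counter:
    """Toy embedding (assumption): word counts instead of a learned model."""
    tokens = (w.strip(".,:;?!()").lower() for w in text.split())
    return Counter(t for t in tokens if t)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], top_k: int) -> list[str]:
    """Return the top_k chunks most similar to the question (vector-only)."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Augment the question with retrieved context before calling an LLM."""
    context = "\n\n".join(context_chunks)
    return ("Answer using only the course material below.\n\n"
            f"{context}\n\nQuestion: {question}")


if __name__ == "__main__":
    # Hypothetical course snippets, used only for illustration.
    docs = [
        "Routes map HTTP requests to handler functions.",
        "Sessions store user state between requests.",
        "Databases persist data using SQL queries.",
    ]
    question = "How do sessions store state?"
    print(build_prompt(question, retrieve(question, docs, top_k=1)))
```

In a real system the prompt would be sent to an LLM, and the quality of the answer would depend on the embedding model, the chunking, and the retrieval mode, which are the factors the benchmark compares.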

Supervisor

Hellas, Arto

Thesis advisor

Kann, Viggo
Koutcheme, Charles

Keywords

AI, artificial intelligence, education, generative AI, large language models, retrieval-augmented generation
