Impact of ontologies and data structures on retrieval augmented generation systems in manufacturing simulation software
School of Engineering
Master's thesis
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for Your own personal use. Commercial use is prohibited.
Authors
Date
Department
Major/Subject
Mcode
Language
en
Pages
94
Series
Abstract
Manufacturing simulation software enables engineers to model, analyse, and optimise production systems using both structured and unstructured data. These platforms support virtual representations of manufacturing processes, equipment, and workflows, relying on extensive libraries of simulation components to capture domain-specific knowledge. In such environments, effective data management and retrieval are essential for informed decision making. This thesis investigates how ontologies and data structures affect the performance of Large Language Models (LLMs) on data from manufacturing simulation software, focusing on Retrieval Augmented Generation (RAG) systems. Three case studies were conducted. The first evaluated RAG systems using FAISS vector stores, comparing local and cloud-based embedding models on proprietary eCatalog data. It showed that schema-rich, structured data and advanced prompt engineering significantly improved retrieval accuracy, with cloud-based models outperforming local alternatives, especially for complex, implicit queries. The second explored knowledge graph-based retrieval using Neo4j, assessing various retriever architectures, including hybrid approaches that combine vector, full-text, and graph traversal methods. It found that knowledge graphs paired with hybrid retrieval strategies excel at handling context-rich and relational queries. The third investigated LLM-powered agents interacting with a relational SQLite3 database enhanced with full-text and vector indexes. Integrating semantic search into traditional SQL querying substantially improved performance on semantically complex queries, with more advanced agents delivering superior results at the cost of increased engineering complexity.
Across all case studies, the experiments consistently revealed that the interplay between data structure, embedding model quality, and retrieval architecture is critical to the success of RAG systems in manufacturing simulation. The research concludes that investing in robust data modelling, leveraging ontologies and knowledge graphs, and adopting hybrid retrieval strategies are essential for building effective, context-aware RAG solutions in this domain. These insights provide practical guidance for researchers and practitioners seeking to deploy RAG applications in complex engineering environments.
Description
Supervisor
Pippuri-Mäkeläinen, Jenni
Thesis advisor
Anttila, Mika
Yang, Chao