Impact of ontologies and data structures on retrieval augmented generation systems in manufacturing simulation software

School of Engineering | Master's thesis

Language

en

Pages

94

Abstract

Manufacturing simulation software enables engineers to model, analyse, and optimise production systems using both structured and unstructured data. These platforms support virtual representations of manufacturing processes, equipment, and workflows, and rely on extensive libraries of simulation components to capture domain-specific knowledge. In such environments, effective data management and retrieval are essential for informed decision making. This thesis investigates how ontologies and data structures affect the performance of Large Language Models (LLMs) on data from manufacturing simulation software, focusing on Retrieval Augmented Generation (RAG) systems. Three case studies were conducted. The first evaluated RAG systems built on FAISS vector stores, comparing local and cloud-based embedding models on proprietary eCatalog data. It showed that schema-rich, structured data and advanced prompt engineering significantly improved retrieval accuracy, and that cloud-based models outperformed local alternatives, especially for complex, implicit queries. The second explored knowledge-graph-based retrieval using Neo4j, assessing several retriever architectures, including hybrid approaches that combine vector search, full-text search, and graph traversal. Knowledge graphs paired with hybrid retrieval strategies were found to excel at context-rich and relational queries. The third investigated LLM-powered agents interacting with a relational SQLite3 database enhanced with full-text and vector indexes. Integrating semantic search into traditional SQL querying substantially improved performance on semantically complex queries, and more advanced agents delivered superior results at the cost of increased engineering complexity.
Across all case studies, the experiments consistently revealed that the interplay between data structure, embedding model quality, and retrieval architecture is critical to the success of RAG systems in manufacturing simulation. The research concludes that investing in robust data modelling, leveraging ontologies and knowledge graphs, and adopting hybrid retrieval strategies are essential for building effective, context-aware RAG solutions in this domain. These insights provide practical guidance for researchers and practitioners seeking to deploy RAG applications in complex engineering environments.
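The abstract does not say how the hybrid retrievers merge their vector and full-text results. One common option is reciprocal rank fusion (RRF), illustrated in the minimal sketch below. Everything in it is an assumption for illustration and is neither the thesis's method nor its data:

* the catalogue entries and their identifiers (`conveyor_01`, `robot_arm_02`, `agv_03`),
* the toy three-dimensional embeddings,
* the term-overlap scorer, which stands in for a full-text index,
* the constant `k = 60`.

```python
import math
import re

# Hypothetical catalogue entries (the thesis's eCatalog data is proprietary).
DOCS = {
    "conveyor_01": "Belt conveyor, 2 m length, speed 0.5 m/s",
    "robot_arm_02": "Six-axis robot arm for pick and place",
    "agv_03": "Automated guided vehicle for pallet transport",
}

# Toy 3-dimensional vectors standing in for embedding-model output (assumed).
EMBEDDINGS = {
    "conveyor_01": [0.9, 0.1, 0.0],
    "robot_arm_02": [0.1, 0.9, 0.1],
    "agv_03": [0.6, 0.0, 0.8],
}


def tokenize(text):
    """Split text into lower-case alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_rank(query):
    """Rank documents by query-term overlap (a stand-in for a full-text index)."""
    terms = set(tokenize(query))
    scores = {doc: sum(tok in terms for tok in tokenize(text)) for doc, text in DOCS.items()}
    hits = [doc for doc, score in scores.items() if score > 0]
    return sorted(hits, key=lambda doc: -scores[doc])


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def vector_rank(query_vec):
    """Rank all documents by cosine similarity (a stand-in for a vector index)."""
    return sorted(EMBEDDINGS, key=lambda doc: -cosine(query_vec, EMBEDDINGS[doc]))


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: each document scores the sum of 1 / (k + rank) over all lists."""
    fused = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda doc: -fused[doc])


if __name__ == "__main__":
    query = "pallet transport vehicle"
    query_vec = [0.5, 0.0, 0.9]  # assumed embedding of the query
    print(reciprocal_rank_fusion([keyword_rank(query), vector_rank(query_vec)]))
    # prints ['agv_03', 'conveyor_01', 'robot_arm_02']
```

RRF is used here because it combines rankings without requiring keyword scores and cosine similarities to be on the same scale. A weighted sum of normalised scores would be an equally plausible alternative.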

Supervisor

Pippuri-Mäkeläinen, Jenni

Thesis advisor

Anttila, Mika
Yang, Chao
