Effectiveness and Optimization of Large Language Models in Natural Language Processing for MongoDB Data Retrieval

dc.contributor: Aalto-yliopisto (fi)
dc.contributor: Aalto University (en)
dc.contributor.advisor: Liverani, Michele
dc.contributor.author: Riboni, Andrea
dc.contributor.school: Perustieteiden korkeakoulu (fi)
dc.contributor.supervisor: Laaksonen, Jorma
dc.date.accessioned: 2024-12-29T17:30:40Z
dc.date.available: 2024-12-29T17:30:40Z
dc.date.issued: 2024-08-19
dc.description.abstract: Large Language Models (LLMs) have captured widespread attention due to their impressive ability to mimic language understanding and generate compelling text. However, their use in structured output tasks often leads to unreliable completions due to excessive creativity, making it challenging to evaluate the quality of their outputs. This situation calls for innovative approaches. The investigations of this thesis focus on enhancing and refining the application of LLMs for data retrieval tasks within MongoDB databases, aiming to enhance user experience and streamline query generation using natural language. Additionally, the thesis proposes dynamic and data-agnostic prompt-engineering techniques tailored to maximize accuracy within the specified context. Extensive testing across various LLM architectures is conducted to evaluate their proficiency in interpreting domain-specific language and properly retrieving the desired information. Furthermore, a novel obfuscation technique is introduced, aimed at concealing prompts while preserving their underlying semantics. This method holds particular significance for companies with stringent security requirements, offering a practical solution to protect sensitive information within automated query systems. Overall, this work provides novel strategies for optimizing LLM inference in specialized applications, laying the groundwork for future advancements in Natural Language Processing for query generation, obfuscation, and evaluation. (en)
dc.format.extent: 75+18
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/132599
dc.identifier.urn: URN:NBN:fi:aalto-202412298126
dc.language.iso: en (en)
dc.programme: Master's Programme in ICT Innovation (fi)
dc.programme.major: Data Science (fi)
dc.programme.mcode: SCI3115 (fi)
dc.subject.keyword: large language model (en)
dc.subject.keyword: inference (en)
dc.subject.keyword: user experience (en)
dc.subject.keyword: obfuscation (en)
dc.subject.keyword: evaluation (en)
dc.title: Effectiveness and Optimization of Large Language Models in Natural Language Processing for MongoDB Data Retrieval (en)
dc.type: G2 Pro gradu, diplomityö (fi)
dc.type.ontasot: Master's thesis (en)
dc.type.ontasot: Diplomityö (fi)
local.aalto.electroniconly: yes
local.aalto.openaccess: no
