Semantic matching under constraints: A comparative study of two paradigms

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.advisor: Boström, Henrik
dc.contributor.advisor: Layegh Kheirabadi, Amirhossein
dc.contributor.author: Xu, Ying
dc.contributor.school: Perustieteiden korkeakoulu [fi]
dc.contributor.school: School of Science [en]
dc.contributor.supervisor: Korpi-Lagg, Maarit
dc.date.accessioned: 2025-08-19T17:02:50Z
dc.date.available: 2025-08-19T17:02:50Z
dc.date.issued: 2025-07-16
dc.description.abstract: This study compares the effectiveness of retrieval-based and classification-based approaches to semantic matching in low-resource, fixed-category, and structurally homogeneous text scenarios, using industrial fault diagnosis as an example. The core task is to match free-form user fault descriptions to standardized fault rule texts. Five experimental configurations are evaluated, combining dense retrieval, semantic reranking, and domain-specific named entity recognition (NER) as comparison strategies. The analysis is structured around four key dimensions: 1) performance trade-offs between retrieval and classification methods; 2) the effectiveness of semantic reranking in refining embedding-based retrieval results; 3) the added value of NER-based preprocessing for semantic alignment, including comparisons of tagging strategies and application scopes; and 4) the impact of varying user input types (lengthy, precise, and vague) on model robustness. Results show that classification-based methods achieve significantly higher accuracy than retrieval approaches, owing to their ability to fully leverage labeled training data in constrained scenarios. Semantic reranking substantially improves retrieval accuracy but introduces considerable computational overhead. Domain-specific NER preprocessing shows limited effectiveness and may degrade performance because of domain adaptation mismatch and information dilution. Notably, negative interactions between system components can occur: combining enhancements can yield worse results than applying each component individually. The analysis also reveals that lengthy, conversational user descriptions outperform precise technical specifications, which the study explains through the semantic over-specification and pathway redundancy frameworks.
These findings challenge conventional assumptions about component integration in semantic matching systems and provide insights for designing domain-specific applications where labeled data is limited and user input quality varies. [en]
dc.format.extent: 50
dc.format.mimetype: application/pdf [en]
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/138071
dc.identifier.urn: URN:NBN:fi:aalto-202508196300
dc.language.iso: en [en]
dc.programme: Master's Programme in ICT Innovation [en]
dc.programme.major: Data Science [en]
dc.subject.keyword: semantic matching [en]
dc.subject.keyword: classification-based method [en]
dc.subject.keyword: retrieval-based method [en]
dc.subject.keyword: low-resource [en]
dc.subject.keyword: NER [en]
dc.subject.keyword: troubleshooting [en]
dc.title: Semantic matching under constraints: A comparative study of two paradigms [en]
dc.type: G2 Pro gradu, diplomityö [fi]
dc.type.ontasot: Master's thesis [en]
dc.type.ontasot: Diplomityö [fi]
local.aalto.electroniconly: yes
local.aalto.openaccess: yes

Files

Original bundle

Name: master_Xu_Ying_2025.pdf
Size: 1.88 MB
Format: Adobe Portable Document Format