Semantic matching under constraints: A comparative study of two paradigms
School of Science
Master's thesis
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for Your own personal use. Commercial use is prohibited.
Language
en
Pages
50
Abstract
This study compares the effectiveness of retrieval-based and classification-based approaches to the semantic matching task in low-resource, fixed-category, and structurally homogeneous text scenarios, using an industrial fault diagnosis scenario as an example. The core task is to match free-form user fault descriptions to standardized fault rule texts. Five experimental configurations are evaluated, combining dense retrieval, semantic reranking, and domain-specific named entity recognition (NER) as comparison strategies. The analysis is structured around four key dimensions: 1) performance trade-offs between retrieval and classification methods; 2) the effectiveness of semantic reranking in refining embedding-based retrieval results; 3) the added value of NER-based preprocessing for semantic alignment, including comparisons of tagging strategies and application scopes; and 4) the impact of varying user input types (lengthy, precise, and vague) on model robustness. Results demonstrate that classification-based methods achieve significantly higher accuracy than retrieval approaches, owing to their ability to fully leverage labeled training data in constrained scenarios. Semantic reranking substantially improves retrieval accuracy but introduces considerable computational overhead. Domain-specific NER preprocessing shows limited effectiveness and may degrade performance due to domain adaptation mismatch and information dilution. Notably, negative interactions between system components can occur, where combining enhancements yields inferior results compared to applying each component individually. The analysis also reveals that lengthy, conversational user descriptions are matched more accurately than precise technical specifications, a result explained through the frameworks of semantic over-specification and pathway redundancy.
These findings challenge conventional assumptions about component integration in semantic matching systems and provide insights for designing domain-specific applications where labeled data is limited and user input quality varies.
Supervisor
Korpi-Lagg, Maarit
Thesis advisor
Boström, Henrik
Layegh Kheirabadi, Amirhossein