Semantic matching under constraints: A comparative study of two paradigms
| dc.contributor | Aalto-yliopisto | fi |
| dc.contributor | Aalto University | en |
| dc.contributor.advisor | Boström, Henrik | |
| dc.contributor.advisor | Layegh Kheirabadi, Amirhossein | |
| dc.contributor.author | Xu, Ying | |
| dc.contributor.school | Perustieteiden korkeakoulu | fi |
| dc.contributor.school | School of Science | en |
| dc.contributor.supervisor | Korpi-Lagg, Maarit | |
| dc.date.accessioned | 2025-08-19T17:02:50Z | |
| dc.date.available | 2025-08-19T17:02:50Z | |
| dc.date.issued | 2025-07-16 | |
| dc.description.abstract | This study investigates the comparative effectiveness of retrieval-based and classification-based approaches to the semantic matching task in low-resource, fixed-category, and structurally homogeneous text scenarios, using industrial fault diagnosis as an example. The core task is to match free-form user fault descriptions to standardized fault rule texts. Five experimental configurations are evaluated, combining dense retrieval, semantic reranking, and domain-specific named entity recognition (NER) as comparison strategies. The analysis is structured around four key dimensions: 1) performance trade-offs between retrieval and classification methods; 2) the effectiveness of semantic reranking in refining embedding-based retrieval results; 3) the added value of NER-based preprocessing for semantic alignment, including comparisons of tagging strategies and application scopes; and 4) the impact of varying user input types (lengthy, precise, and vague) on model robustness. Results demonstrate that classification-based methods achieve significantly higher accuracy than retrieval approaches, due to their ability to fully leverage labeled training data in constrained scenarios. Semantic reranking substantially improves retrieval accuracy but introduces considerable computational overhead. Domain-specific NER preprocessing shows limited effectiveness and may degrade performance due to domain adaptation mismatch and information dilution. Notably, negative interactions between system components can occur, where combining enhancements yields inferior results compared to individual components. The analysis also reveals that lengthy, conversational user descriptions outperform precise technical specifications, a result explained through semantic over-specification and pathway redundancy frameworks. These findings challenge conventional assumptions about component integration in semantic matching systems and provide insights for designing domain-specific applications where labeled data is limited and user input quality varies. | en |
| dc.format.extent | 50 | |
| dc.format.mimetype | application/pdf | en |
| dc.identifier.uri | https://aaltodoc.aalto.fi/handle/123456789/138071 | |
| dc.identifier.urn | URN:NBN:fi:aalto-202508196300 | |
| dc.language.iso | en | en |
| dc.programme | Master's Programme in ICT Innovation | en |
| dc.programme.major | Data Science | en |
| dc.subject.keyword | semantic matching | en |
| dc.subject.keyword | classification-based method | en |
| dc.subject.keyword | retrieval-based method | en |
| dc.subject.keyword | low-resource | en |
| dc.subject.keyword | NER | en |
| dc.subject.keyword | troubleshooting | en |
| dc.title | Semantic matching under constraints: A comparative study of two paradigms | en |
| dc.type | G2 Pro gradu, diplomityö | fi |
| dc.type.ontasot | Master's thesis | en |
| dc.type.ontasot | Diplomityö | fi |
| local.aalto.electroniconly | yes | |
| local.aalto.openaccess | yes | |