Design and implementation of a cross-border multilingual semantic search system for legislation

School of Science | Master's thesis

Language

en

Pages

75

Abstract

In an era of increasing cross-border mobility, accessing and understanding legislation across countries remains challenging due to language barriers and heterogeneous data sources. Motivated by the Nordic Council of Ministers’ initiative for seamless cross-border data exchange, this thesis presents the design and implementation of FinEstLawSampo, a proof-of-concept semantic search system that integrates Finnish and Estonian legislation into a unified, multilingual platform using Linked Open Data (LOD) and Semantic Web technologies. Built on the Sampo-UI framework, FinEstLawSampo reuses 12,394 Finnish statutes from LawSampo and transforms 351 Estonian statutes into RDF. The data is enriched with EuroVoc keywords, life situation categories, and links to EU directives. The system employs automated translation with Opus-MT, keyword extraction with PyEuroVoc, and unsupervised classification using fastText embeddings to enable multilingual, context-aware search. FinEstLawSampo provides a portal with faceted search, visualizations, and detailed instance pages, with user interfaces in English, Finnish, and Estonian. The results demonstrate a user-friendly portal with powerful filtering, effective harmonization of cross-border legal data, and a 7-star LOD service. Limitations include the imbalance in dataset size between the two countries and inaccuracies in machine translation. Despite these, FinEstLawSampo offers a scalable model for cross-border legal informatics, with potential to expand to other Nordic-Baltic countries and to integrate more advanced NLP techniques, contributing to more efficient access to legal data.
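The abstract names unsupervised classification with fastText embeddings but does not spell out its mechanics. The following is a minimal sketch under stated assumptions, not the thesis's implementation. It assumes that each statute is assigned to the life situation category whose description embedding is most similar, by cosine similarity, to the statute's averaged word embedding. The category names, their keyword descriptions, and the tiny 3-dimensional word vectors are hypothetical stand-ins for a pretrained fastText model.

```python
import math
from typing import Dict, List, Optional

# HYPOTHETICAL toy word vectors standing in for a pretrained fastText model.
# Real fastText vectors have hundreds of dimensions; 3 suffice to illustrate.
WORD_VECTORS: Dict[str, List[float]] = {
    "employment": [0.9, 0.1, 0.0],
    "salary": [0.8, 0.2, 0.1],
    "work": [0.85, 0.05, 0.05],
    "school": [0.1, 0.9, 0.1],
    "education": [0.05, 0.95, 0.0],
    "pupil": [0.1, 0.8, 0.2],
    "tax": [0.3, 0.1, 0.9],
    "income": [0.4, 0.1, 0.8],
}

# HYPOTHETICAL life situation categories, each described by a few keywords.
CATEGORIES: Dict[str, str] = {
    "Working": "work employment salary",
    "Studying": "school education pupil",
    "Taxation": "tax income",
}


def embed(text: str) -> Optional[List[float]]:
    """Average the vectors of the words in `text` that have a known vector."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return None  # no known words: the text cannot be embedded
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(text: str, categories: Dict[str, str]) -> Optional[str]:
    """Return the category whose description embedding is closest to the text.

    No labelled training data is used: the category descriptions act as
    prototypes, so the assignment needs no supervision.
    """
    doc_vec = embed(text)
    if doc_vec is None:
        return None
    best_label: Optional[str] = None
    best_score = float("-inf")
    for label, description in categories.items():
        cat_vec = embed(description)
        if cat_vec is None:
            continue
        score = cosine(doc_vec, cat_vec)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    print(classify("Act on employment and salary", CATEGORIES))  # prints: Working
```

Averaging word vectors is a common, simple way to embed a short text. It is used here only for illustration and is not necessarily the choice made in FinEstLawSampo. With real data, the toy `WORD_VECTORS` table would be replaced by vectors from a trained model.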

Supervisor

Hyvönen, Eero

Thesis advisor

Rantala, Heikki
Leal, Rafael