Extracting Measured Properties for Numerical Data with SciBERT model and Question Answering
School of Science | Master's thesis
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for Your own personal use. Commercial use is prohibited.
Authors
Date
2022-01-24
Department
Major/Subject
Human-Computer Interaction and Design
Mcode
SCI3020
Degree programme
Master's Programme in ICT Innovation
Language
en
Pages
50+0
Series
Abstract
A quantity is a measurement (e.g. 18 g), usually consisting of a numerical value and a unit. Quantities are crucial and very frequently mentioned in scientific publications. Elsevier already has good solutions for searching quantities or their components, such as numerical values or units. In these products, you can search for all papers whose full text contains, for example, "< 2mm". However, what the matched measurements represent remains unclear. For example, when you search for "< 2mm", does "2mm" represent the length or the diameter of a tube? This ambiguity causes many irrelevant results in the search engines. The property behind a quantity is called its measured property. To resolve the ambiguity and enhance the search capability, Elsevier's next step is to extract the measured property that each quantity represents. When users can search for a quantity and its measured property at the same time, they can get more accurate results. In this thesis, we propose a Question-Answering architecture for joint measured property and relationship extraction, built on top of a numerical data extraction model. The Question-Answering architecture enables a named entity recognition model to extract entities and relationships jointly. We train one SciBERT model to extract the quantities in the corpus and another SciBERT model to extract the corresponding measured property for each quantity. In addition, we annotate a dataset of publications from the engineering domain, MeasPro, for model training. The results show that our approach achieves high accuracy and outperforms the state-of-the-art models on the MeasEval dataset.
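As a rough illustration of the question-answering framing described in the abstract, the sketch below turns each already extracted quantity into a question/context pair that a span-extraction model could answer with the measured property. It is not the thesis implementation: the `Quantity` class, the `build_qa_inputs` helper, and the question template are all hypothetical names and wording chosen for this example, and the example sentence is made up.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Quantity:
    """A quantity span found by an upstream extraction step (hypothetical type)."""
    text: str   # surface form, e.g. "2 mm"
    start: int  # character offset of the span start in the sentence
    end: int    # character offset one past the span end


def find_quantity(sentence: str, text: str) -> Quantity:
    """Locate a quantity string in a sentence; stands in for the quantity model."""
    start = sentence.find(text)
    if start < 0:
        raise ValueError(f"{text!r} not found in sentence")
    return Quantity(text=text, start=start, end=start + len(text))


def build_qa_inputs(
    sentence: str,
    quantities: List[Quantity],
    # Assumed template; the wording used in the thesis is not given in the abstract.
    template: str = "What is the measured property of {quantity}?",
) -> List[Dict[str, object]]:
    """Create one question/context pair per quantity.

    Because each question names a specific quantity, the answer span
    (the measured property) is implicitly linked to that quantity, so the
    entity and the relationship are obtained in one step.
    """
    examples = []
    for q in quantities:
        if sentence[q.start:q.end] != q.text:
            raise ValueError(f"offsets do not match quantity text {q.text!r}")
        examples.append(
            {
                "question": template.format(quantity=q.text),
                "context": sentence,
                "quantity_span": (q.start, q.end),
            }
        )
    return examples


if __name__ == "__main__":
    sentence = "The tube has a diameter of 2 mm and a length of 30 cm."
    quantities = [find_quantity(sentence, "2 mm"), find_quantity(sentence, "30 cm")]
    for example in build_qa_inputs(sentence, quantities):
        print(example["question"])
```

Each resulting pair could then be passed to an extractive question-answering model, which would be expected to return "diameter" for the first question and "length" for the second.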
Supervisor
Theune, Mariët
Thesis advisor
Keulen, Maurice
Doornenbal, Marius
Keywords
NLP, entity extraction, relationship extraction, measurement, scientific publications