Browsing by Author "Tamper, Minna"
- AATOS – A configurable tool for automatic annotation
A4 Article in conference proceedings (2017) Tamper, Minna; Leskinen, Petri; Ikkala, Esko; Oksanen, Arttu; Mäkelä, Eetu; Heino, Erkki; Tuominen, Jouni; Koho, Mikko; Hyvönen, Eero
This paper presents AATOS, an automatic annotation tool for providing documents with semantic annotations. The tool links entities found in the texts to ontologies defined by the user. The application is highly configurable and can be used with different Finnish natural language texts. The application was developed as a part of the WarSampo (http://seco.cs.aalto.fi/projects/sotasampo/en/) and Semantic Finlex (http://seco.cs.aalto.fi/projects/lawlod/en/) projects and tested using Kansa Taisteli magazine articles and the consolidated Finnish legislation of Semantic Finlex. The quality of the automatic annotation was evaluated by measuring precision and recall against existing manual annotations. The evaluation showed that the quality of the input text, as well as the selection and configuration of the ontologies, affected the results.
- Analyses of Networks of Politicians Based on Linked Data: Case ParliamentSampo - Parliament of Finland on the Semantic Web
A3 Part of a book or other compilation (2022-08-29) Pokkimäki, Henna; Leskinen, Petri; Tamper, Minna; Hyvönen, Eero
In parliamentary debates the speakers make reference to each other. By extracting and linking named entities from the speeches it is possible to construct reference networks and use them for analysing networks of politicians and parties and their debates. This paper presents how such a network can be constructed automatically from a 2015–2022 speech corpus of the Parliament of Finland and used as a basis for network analysis.
- Analyzing biography collections historiographically as Linked Data: Case National Biography of Finland
A1 Original article in a scientific journal (2022-08-30) Tamper, Minna; Leskinen, Petri; Hyvönen, Eero; Valjus, Risto; Keravuori, Kirsi
Biographical collections are available on the Web for close reading. However, the underlying texts can also be used for data analysis and distant reading, if the documents are available as data. Such data is usable for creating intelligent user interfaces to biographical data, including Digital Humanities tooling for visualizations, data analysis, and knowledge discovery in biographical and prosopographical research. In this paper, we re-use biographical collection data from a historiographical perspective for analyzing the underlying collection. For example: What kind of people have been included in the collection? Does the language used for describing female biographees differ from that for men? As a case study, the Finnish National Biography, available as part of the Linked Open Data service and semantic portal BiographySampo – Finnish Biographies on the Semantic Web, is used. The analyses show interesting results related to, e.g., how specific prosopographical groups, such as women or professional groups, are represented and portrayed. Various novel statistics and network analyses of the biographees are presented. Our analyses give new insights to the editors of the National Biography as well as to researchers in biography, prosopography, and historiography. The presented approach can also be applied to similar biography collections in other countries.
- Anonymization Service for Finnish Case Law: Opening Data without Sacrificing Data Protection and Privacy of Citizens
A4 Article in conference proceedings (2018-10) Tamper, Minna; Oksanen, Arttu; Tuominen, Jouni; Hyvönen, Eero; Hietanen, Aki
- An Anonymization Tool for Open Data Publication of Legal Documents
A4 Article in conference proceedings (2022) Oksanen, Arttu; Hyvönen, Eero; Tamper, Minna; Tuominen, Jouni; Ylimaa, Henna; Löytynoja, Katja; Kokkonen, Matti; Hietanen, Aki
The EU General Data Protection Regulation (GDPR) requires anonymization of documents containing personal data, such as court decisions, for public use. Doing this manually is costly and time-consuming but can be automated by applying Natural Language Processing (NLP) methods. This paper introduces the ANOPPI tool developed for (semi-)automatic anonymization of Finnish texts. The tool can be used both as a web application and programmatically through a REST API. Evaluation shows that ANOPPI performs well with different types of documents; however, further improving the performance of the named entity recognition and disambiguation methods would enhance the usefulness of the software. The tool is being published as open source for public use by the Ministry of Justice in Finland. A use case of ANOPPI is to publish court decisions on the Web in the LawSampo semantic portal for human close reading and as Linked Open Data for data analysis in legal informatics.
- ANOPPI: A Pseudonymization Service for Finnish Court Documents
A4 Article in conference proceedings (2019) Oksanen, Arttu; Tamper, Minna; Tuominen, Jouni; Hietanen, Aki; Hyvönen, Eero
To comply with the EU General Data Protection Regulation (GDPR), publishing court judgments online requires that the personal data contained in them be disguised. However, anonymizing the documents manually is a costly and time-consuming procedure. This paper presents the Anoppi service for automatic and semi-automatic pseudonymization of Finnish court judgments. Utilizing both statistics- and rule-based named entity recognition methods and morphological analysis, Anoppi is able to automatically pseudonymize documents written in Finnish while preserving their readability and layout. The service is currently still in development but pilot tests are going to be carried out in Finnish courts in 2020.
- Automatic Annotation Service APPI: Named Entity Linking in Legal Domain
A4 Article in conference proceedings (2020) Tamper, Minna; Oksanen, Arttu; Tuominen, Jouni; Hietanen, Aki; Hyvönen, Eero
Texts referencing court decisions and statutes can be difficult to understand without context. It can be time-consuming and expensive to find related statutes or to learn about context-specific terminology. As a solution, we utilized a named entity linking tool for extracting information and tailored it into a service, Appi, that can automatically annotate legal documents to provide context to the readers. The service can identify and link named entities and references to legal texts to corresponding vocabularies and data sources by combining statistics- and rule-based named entity recognition with named entity linking. The results provide users with an enhanced reading experience with contextual information and the possibility to access related materials, such as statutes and court decisions.
- Biografiasampo yhdistää ja rikastaa suomalaiset elämäkerrat linkitettynä datana semanttisessa webissä (Biographysampo links and enriches Finnish biographies as linked data on the Semantic Web)
A1 Original article in a scientific journal (2021-06-01) Hyvönen, Eero; Leskinen, Petri; Tamper, Minna; Rantala, Heikki; Ikkala, Esko; Tuominen, Jouni; Keravuori, Kirsi
The aim of information studies is to develop new ways of producing, organizing, and using information from the perspective of both individuals and organizations. This review presents Biografiasampo (BiographySampo), an application of the so-called Sampo model that serves producers and users of cultural-historical information, intended for citizens, Digital Humanities researchers, and developers of new applications. The ambitious goal of BiographySampo is to launch a new era in publishing and using biography collections online, utilizing Semantic Web technologies and Linked Open Data publishing principles. The innovation is to use language technology, artificial intelligence, and Semantic Web technologies to create a knowledge graph from the texts of the biographies and from related databases in different sources, as part of the national information infrastructure. The core material of the application consists of the National Biography (Kansallisbiografia) and other short biographies edited and published by the Finnish Literature Society (Suomalaisen Kirjallisuuden Seura), in total 13 100 life stories written by 980 Finnish researchers in what has been called the largest historical research project in Finland. The data mined from the biographies has been enriched by automatic logical reasoning and by linking it to 16 other data sources. The knowledge graph has been published in the Linked Data Finland service for Linked Open Data. On top of the data service, an intelligent, open, and free online service consisting of seven application views, biografiasampo.fi, has been implemented; it has had about 50 000 users. Both the biographies in the system and the data mined from them are openly available as a data service on the Linked Data Finland platform.
- BiographySampo – Publishing and enriching biographies on the semantic web for digital humanities research
A4 Article in conference proceedings (2019-06-02) Hyvönen, Eero; Leskinen, Petri; Tamper, Minna; Rantala, Heikki; Ikkala, Esko; Tuominen, Jouni; Keravuori, Kirsi
This paper argues for making a paradigm shift in publishing and using biographical dictionaries on the web, based on Linked Data. The idea is to provide the user with an enhanced reading experience of biographies by enriching contents with data linking and reasoning. In addition, versatile tooling is provided for (1) biographical research of individual persons as well as for (2) prosopographical research on groups of people. To demonstrate and evaluate the new possibilities, we present the semantic portal “BiographySampo – Finnish Biographies on the Semantic Web”. The system is based on a knowledge graph extracted automatically from a collection of 13 100 textual biographies, enriched with data linking to 16 external data sources, and by harvesting external collection data from libraries, museums, and archives. The portal was released in September 2018 for free public use at http://biografiasampo.fi.
- Building Lightweight Ontologies for Faceted Search with Named Entity Recognition: Case WarMemoirSampo
A4 Article in conference proceedings (2022-08-11) Koho, Mikko; Leal, Rafael; Ikkala, Esko; Tamper, Minna; Rantala, Heikki; Hyvönen, Eero
This paper discusses building lightweight ontologies for faceted search user interfaces with Named Entity Recognition (NER) from textual data. This is studied in the context of building a Knowledge Graph for the textual indexing of interview videos in the in-use WarMemoirSampo system, consisting of a Linked Open Data service and an open semantic web portal for contextualized video viewing. It is shown that state-of-the-art NER tools are able to find entities in textual data and categorize them with high enough recall and precision to be useful for building facet ontologies, without involving considerable manual domain ontology engineering. To enable entity disambiguation and to show relevant contextual information and useful links to the users of the portal, Named Entity Linking techniques are also employed.
- Extending the Finnish Linked Data Infrastructure with Natural Language Processing Services in FIN-CLARIAH
A4 Article in conference proceedings (2022) Tamper, Minna; Tuominen, Jouni; Hyvönen, Eero
The DARIAH-EU infrastructure for Digital Humanities (DH) often focuses on using structured data for quantitative studies, while the EU-CLARIN infrastructure deals primarily with unstructured natural language texts. However, in DH research both texts and structured data are often needed. It therefore makes sense to develop and use both infrastructures together, as suggested in the Dutch CLARIAH programme and the corresponding FIN-CLARIAH initiative in Finland, a new part of the Finnish research infrastructure road map of the Academy of Finland. This poster paper introduces work in FIN-CLARIAH relating to the idea of integrating natural language processing (NLP) tools with the Linked Open Data (LOD) Infrastructure for Digital Humanities in Finland (LODI4DH). We present a plan for NLP services to be opened as part of the Linked Data Finland (LDF.fi) platform. The new services are used for knowledge extraction from Finnish texts for weaving LOD, and on the other hand for language DH data analyses of the published datasets in applications in many domains, such as political culture. The extended LDF.fi platform will provide users with documented APIs for NLP services using unified output formats as well as software delivery as Docker containers, to lower the bar for deployment.
- Extracting Knowledge from Parliamentary Debates for Studying Political Culture and Language
A4 Article in conference proceedings (2022-08-11) Tamper, Minna; Leal, Rafael; Sinikallio, Laura; Leskinen, Petri; Tuominen, Jouni; Hyvönen, Eero
This paper presents knowledge extraction and natural language processing methods used to enrich the knowledge graph of the plenary debates (textual transcripts of speeches) of the Parliament of Finland. This knowledge graph includes some 960 000 speeches (1907–2021) interlinked with a prosopographical knowledge graph about the politicians. A recent subset of the speeches was used to extract named entities and topical keywords for semantic searching and browsing the data and for data analysis. The process is based on linguistic analysis, named entity linking, and automatic subject indexing. The results were included in the ParliamentSampo knowledge graph in a SPARQL endpoint. This data can be used for studying parliamentary language and culture in Digital Humanities research and for developing applications, such as the ParliamentSampo portal.
- Extraction of Entities and Concepts from Finnish Texts
School of Science | Master's thesis (2016-12-12) Tamper, Minna
Keywords are used in many document databases to improve search. The process of assigning keywords from controlled vocabularies to a document is called subject indexing. If the controlled vocabulary used for indexing is an ontology, with semantic relations and descriptions of concepts, the process is also called semantic annotation. In this thesis an automatic annotation tool was created to provide the documents with semantic annotations. The application links entities found in the texts to ontologies defined by the user. The application is highly configurable and can be used with different Finnish texts. The application was developed as a part of the WarSampo and Semantic Finlex projects and tested using Kansa Taisteli magazine articles and consolidated Finnish legislation. The quality of the automatic annotation was evaluated by measuring precision and recall against existing manual annotations. The evaluation showed that the quality of the input text, as well as the selection and configuration of the ontologies, affected the results.
- Finding Nineteenth-century Berry Spots: Recognizing and Linking Place Names in a Historical Newspaper Berry-picking Corpus
A4 Article in conference proceedings (2019) La Mela, Matti; Tamper, Minna; Kettunen, Kimmo
The paper studies and improves methods of named entity recognition (NER) and linking (NEL) for facilitating historical research that uses digitized newspaper texts. The specific focus is on a study of the historical process of commodification. The named entity detection pipeline is discussed in three steps. First, the paper presents the corpus, which consists of newspaper articles on wild berry picking from the late nineteenth century. Second, the paper compares two named entity recognition tools: the trainable Stanford NER and the rule-based FiNER. Third, the linking and disambiguation of the recognized places is explored. In the linking process, information about the newspaper publication place is used to improve the identification of small places. The paper concludes that the pipeline performs well for mapping the commodification, and that specific problems relate to the recognition of place names (among named entities). It is shown that Stanford NER performs better in the task (F-score of 0.83) than the FiNER tool (F-score of 0.68). Concerning the linking of places, the use of newspaper metadata appears useful for disambiguation between small places. However, the historical language (with its OCR errors) recognized by the Stanford model poses challenges for the linking tool. The paper proposes that other information, for instance about the reuse of the newspaper articles, could be used to further improve the recognition and linking quality.
- FindSampo: A Linked Data Based Portal and Data Service for Analyzing and Disseminating Archaeological Object Finds
A4 Article in conference proceedings (2022-05-31) Rantala, Heikki; Ikkala, Esko; Rohiola, Ville; Koho, Mikko; Tuominen, Jouni; Oksanen, Eljas; Wessman, Anna; Hyvönen, Eero
This paper presents the FindSampo system for analyzing and disseminating archaeological object finds made by the public. The system is based on Linked Open Data (LOD), and consists of a web portal and an open data service. The underlying knowledge graph contains data on some 3000 archaeological object finds catalogued in the archaeological collection of the Finnish Heritage Agency (FHA) from 2015 to 2020. The portal and LOD service have been open to public use since May 2021.
- Finnish Parliament on the Semantic Web: Using ParliamentSampo Data Service and Semantic Portal for Studying Political Culture and Language
A4 Article in conference proceedings (2022-05) Hyvönen, Eero; Leskinen, Petri; Sinikallio, Laura; La Mela, Matti; Tuominen, Jouni; Elo, Kimmo; Drobac, Senka; Koho, Mikko; Ikkala, Esko; Tamper, Minna; Leal, Rafael; Kesäniemi, Joonas
This paper introduces the system ParliamentSampo - Parliament of Finland on the Semantic Web, a Linked Open Data (LOD) service, data infrastructure, and semantic portal for studying Finnish political culture, language, and networks of the Members of Parliament (MP). The article presents the vision behind the system and the LOD service, and explores the possibilities to utilize it in research and application development. A knowledge graph of linked data has been created based on ca. 962 000 speeches in all plenary sessions of the Parliament of Finland in 1907–2021; the data is also available in XML format, utilizing the new international Parla-CLARIN format. For the first time, the entire time series of the Finnish parliamentary speeches has been converted into data and a data service in a unified format. In addition, the speeches have been interlinked with another knowledge graph created from the database of the MPs and enriched from other data sources into a broader ontology-based data service. The paper shows how the LOD service SPARQL endpoint can be used to research parliamentary culture, the use of political language, and networks of politicians through data analysis. The service endpoint can also be used to develop applications for different user groups without programming skills, such as the ParliamentSampo semantic portal, which is also introduced in the paper. This application aims to make political decision making more transparent to the general public, media, politicians, and other end users.
- From Text to Knowledge: Methods, Tools, and Applications for Digital Humanities Based on Linked Data
School of Science | Doctoral dissertation (article-based) (2023) Tamper, Minna
The digitization of Cultural Heritage collections has enabled the use of computational methods such as Natural Language Processing (NLP) on textual collections. These methods have been used widely in Digital Humanities (DH) to study digitized contents with automated processes. The Semantic Web and linked data technologies have been applied to describe document collections and their metadata in library and museum collections. They provide infrastructure for connecting different collections by linking them using shared vocabularies that describe metadata values and fields. Linked data is also used in Finnish museum and library collections. It is commonly used to model document metadata, such as the author or title of a piece of work. Also, the content of a document in a collection is usually described using manually assigned keywords. Other information about the content is often scarce and finding documents related to an actor can be laborious. This thesis studies and presents novel models, methods, and tools for transforming and enriching document collections automatically to linked data. Linked data technology helps to link together documents of a collection based on their metadata, e.g., author or publisher. It can also be used to link documents based on information extracted about the content, such as actors mentioned in text. The aim of this thesis is to study how NLP methods and linked data can be used to study digitized document collections, such as biographies. Research in this thesis is conducted by designing, implementing, and evaluating proof-of-concept systems, tools, and data for real-life use cases. The research follows the principles of design science and action research.
The thesis presents a toolkit that can be used to model, transform, and enrich biographical text document collections to linked data, improving a collection's information retrieval and its interoperability both internally and with other collections. The data model for describing a text document collection's content and features, e.g., keywords and mentioned names, creates a foundation for building intelligent services based on the linked data, such as network or linguistic analysis. These services can be used to visualize the interlinked data by showing the relations between themes or actors. In addition, the linked-data-based datasets can be used as an input for NLP tools to create data analytical visualizations and applications. This approach can also be used to evaluate the quality and content of text document collections for DH research. The prototypes created for data transformation, enrichment, and information visualization can also be applied to other document collections.
- Harmonizing and Using Numismatic Linked Data in Digital Humanities Research and Application Development: Case DigiNUMA
A4 Article in conference proceedings (2022-07-20) Rantala, Heikki; Oksanen, Eljas; Hyvönen, Eero
This paper outlines the ongoing work of the DigiNUMA project for creating solutions in data harmonisation, analysis, and dissemination of pan-European archaeological and numismatic Cultural Heritage, using linked data and semantic web technologies. The project focuses on Viking Age (800–1150 AD) Finnish and English numismatic data as a case study. A broader context is gained by research into harmonizing collection data of the National Museum of Finland, the British Museum, and the Fitzwilliam Museum in Cambridge for compatibility with the international Nomisma.org ontology, and by creating tools that can be used to work with other Nomisma.org datasets.
- How to Search and Contextualize Scenes inside Videos for Enriched Watching Experience: Case Stories of the Second World War Veterans
A4 Article in conference proceedings (2022-07-20) Hyvönen, Eero; Ikkala, Esko; Koho, Mikko; Leal, Rafael; Rantala, Heikki; Tamper, Minna
This demo paper demonstrates the idea of publishing and watching videos on the Semantic Web. An in-use application, WarMemoirSampo, is presented that enables scene segments in videos to be searched by their semantic content. While watching a video, additional contextual information is provided dynamically. The system is based on a SPARQL endpoint whose knowledge graph has been extracted automatically from timestamped natural language descriptions of the video contents.
- LawSampo Portal and Data Service for Publishing and Using Legislation and Case Law as Linked Open Data on the Semantic Web
A4 Article in conference proceedings (2022) Hyvönen, Eero; Tamper, Minna; Ikkala, Esko; Koho, Mikko; Leal, Rafael; Kesäniemi, Joonas; Oksanen, Arttu; Tuominen, Jouni; Hietanen, Aki
This paper argues for the idea of publishing legislation and case law as Linked Open Data (LOD) on the Semantic Web, to cater to several user groups, including the general public, legislators, lawyers, researchers of legal informatics, and application developers. To support the argument, the proof-of-concept system LawSampo - Finnish Legislation and Case Law on the Semantic Web is introduced, including a semantic portal and a LOD service. Based on the Sampo Model, the main novelty of LawSampo is the provision of heterogeneous distributed legal data through multiple application perspectives for faceted searching and exploring the data and for data analysis in legal informatics.