Learning Centre

Finding Nineteenth-century Berry Spots



dc.contributor Aalto-yliopisto fi
dc.contributor Aalto University en
dc.contributor.author La Mela, Matti
dc.contributor.author Tamper, Minna
dc.contributor.author Kettunen, Kimmo
dc.contributor.editor Navarretta, Costanza
dc.contributor.editor Agirrezabal, Manex
dc.contributor.editor Maegaard, Bente
dc.date.accessioned 2019-06-03T14:18:50Z
dc.date.available 2019-06-03T14:18:50Z
dc.date.issued 2019
dc.identifier.citation La Mela, M, Tamper, M & Kettunen, K 2019, Finding Nineteenth-century Berry Spots: Recognizing and Linking Place Names in a Historical Newspaper Berry-picking Corpus. In C Navarretta, M Agirrezabal & B Maegaard (eds), DHN 2019 - Digital Humanities in the Nordic Countries: Proceedings of the Digital Humanities in the Nordic Countries 4th Conference, Copenhagen, Denmark, March 5-8, 2019. CEUR Workshop Proceedings, vol. 2364, CEUR, pp. 295-307, Digital Humanities in the Nordic Countries, Copenhagen, Denmark, 06/03/2019. <http://ceur-ws.org/Vol-2364/> en
dc.identifier.issn 1613-0073
dc.identifier.other PURE UUID: d1b429a0-027b-43e3-8561-fd2953b82db0
dc.identifier.other PURE ITEMURL: https://research.aalto.fi/en/publications/d1b429a0-027b-43e3-8561-fd2953b82db0
dc.identifier.other PURE LINK: http://ceur-ws.org/Vol-2364/
dc.identifier.other PURE FILEURL: https://research.aalto.fi/files/33936034/27_paper.pdf
dc.identifier.uri https://aaltodoc.aalto.fi/handle/123456789/38379
dc.description.abstract The paper studies and improves methods of named entity recognition (NER) and linking (NEL) to facilitate historical research that uses digitized newspaper texts. The specific focus is on a study of the historical process of commodification. The named entity detection pipeline is discussed in three steps. First, the paper presents the corpus, which consists of newspaper articles on wild berry picking from the late nineteenth century. Second, the paper compares two named entity recognition tools: the trainable Stanford NER and the rule-based FiNER. Third, the linking and disambiguation of the recognized places is explored. In the linking process, information about the newspaper publication place is used to improve the identification of small places. The paper concludes that the pipeline performs well for mapping the commodification, and that specific problems relate to the recognition of place names (among named entities). Stanford NER is shown to perform better in the task (F-score of 0.83) than the FiNER tool (F-score of 0.68). Concerning the linking of places, the use of newspaper metadata appears useful for disambiguating between small places. However, the historical language (with its OCR errors) recognized by the Stanford model poses challenges for the linking tool. The paper proposes that other information, for instance about the reuse of the newspaper articles, could be used to further improve the recognition and linking quality. en
dc.format.extent 13
dc.format.extent 295-307
dc.format.mimetype application/pdf
dc.language.iso en en
dc.publisher CEUR
dc.relation.ispartof Digital Humanities in the Nordic Countries en
dc.relation.ispartofseries DHN 2019 - Digital Humanities in the Nordic Countries en
dc.relation.ispartofseries CEUR Workshop Proceedings en
dc.relation.ispartofseries Volume 2364 en
dc.rights openAccess en
dc.title Finding Nineteenth-century Berry Spots en
dc.type A4 Artikkeli konferenssijulkaisussa (A4 Article in conference proceedings) fi
dc.description.version Peer reviewed en
dc.contributor.department Department of Computer Science
dc.contributor.department Professorship Hyvönen Eero
dc.contributor.department The National Library of Finland
dc.identifier.urn URN:NBN:fi:aalto-201906033464
dc.type.version publishedVersion


Files in this item

There are no open access files associated with this item.
