Convolutional Neural Networks for Named Entity Recognition in Images of Documents

dc.contributor Aalto-yliopisto fi
dc.contributor Aalto University en
dc.contributor.advisor Pieters, Roelof
dc.contributor.author van de Kerkhof, Jan
dc.date.accessioned 2016-10-12T11:38:32Z
dc.date.available 2016-10-12T11:38:32Z
dc.date.issued 2016-09-26
dc.identifier.uri https://aaltodoc.aalto.fi/handle/123456789/22821
dc.description.abstract This work investigates named entity recognition (NER) in images of documents with a domain-specific layout, using Convolutional Neural Networks (CNNs). Examples of such documents are receipts, invoices, forms and scientific papers; scientific papers are used in this work. The NER task is first performed statically, with a fixed number of entity classes extracted per document. Networks based on the deep VGG-16 network are used for this task. Experimental evaluation shows that framing the task as classification, where the network classifies each bounding box coordinate separately, leads to the best network performance. A multi-headed architecture is also introduced, in which the network has an independent fully-connected classification head per entity. VGG-16 achieves better performance with the multi-headed architecture than with its default, single-headed architecture. Additionally, transfer learning is shown not to improve the performance of these networks. Analysis suggests that the networks trained for the static NER task learn to recognize document templates rather than the entities themselves, and therefore do not generalize well to new, unseen templates. For a dynamic NER task, where the type and number of entity classes vary per document, experimental evaluation shows that, on large entities in the document, the Faster R-CNN object detection framework achieves performance comparable to that of the networks trained on the static task. Analysis suggests that Faster R-CNN generalizes better to new templates than the networks trained for the static task, as Faster R-CNN is trained on local features rather than the full document template. Finally, analysis shows that Faster R-CNN performs poorly on small entities in the image, and suggestions are made to improve its performance. en
dc.format.extent 45
dc.format.mimetype application/pdf en
dc.language.iso en en
dc.title Convolutional Neural Networks for Named Entity Recognition in Images of Documents en
dc.type G2 Master's thesis (Pro gradu, diplomityö) fi
dc.contributor.school School of Science (Perustieteiden korkeakoulu) fi
dc.subject.keyword convolutional neural networks en
dc.subject.keyword faster R-CNN en
dc.subject.keyword named entity recognition en
dc.subject.keyword images en
dc.subject.keyword documents en
dc.identifier.urn URN:NBN:fi:aalto-201610124921
dc.programme.major Machine Learning and Data Mining fi
dc.programme.mcode SCI3015 fi
dc.type.ontasot Master's thesis en
dc.type.ontasot Diplomityö fi
dc.contributor.supervisor Karhunen, Juha
dc.programme Master’s Programme in Machine Learning and Data Mining fi
dc.ethesisid Aalto 4631
dc.location P1
local.aalto.openaccess yes
dc.rights.accesslevel openAccess
local.aalto.idinssi 54653
dc.type.publication masterThesis
dc.type.okm G2 Master's thesis (Pro gradu, diplomityö)

