Improving Text Recognition Results in Paper Documents using Spatial Transformer Networks for Image Pre-Processing
School of Science (Perustieteiden korkeakoulu) | Master's thesis
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for your own personal use. Commercial use is prohibited.
Date
2020-12-14
Major/Subject
EIT Digital Data Science
Mcode
SCI3095
Degree programme
Master's Programme in ICT Innovation
Language
en
Pages
45 + 6
Abstract
An Optical Character Recognition (OCR) implementation that is robust against skew and distortion is especially useful when handling photos or scans of paper documents, which can contain folds and can be photographed from many different perspectives. This thesis investigates whether Spatial Transformer Networks (STNs) can improve OCR when the input images are distorted and/or skewed. It first establishes a baseline performance metric and then uses several generated datasets to train and test the STN together with the OCR implementation in an end-to-end fashion. Empirical evidence shows that using an STN greatly increases accuracy and greatly reduces training time when a dataset contains images of text rotated by at most 30 degrees. The thesis also shows that the impact of an STN is limited when the text in a dataset is already horizontal. Finally, it shows that an STN can successfully condition its output on each individual input image, rather than learning a single general transformation for the entire dataset.
Supervisor
Marttinen, Pekka
Thesis advisor
Ketola, Petri
Keywords
optical character recognition, spatial transformer network, OCR, text recognition