Improving Text Recognition Results in Paper Documents using Spatial Transformer Networks for Image Pre-Processing

School of Science (Perustieteiden korkeakoulu) | Master's thesis

Date

2020-12-14

Major/Subject

EIT Digital Data Science

Mcode

SCI3095

Degree programme

Master's Programme in ICT Innovation

Language

en

Pages

45 + 6

Abstract

An OCR implementation that is robust against skew and distortion is especially useful when handling photos or scans of paper documents, which can contain folds and can be photographed from many different perspectives. This thesis investigates whether Spatial Transformer Networks can improve Optical Character Recognition solutions when the images are distorted and/or skewed. It first establishes a baseline performance metric and then uses several generated datasets to train and test the Spatial Transformer Network in combination with the Optical Character Recognition implementation in an end-to-end fashion. Empirical evidence is presented showing that a Spatial Transformer Network greatly increases accuracy and greatly reduces training time when a dataset contains images of text rotated by up to 30 degrees. The thesis also shows that the impact of a Spatial Transformer Network is limited when the text in a dataset is already horizontal. Finally, it shows that a Spatial Transformer Network successfully conditions its output on each individual input image, as opposed to learning a single general transformation for the entire dataset.
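
As a rough illustration of the per-image transformation described in the abstract, the sketch below shows how a 2x3 affine matrix can map output pixel positions back to source positions in a rotated image. It is not the thesis's implementation: the function names (rotation_theta, apply_affine, sampling_grid) are hypothetical, and the matrix is built from a known angle here, whereas in a Spatial Transformer Network it would be predicted for each input image by a localization network and followed by interpolated sampling.

```python
import math


def rotation_theta(angle_deg):
    """Return a 2x3 affine matrix for a rotation by angle_deg.

    Assumption: this stands in for the parameters a localization
    network would predict for one input image.
    """
    a = math.radians(angle_deg)
    return [
        [math.cos(a), -math.sin(a), 0.0],
        [math.sin(a), math.cos(a), 0.0],
    ]


def apply_affine(theta, x, y):
    """Map a normalized output coordinate in [-1, 1] to a source coordinate."""
    xs = theta[0][0] * x + theta[0][1] * y + theta[0][2]
    ys = theta[1][0] * x + theta[1][1] * y + theta[1][2]
    return xs, ys


def sampling_grid(theta, width, height):
    """Build the source coordinate to sample for every output pixel.

    Requires width >= 2 and height >= 2 so the grid spans [-1, 1].
    """
    grid = []
    for j in range(height):
        y = -1.0 + 2.0 * j / (height - 1)
        row = []
        for i in range(width):
            x = -1.0 + 2.0 * i / (width - 1)
            row.append(apply_affine(theta, x, y))
        grid.append(row)
    return grid


# Text rotated by 30 degrees: the horizontal axis of the output
# samples along the rotated baseline of the input.
theta = rotation_theta(30.0)
source_x, source_y = apply_affine(theta, 1.0, 0.0)
```

Because the matrix is an input to the sampling step rather than a fixed constant, a different matrix can be used for every image, which is the property the abstract refers to as conditioning the output on each individual input.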

Description

Supervisor

Marttinen, Pekka

Thesis advisor

Ketola, Petri

Keywords

optical character recognition, spatial transformer network, OCR, text recognition
