Browser-based scene text detection and recognition on mobile devices

School of Science (Perustieteiden korkeakoulu) | Master's thesis

Date

2021-10-18

Major/Subject

Data Science/Entrepreneurship

Mcode

SCI3095

Degree programme

Master's Programme in ICT Innovation

Language

en

Pages

72

Abstract

Automatic detection and recognition of text in natural scenes is crucial to applications such as navigation and object identification. Performing the task locally on a mobile phone enables offline functionality and increases user privacy, among other benefits. This thesis presents TDR4W, a model for multi-oriented text detection and recognition in natural scenes that is designed for use in a progressive web application, so that the model can run locally on a mobile device. TDR4W is based on the MobileNetV2 backbone. In contrast to many commonly used multi-step solutions, the model unifies the predictions for detection and recognition and allows joint training. The design is almost as accurate as the previously used cloud-based solution, with a difference of 1% in top-1 accuracy when tested on a dataset of labelled shipping container images. Moreover, it has fewer than half the trainable parameters of the previously used model, making it much smaller. It needs only 3.9 billion floating-point operations to compute a prediction, which is fewer than not only the previously used cloud-based model but also a default segmentation model proposed by the authors of MobileNetV2, even though TDR4W works on images with a larger input size.

Description

Supervisor

Di Francesco, Mario

Thesis advisor

Rowlinson, Andrew

Keywords

scene text detection, scene text recognition, machine learning on mobile devices, machine learning in web browser
