Browser-based scene text detection and recognition on mobile devices
School of Science | Master's thesis
Date
2021-10-18
Major/Subject
Data Science/Entrepreneurship
Mcode
SCI3095
Degree programme
Master's Programme in ICT Innovation
Language
en
Pages
72
Abstract
Automatic detection and recognition of natural scene text is crucial for various applications, such as navigation and object identification. Performing the task locally on a mobile phone enables offline functionality and improves user privacy, among other benefits. This thesis presents TDR4W, a model for multi-oriented text detection and recognition in natural scenes, designed to be implemented in a progressive web application so that it can be executed locally on a mobile device. TDR4W is based on the MobileNetV2 backbone. In contrast to many commonly used multi-step solutions, the model unifies the predictions for detection and recognition and allows joint training. The design is almost as accurate as the previously used cloud-based solution, with a difference of 1% in top-1 accuracy when tested on a dataset of labelled shipping container images. Moreover, it has fewer than half as many trainable parameters as the previously used model, making it much smaller. It needs only 3.9 billion floating-point operations to compute a prediction, which is fewer than both the previously used cloud-based model and a default segmentation model proposed by the authors of MobileNetV2, even though TDR4W processes larger input images.
Supervisor
Di Francesco, Mario
Thesis advisor
Rowlinson, Andrew
Keywords
scene text detection, scene text recognition, machine learning on mobile devices, machine learning in web browser