Recurrent neural networks for object detection in video sequences

Loading...
Thumbnail Image
Journal Title
Journal ISSN
Volume Title
Perustieteiden korkeakoulu | Master's thesis
Date
2017-05-08
Department
Major/Subject
Machine Learning and Data Mining
Mcode
SCI3044
Degree programme
Master’s Programme in Computer, Communication and Information Sciences
Language
en
Pages
59+3
Series
Abstract
This thesis explores recurrent neural network based methods for object detection in video sequences. Several models for object recognition are compared by using the KITTI object tracking dataset containing photos taken in an urban traffic environment. Metrics such as robustness to noise and object velocity prediction error are used to analyze the results. Neural networks and their training methodology is described in depth and recent models from the literature are reviewed. Several novel convolutional neural network architectures are introduced for the problem. The VGG-19 deep neural network is enhanced with convolutive recurrent layers to make it suitable for video analysis. Additionally a temporal coherency loss term is introduced to guide the learning process. Velocity estimation has not been studied in the literature and the velocity estimation performance was compared against a baseline frame-by-frame object detector neural network. The results from the experiments show that the recurrent architectures operating on video sequences consistently outperform an object detector that only perceives one frame of video at once. The recurrent models are more resilient to noise and produce more confident object detections as measured by the standard deviation of the predicted bounding boxes. The recurrent models are able to predict object velocity more accurately from video than the baseline frame-by-frame model.
Description
Supervisor
Lehtinen, Jaakko
Thesis advisor
Jänis, Pekka
Keywords
machine learning, recurrent neural network, temporal coherency, object detection, video sequence
Other note
Citation