Real-time Action Recognition for RGB-D and Motion Capture Data

School of Science | Doctoral thesis (article-based) | Defence date: 2015-01-16

Authors

Chen, Xi

Date

2014

Language

en

Pages

104 + app. 87

Series

Aalto University publication series DOCTORAL DISSERTATIONS, 207/2014

Abstract

In daily life, humans continuously perform a great number of actions, and we recognize and interpret them almost unconsciously while interacting and communicating with other people and with the environment. If machines could recognize human gestures as effectively as humans do, a wide range of applications would open up to facilitate our daily life. These potential benefits have motivated research on machine-based gesture recognition, which has already shown promise in many applications; for example, gestures can be used as commands to control robots or computer programs in place of standard input devices such as touch screens or mice.

This thesis proposes a framework for gesture recognition systems based on motion capture and RGB-D data. Motion capture data consists of the positions and orientations of the key joints of the human skeleton. RGB-D data contains an RGB image and depth data, from which a skeletal model can be learnt; this skeletal model can be seen as a noisy approximation of the more accurate motion capture skeleton. The modular design of the framework enables convenient recognition from multiple data modalities.

The first part of the thesis reviews methods used in existing recognition systems in the literature and gives a brief introduction to the proposed real-time recognition system for both whole-body gestures and hand gestures. The second part is a collection of eight publications by the author of the thesis, which describe the proposed recognition system in detail.

The framework can be roughly divided into two parts, feature extraction and classification, both of which have a significant influence on recognition performance. Multiple features are extracted from the skeletons, images, and depth data of each frame in a motion sequence. These features are combined in an early fusion stage and classified by a single-hidden-layer neural network, the extreme learning machine. The frame-level classification outputs are then aggregated at the sequence level to obtain the final classification result.

The methodologies used in the gesture recognition system are also applied to a proposed image retrieval system, in which several image features are extracted and search algorithms are applied to achieve fast and accurate retrieval. Furthermore, a method is proposed for aligning different motion sequences and for evaluating the alignment; it can be used for gesture retrieval and for evaluating skeleton generation algorithms.
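The classification pipeline summarized above maps fused per-frame features through a hidden layer with fixed random input weights, solves for the output weights in closed form via a pseudoinverse, and pools the per-frame outputs over the sequence. The sketch below illustrates this idea in Python/NumPy; it is a minimal illustration only, and the hidden-layer size, tanh activation, toy random features, and score-summing aggregation are assumptions chosen here for demonstration, not details taken from the thesis or its publications.

```python
import numpy as np


class ELM:
    """Minimal single-hidden-layer extreme learning machine.

    Input weights are drawn at random and kept fixed; only the output
    weights are learned, by solving a linear least-squares problem.
    """

    def __init__(self, n_hidden=500, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y, n_classes):
        # Random projection to the hidden layer (fixed, not trained).
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)   # hidden-layer activations
        T = np.eye(n_classes)[y]           # one-hot frame labels
        # Output weights via the Moore-Penrose pseudoinverse.
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict_scores(self, X):
        # Per-frame class scores (one row per frame).
        return np.tanh(X @ self.W + self.b) @ self.beta


def classify_sequence(elm, frame_features):
    """Aggregate frame-level ELM outputs into a sequence-level label by
    summing per-frame class scores (one simple aggregation rule; the
    thesis publications describe the actual scheme used)."""
    scores = elm.predict_scores(frame_features)
    return int(np.argmax(scores.sum(axis=0)))


if __name__ == "__main__":
    # Toy data standing in for fused per-frame features (e.g. skeleton
    # joint positions concatenated with depth-based descriptors).
    rng = np.random.default_rng(1)
    n_frames, n_features, n_classes = 2000, 60, 5
    X_train = rng.normal(size=(n_frames, n_features))
    y_train = rng.integers(0, n_classes, size=n_frames)

    elm = ELM(n_hidden=200).fit(X_train, y_train, n_classes)
    test_sequence = rng.normal(size=(30, n_features))  # a 30-frame gesture
    print("predicted class:", classify_sequence(elm, test_sequence))
```

Because only the output weights are solved for, training reduces to a single linear least-squares problem, which is what makes this type of classifier fast enough for real-time, frame-by-frame use.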

Supervising professor

Oja, Erkki, Aalto Distinguished Prof., Aalto University, Department of Information and Computer Science, Finland

Thesis advisor

Koskela, Markus, Dr., University of Helsinki, Department of Computer Science, Finland

Keywords

action recognition, gesture recognition, RGB-D, motion capture, extreme learning machine, computer vision, machine learning, image retrieval

Parts

  • [Publication 1]: Xi Chen, Markus Koskela and Jouko Hyvakka. Image Based Information Access for Mobile Phones. In Proceedings of the 8th International Workshop on Content-Based Multimedia Indexing (CBMI 2010), pages 1-5, Grenoble, France, June 2010.
  • [Publication 2]: Xi Chen and Markus Koskela. Mobile Visual Search from Dynamic Image Databases. In Proceedings of the 17th Scandinavian Conference on Image Analysis (SCIA 2011), pages 196-205, Ystad, Sweden, May 2011.
  • [Publication 3]: Xi Chen and Markus Koskela. Classification of RGB-D and Motion Capture Sequences Using Extreme Learning Machine. In Proceedings of the 18th Scandinavian Conference on Image Analysis (SCIA 2013), pages 640-651, Espoo, Finland, June 2013.
  • [Publication 4]: Xi Chen and Markus Koskela. Skeleton-Based Action Recognition with Extreme Learning Machines. Neurocomputing, volume 149, part A, pages 387-396, February 2015.
  • [Publication 5]: Xi Chen and Markus Koskela. Sequence Alignment for RGB-D and Motion Capture Skeletons. In Proceedings of the International Conference on Image Analysis and Recognition (ICIAR 2013), pages 630-639, Póvoa de Varzim, Portugal, June 2013.
  • [Publication 6]: Kyunghyun Cho and Xi Chen. Classifying and Visualizing Motion Capture Sequences using Deep Neural Networks. In Proceedings of the 9th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, pages 122-130, Lisbon, Portugal, January 2014.
  • [Publication 7]: Xi Chen and Markus Koskela. Online RGB-D Gesture Recognition with Extreme Learning Machines. In Proceedings of the 15th ACM International Conference on Multimodal Interaction (ICMI 2013), pages 467-474, Sydney, Australia, December 2013.
  • [Publication 8]: Xi Chen and Markus Koskela. Using Appearance-Based Hand Features for Dynamic RGB-D Gesture Recognition. In Proceedings of the 22nd International Conference on Pattern Recognition (ICPR 2014), pages 411-416, Stockholm, Sweden, August 2014.
