Real-time Action Recognition for RGB-D and Motion Capture Data
School of Science | Doctoral thesis (article-based) | Defence date: 2015-01-16
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for your own personal use. Commercial use is prohibited.
Authors
Chen, Xi
Date
2014
Language
en
Pages
104 + app. 87
Series
Aalto University publication series DOCTORAL DISSERTATIONS, 207/2014
Abstract
In daily life, humans continuously perform a great number of actions. We recognize and interpret these actions unconsciously while interacting and communicating with people and the environment. If machines and computers could recognize human gestures as effectively as human beings, a wide range of applications would open up to facilitate our daily life. These potential benefits to society have motivated research on machine-based gesture recognition, which has already shown promise in many applications. For example, gestures can be used as commands to control robots or computer programs instead of standard input devices such as touch screens or mice.

This thesis proposes a framework for gesture recognition systems based on motion capture and RGB-D data. Motion capture data consists of the positions and orientations of the key joints of the human skeleton. RGB-D data contains an RGB image and depth data, from which a skeletal model can be learnt; this skeletal model can be seen as a noisy approximation of the more accurate motion capture skeleton. The modular design of the framework enables convenient recognition from multiple data modalities.

The first part of the thesis reviews methods used in existing recognition systems in the literature and briefly introduces the proposed real-time recognition system for both whole-body gestures and hand gestures. The second part is a collection of eight publications by the author of the thesis, which describe the proposed recognition system in detail.

The framework can be roughly divided into two parts, feature extraction and classification, both of which have a significant influence on recognition performance. Multiple features are extracted from the skeletons, images, and depth data for each frame of a motion sequence. These features are combined in an early fusion stage and classified with a single-hidden-layer neural network, the extreme learning machine. The frame-level classification outputs are then aggregated on the sequence level to obtain the final classification result.

The methodologies used in the gesture recognition system are also applied in a proposed image retrieval system, in which several image features are extracted and search algorithms are applied to achieve fast and accurate retrieval. Furthermore, a method is proposed to align different motion sequences and to evaluate the alignment; it can be used for gesture retrieval and for evaluating skeleton generation algorithms.
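As a rough illustration of the frame-level pipeline described in the abstract (early-fused per-frame features, an extreme learning machine with a random hidden layer and closed-form output weights, and sequence-level aggregation of frame-level outputs), the following is a minimal sketch. The ELM training step follows the standard formulation; the feature dimensions, regularization, and mean-over-frames aggregation rule are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

def train_elm(X, T, n_hidden=256, reg=1e-3, seed=None):
    """Train an ELM: random, fixed hidden layer; ridge-regression output weights."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.standard_normal((d, n_hidden))   # random input weights (kept fixed)
    b = rng.standard_normal(n_hidden)        # random biases (kept fixed)
    H = np.tanh(X @ W + b)                   # hidden-layer activations
    # Output weights via regularized least squares: beta = (H'H + rI)^-1 H'T
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ T)
    return W, b, beta

def predict_frames(X, W, b, beta):
    """Frame-level class scores, one row per frame."""
    return np.tanh(X @ W + b) @ beta

def classify_sequence(frame_scores):
    """Aggregate frame-level outputs to one sequence label
    (simple mean over frames; an assumed aggregation rule)."""
    return int(np.argmax(frame_scores.mean(axis=0)))

# Toy example: 500 training frames of early-fused skeleton + depth features
# (60 dimensions, assumed) and 10 gesture classes with one-hot targets.
rng = np.random.default_rng(0)
X_train = rng.standard_normal((500, 60))
y_train = rng.integers(0, 10, 500)
T_train = np.eye(10)[y_train]

W, b, beta = train_elm(X_train, T_train, n_hidden=128, seed=0)
sequence = rng.standard_normal((30, 60))     # one 30-frame test sequence
print(classify_sequence(predict_frames(sequence, W, b, beta)))
```

Because the hidden layer is random and only the output weights are solved for, training reduces to one linear system, which is what makes the classifier fast enough for the real-time setting described above.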
Supervising professor
Oja, Erkki, Aalto Distinguished Prof., Aalto University, Department of Information and Computer Science, Finland
Thesis advisor
Koskela, Markus, Dr., University of Helsinki, Department of Computer Science, Finland
Keywords
action recognition, gesture recognition, RGB-D, motion capture, extreme learning machine, computer vision, machine learning, image retrieval
Other note
Parts
- [Publication 1]: Xi Chen, Markus Koskela, and Jouko Hyvakka. Image Based Information Access for Mobile Phones. In Proceedings of the 8th International Workshop on Content-Based Multimedia Indexing (CBMI 2010), pages 1-5, Grenoble, France, June 2010.
- [Publication 2]: Xi Chen and Markus Koskela. Mobile Visual Search from Dynamic Image Databases. In Proceedings of 17th Scandinavian Conference on Image Analysis (SCIA 2011), pages 196-205, Ystad, Sweden, May 2011.
- [Publication 3]: Xi Chen and Markus Koskela. Classification of RGB-D and Motion Capture Sequences Using Extreme Learning Machine. In Proceedings of 18th Scandinavian Conference on Image Analysis (SCIA 2013), pages 640-651, Espoo, Finland, June 2013.
- [Publication 4]: Xi Chen and Markus Koskela. Skeleton-Based Action Recognition with Extreme Learning Machines. Neurocomputing, volume 149, part A, pages 387-396, February 2015.
- [Publication 5]: Xi Chen and Markus Koskela. Sequence Alignment for RGB-D and Motion Capture Skeletons. In Proceedings of the International Conference on Image Analysis and Recognition (ICIAR 2013), pages 630-639, Povoa de Varzim, Portugal, June 2013.
- [Publication 6]: Kyunghyun Cho and Xi Chen. Classifying and Visualizing Motion Capture Sequences using Deep Neural Networks. In Proceedings of the 9th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, pages 122-130, Lisbon, Portugal, January 2014.
- [Publication 7]: Xi Chen and Markus Koskela. Online RGB-D Gesture Recognition with Extreme Learning Machines. In Proceedings of the 15th ACM International Conference on Multimodal Interaction (ICMI 2013), pages 467-474, Sydney, Australia, December 2013.
- [Publication 8]: Xi Chen and Markus Koskela. Using Appearance-Based Hand Features for Dynamic RGB-D Gesture Recognition. In Proceedings of the 22nd International Conference on Pattern Recognition (ICPR 2014), pages 411-416, Stockholm, Sweden, August 2014.