Heterogeneous non-local fusion for multimodal activity recognition
Access rights
openAccess
acceptedVersion
A4 Article in a conference publication
This publication is imported from Aalto University research portal.
View publication in the Research portal (opens in new window)
View/Open full text file from the Research portal (opens in new window)
Other link related to publication (opens in new window)
Authors
Byvshev, P; Mettes, P; Xiao, Y
Date
2020-06-08
Language
en
Pages
10
Series
ICMR 2020 - Proceedings of the 2020 International Conference on Multimedia Retrieval, pp. 63-72
Abstract
In this work, we investigate activity recognition using multimodal inputs from heterogeneous sensors. Activity recognition is commonly tackled from a single-modal perspective using videos. When multiple signals are used, they come from the same homogeneous modality, e.g., color and optical flow. Here, we propose an activity network that fuses multimodal inputs coming from completely different and heterogeneous sensors. We frame such a heterogeneous fusion as a non-local operation. The observation is that in a non-local operation, only the channel dimensions need to match. In the network, heterogeneous inputs are fused, while maintaining the shapes and dimensionalities that fit each input. We outline both asymmetric fusion, where one modality serves to enforce the other, and symmetric fusion variants. To further promote research into multimodal activity recognition, we introduce GloVid, a first-person activity dataset captured with video recordings and smart glove sensor readings. Experiments on GloVid show the potential of heterogeneous non-local fusion for activity recognition, outperforming individual modalities and standard fusion techniques.
Description
openaire: EC/H2020/777222/EU//ATTRACT
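The abstract's observation that only the channel dimensions need to match in a non-local operation can be illustrated with a small NumPy sketch. This is not the authors' implementation: the softmax (embedded Gaussian) affinity, the residual connection, the function names, the projection matrices, and the feature shapes are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)  # stabilise the exponentials
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def nonlocal_fuse(x, y, w_theta, w_phi, w_g, w_out):
    """Asymmetric fusion (assumed form): positions of x attend over positions of y.

    x: (n_x, c) target-modality features, positions flattened
    y: (n_y, c) source-modality features, positions flattened
    Only the channel size c is shared; n_x and n_y may differ.
    Returns an array with the same shape as x.
    """
    theta = x @ w_theta             # (n_x, d) queries from the target
    phi = y @ w_phi                 # (n_y, d) keys from the source
    g = y @ w_g                     # (n_y, d) values from the source
    attn = softmax(theta @ phi.T)   # (n_x, n_y) pairwise affinities
    return x + (attn @ g) @ w_out   # residual keeps the shape of x

def symmetric_fuse(x, y, params_xy, params_yx):
    """Symmetric variant (assumed form): each modality is fused with the other."""
    return nonlocal_fuse(x, y, *params_xy), nonlocal_fuse(y, x, *params_yx)

# Hypothetical shapes: video has T*H*W positions, glove has 20 time steps.
rng = np.random.default_rng(0)
c, d = 8, 4
video = rng.normal(size=(4 * 7 * 7, c))
glove = rng.normal(size=(20, c))
params = [rng.normal(size=s) * 0.1 for s in [(c, d), (c, d), (c, d), (d, c)]]

fused = nonlocal_fuse(video, glove, *params)
print(fused.shape)  # (196, 8)
```

In this sketch the video features keep their 196 positions after fusion even though the glove stream has only 20, because the affinity matrix bridges the two position counts and the shared channel size is the only constraint.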
Keywords
Activity recognition, Datasets, Heterogeneous modalities
Citation
Byvshev, P, Mettes, P & Xiao, Y 2020, Heterogeneous non-local fusion for multimodal activity recognition. in ICMR 2020 - Proceedings of the 2020 International Conference on Multimedia Retrieval. ACM, pp. 63-72, ACM International Conference on Multimedia Retrieval, Dublin, Ireland, 08/06/2020. https://doi.org/10.1145/3372278.3390675