Heterogeneous non-local fusion for multimodal activity recognition

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.author: Byvshev, Petr [en_US]
dc.contributor.author: Mettes, Pascal [en_US]
dc.contributor.author: Xiao, Yu [en_US]
dc.contributor.department: Department of Communications and Networking [en]
dc.contributor.groupauthor: Mobile Cloud Computing [en]
dc.contributor.organization: University of Amsterdam [en_US]
dc.date.accessioned: 2020-08-06T12:17:04Z
dc.date.available: 2020-08-06T12:17:04Z
dc.date.issued: 2020-06-08 [en_US]
dc.description: | openaire: EC/H2020/777222/EU//ATTRACT
dc.description.abstract: In this work, we investigate activity recognition using multimodal inputs from heterogeneous sensors. Activity recognition is commonly tackled from a single-modal perspective using videos. In case multiple signals are used, they come from the same homogeneous modality, e.g. in the case of color and optical flow. Here, we propose an activity network that fuses multimodal inputs coming from completely different and heterogeneous sensors. We frame such a heterogeneous fusion as a non-local operation. The observation is that in a non-local operation, only the channel dimensions need to match. In the network, heterogeneous inputs are fused, while maintaining the shapes and dimensionalities that fit each input. We outline both asymmetric fusion, where one modality serves to enforce the other, and symmetric fusion variants. To further promote research into multimodal activity recognition, we introduce GloVid, a first-person activity dataset captured with video recordings and smart glove sensor readings. Experiments on GloVid show the potential of heterogeneous non-local fusion for activity recognition, outperforming individual modalities and standard fusion techniques. [en]
dc.description.version: Peer reviewed [en]
dc.format.extent: 10
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: Byvshev, P, Mettes, P & Xiao, Y 2020, Heterogeneous non-local fusion for multimodal activity recognition. in ICMR 2020 - Proceedings of the 2020 International Conference on Multimedia Retrieval. ACM, pp. 63-72, ACM International Conference on Multimedia Retrieval, Dublin, Ireland, 08/06/2020. https://doi.org/10.1145/3372278.3390675 [en]
dc.identifier.doi: 10.1145/3372278.3390675 [en_US]
dc.identifier.isbn: 9781450370875
dc.identifier.other: PURE UUID: c413b4c5-cf34-42b5-b151-9eb8b3c0cdb6 [en_US]
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/c413b4c5-cf34-42b5-b151-9eb8b3c0cdb6 [en_US]
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/55038890/Heterogeneous_non_local_fusion_for_multimodal.pdf
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/45585
dc.identifier.urn: URN:NBN:fi:aalto-202008064544
dc.language.iso: en [en]
dc.relation: info:eu-repo/grantAgreement/EC/H2020/777222/EU//ATTRACT [en_US]
dc.relation.fundinginfo: This work was funded by Business Finland (grant No. 1660/31/2018) and the European Union's Horizon 2020 Research and Innovation Programme (grant No. 777222). Special thanks to Clayton Frederick Souza Leite and Xiuyang Li for helping in composing the GloVid dataset.
dc.relation.ispartof: ACM International Conference on Multimedia Retrieval [en]
dc.relation.ispartofseries: ICMR 2020 - Proceedings of the 2020 International Conference on Multimedia Retrieval [en]
dc.relation.ispartofseries: pp. 63-72 [en]
dc.rights: openAccess [en]
dc.subject.keyword: Activity recognition [en_US]
dc.subject.keyword: Datasets [en_US]
dc.subject.keyword: Heterogenous modalities [en_US]
dc.title: Heterogeneous non-local fusion for multimodal activity recognition [en]
dc.type: A4 Artikkeli konferenssijulkaisussa (A4 Article in conference proceedings) [fi]
dc.type.version: acceptedVersion

Files