Temporal modelling of first-person actions using hand-centric verb and object streams

dc.contributor: Aalto-yliopisto
dc.contributor: Aalto University
dc.contributor.author: Gökce, Zeynep
dc.contributor.author: Pehlivan, Selen
dc.contributor.department: Department of Computer Science
dc.contributor.groupauthor: Lecturer Laaksonen Jorma group
dc.contributor.organization: TED University
dc.date.accessioned: 2022-05-10T10:34:12Z
dc.date.available: 2022-05-10T10:34:12Z
dc.date.embargo: info:eu-repo/date/embargoEnd/2023-08-26
dc.date.issued: 2021-11
dc.description: Publisher Copyright: © 2021 Elsevier B.V.
dc.description.abstract: Analysis of first-person (egocentric) videos of human actions could help solve many problems. These videos contain a large number of fine-grained action categories involving hand–object interactions. In this paper, a compositional verb–noun model with two complementary temporal streams is proposed, together with various fusion strategies, to recognize egocentric actions. The first step constructs verb and object video models as a decomposition of actions, with special attention on hands. In particular, the verb video model, a spatial–temporal encoding of hand actions, and the object video model, object scores combined with hand–object layout, are represented as two separate pathways. The second step is the fusion stage that identifies the action category, where the distinct verb and object models are combined to give their action judgments. We propose fusion strategies with recurrent steps that collect verb and object label judgments along a temporal video sequence. We evaluate recognition performance for the individual verb and object models, and we present extensive experimental evaluations of action recognition with recurrent fusion approaches on the EGTEA Gaze+ dataset.
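The abstract's idea of collecting per-frame verb and object judgments along a video and fusing them into a compositional action label can be sketched in a purely illustrative way. This is not the paper's implementation (the paper uses learned RNN-based fusion of deep verb/object streams); the exponential-decay accumulator and the product scoring below are simplifying assumptions for demonstration only.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of logits.
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def fuse_streams(verb_logits_seq, obj_logits_seq, decay=0.5):
    """Toy recurrent late fusion (illustrative stand-in for the paper's
    RNN-based fusion): accumulate per-frame verb and object score
    vectors with exponential decay, then score each (verb, object)
    action pair as the product of the accumulated marginals."""
    n_verbs = len(verb_logits_seq[0])
    n_objs = len(obj_logits_seq[0])
    verb_state = [0.0] * n_verbs
    obj_state = [0.0] * n_objs
    for v_logits, o_logits in zip(verb_logits_seq, obj_logits_seq):
        v_p = softmax(v_logits)
        o_p = softmax(o_logits)
        # Recurrent update: blend the running state with the new frame's scores.
        verb_state = [decay * s + (1 - decay) * p for s, p in zip(verb_state, v_p)]
        obj_state = [decay * s + (1 - decay) * p for s, p in zip(obj_state, o_p)]
    # Compositional action = best-scoring (verb, object) pair.
    best = max(
        ((vi, oi) for vi in range(n_verbs) for oi in range(n_objs)),
        key=lambda vo: verb_state[vo[0]] * obj_state[vo[1]],
    )
    return best  # (verb index, object index)
```

For example, with two verbs and two objects, frames whose logits consistently favor verb 1 and object 0 yield the action pair `(1, 0)`.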
dc.description.version: Peer reviewed
dc.format.extent: 17
dc.format.mimetype: application/pdf
dc.identifier.citation: Gökce, Z & Pehlivan, S 2021, 'Temporal modelling of first-person actions using hand-centric verb and object streams', SIGNAL PROCESSING: IMAGE COMMUNICATION, vol. 99, 116436. https://doi.org/10.1016/j.image.2021.116436
dc.identifier.doi: 10.1016/j.image.2021.116436
dc.identifier.issn: 0923-5965
dc.identifier.other: PURE UUID: b69bb7a0-2198-49ae-9a2f-47c32c5aa8e8
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/b69bb7a0-2198-49ae-9a2f-47c32c5aa8e8
dc.identifier.other: PURE LINK: http://www.scopus.com/inward/record.url?scp=85113626751&partnerID=8YFLogxK
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/82535978/Temporal_modelling_of_first_person_actions_using_hand_centric_verb_and_object_streams.pdf
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/114176
dc.identifier.urn: URN:NBN:fi:aalto-202205103040
dc.language.iso: en
dc.publisher: Elsevier
dc.relation.ispartofseries: SIGNAL PROCESSING: IMAGE COMMUNICATION
dc.relation.ispartofseries: Volume 99
dc.rights: openAccess
dc.subject.keyword: Action recognition
dc.subject.keyword: Egocentric vision
dc.subject.keyword: First-person vision
dc.subject.keyword: RNN
dc.subject.keyword: Temporal models
dc.title: Temporal modelling of first-person actions using hand-centric verb and object streams
dc.type: A1 Original article in a scientific journal (A1 Alkuperäisartikkeli tieteellisessä aikakauslehdessä)
dc.type.version: acceptedVersion