Deep Contextual Attention for Human-Object Interaction Detection

Access rights

openAccess

A4 Article in conference proceedings

Date

2020-02

Language

en

Pages

5694-5702

Series

Proceedings of the International Conference on Computer Vision (ICCV2019), Proceedings of the IEEE International Conference on Computer Vision, Volume 2019-October

Abstract

This work proposes to combine neural networks with the compositional hierarchy of human bodies for efficient and complete human parsing. We formulate the approach as a neural information fusion framework. Our model assembles information from three inference processes over the hierarchy: direct inference (predicting each part of a human body directly from image information), bottom-up inference (assembling knowledge from constituent parts), and top-down inference (leveraging context from parent nodes). The bottom-up and top-down inferences explicitly model the compositional and decompositional relations in human bodies, respectively. In addition, the fusion of multi-source information is conditioned on the inputs, i.e., it estimates and takes into account the confidence of each source. The whole model is end-to-end differentiable and explicitly models information flows and structures. Our approach is extensively evaluated on four popular datasets, outperforming the state of the art in all cases, with a fast processing speed of 23 fps. Our code and results have been released to help ease future research in this direction.

Citation

Wang, T, Anwer, R M, Khan, M H, Khan, F S, Pang, Y, Shao, L & Laaksonen, J 2020, Deep Contextual Attention for Human-Object Interaction Detection. in Proceedings of the International Conference on Computer Vision (ICCV2019), 9008846, Proceedings of the IEEE International Conference on Computer Vision, vol. 2019-October, IEEE, pp. 5693-5701, IEEE International Conference on Computer Vision, Seoul, Korea, Republic of, 27/10/2019. https://doi.org/10.1109/ICCV.2019.00579