
Geometry-aware relational exemplar attention for dense captioning


dc.contributor Aalto-yliopisto fi
dc.contributor Aalto University en
dc.contributor.author Wang, Tzu Jui Julius
dc.contributor.author Tavakoli, Hamed R.
dc.contributor.author Sjöberg, Mats
dc.contributor.author Laaksonen, Jorma
dc.date.accessioned 2020-01-02T13:51:45Z
dc.date.available 2020-01-02T13:51:45Z
dc.date.issued 2019-10-15
dc.identifier.citation Wang, T J J, Tavakoli, H R, Sjöberg, M & Laaksonen, J 2019, Geometry-aware relational exemplar attention for dense captioning. in MULEA 2019 - 1st International Workshop on Multimodal Understanding and Learning for Embodied Applications, co-located with MM 2019. ACM, pp. 3-11, International Workshop on Multimodal Understanding and Learning for Embodied Applications, Nice, France, 25/10/2019. https://doi.org/10.1145/3347450.3357656 en
dc.identifier.isbn 9781450369183
dc.identifier.other PURE UUID: 00bb430a-c476-45b0-940b-7b56167cd0ea
dc.identifier.other PURE ITEMURL: https://research.aalto.fi/en/publications/00bb430a-c476-45b0-940b-7b56167cd0ea
dc.identifier.other PURE LINK: http://www.scopus.com/inward/record.url?scp=85074931985&partnerID=8YFLogxK
dc.identifier.other PURE FILEURL: https://research.aalto.fi/files/38843939/Wang_et.al_Geometry_aware_Relational.pdf
dc.identifier.uri https://aaltodoc.aalto.fi/handle/123456789/41898
dc.description | openaire: EC/H2020/780069/EU//MeMAD
dc.description.abstract Dense captioning (DC), which provides a comprehensive understanding of image context by describing all salient visual groundings in an image, facilitates multimodal understanding and learning. As an extension of image captioning, DC is designed to discover richer sets of visual content and to generate more diverse and more detailed captions. State-of-the-art DC models consist of three stages: (1) region proposal, (2) region classification, and (3) caption generation for each proposal. They are typically built upon the following ideas: (a) guiding caption generation with image-level features as context cues alongside regional features, and (b) refining the locations of region proposals with caption information. In this work, we propose (a) a joint visual-textual criterion exploited by the region classifier that further improves both region detection and caption accuracy, and (b) a Geometry-aware Relational Exemplar attention (GREatt) mechanism to relate region proposals. The former helps the model learn a region classifier by effectively exploiting both visual groundings and caption descriptions. Rather than treating each region proposal in isolation, the latter relates regions through complementary relations, i.e., contextually dependent, visually supported, and geometry relations, to enrich the context information in regional representations. We conduct an extensive set of experiments and demonstrate that our proposed model improves on the state of the art by at least +5.3% in terms of mean average precision on the Visual Genome dataset. en
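The abstract describes GREatt only at a high level. As a rough illustration of what geometry-aware attention between region proposals can look like, the following sketch uses the relative-geometry weighting of Relation Networks for object detection (Hu et al., 2018), a related published technique; it is not the GREatt formulation from this paper. Assumptions, all hypothetical: boxes are `(center_x, center_y, width, height)` with positive sizes, the raw region features serve as queries, keys, and values, and `geo_weights` is a fixed stand-in for a learned geometry embedding. The contextual and visual-support relations and the joint visual-textual criterion are not modeled.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical box format for this sketch: (center_x, center_y, width, height),
# with strictly positive width and height.
Box = Tuple[float, float, float, float]


def relative_geometry(box_m: Box, box_n: Box) -> List[float]:
    """Scale-invariant geometry of box_n relative to box_m (log-ratio encoding
    in the style of Relation Networks for object detection)."""
    xm, ym, wm, hm = box_m
    xn, yn, wn, hn = box_n
    eps = 1e-3  # clamp so log() is defined when two centers coincide
    return [
        math.log(max(abs(xm - xn) / wm, eps)),
        math.log(max(abs(ym - yn) / hm, eps)),
        math.log(wn / wm),
        math.log(hn / hm),
    ]


def geometry_aware_attention(
    feats: Sequence[Sequence[float]],
    boxes: Sequence[Box],
    geo_weights: Sequence[float],
) -> List[List[float]]:
    """Return each region feature plus an attention-weighted context vector.

    Weight of region n for region m (normalized over n):
        w_mn ∝ max(0, g · geom(m, n)) * exp(<f_m, f_n> / sqrt(d))
    Raw features act as queries, keys and values (no learned projections);
    `geo_weights` (g) is a fixed stand-in for a learned geometry embedding.
    """
    d = len(feats[0])
    out: List[List[float]] = []
    for m, f_m in enumerate(feats):
        geo_terms: List[float] = []
        app_terms: List[float] = []
        for n, f_n in enumerate(feats):
            # Appearance term: scaled dot product between the two features.
            app_terms.append(sum(a * b for a, b in zip(f_m, f_n)) / math.sqrt(d))
            # Geometry term: ReLU of a linear score on the relative geometry.
            geom = relative_geometry(boxes[m], boxes[n])
            geo_terms.append(max(0.0, sum(g * x for g, x in zip(geo_weights, geom))))
        # Subtracting the max before exp() avoids overflow; it cancels on normalizing.
        shift = max(app_terms)
        raw = [g * math.exp(a - shift) for g, a in zip(geo_terms, app_terms)]
        total = sum(raw)
        context = [0.0] * d
        if total > 0.0:  # if every geometry gate is closed, no context is added
            for w, f_n in zip(raw, feats):
                for i in range(d):
                    context[i] += (w / total) * f_n[i]
        # Residual connection: add the relational context to the original feature.
        out.append([a + c for a, c in zip(f_m, context)])
    return out


# Usage: negative weights on the two log-distance terms favor nearby boxes,
# so the distant third region receives no weight from the first two.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
boxes = [(10.0, 10.0, 4.0, 4.0), (12.0, 10.0, 4.0, 4.0), (50.0, 50.0, 8.0, 8.0)]
enriched = geometry_aware_attention(feats, boxes, geo_weights=[-1.0, -1.0, 0.0, 0.0])
```

Multiplying a geometry gate into the softmax numerator, rather than adding a bias, lets geometry switch a pair of regions off entirely, which is the design choice Relation Networks makes.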
dc.format.extent 9
dc.format.extent 3-11
dc.format.mimetype application/pdf
dc.language.iso en en
dc.relation info:eu-repo/grantAgreement/EC/H2020/780069/EU//MeMAD
dc.relation.ispartof International Workshop on Multimodal Understanding and Learning for Embodied Applications en
dc.relation.ispartofseries MULEA 2019 - 1st International Workshop on Multimodal Understanding and Learning for Embodied Applications, co-located with MM 2019 en
dc.rights openAccess en
dc.title Geometry-aware relational exemplar attention for dense captioning en
dc.type A4 Article in a conference publication en
dc.description.version Peer reviewed en
dc.contributor.department Department of Computer Science
dc.contributor.department Nokia
dc.contributor.department CSC - IT Center for Science Ltd.
dc.contributor.department Professorship Kaski Samuel
dc.subject.keyword Attention
dc.subject.keyword Dense captioning
dc.subject.keyword Relationship modeling
dc.identifier.urn URN:NBN:fi:aalto-202001021009
dc.identifier.doi 10.1145/3347450.3357656
dc.type.version acceptedVersion


Files in this item


There are no open access files associated with this item.
