Paying Attention to Descriptions Generated by Image Captioning Models


Show simple item record

dc.contributor Aalto-yliopisto fi
dc.contributor Aalto University en
dc.contributor.author Rezazadegan Tavakoli, Hamed
dc.contributor.author Shetty, Rakshith
dc.contributor.author Borji, Ali
dc.contributor.author Laaksonen, Jorma
dc.date.accessioned 2018-02-09T09:53:11Z
dc.date.available 2018-02-09T09:53:11Z
dc.date.issued 2017
dc.identifier.citation Rezazadegan Tavakoli, H., Shetty, R., Borji, A. & Laaksonen, J. 2017, Paying Attention to Descriptions Generated by Image Captioning Models. In 2017 IEEE International Conference on Computer Vision (ICCV), IEEE International Conference on Computer Vision, IEEE, pp. 2506-2515, International Conference on Computer Vision, Venice, Italy, 22/10/2017. DOI: 10.1109/ICCV.2017.272 en
dc.identifier.isbn 978-1-5386-1032-9
dc.identifier.issn 2380-7504
dc.identifier.other PURE UUID: 0c4af4ad-e088-4596-ad0a-df73e3751d43
dc.identifier.other PURE ITEMURL: https://research.aalto.fi/en/publications/paying-attention-to-descriptions-generated-by-image-captioning-models(0c4af4ad-e088-4596-ad0a-df73e3751d43).html
dc.identifier.other PURE LINK: http://openaccess.thecvf.com/content_iccv_2017/html/Tavakoli_Paying_Attention_to_ICCV_2017_paper.html
dc.identifier.other PURE FILEURL: https://research.aalto.fi/files/17145086/Tavakoli_Paying_Attention_to_ICCV_2017_paper.pdf
dc.identifier.uri https://aaltodoc.aalto.fi/handle/123456789/29732
dc.description.abstract To bridge the gap between humans and machines in image understanding and describing, we need further insight into how people describe a perceived scene. In this paper, we study the agreement between bottom-up saliency-based visual attention and object referrals in scene description constructs. We investigate the properties of human-written descriptions and machine-generated ones. We then propose a saliency-boosted image captioning model in order to investigate benefits from low-level cues in language models. We learn that (1) humans mention more salient objects earlier than less salient ones in their descriptions, (2) the better a captioning model performs, the better attention agreement it has with human descriptions, (3) the proposed saliency-boosted model, compared to its baseline form, does not improve significantly on the MS COCO database, indicating that explicit bottom-up boosting does not help when the task is well learnt and tuned on a dataset, (4) better generalization is, however, observed for the saliency-boosted model on unseen data. en
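The abstract's core idea of boosting a captioning model with bottom-up saliency can be illustrated as weighting spatial visual features by a saliency map before pooling them for the language model. The sketch below is not the authors' architecture; it is a minimal, hypothetical illustration of the general technique, with the blending weight `alpha` and the feature/saliency shapes chosen for the example.

```python
import numpy as np

def saliency_boost(features, saliency, alpha=0.5):
    """Pool spatial visual features weighted by a bottom-up saliency map.

    features: array of shape (H, W, D) — per-location CNN features.
    saliency: array of shape (H, W) with non-negative saliency scores.
    alpha:    blend between uniform attention (alpha=0) and pure
              saliency-driven attention (alpha=1).
    Returns a (D,) pooled feature vector for the language model.
    """
    # Normalize saliency into a spatial probability distribution.
    sal = saliency / (saliency.sum() + 1e-8)
    # Uniform spatial weights as the unboosted baseline.
    uniform = np.full_like(sal, 1.0 / sal.size)
    # Blend baseline and saliency-driven weights.
    weights = (1.0 - alpha) * uniform + alpha * sal
    # Weighted sum over the spatial grid -> one D-dimensional vector.
    return (features * weights[..., None]).sum(axis=(0, 1))

# Example: a 4x4 grid of 8-dim features with one highly salient cell.
feats = np.ones((4, 4, 8))
sal_map = np.zeros((4, 4))
sal_map[0, 0] = 1.0
pooled = saliency_boost(feats, sal_map, alpha=0.5)
```

Because the weights always sum to one, the pooled vector stays on the same scale as the per-location features regardless of `alpha`.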
dc.format.extent 2506-2515
dc.format.mimetype application/pdf
dc.language.iso en en
dc.relation.ispartof International Conference on Computer Vision en
dc.relation.ispartofseries 2017 IEEE International Conference on Computer Vision (ICCV) en
dc.relation.ispartofseries IEEE International Conference on Computer Vision en
dc.rights openAccess en
dc.subject.other 113 Computer and information sciences en
dc.title Paying Attention to Descriptions Generated by Image Captioning Models en
dc.type A4 Article in conference proceedings en
dc.description.version Peer reviewed en
dc.contributor.department Department of Computer Science
dc.contributor.department Max Planck Institute for Informatics
dc.contributor.department University of Central Florida
dc.subject.keyword Visualization
dc.subject.keyword Measurement
dc.subject.keyword Data models
dc.subject.keyword Grammar
dc.subject.keyword Computational modeling
dc.subject.keyword Computer science
dc.subject.keyword 113 Computer and information sciences
dc.identifier.urn URN:NBN:fi:aalto-201802091228
dc.identifier.doi 10.1109/ICCV.2017.272
dc.type.version publishedVersion

