
Comparing human and automated approaches to visual storytelling



dc.contributor Aalto-yliopisto fi
dc.contributor Aalto University en
dc.contributor.author Braun, Sabine
dc.contributor.author Starr, Kim
dc.contributor.author Laaksonen, Jorma
dc.date.accessioned 2021-03-10T07:26:33Z
dc.date.available 2021-03-10T07:26:33Z
dc.date.issued 2020
dc.identifier.citation Braun, S., Starr, K. & Laaksonen, J. 2020, Comparing human and automated approaches to visual storytelling. In Innovation in Audio Description Research. IATIS Yearbook, Routledge. https://doi.org/10.4324/9781003052968 en
dc.identifier.isbn 9781138356672
dc.identifier.isbn 9781003052968
dc.identifier.other PURE UUID: 42e1a14d-277a-4d2d-aca9-7fcfb4be9469
dc.identifier.other PURE ITEMURL: https://research.aalto.fi/en/publications/42e1a14d-277a-4d2d-aca9-7fcfb4be9469
dc.identifier.other PURE LINK: https://www.taylorfrancis.com/chapters/comparing-human-automated-approaches-visual-storytelling-sabine-braun-kim-starr-jorma-laaksonen/e/10.4324/9781003052968-9?context=ubx&refId=e08279a5-1d2d-484e-a816-458c13162760
dc.identifier.other PURE FILEURL: https://research.aalto.fi/files/56744266/SCI_Braun_Starr_Laaksonen_Comparing_human_memad.pdf
dc.identifier.uri https://aaltodoc.aalto.fi/handle/123456789/102951
dc.description openaire: EC/H2020/780069/EU//MeMAD
dc.description.abstract This chapter focuses on the recent surge of interest in automating methods for describing audiovisual content, whether for image search and retrieval, visual storytelling or in response to the rising demand for audio description following changes to regulatory frameworks. While computer vision communities have intensified research into the automatic generation of video descriptions (Bernardi et al., 2016), the automation of still image captioning remains a challenge in terms of accuracy (Husain and Bober, 2016). Moving images pose additional challenges linked to temporality, including co-referencing (Rohrbach et al., 2017) and other features of narrative continuity (Huang et al., 2016). Machine-generated descriptions are currently less sophisticated than their human equivalents, and frequently incoherent or incorrect. By contrast, human descriptions are more elaborate and reliable but are expensive to produce. Nevertheless, they offer information about visual and auditory elements in audiovisual content that can be exploited for research into machine training. Based on our research conducted in the EU-funded MeMAD project, this chapter outlines a methodological approach for a systematic comparison of human- and machine-generated video descriptions, drawing on corpus-based and discourse-based approaches, with a view to identifying key characteristics and patterns in both types of description, and exploiting human knowledge about video description for machine training. A model for machine-generated content description is therefore likely to be a more achievable goal in the shorter term than a model for generating elaborate audio descriptions.
Relevance Theory (RT) focuses on the human ability to derive meaning through inferential processes. RT asserts that these processes are highly inferential, drawing on common knowledge and cultural experience, and that they are guided by the human tendency to maximise relevance and by the assumption that speakers/storytellers normally choose the optimally relevant way of communicating their intentions. Moving on from basic comprehension of events to interpretation and conjecture requires the viewer to employ ‘extradiegetic’ references such as social convention, cultural norms and life experience. en
dc.format.extent 38
dc.format.mimetype application/pdf
dc.language.iso en en
dc.relation info:eu-repo/grantAgreement/EC/H2020/780069/EU//MeMAD
dc.relation.ispartofseries Innovation in Audio Description Research en
dc.relation.ispartofseries IATIS Yearbook en
dc.rights openAccess en
dc.title Comparing human and automated approaches to visual storytelling en
dc.type A3 Kirjan tai muun kokoomateoksen osa fi
dc.description.version Peer reviewed en
dc.contributor.department University of Surrey
dc.contributor.department Centre of Excellence in Computational Inference, COIN
dc.contributor.department Department of Computer Science en
dc.identifier.urn URN:NBN:fi:aalto-202103102237
dc.identifier.doi 10.4324/9781003052968
dc.type.version acceptedVersion


Files in this item


There are no open access files associated with this item.

