Comparing human and automated approaches to visual storytelling

Access rights
openAccess
Journal Title
Journal ISSN
Volume Title
A3 Part of a book or other compilation
Date
2020
Major/Subject
Mcode
Degree programme
Language
en
Pages
38
Series
Innovation in Audio Description Research, IATIS Yearbook
Abstract
This chapter focuses on the recent surge of interest in automating methods for describing audiovisual content, whether for image search and retrieval, visual storytelling or in response to the rising demand for audio description following changes to regulatory frameworks. While computer vision communities have intensified research into the automatic generation of video descriptions (Bernardi et al., 2016), the automation of still image captioning remains a challenge in terms of accuracy (Husain and Bober, 2016). Moving images pose additional challenges linked to temporality, including co-referencing (Rohrbach et al., 2017) and other features of narrative continuity (Huang et al., 2016). Machine-generated descriptions are currently less sophisticated than their human equivalents, and frequently incoherent or incorrect. By contrast, human descriptions are more elaborate and reliable but are expensive to produce. Nevertheless, they offer information about visual and auditory elements in audiovisual content that can be exploited for research into machine training. Based on our research conducted in the EU-funded MeMAD project, this chapter outlines a methodological approach for a systematic comparison of human- and machine-generated video descriptions, drawing on corpus-based and discourse-based approaches, with a view to identifying key characteristics and patterns in both types of description, and exploiting human knowledge about video description for machine training. A model for machine-generated content description is therefore likely to be a more achievable goal in the shorter term than a model for generating elaborate audio descriptions.
Relevance Theory (RT) focuses on the human ability to derive meaning through inferential processes. RT asserts that these processes draw heavily on common knowledge and cultural experience, and that they are guided by the human tendency to maximise relevance and by the assumption that speakers/storytellers normally choose the optimally relevant way of communicating their intentions. Moving on from basic comprehension of events to interpretation and conjecture requires the viewer to employ ‘extradiegetic’ references such as social convention, cultural norms and life experience.
Description
openaire: EC/H2020/780069/EU//MeMAD
Keywords
Other note
Citation
Braun, S, Starr, K & Laaksonen, J 2020, Comparing human and automated approaches to visual storytelling. in Innovation in Audio Description Research. IATIS Yearbook, Routledge. https://doi.org/10.4324/9781003052968