Visual Storytelling: Captioning of Image Sequences
Loading...
URL
Journal Title
Journal ISSN
Volume Title
Perustieteiden korkeakoulu |
Master's thesis
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for Your own personal use. Commercial use is prohibited.
Authors
Date
2019-12-16
Department
Major/Subject
Machine Learning, Data Science and Artificial Intelligence
Mcode
SCI3044
Degree programme
Master’s Programme in Computer, Communication and Information Sciences
Language
en
Pages
77 + 3
Series
Abstract
In the space of automated captioning, the task of visual storytelling is a dimension. Given sequences of images as inputs, visual storytelling (VIST) is about automatically generating textual narratives as outputs. Automatically producing stories for an order of pictures or video frames have several potential applications in diverse domains ranging from multimedia consumption to autonomous systems. The task has evolved over recent years and is moving into adolescence. The availability of a dedicated VIST dataset for the task has mainstreamed research for visual storytelling and related sub-tasks. This thesis work systematically reports the developments of standard captioning as a parent task with accompanying facets like dense captioning and gradually delves into the domain of visual storytelling. Existing models proposed for VIST are described by examining respective characteristics and scope. All the methods for VIST adapt from the typical encoder-decoder style design, owing to its success in addressing the standard image captioning task. Several subtle differences in the underlying intentions of these methods for approaching the VIST are subsequently summarized. Additionally, alternate perspectives around the existing approaches are explored by re-modeling and modifying their learning mechanisms. Experiments with different objective functions are reported with subjective comparisons and relevant results. Eventually, the sub-field of character relationships within storytelling is studied and a novel idea called character-centric storytelling is proposed to account for prospective characters in the extent of data modalities.Description
Supervisor
Laaksonen, JormaThesis advisor
Laaksonen, JormaKeywords
natural language processing, computer vision, deep learning, captioning, deep reinforcement learning, sequence modeling