Visual Storytelling: Captioning of Image Sequences

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.advisor: Laaksonen, Jorma
dc.contributor.author: Surikuchi, Aditya
dc.contributor.school: Perustieteiden korkeakoulu [fi]
dc.contributor.supervisor: Laaksonen, Jorma
dc.date.accessioned: 2019-12-22T18:17:37Z
dc.date.available: 2019-12-22T18:17:37Z
dc.date.issued: 2019-12-16
dc.description.abstract: In the space of automated captioning, visual storytelling is one dimension. Given sequences of images as inputs, visual storytelling (VIST) is about automatically generating textual narratives as outputs. Automatically producing stories for an ordered sequence of pictures or video frames has several potential applications in diverse domains ranging from multimedia consumption to autonomous systems. The task has evolved over recent years and is moving into adolescence. The availability of a dedicated VIST dataset has mainstreamed research on visual storytelling and related sub-tasks. This thesis systematically reports the development of standard captioning as a parent task, with accompanying facets like dense captioning, and gradually delves into the domain of visual storytelling. Existing models proposed for VIST are described by examining their respective characteristics and scope. All the methods for VIST adapt the typical encoder-decoder design, owing to its success in addressing the standard image captioning task. Several subtle differences in the underlying intentions of these methods for approaching VIST are subsequently summarized. Additionally, alternate perspectives on the existing approaches are explored by re-modeling and modifying their learning mechanisms. Experiments with different objective functions are reported with subjective comparisons and relevant results. Finally, the sub-field of character relationships within storytelling is studied, and a novel idea called character-centric storytelling is proposed to account for prospective characters across the data modalities. [en]
dc.format.extent: 77 + 3
dc.format.mimetype: application/pdf [en]
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/41756
dc.identifier.urn: URN:NBN:fi:aalto-201912226705
dc.language.iso: en [en]
dc.programme: Master’s Programme in Computer, Communication and Information Sciences [fi]
dc.programme.major: Machine Learning, Data Science and Artificial Intelligence [fi]
dc.programme.mcode: SCI3044 [fi]
dc.subject.keyword: natural language processing [en]
dc.subject.keyword: computer vision [en]
dc.subject.keyword: deep learning [en]
dc.subject.keyword: captioning [en]
dc.subject.keyword: deep reinforcement learning [en]
dc.subject.keyword: sequence modeling [en]
dc.title: Visual Storytelling: Captioning of Image Sequences [en]
dc.type: G2 Pro gradu, diplomityö [fi]
dc.type.ontasot: Master's thesis [en]
dc.type.ontasot: Diplomityö [fi]
local.aalto.electroniconly: yes
local.aalto.openaccess: yes

Files

Original bundle

Name: master_Surikuchi_Aditya_2019.pdf
Size: 54.94 MB
Format: Adobe Portable Document Format