Significance of Patterns in Data Visualisations

dc.contributor: Aalto-yliopisto (fi)
dc.contributor: Aalto University (en)
dc.contributor.advisor: Puolamäki, Kai
dc.contributor.author: Savvides, Rafael
dc.contributor.school: Sähkötekniikan korkeakoulu (fi)
dc.contributor.supervisor: Gionis, Aristides
dc.date.accessioned: 2019-08-25T15:12:42Z
dc.date.available: 2019-08-25T15:12:42Z
dc.date.issued: 2019-08-19
dc.description.abstract: When a data analyst explores data visually and observes a pattern, how can he or she determine whether the pattern is real or just a random artefact of the data? This thesis addresses the problem of evaluating visual patterns observed during visual data exploration by developing a statistical significance testing framework for visual patterns. Traditionally, patterns observed during data exploration are not evaluated with statistical testing. The reason is that any hypotheses to be tested about the data must be formulated prior to viewing the data, else there is a risk of false discoveries (Type I errors). A naive solution for combining visual exploration with statistical testing involves pre-specifying all possible hypotheses about observable patterns and then applying a multiple testing correction. However, the sheer number of potential patterns results in an overly strict multiple testing correction, resulting in low statistical power. This means that true patterns in the data may fail to be discovered, i.e., there is a risk of false negatives (Type II errors). The framework proposed in this thesis is a principled statistical significance testing procedure that controls Type I errors and is not overly conservative. The framework is based on improving statistical power by leveraging the data analyst's knowledge and by utilising multiple testing corrections that are suitable for visual exploration. An empirical investigation of the framework is performed on real and synthetic tabular data and time series, using different test statistics and null distributions. The investigation shows that the proposed framework allows the significance of visual patterns to be determined during exploratory analysis. (en)
dc.format.extent: 62+1
dc.format.mimetype: application/pdf (en)
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/39901
dc.identifier.urn: URN:NBN:fi:aalto-201908254962
dc.language.iso: en (en)
dc.location: P1 (fi)
dc.programme: CCIS - Master’s Programme in Computer, Communication and Information Sciences (TS2013) (fi)
dc.programme.major: Machine Learning, Data Science and Artificial Intelligence (fi)
dc.programme.mcode: SCI3044 (fi)
dc.subject.keyword: data (en)
dc.subject.keyword: visualisation (en)
dc.subject.keyword: patterns (en)
dc.subject.keyword: statistical (en)
dc.subject.keyword: significance (en)
dc.title: Significance of Patterns in Data Visualisations (en)
dc.type: G2 Pro gradu, diplomityö (fi)
dc.type.ontasot: Master's thesis (en)
dc.type.ontasot: Diplomityö (fi)
local.aalto.electroniconly: yes
local.aalto.openaccess: yes
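
The abstract's remark that a naive multiple testing correction becomes overly strict can be illustrated with a small sketch. This is not the framework proposed in the thesis; it is a generic illustration that assumes a Bonferroni correction and an empirical p-value computed from samples of a null distribution. The function names `empirical_p_value` and `bonferroni_reject` are hypothetical and chosen only for this example.

```python
def empirical_p_value(observed, null_samples):
    """Empirical p-value: share of null statistics at least as large as the
    observed one, with the usual +1 correction so the result is never zero."""
    extreme = sum(1 for s in null_samples if s >= observed)
    return (1 + extreme) / (1 + len(null_samples))


def bonferroni_reject(p_values, alpha=0.05):
    """Bonferroni correction (an assumption of this sketch): reject a
    hypothesis only if its p-value is at most alpha divided by the number
    of hypotheses tested."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


# With two pre-specified hypotheses, a p-value of 0.01 is still significant.
few = bonferroni_reject([0.01, 0.04])

# With 1000 pre-specified hypotheses, the same p-value no longer is:
# the per-test threshold shrinks to 0.05 / 1000.
many = bonferroni_reject([0.01] + [0.5] * 999)
```

In this sketch the same evidence (p = 0.01) is accepted when few hypotheses are pre-specified and rejected when many are, which is the loss of statistical power that the abstract describes.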

Files

Original bundle

Name: master_Savvides_Rafael_2019.pdf
Size: 2.96 MB
Format: Adobe Portable Document Format