Significance of Patterns in Data Visualisations

School of Electrical Engineering | Master's thesis

Authors

Savvides, Rafael

Date

2019-08-19

Major/Subject

Machine Learning, Data Science and Artificial Intelligence

Mcode

SCI3044

Degree programme

CCIS - Master’s Programme in Computer, Communication and Information Sciences (TS2013)

Language

en

Pages

62+1

Abstract

When a data analyst explores data visually and observes a pattern, how can the analyst determine whether the pattern is real or merely a random artefact of the data? This thesis addresses this problem by developing a statistical significance testing framework for patterns observed during visual data exploration. Traditionally, such patterns are not evaluated with statistical tests, because any hypothesis to be tested must be formulated before viewing the data; otherwise there is a risk of false discoveries (Type I errors). A naive way to combine visual exploration with statistical testing is to pre-specify all possible hypotheses about observable patterns and then apply a multiple testing correction. However, the sheer number of potential patterns makes the correction overly strict and leads to low statistical power: true patterns in the data may go undiscovered, i.e., there is a risk of false negatives (Type II errors). The framework proposed in this thesis is a principled statistical significance testing procedure that controls Type I errors without being overly conservative. It improves statistical power by leveraging the data analyst's knowledge and by using multiple testing corrections suited to visual exploration. The framework is investigated empirically on real and synthetic tabular data and time series, using different test statistics and null distributions. The investigation shows that the proposed framework allows the significance of visual patterns to be determined during exploratory analysis.
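The tension between multiple testing corrections and statistical power can be made concrete with a small sketch. The code below is not the framework from the thesis. It is a generic illustration that assumes the test statistic is the absolute Pearson correlation, the null distribution is obtained by permuting one variable, and the correction is plain Bonferroni. The function names (`pearson`, `empirical_p_value`, `bonferroni_threshold`) and the toy data are hypothetical and chosen only for this example.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def empirical_p_value(xs, ys, n_samples=999, seed=0):
    """Empirical p-value of |correlation| under a permutation null.

    Assumption for this sketch: the null hypothesis is that xs and ys
    are unrelated, so shuffling ys gives samples from the null.
    """
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_samples):
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    # Adding 1 to numerator and denominator keeps the p-value above zero.
    return (1 + hits) / (1 + n_samples)


def bonferroni_threshold(alpha, n_hypotheses):
    """Per-hypothesis significance threshold under Bonferroni correction."""
    return alpha / n_hypotheses


# Toy data (made up for illustration): a strong linear trend plus noise.
rng = random.Random(42)
xs = list(range(30))
ys = [2 * x + rng.gauss(0, 1) for x in xs]

p = empirical_p_value(xs, ys)
for m in (1, 10, 1000):
    threshold = bonferroni_threshold(0.05, m)
    print(f"hypotheses={m:5d}  threshold={threshold:.5f}  significant={p <= threshold}")
```

With 999 permutations the smallest attainable p-value is 1/1000, so even a very clear pattern cannot pass a Bonferroni threshold of 0.05/1000. This is the loss of power that the abstract attributes to pre-specifying every possible hypothesis.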

Description

Supervisor

Gionis, Aristides

Thesis advisor

Puolamäki, Kai

Keywords

data, visualisation, patterns, statistical, significance
