Missing fairness: The discriminatory effect of missing values in datasets on fairness in machine learning

dc.contributor: Aalto-yliopisto (fi)
dc.contributor: Aalto University (en)
dc.contributor.advisor: Žliobaitė, Indrė
dc.contributor.author: Fricke, Christian
dc.contributor.school: Perustieteiden korkeakoulu (fi)
dc.contributor.supervisor: Gionis, Aristides
dc.date.accessioned: 2020-12-20T18:02:15Z
dc.date.available: 2020-12-20T18:02:15Z
dc.date.issued: 2020-12-14
dc.description.abstract: As we enter a new decade, more and more governance in our society is assisted by autonomous decision-making systems enabled by artificial intelligence and machine learning. Recently, a growing number of academic and general-audience publications have raised awareness of the negative side effects accompanying such systems, under the umbrella term of algorithmic fairness. While most of these articles focus on a small number of well-studied cases, to the best of our knowledge none have dealt with the kind of large real-world datasets one might use to train models in an industrial setting. Datasets are collections of observations recorded by humans and therefore contain many different forms of bias. Many proposed solutions to combat structural discrimination focus on the detection and mitigation of unfairness in datasets and machine learning models. The readily available implementations and services adhere to the common practice of complete-case analysis, filtering out samples that contain missing values. This often means ignoring large portions of the recorded data, further increasing subgroup imbalances and biases. In this thesis, we analyze a sparse real-world dataset and the effect of missing values on the predictive power and measurable discrimination of models trained upon it. We begin with a brief review of the current literature on algorithmic fairness, that is, the causes of unfairness in the form of various biases, as well as the most current fairness definitions and measures. For our dataset, we acquired self-reported law school admissions data from a popular internet platform in the USA. We explore patterns of missingness in the data and ways of imputing values based on established methods prior to training and tuning our models. Finally, we evaluate the performance of the models with respect to well-established fairness measures and detect a significant decrease in discriminatory bias for the subset of data with missing values. (en)
dc.format.extent: 62+4
dc.format.mimetype: application/pdf (en)
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/97501
dc.identifier.urn: URN:NBN:fi:aalto-2020122056328
dc.language.iso: en (en)
dc.programme: Master's Programme in Computer, Communication and Information Sciences (fi)
dc.programme.major: Machine Learning, Data Science and Artificial Intelligence (fi)
dc.programme.mcode: SCI3044 (fi)
dc.subject.keyword: fairness (en)
dc.subject.keyword: missing values (en)
dc.subject.keyword: data imputation (en)
dc.subject.keyword: algorithmic bias (en)
dc.title: Missing fairness: The discriminatory effect of missing values in datasets on fairness in machine learning (en)
dc.type: G2 Pro gradu, diplomityö (fi)
dc.type.ontasot: Master's thesis (en)
dc.type.ontasot: Diplomityö (fi)
local.aalto.electroniconly: yes
local.aalto.openaccess: yes

Files

Original bundle

Name: master_Fricke_Christian_2020.pdf
Size: 1.44 MB
Format: Adobe Portable Document Format
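The abstract contrasts complete-case analysis, which drops every record containing a missing value, with imputation, which fills the gaps and keeps the records, and argues that the former can inflate subgroup imbalances and measured discrimination. A minimal sketch of that effect on hypothetical toy records (not the thesis dataset or its code), using mean imputation and a simple demographic parity difference as the fairness measure:

```python
def complete_case(records):
    """Complete-case analysis: keep only records with no missing values."""
    return [r for r in records if r["score"] is not None]

def mean_impute(records):
    """Replace missing scores with the mean of the observed scores."""
    observed = [r["score"] for r in records if r["score"] is not None]
    mean = sum(observed) / len(observed)
    return [dict(r, score=r["score"] if r["score"] is not None else mean)
            for r in records]

def demographic_parity_diff(records, threshold=0.5):
    """Absolute difference in positive-outcome rates between groups A and B."""
    def rate(group):
        members = [r for r in records if r["group"] == group]
        positives = [r for r in members if r["score"] >= threshold]
        return len(positives) / len(members)
    return abs(rate("A") - rate("B"))

# Toy data: group B carries most of the missing values.
records = [
    {"group": "A", "score": 0.9},
    {"group": "A", "score": 0.7},
    {"group": "B", "score": 0.6},
    {"group": "B", "score": None},  # dropped by complete-case analysis
    {"group": "B", "score": None},  # dropped by complete-case analysis
]

cc = complete_case(records)   # 3 records remain; group B shrinks to one member
imp = mean_impute(records)    # all 5 records kept; gaps filled with 0.733...

gap_cc = demographic_parity_diff(cc, threshold=0.65)
gap_imp = demographic_parity_diff(imp, threshold=0.65)
```

On this toy sample, complete-case analysis shrinks group B to a single observation and the measured parity gap grows to 1.0, while imputation keeps all five records and yields a gap of 1/3, illustrating how filtering incomplete records can distort a fairness measurement.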