Explaining Machine Learning Models by Generating Counterfactuals

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.advisor: Gionis, Aristides
dc.contributor.author: Afonichkin, Ivan
dc.contributor.school: Perustieteiden korkeakoulu (School of Science) [fi]
dc.contributor.supervisor: Gionis, Aristides
dc.date.accessioned: 2019-08-25T15:11:50Z
dc.date.available: 2019-08-25T15:11:50Z
dc.date.issued: 2019-08-19
dc.description.abstract: Machine learning is now applied in many domains, including safety-critical areas that directly affect our lives. These systems are so complex, and rely on such large amounts of training data, that we risk creating systems we do not understand, which might lead to undesired behavior such as fatal decisions, discrimination, ethnic bias, and racism. Moreover, the European Union recently adopted the General Data Protection Regulation (GDPR), which requires companies to provide a meaningful explanation of the logic behind decisions made by machine learning systems if these decisions directly affect a human being. We address the issue of explaining various machine learning models by generating counterfactuals for given data points. A counterfactual is a transformation that shows how to alter an input object so that a classifier predicts a different class. Counterfactuals allow us to better understand why particular classification decisions are made. They may aid in troubleshooting a classifier and in identifying biases, by showing which alterations to the data instances are needed. For example, if a loan approval system denies a loan to a particular person, and we can find a counterfactual indicating that changing the person's gender or race would get the loan approved, then we have identified bias in the model, and we need to study the classifier more closely and retrain it to avoid such undesired behavior. In this thesis we propose a new framework for generating counterfactuals for a set of data points. The proposed framework aims to find a set of similar transformations of the data points such that those changes significantly reduce the probabilities of the target class. We argue that finding similar transformations for a set of data points helps to achieve more robust explanations of classifiers. We demonstrate our framework on three types of data: tabular data, images, and text. We evaluate our model on both simple and real-world datasets, including ImageNet and 20 NewsGroups. [en]
dc.format.extent: 47+4
dc.format.mimetype: application/pdf [en]
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/39894
dc.identifier.urn: URN:NBN:fi:aalto-201908254955
dc.language.iso: en [en]
dc.programme: Master’s Programme in Computer, Communication and Information Sciences [fi]
dc.programme.major: Machine Learning, Data Science and Artificial Intelligence [fi]
dc.programme.mcode: SCI3044 [fi]
dc.subject.keyword: machine learning [en]
dc.subject.keyword: interpretability [en]
dc.subject.keyword: counterfactuals [en]
dc.subject.keyword: transparency [en]
dc.title: Explaining Machine Learning Models by Generating Counterfactuals [en]
dc.type: G2 Pro gradu, diplomityö (G2 Master's thesis) [fi]
dc.type.ontasot: Master's thesis [en]
dc.type.ontasot: Diplomityö (Master's thesis) [fi]
local.aalto.electroniconly: yes
local.aalto.openaccess: yes

Files

Original bundle

Name: master_Afonichkin_Ivan_2019.pdf
Size: 11.56 MB
Format: Adobe Portable Document Format