
Explaining Machine Learning Models by Generating Counterfactuals



dc.contributor Aalto-yliopisto fi
dc.contributor Aalto University en
dc.contributor.advisor Gionis, Aristides
dc.contributor.author Afonichkin, Ivan
dc.date.accessioned 2019-08-25T15:11:50Z
dc.date.available 2019-08-25T15:11:50Z
dc.date.issued 2019-08-19
dc.identifier.uri https://aaltodoc.aalto.fi/handle/123456789/39894
dc.description.abstract Machine learning is now applied in many domains, including safety-critical areas that directly affect our lives. These systems are so complex, and rely on such large amounts of training data, that we risk creating systems we do not understand, which may lead to undesired behavior such as fatal decisions, discrimination, ethnic bias, and racism. Moreover, the European Union recently adopted the General Data Protection Regulation (GDPR), which requires companies to provide a meaningful explanation of the logic behind decisions made by machine learning systems when these decisions directly affect a human being. We address the problem of explaining various machine learning models by generating counterfactuals for given data points. A counterfactual is a transformation that shows how to alter an input object so that a classifier predicts a different class. Counterfactuals allow us to better understand why particular classification decisions are made. They may aid in troubleshooting a classifier and identifying biases by examining the alterations that must be made to the data instances. For example, if a loan approval system denies a loan to a particular person, and we can find a counterfactual indicating that changing the person's gender or race would lead to approval, then we have identified bias in the model, and we need to study the classifier more closely and retrain it to avoid such undesired behavior. In this thesis we propose a new framework for generating counterfactuals for a set of data points. The proposed framework aims to find a set of similar transformations of the data points such that those changes significantly reduce the probability of the target class. We argue that finding similar transformations for a set of data points helps to achieve more robust explanations of classifiers. We demonstrate our framework on three types of data: tabular data, images, and text.
We evaluate our model on both simple and real-world datasets, including ImageNet and 20 NewsGroups. en
dc.format.extent 47+4
dc.format.mimetype application/pdf en
dc.language.iso en en
dc.title Explaining Machine Learning Models by Generating Counterfactuals en
dc.type G2 Pro gradu, diplomityö fi
dc.contributor.school Perustieteiden korkeakoulu fi
dc.subject.keyword machine learning en
dc.subject.keyword interpretability en
dc.subject.keyword counterfactuals en
dc.subject.keyword transparency en
dc.identifier.urn URN:NBN:fi:aalto-201908254955
dc.programme.major Machine Learning, Data Science and Artificial Intelligence fi
dc.programme.mcode SCI3044 fi
dc.type.ontasot Master's thesis en
dc.type.ontasot Diplomityö fi
dc.contributor.supervisor Gionis, Aristides
dc.programme Master’s Programme in Computer, Communication and Information Sciences fi
local.aalto.electroniconly yes
local.aalto.openaccess yes
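
The loan example in the abstract can be illustrated with a minimal sketch. This is not the framework proposed in the thesis, which searches for similar transformations across a set of data points. It is only a brute-force search over single-feature changes for one data point. The classifier `predict`, its weights, the feature names, and the candidate values are all hypothetical and chosen purely for illustration.

```python
# Hypothetical toy classifier (an assumption, not the thesis model):
# approves a loan when a hand-picked linear score is non-negative.
def predict(applicant):
    score = 0.5 * applicant["income"] - 1.0 * applicant["debt"] - 2.5
    if applicant["gender"] == "male":
        score += 2.0  # deliberately biased term, so the search can expose it
    return "approved" if score >= 0 else "denied"


def find_counterfactuals(applicant, candidate_values):
    """Return every single-feature change that flips the predicted class.

    Each result is a tuple (feature, original_value, new_value).
    """
    original_class = predict(applicant)
    flips = []
    for feature, values in candidate_values.items():
        for value in values:
            if value == applicant[feature]:
                continue  # not a change, skip
            changed = dict(applicant, **{feature: value})
            if predict(changed) != original_class:
                flips.append((feature, applicant[feature], value))
    return flips


# Hypothetical applicant and candidate feature values.
applicant = {"income": 4, "debt": 1, "gender": "female"}
candidates = {
    "income": [4, 6, 8],
    "debt": [0, 1],
    "gender": ["female", "male"],
}

for feature, old, new in find_counterfactuals(applicant, candidates):
    print(f"changing {feature} from {old!r} to {new!r} flips the decision")
```

In this toy setting, a counterfactual that changes only `gender` flips the decision, which is the kind of signal the abstract describes as evidence of bias in the classifier.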
