ϵ-fair Subspace Dimensionality Reduction: An Evaluation of Fair Principal Component Analysis and Fair Column Subset Selection

School of Science | Bachelor's thesis
An electronic archive copy is available locally at the Harald Herlin Learning Centre. Aalto University staff can access electronic bachelor's theses by logging into Aaltodoc with their personal Aalto user ID.

Date

2024-09-06

Major/Subject

Data Science

Mcode

SCI3095

Degree programme

Aalto Bachelor’s Programme in Science and Technology

Language

en

Pages

50

Abstract

Artificial intelligence (AI) and machine learning (ML) are now widely applied, and many decision-makers adopt machine learning to automate decision-making processes. As machine learning increasingly affects human lives, there is growing concern about how AI handles sensitive information when making critical decisions. Many recent studies show that machine learning algorithms are susceptible to bias and unfairness, whether deliberate or not. Fairness has therefore become a consideration in the design of machine learning systems. However, fairness in unsupervised learning has been largely neglected compared to supervised learning. Unsupervised learning is frequently the very first step of a machine learning pipeline, so bias may be introduced there unintentionally. For example, dimensionality reduction is usually applied to high-dimensional data, and the processed data may end up biased against a protected group even though the original data is not. Currently, there are two primary lines of research into fair dimensionality reduction that aim to combat such discriminatory behavior: one relates the dimensionality reduction to the downstream classification tasks, while the other, termed ϵ-fair subspace dimensionality reduction in this thesis, considers only the dimensionality reduction task itself and focuses on the reconstruction errors that the projection incurs for the sensitive classes. This thesis evaluates the impact of two ϵ-fair subspace dimensionality reduction methods, fair principal component analysis (PCA) and fair column subset selection (CSS), on the downstream classification task using a novel class of fairness measures called ∆A-fairness, and reviews recent advances in fairness definitions for dimensionality reduction.
The results indicate that the two fair variants do not alleviate the discriminatory behavior of the vanilla variants in the downstream classification task, and that the fair CSS method performs even worse on a data set comprising numerous categorical variables.
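
To make the notion of group-wise reconstruction error concrete, the following is a minimal sketch and not the thesis's method: it fits vanilla PCA on pooled data and reports the average squared reconstruction error separately for each sensitive group. The synthetic data, the group labels, and the function names `pca_projection` and `group_reconstruction_errors` are all assumptions made for illustration; the precise ϵ-fair criterion and the ∆A-fairness measure are defined in the thesis itself and are not reproduced here.

```python
import numpy as np


def pca_projection(X, k):
    """Return the d x d projection matrix onto the top-k principal subspace of X."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are right singular vectors, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T
    return V @ V.T


def group_reconstruction_errors(X, groups, k):
    """Average squared reconstruction error per group under one shared rank-k PCA."""
    Xc = X - X.mean(axis=0)
    P = pca_projection(X, k)
    residual = Xc - Xc @ P
    per_row = np.sum(residual ** 2, axis=1)
    return {str(g): float(per_row[groups == g].mean()) for g in np.unique(groups)}


# Synthetic data (an assumption for illustration): a large group "A" whose
# variance lies along the first two axes, and a small group "B" whose
# variance lies along the last two axes.
rng = np.random.default_rng(0)
A = rng.normal(size=(900, 6)) * np.array([5, 5, 0.1, 0.1, 0.1, 0.1])
B = rng.normal(size=(100, 6)) * np.array([0.1, 0.1, 0.1, 0.1, 3, 3])
X = np.vstack([A, B])
groups = np.array(["A"] * 900 + ["B"] * 100)

errors = group_reconstruction_errors(X, groups, k=2)
print(errors)
```

In this toy setting the shared subspace is dominated by the larger group, so the smaller group incurs a much higher reconstruction error. Disparities of this kind in reconstruction error are what the second line of research described above is concerned with.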

Supervisor

Korpi-Lagg, Maarit

Thesis advisor

Matakos, Antonis

Keywords

dimensionality reduction, fairness, column subset selection, principal component analysis
