Data science for social good - Theory and applications in epidemics, polarization, and fair clustering

Loading...
Thumbnail Image

URL

Journal Title

Journal ISSN

Volume Title

School of Science | Doctoral thesis (article-based) | Defence date: 2020-09-25

Date

2020

Major/Subject

Mcode

Degree programme

Language

en

Pages

87 + app. 55

Series

Aalto University publication series DOCTORAL DISSERTATIONS, 118/2020

Abstract

Technical innovations have transformed our lives fundamentally, in both positive and negative ways. In this thesis, we look at the negative side. We identify three problems to tackle, namely epidemics, online polarization, and bias in automatic decision-making processes, and approach them using data-driven approaches. Thanks to globalization, our world is more interconnected than before. While trade and exchange of ideas are happening at an unprecedented rate, the rapid spread of disease is happening globally, as evidenced by the pandemic of COVID-19. To contain epidemics effectively, it is crucial to identify as many infected persons as possible. In practice, however, it is almost impossible to obtain the complete information of who is infected. We study this challenge in the context of social networks, where a disease spreads via network edges. Specifically, we assume only a subset of all infections is observed and we seek to infer who else is infected. Furthermore, we consider two different settings: (1) temporal setting, in which infection time is also observed and, (2) probabilistic setting, in which infection probability of each individual is produced.Social-media platforms enable people to share and access information easily. Meanwhile, flawed designs in these platforms contribute to the formation of online polarization. As a result, people are unlikely to adopt new ideas that differ from their beliefs, which finally leads to a polarized society. To tackle online polarization, we argue that it is important to discover who is involved in the polarization. We consider a problem setting under social networks, in which the interaction between two persons is either friendly or antagonistic. Furthermore, given some seed nodes that represent different sides of a polarized subgraph, we seek to find the polarized subgraph that is relevant to the seeds. Finding such structures can be used to understand the nature of polarization, and to mitigate the degree of polarization. Machine-learning algorithms allow the automation of many decision-making processes, for example, deciding whether to grant a loan to a loan applicant. However, unfair results that favor one demographic group (e.g., male) over another (e.g., female) are witnessed. The unfair outcomes may further affect the well-being of the mistreated groups. In this thesis, we focus on the task of data clustering, which has applications in infrastructure design and online social media. We discuss potential fairness issues in existing clustering algorithms that are designed to be fair. As a result, we propose a new fair clustering formulation that captures a novel fairness notion. For all proposed problems, we study their complexity and design algorithms whose theoretical performance is analyzed. We evaluate all proposed algorithms' efficacy in both synthetic and real-world settings.

Description

Supervising professor

Gionis, Aristides, Adj. Prof., Aalto University, Department of Computer Science, Finland

Keywords

data mining, graph mining, social network analysis, epidemics, fairness, online polarization, algorithm design, approximation algorithm

Other note

Parts

  • [Publication 1]: Xiao, Han; Rozenshtein, Polina; Tatti, Nikolaj; Gionis, Aristides. Reconstructing a cascade from temporal observations. In Proceedings of the 2018 SIAM International Conference on Data Mining, pages 666–674, May 2018.
    Full text in Acris/Aaltodoc: http://urn.fi/URN:NBN:fi:aalto-201902251793
    DOI: 10.1137/1.9781611975321.75 View at publisher
  • [Publication 2]: Xiao, Han; Aslay, Çigdem; Gionis, Aristides. Robust cascade reconstruction by Steiner tree sampling. In 2018 IEEE International Conference on Data Mining, pages 637–646, November 2018.
    Full text in Acris/Aaltodoc: http://urn.fi/URN:NBN:fi:aalto-201901141105
    DOI: 10.1109/ICDM.2018.00079 View at publisher
  • [Publication 3]: Xiao, Han; Ordozgoiti, Bruno; Gionis, Aristides. Searching for polarization in signed graphs: a local spectral approach. In The World Wide Web Conference, pages 362–372, April 2020.
    DOI: 10.1145/3366423.3380121 View at publisher
  • [Publication 4]: Xiao, Han; Ordozgoiti, Bruno; Gionis, Aristides. A distance-based approach to fair clustering. Submitted for publication, July 2020

Citation