Data science for social good - Theory and applications in epidemics, polarization, and fair clustering
dc.contributor | Aalto-yliopisto | fi |
dc.contributor | Aalto University | en |
dc.contributor.author | Xiao, Han | |
dc.contributor.department | Tietotekniikan laitos | fi |
dc.contributor.department | Department of Computer Science | en |
dc.contributor.lab | Data Mining group | en |
dc.contributor.school | Perustieteiden korkeakoulu | fi |
dc.contributor.school | School of Science | en |
dc.contributor.supervisor | Gionis, Aristides, Adj. Prof., Aalto University, Department of Computer Science, Finland | |
dc.date.accessioned | 2020-08-25T09:00:05Z | |
dc.date.available | 2020-08-25T09:00:05Z | |
dc.date.defence | 2020-09-25 | |
dc.date.issued | 2020 | |
dc.description.abstract | Technical innovations have transformed our lives fundamentally, in both positive and negative ways. In this thesis, we look at the negative side. We identify three problems, namely epidemics, online polarization, and bias in automatic decision-making processes, and tackle them with data-driven methods. Thanks to globalization, our world is more interconnected than before. While trade and the exchange of ideas are happening at an unprecedented rate, diseases also spread rapidly across the globe, as evidenced by the COVID-19 pandemic. To contain epidemics effectively, it is crucial to identify as many infected persons as possible. In practice, however, it is almost impossible to obtain complete information about who is infected. We study this challenge in the context of social networks, where a disease spreads via network edges. Specifically, we assume that only a subset of all infections is observed and we seek to infer who else is infected. We consider two settings: (1) a temporal setting, in which infection times are also observed, and (2) a probabilistic setting, in which an infection probability is produced for each individual. Social-media platforms enable people to share and access information easily. Meanwhile, flawed designs in these platforms contribute to the formation of online polarization. As a result, people are unlikely to adopt new ideas that differ from their beliefs, which ultimately leads to a polarized society. To tackle online polarization, we argue that it is important to discover who is involved in it. We consider a problem setting on social networks in which the interaction between two persons is either friendly or antagonistic. Given some seed nodes that represent different sides of a polarized subgraph, we seek to find the polarized subgraph that is relevant to the seeds. Finding such structures can help to understand the nature of polarization and to mitigate its degree. Machine-learning algorithms allow the automation of many decision-making processes, for example, deciding whether to grant a loan to an applicant. However, unfair results that favor one demographic group (e.g., male) over another (e.g., female) have been observed. Such unfair outcomes may further affect the well-being of the mistreated groups. In this thesis, we focus on the task of data clustering, which has applications in infrastructure design and online social media. We discuss potential fairness issues in existing clustering algorithms that are designed to be fair. Motivated by these issues, we propose a new fair clustering formulation that captures a novel fairness notion. For all proposed problems, we study their complexity and design algorithms whose theoretical performance is analyzed. We evaluate the efficacy of all proposed algorithms in both synthetic and real-world settings. | en |
dc.format.extent | 87 + app. 55 | |
dc.format.mimetype | application/pdf | en |
dc.identifier.isbn | 978-952-60-3990-9 (electronic) | |
dc.identifier.isbn | 978-952-60-3989-3 (printed) | |
dc.identifier.issn | 1799-4942 (electronic) | |
dc.identifier.issn | 1799-4934 (printed) | |
dc.identifier.issn | 1799-4934 (ISSN-L) | |
dc.identifier.uri | https://aaltodoc.aalto.fi/handle/123456789/46243 | |
dc.identifier.urn | URN:ISBN:978-952-60-3990-9 | |
dc.language.iso | en | en |
dc.opn | Koutra, Danai, Asst. Prof., University of Michigan, USA | |
dc.publisher | Aalto University | en |
dc.publisher | Aalto-yliopisto | fi |
dc.relation.haspart | [Publication 1]: Xiao, Han; Rozenshtein, Polina; Tatti, Nikolaj; Gionis, Aristides. Reconstructing a cascade from temporal observations. In Proceedings of the 2018 SIAM International Conference on Data Mining, pages 666–674, May 2018. Full text in Acris/Aaltodoc: http://urn.fi/URN:NBN:fi:aalto-201902251793. DOI: 10.1137/1.9781611975321.75 | |
dc.relation.haspart | [Publication 2]: Xiao, Han; Aslay, Çigdem; Gionis, Aristides. Robust cascade reconstruction by Steiner tree sampling. In 2018 IEEE International Conference on Data Mining, pages 637–646, November 2018. Full text in Acris/Aaltodoc: http://urn.fi/URN:NBN:fi:aalto-201901141105. DOI: 10.1109/ICDM.2018.00079 | |
dc.relation.haspart | [Publication 3]: Xiao, Han; Ordozgoiti, Bruno; Gionis, Aristides. Searching for polarization in signed graphs: a local spectral approach. In The World Wide Web Conference, pages 362–372, April 2020. DOI: 10.1145/3366423.3380121 | |
dc.relation.haspart | [Publication 4]: Xiao, Han; Ordozgoiti, Bruno; Gionis, Aristides. A distance-based approach to fair clustering. Submitted for publication, July 2020 | |
dc.relation.ispartofseries | Aalto University publication series DOCTORAL DISSERTATIONS | en |
dc.relation.ispartofseries | 118/2020 | |
dc.rev | Tong, Hanghang, Assoc. Prof., University of Illinois, USA | |
dc.rev | Orecchia, Lorenzo, Asst. Prof., University of Chicago, USA | |
dc.subject.keyword | data mining | en |
dc.subject.keyword | graph mining | en |
dc.subject.keyword | social network analysis | en |
dc.subject.keyword | epidemics | en |
dc.subject.keyword | fairness | en |
dc.subject.keyword | online polarization | en |
dc.subject.keyword | algorithm design | en |
dc.subject.keyword | approximation algorithm | en |
dc.subject.other | Computer science | en |
dc.title | Data science for social good - Theory and applications in epidemics, polarization, and fair clustering | en |
dc.type | G5 Artikkeliväitöskirja | fi |
dc.type.dcmitype | text | en |
dc.type.ontasot | Doctoral dissertation (article-based) | en |
dc.type.ontasot | Väitöskirja (artikkeli) | fi |
local.aalto.acrisexportstatus | checked 2020-10-19_1211 | |
local.aalto.archive | yes | |
local.aalto.formfolder | 2020_08_25_klo_11_46 | |
local.aalto.infra | Science-IT | |
Files
Original bundle
- Name: isbn9789526039909.pdf
- Size: 6.77 MB
- Format: Adobe Portable Document Format