Data science for social good - Theory and applications in epidemics, polarization, and fair clustering

dc.contributor [fi]: Aalto-yliopisto
dc.contributor [en]: Aalto University
dc.contributor.author: Xiao, Han
dc.contributor.department [fi]: Tietotekniikan laitos
dc.contributor.department [en]: Department of Computer Science
dc.contributor.lab [en]: Data Mining group
dc.contributor.school [fi]: Perustieteiden korkeakoulu
dc.contributor.school [en]: School of Science
dc.contributor.supervisor: Gionis, Aristides, Adj. Prof., Aalto University, Department of Computer Science, Finland
dc.date.accessioned: 2020-08-25T09:00:05Z
dc.date.available: 2020-08-25T09:00:05Z
dc.date.defence: 2020-09-25
dc.date.issued: 2020
dc.description.abstract [en]: Technical innovations have fundamentally transformed our lives, in both positive and negative ways. In this thesis, we look at the negative side. We identify three problems to tackle, namely epidemics, online polarization, and bias in automated decision-making processes, and we address them with data-driven methods.

Thanks to globalization, our world is more interconnected than ever. Trade and the exchange of ideas happen at an unprecedented rate, but so does the global spread of disease, as evidenced by the COVID-19 pandemic. To contain an epidemic effectively, it is crucial to identify as many infected persons as possible. In practice, however, it is almost impossible to obtain complete information about who is infected. We study this challenge in the context of social networks, where a disease spreads along network edges. Specifically, we assume that only a subset of all infections is observed, and we seek to infer who else is infected. We consider two settings: (1) a temporal setting, in which infection times are also observed, and (2) a probabilistic setting, in which an infection probability is produced for each individual.

Social-media platforms enable people to share and access information easily. At the same time, flawed designs in these platforms contribute to the formation of online polarization. As a result, people are unlikely to adopt new ideas that differ from their beliefs, which ultimately leads to a polarized society. To tackle online polarization, we argue that it is important to discover who is involved in it. We consider a problem setting in social networks in which the interaction between two persons is either friendly or antagonistic. Given some seed nodes that represent different sides of a polarized subgraph, we seek to find the polarized subgraph that is relevant to the seeds. Finding such structures can help us understand the nature of polarization and mitigate its degree.

Machine-learning algorithms allow the automation of many decision-making processes, for example, deciding whether to grant a loan to an applicant. However, unfair outcomes that favor one demographic group (e.g., men) over another (e.g., women) have been observed, and such outcomes may further affect the well-being of the mistreated groups. In this thesis, we focus on the task of data clustering, which has applications in infrastructure design and online social media. We discuss potential fairness issues in existing clustering algorithms that are designed to be fair. Motivated by these issues, we propose a new fair clustering formulation that captures a novel fairness notion.

For all proposed problems, we study their computational complexity and design algorithms whose theoretical performance we analyze. We evaluate the efficacy of all proposed algorithms in both synthetic and real-world settings.

dc.format.extent: 87 + app. 55
dc.format.mimetype [en]: application/pdf
dc.identifier.isbn: 978-952-60-3990-9 (electronic)
dc.identifier.isbn: 978-952-60-3989-3 (printed)
dc.identifier.issn: 1799-4942 (electronic)
dc.identifier.issn: 1799-4934 (printed)
dc.identifier.issn: 1799-4934 (ISSN-L)
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/46243
dc.identifier.urn: URN:ISBN:978-952-60-3990-9
dc.language.iso [en]: en
dc.opn: Koutra, Danai, Asst. Prof., University of Michigan, USA
dc.publisher [en]: Aalto University
dc.publisher [fi]: Aalto-yliopisto
dc.relation.haspart: [Publication 1]: Xiao, Han; Rozenshtein, Polina; Tatti, Nikolaj; Gionis, Aristides. Reconstructing a cascade from temporal observations. In Proceedings of the 2018 SIAM International Conference on Data Mining, pages 666–674, May 2018. Full text in Acris/Aaltodoc: http://urn.fi/URN:NBN:fi:aalto-201902251793. DOI: 10.1137/1.9781611975321.75
dc.relation.haspart: [Publication 2]: Xiao, Han; Aslay, Çigdem; Gionis, Aristides. Robust cascade reconstruction by Steiner tree sampling. In 2018 IEEE International Conference on Data Mining, pages 637–646, November 2018. Full text in Acris/Aaltodoc: http://urn.fi/URN:NBN:fi:aalto-201901141105. DOI: 10.1109/ICDM.2018.00079
dc.relation.haspart: [Publication 3]: Xiao, Han; Ordozgoiti, Bruno; Gionis, Aristides. Searching for polarization in signed graphs: a local spectral approach. In The World Wide Web Conference, pages 362–372, April 2020. DOI: 10.1145/3366423.3380121
dc.relation.haspart: [Publication 4]: Xiao, Han; Ordozgoiti, Bruno; Gionis, Aristides. A distance-based approach to fair clustering. Submitted for publication, July 2020
dc.relation.ispartofseries [en]: Aalto University publication series DOCTORAL DISSERTATIONS
dc.relation.ispartofseries: 118/2020
dc.rev: Tong, Hanghang, Assoc. Prof., University of Illinois, USA
dc.rev: Orecchia, Lorenzo, Asst. Prof., University of Chicago, USA
dc.subject.keyword [en]: data mining
dc.subject.keyword [en]: graph mining
dc.subject.keyword [en]: social network analysis
dc.subject.keyword [en]: epidemics
dc.subject.keyword [en]: fairness
dc.subject.keyword [en]: online polarization
dc.subject.keyword [en]: algorithm design
dc.subject.keyword [en]: approximation algorithm
dc.subject.other [en]: Computer science
dc.title [en]: Data science for social good - Theory and applications in epidemics, polarization, and fair clustering
dc.type [fi]: G5 Artikkeliväitöskirja
dc.type.dcmitype [en]: text
dc.type.ontasot [en]: Doctoral dissertation (article-based)
dc.type.ontasot [fi]: Väitöskirja (artikkeli)
local.aalto.acrisexportstatus: checked 2020-10-19_1211
local.aalto.archive: yes
local.aalto.formfolder: 2020_08_25_klo_11_46
local.aalto.infra: Science-IT
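
The polarization problem described in the abstract asks for a two-sided subgraph of a signed network, around given seed nodes, in which friendly ties fall within sides and antagonistic ties fall across them. A minimal sketch of how a candidate subgraph could be scored is given here. It is an illustration only, not the local spectral method of Publication 3. It assumes the polarity measure x^T A x / x^T x from the signed-graph literature, where A is the signed adjacency matrix (+1 for friendly, -1 for antagonistic edges), and x_i is +1 or -1 for nodes on either side and 0 for nodes outside the subgraph. The function name `polarity` and the toy edge list are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: undirected signed edges (u, v, sign),
# where sign is +1 for a friendly and -1 for an antagonistic interaction.
Edge = Tuple[int, int, int]


def polarity(edges: List[Edge], side: Dict[int, int]) -> float:
    """Return x^T A x / x^T x for the side assignment x (assumed measure).

    `side` maps each node in the candidate subgraph to +1 or -1;
    nodes missing from `side` have x = 0 and contribute nothing.
    Assumes each undirected edge appears once and there are no self-loops.
    """
    numerator = 0.0
    for u, v, sign in edges:
        xu, xv = side.get(u, 0), side.get(v, 0)
        # A is symmetric, so each undirected edge adds A_uv * x_u * x_v twice.
        numerator += 2 * sign * xu * xv
    denominator = sum(x * x for x in side.values())
    # An empty assignment has no well-defined score; report 0.0.
    return numerator / denominator if denominator else 0.0


# Toy signed graph: nodes 0, 1 are friendly, nodes 2, 3 are friendly,
# the cross edges are hostile, and node 4 sits outside the candidate subgraph.
edges = [(0, 1, 1), (2, 3, 1), (0, 2, -1), (1, 3, -1), (1, 4, 1)]

print(polarity(edges, {0: 1, 1: 1, 2: -1, 3: -1}))  # 2.0
print(polarity(edges, {0: 1, 1: 1, 2: 1, 3: -1}))   # 0.0
```

In the second call, placing node 2 on the same side as nodes 0 and 1 turns two of the four edges inside the subgraph into violations, and the score drops from 2.0 to 0.0. Under this assumed measure, a search for a polarized subgraph around seeds 0 and 3 would favor the first assignment.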

Files

Original bundle

Name: isbn9789526039909.pdf
Size: 6.77 MB
Format: Adobe Portable Document Format