Scalable algorithm designs for mining massive datasets

dc.contributorAalto Universityen
dc.contributor.advisorOrdozgoiti, Bruno, Lecturer, Queen Mary University of London, UK
dc.contributor.authorThejaswi, Suhas
dc.contributor.departmentTietotekniikan laitosfi
dc.contributor.departmentDepartment of Computer Scienceen
dc.contributor.schoolPerustieteiden korkeakoulufi
dc.contributor.schoolSchool of Scienceen
dc.contributor.supervisorGionis, Aristides, Adj. Prof., Aalto University, Department of Computer Science, Finland, KTH Royal Institute of Technology, Sweden
dc.description.abstractCombinatorial problems are everywhere, starting from operations-research problems such as finding connectivity for efficient network design, placement of facilities such that the travelling distance from clients to facilities is minimized, detecting patterns in database transactions and genetic material, contact tracing in epidemic- propagation networks, and efficient advertising in social media, among others. Many of these aforementioned problems are known to be computationally intractable, meaning that, no efficient solutions are known to solve these problems. Despite being intractable, these problems have enormous impact on the society and our day-to-day activities, which prompts us to find effective solutions that can practically solve large instances of these real-world problem scenarios. A broad topic of this thesis is to design algorithms that can practically solve a set of computationally-intractable problems under restrictive settings, using mathematical abstractions and computational tools. In particular, the problems of our interest include pattern detection in graphs and data clustering with fairness constraints. Pattern detection in graphs is the problem of deciding the existence of a smaller subgraph in a usually larger graph. The problem has applications in understanding complex biological and metabolic systems, discovering controversies in social media discussions, studying epidemic propagation in contact networks, among others. Clustering with fairness constraints is the problem of grouping similar objects together in to clusters, while simultaneously taking a fairness notion into consideration. The fairness notion arise from different societal norms, including, avoiding discrimination based on gender, race, ethnicity and financial status, among others. This thesis is organized around many algorithmic techniques and computational tools for the design of algorithms that can practically solve large instances of intractable problems. These computational tools include, parameterized algorithms, approximation algorithms, heuristics, and implementation engineering. We introduce novel-problem formulations in the field of pattern detection and data clustering, analyze the computational complexity of problems that we introduce, and present algorithmic solutions that can solve these problems under restrictive settings, that is, when certain parameters of the problem are small. In cases where it is not possible to design efficient algorithmic solutions, we present careful validations in the form of computational-complexity results, lower bound on the running time, and lower bound on the approximation factor. Finally, we present carefully engineered implementation of our proposed algorithmic solutions, and validate our scalability claims using both synthetic and real-world datasets.en
dc.format.extent78 + app 132
dc.identifier.isbn978-952-64-0943-6 (electronic)
dc.identifier.isbn978-952-64-0942-9 (printed)
dc.identifier.issn1799-4942 (electronic)
dc.identifier.issn1799-4934 (printed)
dc.identifier.issn1799-4934 (ISSN-L)
dc.opnOthon Michail, Senior Lecturer, University of Liverpool, UK
dc.publisherAalto Universityen
dc.relation.haspart[Publication 1]: Petteri Kaski, Juho Lauri, Suhas Thejaswi. Engineering motif search for large motifs. In Proceedings of the Symposium of Experimental Algorithms, Pages 28:1–18:19, August 2018. Full text in Acris/Aaltodoc: DOI: 10.4230/LIPIcs.SEA.2018.28
dc.relation.haspart[Publication 2]: Suhas Thejaswi, Aristides Gionis, Juho Lauri. Finding path motifs in large temporal graphs using algebraic fingerprints. Big Data, Specialissue — Best of SIAM Data Mining 2020, Pages 335–362, Volume 8, Issue 5, October 2020. Full textin Acris/Aaltodoc: DOI: 10.1089/big.2020.0078
dc.relation.haspart[Publication 3]: Suhas Thejaswi, Juho Lauri, Aristides Gionis. Restless reachability problems in temporal graphs. Submitted for publication, October 2021
dc.relation.haspart[Publication 4]: Suhas Thejaswi, Bruno Ordozgoiti, Aristides Gionis. Diversity aware k-median: Clustering with fair center representation. In Proceedings of the European Conference on Machine Learning and Knowledge Discovery from Data, Pages 765–780, April 2021. DOI: 10.1007/978-3-030-86520-7_47
dc.relation.haspart[Publication 5]: Suhas Thejaswi, Ameet Gadekar, Bruno Ordozgoiti, Michal Osadnik. Clustering with fair center representation: parameterized approximation algorithms and heuristics. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Pages 1749–1759, August 2022
dc.relation.ispartofseriesAalto University publication series DOCTORAL THESESen
dc.revArijt, Khan, Assoc. Prof., Aalborg University, Denmark
dc.revOthon Michail, Senior Lecturer, University of Liverpool, UK
dc.subject.keywordalgorithmic biasen
dc.subject.keywordalgorithmic fairnessen
dc.subject.keywordalgebraic fingerprintingen
dc.subject.keywordcombinatorial optimizationen
dc.subject.keywordparameterized algorithmsen
dc.subject.keywordscalable algorithmsen
dc.subject.otherComputer scienceen
dc.titleScalable algorithm designs for mining massive datasetsen
dc.typeG5 Artikkeliväitöskirjafi
dc.type.ontasotDoctoral dissertation (article-based)en
dc.type.ontasotVäitöskirja (artikkeli)fi
local.aalto.acrisexportstatuschecked 2022-10-21_0910
Original bundle
Now showing 1 - 1 of 1
No Thumbnail Available
1.15 MB
Adobe Portable Document Format