Scalable algorithm designs for mining massive datasets

Thumbnail Image
Journal Title
Journal ISSN
Volume Title
School of Science | Doctoral thesis (article-based) | Defence date: 2022-10-21
Degree programme
78 + app 132
Aalto University publication series DOCTORAL THESES, 131/2022
Combinatorial problems are everywhere, starting from operations-research problems such as finding connectivity for efficient network design, placement of facilities such that the travelling distance from clients to facilities is minimized, detecting patterns in database transactions and genetic material, contact tracing in epidemic- propagation networks, and efficient advertising in social media, among others. Many of these aforementioned problems are known to be computationally intractable, meaning that, no efficient solutions are known to solve these problems. Despite being intractable, these problems have enormous impact on the society and our day-to-day activities, which prompts us to find effective solutions that can practically solve large instances of these real-world problem scenarios. A broad topic of this thesis is to design algorithms that can practically solve a set of computationally-intractable problems under restrictive settings, using mathematical abstractions and computational tools. In particular, the problems of our interest include pattern detection in graphs and data clustering with fairness constraints. Pattern detection in graphs is the problem of deciding the existence of a smaller subgraph in a usually larger graph. The problem has applications in understanding complex biological and metabolic systems, discovering controversies in social media discussions, studying epidemic propagation in contact networks, among others. Clustering with fairness constraints is the problem of grouping similar objects together in to clusters, while simultaneously taking a fairness notion into consideration. The fairness notion arise from different societal norms, including, avoiding discrimination based on gender, race, ethnicity and financial status, among others. This thesis is organized around many algorithmic techniques and computational tools for the design of algorithms that can practically solve large instances of intractable problems. These computational tools include, parameterized algorithms, approximation algorithms, heuristics, and implementation engineering. We introduce novel-problem formulations in the field of pattern detection and data clustering, analyze the computational complexity of problems that we introduce, and present algorithmic solutions that can solve these problems under restrictive settings, that is, when certain parameters of the problem are small. In cases where it is not possible to design efficient algorithmic solutions, we present careful validations in the form of computational-complexity results, lower bound on the running time, and lower bound on the approximation factor. Finally, we present carefully engineered implementation of our proposed algorithmic solutions, and validate our scalability claims using both synthetic and real-world datasets.
Supervising professor
Gionis, Aristides, Adj. Prof., Aalto University, Department of Computer Science, Finland, KTH Royal Institute of Technology, Sweden
Thesis advisor
Ordozgoiti, Bruno, Lecturer, Queen Mary University of London, UK
algorithmic bias, algorithmic fairness, algebraic fingerprinting, combinatorial optimization, parameterized algorithms, scalable algorithms
Other note
  • [Publication 1]: Petteri Kaski, Juho Lauri, Suhas Thejaswi. Engineering motif search for large motifs. In Proceedings of the Symposium of Experimental Algorithms, Pages 28:1–18:19, August 2018.
    Full text in Acris/Aaltodoc:
    DOI: 10.4230/LIPIcs.SEA.2018.28 View at publisher
  • [Publication 2]: Suhas Thejaswi, Aristides Gionis, Juho Lauri. Finding path motifs in large temporal graphs using algebraic fingerprints. Big Data, Specialissue — Best of SIAM Data Mining 2020, Pages 335–362, Volume 8, Issue 5, October 2020. Full textin Acris/Aaltodoc:
    DOI: 10.1089/big.2020.0078 View at publisher
  • [Publication 3]: Suhas Thejaswi, Juho Lauri, Aristides Gionis. Restless reachability problems in temporal graphs. Submitted for publication, October 2021
  • [Publication 4]: Suhas Thejaswi, Bruno Ordozgoiti, Aristides Gionis. Diversity aware k-median: Clustering with fair center representation. In Proceedings of the European Conference on Machine Learning and Knowledge Discovery from Data, Pages 765–780, April 2021.
    DOI: 10.1007/978-3-030-86520-7_47 View at publisher
  • [Publication 5]: Suhas Thejaswi, Ameet Gadekar, Bruno Ordozgoiti, Michal Osadnik. Clustering with fair center representation: parameterized approximation algorithms and heuristics. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Pages 1749–1759, August 2022