Browsing by Author "Thejaswi, Suhas"
Now showing 1 - 4 of 4
- Results Per Page
- Sort Options
- Diversity-Aware k-median: Clustering with Fair Center Representation
A4 Artikkeli konferenssijulkaisussa(2021) Thejaswi, Suhas; Ordozgoiti, Bruno; Gionis, AristidesWe introduce a novel problem for diversity-aware clustering. We assume that the potential cluster centers belong to a set of groups defined by protected attributes, such as ethnicity, gender, etc. We then ask to find a minimum-cost clustering of the data into k clusters so that a specified minimum number of cluster centers are chosen from each group. We thus require that all groups are represented in the clustering solution as cluster centers, according to specified requirements. More precisely, we are given a set of clients C, a set of facilities, a collection F= { F1, ⋯, Ft} of facility groups, a budget k, and a set of lower-bound thresholds R= { r1, ⋯, rt}, one for each group in F. The diversity-aware k-median problem asks to find a set S of k facilities in such that | S∩ Fi| ≥ ri, that is, at least ri centers in S are from group Fi, and the k-median cost ∑ c∈Cmin s∈Sd(c, s) is minimized. We show that in the general case where the facility groups may overlap, the diversity-aware k-median problem is NP -hard, fixed-parameter intractable with respect to parameter k, and inapproximable to any multiplicative factor. On the other hand, when the facility groups are disjoint, approximation algorithms can be obtained by reduction to the matroid median and red-blue median problems. Experimentally, we evaluate our approximation methods for the tractable cases, and present a relaxation-based heuristic for the theoretically intractable case, which can provide high-quality and efficient solutions for real-world datasets. - Finding Path Motifs in Large Temporal Graphs Using Algebraic Fingerprints
A1 Alkuperäisartikkeli tieteellisessä aikakauslehdessä(2020-10-01) Thejaswi, Suhas; Gionis, Aristides; Lauri, JuhoWe study a family of pattern-detection problems in vertex-colored temporal graphs. In particular, given a vertex-colored temporal graph and a multiset of colors as a query, we search for temporal paths in the graph that contain the colors specified in the query. These types of problems have several applications, for example, in recommending tours for tourists or detecting abnormal behavior in a network of financial transactions. For the family of pattern-detection problems we consider, we establish complexity results and design an algebraic-algorithmic framework based on constrained multilinear sieving. We demonstrate that our solution scales to massive graphs with up to a billion edges for a multiset query with 5 colors and up to 100 million edges for a multiset query with 10 colors, despite the problems being non-deterministic polynomial time-hard. Our implementation, which is publicly available, exhibits practical edge-linear scalability and is highly optimized. For instance, in a real-world graph dataset with >6 million edges and a multiset query with 10 colors, we can extract an optimal solution in <8 minutes on a Haswell desktop with four cores. - Pattern detection in large temporal graphs using algebraic fingerprints
A4 Artikkeli konferenssijulkaisussa(2020) Thejaswi, Suhas; Gionis, AristidesIn this paper, we study a family of pattern-detection problems in vertex-colored temporal graphs. In particular, given a vertex-colored temporal graph and a multi-set of colors as a query, we search for temporal paths in the graph that contain the colors specified in the query. These types of problems have several interesting applications, for example, recommending tours for tourists, or searching for abnormal behavior in a network of financial transactions. For the family of pattern-detection problems we define, we establish complexity results and design an algebraic-algorithmic framework based on constrained multilinear sieving. We demonstrate that our solution can scale to massive graphs with up to hundred million edges, despite the problems being NP-hard. Our implementation, which is publicly available, exhibits practical edge-linear scalability and highly optimized. For example, in a real-world graph dataset with more than six million edges and a multi-set query with ten colors, we can extract an optimal solution in less than eight minutes on a haswell desktop with four cores. - Scalable algorithm designs for mining massive datasets
School of Science | Doctoral dissertation (article-based)(2022) Thejaswi, SuhasCombinatorial problems are everywhere, starting from operations-research problems such as finding connectivity for efficient network design, placement of facilities such that the travelling distance from clients to facilities is minimized, detecting patterns in database transactions and genetic material, contact tracing in epidemic- propagation networks, and efficient advertising in social media, among others. Many of these aforementioned problems are known to be computationally intractable, meaning that, no efficient solutions are known to solve these problems. Despite being intractable, these problems have enormous impact on the society and our day-to-day activities, which prompts us to find effective solutions that can practically solve large instances of these real-world problem scenarios. A broad topic of this thesis is to design algorithms that can practically solve a set of computationally-intractable problems under restrictive settings, using mathematical abstractions and computational tools. In particular, the problems of our interest include pattern detection in graphs and data clustering with fairness constraints. Pattern detection in graphs is the problem of deciding the existence of a smaller subgraph in a usually larger graph. The problem has applications in understanding complex biological and metabolic systems, discovering controversies in social media discussions, studying epidemic propagation in contact networks, among others. Clustering with fairness constraints is the problem of grouping similar objects together in to clusters, while simultaneously taking a fairness notion into consideration. The fairness notion arise from different societal norms, including, avoiding discrimination based on gender, race, ethnicity and financial status, among others. This thesis is organized around many algorithmic techniques and computational tools for the design of algorithms that can practically solve large instances of intractable problems. These computational tools include, parameterized algorithms, approximation algorithms, heuristics, and implementation engineering. We introduce novel-problem formulations in the field of pattern detection and data clustering, analyze the computational complexity of problems that we introduce, and present algorithmic solutions that can solve these problems under restrictive settings, that is, when certain parameters of the problem are small. In cases where it is not possible to design efficient algorithmic solutions, we present careful validations in the form of computational-complexity results, lower bound on the running time, and lower bound on the approximation factor. Finally, we present carefully engineered implementation of our proposed algorithmic solutions, and validate our scalability claims using both synthetic and real-world datasets.