Efficient and trustworthy methods for knowledge discovery

dc.contributor: Aalto University [en]
dc.contributor.advisor: Aslay, Cigdem, Prof., Aarhus University, Denmark
dc.contributor.author: Ciaperoni, Martino
dc.contributor.department: Tietotekniikan laitos [fi]
dc.contributor.department: Department of Computer Science [en]
dc.contributor.lab: Data Mining [en]
dc.contributor.school: Perustieteiden korkeakoulu [fi]
dc.contributor.school: School of Science [en]
dc.contributor.supervisor: Gionis, Aristides, Prof., KTH Royal Institute of Technology, Sweden and Aalto University, Department of Computer Science, Finland
dc.description.abstract: Data are the building blocks of information and, in turn, a vital input to knowledge. Today, in the midst of the digital era, vast quantities of highly complex data are being collected and processed at an unprecedented scale. This abundance of data has highlighted the importance of efficient and effective knowledge-discovery algorithms that identify patterns hidden in the data, with the ultimate aim of uncovering valuable knowledge and shaping our understanding of the world around us. To capitalize on the opportunities offered by massive amounts of data and by modern computing power, research in knowledge discovery and related areas has for many years introduced algorithms that are increasingly efficient and effective, but also increasingly opaque and unpredictable. Recently, growing interest in the ethical dimensions of algorithms has drawn attention to the limitations of opaque algorithms and has emphasized the need for trustworthy algorithms, particularly when such algorithms are used to support high-stakes decision making. To be trustworthy, algorithms should solve a clearly defined problem via a clear sequence of instructions, they should not be utterly unsuccessful in any particular case, and they should be easy for humans to understand and interpret, so that no harmful biases can be hidden. In this thesis, we pursue the goal of developing novel knowledge-discovery methods that are not only efficient enough to face the challenges and opportunities posed by modern data, but also trustworthy. In particular, we propose efficient and trustworthy methods for a collection of popular knowledge-discovery tasks. First, we consider tasks of exact inference in Bayesian networks and hidden Markov models. Trustworthy approaches for such tasks exist. However, their applicability may be severely limited by time or memory requirements.
Therefore, we propose novel methods to reduce the time or memory resources needed by existing approaches for the considered exact inference tasks. Besides exact inference tasks, we also consider two different knowledge-discovery tasks that arise naturally in modern data: multi-label classification and community search in temporal graphs. Regarding multi-label classification, we propose an efficient and accurate rule-based multi-label classifier that drastically improves upon the interpretability of existing solutions. For community search in temporal graphs, we formalise the task for the first time, and we propose a solution that guarantees high efficiency and interpretability. In designing knowledge-discovery methods, we often rely on existing database-management and probabilistic methods. Methods for database management are valuable for addressing the large dimension and high complexity of modern data, while probabilistic methods are essential for handling uncertainty in the data methodically. [en]
dc.format.extent: 126 + app. 114
dc.identifier.isbn: 978-952-64-1558-1 (electronic)
dc.identifier.isbn: 978-952-64-1557-4 (printed)
dc.identifier.issn: 1799-4942 (electronic)
dc.identifier.issn: 1799-4934 (printed)
dc.identifier.issn: 1799-4934 (ISSN-L)
dc.opn: Bonifati, Angela, Prof., Lyon 1 University, France
dc.publisher: Aalto University [en]
dc.relation.haspart: [Publication 1]: Cigdem Aslay, Martino Ciaperoni, Aristides Gionis, Michael Mathioudakis. Workload-aware materialization for efficient variable elimination on Bayesian networks. In 2021 IEEE 37th International Conference on Data Engineering (ICDE), Chania, Greece, pp. 1152–1163, April 2021. Full text in Acris/Aaltodoc: https://urn.fi/URN:NBN:fi:aalto-202109299359. DOI: 10.1109/ICDE51399.2021.00104
dc.relation.haspart: [Publication 2]: Martino Ciaperoni, Cigdem Aslay, Aristides Gionis, Michael Mathioudakis. Workload-Aware Materialization of Junction Trees. In Proceedings of the 25th International Conference on Extending Database Technology (EDBT 2022), Edinburgh, UK, pp. 65–77, March-April 2022. Full text in Acris/Aaltodoc: https://urn.fi/URN:NBN:fi:aalto-202207014366. DOI: 10.5441/002/edbt.2022.06
dc.relation.haspart: [Publication 3]: Martino Ciaperoni, Aristides Gionis, Athanasios Katsamanis, Panagiotis Karras. SIEVE: A Space-Efficient Algorithm for Viterbi Decoding. In Proceedings of the 2022 International Conference on Management of Data (SIGMOD 2022), Philadelphia, PA, USA, pp. 1136–1145, June 2022. Full text in Acris/Aaltodoc: https://urn.fi/URN:NBN:fi:aalto-202401041085. DOI: 10.1145/3514221.3526170
dc.relation.haspart: [Publication 4]: Martino Ciaperoni, Athanasios Katsamanis, Panagiotis Karras. When Dijkstra met Bellman: Fixed-Length Path Optimization by Best-First Search. Submitted to Proceedings of the VLDB Endowment, 2023.
dc.relation.haspart: [Publication 5]: Martino Ciaperoni, Han Xiao, Aristides Gionis. Concise and interpretable multi-label rule sets. In Proceedings of the IEEE International Conference on Data Mining (ICDM 2022), Orlando, FL, USA, pp. 71–80, November-December 2022. DOI: 10.1109/ICDM54844.2022.00017
dc.relation.haspart: [Publication 6]: Edoardo Galimberti, Martino Ciaperoni, Alain Barrat, Francesco Bonchi, Ciro Cattuto, Francesco Gullo. Span-core Decomposition for Temporal Networks. ACM Transactions on Knowledge Discovery from Data, Volume 15, Issue 1, pp. 1–44, December 2020. DOI: 10.1145/3418226
dc.relation.ispartofseries: Aalto University publication series DOCTORAL THESES [en]
dc.rev: Das, Gautam, Prof., University of Texas at Arlington, USA
dc.rev: Boehm, Matthias, Prof., Technische Universität Berlin, Germany
dc.subject.keyword: knowledge discovery [en]
dc.subject.keyword: trustworthy algorithms [en]
dc.subject.keyword: scalable algorithms [en]
dc.subject.other: Computer science [en]
dc.title: Efficient and trustworthy methods for knowledge discovery [en]
dc.type: G5 Artikkeliväitöskirja [fi]
dc.type.ontasot: Doctoral dissertation (article-based) [en]
dc.type.ontasot: Väitöskirja (artikkeli) [fi]
local.aalto.acrisexportstatus: checked 2024-01-26_1137