Efficient and trustworthy methods for knowledge discovery

Thumbnail Image
Journal Title
Journal ISSN
Volume Title
School of Science | Doctoral thesis (article-based) | Defence date: 2024-01-26
Degree programme
126 + app. 114
Aalto University publication series DOCTORAL THESES, 208/2023
Data are building blocks to information and, subsequently, they are vital input to knowledge. Today, in the midst of the digital era, vast quantities of highly-complex data are being collected and processed at an unprecedented scale. This abundance of data has highlighted the importance of efficient and effective knowledge-discovery algorithms to identify patterns hidden in the data with the ultimate aim of uncovering valuable knowledge and shape our understanding of the world around us. To capitalize on the opportunities offered by massive amounts of data as well as modern computing power, for many years, research in knowledge discovery and related areas has introduced algorithms that are increasingly efficient and effective, but also more and more opaque and unpredictable. Recently, growing interest in the ethical dimensions of algorithms has drawn attention to the limitations of opaque algorithms and has emphasized a need for trustworthy algorithms particularly when such algorithms are used to support high-stakes decision making. In order to be trustworthy, algorithms should solve a clearly defined problem via a clear sequence of instructions, they should not be utterly unsuccessful in any particular case and they should be easy to understand and interpret for humans so that no harmful biases can be hidden. In this thesis, we pursue the goal of developing novel knowledge-discovery algorithmic methods that are not only highly efficient to face the challenges and opportunities posed by modern data, but also trustworthy. In particular, we propose efficient and trustworthy methods for a collection of popular knowledgediscovery tasks. First, we consider tasks of exact inference in Bayesian networks and hidden Markov models. Trustworthy approaches for such tasks exist. However, their applicability may be severely limited by time or memory requirements. Therefore, we propose novel methods to reduce the time or memory resources that are needed by existing approaches for the considered exact inference tasks. Beside exact inference tasks, we also consider two different knowledge-discovery tasks that arise naturally in modern data: multi-label classification and community search in temporal graphs. Regarding multi-label classification, we propose an efficient and accurate rule-based multi-label classifier that drastically improves upon the interpretability of existing solutions. For community search in temporal graphs, we formalise the task for the first time, and we propose a solution that guarantees high efficiency and interpretability. In designing knowledge-discovery methods, we often rely on existing database-management and probabilistic methods. Methods for database management are valuable to address the large dimension and high complexity of modern data, while probabilistic methods are essential to methodologically handle uncertainty in the data.
Supervising professor
Gionis, Aristides, Prof., KTH Royal Institute of Technology, Sweden and Aalto University, Department of Computer Science, Finland
Thesis advisor
Aslay, Cigdem, Prof., Aarhus University, Denmark
knowledge discovery, trustworthy algorithms, scalable algorithms
Other note
  • [Publication 1]: Cigdem Aslay, Martino Ciaperoni, Aristides Gionis, Michael Mathioudakis. Workload-aware materialization for efficient variable elimination on Bayesian networks. In 2021 IEEE 37th International Conference on Data Engineering (ICDE), Chania, Greece, p. 1152-1163, April 2021.
    DOI: 10.1109/ICDE51399.2021.00104 View at publisher
  • [Publication 2]: Martino Ciaperoni, Cigdem Aslay, Aristides Gionis, Michael Mathioudakis. Workload-Aware Materialization of Junction Trees. In Proceedings 25th International Conference on Extending Database Technology (EDBT 2022), Edinburgh, UK, p. 65-77, March-April 2022.
    DOI: 10.5441/002/edbt.2022.06 View at publisher
  • [Publication 3]: Martino Ciaperoni, Aristides Gionis, Athanasios Katsamanis, Panagiotis Karras. SIEVE: A Space-Efficient Algorithm for Viterbi Decoding. In Proceedings of the 2022 International Conference on Management of Data (SIGMOD 2022), Philadelphia, PA, USA, pp. 1136–1145, June 2022.
    DOI: 10.1145/3514221.3526170 View at publisher
  • [Publication 4]: Martino Ciaperoni, Athanasios Katsamanis, Panagiotis Karras. When Dijkstra met Bellman: Fixed-Length Path Optimization by Best-First Search. Submitted to Proceedings of the VLDB Endowment, 2023.
  • [Publication 5]: Martino Ciaperoni, Han Xiao, Aristides Gionis. Concise and interpretable multi-label rule sets. In Proceedings of the IEEE International Conference on Data Mining (ICDM 2022), Orlando, FL, USA, pp. 71-80, November-December 2022.
    DOI: 10.1109/ICDM54844.2022.00017 View at publisher
  • [Publication 6]: Edoardo Galimberti, Martino Ciaperoni, Alain Barrat, Francesco Bonchi, Ciro Cattuto, Francesco Gullo. Span-core Decomposition for Temporal Networks. ACM Transactions on Knowledge Discovery from Data, Volume 15, Issue 1, pp. 1–44, December 2020.
    DOI: 10.1145/3418226 View at publisher