Advances in Analysing Temporal Data

Thumbnail Image
Journal Title
Journal ISSN
Volume Title
School of Science | Doctoral thesis (article-based) | Defence date: 2017-05-26
Degree programme
99 + app. 199
Aalto University publication series DOCTORAL DISSERTATIONS, 63/2017
Modern technical capabilities and systems have resulted in an abundance of data. A significant portion of all that data is of temporal nature. Hence, it becomes imperative to design effective and efficient algorithms, and solutions that enable searching and analysing large databases of temporal data. This thesis contains several contributions related to the broad scientific field of temporal-data analysis. First, we present a distance function for pairs of event-interval sequences, together with proofs of important properties, such as that the function is a metric, and a lower-bounding function. An embedding-based indexing method is proposed for searching through large databases of event-interval sequences, under this distance function. Second, we study the problem of subsequence search for event-interval sequences. This includes hardness results, an exact worst-case exponential-time algorithm, two upper bounds and a scheme for approximation algorithms. In addition, an equivalence is established between graphs and event-interval sequences. This equivalence allows to derive hardness results for several problems of event-interval sequences. Most importantly, it raises the question which techniques, results, and methods from each of the fields of graph mining and temporal data mining can be applied to the other that would advance the current state of the art. Third, for the problem of subsequence search, we propose an indexing method based on decomposing event-interval sequences into 2-interval patterns. The proposed indexing method is benchmarked against other approaches. In addition, we examine different variations of the problem and propose exact algorithms for solving them. Fourth, we describe a complete system that enables the clustering of a stream of graphs. The graphs are clustered into groups based on their distances, via approximating the graph edit distance. The proposed clustering algorithm achieves a good clustering with few graph comparisons. The effectiveness and usefulness of the systems is demonstrated by clustering function call-graphs of binary executable files for the purpose of malware detection. Finally, we solve the problem of summarising temporal networks. We assume that networks operate in certain modes and that the total sequence of interactions can be modelled as a series of transitions between these modes. We prove hardness results and provide heuristic procedures for finding approximate solutions. We demonstrate the quality of our methods via benchmarking and performing case-studies on datasets taken from sports and social networks.
Supervising professor
Gionis, Aristides, Associate Prof., Aalto University, Department of Computer Science, Finland
temporal data, event-interval sequences, temporal networks, streams, graphs
Other note
  • [Publication 1]: O. Kostakis. Classy: fast clustering streams of call-graphs. Data Mining and Knowledge Discovery, Volume 28, Issue 5, pp 1554–1585, September 2014.
    DOI: 10.1007/s10618-014-0367-9 View at publisher
  • [Publication 2]: O. Kostakis, A. Gionis. Subsequence Search in Event-Interval Sequences. In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval, Santiago, Chile, Pages 851–854, August 2015.
    DOI: 10.1145/2766462.2767778 View at publisher
  • [Publication 3]: O. Kostakis, P. Papapetrou. Finding the longest common sub-pattern in sequences of temporal intervals. Data Mining and Knowledge Discovery, Volume 29, Issue 5, pp 1178-1210, September 2015.
    DOI: 10.1007/s10618-015-0404-3 View at publisher
  • [Publication 4]: O. Kostakis, N. Tatti, A. Gionis. Discovering recurring activity in temporal networks. Accepted for publication in Data Mining and Knowledge Discovery, 31 pages, Submission date 2016
  • [Publication 5]: O. Kostakis, P. Papapetrou. On Searching and Indexing Sequences of Temporal Intervals. Accepted for publication in Data Mining and Knowledge Discovery, 44 pages, Submission date 2016.
    DOI: 10.1007/s10618-016-0489-3 View at publisher