dc.contributor | Aalto-yliopisto | fi |
dc.contributor | Aalto University | en |
dc.contributor.advisor | Mannila, Heikki, Prof., Aalto University, Department of Information and Computer Science, Finland | |
dc.contributor.author | Lijffijt, Jefrey | |
dc.date.accessioned | 2013-12-03T10:01:22Z | |
dc.date.available | 2013-12-03T10:01:22Z | |
dc.date.issued | 2013 | |
dc.identifier.isbn | 978-952-60-5475-9 (electronic) | |
dc.identifier.isbn | 978-952-60-5474-2 (printed) | |
dc.identifier.issn | 1799-4942 (electronic) | |
dc.identifier.issn | 1799-4934 (printed) | |
dc.identifier.issn | 1799-4934 (ISSN-L) | |
dc.identifier.uri | https://aaltodoc.aalto.fi/handle/123456789/11798 | |
dc.description.abstract | Many types of data, e.g., natural language texts, biological sequences, or time series of sensor data, contain sequential structure. Analysis of such sequential structure is interesting for various reasons, for example, to detect that data consists of several homogeneous parts, that data contains certain recurring patterns, or to find parts that are different or surprising compared to the rest of the data. The main question studied in this thesis is how to identify global and local patterns in event sequences. Within this broad topic, we study several subproblems. The first problem that we address is how to compare event frequencies across event sequences and databases of event sequences. Such comparisons are relevant, for example, to linguists who are interested in comparing word counts between two corpora to identify linguistic differences, e.g., between groups of speakers, or language change over time. The second problem that we address is how to find areas in an event sequence where an event has a surprisingly high or low frequency. More specifically, we study how to take into account the multiple testing problem when looking for local frequency deviations in event sequences. Many algorithms for finding local patterns in event sequences require that the person applying the algorithm chooses the level of granularity at which the algorithm operates, and it is often not clear how to choose that level. The third problem that we address is which granularities to use when looking for local patterns in an event sequence. The main contributions of this thesis are computational methods that can be used to compare and explore (databases of) event sequences with high computational efficiency, increased accuracy, and that offer new perspectives on the sequential structure of data. 
Furthermore, we illustrate how the proposed methods can be applied to solve practical data analysis tasks, and describe several experiments and case studies where the methods are applied on various types of data. The primary focus is on natural language texts, but we also study DNA sequences and sensor data. We find that the methods work well in practice and that they can efficiently uncover various types of interesting patterns in the data. | en |
dc.format.extent | 116 | |
dc.format.mimetype | application/pdf | |
dc.language.iso | en | en |
dc.publisher | Aalto University | en |
dc.publisher | Aalto-yliopisto | fi |
dc.relation.ispartofseries | Aalto University publication series DOCTORAL DISSERTATIONS | en |
dc.relation.ispartofseries | 205/2013 | |
dc.subject.other | Computer science | en |
dc.title | Computational methods for comparison and exploration of event sequences | en |
dc.type | G4 Monografiaväitöskirja | fi |
dc.contributor.school | Perustieteiden korkeakoulu | fi |
dc.contributor.school | School of Science | en |
dc.contributor.department | Tietojenkäsittelytieteen laitos | fi |
dc.contributor.department | Department of Information and Computer Science | en |
dc.subject.keyword | pattern mining | en |
dc.subject.keyword | event sequence | en |
dc.subject.keyword | statistical significance | en |
dc.subject.keyword | multiple testing | en |
dc.subject.keyword | sliding window | en |
dc.subject.keyword | window length | en |
dc.identifier.urn | URN:ISBN:978-952-60-5475-9 | |
dc.type.dcmitype | text | en |
dc.type.ontasot | Doctoral dissertation (monograph) | en |
dc.type.ontasot | Väitöskirja (monografia) | fi |
dc.contributor.supervisor | Rousu, Juho, Prof., Aalto University, Department of Information and Computer Science, Finland | |
dc.opn | Goethals, Bart, Prof., University of Antwerp, Dept. of Math and Computer Science, Belgium | |
dc.rev | Geerts, Floris, Prof., Universiteit Antwerpen, Belgium; Boulicaut, Jean-François, Prof., Institut National des Sciences Appliquées de Lyon, France | |
dc.date.defence | 2013-12-16 | |
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for your own personal use. Commercial use is prohibited.