Algorithms for Order-Preserving Matching
School of Science | Doctoral thesis (article-based) | Defence date: 2016-06-10
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for your own personal use. Commercial use is prohibited.
Authors
Chhabra, Tamanna
Date
2016
Language
en
Pages
70 + app. 43
Series
Aalto University publication series DOCTORAL DISSERTATIONS, 101/2016
Abstract
String matching is a widely studied problem in computer science, and the field has seen many recent developments. One problem considered lately is order-preserving matching (OPM). The task is to find all substrings of the text that have the same length and the same relative order as the pattern, where the relative order is the numerical order of the numbers in a string. The problem has applications in areas involving time series or other series of numbers. More specifically, it is useful when the relative order of the pattern matters rather than its actual values. For example, stock market analysts can use it to study price movements. In addition to the OPM problem, we also studied its approximate variant. In approximate order-preserving matching, we search for substrings of the text whose relative order is similar to that of the pattern, i.e., whose relative order matches that of the pattern with at most k mismatches. For the applications of order-preserving matching, approximate search is more meaningful than exact search. We developed several solutions for the problem and its variant, with special emphasis on practical efficiency. In particular, we introduced a simple solution for the OPM problem based on filtration. We showed experimentally that our method is effective and faster than previous solutions to the problem. In addition, we combined the Single Instruction Multiple Data (SIMD) instruction set architecture with filtration to develop solutions that are faster than our earlier solution. Moreover, we proposed another efficient solution that uses the SIMD architecture without filtration. We also presented an offline solution based on the FM-index scheme. Furthermore, we proposed practical solutions for the approximate order-preserving matching problem, one of which was the first solution for the problem that is sublinear on average.
Description
Supervising professor
Tarhio, Jorma, Prof., Aalto University, Department of Computer Science, Finland
Thesis advisor
Tarhio, Jorma, Prof., Aalto University, Department of Computer Science, Finland
Keywords
string matching, indexing, SIMD, filtration
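The order-preserving matching problem stated in the abstract can be illustrated with a small brute-force sketch. This is not one of the thesis algorithms (those rely on filtration, SIMD, and the FM-index); it only makes the problem definition concrete. The function names `relative_order` and `naive_opm` are hypothetical, and the sketch assumes that equal values share a rank, a detail the abstract does not specify.

```python
def relative_order(values):
    """Map each value to its rank (0 = smallest); equal values share a rank."""
    ranks = {v: r for r, v in enumerate(sorted(set(values)))}
    return [ranks[v] for v in values]


def naive_opm(text, pattern):
    """Return the start index of every window of text whose relative
    order equals that of pattern (brute force, for illustration only)."""
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    target = relative_order(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if relative_order(text[i:i + m]) == target
    ]


# A price-like series and a pattern meaning "rise, then fall back
# to a value between the first two".
prices = [3, 8, 5, 9, 4, 7, 6, 10]
pattern = [1, 3, 2]
print(naive_opm(prices, pattern))  # [0, 4]
```

The windows (3, 8, 5) and (4, 7, 6) match because both have the same relative order as (1, 3, 2), although their actual values differ. Each window is re-ranked from scratch, so the sketch is far slower than the practical methods the thesis targets.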
Parts
- [Publication 1]: Tamanna Chhabra and Jorma Tarhio. A filtration method for order-preserving matching. Information Processing Letters, 116(2): 71–74, 2016. DOI: 10.1016/j.ipl.2015.10.005
- [Publication 2]: Tamanna Chhabra, M. Oguzhan Kulekci, and Jorma Tarhio. Alternative algorithms for order-preserving matching. In Proceedings of the Prague Stringology Conference, Prague, Czech Republic, 36–46, August 2015.
- [Publication 3]: Tamanna Chhabra, Simone Faro, and M. Oguzhan Kulekci. Engineering order-preserving pattern matching with SIMD parallelism. Software–Practice and Experience, 2015.
- [Publication 4]: Tamanna Chhabra, Emanuele Giaquinta, and Jorma Tarhio. Filtration algorithms for approximate order-preserving matching. In Proceedings of the String Processing and Information Retrieval – 22nd International Symposium, SPIRE, London, UK, Lecture Notes in Computer Science 9309: 177–187, September 2015. DOI: 10.1007/978-3-319-23826-5_18