Algorithms for Order-Preserving Matching

School of Science | Doctoral thesis (article-based) | Defence date: 2016-06-10

Date

2016

Language

en

Pages

70 + app. 43

Series

Aalto University publication series DOCTORAL DISSERTATIONS, 101/2016

Abstract

String matching is a widely studied problem in computer science, and the field has seen many recent developments. One problem that has attracted attention lately is order-preserving matching (OPM). The task is to find all substrings of the text that have the same length and the same relative order as the pattern, where the relative order is the numerical order of the numbers in a string. The problem has applications in areas involving time series or other series of numbers. More specifically, it is useful when the relative order of the pattern matters rather than its actual values. For example, stock market analysts can use it to study price movements.

In addition to the OPM problem, we also studied its approximate variant. In approximate order-preserving matching, we search for substrings of the text whose relative order is similar to that of the pattern, i.e., whose relative order matches that of the pattern with at most k mismatches. For applications of order-preserving matching, approximate search is more meaningful than exact search. We developed several solutions for the problem and its variant, with special emphasis on their practical efficiency. In particular, we introduced a simple solution for the OPM problem based on filtration. We showed experimentally that our method is effective and faster than previous solutions for the problem. In addition, we combined the Single Instruction Multiple Data (SIMD) instruction set architecture with filtration to develop competitive solutions that were faster than our earlier filtration-based solution. Moreover, we proposed another efficient solution that uses the SIMD architecture without filtration. We also presented an offline solution based on the FM-index. Furthermore, we proposed practical solutions for the approximate order-preserving matching problem, one of which was the first solution for the problem that is sublinear on average.
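To make the problem statement concrete, the following Python sketch illustrates exact order-preserving matching and the general idea of filtration. It is an illustration under stated assumptions, not the implementation evaluated in the thesis. The function names (relative_order, opm_naive, encode, opm_filter) are hypothetical, equal values are assumed to share a rank, and the filter shown here (a binary encoding of increases, searched with Python's built-in str.find and followed by verification of each candidate) is one simple way to realize filtration.

```python
from typing import List, Sequence


def relative_order(values: Sequence[float]) -> List[int]:
    """Rank of each element (0 = smallest). Assumption: equal values share a rank."""
    rank = {v: r for r, v in enumerate(sorted(set(values)))}
    return [rank[v] for v in values]


def opm_naive(text: Sequence[float], pattern: Sequence[float]) -> List[int]:
    """Check every window of the text against the pattern's relative order."""
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    target = relative_order(pattern)
    return [i for i in range(len(text) - m + 1)
            if relative_order(text[i:i + m]) == target]


def encode(values: Sequence[float]) -> str:
    """Binary encoding: '1' where the next value is larger, '0' otherwise."""
    return "".join("1" if b > a else "0" for a, b in zip(values, values[1:]))


def opm_filter(text: Sequence[float], pattern: Sequence[float]) -> List[int]:
    """Filtration sketch: exact search on the encodings, then verification."""
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    target = relative_order(pattern)
    enc_text, enc_pattern = encode(text), encode(pattern)
    matches = []
    pos = enc_text.find(enc_pattern)
    while pos != -1:
        # The encoding can produce false positives, so verify each candidate.
        if relative_order(text[pos:pos + m]) == target:
            matches.append(pos)
        pos = enc_text.find(enc_pattern, pos + 1)
    return matches


prices = [10, 22, 15, 30, 20, 18, 27]
shape = [1, 3, 2]  # "up, then down to a value between the first two"
print(opm_naive(prices, shape))   # -> [0, 2]
print(opm_filter(prices, shape))  # -> [0, 2]
```

The filter never misses a true match, because two windows with the same relative order necessarily have the same pattern of increases. The converse does not hold: [5, 9, 1] has the same encoding as [1, 3, 2] but a different relative order, which is why the verification step is needed.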

Description

Supervising professor

Tarhio, Jorma, Prof., Aalto University, Department of Computer Science, Finland

Thesis advisor

Tarhio, Jorma, Prof., Aalto University, Department of Computer Science, Finland

Keywords

string matching, indexing, SIMD, filtration

Parts

  • [Publication 1]: Tamanna Chhabra and Jorma Tarhio. A filtration method for order-preserving matching. Information Processing Letters, 116(2): 71–74, 2016.
    DOI: 10.1016/j.ipl.2015.10.005
  • [Publication 2]: Tamanna Chhabra, M. Oguzhan Kulekci, and Jorma Tarhio. Alternative algorithms for order-preserving matching. In Proceedings of the Prague Stringology Conference, Prague, Czech Republic, 36–46, August 2015.
  • [Publication 3]: Tamanna Chhabra, Simone Faro, and M. Oguzhan Kulekci. Engineering order-preserving pattern matching with SIMD parallelism. Software–Practice and Experience, 2015.
  • [Publication 4]: Tamanna Chhabra, Emanuele Giaquinta, and Jorma Tarhio. Filtration algorithms for approximate order-preserving matching. In Proceedings of the String Processing and Information Retrieval – 22nd International Symposium, SPIRE, London, UK, Lecture Notes in Computer Science 9309: 177–187, September 2015.
    DOI: 10.1007/978-3-319-23826-5_18
