Longest common substrings with k mismatches

Thumbnail Image

Access rights

openAccess

URL

Journal Title

Journal ISSN

Volume Title

A1 Alkuperäisartikkeli tieteellisessä aikakauslehdessä

Date

2015

Major/Subject

Mcode

Degree programme

Language

en

Pages

643-647

Series

INFORMATION PROCESSING LETTERS, Volume 115, issue 6-8

Abstract

The longest common substring with k -mismatches problem is to find, given two strings S1S1 and S2S2, a longest substring A1A1 of S1S1 and A2A2 of S2S2 such that the Hamming distance between A1A1 and A2A2 is ≤k . We introduce a practical O(nm)O(nm) time and O(1)O(1) space solution for this problem, where n and m are the lengths of S1S1 and S2S2, respectively. This algorithm can also be used to compute the matching statistics with k -mismatches of S1S1 and S2S2 in O(nm)O(nm) time and O(m)O(m) space. Moreover, we also present a theoretical solution for the k=1k=1 case which runs in O(nlog⁡m)O(nlog⁡m) time, assuming m≤nm≤n, and uses O(m)O(m) space, improving over the existing O(nm)O(nm) time and O(m)O(m) space bound of Babenko and Starikovskaya.

Description

VK: Tarhio, J.

Keywords

Combinatorial problems, Hamming distance, Longest common substring, String algorithms

Other note

Citation

Flouri , T , Giaquinta , E , Kobert , K & Ukkonen , E 2015 , ' Longest common substrings with k mismatches ' , Information Processing Letters , vol. 115 , no. 6-8 , pp. 643-647 . https://doi.org/10.1016/j.ipl.2015.03.006