Skip to content
Sri Yogesh Dorbala edited this page Feb 15, 2021 · 3 revisions

Abstract

Pattern matching algorithm - find a needle(pattern) in haystack(text). Runs in O(n) avg/worst case time complexity.

Idea

  • Reduce the number of comparisons made when checking for a match for the pattern in the text.
  • Implemented by preprocessing the pattern and constructing a longest prefix which is also a suffix (LPS).

LPS

Example s: "acccbaaacccbaac" lps for s: [0,0,0,0,0,1,1,1,2,3,4,5,6,7,2]

lps[i] returns the longest prefix which is also the suffix for s(0..i)

Logic

  • Starting from the 1st index(end), we slide and check if there is a match from the 0th index(begin) of s.

  • In case of match, we increment both the indices, and make lps for the endIndex as the beginIndex ==> at this position, we have found a beginIndex length where the s(0..beginIndex) is the same as s(endIndex-beginIndex..endIndex)

In the above example, consider endIndex = 9, beginIndex= 2, lps[9] = 3. s(0..2) = acc is the same as s(7..9) = acc

  • If it's not a match, if beginIndex > 0, we move the beginIndex to the lps[beginIndex-1] ==> find the largest prefix which is also a prefix at position beginIndex-1. This way we don't have to compare the lps[beginIndex-1] chars as they have already been compared before.

Again in the above example, consider endIndex = 15, beginIndex = 7. s[beginIndex] = a, s[endIndex] = c. Since they don't match, you step back beginIndex = lps[beginIndex-1] = lps[6] = 1. This is because all the chars before beginIndex which is s[0..beginIndex-1] = a, already match in s[endIndex-beginIndex..endIndex-1] = a. Hence we only compare from the beginIndex which is s[1] = c with endIndex which is s[15] = c and since they match we have lps[15] = newBeginIndex (which is beginIndex+1) = 2

  • And finally, if its not a match and if beginIndex is 0, we cannot go back any more and set lps for that endIndex as 0 and move forward on the endIndex.

Algo for implementing the LPS

private void PreprocessLPS(string s, int[] lps)
{
    int i=0,j=1;
    while (j < s.Length)
    {
        if (s[i] == s[j])
        {
            lps[j] = ++i;
            j++;
        }
        else if (lps[i] == 0) j++;
        else i = lps[i-1];
    }
}

Pattern Matching

Example text: "mississippi", pattern: "issip", lps on pattern [0,0,0,1,0]

  • On similar lines to LPS, you have a textIndex and patternIndex, and you perform comparisons in the sliding window.

  • In case of a match, you increment both the indices.

  • If it's not a match, and if the patternIndex > 0, you move the patternIndex to lps[patternIndex-1].

Ex: when textIndex = 5, patternIndex = 4, the chars don't match and you move the patternIndex to lps[4-1] = 1. This is because the substring pattern(0..patternIndex-1) which is "i" is already checked for and is present at text(textIndex-patternIndex..textIndex) = "i". So, you check from patternIndex = 1, and textIndex = 5

  • And finally, if it's not a match and if the patternIndex is 0, we cannot go back any more in the pattern, hence increment the textIndex and start searching afresh

Algo for the Pattern Match

// returns the first index of the needle position in the haystack
public int StrStr(string haystack, string needle) 
{
    if (needle.Length == 0) return 0;
    
    int[] lps = new int[needle.Length];
    PreprocessLPS(needle, lps);
    
    int i=0,j=0;
    while (i < haystack.Length)
    {            
        if (haystack[i] == needle[j])
        {
            i++; j++;
            if (j == needle.Length) return i-needle.Length;
        }
        else if (j == 0) i++; 
        else j = lps[j-1];
    }
    return -1;
}

Practise

https://leetcode.com/problems/implement-strstr/
https://leetcode.com/problems/longest-happy-prefix/
https://leetcode.com/problems/shortest-palindrome/

Clone this wiki locally