Knuth Morris Pratt Pattern Search Algorithm

kmp search algorithm

What is Knuth Morris Pratt or KMP algorithm ?

KMP is an algorithm which is used in the applications of pattern matching like, find a Word ‘W’ in a String ‘S’ in O(M+N) time complexity M being the length of the string and N being the length of the word.

Here is the video explanation



In a brute force approach it would time O(M*N) time complexity with KMP it can be optimized in O(M+N).

Let’s look into how a naive brute force approach work.

Naive Approach

In the above-given string “ABCABCEE” and the word to match is “ABCE” if you look into the picture after Iteration 4, the comparison started from index 1 instead of index 4, so in a naive brute force approach every time the mismatch happened it starts comparing at the next index so the worst-case time complexity for this is O(M*N) as we are comparing the whole word “ABCE” for every letter in the given string “ABCABCEE” until we find the match.

Now lets look into how KMP works

Knuth Morris Pratt KMP

With KMP approach if you look into the iteration 5 the comparison started at Index 4 instead of starting from Index 0 like in the naive approach, which reduces the times of comparison and the total time complexity comes to O(M+N).

How to do we achieve this?

The idea is to reduce the number of backtracking when there is a letter mismatch between the given String S and Word W.

We can do this if we know if there is a prefix in W that occurs in S more than once after at least one character is found.

Algorithm:

  • Initialize an array of size word W
  • Populate the array based on the prefix index like below

Word: ABCDA

00001

Word: CACBCA

00012

In the above word ABCDA, letter A is already there at index 0 so when there is a mismatch we would start comparison form index 1. Similarly in the word CACBCA the prefix CA has been repeated to when there is a mismatch we would start the comparison from index 3.
This way we reduce the number of times we backtrack, thus resulting in O(M+N).

Java Solution:

First we will write a function to generate the above array

public static int[] generatePattern(String word) {
    int wordLength = word.length();
    int len = 0;
    int i = 1;
    int[] wordPatternArray = new int[wordLength];
    wordPatternArray[0] = 0;

    while (i < patternLength) {
        if (word.charAt(i) == word.charAt(len)) {
            len++;
            wordPatternArray[i] = len;
            i++;
        } else {
            if (len != 0) {
                len = wordPatternArray[len - 1];
            } else {
                wordPatternArray[i] = len;
                i++;
            }
        }
    }
    return wordPatternArray;
}

Now we will implement the KMP function

public static List<Integer> doKMPSearch(String text, String word) {
    int[] wordPatternArray = generatePattern(word);

    int textIndex = 0;
    int wordIndex = 0;

    List<Integer> foundIndexes = new ArrayList<>();

    while (textIndex < text.length()) {
        if (word.charAt(wordIndex) == text.charAt(textIndex)) {
            wordIndex++;
            textIndex++;
        }
        if (wordIndex == word.length()) {
            foundIndexes.add(textIndex - wordIndex);
            wordIndex = wordPatternArray[wordIndex - 1];
        }

        else if (textIndex < text.length() && word.charAt(wordIndex) != text.charAt(textIndex)) {
            if (wordIndex != 0)
                wordIndex = wordPatternArray[wordIndex - 1];
            else
                textIndex = textIndex + 1;
        }
    }
    return foundIndexes;
}