What is Knuth Morris Pratt or KMP algorithm ?
KMP is an algorithm which is used in the applications of pattern matching like, find a Word ‘W’ in a String ‘S’ in O(M+N) time complexity M being the length of the string and N being the length of the word.
Here is the video explanation
In a brute force approach it would time O(M*N) time complexity with KMP it can be optimized in O(M+N).
Let’s look into how a naive brute force approach work.
In the above-given string “ABCABCEE” and the word to match is “ABCE” if you look into the picture after Iteration 4, the comparison started from index 1 instead of index 4, so in a naive brute force approach every time the mismatch happened it starts comparing at the next index so the worst-case time complexity for this is O(M*N) as we are comparing the whole word “ABCE” for every letter in the given string “ABCABCEE” until we find the match.
Now lets look into how KMP works
With KMP approach if you look into the iteration 5 the comparison started at Index 4 instead of starting from Index 0 like in the naive approach, which reduces the times of comparison and the total time complexity comes to O(M+N).
How to do we achieve this?
The idea is to reduce the number of backtracking when there is a letter mismatch between the given String S and Word W.
We can do this if we know if there is a prefix in W that occurs in S more than once after at least one character is found.
Algorithm:
Word: ABCDA
| 0 | 0 | 0 | 0 | 1 |
Word: CACBCA
| 0 | 0 | 0 | 1 | 2 |
In the above word ABCDA, letter A is already there at index 0 so when there is a mismatch we would start comparison form index 1. Similarly in the word CACBCA the prefix CA has been repeated to when there is a mismatch we would start the comparison from index 3.
This way we reduce the number of times we backtrack, thus resulting in O(M+N).
Java Solution:
First we will write a function to generate the above array
public static int[] generatePattern(String word) {
int wordLength = word.length();
int len = 0;
int i = 1;
int[] wordPatternArray = new int[wordLength];
wordPatternArray[0] = 0;
while (i < patternLength) {
if (word.charAt(i) == word.charAt(len)) {
len++;
wordPatternArray[i] = len;
i++;
} else {
if (len != 0) {
len = wordPatternArray[len - 1];
} else {
wordPatternArray[i] = len;
i++;
}
}
}
return wordPatternArray;
} Now we will implement the KMP function
public static List<Integer> doKMPSearch(String text, String word) {
int[] wordPatternArray = generatePattern(word);
int textIndex = 0;
int wordIndex = 0;
List<Integer> foundIndexes = new ArrayList<>();
while (textIndex < text.length()) {
if (word.charAt(wordIndex) == text.charAt(textIndex)) {
wordIndex++;
textIndex++;
}
if (wordIndex == word.length()) {
foundIndexes.add(textIndex - wordIndex);
wordIndex = wordPatternArray[wordIndex - 1];
}
else if (textIndex < text.length() && word.charAt(wordIndex) != text.charAt(textIndex)) {
if (wordIndex != 0)
wordIndex = wordPatternArray[wordIndex - 1];
else
textIndex = textIndex + 1;
}
}
return foundIndexes;
} Question : Given an array of integers A, return the largest integer that only occurs once.…
Jump search algorithm is a pretty new algorithm to search for an element in a…
Binary Search is a Logarithmic search which finds the target element in a sorted array…
Roman numerals are represented by seven different symbols: I, V, X, L, C, D and M. Symbol Value I 1 V 5 X…
Given n non-negative integers a1, a2, ..., an , where each represents a point at coordinate (i, ai). n vertical lines are drawn such…
Given a binary tree, return the bottom-up level order traversal of its nodes' values. (ie, from left…
This website uses cookies.