HN2new | past | comments | ask | show | jobs | submitlogin

The explanation is a bit lengthy, but the idea is simple.

Consider the normal naive way to search for a string in a document. First you search for the next instance of the first letter of your search string, then try to match the entire string at that position. If the match fails then you advance your search position by one character and repeat (looking for next occurrence of first letter, etc).

The Knuth-Morris-Pratt algorithm improves upon this naive approach by usually advancing by more than one character after a failed match, thereby speeding up the search. It does this by taking advantage of its knowledge of the search string and the position at which the match failed.

To get the idea, consider searching for the string "explosion" .. first we find the next "e", then try matching the rest of the word. Say we match "explo" then fail (perhaps the document had "explode"), so now we want to start over and find the next "e" in the document... What KMP would do here is note that the document matched the first 5 letters "explo", none of which (other than 1st letter) are an "e", so it can advance by 5 characters, not just 1, to start looking for the next "e".

The amounts it can advance at each failed match position are pre-calculated to be efficient.



That's the idea, but what makes it a bit more complicated is that a string can contain a prefix of itself, like "mama" in the article, or in the worst case "aaaaaaa".


Yes, so for example if you were searching for "elephant" and matched "ele", then you could only advance by 2 since the 2nd "e" might be beginning of "elephant", but if you matched "elep" than you could advance by 4 since "ep" is NOT a prefix of "elephant".

But still, it all comes down to how far can you advance at any given match failure position.


>> The amounts it can advance at each failed match position are pre-calculated to be efficient.

Wouldn't knowing that require having already run the search?


No, check the example given. To know how much we can advance, we only needed to examine the search string "explosion" in advance.


That was a great explanation. Thanks.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: