Algorithms / Prefix Function Kmp
Least You Need to Know: Prefix Function, Borders, and KMP Intuition
The **prefix function** tracks how much of a pattern already matches itself. That self-overlap information is what lets KMP avoid restarting from scratch after mismatches in pattern matching problems.
The least you need to know
- A border is a non-full string that is both a prefix and a suffix of the same string.
- The prefix function stores the length of the longest border for each pattern prefix.
- After a mismatch, KMP reuses the largest valid overlap instead of discarding all previous progress.
- KMP avoids moving the text pointer backward because pattern self-overlap tells you where matching can continue.
- Prefix-function reasoning is about reusing structure inside the pattern, not about hashing or randomization.
Key notation
pi[i]
length of the longest proper prefix of pattern[0..i] that is also a suffix
border
string that is both a prefix and a suffix
fallback
move to a shorter previously known border after mismatch
Tiny worked example
- Consider pattern `ababaca`.
- When a mismatch happens after matching `ababa`, you do not restart at pattern index `0` blindly.
- You fall back to the longest border already known for the matched prefix.
- That reuses overlap such as `aba`, which is why KMP keeps the scan efficient.
Common mistakes
- Students often confuse a border with any repeated substring; it must align as both prefix and suffix.
- Students may think KMP skips text characters arbitrarily, but it only uses overlap justified by the pattern itself.
- Students sometimes forget that the full string is not counted as a proper border.
How to recognize this kind of problem
- If the prompt asks for exact pattern matching with many self-overlaps, think KMP or prefix function.
- If restarting from zero seems wasteful after mismatch, ask what overlap can be reused.
- If the important structure lives inside the pattern itself, prefix-function reasoning may fit better than hashing.