Algorithms / Rolling Hash Substrings
Least You Need to Know: Rolling Hashes and Substring Fingerprints
A **rolling hash** gives a compact fingerprint for a substring so adjacent windows can be compared or updated quickly. Interview prompts use it for repeated-substring checks, duplicate-window detection, and fast candidate comparison.
The least you need to know
- A rolling hash updates the previous window summary instead of recomputing the full substring from scratch.
- Hash equality is evidence that substrings may match, but collisions mean equality is not a proof by itself.
- Prefix-hash formulas let you derive a substring fingerprint from precomputed information.
- Modular arithmetic keeps hash values bounded while supporting algebraic updates.
- Rolling hashes are most useful when many equal-length substring comparisons are needed.
Key notation
H[i]
prefix hash up to position i under a fixed convention
base
multiplier used to weight character positions
mod
modulus used to bound hash values
Tiny worked example
- Suppose you hash every length-5 substring of a long string.
- After computing the first window, the next window can remove the outgoing character contribution, multiply or shift as needed, and add the incoming character.
- That reuses old work, which is why rolling hashes help avoid recomputing all 5 characters each time.
- If two window hashes match, you may still verify the substrings directly when correctness matters.
Common mistakes
- Students often treat a matching hash as a proof that two substrings are identical.
- Students sometimes forget that outgoing characters must be removed with the correct positional weight.
- Students may choose rolling hash for one-off comparisons where simpler direct comparison is enough.
How to recognize this kind of problem
- If the prompt compares many equal-length substrings, think rolling hash.
- If adjacent windows differ by one outgoing and one incoming character, think incremental update.
- If the task asks for repeated pattern detection over long text, a substring fingerprint can be a good first filter.