Relations / Sql Indexing Cardinality
Least You Need to Know: Indexes, Selectivity, and Cardinality Intuition
Index and query-plan interview questions are mostly about search-space pruning. Good indexes cut down candidate rows quickly; bad selectivity, tiny tables, or the wrong leading columns can erase that advantage.
The least you need to know
- Indexes help most when the query condition is selective and avoids scanning most rows.
- Low-selectivity columns such as a two-value status flag often make poor standalone indexes.
- For very small tables, a sequential scan can be cheaper than using an index.
- Cardinality estimates guide join order because they approximate how many rows each step will produce.
- For composite indexes, leading-column order matters because many engines use left-prefix matching behavior.
Key notation
selectivity
fraction of rows a predicate keeps
cardinality
estimated number of rows in an intermediate result
(a, b)
composite index whose leading column is a
Tiny worked example
- If a million-row users table has an index on `email`, an equality lookup by one exact email is highly selective.
- If the same table has an index on `is_active` with values only `true` or `false`, each lookup may still touch a huge fraction of rows.
- In a composite index on `(country, created_at)`, queries filtering by `country` can often use the index directly, while filtering only by `created_at` may not use the leading prefix efficiently.
- Query planners compare such estimates when choosing scan and join strategies.
Common mistakes
- Students often assume every index is automatically helpful.
- Students often ignore row-count estimates when reasoning about join order.
- Students often forget that composite-index order changes which predicates the index supports well.
How to recognize this kind of problem
- If the predicate narrows to one or a few rows, an index is usually promising.
- If a condition matches a huge chunk of the table, the index may not win.
- When asked why the planner joins one table first, compare intermediate result sizes.