Relations / Sql Aggregation Nulls
Least You Need to Know: GROUP BY, Aggregation, and NULL Semantics
Aggregation questions are about partitioning rows into groups, then summarizing each group carefully. The common traps are filtering at the wrong stage, forgetting how `NULL` behaves, and accidentally duplicating rows before aggregating.
The least you need to know
- A `GROUP BY` query partitions rows by the grouping columns and computes aggregates per group.
- `WHERE` filters individual rows before grouping, while `HAVING` filters groups after aggregates are computed.
- `COUNT(*)` counts rows, but `COUNT(column)` ignores rows where that column is `NULL`.
- `NULL` is not equal to ordinary values and must be checked with `IS NULL` or `IS NOT NULL`.
- Joining before aggregating can duplicate rows and change sums, counts, and averages.
Key notation
GROUP BY k
partition rows by key k before aggregating
COUNT(*)
count every row in the group
COUNT(col)
count only rows where col is not NULL
Tiny worked example
- To count orders per customer, group order rows by `customer_id` and compute `COUNT(*)` in each group.
- If you only want customers with at least three orders, that threshold applies after grouping, so it belongs in `HAVING`.
- If a nullable coupon column is counted with `COUNT(coupon_code)`, rows whose coupon is `NULL` do not contribute.
- If you join orders to order_items before counting orders, each order can duplicate into many item rows, so the aggregate may need `COUNT(DISTINCT order_id)` instead.
Common mistakes
- Students often write aggregate filters in WHERE even though the aggregate value does not exist yet.
- Students often treat NULL like an ordinary comparable value.
- Students often forget that joins can change aggregate results by duplicating base rows.
How to recognize this kind of problem
- If the condition says 'at least 5 per user', think grouping first, then HAVING.
- If the counted column can be NULL, compare COUNT(column) against COUNT(*).
- If a one-to-many join happens before aggregation, check for duplicate contribution.