Algorithms / Indicator Variables
Least You Need to Know: Indicator Variables and Linearity of Expectation
Indicator variables turn counting questions into algebra. Define one indicator for each event of interest, sum them to count how many events occur, and then use linearity of expectation to compute the expected count. The key advantage is that independence is often unnecessary.
The least you need to know
- An indicator variable takes value 1 when an event happens and 0 otherwise.
- A total count can often be written as a sum of indicators.
- The expectation of an indicator is the probability of its event.
- Linearity of expectation says `E[X+Y]=E[X]+E[Y]` even without independence.
- This method is standard for expected counts such as collisions, inversions, and matches.
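The two facts that carry the whole method, `E[I_E] = P(E)` and linearity without independence, can be checked exhaustively. A minimal sketch (the die example is an illustration not taken from the text): two indicators on one fair die are clearly dependent, yet the expectation of their sum is still the sum of their expectations.

```python
from fractions import Fraction

# Exhaustive check over one fair die: X and Y are dependent indicators
# (both depend on the same roll), yet E[X + Y] = E[X] + E[Y].
outcomes = range(1, 7)
p = Fraction(1, 6)  # each face is equally likely

# X = 1 if the roll is even; Y = 1 if the roll is at least 4.
E_X = sum(p for d in outcomes if d % 2 == 0)                     # = P(even) = 1/2
E_Y = sum(p for d in outcomes if d >= 4)                         # = P(>= 4) = 1/2
E_sum = sum(p * ((d % 2 == 0) + (d >= 4)) for d in outcomes)     # E[X + Y]

# Dependence: P(X=1 and Y=1) = 2/6, not (1/2)*(1/2).
P_both = sum(p for d in outcomes if d % 2 == 0 and d >= 4)

print(E_X, E_Y, E_sum, P_both)  # 1/2 1/2 1 1/3
```

Note that `E_X` and `E_Y` are computed exactly as probabilities of their events, which is the `E[I_E] = P(E)` identity in action.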
Key notation
- `I_E`: indicator variable of event `E`, equal to 1 if `E` happens and 0 otherwise
- `E[I_E]`: the probability that event `E` happens, i.e. `E[I_E] = P(E)`
- linearity of expectation: the expectation of a sum equals the sum of the expectations
Tiny worked example
- Let `X` be the number of colliding pairs of keys in a hash table.
- Define `I_{i,j}=1` when keys `i` and `j` hash to the same bucket, else 0.
- Then `X = sum_{i<j} I_{i,j}`, the sum over all pairs of keys.
- Taking expectations gives `E[X] = sum_{i<j} E[I_{i,j}] = sum_{i<j} P(i and j collide)`.
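The worked example above can be sketched in code. Assuming uniform independent hashing into `m` buckets (an assumption of this sketch, not stated in the text), each pair collides with probability `1/m`, so summing the pair indicators gives `E[X] = n(n-1)/(2m)`:

```python
from fractions import Fraction
from itertools import combinations

def expected_collisions(n_keys, n_buckets):
    """E[# colliding pairs] via indicators: under uniform independent
    hashing, each pair of keys collides with probability 1/n_buckets,
    so the expected count is just that probability summed over pairs."""
    p_collide = Fraction(1, n_buckets)
    return sum(p_collide for _ in combinations(range(n_keys), 2))

# Agrees with the closed form n(n-1)/(2m):
print(expected_collisions(10, 100))  # 9/20
```

No joint distribution of the collision count is ever computed; only the per-pair probability matters.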
Common mistakes
- Computing the whole distribution of the count when only the expectation is asked for.
- Assuming linearity of expectation requires the indicators to be independent; it does not.
- Skipping the decomposition: the main step is expressing the count as a sum of indicators.
How to recognize this kind of problem
- The prompt asks for an expected number of bad pairs, matches, collisions, or inversions.
- There is a natural yes/no event for each object or pair of objects.
- A sum of 0-1 variables counts the total of interest exactly.
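A classic instance of the "matches" pattern above is the hat-check problem (an assumed example, not from the text): the expected number of fixed points of a uniform random permutation. The per-position indicators are dependent, but each has probability `1/n`, so the expected count is `n * (1/n) = 1` for every `n`. A brute-force check:

```python
from fractions import Fraction
from itertools import permutations

def expected_fixed_points(n):
    """Average number of fixed points over all n! permutations.
    Indicator argument: sum over positions i of P(pi(i) = i) = n * 1/n = 1."""
    perms = list(permutations(range(n)))
    total = sum(sum(1 for i, v in enumerate(p) if i == v) for p in perms)
    return Fraction(total, len(perms))

print([expected_fixed_points(n) for n in (1, 2, 3, 4)])  # all equal 1
```

The answer is independent of `n`, a fact that is awkward to see from the full distribution but immediate from the indicator sum.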