Markov chains are beautiful. Sure, they're a piece of math that genuinely models real-life processes, but it's the incredibly rich field of theoretical results underlying these stochastic processes that makes them so nice to study.

If you’re familiar with a bit of probability, a (discrete-time) Markov chain is a sequence of random variables \( (X_n)\) that has the Markov property, i.e. *the probability of moving to the next state in the sequence depends only on the state you’re currently in*.
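To make the Markov property concrete, here's a tiny sketch in Python. The two-state "weather" chain and its transition probabilities are invented purely for illustration; the point is that the next state is sampled using only the current state.

```python
import random

# A toy two-state chain: transition probabilities out of each state.
P = {
    "sunny": [("sunny", 0.8), ("rainy", 0.2)],
    "rainy": [("sunny", 0.4), ("rainy", 0.6)],
}

def step(state, rng):
    # The Markov property: the next state depends only on `state`,
    # not on any earlier history of the chain.
    states, probs = zip(*P[state])
    return rng.choices(states, weights=probs, k=1)[0]

rng = random.Random(0)
chain = ["sunny"]
for _ in range(5):
    chain.append(step(chain[-1], rng))
print(chain)
```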

Even seemingly simple models pop up inside more complex problems; one such classical Markov chain is the **coupon collector problem**. The setup is as follows:

A collector is trying to obtain a complete set of \(n\) coupons. Each coupon is equally likely to be the next one they acquire, and each draw is independent. How many coupons does the collector need before they have the complete set?

Alternatively, I like to think that a pokemon trainer is trying to catch all \( n\) unique pokemon in an area. The trainer is guaranteed to catch a pokemon at each encounter, where each pokemon is equally likely to appear, independent of all other encounters. How many pokemon does the trainer need to catch before they’ve caught them all?

It’s natural to calculate the expected value of \( \tau = \tau_n \), the number of coupons drawn until the collector completes the set. The trick is to break this value up into a telescoping sum:
\[ \mathbb{E} \{\tau_n\} = \sum_{i=1}^n \mathbb{E} \{\tau_{i} - \tau_{i-1}\},\]
where, in the same notation, \( \tau_i\) is the number of coupons drawn until the collector has \(i \) *unique* coupons, with \( \tau_0 = 0\) (we’re breaking the process up into these “checkpoints”).

Let’s focus on this quantity \( \tau_{i} - \tau_{i-1}\). After collecting \( \tau_{i-1}\) coupons, there are \( n-i +1 \) coupon types that haven’t been collected yet. Each draw fails to produce a new type with probability \( (i-1)/n\), until the \( \tau_{i} \)th coupon, when the collector finally gets a new type. So \( \tau_{i} - \tau_{i-1}\) is a geometric random variable with parameter \( (n-i +1)/n \). Recalling that the expected value of a geometric is the reciprocal of its parameter, \[ \mathbb{E} \{\tau_n\} = \sum_{i=1}^n \mathbb{E} \{\tau_{i} - \tau_{i-1}\} = \sum_{i=1}^n \frac{n}{n-i+1} = n \sum_{k=1}^n \frac{1}{k} = n H_n,\] where we substituted \( k = n-i+1\). This means \( \mathbb{E} \{\tau\} = \Theta(n \log n) \)! Super clean.
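As a quick sanity check, here's a Monte Carlo sketch of the calculation above (the function names and parameter choices are my own):

```python
import random

def collect_all(n, rng):
    """Number of uniform draws until all n coupon types have been seen."""
    seen, draws = set(), 0
    while len(seen) < n:
        seen.add(rng.randrange(n))  # draw a uniform coupon type
        draws += 1
    return draws

n, trials = 20, 20000
rng = random.Random(42)
avg = sum(collect_all(n, rng) for _ in range(trials)) / trials
harmonic = n * sum(1 / k for k in range(1, n + 1))  # n * H_n
print(f"simulated mean: {avg:.2f}, n * H_n: {harmonic:.2f}")
```

The two numbers should agree to within sampling error, matching \( \mathbb{E}\{\tau_n\} = n H_n \).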

With just a bit more math, we can show precisely that it is very unlikely for \( \tau\) to stray far above its mean. If \( c >0\) is some constant, then the collector needs more than \(k = \left \lceil n \log n + cn \right \rceil \) coupons to complete the set exactly when, for some \( 1 \leq i \leq n\), they *don’t* have the \( i\)th coupon type by the time they’ve drawn the \(k\)th coupon. Denoting this event by \( A_i\), we can apply a union bound:
\[ \mathbb{P} \{\tau > k\} = \mathbb{P} \left \{\bigcup_{i=1}^n A_i \right \} \leq \sum_{i=1}^n \mathbb{P} \{A_i\} = \sum_{i=1}^n \left ( 1- \frac{1}{n} \right )^k \leq n \; \mathrm{exp} \left ( - \frac{n \log n + cn}{n}\right ),\]
where the last inequality is a result of the basic inequality \( 1-x \leq e^{-x}\) for all \( x\). After simplifying,
\[ \mathbb{P} \{\tau > k\} \leq e^{-c}.\]
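We can also check this tail bound empirically. The sketch below reuses a hypothetical `collect_all` simulator and estimates \( \mathbb{P}\{\tau > k\} \) for one choice of \(n\) and \(c\):

```python
import math
import random

def collect_all(n, rng):
    """Number of uniform draws until all n coupon types have been seen."""
    seen, draws = set(), 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        draws += 1
    return draws

n, c, trials = 50, 1.0, 10000
k = math.ceil(n * math.log(n) + c * n)
rng = random.Random(0)
exceed = sum(collect_all(n, rng) > k for _ in range(trials)) / trials
print(f"empirical P(tau > {k}) = {exceed:.3f}, bound e^-c = {math.exp(-c):.3f}")
```

The empirical tail probability should come out below \( e^{-c} \approx 0.368 \), consistent with the bound (which is not tight: the true limiting probability is smaller).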

Coupon collector is one of my favourite Markov chains because it is a simple situation with known properties of the distribution of \( \tau \), and yet it shows up inside many more complex Markov chains! We’ll give an example here with card shuffling.

The **top-to-random shuffle** is a (terrible) way of shuffling a deck of \(n \) cards where one repeatedly takes the top card and makes a random, independent, and uniformly distributed insertion in the deck. Suppose we’re interested in the time \( \tau \) at which the top card we insert is the *original bottom card* from when we first started shuffling. The movement of the bottom card through the deck is coupon collector! If there are currently \( k\) cards beneath the original bottom card, the probability that the next inserted card lands beneath it is exactly \((k+1)/n\), so the waiting time until there are \( k+ 1 \) cards beneath it is geometric with parameter \((k+1)/n\) (and once it reaches the top, the very next step inserts it). These are exactly the coupon collector increments, so the distribution of \( \tau \) is the same as above.
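Here's a sketch of that correspondence in code: we simulate the top-to-random shuffle, record the time at which the original bottom card is inserted, and compare the average to the coupon collector mean \( n H_n \). Function names and parameters are my own.

```python
import random

def top_to_random_tau(n, rng):
    """Shuffle steps until the original bottom card is the one being inserted."""
    deck = list(range(n))          # deck[0] is the top, deck[-1] the bottom
    bottom = deck[-1]
    steps = 0
    while True:
        steps += 1
        card = deck.pop(0)         # take the top card
        if card == bottom:
            return steps           # this step inserts the original bottom card
        # uniform insertion into any of the len(deck) + 1 = n positions
        deck.insert(rng.randrange(len(deck) + 1), card)

n, trials = 20, 20000
rng = random.Random(1)
avg = sum(top_to_random_tau(n, rng) for _ in range(trials)) / trials
coupon_mean = n * sum(1 / k for k in range(1, n + 1))  # n * H_n
print(f"shuffle mean: {avg:.2f}, coupon collector mean: {coupon_mean:.2f}")
```

The two means should agree to within sampling error, as the two random variables have the same distribution.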

To truly appreciate the coupon collector’s versatility, one needs to be familiar with the notions of *couplings*, *strong stationary times*, and *mixing time*, which are related to studying the **long-term behaviour** of Markov chains (for a future post!). These elusive quantities can be difficult, or even impossible, to calculate directly, but a connection to coupon collector makes for beautifully concise “\( O(n \log n) \), Q.E.D.” proofs.

For references to the material discussed above, check out *Markov Chains and Mixing Times* by Levin et al.!