Wednesday, April 29, 2015

Crux Move #10: Condition On

A CRUX MOVE is a math trick that comes up so often someone should make a list of them and post it on the Internet. One might be all that stands between you and solving your problem. Keep these moves in your back pocket and you'll improve your math game.

I haven't posted a crux move in a while but they are always lurking on the back burner. The challenge is digging up good examples. I made a list of the moves but, because I am a dope, I forgot to make a list of problems that nicely illustrate them. However, I occasionally stumble across one in my travels and then another Crux Move post drops from LabKitty like a newborn dew-fresh kitten.

Today's crux move is specific to probability work.



Crux Move #10
Condition On

Sometimes in probability, inserting just the right conditional probability into an expression will get a problem unstuck. (This is an example of the general trick of making a problem more complicated to make it easier. We used to call this "squonking" back in engineering school, for reasons I hope to explain at a later date.) Sheldon Ross refers to this trick as "conditioning on" in his textbooks (presumably, everybody refers to this as "conditioning on," but Ross is just bananas for it).

Conditioning On is based on the Theorem of Total Probability (ToTP). For any two discrete random variables X and Y we have

  P(X=x) = Σ P(X=x | Y=y) ⋅ P(Y=y)

where the sum goes over all possible values of y (aka the entire "support" of Y). For X and Y continuous, the sum changes to an integral and the probabilities to densities:

  fX(x) = ∫   fX|Y(x|y) ⋅ fY(y) dy

where the integral limits are the entire support of Y.

Consider an example. If X is a fair coin toss, we know that P(H) -- the probability of throwing a heads -- is equal to 1/2. So the ToTP better give us 1/2 no matter what we condition on. Let's condition on "next president of America" which we will assume has support { Hillary Clinton, Jeb Bush, LabKitty } with probabilities 0.49, 0.49, and 0.02. Then, applying the ToTP:

  P(H) =  Σ P(H | Y=y) ⋅ P(Y=y)
   = P(H | Y="Clinton") ⋅ 0.49
    + P(H | Y="Bush") ⋅ 0.49
     + P(H | Y="LabKitty") ⋅ 0.02

But the conditional probabilities all reduce to P(H) because who is elected president has nothing to do with the outcome of the coin toss (the converse may not be true). Thus we have:

  P(H) = 1/2 ⋅ (0.49 + 0.49 + 0.02)
  = 1/2 ⋅ 1.0
   = 1/2

and we get the correct result.
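As a sanity check, here is the arithmetic as a few lines of Python (a sketch; the support and probabilities are the silly ones from the example above):

```python
# Theorem of Total Probability check for the coin toss example.
# P(H | Y=y) = 1/2 for every y because the toss ignores the election.
support = {"Clinton": 0.49, "Bush": 0.49, "LabKitty": 0.02}

p_heads = sum(0.5 * p_y for p_y in support.values())
print(p_heads)  # ≈ 0.5
```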

I made this example silly to demonstrate a point: The beauty of the ToTP is that we're free to condition on any Y we like. The difficulty is identifying a Y that is useful. This leads us to meatier examples, which we now consider.

E X A M P L E   1

Suppose we have independent continuous random variables X and Y with densities fX and fY. What is P(X<Y)?

The brute-force approach would be to integrate the joint densities over the appropriate region of the plane:

   P(X<Y) = ∫∫{x<y}  fX(x) ⋅ fY(y) dx dy

This is correct; it's just not very illuminating because we're not explicitly given the pdfs. Using Condition On allows us to go (a bit) further:

  P(X<Y) = ∫  P(X<Y | Y=y) ⋅  fY(y) dy
   = ∫  P(X<y | Y=y) ⋅  fY(y) dy
   = ∫  P(X<y) ⋅  fY(y) dy
   = ∫  FX(y) ⋅  fY(y) dy

where FX is the (cumulative) distribution of X and the integral goes over all values of y. Note the substitution of (the constant) y for Y in line 2, and the jettisoning of conditional probability in line 3 because it no longer has an effect on the event of interest (the value of Y has no effect on the probability that X is less than a given constant). Such machinations are frequently encountered when applying the trick.
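If you want to see the formula in action, here is a sketch in Python. The exponential distributions and rates are my choice purely for illustration; for X ~ Exp(a) and Y ~ Exp(b) the closed form P(X<Y) = a/(a+b) is standard, so both the conditioning integral and a brute-force Monte Carlo estimate should land near it:

```python
import math
import random

a, b = 2.0, 3.0  # rates of X and Y (arbitrary choices)

# the Condition On result: P(X<Y) = integral of FX(y) * fY(y) dy
dy = 1e-4
integral = sum((1 - math.exp(-a * y)) * b * math.exp(-b * y) * dy
               for y in (k * dy for k in range(200_000)))

# brute-force Monte Carlo estimate of P(X < Y)
random.seed(1)
n = 100_000
mc = sum(random.expovariate(a) < random.expovariate(b) for _ in range(n)) / n

print(integral, mc, a / (a + b))  # all ≈ 0.4
```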

E X A M P L E   2

Suppose we have independent continuous random variables X and Y with densities fX and fY. Find FX+Y, the distribution of X+Y.

The brute-force approach would again be to integrate in the plane. Instead, condition on the event Y=y:

  FX+Y(a) = P(X+Y < a)
   = ∫  P(X+Y < a | Y=y) ⋅  fY(y) dy
   = ∫  P(X+y < a | Y=y) ⋅  fY(y) dy
   = ∫  P(X+y < a) ⋅  fY(y) dy
   = ∫  P(X < a–y) ⋅  fY(y) dy
   = ∫  FX(a–y) ⋅  fY(y) dy

The integral is over all values of y. An integral of this form is known as the convolution of FX and fY. Once again, note the interplay of random variables and constants and the jettisoning of the conditioning when it no longer serves a purpose.
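Here is a quick numerical check of the convolution formula (a sketch; the uniform distributions and the evaluation point a are my choices). For X, Y ~ Uniform(0,1) and a in [0,1], the closed form is FX+Y(a) = a²/2:

```python
import random

a = 0.75   # evaluation point (arbitrary)
dy = 1e-5

def Fx(t):
    """CDF of Uniform(0,1)."""
    return min(max(t, 0.0), 1.0)

# the convolution: integral of FX(a - y) * fY(y) dy, with fY = 1 on [0,1]
conv = sum(Fx(a - k * dy) * dy for k in range(100_000))

# brute-force Monte Carlo estimate of P(X + Y < a)
random.seed(2)
n = 100_000
mc = sum(random.random() + random.random() < a for _ in range(n)) / n

print(conv, mc, a * a / 2)  # all ≈ 0.28125
```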

A version of Condition On also exists for expectation. Specifically, we substitute an expectation with an expectation of a conditional expectation. That is, for any random variables X and Y, we have:

   E(X) = E(E(X|Y))

Recall for a discrete random variable X:

  E(X) = Σ x ⋅ P(X=x)

where the sum is over all possible values of x. For a function of X, say g(X), we have E(g(X)) = Σ g(x) ⋅ P(X=x). Now, E(X|Y) is a function of the random variable Y whose value at y is E(X|Y=y). Ergo:

  E(E(X|Y)) = Σ E(X|Y=y) ⋅ P(Y=y)

For Y continuous we change the sum to an integral and the probability to a density: E(E(X|Y)) = ∫ E(X|Y=y) ⋅ fY(y) dy. (A good exercise is to prove these expressions indeed reduce to E(X).)
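To see E(X) = E(E(X|Y)) in action with actual numbers, here is a sketch that grinds through both sides for a small made-up joint distribution (the table of joint probabilities is mine, purely for illustration):

```python
# P(X=x, Y=y) for a made-up discrete pair (values are arbitrary)
joint = {
    (1, 0): 0.1, (2, 0): 0.3,
    (1, 1): 0.4, (2, 1): 0.2,
}

# left side: E(X) = sum of x * P(X=x, Y=y) over the whole table
EX = sum(x * p for (x, y), p in joint.items())

# right side: E(E(X|Y)) = sum of E(X|Y=y) * P(Y=y)
PY = {}
for (x, y), p in joint.items():
    PY[y] = PY.get(y, 0.0) + p
EX_given = {y: sum(x * p for (x, yy), p in joint.items() if yy == y) / PY[y]
            for y in PY}
tower = sum(EX_given[y] * PY[y] for y in PY)

print(EX, tower)  # both ≈ 1.5
```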

The classic application of this crux move is the lost miner problem. Here's the version appearing in my copy of Ross:

E X A M P L E   3

A miner is trapped in a mine containing three doors. The first door leads him to a tunnel which takes him to safety after two hours of travel. The second door leads him to a tunnel which returns him to the mine after three hours of travel. The third door leads him to a tunnel which returns him to the mine after five hours. Assuming the miner is equally likely to pick one of the three doors, what is the expected length of time until the miner reaches safety?

The difficulty here of course is the recursion, with doors 2 and 3 leading the miner back to where he started. (If all three doors led to safety, the solution would simply be 1/3 ⋅ 2 + 1/3 ⋅ 3 + 1/3 ⋅ 5.) Let X be the length of time until the miner reaches safety. We seek E(X). Use the crux move. Let Y be the door chosen initially. We have:

  E(X) = E(E(X|Y))

   = E(X|Y=1) ⋅ P(Y=1)
   + E(X|Y=2) ⋅ P(Y=2)
     + E(X|Y=3) ⋅ P(Y=3)

  = 1/3 ⋅ E(X|Y=1)
   + 1/3 ⋅ E(X|Y=2)
     + 1/3 ⋅ E(X|Y=3)

If the miner chooses door 2, then he wastes three hours and the problem starts all over again. So E(X|Y=2) = 3 + E(X). Similarly, E(X|Y=3) = 5 + E(X). And E(X|Y=1) = 2, since door 1 leads straight to safety. Substituting:

  E(X) = 1/3 ⋅ 2
    + 1/3 ⋅ [3 + E(X)]
    + 1/3 ⋅ [5 + E(X)]

After a little algebra wrangling (the equation reduces to E(X) = 10/3 + 2/3 ⋅ E(X)), we find E(X) = 10 hours.
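The answer is easy to double-check by simulation; here is a sketch (the function name is mine) that walks the miner through random doors and averages the escape time:

```python
import random

def escape_time():
    """Simulate one trapped miner; return hours until safety."""
    t = 0.0
    while True:
        door = random.choice([1, 2, 3])
        if door == 1:
            return t + 2              # door 1: two hours to safety
        t += 3 if door == 2 else 5    # doors 2 and 3: wasted hours

random.seed(3)
n = 200_000
est = sum(escape_time() for _ in range(n)) / n
print(est)  # ≈ 10
```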

Note in contrast to our previous examples, this problem is, I believe, unsolvable without the crux move.

Our final example demonstrates a bit of virtuosity with the move.

E X A M P L E   4

What is the expected value of the sum of a random number of random variables?

The expected value of a sum is no problem; it's just the sum of the expectations. The weirdness here is that the number of variables in the sum is also random. Call the random variables Yi and let N be number of them to be summed. We will assume the Yi are independent and identically distributed and that Yi and N are also independent. We seek:

  E( Σ1,N Yi )

The 1,N subscript is the sum limits (1 to N) and is just my incompetent HTML formatting (the N should be on top of the summation symbol, but I don't know how to do that). For convenience, let's denote this sum by the random variable X. We seek E(X). We apply the trick and find E(X) using E(E(X|something)), where, as stated earlier, we can condition on whatever something we like. Here, let's condition on N:

  E(X) = E(E(X|N)) =  Σ1,∞ {E(X|N=n)} ⋅ P(N=n)

Expand the stuff inside the curly brackets. X is the sum Σ1,N Yi. Ergo, (X|N=n) is just the sum Σ1,n Yi -- the upper sum limit changes from N to n. We're now just dealing with a boring vanilla sum. The expectation of a sum is just the sum of the expectations, so E(X|N=n) = n ⋅ E(Y). Substitute this result into the previous expression:

  E(X) = Σ1,∞ n ⋅ E(Y) ⋅ P(N=n)
    = E(Y) ⋅  Σ1,∞ n ⋅ P(N=n)

But the stuff left inside the sum is just E(N). So we have shown:

  E(X) = E(Y) ⋅ E(N)

The answer is probably what you expected; it's just rather slippery to prove. The general form of this result goes by the name Wald's equation.
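Wald's equation is also pleasant to check by simulation. In this sketch N and the Yi are my choices: N uniform on {1, ..., 5} (so E(N) = 3) and Yi ~ Uniform(0, 2) (so E(Y) = 1), making the predicted E(X) = 3:

```python
import random

random.seed(4)
n_trials = 200_000
total = 0.0
for _ in range(n_trials):
    N = random.randint(1, 5)                          # random number of terms
    total += sum(random.uniform(0, 2) for _ in range(N))

est = total / n_trials
print(est)  # ≈ 3  (= E(N) * E(Y))
```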

Previous Move: Leibniz's Rule
Next Move: Stupid Regression Tricks
