Sunday, August 13, 2017

Mick Jagger does not understand Conditional Probability

Funny, got no money.
I throw sixes and sevens and nines.


I went to engineering school with some very smart women. Case in point was a Yugoslavian prodigy doing her doctoral dissertation on nonlinear viscoplastic constitutive modeling and I'll bet some of those aren't real words. Yet, her most enigmatic quirk was not a Macedonian heritage nor a frightening alacrity with tensor calculus, but rather an unholy obsession with Mick Jagger. Yes, I mean in a vant to have his babies kind of way. And, no, I don't mean Rock n' Roll Circus Ed Sullivan prime Mick Jagger. I mean the AARP member recently seen grinding against Christina Aguilera and Martin Scorsese believed it was ethical to film the act. I haven't been this squicked out since watching Steven Tyler mack on Tal Wilkenfeld in that Jeff Beck YouTube promo.

Oh, I kid because I love. Still, I must find some way to assuage my profound embafflement, for I'm told this is not an uncommon affliction. Women of all stripes have an unholy obsession with Mick Jagger, regardless of age, color, creed, or alacrity with tensor calculus. Apparently he continues to receive sufficient underpants to build an underpants fort every day of the week and have underpants fort fights with Tom Jones. All from sisters, little and otherwise, who should know better. Meanwhile, the number of strange women offering themselves to LabKitty is quite modest, and I have a blog and everything.

It's a right mystery, one that makes a mockery of my four university degrees, adult orthodontia, and common sense. Fortunate for me, then, that Mick has questionable math skills, one heartbreaker who will forever sway beyond the reach of his sticky fingers. Thus my vessel of ego repair appears. I sing of the probability libel on display in Tumbling Dice, a song he penned circa 1972 and you don't much hear on the radio anymore because Clear Channel decided they're going to play Adele over and over until we all lay dead with chopsticks jammed in our earholes. Why be all about the variety when J. Peterman can simply assure you it is true? But I digress.

What's that? you ask. Probability libel? Kitty, whatever do you mean?

Yes, it's time for another LabKitty browlfsplainer! Read on.



Those of us who admire Mr. Jagger more as a songwriter (at least until Goats Head Soup, after which I feel his muse went off the rails, Saint of Me notwithstanding, which is a bitchin' tune and I will fight anyone who claims otherwise in the parking lot) and less as a gene gun will readily recognize the lyrics opening my current ranting. These feature prominently in Tumbling Dice, the throughline of which bemoans the singer's recent downturn in the gambling arts. Specifically craps, other dice games (read: DnD) not known for players who will be making underpants forts, I assure you. There's a reason Scarlet Blade is so popular among my ilk.

To explore what is specifically wrong with these lyrics, we must first understand the rules of craps. Short answer: You bet some money. Then, you tumble two dice until either a Good Number turns up, in which case your bet is doubled and returned, or a Bad Number turns up, in which case your bet is forfeit. So far, so good. But we need to examine these good and bad numbers further, and for that we require detailed rules of the game.

Although there may exist craps variants specific to your locale, for simplicity we shall take the Wikipedia craps page as gospel. Yes, in full knowing taking any Wikipedia page as gospel is asking for trouble (Aristotle was not "Belgian").

Willypete describes craps as follows:

    0) Place bet.

    Here the game divides into 2 stages:

    1a) If you throw a 7 or 11, you win!
    1b) If you throw a 2, 3 or 12, you lose.

    Any other value becomes your "point."

    Continue to throw until:

    2a) You throw your point again. You win!
    2b) You throw a 7. You lose.

Simple, n'est-ce pas? Our task is to reduce this cornucopia of back-alley imagery that has inspired everyone from Lloyd Price to Liam Lynch into sterile, unfeeling mathematics. To do so, we must identify the support and distribution, fancy words for the possible outcomes of throwing two dice and the probability of each.

The possible outcomes of throwing two dice (by "outcome" we mean the number of dots showing face up when the dice come to rest, not an outcome like "Stagger Lee shanks you") are:

    S = [ 2 3 4 5 6 7 8 9 10 11 12 ]

This set is the support, which is why I called it S (crafty!). The distribution comprises the collective probabilities of the elements of S. These are also not difficult to suss. There are 36 possible outcomes (six possible outcomes on the first die x six on the second), so you can just count the number of ways each number in S can happen and divide by 36 (this, of course, assuming the dice are "fair," as in Pr(heads) = Pr(tails) = 0.5 for a "fair" coin, where "Pr(foo)" is my notation for "the probability of foo").

For example, there are two ways to tumble a 3 -- either (1,2) or (2,1) -- so Pr(3) = 2/36. I leave the remaining entries as an exercise for the reader, but here they are:

    Pr(S) = [ 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36 ]
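If counting dots by hand offends your sensibilities, here is a minimal Python sketch that does the counting for you. It assumes two fair six-sided dice; the names `counts` and `pr` are mine, not anything official.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely (first die, second die) pairs and tally the sums.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Divide each tally by 36 to get the distribution over S = 2..12.
pr = {total: Fraction(counts[total], 36) for total in sorted(counts)}

for total in pr:
    # Print the unreduced form so it matches the Pr(S) listing above.
    print(total, f"{counts[total]}/36")
```

Note that `Fraction` quietly reduces 2/36 to 1/18 and so on, which is why the loop prints the raw tally rather than the stored value.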

We now compute the probability of winning a game of craps given this data. A "win" is either (1a) or (2a) in the rules above. The hiccup is that the rules change after the first throw. We can account for this by exhaustively listing everything that can possibly happen on the first throw, and weight each of those things by the chance of winning if that thing happens:

    Pr(win) = Pr(2) ⋅ Pr(win | 2)
        + Pr(3) ⋅ Pr(win | 3)
        + Pr(4) ⋅ Pr(win | 4)
        + Pr(5) ⋅ Pr(win | 5)
        + Pr(6) ⋅ Pr(win | 6)
        + Pr(7) ⋅ Pr(win | 7)
        + Pr(8) ⋅ Pr(win | 8)
        + Pr(9) ⋅ Pr(win | 9)
        + Pr(10) ⋅ Pr(win | 10)
        + Pr(11) ⋅ Pr(win | 11)
        + Pr(12) ⋅ Pr(win | 12)

In words: The probability of winning is equal to the probability of throwing a two weighted by the probability of winning if you throw a two plus the probability of throwing a three weighted by the probability of winning if you throw a three and so on. "Pr(foo | bar)" is my notation for "the probability of foo given that bar happened."

Three of these entries we can cross off immediately. You lose if you throw a two or a three or a twelve, so Pr(win | 2) = Pr(win | 3) = Pr(win | 12) = 0. Also, you win if you throw a seven or eleven so Pr(win | 7) = Pr(win | 11) = 1. We're left with:

    Pr(win) = Pr(4) ⋅ Pr(win | 4)
        + Pr(5) ⋅ Pr(win | 5)
        + Pr(6) ⋅ Pr(win | 6)
        + Pr(7)
        + Pr(8) ⋅ Pr(win | 8)
        + Pr(9) ⋅ Pr(win | 9)
        + Pr(10) ⋅ Pr(win | 10)
        + Pr(11)

We now need to find the probability of wins for all the throws that take us into Stage II -- that is, all of the Pr(win | point) terms. This is the probability of throwing the point before throwing a seven. Rewrite the expression to make this explicit:

    Pr(win) = Pr(4) ⋅ Pr(4-before-7)
        + Pr(5) ⋅ Pr(5-before-7)
        + Pr(6) ⋅ Pr(6-before-7)
        + Pr(7)
        + Pr(8) ⋅ Pr(8-before-7)
        + Pr(9) ⋅ Pr(9-before-7)
        + Pr(10) ⋅ Pr(10-before-7)
        + Pr(11)

In general, we seek Pr(point-before-7). Let's crawl around in this thing's pelt to get a feel for it. Suppose our point is 5. Here's a few ways Stage II could play out:

    5 (win)
    7 (lose)
    3 5 (win)
    3 7 (lose)
    8 9 12 5 (win)
    8 9 12 7 (lose)
    3 9 2 6 11 8 4 12 10 2 8 6 3 5 (win)
    3 9 2 6 11 8 4 12 10 2 8 6 3 7 (lose)

Gack. This is the worst kind of probability problem -- the kind where you can't just write down all of the possible outcomes and count (because there's infinitely many of them). Looking down and finding my safety net has vanished I begin to hyperventilate, until the merciful wash of aqua vitae beats my poodle brain tranquil again like a claw hammer made of GABA agonist. This brings a headache that rises with the sun and a craving for Denny's. But also clarity: To crack this problem we must put away childish notions of counting. We must give ourselves willingly to Stochastia, the fertile land where all probability exam questions dwell, and swim in her warm blood waters.

There exists a light, sensei once told us, in which any problem appears simple. We must shine this light. Try this: Note for every sequence of rolls that ends in our point, an almost identical sequence exists that ends in 7. A doppelganger, with all entries equal save for the last. Imagine sorting these Stage II outcomes into two piles -- one containing the sequences that end in our point and the other with those that end in 7. Imagine putting these infinite piles into an infinite hat and pulling out a sequence at random. Convince yourself this single draw is equivalent to playing Stage II one roll at a time. Remember: Only the last roll matters -- throwing anything other than point or 7 does nothing. The game continues and dice do not have memory.

Now, the two sequence piles are of equal size, but that does not mean there is an equal chance of picking from one versus the other. Pr(7) is greater than the probability of any other outcome -- go look at Pr(S), above -- so it's more likely you're going to pick a sequence that ends in 7. Think of the slips of paper ending in 7 as being a little bigger or a little stickier, and therefore we are more likely to grab one of them compared to one that ends in our point. Compared to, is the crux here. It is the relative sizes of Pr(point) and Pr(7) that determine the probability of a Stage II win. The larger Pr(point) is relative to Pr(7), the better our chances.

How do we quantify this? As a first guess, we might write:

    Pr(point-before-seven) = Pr(point) / Pr(7)

Is this correct? No, as a little reflection should convince you. Suppose Pr(point) = Pr(7) -- yes, yes, for regulation dice Pr(7) > Pr(everything else) but right now I'm using thought experiment dice. This formula gives Pr(point-before-seven) = 1. That is, it predicts you always win Stage II. That can't be right. Unless Pr(7) = 0, 7 has some chance to appear, and so you must have some chance to lose. We conclude this equation is wrong.

What we should have written is Pr(point-before-seven) equals the probability of "point" relative to the entire sample space of [ point 7 ]. Here is the correct expression:

    Pr(point-before-seven) = Pr(point) / (Pr(point) + Pr(7))

This equation behaves sensibly. It increases with increasing Pr(point) and decreases with increasing Pr(7). It is a proper probability, always between 0 and 1. Also, if Pr(point) = Pr(7), it predicts Pr(point-before-seven) = 0.5, which sounds correct to my ear.

Footnote: It is wholly the relative size of Pr(point) and Pr(7) that determines Pr(point-before-7). Their individual values are irrelevant. You have the same probability of winning Stage II if Pr(point) = 0.00001 and Pr(7) = 0.00002 (with the rest of the support soaking up 0.99997) as you do if Pr(point) = 0.1 and Pr(7) = 0.2 (with the rest of the support soaking up 0.7). Weird!
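For readers who distrust hats and sticky slips of paper, the formula can be checked by brute force. Winning on exactly the k-th Stage II roll means k-1 rolls that were neither point nor 7, followed by the point. Add those up over all k and the total should approach Pr(point) / (Pr(point) + Pr(7)). The sketch below does just that; the function names and the 500-roll cutoff are my own choices, not gospel.

```python
def point_before_seven(p_point, p_seven):
    """Closed form from the text: Pr(point) / (Pr(point) + Pr(7))."""
    return p_point / (p_point + p_seven)


def point_before_seven_series(p_point, p_seven, max_rolls=500):
    """Add up Pr(win on exactly roll k) for k = 1..max_rolls.

    Winning on roll k requires k-1 rolls that are neither the point
    nor a 7, and then the point itself.
    """
    p_neither = 1 - p_point - p_seven
    return sum(p_neither ** (k - 1) * p_point for k in range(1, max_rolls + 1))


# Point of 5 with regulation dice: Pr(5) = 4/36, Pr(7) = 6/36.
print(point_before_seven(4 / 36, 6 / 36))
print(point_before_seven_series(4 / 36, 6 / 36))

# The footnote's two cases: wildly different sizes, same ratio.
print(point_before_seven(0.00001, 0.00002))
print(point_before_seven(0.1, 0.2))
```

The first pair of numbers should agree to many decimal places (both roughly 0.4), and the second pair should both land on roughly one third. The truncated series is only trustworthy when the leftover probability shrinks fast; for the 0.00001 case you would need far more than 500 rolls, which is why only the closed form is used there.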

We can now plug all this into our craps equation, and compute Pr(win):

    Pr(win) = Pr(4) ⋅ Pr(4) / (Pr(4) + Pr(7))
        + Pr(5) ⋅ Pr(5) / (Pr(5) + Pr(7))
        + Pr(6) ⋅ Pr(6) / (Pr(6) + Pr(7))
        + Pr(7)
        + Pr(8) ⋅ Pr(8) / (Pr(8) + Pr(7))
        + Pr(9) ⋅ Pr(9) / (Pr(9) + Pr(7))
        + Pr(10) ⋅ Pr(10) / (Pr(10) + Pr(7))
        + Pr(11)

Substitute for all the values of Pr(foo) using the distribution we made earlier (i.e., Pr(4) = 3/36 and so on) and crunch the numbers. My calculator says:

    Pr(win) = 0.4929
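If you don't trust my calculator, here is a short Python sketch of the same sum using exact fractions. It assumes fair dice and the rules as listed; `ways` and `pr_win` are names I made up for the occasion.

```python
from fractions import Fraction

# Number of ways to roll each total with two fair dice (out of 36).
ways = {2: 1, 3: 2, 4: 3, 5: 4, 6: 5, 7: 6, 8: 5, 9: 4, 10: 3, 11: 2, 12: 1}
pr = {total: Fraction(n, 36) for total, n in ways.items()}


def pr_win(pr):
    """Total probability of winning, summed over every possible first throw."""
    p7 = pr.get(7, Fraction(0))
    total = Fraction(0)
    for first, p in pr.items():
        if first in (7, 11):        # Stage I win
            total += p
        elif first in (2, 3, 12):   # Stage I loss contributes nothing
            continue
        else:                       # 'first' becomes the point
            total += p * p / (p + p7)
    return total


print(pr_win(pr), float(pr_win(pr)))
```

The exact answer works out to 244/495, which rounds to the 0.4929 above.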

That is, you will win a little less than half of the time. Or, to put it another way, you will lose just a little more than half of the time. This is why casinos offer craps: they will always get your money, provided you play long enough. Still, as far as games of chance go, craps is one of the fairer games going.
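Those who prefer losing imaginary money to doing algebra can also just play the game a few hundred thousand times. The following is a rough Monte Carlo sketch of the rules as described (win on 7 or 11, lose on 2, 3, or 12, otherwise chase the point). The function names, the game count, and the seed are arbitrary choices of mine, and the estimate will wobble in the third decimal place.

```python
import random


def play_craps(rng):
    """Play one game by the rules above; return True on a win."""

    def roll():
        # Sum of two independent fair dice.
        return rng.randint(1, 6) + rng.randint(1, 6)

    first = roll()
    if first in (7, 11):
        return True
    if first in (2, 3, 12):
        return False

    point = first
    while True:
        r = roll()
        if r == point:   # hit the point before a 7: win
            return True
        if r == 7:       # seven-out: lose
            return False


def estimate_pr_win(games=200_000, seed=0):
    """Fraction of simulated games won, using a seeded generator."""
    rng = random.Random(seed)
    return sum(play_craps(rng) for _ in range(games)) / games


print(estimate_pr_win())
```

The printed value should hover near 0.49, a hair below a coin flip.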

We now turn to this:

    Funny, got no money.
    I throw sixes and sevens and nines.


How does the constraint of throwing only sixes and sevens and nines alter our chance of winning? To get familiar with this new problem, consider if the lyrics had been:

    Funny, got no money.
    I throw sixes and eights and nines.


then, I claim, you always win. That is, Pr(win) = 1 under this constraint. Why? Well, if you never throw a two, three, twelve, seven, or eleven, then whatever you throw in Stage I will be your point. And if you never throw a seven, eventually you will throw your point again in Stage II. You always win. No math required.

Alas, Mick slipped a pesky seven into the mix, so it's back to the blackboard.

It's the same game, just with a reduced support of [ 6 7 9 ] and a concomitant modified distribution. What is the probability of throwing a 6, 7, or 9 given that you only throw sixes and sevens and nines as the lyric demands? Are the probabilities 5/36, 6/36, and 4/36 as before? Some reflection suggests that is incorrect, for these three events now describe all possible outcomes and 5/36 + 6/36 + 4/36 don't add to 1, as the sum of probabilities describing all possible outcomes must. Perhaps, then, the answer is 1/3 for each -- there are three outcomes in the sample space, ergo the probability of each must be 1/3. That is also incorrect. Even though the sample space has been reduced, there are still more ways to throw a seven than either a six or a nine and so it deserves a greater probability.

What we require is conditional probability, a tool the astute reader will realize we have used already although we did not call it out by name. Conditional probability appears in our calculation of the winning probability, in terms like Pr(win | 5) or in words the probability of a win given that we threw a five, or in still more words the probability of a win given the condition that we threw a five. Ergo, "conditional" probability.

We seek the probability of throwing a six or a seven or a nine given the condition that those are all we throw. Let's list all the ways of obtaining each of these outcomes:

    ways of throwing a 6: (1,5), (2,4), (3,3), (4,2), (5,1)
    ways of throwing a 7: (1,6), (2,5), (3,4), (4,3), (5,2), (6,1)
    ways of throwing a 9: (3,6), (4,5), (5,4), (6,3)

There are 15 combinations total, so in this reduced sample space we have Pr(6) = 5/15, Pr(7) = 6/15, and Pr(9) = 4/15. Easy peasy. Conditional probability is a vast and useful and complicated tool, and is also the foundation of Bayesian thinking, but this simple calculation is all we require of it today.
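The same bookkeeping can be sketched in a few lines of Python: throw away every dice pair the lyric forbids, then count what's left. This assumes fair dice, and `allowed`, `pairs`, and `pr_given` are names of my own invention.

```python
from fractions import Fraction
from itertools import product

allowed = {6, 7, 9}

# Keep only the dice pairs whose sum the lyric permits.
pairs = [(a, b) for a, b in product(range(1, 7), repeat=2) if a + b in allowed]

# Within the reduced sample space every surviving pair is still equally likely,
# so each conditional probability is (ways to make the total) / (pairs kept).
pr_given = {
    total: Fraction(sum(1 for a, b in pairs if a + b == total), len(pairs))
    for total in sorted(allowed)
}

print(len(pairs))
print(pr_given)
```

This is the same thing as dividing each original probability by Pr(6 or 7 or 9) = 15/36: for example, (5/36) / (15/36) = 5/15. Python will print the fractions in lowest terms (5/15 shows up as 1/3), but they are the same numbers.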

We now just plug these numbers into our equation for Pr(win). The lyrics-restricted sample space reduces the equation to:

    Pr(win) = Pr(6) ⋅ Pr(6) / (Pr(6) + Pr(7))
        + Pr(7)
        + Pr(9) ⋅ Pr(9) / (Pr(9) + Pr(7))

Substitute the conditional probabilities we just calculated:

    Pr(win) = 5/15 ⋅ 5/15 / (5/15 + 6/15)
        + 6/15
        + 4/15 ⋅ 4/15 / (4/15 + 6/15)

For which my calculator says:

    Pr(win) = 0.6582
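For the record, here is a hedged Python sketch that redoes this calculation for any set of permitted totals, so you can audit other lyrics at your leisure. It assumes fair dice conditioned on landing in the permitted set; `WAYS` and `pr_win_restricted` are hypothetical names, not anybody's library.

```python
from fractions import Fraction

# Number of ways to roll each total with two fair dice.
WAYS = {2: 1, 3: 2, 4: 3, 5: 4, 6: 5, 7: 6, 8: 5, 9: 4, 10: 3, 11: 2, 12: 1}


def pr_win_restricted(allowed):
    """Pr(win) at craps when every throw is conditioned to land in 'allowed'."""
    n = sum(WAYS[t] for t in allowed)
    pr = {t: Fraction(WAYS[t], n) for t in allowed}  # conditional distribution
    p7 = pr.get(7, Fraction(0))

    win = Fraction(0)
    for first, p in pr.items():
        if first in (7, 11):            # Stage I win
            win += p
        elif first not in (2, 3, 12):   # point: must reappear before a 7
            win += p * p / (p + p7)
    return win


print(float(pr_win_restricted({6, 7, 9})))   # Mick's lyric
print(float(pr_win_restricted({6, 8, 9})))   # the no-seven variant
print(float(pr_win_restricted(range(2, 13))))  # ordinary craps
```

The exact value for sixes and sevens and nines is 181/275, which rounds to 0.6582. The no-seven variant comes out to exactly 1, and the unrestricted game recovers 244/495.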

Comparing this to the probability of winning at craps when played in the ordinary way, we find the chance of winning has gone from 0.4929 to 0.6582. Throwing sixes and sevens and nines increases our chances of winning. You might say we have advantageously contracted the support, which we will indeed say because, like nonlinear viscoplastic constitutive modeling and more whiskey please, it's just the way engineers talk.

More importantly, we've pushed past the break-even point: 0.6582 > 0.5, so in the long run we will win more times than we lose. And the longer we play, the more we win. We essentially have a license to print money. This is not something Mr. Jagger should be bellyaching about. Throwing sixes and sevens and nines will make him a man of wealth and taste. It should make him happy. His heart should be pounding like a big bass drum.

Conditional probability. It's a bitch.

Postscript

So what have we learned today? Not much. We learned how to play craps. We learned Aristotle was not Belgian. We learned underpants forts are a lesser-known perk of stardom. And, as always, we learned whiskey is the fastest route to mathematical clarity.

Finally, we learned you should not turn to rock n' roll for technical advice. It is a life choice fraught with misinformation, from the Rolling Stones' fallacious conditional probability to the Bareknuckle Ladies' drooling autotrophs. And don't even get me started on Jim Morrison (five to one, baby, is one in six, Mr. LizardKing). Also, there ain't no such thing as a fuelie head for a 396, a story for another time perhaps. I'm starting to wonder if Bruce Springsteen really is from Jersey.

Anyway, to be honest I prefer the Linda Ronstadt cover.
