A LabKitty Primer is an introduction to some technical topic I am smitten with. These primers differ from most in the degree of mathematical blasphemy on display. I gloss over details aplenty and work only the simplest cases. I'll show mistakes and dead ends and things any normal person would try that don't work. Mostly I want to get you, too, excited about the topic, perhaps enough to get you motivated to learn it properly. And, just maybe, my little irreverent introduction might help you understand a more sober treatment when the time comes.
The appearance of Margot Robbie in the recently leaked footage of Suicide Squad got me thinking about Margot Robbie in The Wolf of Wall Street, and that of course got me thinking about the Black Scholes financial model. (Aside: I don't understand what about Suicide Squad requires a 2016 release; the movie could be Harley Quinn reading the phone book for 90 minutes and nerds would flock to see it. But I digress.)
The Black Scholes model is an example of a stochastic differential equation (SDE). It's also an example of mathematics destroying the world, sorta like how an atom bomb is really just a Markov chain except this time we dropped it on ourselves. For Black Scholes is one of the cornerstones of the derivatives trading biz (the other three corners are cocaine, prostitutes, and suborned congressmen). Scholes won a Nobel prize for his work and the very next year his hedge fund had to be bailed out to the tune of something like $4 billion. If that doesn't make you hate math, I don't know what will.
Continuing my runaway train of thought, SDE are also an example of something I call Horace mathematics, after the Roman poet Horace who penned: Parturient montes, nascetur ridiculus mus (the mountains will be in labor, and an absurd mouse will be born). SDE are an absurd mouse indeed. The gulf between their fearsome reputation and how simple they are to implement on a computer is wide enough to fit all of the planets in the solar system (which, lined up, would just about span the distance from the Earth to the moon, by the way). Wider than the time between the building of the pyramids and Cleopatra (who lived closer to modern times than to the pyramid builders). Wider than the gap between Stegosaurus and Tyrannosaurus (the latter lived closer to us than to Stegosaurus).
Seriously. I haven't been let down this hard since sea monkeys.
Meet the SDE
SDE complete the differential equation Trinity: ordinary, partial, stochastic. They are used to model a process that isn't wholly random -- which would force us down into the Random Process bunker, where we break out the party hats if we can come up with a probability distribution that describes the thing -- but that has a random element which demands representation. A boring deterministic so-and-so tainted by some kind of Gaussian weirdness, but we're not quite ready to throw the baby out with the bath water. Instead, we desperately cling to differential equations like shipwreck survivors to bobbing flotsam in the angry sea.
In Black Scholes, it's the boring determinism of compound interest tainted by cocaine, prostitutes, and suborned congressmen which is used to generate a model of random-ish stock price fluctuations. I don't really grok financial lingo ("optioning"? "arbitrage"? "suborned"?) so let's consider a simpler growth model to introduce our topic.
Let's consider literal growth. Bacteria in a petri dish, ungulates in a nature preserve, humans commanded to be fruitful and multiply. Pick your favorite. We seek to predict the size of the population as a function of time. The simplest description we can concoct is the exponential model. We assume the population growth rate is proportional to the population size and we write the following ordinary differential equation
dn/dt = cn (1)
Here, n = n(t) is the size of the population and c is some constant measured experimentally. (Yes, n is an integer, but for a large enough population you can assume n(t) is a continuous function and the sky doesn't fall down.)
The solution to Equation 1 is the familiar exponential
n(t) = n0 exp(ct) (2)
Here, n0 is the size of the population at time zero and is assumed to be given.
It would be nice if population growth were this simple, but as you know the world is full of unpredictability. Wars. Famine. Disease. Broken condoms. Surprise quintuplets. A school bus gets T-boned by a dump truck. We cross a few n off the list. Your trollop of a housecat drops another litter. We add a few surplus n to the list. Our gut tells us exponential growth still exists somewhere underneath this mess -- we'd like to retain Eq. 1 if possible -- but the growth curve is no longer smooth. It's our job to translate that fuzziness into mathematics and include it in the model.
Our starting point is Euler's method. We rewrite Equation 1 after multiplying both sides by dt:
dn = cn dt (3)
Footnote: Rewriting a first order ODE in this differential form is such a knee-jerk response that it's easy to forget doing so is technically wrong. The trick is only permitted here because it gives the right answer.
Since our goal is solving SDE on a computer, let's change this ODE into something we can solve on a computer. We swap out the differentials for differences and obtain:
Δn = cn Δt (4)
Or, making what's happening inside Δn explicit:
n[next] – n[now] = c n[now] Δt
⇔ n[next] = n[now] + c n[now] Δt (5)
Given the starting population n0, we can now compute the population growth numerically by iterating Eq. 5. Here's some Matlab code which does that (it also generates the analytic result for comparison):

c = 0.1;
n0 = 10;
t_max = 50;
npnts = 100;
dt = t_max / npnts;

n = [ ];
n(1) = n0;
for index = 1:npnts
    n(index+1) = n(index) + c * n(index) * dt;
end

t = [0:dt:t_max];
analytic_n = n0 * exp(c*t);

clf;
plot(t,n,'bo'); hold on;
plot(t,analytic_n,'r');
axis([0 t_max 0 max(n)]);
grid on;
This is Euler's method. Here's the output:
The numerical solution is plotted as blue circles superimposed on the analytic solution, n0 exp(ct), plotted in red.
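For anyone without a Matlab license handy, here's a rough Python equivalent of the same Euler iteration (Eq. 5), minus the plotting. The parameter values mirror the Matlab script and are purely illustrative:

```python
import math

# Parameters mirroring the Matlab script (illustrative values)
c = 0.1          # growth constant
n0 = 10.0        # initial population
t_max = 50.0     # length of simulation
npnts = 100      # number of time steps
dt = t_max / npnts

# Euler's method: n[next] = n[now] + c * n[now] * dt   (Eq. 5)
n = [n0]
for _ in range(npnts):
    n.append(n[-1] + c * n[-1] * dt)

# Analytic solution for comparison: n(t) = n0 * exp(c*t)   (Eq. 2)
t = [k * dt for k in range(npnts + 1)]
analytic = [n0 * math.exp(c * tk) for tk in t]

print(n[-1], analytic[-1])  # Euler undershoots the exact exponential a bit
```

With this step size the Euler result lands within about ten percent of the analytic value; shrink dt and the gap closes.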
Now we add randomness. We do this by literally adding randomness
Δn = cn Δt + randomness (6)
We need to pause and think about what this randomness should look like (as one of my professors used to say: Just because you're doing mathematics doesn't mean common sense isn't applicable). We assume the following:
(1) There will be good times and bad times, but they even out in the long run. (This does NOT refer to the overall trend of exponential growth, which is ever upward, just the random tweaks that are occurring as time passes.) Sometimes randomness adds a few n, other times it subtracts a few, but the average effect is zero.
(2) The randomness scales with Δt. The longer the time step, the more opportunities there are for tomfoolery and the less certain we are about how much to tweak n. If you know the population today, you have more confidence in a prediction of the population tomorrow than in a prediction of the population in a hundred years. In probability, such uncertainty goes by the name variance, so to be precise: the variance of the randomness is proportional to Δt.
(3) Really big tweaks (in either direction) are rare. Wars happen (big negative tweak), but rarely (small probability). Duggars happen (big positive tweak) but rarely (small probability).
So, in your mind's eye imagine the pdf describing the randomness. By Property (1) it's centered at zero. By Property (3), it trails off to nothing at the left and right. Finally, by (2) the pdf is Gaussian with variance equal to Δt.
Footnote: Yes, I sneaked in the "Gaussian" part hoping you wouldn't notice. It comes from the Central Limit Theorem, which states that the distribution of the sum of a large number of random variables is Gaussian, no matter what the distribution of the individual random variables. In our case, the individual random variables are the incidents of tomfoolery and their summed effect is the net tweak we apply to n. We don't know how many individual incidents of tomfoolery there are, but for any reasonable number the CLT guarantees the net effect is Gaussian. Consult any probability textbook for a proof of the Central Limit Theorem (it's actually more straightforward than you might imagine).
Summing up, our randomness is Gaussian distributed with zero mean and variance Δt. This is usually indicated N(0,Δt) -- "N" for "normal" -- however, the SDE community has its own private kooky notation and writes this as ΔW. In equation form
Δn = cn Δt + ΔW (7)
The W stands for (Norbert) Wiener, ΔW is called a Wiener increment, and W itself is called a Wiener process; an equation like Equation 7 describes a process driven by Wiener noise.
Footnote: At some point in the proceedings we're going to have to come to grips with the fact that it's impossible to say "Wiener increment" in a classroom of undergraduates without setting some of them to tittering.
I have, as usual, explained something not terribly complicated using a great many words. So, perhaps we should just get on with the code. The only change needed to Euler's method is to jigger each step with some N(0,Δt) noise. That is, we change the line

n(index+1) = n(index) + c * n(index) * dt;

to

n(index+1) = n(index) + c * n(index) * dt + sqrt(dt) * randn;

The Matlab function randn() returns an N(0,1) random variable. If you recall your probability theory, multiplying an N(0,1) variable by the square root of Δt converts it into an N(0,Δt) variable (var(aX) = a^2 var(X) for any constant a; we want the variance to be Δt, so we multiply by √Δt).
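If you'd like to see that scaling in action, here's a quick Python sanity check (random.gauss plays the role of Matlab's randn; the value of dt is just one I picked):

```python
import math
import random
import statistics

random.seed(42)  # reproducibility

dt = 0.5  # an arbitrary time step (illustrative)

# Scaling an N(0,1) draw by sqrt(dt) yields an N(0,dt) draw,
# since var(aX) = a^2 var(X).
samples = [math.sqrt(dt) * random.gauss(0.0, 1.0) for _ in range(200_000)]

print(statistics.fmean(samples))      # should be near 0
print(statistics.pvariance(samples))  # should be near dt
```

Two hundred thousand draws is overkill, but it makes the sample mean and variance settle close enough to 0 and Δt that there's no squinting required.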
Here's some sample output:
This is four runs of the program, with the output plotted as blue circles superimposed on the (nonrandom) analytic solution for reference. I've reduced t_max to 10 in these plots -- if the population is given time to grow large then the plot gets scaled down and you can't see the randomness.
Congratulations! You now know how to solve a stochastic differential equation. I'm not kidding. Adding random jigger to Euler's method even has a fancy name: the Euler-Maruyama method. It's the workhorse of computational SDE, for which I presume Dr. Maruyama got tenure and a paycheck, albeit having implemented it on something like a PDP-10.
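To make that concrete, here's a rough Python sketch of Euler-Maruyama for our growth model (parameter values are my own, purely illustrative). Because the noise has mean zero, the average of many random runs should hug the deterministic solution n0 exp(ct), which gives a handy sanity check:

```python
import math
import random

random.seed(0)  # reproducibility

# Illustrative parameters (my choices, not canonical)
c, n0 = 0.1, 10.0
t_max, npnts = 10.0, 100
dt = t_max / npnts
n_paths = 5000

# Euler-Maruyama: the Euler update plus a sqrt(dt)*randn kick each step.
finals = []
for _ in range(n_paths):
    n = n0
    for _ in range(npnts):
        n += c * n * dt + math.sqrt(dt) * random.gauss(0.0, 1.0)
    finals.append(n)

mean_final = sum(finals) / n_paths
print(mean_final, n0 * math.exp(c * t_max))  # the two should be close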
If you're thinking that can't possibly be all there is to SDE, I assure you, madam, it is. Or it is computationally speaking. If you want the pencil and paper kind of solutions, that's a different story. Things are no longer sunshine and rainbows and kittens riding unicorns. Now, the unicorns gore the kittens and the kittens riding the unicorns wear spurs.
The mathematics is about to turn nasty.
The Mathematics Turns Nasty
Recall we began with dn/dt = cn, which we then converted to dn = cn dt to arrive at Δn = cn Δt. It stands to reason that we should be able to work backwards from Δn = cn Δt + ΔW and arrive at dn/dt = cn + dW/dt. Can we?
Alas, we cannot. You always, always, always, see SDE written in differential form. For the stochastic exponential growth model, we would write
dN = cN dt + dW (8a)
I'm now writing N(t) instead of n(t) to emphasize population size is now a random variable. At any time t, we know what it can be, but not what it will be. That's (sort of) the very definition of a random variable.
You always, always, always, see SDE written in differential form. (Did I just say that? I think I just said that.) Why? The crux of the matter is dW.
Consider, for a moment, blindly dividing Equation 8a through by dt to obtain:
dN/dt = cN + dW/dt (8b)
Can we make sense of this? It kinda feels like a system of differential equations, except systems usually don't have derivatives on the RHS and we don't have an equation for dW/dt anyway. We have one equation and two unknowns. Oddly enough, this is not the worst part of Equation 8b.
The worst part is dW/dt doesn't exist. A Wiener process is, with probability one, nowhere differentiable. You might start to suspect this just by looking at it.
Here, look at it:
You can prove dW/dt doesn't exist rigorously, but that would take us on an unnecessary detour. It's unnecessary because the finite time step of the Euler-Maruyama method allows us to nonchalantly wave at all this horror as it goes by like Temple Grandin at a slaughterhouse. For a finite time step, dW becomes ΔW, and ΔW is just the difference between two values of a Gaussian process. That presents no existential crisis whatsoever. If bar is the value of the process at time t1 and foo is its value at time t2, then for a Wiener process the difference foo – bar is Gaussian distributed with mean zero and variance t2 – t1 = Δt. That's like Probability 101. It's only if you insist on dividing foo – bar by Δt and taking the limit as Δt goes to zero that bad stuff happens. This does not produce dW/dt as it would in ordinary calculus; the limit does not exist in the ordinary sense.
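You can watch the trouble brew numerically. In this little Python experiment (parameters mine), the Wiener increment ΔW = √Δt randn stays perfectly tame, but the difference quotient ΔW/Δt has variance 1/Δt, which explodes as Δt shrinks. That's the computational shadow of dW/dt failing to exist:

```python
import math
import random
import statistics

random.seed(1)  # reproducibility

def increment_quotients(dt, trials=100_000):
    # A Wiener increment is sqrt(dt) * N(0,1); divide by dt to form
    # the difference quotient that ordinary calculus would want.
    return [math.sqrt(dt) * random.gauss(0.0, 1.0) / dt
            for _ in range(trials)]

# The increment itself has variance dt (tame), but the quotient has
# variance dt / dt^2 = 1/dt, which blows up as dt -> 0.
quotient_variance = {}
for dt in (0.1, 0.01, 0.001):
    quotient_variance[dt] = statistics.pvariance(increment_quotients(dt))
    print(dt, quotient_variance[dt])
```

Each tenfold shrink in Δt inflates the sample variance of the quotient tenfold; there is no limit waiting at the bottom.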
You might think we should not be using a computational version of mathematics that has such a fatal flaw at its very core, no matter how simple the computational version may seem. The good news is someone eventually figured out a way to make rigorous sense of dW and everything built on it (that someone was the Japanese mathematician Kiyoshi Ito) and so SDE got their bona fides and were unbanned from the Woolworth. This is similar to the story of the Dirac delta function, which Dirac was using to solve problems in quantum mechanics for years before somebody finally put it on a firm mathematical foundation.
Ito took his new machinery out for a spin and invented a new integral and a new chain rule and much else besides, and you need to master this new and miserable calculus if you want to find analytic solutions of SDE. Not only does this make SDE harder than ODE, but the old bugaboo of differential equations does not go away. Alas, for most SDE, just like most ODE and PDE, an analytic solution does not exist. Hence, we turn to the computer. To be sure there are issues there as well -- of stability and convergence and so on -- but the basic result is easy to understand. As I showed above, it's a simple modification of Euler's method.
SDE are, at their heart, Horace mathematics.
Recommended Reading
LabKitty puts the "primer" in "LabKitty Primer." Technically, I suppose I also put the "LabKitty" in "LabKitty Primer." But you get my point. LabKitty, you furry tease, you might now be shouting, where is my Ito calculus? What about mean square convergence? Or convergence in probability? Or convergence in distribution? Or Brownian motion? Or the Langevin equation? Or the Fokker-Planck equation? What gives? (I hear these protests being hurled by Ben Yahtzee in my head.)
Nolo contendere. I freely admit my coverage is stunningly abridged, a literary hemisphere-ectomy that excised all of the hard stuff. My primary excuse is I was under a tight deadline, for the Suicide Squad intro has a finite shelf life as a pop culture reference and I had to get this posted toot sweet if I was to have any hope of getting ranked in the search results. Or at least that's what my SEO for Dummies book says.
In hindsight that justification seems a little spurious, for this is the Internet and the Internet is eternal. As far as I know, you could be reading this thousands of years in the future and have no idea who or what a Margot Robbie is because you're too busy foraging for food and dodging Morlocks. Yet, if that were true it raises tough questions about why you would be reading a primer on stochastic differential equations in the first place.
What I'm saying is: here are some books I think do a nice job of explaining stochastic differential equations beyond just sticking one on a computer.
Mathematical Methods in Biology -- Logan and Wolesensky
SDE typically don't feature in textbooks that aren't specifically about SDE. Logan and Wolesensky is a notable exception. It's a rather impressive effort given that they begin almost at a keep this end pointed away from your face level of sophistication on page one, and by the end they're talking SDE. To be sure, the coverage has gaps, with more than a few hand waving arguments of the you mouth breathers wouldn't understand it variety putting in an appearance. But their presentation is an approachable (and short) introduction to many of the analytic results a certain someone I could mention glossed over. The stuff in the other 300 pages is pretty swell too. That being said, my copy has some wicked typos. Stay frosty.
An Introduction to Stochastic Processes with Applications to Biology -- Linda Allen
Linda Allen (Texas Tech) is the reigning queen of stochastic epidemiology. There's epidemiology in AItSPwAtB, but there are many other things besides. There's other kinds of models in AItSPwAtB, but there are SDE as well. The level of sophistication on display is much higher than Logan and Wolesensky. Allen is, after all, a mathematician. Still, she doesn't rub your face in it. I dare say she's the rare kind of mathematician who actually wants you to understand what she's going on about. I also like Edward Allen's Modeling with Ito Stochastic Differential Equations, to which similar praise and caution apply. IIRC, Edward and Linda are husband and wife. What are the odds of two people getting married who both just happen to be experts in stochastic modeling? That there is some Lincoln/Kennedy level of weirdness.
The Theoretical Biologist's Toolbox -- Marc Mangel
Mangel is the man -- Mangel Sensei we call him here at the LabKitty -- for all things mathematical biology. His TTBT is a rare gem of a textbook that really really wants to get you into mathematical modeling but refuses to sugarcoat the material. It also features a stout chapter on stochastic differential equations. It's not an easy read -- as if anything about SDE is -- but it's well worth the effort for Mangel's inimitable style and insights.
Brownian Motion Calculus -- Ubbo Wiersema
I have a love/hate relationship with Ubbo; the presentation is decidedly unlike any other but it only works for me sometimes. It's definitely the most approachable SDE textbook I have found. There's any number of a-ha! moments inside. But there are, for me at least, just as many WTF moments too. Sometimes I find the plain vanilla approach of one of the Allens gets a point across better. Ubbo's book is also squarely aimed at finance majors, although the early going is generally enjoyable and reasonably light on jargon. YMMV.