A LabKitty Primer is an introduction to some technical topic I am smitten with. It differs from most primers in the degree of mathematical blasphemy on display. I gloss over details aplenty and mostly work only the simplest cases. I show the dead ends and the things any normal person would try that don't work. Mostly I want to get you, too, excited about the topic, perhaps enough to get you motivated to learn it properly. And, just maybe, my little irreverent introduction might help you understand a more sober treatment when the time comes.
Variational calculus (VC) is the Cirque du Soleil of mathematics. It has a reputation of impenetrable weirdness. It can do things seemingly nothing else can. And even though you recognize the basic ingredients that go into it (juggling, tumbling, French), they are taken to such extremes you can't help but feel you don't know them at all.
The motivating idea is simple: find a function that maximizes or minimizes some expression. Often, that expression is an integral involving the function. The concept is not entirely alien to the student of calculus. You are taught how to find function extrema back in Calc-I: recall taking derivatives and setting them equal to zero. There the function was given and you sought special values of the independent variable(s). But finding a function that satisfies a given expression is also not new. I dare say that's what it means to solve a differential equation.
Variational calculus leverages the synergy of these two ideas. A mathematical peanut butter cup, as it were. And VC allows you to attack problems that can be solved in no other way. The type of problems that are at the heart of engineering and technology. Find the airfoil shape that minimizes drag. Find the transistor layout that maximizes chip speed. Find the protein conformation that maximizes drug uptake. Beyond the human realm, we find almost every fundamental law of nature expressed in the language of variational calculus. The motion of a particle minimizes the time integral of the difference between its kinetic and potential energy. Soap bubbles assume the shape that has minimal surface area. Light propagates along the path that minimizes travel time. Variational calculus is like reading the mind of God.
Usefulness aside, variational calculus is a jewel in its own right. Mathematical poetry. The basic result can be written down in about half a page. Within that half page is nothing you can't understand if you've taken a couple semesters of calculus. But there are some impressive leaps of intuition on display, some of which lead to calculus contortions you might never have thought possible. It's a little like hearing Eddie Van Halen play Eruption for the first time, just when you were getting a handle on Polly Wolly Doodle. This is calculus for grown-ups. I will do all I can to help, but if you have one doubt, one shadow along the way, you will be flung into the abyss like the knights crossing the footbridge at the end of Holy Grail. Make it to the other side, however, and the reward is great.
Enough cheerleading. Let's get on with the show.
Strange Derivatives
Before we go any further, a word on some unpleasantness that's about to happen.
If I tell you y(x) = sin(x) and ask for dy/dx you would probably be insulted. Clearly, dy/dx = cos(x). Instead, if I ask what is dy/d(sin(x)), you may be less insulted and more confused. Are you crazy, woman? perhaps you are thinking, which is what I thought the first time I came across this in the classroom.
One way forward is to think about the meaning of the derivative. If we change the independent variable a little, the function also changes a little. The ratio of those changes is (approximately) the derivative. If sin(x) changes by, say, one unit, and y = sin(x), then y also changes by one unit. This suggests that dy/d(sin(x)) is 1. (The derivative dy/dx -- our first example -- is more complicated because the change in y now depends on the value of x. As we already know, that dependency is cos(x) and we have dy/dx = cos(x).)
We can make our problem look more familiar by using substitution. Let u = sin(x), then y = u, and dy/d(sin(x)) = dy/du. That's familiar territory: dy/du = 1, confirming the hand-waving argument above.
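If you have a computer handy, you can watch the substitution argument play out in sympy. This is just a sketch (I'm assuming you have sympy installed; the symbol u is my stand-in for sin(x), nothing official about it):

```python
# Derivative with respect to a function, via the u = sin(x) substitution.
import sympy as sp

x, u = sp.symbols('x u')
y = sp.sin(x)

print(sp.diff(y, x))                      # cos(x) -- the Calc-I answer
print(sp.diff(y.subs(sp.sin(x), u), u))   # 1 -- the "derivative with respect to sin(x)"
```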
We take derivatives with respect to functions a great deal in variational calculus, with one final twist: we won't be given the function. That is, we're going to encounter derivatives like d(something)/df, where f is some unknown function of x. This is no different than substitution, except here we are substituting a single letter for a function we don't yet know. We will even encounter strange animals like d(something)/df', where f' is the derivative of the unknown function. Hopefully, with this brief detour making you comfortable with the concept of "taking derivatives with respect to a function," we will be able to press on when this weirdness appears. You won't be required to find the answer, but you'll need to be able to understand the question. And there's one more thing.
One More Thing
Go stare at this doodad on integration by parts for a couple of minutes. We'll need it later.
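In case the doodad isn't handy, here's a minimal sympy check of the formula on one concrete pair of functions (foo = sin(x) and bar = x are my own arbitrary picks; any smooth pair would do):

```python
# Integration by parts, checked symbolically on [0, pi]:
#   integral of foo' * bar  =  foo * bar (evaluated at the limits)  -  integral of foo * bar'
import sympy as sp

x = sp.symbols('x')
foo, bar = sp.sin(x), x

lhs = sp.integrate(sp.diff(foo, x) * bar, (x, 0, sp.pi))
rhs = (foo * bar).subs(x, sp.pi) - (foo * bar).subs(x, 0) \
      - sp.integrate(foo * sp.diff(bar, x), (x, 0, sp.pi))

print(lhs, rhs)   # both print -2
```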
Begin your Tour Here
Now we are ready to begin.
In the intro, I described the core concept of variational calculus as finding a function that minimizes or maximizes some given expression. The classic example is finding the shortest path between two points a and b. We already know this is a line, so that provides an independent check on the result. If we take the trouble to develop all the machinery of variational calculus and it doesn't spit out the right answer for such a simple problem, then something is horribly wrong.
What we're really asking is the path of minimum arc length. From first or perhaps second semester calculus, we know how to compute arc length. It's the following integral:
∫a,b √ [ 1 + (y')² ] dx [1]
We need to come up with some kind of calculus voodoo so that when applied to this problem it spits out y(x) = mx + b, the equation of a line.
Footnote: I'm pretty stupid when it comes to HTML, which explains my strange integral formatting: ∫a,b blerg dx is the integral of blerg from a to b. The "b" should be a superscript on the integral sign. I don't know how to do that. Better get used to it now.
Footnote: Every variational calculus textbook ever written starts with this example. I guess that's understandable; the problem is simple to state and you've known the answer since grade school. Still, the integral for arc length is kind of wonky (square root of one plus the square of the derivative). It'd be nice to begin with something plainer. You might think minimizing ∫ y(x) dx would be problem zero. But that just gives y(x) = 0, which is rather boring (by "minimum" we mean in absolute value). So we do the arc length problem. C'est la vie.
Let's introduce general notation so I can refer to various bits and pieces as we go. I'll try to use the same notation as Wikipedia so if you go there someday things will look familiar.
Equation [1] has the form:
J[y] = ∫a,b F(x,y(x),y'(x)) dx [2]
Note J[y] is a function of a function (emphasized by using square brackets). More accurately, J takes a function and returns a number. Mathematicians call such a beast a functional. In the example we're developing, J takes a path y(x) and returns its length (7, 42, √π, whatever). F(x,y(x),y'(x)) is just the integrand, which may include y, its first derivative, and the independent variable x. The integrand doesn't necessarily include all of these -- only y'(x) appears explicitly in the arc length integrand -- but the convention is to write F as shown. More advanced treatments get into complications like higher-order derivatives or more dimensions or variable endpoints. This makes the details uglier and the textbooks thicker, but it really doesn't add anything new conceptually. Equation [2] will suffice for our needs.
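To make the "function of a function" idea concrete, here is a little Python sketch of Equation [2]: J eats a whole path y(x) and spits out a single number. Everything here (names, interval, grid size) is my own choosing, not standard machinery:

```python
import numpy as np

def J(F, y, a=0.0, b=1.0, n=2001):
    """Approximate J[y] = integral of F(x, y, y') from a to b."""
    x = np.linspace(a, b, n)
    yx = y(x)
    ypx = np.gradient(yx, x)                    # numerical y'(x)
    integrand = F(x, yx, ypx)
    # trapezoid rule
    return float(np.sum(0.5 * (integrand[:-1] + integrand[1:]) * np.diff(x)))

# Arc length integrand from Equation [1]:
F_arc = lambda x, y, yp: np.sqrt(1.0 + yp**2)

print(J(F_arc, lambda x: x))      # line from (0,0) to (1,1): ~1.4142
print(J(F_arc, lambda x: x**2))   # parabola through the same endpoints: ~1.4789
```

Hand J a path, get back a length. The line beats the parabola, as it had better.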
We would like to identify the function y(x) that minimizes J[y]. It's not clear how to proceed. It feels like we should take some kind of a derivative and set it equal to zero. But what? And how? As such, we apply one of the Golden Rules of mathematics: if you can't solve a problem, recast it as a problem you can solve.
I won't keep you in suspense, but you might pause here and ask yourself: what other kind of calculus problem involves finding a function that satisfies some given relation? It might occur to you that this is exactly what a differential equation demands of you. And that's how we're going to attack the problem. We're going to convert minimizing the functional into an equivalent differential equation.
I'm not saying that by generating a differential equation we'll be home free -- we might simply create a different problem we can't solve because the differential equation is too hard. But the problem as it stands is hopeless, and "hard" is better than "hopeless."
Hey Rocky, watch me pull a rabbit out of my hat
What I'm about to show you is one of the coolest tricks in all of mathematics. I wouldn't have come up with it in a million years. I will warn you up front: it makes no sense whatsoever. The fact that it leads to a solution is astonishing. Pure magic. Whoever came up with it (I guess that would be Euler) certainly deserves his fame.
The trick goes as follows: If we knew the minimal function, it would be possible to write any function as the minimal function plus some other function. The "some other function" is just the difference between our guess and the correct answer. In symbols:
y(x) = f(x) + δf(x)
Here, y(x) is any function, f(x) is the minimal function we're looking for, and δf(x) is the difference between the two.
How does that help? Wait, it gets worse.
Let's break up δf(x) and write it as an ordinary variable ε times yet another function η(x):
y(x) = f(x) + ε ⋅ η(x)
The function η(x) must be differentiable (i.e., smooth). Also, we assume η(a) = η(b) = 0. Why? Well, we don't know much about f(x) yet, but we do know it passes through the prescribed endpoints at x = a and x = b. Therefore, y(x) must also pass through them. If η(x) isn't zero at the endpoints, it would knock y(x) off them. Ergo, we must have η(a) = η(b) = 0. This will be very important later. (Aside: η is pronounced "eta.")
You can think of η(x) as describing the shape of how our function y(x) differs from the minimal function and ε describes the magnitude of the wrongness. In formal language, f(x) + ε⋅η(x) is a one-parameter family of functions; our guess y(x) is simply one member of this family (see below).
Perhaps you're thinking: ...wait a sec. I can come up with a y(x) that doesn't look anything like y(x) in this figure, no matter what value of epsilon you pick! Yep, that's true. But that just means we would use a different η(x) for your function. And note: no matter what η(x) is, y(x) transmogrifies into f(x) when ε = 0 (because then y(x) = f(x) + 0⋅η(x) = f(x)). The guess becomes the answer.
Now, you are probably thinking: Oh, hooray. We started with one unknown function, f(x). We now have two unknown functions and an unknown variable. How does THIS help?
By writing y(x) as f(x) + ε⋅η(x) we have converted the functional J[y] into an ordinary function J(ε). The function f(x) is fixed. For any guess y(x), the function η(x) is fixed. Therefore, when we plug y(x) into J, the only variable is ε. Furthermore, when ε is zero, J[y] is minimized, because we have y(x) = f(x), the minimal path. We know from ordinary calculus that at the minimum of a function the derivative is equal to zero. In other words, the derivative of J(ε) with respect to ε evaluated at ε = 0 is equal to zero.
Some more explanation words: Imagine all possible functions y(x) as living in a two-dimensional "function space." Along one dimension is function shape -- call its coordinate s. At some point on this "shape axis" is, say, the line y(x) = x. Over there are the polynomials. A little further down the shape axis are the trig functions. Then come hyperbolics. Further out are strange shapes we don't have names for. And so on. The other dimension of the function space is just ε -- call it the "magnitude" axis. For any shape, changing ε just makes the function bigger or smaller. Alas, I don't know how to actually define a shape axis, and even if I did there's no obvious value of s at which J does anything special or interesting or useful.
On the other hand, not only is it straightforward to define a magnitude axis, we know a value of ε at which J does do something special. For any function shape (i.e., fixed s), we know that dJ/dε = 0 at ε = 0. Again, this is because we wrote y(x) = f(x) + ε⋅η(x). So at ε = 0, y(x) = f(x) -- the function that minimizes the functional J[y]. Ergo, we have dJ/dε = 0 at ε = 0. (I suppose we should write this as the partial derivative ∂J/∂ε = 0, but the derivation usually doesn't get into this wacky idea of a shape axis and a magnitude axis, and just considers J to be a function of a single variable ε.)
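Here's a numerical sketch of the whole scheme for the arc length problem, with f and η chosen by me (f(x) = x is the path we're betting is minimal, and η(x) = sin(πx) vanishes at the endpoints as required). Sweep ε and watch J(ε) bottom out at ε = 0:

```python
import numpy as np

def J_of_eps(eps, n=2001):
    # J(eps) = arc length of y = f + eps*eta on [0, 1],
    # with f(x) = x and eta(x) = sin(pi*x)
    x = np.linspace(0.0, 1.0, n)
    yp = 1.0 + eps * np.pi * np.cos(np.pi * x)   # y' = f' + eps*eta'
    integrand = np.sqrt(1.0 + yp**2)
    return float(np.sum(0.5 * (integrand[:-1] + integrand[1:]) * np.diff(x)))

for eps in (-0.2, -0.1, 0.0, 0.1, 0.2):
    print(eps, J_of_eps(eps))
# J(0) ~ 1.4142 (the straight line); every other eps comes out longer.
```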
This is the crux of the derivation, so pause here and make sure you follow the logic. It might help if I point out we are not at a point: Eureka! There's the answer! We are at a point: Hmm. Let's see if this leads somewhere.
Summarizing what we have so far in symbols:
J[y] = ∫a,b F(x,f(x)+ε⋅η(x), f'(x)+ε⋅η'(x)) dx = J(ε)
and
dJ/dε |ε=0 = 0
Note if y(x) = f(x) + ε⋅η(x) then, taking the derivative of both sides, we obtain y'(x) = f'(x) + ε⋅η'(x) which is what I substituted for y'(x) as the third argument in the integrand.
We now need the following derivative:
d/dε ∫a,b F(x,f(x)+ε⋅η(x), f'(x)+ε⋅η'(x)) dx |ε=0 = 0
Move the derivative inside the integral:
∫a,b d/dε [ F(x, f(x)+ε⋅η(x), f'(x)+ε⋅η'(x)) ]ε=0 dx = 0 [3]
Footnote: There are conditions an integrand must satisfy in order to move the derivative inside the integral. You can find them in any advanced calculus textbook. Long story short: most functions you run into satisfy the conditions.
Apply the chain rule to compute the derivative of the integrand:
dF/dε = ∂F/∂x ⋅ dx/dε
+ ∂F/∂f ⋅ df/dε
+ ∂F/∂f' ⋅ df'/dε
But the second argument of F is y(x) = f(x)+ε⋅η(x), so its derivative with respect to ε (the df/dε above) is η(x). Likewise, the third argument is y'(x) = f'(x)+ε⋅η'(x), so df'/dε = η'(x). Finally, x is not a function of ε, so dx/dε = 0. Substituting all this, we obtain:
∫a,b [ ∂F/∂f ⋅ η(x) + ∂F/∂f' ⋅ η'(x) ] dx = 0 [4]
I've dropped the |ε=0 that was in [3] because ε doesn't appear in [4].
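You can have sympy check Equation [4] on the arc length problem. The catch: [4] only holds when f is the actual minimizer, so this sketch assumes the answer (f(x) = x on [0, 1]) and uses η(x) = sin(πx), both my choices. Differentiate the integrand with respect to ε, set ε = 0, integrate, and you should get exactly zero:

```python
import sympy as sp

x, eps = sp.symbols('x epsilon')
f   = x                       # the path we expect to be minimal
eta = sp.sin(sp.pi * x)       # vanishes at x = 0 and x = 1
y   = f + eps * eta

F  = sp.sqrt(1 + sp.diff(y, x)**2)   # arc length integrand with y plugged in
dF = sp.diff(F, eps).subs(eps, 0)    # derivative of the integrand at eps = 0

print(sp.integrate(dF, (x, 0, 1)))   # 0
```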
We are almost done. But before jumping to the end, let's look at a wrong turn we might be tempted to try at this point.
A Brief Detour into a Dead End
The integral in [4] equals zero no matter which η(x) we picked. The standard trick when handed this sort of thing is to conclude the integrand itself is identically zero. So we could write:
∂F/∂f ⋅ η + ∂F/∂f' ⋅ η' = 0
Unfortunately, this doesn't get us anywhere. There are two functions here: f(x) -- the function we're looking for -- plus this bastard function η(x) we're still dragging around. Put simply: one equation, two unknowns. It isn't that the idea is wrong. The problem is it's not useful.
The Home Stretch
We need some way to get η(x) out of the picture. Is there anywhere η(x) is zero? Yes there is: at the end points, i.e., at the integral limits. When do we evaluate things in an integral at the integral limits? After performing the integration.
Let's rewrite Equation [4] in a little simpler form:
∫a,b [ η ⋅ ∂F/∂f + η' ⋅ ∂F/∂f' ] dx = 0
Now I'm going to write this as the sum of two integrals to make it easier to describe the final magic trick:
∫a,b η ⋅ ∂F/∂f dx + ∫a,b η' ⋅ ∂F/∂f' dx = 0 [5]
Recall the doodad on integration by parts I told you to look at earlier. The gist is:
∫a,b foo' ⋅ bar dx = foo ⋅ bar |a,b – ∫a,b foo ⋅ bar' dx
We can apply this to the second integral in [5] to get:
∫a,b η' ⋅ ∂F/∂f' dx = η ⋅ ∂F/∂f' |a,b – ∫a,b η ⋅ d/dx [ ∂F/∂f' ] dx
However, η(x) is equal to zero at the endpoints! The first term on the RHS vanishes and we can write Equation [5] as:
∫a,b η ⋅ ∂F/∂f dx − ∫a,b η ⋅ d/dx [ ∂F/∂f' ] dx = 0
Or, putting everything back into one integral and rearranging:
∫a,b η { ∂F/∂f − d/dx [ ∂F/∂f' ] } dx = 0
Now we break out the argument that if an integral is zero, it must be that the integrand is zero:
η(x) { ∂F/∂f − d/dx [ ∂F/∂f' ] } = 0
Last time we tried this argument, it went nowhere. But now we have something nice to work with. We coaxed η(x) out front; the integrand has the form of a product η(x) { stuff }. And η(x) can be anything. It follows that the quantity in the brackets must be identically zero (see footnote below):
∂F/∂f − d/dx [ ∂F/∂f' ] = 0
We have obtained a differential equation for f(x). The function f(x) that satisfies this differential equation also minimizes the functional J[y] in Equation [2]. We have established that connection above, step by tiny step.
This is the famous Euler-Lagrange equation. We may or may not be able to solve it when we apply it to a given problem. But at least we have some idea how to solve an ODE. Our chances of minimizing a functional have gone from hopeless to possible.
Footnote: A formal proof of the final step used in the derivation appears in most VC textbooks and goes by the fancy name The Fundamental Lemma of Variational Calculus. We have a product of two functions foo and bar. If we play a game where you give me a bar, I get to pick any foo I want, and you lose the game if the integral of foo ⋅ bar is not zero, then your only sure bet is picking bar = 0. The proof of the FLoVC is a little more involved, but that's basically the idea.
Footnote: Looking this over during proofreading, I became concerned that by breaking things down into such a large collection of small steps I have made the derivation look more complicated than it is. To be sure, there are subtleties and twists that may take some effort to internalize, but the sheer length here may make things look more inscrutable than they really are. The derivation of the Euler-Lagrange equation on Wikipedia is about a half-dozen lines; you might take a gander at it if you're having trouble seeing the forest in my trees.
Finally: Minimizing Arc Length
We now return to the arc length example and demonstrate the Euler-Lagrange equation does indeed predict the shortest path between two points is a line.
Our functional is the following:
J[y] = ∫a,b √ [ 1 + ( y' )² ] dx
i.e., F(x, y, y') = √ [ 1 + ( y' )² ]. We now know the y(x) that minimizes the functional satisfies the following differential equation:
∂F/∂y − d/dx [ ∂F/∂y' ] = 0 [6]
Footnote: Earlier, I used f(x) as the minimal function and the Euler-Lagrange equation [6] was written in terms of f and f'. Now I'm writing it in terms of y and y'. Hopefully you can grok that it doesn't matter what letter we use. Confusing, perhaps, but mathematics runneth over with this sort of evil so I thought I would include an example here as a bonus lesson.
The first term in [6] is zero (because y doesn't appear explicitly in F). We're left with (after multiplying through by −1):
d/dx [ ∂F/∂y' ] = 0
or:
∂F/∂y' = a
for some constant "a" (anytime you're given "derivative of wawa is zero" you know immediately wawa = constant).
Computing the partial of √ [ 1 + (y')² ] with respect to y' we obtain y' / √ [ 1 + (y')² ].
Easy stupid mistake: You may be tempted to write ∂F/∂y' = y' ⋅ y'' / √ [ 1 + (y')² ], which is WRONG. That is not ∂F/∂y'; it is dF/dx, the total derivative of F with respect to x. If it helps, let u = y'(x) and compute ∂F/∂u.
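A one-line sanity check of that partial derivative, using the u = y' substitution suggested above (again assuming sympy is available):

```python
import sympy as sp

u = sp.symbols('u')
print(sp.diff(sp.sqrt(1 + u**2), u))   # u/sqrt(u**2 + 1)
```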
Substituting:
y' / √ [ 1 + (y')² ] = a
Square both sides, solve for (y')², and take a square root to obtain:
y' = a / √ [ 1 − a² ]
The RHS is still just a constant. Let's rename it "m":
y' = m
Integrating, we obtain:
y = mx + b
where "b" is a constant of integration. This is the equation of a line. Yay!
Footnote: The Wikipedia page on VC includes this example but solves the differential equation in a different way. I'm just showing you an alternate approach. Pick the one you like.
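One more footnote-ish aside: if you'd rather let a computer grind out the Euler-Lagrange equation, recent versions of sympy ship a helper that does exactly that. Applied to the arc length integrand it hands back an equation equivalent to y''(x) = 0, i.e., a line (a sketch, assuming the helper is available in your sympy):

```python
import sympy as sp
from sympy.calculus.euler import euler_equations

x = sp.Symbol('x')
y = sp.Function('y')
F = sp.sqrt(1 + sp.diff(y(x), x)**2)   # the arc length integrand

print(euler_equations(F, y(x), x))
# Something like [Eq(-y''/(1 + y'**2)**(3/2), 0)], which forces y'' = 0.
```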
Epilogue
We have proven the shortest path between two points is a straight line, and it only took 10,000 words and a dozen pages of calculus. I'm being facetious of course; the true value of what we've done is the wider world of problems waiting for you. There's the brachistochrone problem of bending a wire to make a bead slide from one end to the other in shortest time. There's the isoperimetric problem of finding the shape with a given perimeter that encloses maximal area (note to finite element jocks: "isopARAmetric" in FEM means something different). In the modern world, there's the problem of vectoring a fighter jet to intercept an incoming bomber in minimal time, or selecting the flight path for a commercial jet that uses minimum fuel. And of course there is modern physics, quantum or otherwise, where you can't get two steps in the door without bumping into a variational something or other.
The details get more detailed and the differential equations get harder to solve, but you now have all the conceptual tools you'll need. I suppose you could have taken the Euler-Lagrange equation as a starting point and gone from there. Hopefully, however, the effort to look behind the curtain was worth it. You've seen an example of how calculus is stretched and extruded to solve a new class of problems. If you can't solve a problem, recast it as a problem you can solve. That principle is probably 99% of mathematical progress (the other 1% is whiskey). These sorts of exercises help to up your game, no matter what the specific application. As my advisor used to say: If you want to make the problems you're working on seem easier, work on harder problems.
Recommended Reading
To wrap things up, let me point you to a few conventional treatments that I have found helpful and that you might, too. Disclaimer: I have no affiliation with the authors or institutions listed in these sources, most of whom would be shocked and appalled by my approach to explaining mathematics.
Calculus of Variations -- Bobby Weinstock
Weinstock's Calculus of Variations is the alpha and omega of variational calculus textbooks. It's inexpensive, short, and superbly written. Includes applications from classical and quantum mechanics, elasticity, optics, and of course the arc length problem. To be honest, I'm not sure why people kept writing VC textbooks after Weinstock was published. Faculty gotta eat, I suppose. Bonus: buy it used and get old-book smell for free.
Lecture Notes -- University of Houston
Google any variation (zing!) on the phrase "calculus of variations" and you'll get a mountain of potential help. This is just one search result I particularly liked. It looks to be a collection of lecture slides, in pdf form. Lots of worked examples. Gets into a few of the mathematical formalities I glossed over without being a pain about it. Very nice.
AVMM Class Notes -- University of Colorado
Apparently the University of Colorado is/was developing a variational calculus course in its aerospace engineering program and created this pdf collection of class notes. The website lists them as "in progress." There's a few typos and missing figures, but most seem reasonably complete. I think the author is Carlos Felippa, who is a familiar name in computational mechanics and an outstanding educator. That being said, the treatment doesn't shy away from formal mathematics and eventually heads off into some advanced topics. Think of it as VC from an engineer's perspective. YMMV.