Lagrange multipliers get my vote as the most wonderful thing in all of calculus, narrowly edging out the red dress Donny Kerabatsos' sister wore to the Calc II final. That was also wonderful, although I don't remember it too clearly, busy as I was bleeding from the ear holes at the time.
Speaking of innocent flowers, who decided Lagrange multipliers should be taught in terms of parallelifying vectors in function space? Orthogonal gradients and directional derivatives and level curves and dot products?? Remember it that way and a month from now you'll be all: What was that Lagrange multiplier thingy about again?
Remember it my way and a month from now you'll be all: Jengo!
(I suppose now you're expecting a side-by-side comparison of the two approaches, listing in detail the strengths and weaknesses of each and in so doing see a case pleaded for the superior. I don't really see that happening. LabKitty is an unhinged Internet demigod, not some Madison Avenue billboard pirate. The product here is revolution, mister, not soap flakes.)
So, come, join me at the freeway on-ramp clutch of discarded refrigerator boxes for a lecture on how to use, and remember, Lagrange multipliers.
The goal is constrained optimization. Let's use two independent variables for illustration. We have a function, say, f(x,y), that we'd like to maximize or minimize. But we also have to satisfy a constraint of the form g(x,y) = c. Form the augmented function F(x,y,λ) = f(x,y) + λ[g(x,y) – c]. Set all of the first partial derivatives equal to zero: ∂F/∂x = ∂F/∂y = ∂F/∂λ = 0. Solve for (x,y).
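If you'd rather see the recipe as code, here's a minimal sketch in Python using SymPy (assuming you have it installed). The f, g, and c below are placeholders I made up -- maximize x + y on the unit circle -- so swap in your own:

```python
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)

f = x + y                  # function to optimize (made-up placeholder)
g = x**2 + y**2            # constraint left-hand side (made-up placeholder)
c = 1                      # constraint right-hand side

F = f + lam * (g - c)      # augment

# Take all first partials and set them to zero; note that
# sp.diff(F, lam) hands back the constraint g - c, as advertised.
eqs = [sp.diff(F, v) for v in (x, y, lam)]
print(sp.solve(eqs, [x, y, lam], dict=True))
```

SymPy returns both critical points; x = y = √2/2 is the max and x = y = –√2/2 is the min.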
That's it. That's all you have to remember. It would fit on a tattoo: Augment / Take-Derivatives. No gradients. No vectors. You don't even have to draw a picture. The crafty part is that the derivative with respect to λ gives you back the constraint(s). Some authors don't note this and just write down the constraint. However, as Sun Tzu teaches us: good notation is the key to easy remembering. Also: always kill the male offspring of your enemies (somewhat less applicable, presently).
Footnote: "Solve for (x,y)" can peg the difficulty meter anywhere from easy to impossible depending on the problem. In general, you're facing simultaneous nonlinear equations and there's no cut-and-dry method for solving such a beast like there is for linear systems. You can only get good at it by lots of practice -- Lagrange multiplier problems will be nothing but unpleasant unless you up your game. And things tend to get uglier the more dimensions you're working in.
Bonus: Here are two sets of lecture notes that discuss solving the simultaneous equations that arise in Lagrange multiplier problems (including practice problems): PDF-1, PDF-2. If either of the links has gone south (maybe Berkeley finally fell into the bay, you never know), be a dear and shoot me an email?
Footnote: You can also write the augmented function with a minus sign, e.g., F(x,y,λ) = f(x,y) – λ[g(x,y) – c]. If I do it one way and you do it the other, we'll get the same answer, but my λ will be the negative of yours. Also, what I call an "augmented function" some people call a "Lagrangian," but there is also a Lagrangian in physics, which is sorta a different thing, so maybe we should stick with augmented function. (Optimization wonks, meanwhile, reserve the term "objective function" for the original f(x,y). FYI.)
Footnote: As you prolly know, λ is called a Lagrange multiplier. Why? Because it was invented by Lagrange and it multiplies. Rumor has it Lagrange called λ a "topogononical factonator" in his original draft. We should give thanks someone stepped in and selected marginally-better terminology.
Footnote: You usually don't care what λ is. You just use it to get what you want. But there are applications where λ has a physical interpretation. Google "physical meaning of lagrange multiplier" for inroads.
Let's do some examples.
Examples
1) Find the point on the hyperbola xy = 3 that is closest to the origin.
Minimizing the distance to the origin also minimizes the square of the distance to the origin, and we work with the latter to avoid a square root. So: minimize f(x,y) = x² + y² subject to the constraint xy = 3. Form the augmented function F(x,y,λ) = x² + y² + λ[xy – 3]. Taking derivatives:
∂F/∂x = 2x + λy
∂F/∂y = 2y + λx
∂F/∂λ = xy – 3
Setting these equal to zero and solving simultaneously gives (x,y) = (√3, √3) or (–√3, –√3). Plugging either of those into f(x,y) gives the same result, so take your pick. (Technically, the procedure only gives critical points. You have to plug them into the original function to determine whether you have a minimum or a maximum. In this case, f(x,y) doesn't have a maximum value -- it's a squared distance -- so the critical points must be minima.)
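For the skeptical, here's Example 1 run through SymPy (a sketch; SymPy assumed installed), which confirms the hand calculation:

```python
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)
F = x**2 + y**2 + lam*(x*y - 3)

sols = sp.solve([sp.diff(F, v) for v in (x, y, lam)], [x, y, lam], dict=True)
print(sols)  # x = y = sqrt(3) or x = y = -sqrt(3), with lam = -2 both times
print([(x**2 + y**2).subs(s) for s in sols])  # [6, 6] -- distance sqrt(6) either way
```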
Footnote: This example is from Denis Auroux's MIT lecture on iTunesU (see Postscript). Professor Auroux uses the vector explanation of Lagrange multipliers, so you can decide for yourself which approach is easier to remember.
2) Maximize f(x,y) = – [ x ln(x) + y ln(y) ] subject to x + y = 1.
Form the augmented function F(x,y,λ) = – x ln(x) – y ln(y) + λ[x + y – 1]. Take derivatives:
∂F/∂x = – (1 + ln(x)) + λ
∂F/∂y = – (1 + ln(y)) + λ
∂F/∂λ = x + y – 1
Setting these equal to zero and solving, we get x = y = 0.5. Is this a max or a min? Let's test another (x,y) pair that satisfies the constraint. Say (1,0). Then f(1,0) = 0 (taking 0·ln(0) = 0, its limiting value), which is less than f(0.5,0.5) = ln 2 ≈ 0.693. So our critical point is a max. QED.
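If you'd like to see the whole entropy curve, here's a quick numerical check (a sketch; NumPy assumed installed) that walks along the constraint x + y = 1:

```python
import numpy as np

# Along the constraint, y = 1 - x, so entropy is a function of x alone
def entropy(x):
    y = 1.0 - x
    return -(x * np.log(x) + y * np.log(y))

xs = np.linspace(0.01, 0.99, 99)
best = xs[np.argmax(entropy(xs))]
print(best, entropy(best))  # -> 0.5 and ~0.693 (= ln 2), matching the hand calculation
```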
Footnote: This example demonstrates the Maximum Entropy Principle (MAXENT), which you may see more of in future posts, considering that it's the bestest thing in the whole wide world. You can interpret x and y as the probabilities, respectively, of tossing heads or tails with a coin (note these must add to 1). We demonstrated that assigning equal probability to these two events maximizes the information entropy of the distribution, f(x,y). For a fair coin, MAXENT gives the same result as Laplace's Principle of Insufficient Reason, which simply assigns equal probability unless you have a reason not to. However, MAXENT is a much more powerful tool. Famously, Claude Shannon used entropy to bring order to the chaotic cat herd of communications theory, and Edwin Jaynes later elevated it into MAXENT. Check out David Applebaum's Probability and Information for a readable introduction (although maybe not the best introductory probability textbook. Confidential to David Applebaum: Dude, don't introduce measure on a Boolean algebra if you're just going to immediately jettison it in favor of power sets and ordinary integrals. It scares the bejezus out of people unnecessarily. Also: I love you man, but don't make up your own notation for conditional probability. It makes for angry reading.)
One more.
3) What is the largest volume box that can be constructed having base dimensions twice asdlfaj0230nrb...
Sorry, I just can't do it. If I see one more Lagrange multiplier exercise that involves building a biggest box or the cheapest fence or something something inside a sphere I'm gonna barf. Let's see if we can't come up with an exercise a little more relevant to 21st-century life.
3) An MQ-1 Predator drone flies a fixed circular orbit R kilometers in radius around grid location (0,0) in northeast Abzurdistan, where a high-value target is known to be hiding. If the CIA estimates the target's current position at grid location (X,Y), where should the Predator drop ordnance to maximize the probability of a kill if it can only target locations along its flight path?
The unconstrained solution would be to bomb the suspected location of the target, which is (X,Y). However, we're told the drone can only target locations along its flight path (thanks, Obama). So, in Lagrange-speak, we seek to minimize (x–X)² + (y–Y)² subject to the constraint x² + y² = R², where X, Y, and R are constants. Form the augmented function F(x,y,λ) = (x – X)² + (y – Y)² + λ[x² + y² – R²]. Take derivatives:
∂F/∂x = 2(x – X) + 2λx
∂F/∂y = 2(y – Y) + 2λy
∂F/∂λ = x² + y² – R²
Setting these equal to zero, we find x = X / (1+λ) and y = Y / (1+λ) (you have to check λ = –1 as a separate case, which corresponds to (X,Y) = (0,0)). Substituting into the constraint and letting d = X² + Y², we have λ = –1 ± √(d/R²). Taking the positive and negative square roots gives two opposite points on a diameter of the flight path; the one closer to (X,Y) is the target. Depending on the location of (X,Y), the ordnance blast radius, and the strength of the intel, we might blow up nothing; all we're doing is obtaining the highest probability of mission success.
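Here's the bookkeeping as code (a sketch; the X, Y, R values are made up for illustration):

```python
import numpy as np

X, Y, R = 4.0, 3.0, 2.0           # intel estimate and orbit radius (made-up numbers)

d = X**2 + Y**2
lam = -1.0 + np.sqrt(d / R**2)    # the + root picks the orbit point nearer (X,Y)
x, y = X / (1.0 + lam), Y / (1.0 + lam)

# Equivalently R*(X,Y)/sqrt(d): the point where the line from the origin
# to (X,Y) crosses the flight path.
print(x, y)                       # -> 1.6 1.2 (and x**2 + y**2 == R**2, as required)
```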
Admittedly, using Lagrange multipliers here was overkill (no pun intended). You can solve this problem by drawing a line from (0,0) to (X,Y) and targeting where that line crosses the flight path. However, it's easy to sexy this up: consider instead some pdf f(x,y) describing the target's probable location in the countryside and seek the maximum of f(x,y) within reach of the drone. No doubt there is at this moment some CIA wonk sitting in a cubicle at RAND doing this very thing, and much more besides.
Postscript
Okay, I watched Denis Auroux's Lagrange multiplier lecture on iTunes and he crafts the vector thing into a reasonably lucid explanation. So I'm willing to amend my ranting as follows: as far as deriving or drawing the method, the vector thing is acceptable. As for remembering or applying it, we do things my way. N'est-ce pas?