A CRUX MOVE is a math trick. One might be all that stands between you and solving your problem. Keep a collection of them at your fingertips and your mathematical life will get easier.
Suppose you have fit, or are in the process of fitting, a least squares line to an (x, y) cloud of data points. The following truths will often come in handy:
#11.1 y-bar = α-hat + β-hat ⋅ x-bar
That is, the regression line passes through the mean of the data. Here, x-bar and y-bar are, respectively, the x and y means of the data, and α-hat and β-hat are, respectively, the y-intercept and slope of the fitted line.
#11.2 Let ei = yi − y-hati. We have:
Σ ei = Σ xi ⋅ ei = Σ y-hati ⋅ ei = 0
That is, the sum of the errors, the sum of the errors weighted by xi, and the sum of the errors weighted by the fitted values of y all equal zero (all sums are over i and go from 1 to n, where n is the number of data points). Here, xi and yi are the x and y data, and y-hati are the predicted values of y for a given value of x (i.e., the values of y that lie on the regression line). Ergo, ei is the error in data point i.
#11.3 Σ y-hati = Σ yi
That is, the sum of the fitted y values is the same as the sum of the y data.
I tackle a few applications (and explain my odd *-hat notation) after the jump.
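Before those, here's a quick numerical sanity check. It's a minimal sketch in Python with NumPy: the data are made up, and the variable names (alpha_hat, beta_hat, y_hat, e) are just my transliteration of the notation above.

import numpy as np

# Made-up (x, y) cloud of points; any data will do for checking the identities.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=20)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=20)

# Least squares fit: np.polyfit returns [slope, intercept] for degree 1.
beta_hat, alpha_hat = np.polyfit(x, y, 1)

y_hat = alpha_hat + beta_hat * x   # fitted values
e = y - y_hat                      # error in each data point

# Crux Move 11.1: the line passes through (x-bar, y-bar).
print(np.isclose(y.mean(), alpha_hat + beta_hat * x.mean()))

# Crux Move 11.2: all three error sums vanish.
print(np.isclose(e.sum(), 0.0),
      np.isclose((x * e).sum(), 0.0),
      np.isclose((y_hat * e).sum(), 0.0))

# Crux Move 11.3: the fitted values and the y data have the same sum.
print(np.isclose(y_hat.sum(), y.sum()))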
E X A M P L E 1
Find α-hat.
Finding the y-intercept of the regression line is the whole raison d'être of Crux Move 11.1. First, the slope β-hat is obtained from the Normal Equation: β-hat = SSxy / SSxx, where SSxy = Σ (xi - x-bar)(yi - y-bar) and SSxx = Σ (xi - x-bar)^2. We then find the intercept from y-bar = α-hat + β-hat ⋅ x-bar:
α-hat = y-bar − β-hat ⋅ x-bar
Easy peasy.
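Here's the same recipe in code, again just a sketch with NumPy on made-up numbers (the names SS_xy, SS_xx, beta_hat, alpha_hat mirror the notation above).

import numpy as np

# Made-up data; the point is the recipe, not the numbers.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

x_bar, y_bar = x.mean(), y.mean()
SS_xy = np.sum((x - x_bar) * (y - y_bar))
SS_xx = np.sum((x - x_bar) ** 2)

beta_hat = SS_xy / SS_xx               # slope from the Normal Equation
alpha_hat = y_bar - beta_hat * x_bar   # intercept via Crux Move 11.1

print(beta_hat, alpha_hat)
# Agrees with a library fit (np.polyfit returns [slope, intercept]):
print(np.allclose([beta_hat, alpha_hat], np.polyfit(x, y, 1)))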
Footnote: I use the notation *-hat to represent estimates of population parameters (e.g., I write β-hat for the regression slope, which, as you may recall, is really just an approximation of the population parameter β). Usually such quantities are written with a literal hat (^) drawn on top of the letter, but I couldn't figure out the HTML to do that reliably because I am dum. Ditto for x-bar and y-bar, regarding putting bars over things.
E X A M P L E 2
Define three quantities:
SST = Σ (yi - y-bar)^2
SSR = Σ (y-hati - y-bar)^2
SSE = Σ (yi - y-hati)^2
In words, the total sum of squares (SST), the regression sum of squares (SSR), and the error sum of squares (SSE). A famous and useful identity relates these three quantities:
SST = SSR + SSE
Let's derive this. We begin with the trivially true statement:
(yi - y-bar) = (yi - y-bar)
Add and subtract a y-hat from the RHS (which is itself a crux move):
(yi - y-bar) = (yi - y-bar) + y-hati - y-hati
Rearrange:
(yi - y-bar) = (y-hati - y-bar) + (yi - y-hati)
Now square both sides and sum:
Σ (yi - y-bar)^2 = Σ [ (y-hati - y-bar)^2
+ (yi - y-hati)^2
+ 2 (y-hati - y-bar) (yi - y-hati) ]
or:
Σ (yi - y-bar)^2 = Σ (y-hati - y-bar)^2
+ Σ (yi - y-hati)^2
+ 2 Σ (y-hati - y-bar) (yi - y-hati)
that is:
SST = SSR + SSE + 2 Σ (y-hati - y-bar) (yi - y-hati)
If we can show the third term of the RHS is equal to zero, we are done. We have:
2 Σ (y-hati - y-bar) (yi - y-hati)
= 2 Σ y-hati (yi - y-hati) − 2 ⋅ y-bar Σ (yi - y-hati)
= 2 Σ y-hati ⋅ ei − 2 ⋅ y-bar Σ ei
But by Crux Move 11.2, both sums in this expression are zero. Ergo, we are done.
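A numerical spot check of the decomposition, as a sketch with NumPy on made-up data (SST, SSR, SSE, and the cross term are computed exactly as defined above):

import numpy as np

# Made-up (x, y) data; any cloud of points works.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=30)
y = 1.0 + 0.8 * x + rng.normal(0, 2, size=30)

beta_hat, alpha_hat = np.polyfit(x, y, 1)
y_hat = alpha_hat + beta_hat * x

SST = np.sum((y - y.mean()) ** 2)
SSR = np.sum((y_hat - y.mean()) ** 2)
SSE = np.sum((y - y_hat) ** 2)
cross = 2 * np.sum((y_hat - y.mean()) * (y - y_hat))

print(np.isclose(SST, SSR + SSE))  # the decomposition holds
print(np.isclose(cross, 0.0))      # because the cross term vanishes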
Footnote: The error sum of squares is aka the residual sum of squares, but I needed a name that didn't begin with the letter R because we already had an SSR. So I used "error," hence SSE.
E X A M P L E 3
Derive the useful identity: SSR = β-hat ⋅ SSxy
Begin with the definition:
SSR = Σ (y-hati - y-bar)^2
Substitute for y-hati and y-bar (using the fitted-line equation for the former and Crux Move 11.1 for the latter):
SSR = Σ ( [ α-hat + β-hat ⋅ xi ] − [ α-hat + β-hat ⋅ x-bar ] )^2
= Σ (β-hat ⋅ xi − β-hat ⋅ x-bar)^2
= Σ β-hat^2 (xi - x-bar)^2
= β-hat^2 Σ (xi - x-bar)^2
= β-hat^2 ⋅ SSxx
= β-hat ⋅ β-hat ⋅ SSxx
= β-hat (SSxy / SSxx) SSxx
= β-hat ⋅ SSxy
QED. In the next-to-last step, we deployed the definition of β-hat from the Normal Equation (β-hat = SSxy / SSxx).
Footnote: I find an easy (well, easier) way to remember this expression is: SSR is proportional to SSxy and the constant of proportionality happens to be β-hat. Interestingly, the Normal Equation tells us SSxy is itself proportional to SSxx. The constant of proportionality is, again, β-hat (i.e., SSxy = β-hat ⋅ SSxx). Crikey, is there anything β-hat can't do? It's like the Swiss army knife of regression.
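And a quick check of both proportionalities, once more just a sketch with NumPy on made-up data:

import numpy as np

# Made-up data once more.
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=25)
y = 3.0 - 0.4 * x + rng.normal(0, 1, size=25)

x_bar, y_bar = x.mean(), y.mean()
SS_xy = np.sum((x - x_bar) * (y - y_bar))
SS_xx = np.sum((x - x_bar) ** 2)
beta_hat = SS_xy / SS_xx
alpha_hat = y_bar - beta_hat * x_bar
y_hat = alpha_hat + beta_hat * x

SSR = np.sum((y_hat - y_bar) ** 2)
print(np.isclose(SSR, beta_hat * SS_xy))    # SSR = beta_hat * SSxy
print(np.isclose(SS_xy, beta_hat * SS_xx))  # SSxy = beta_hat * SSxx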
Previous Move: Condition On