Saturday, May 6, 2017

Crux Move #11: Stupid Regression Tricks


A CRUX MOVE is a math trick. One might be all that stands between you and solving your problem. Keep a collection of them at your fingertips and your mathematical life will get easier.

Suppose you have / are in the process of fitting a least squares line to an (x,y) cloud of data points. The following truths will often come in handy:

#11.1   y-bar = α-hat + β-hat ⋅ x-bar

That is, the regression line passes through the mean of the data. Here, x-bar and y-bar are, respectively, the x and y means of the data, and α-hat and β-hat are, respectively, the y-intercept and slope of the fitted line.

#11.2   Let ei = yi - y-hati. We have:
  Σ ei = Σ xi ⋅ ei = Σ y-hati ⋅ ei = 0

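That is, the sum of the errors, the sum of the errors weighted by xi, and the sum of the errors weighted by the fitted values of y each equal zero (all sums are over i and go from 1 to n, where n is the number of data points). Here, xi and yi are the x and y data, and y-hati are the predicted values of y for a given value of x (i.e., the values of y that lie on the regression line). Ergo, ei is the error in data point i. The first two sums vanish because they are exactly the conditions you get by setting the derivatives of Σ ei² with respect to α-hat and β-hat to zero; the third follows because y-hati = α-hat + β-hat ⋅ xi, so it is just α-hat times the first sum plus β-hat times the second.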

#11.3   Σ y-hati = Σ yi

That is, the sum of the fitted y values is the same as the sum of the y data.
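If you would rather see these truths hold on actual numbers, here is a minimal sketch in plain Python. The toy data and every variable name are my own assumptions, picked only for illustration. The coefficients come from the textbook closed-form solution of the two normal equations, so the check of #11.1 is not circular. Errors are taken as data minus fitted value; flipping the sign leaves every sum at zero.

```python
import math

# Hypothetical toy data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

# Raw sums that appear in the two normal equations.
sum_x = sum(xs)
sum_y = sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_xx = sum(x * x for x in xs)

# Closed-form solution of the normal equations.
denom = n * sum_xx - sum_x ** 2
beta_hat = (n * sum_xy - sum_x * sum_y) / denom
alpha_hat = (sum_y * sum_xx - sum_x * sum_xy) / denom

x_bar = sum_x / n
y_bar = sum_y / n
y_hat = [alpha_hat + beta_hat * x for x in xs]
e = [y - yh for y, yh in zip(ys, y_hat)]  # error in each data point


def is_zero(value):
    """True when value is zero up to floating-point noise."""
    return math.isclose(value, 0.0, abs_tol=1e-9)


# 11.1: the line passes through (x-bar, y-bar).
move_11_1 = math.isclose(y_bar, alpha_hat + beta_hat * x_bar)
# 11.2: plain, x-weighted, and y-hat-weighted error sums all vanish.
move_11_2 = (
    is_zero(sum(e))
    and is_zero(sum(x * ei for x, ei in zip(xs, e)))
    and is_zero(sum(yh * ei for yh, ei in zip(y_hat, e)))
)
# 11.3: fitted values and data have the same sum.
move_11_3 = math.isclose(sum(y_hat), sum(ys))

print(move_11_1, move_11_2, move_11_3)  # True True True
```

The comparisons use a small tolerance because floating-point sums rarely land on exactly zero.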

I tackle a few applications (and explain my odd *-hat notation) after the jump.





E X A M P L E   1
Find α-hat.

Finding the y-intercept of the regression line is the whole raison d'être of Crux Move 11.1. First, the slope β-hat is obtained from the Normal Equation: β-hat = SSxy / SSxx where SSxy = Σ (xi - x-bar)(yi - y-bar) and SSxx = Σ (xi - x-bar)². We then find the intercept from y-bar = α-hat + β-hat ⋅ x-bar:

    α-hat = y-bar − β-hat ⋅ x-bar

Easy peasy.
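As a sketch of the recipe in code (plain Python; the function name fit_line and the toy data are hypothetical, not anything from a library):

```python
def fit_line(xs, ys):
    """Return (alpha_hat, beta_hat) for the least squares line through the data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope from the Normal Equation: beta_hat = SSxy / SSxx.
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    beta_hat = ss_xy / ss_xx
    # Intercept from Crux Move 11.1: the line passes through (x_bar, y_bar).
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


# Hypothetical toy data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
alpha_hat, beta_hat = fit_line(xs, ys)
print(round(alpha_hat, 6), round(beta_hat, 6))  # 2.2 0.6
```

For this data x-bar = 3, y-bar = 4, SSxy = 6 and SSxx = 10, so the slope is 0.6 and the intercept is 4 − 0.6 ⋅ 3 = 2.2.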

Footnote: I use the notation *-hat to represent estimates of population parameters (e.g., I write β-hat for the regression slope, which, as you may recall, is really just an approximation of the population parameter β). Usually such quantities are written with a literal hat (^) drawn on top of the letter, but I couldn't figure out the HTML to do that reliably because I am dum. Ditto for x-bar and y-bar, regarding putting bars over things.


E X A M P L E   2
Define three quantities:

    SST = Σ (yi - y-bar)²
    SSR = Σ (y-hati - y-bar)²
    SSE = Σ (yi - y-hati)²

In words, the total sum of squares (SST), the regression sum of squares (SSR), and the error sum of squares (SSE). A famous and useful expression describes the relation of these three quantities:

    SST = SSR + SSE

Let's derive this. We begin with the weird (which is itself a crux move):

    (yi - y-bar) = (yi - y-bar)

Add and subtract y-hati on the RHS (which is itself a crux move):

    (yi - y-bar) = (yi - y-bar) + y-hati - y-hati

Rearrange:

    (yi - y-bar) = (y-hati - y-bar) + (yi - y-hati)

Now square both sides and sum:

    Σ (yi - y-bar)² = Σ [ (y-hati - y-bar)²
                        + (yi - y-hati)²
                        + 2 (y-hati - y-bar) (yi - y-hati) ]

or:

    Σ (yi - y-bar)² = Σ (y-hati - y-bar)²
                    + Σ (yi - y-hati)²
                    + 2 Σ (y-hati - y-bar) (yi - y-hati)

that is:

    SST = SSR + SSE + 2 Σ (y-hati - y-bar) (yi - y-hati)

If we can show the third term of the RHS is equal to zero, we are done. We have:

    2 Σ (y-hati - y-bar) (yi - y-hati)
        = 2 Σ y-hati (yi - y-hati) − 2 ⋅ y-bar Σ (yi - y-hati)
        = 2 Σ y-hati ⋅ ei − 2 ⋅ y-bar Σ ei

But by Crux Move 11.2, both sums in this expression are zero. Ergo, we are done.
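A quick numerical sanity check of the decomposition, as a minimal sketch in plain Python. The toy data and variable names are assumptions made for illustration; the sketch confirms the identity on one data set and is no substitute for the derivation.

```python
import math

# Hypothetical toy data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least squares fit: slope from SSxy / SSxx, intercept from Crux Move 11.1.
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
ss_xx = sum((x - x_bar) ** 2 for x in xs)
beta_hat = ss_xy / ss_xx
alpha_hat = y_bar - beta_hat * x_bar
y_hat = [alpha_hat + beta_hat * x for x in xs]

# The three sums of squares, straight from their definitions.
sst = sum((y - y_bar) ** 2 for y in ys)
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))

# The cross term that the derivation shows must vanish.
cross = 2 * sum((yh - y_bar) * (y - yh) for y, yh in zip(ys, y_hat))

print(round(sst, 6), round(ssr, 6), round(sse, 6))  # 6.0 3.6 2.4
print(math.isclose(sst, ssr + sse))                 # True
print(math.isclose(cross, 0.0, abs_tol=1e-9))       # True
```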

Footnote: The error sum of squares is aka the residual sum of squares, but I needed a name that didn't begin with the letter R because we already had an SSR. So I used "error," hence SSE.


E X A M P L E   3
Derive the useful identity: SSR = β-hat ⋅ SSxy

Begin with the definition:

    SSR = Σ (y-hati - y-bar)²

Substitute y-hati = α-hat + β-hat ⋅ xi and, by Crux Move 11.1, y-bar = α-hat + β-hat ⋅ x-bar:

    SSR = Σ ( [ α-hat + β-hat ⋅ xi ] − [ α-hat + β-hat ⋅ x-bar ] )²
        = Σ (β-hat ⋅ xi − β-hat ⋅ x-bar)²
        = Σ β-hat² (xi - x-bar)²
        = β-hat² Σ (xi - x-bar)²
        = β-hat² ⋅ SSxx
        = β-hat ⋅ β-hat ⋅ SSxx
        = β-hat (SSxy / SSxx) SSxx
        = β-hat ⋅ SSxy

QED. In the next-to-last step, we deployed the definition of β-hat from the Normal Equation (β-hat = SSxy / SSxx).
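The identity is also easy to spot-check numerically. Below is a minimal sketch in plain Python, with hypothetical toy data and variable names of my own choosing. It computes SSR from its definition and compares it with β-hat ⋅ SSxy and with β-hat² ⋅ SSxx.

```python
import math

# Hypothetical toy data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
ss_xx = sum((x - x_bar) ** 2 for x in xs)

# Slope from the Normal Equation, intercept from Crux Move 11.1.
beta_hat = ss_xy / ss_xx
alpha_hat = y_bar - beta_hat * x_bar

# SSR from its definition, using the fitted values.
y_hat = [alpha_hat + beta_hat * x for x in xs]
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)

matches_beta_ss_xy = math.isclose(ssr, beta_hat * ss_xy)
matches_beta_sq_ss_xx = math.isclose(ssr, beta_hat ** 2 * ss_xx)
print(matches_beta_ss_xy, matches_beta_sq_ss_xx)  # True True
```

Here SSxy = 6 and β-hat = 0.6, so both routes give SSR = 3.6.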

Footnote: I find an easy (well, easier) way to remember this expression is: SSR is proportional to SSxy and the constant of proportionality happens to be β-hat. Interestingly, the Normal Equation tells us SSxy is itself proportional to SSxx. The constant of proportionality is, again, β-hat (i.e., SSxy = β-hat ⋅ SSxx). Crikey, is there anything β-hat can't do? It's like the Swiss army knife of regression.

