LabKitty: Crux Move #11: Stupid Regression Tricks

A CRUX MOVE is a math trick. One might be all that stands between you and solving your problem. Keep a collection of them at your fingertips and your mathematical life will get easier.

Suppose you have fit (or are in the process of fitting) a least squares line to an (x,y) cloud of data points. The following truths will often come in handy:

#11.1 y-bar = α-hat + β-hat ⋅ x-bar

That is, the regression line passes through the mean of the data. Here, x-bar and y-bar are, respectively, the x and y means of the data, and α-hat and β-hat are, respectively, the y-intercept and slope of the fitted line.

#11.2 Let e_i = y_i - y-hat_i. We have:
Σ e_i = Σ x_i ⋅ e_i = Σ y-hat_i ⋅ e_i = 0

That is, the sum of the errors, the sum of the errors weighted by x_i, and the sum of the errors weighted by the fitted values of y are all equal to zero (all sums are over i and go from 1 to n, where n is the number of data points). Here, x_i and y_i are the x and y data, and y-hat_i are the predicted values of y for a given value of x (i.e., the values of y that lie on the regression line). Ergo, e_i is the error in data point i.
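
Where do these three sums come from? Here is a sketch of the standard argument, assuming the fitted line is the one that minimizes the sum of squared errors. The symbol S is a label introduced here for illustration, and the sums are written in terms of y_i − ŷ_i (the sign convention is immaterial, since every sum is zero either way). Setting the two partial derivatives of S to zero gives the first two sums, and the third follows from those two because ŷ_i = α̂ + β̂ x_i.

```latex
\begin{align*}
S(\alpha,\beta) &= \sum_{i=1}^{n} \left( y_i - \alpha - \beta x_i \right)^2 \\
0 = \left.\frac{\partial S}{\partial \alpha}\right|_{\hat\alpha,\hat\beta}
  &= -2 \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right) \\
0 = \left.\frac{\partial S}{\partial \beta}\right|_{\hat\alpha,\hat\beta}
  &= -2 \sum_{i=1}^{n} x_i \left( y_i - \hat{y}_i \right) \\
\sum_{i=1}^{n} \hat{y}_i \left( y_i - \hat{y}_i \right)
  &= \hat\alpha \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)
   + \hat\beta \sum_{i=1}^{n} x_i \left( y_i - \hat{y}_i \right) = 0
\end{align*}
```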

#11.3 Σ y-hat_i = Σ y_i

That is, the sum of the fitted y values is the same as the sum of the y data.
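
All three moves are easy to check numerically. What follows is a minimal sketch in plain Python, not a definitive implementation. The data are made up for illustration, and `solve_normal_equations` is a hypothetical helper that gets α-hat and β-hat by solving the two normal equations directly, so that 11.1 is tested rather than assumed. The errors are taken as y_i minus y-hat_i; flipping the sign changes nothing, because every sum is zero either way.

```python
# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def solve_normal_equations(x, y):
    """Return (alpha_hat, beta_hat) by solving the 2x2 normal equations

        n      * alpha + sum(x)   * beta = sum(y)
        sum(x) * alpha + sum(x^2) * beta = sum(x*y)

    with Cramer's rule.
    """
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx  # zero only when every x value is identical
    alpha_hat = (sy * sxx - sx * sxy) / det
    beta_hat = (n * sxy - sx * sy) / det
    return alpha_hat, beta_hat


alpha_hat, beta_hat = solve_normal_equations(x, y)
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Fitted values and errors for each data point.
y_hat = [alpha_hat + beta_hat * xi for xi in x]
e = [yi - yh for yi, yh in zip(y, y_hat)]

# Each quantity below should be zero up to floating-point rounding.
move_11_1 = y_bar - (alpha_hat + beta_hat * x_bar)
move_11_2 = (
    sum(e),
    sum(xi * ei for xi, ei in zip(x, e)),
    sum(yh * ei for yh, ei in zip(y_hat, e)),
)
move_11_3 = sum(y_hat) - sum(y)

print("11.1 residual:", move_11_1)
print("11.2 sums:    ", move_11_2)
print("11.3 residual:", move_11_3)
```

Expect tiny nonzero values rather than exact zeros; that is floating-point rounding, not a failure of the identities.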

I tackle a few applications (and explain my odd *-hat notation) after the jump.

E X A M P L E 1

Find α-hat.

Finding the y-intercept of the regression line is the whole raison d'être of Crux Move 11.1. First, the slope β-hat is obtained from the Normal Equation: β-hat = SS_xy / SS_xx where SS_xy = Σ (x_i - x-bar)(y_i - y-bar) and SS_xx = Σ (x_i - x-bar)². We then find the intercept from y-bar = α-hat + β-hat ⋅ x-bar:

α-hat = y-bar − β-hat ⋅ x-bar

Easy peasy.
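
The recipe translates almost line for line into code. Here is a minimal sketch in plain Python, assuming the data arrive as two equal-length lists; the function name `fit_line` and the numbers are made up for illustration.

```python
def fit_line(x, y):
    """Return (alpha_hat, beta_hat) for the least squares line through (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = ss_xy / ss_xx              # slope: SS_xy / SS_xx
    alpha_hat = y_bar - beta_hat * x_bar  # intercept, from Crux Move 11.1
    return alpha_hat, beta_hat


# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

alpha_hat, beta_hat = fit_line(x, y)
print("alpha-hat =", alpha_hat)
print("beta-hat  =", beta_hat)
```

For these made-up numbers the slope comes out near 1.99 and the intercept near 0.05. Note that the sketch divides by SS_xx, so it will fail if every x value is the same.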

Footnote: I use the notation *-hat to represent estimates of population parameters (e.g., I write β-hat for the regression slope, which, as you may recall, is really just an approximation of the population parameter β). Usually such quantities are written with a literal hat (^) drawn on top of the letter, but I couldn't figure out the HTML to do that reliably because I am dum. Ditto for x-bar and y-bar, regarding putting bars over things.

E X A M P L E 2

Define three quantities:

    SST = Σ (y_i - y-bar)²
    SSR = Σ (y-hat_i - y-bar)²
    SSE = Σ (y_i - y-hat_i)²

In words, these are the total sum of squares (SST), the regression sum of squares (SSR), and the error sum of squares (SSE). A famous and useful expression relates these three quantities:

    SST = SSR + SSE

Let's derive this. We begin with the weird (which is itself a crux move):

    (y_i - y-bar) = (y_i - y-bar)

Add and subtract y-hat_i on the RHS (which is itself a crux move):

    (y_i - y-bar) = (y_i - y-bar) + y-hat_i - y-hat_i

Rearrange:

    (y_i - y-bar) = (y-hat_i - y-bar) + (y_i - y-hat_i)

Now square both sides and sum:

    Σ (y_i - y-bar)² = Σ [ (y-hat_i - y-bar)²
                         + (y_i - y-hat_i)²
                         + 2 (y-hat_i - y-bar) (y_i - y-hat_i) ]

or:

    Σ (y_i - y-bar)² = Σ (y-hat_i - y-bar)²
                     + Σ (y_i - y-hat_i)²
                     + 2 Σ (y-hat_i - y-bar) (y_i - y-hat_i)

that is:

    SST = SSR + SSE + 2 Σ (y-hat_i - y-bar) (y_i - y-hat_i)

If we can show the third term of the RHS is equal to zero, we are done. We have:

    2 Σ (y-hat_i - y-bar) (y_i - y-hat_i)
        = 2 Σ y-hat_i (y_i - y-hat_i) − 2 ⋅ y-bar Σ (y_i - y-hat_i)
        = 2 Σ y-hat_i ⋅ e_i − 2 ⋅ y-bar Σ e_i

But by Crux Move 11.2, both sums in this expression are zero. Ergo, we are done.
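
A quick numerical sanity check of the decomposition can be reassuring. The following is a minimal sketch in plain Python, assuming made-up data and the slope and intercept formulas from Example 1; the variable names are illustrative. It computes the three sums of squares and the cross term that the derivation claims is zero.

```python
# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares fit: slope from SS_xy / SS_xx, intercept from Crux Move 11.1.
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
beta_hat = ss_xy / ss_xx
alpha_hat = y_bar - beta_hat * x_bar
y_hat = [alpha_hat + beta_hat * xi for xi in x]

# The three sums of squares, exactly as defined above.
sst = sum((yi - y_bar) ** 2 for yi in y)
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

# The cross term from the derivation; should vanish up to rounding.
cross = 2 * sum((yh - y_bar) * (yi - yh) for yi, yh in zip(y, y_hat))

print("SST        =", sst)
print("SSR + SSE  =", ssr + sse)
print("cross term =", cross)
```

The decomposition only holds for the least squares line. Swap in any other slope or intercept and the cross term generally stops being zero, which is a handy way to see that Crux Move 11.2 is doing real work here.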

Footnote: The error sum of squares is aka the residual sum of squares, but I needed a name that didn't begin with the letter R because we already had an SSR. So I used "error," hence SSE.

E X A M P L E 3

Derive the useful identity: SSR = β-hat ⋅ SS_xy

Begin with the definition:

    SSR = Σ (y-hat_i - y-bar)²

Substitute for y-hat and y-bar:

    SSR = Σ ( [ α-hat + β-hat ⋅ x_i ] − [ α-hat + β-hat ⋅ x-bar ] )²
        = Σ (β-hat ⋅ x_i − β-hat ⋅ x-bar)²
        = Σ β-hat² (x_i - x-bar)²
        = β-hat² Σ (x_i - x-bar)²
        = β-hat² ⋅ SS_xx
        = β-hat ⋅ β-hat ⋅ SS_xx
        = β-hat (SS_xy / SS_xx ) SS_xx
        = β-hat ⋅ SS_xy

QED. In the next-to-last step, we deployed the definition of β-hat from the Normal Equation (β-hat = SS_xy / SS_xx).
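
The identity can also be checked numerically. Here is a minimal sketch in plain Python, again with made-up data and illustrative variable names: it computes SSR from its definition and compares it with β-hat ⋅ SS_xy.

```python
# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
beta_hat = ss_xy / ss_xx              # slope from the Normal Equation
alpha_hat = y_bar - beta_hat * x_bar  # intercept from Crux Move 11.1

# SSR straight from its definition...
y_hat = [alpha_hat + beta_hat * xi for xi in x]
ssr_definition = sum((yh - y_bar) ** 2 for yh in y_hat)

# ...and from the shortcut derived above.
ssr_shortcut = beta_hat * ss_xy

print("SSR (definition) =", ssr_definition)
print("SSR (shortcut)   =", ssr_shortcut)
```

The shortcut is handy in practice because SS_xy and β-hat are already on hand once the line is fit, so SSR comes for free without ever computing the fitted values.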

Footnote: I find an easy (well, easier) way to remember this expression is: SSR is proportional to SS_xy and the constant of proportionality happens to be β-hat. Interestingly, the Normal Equation tells us SS_xy is itself proportional to SS_xx. The constant of proportionality is, again, β-hat (i.e., SS_xy = β-hat ⋅ SS_xx). Crikey, is there anything β-hat can't do? It's like the Swiss army knife of regression.

Saturday, May 6, 2017
