In: Statistics and Probability
What are some ways to identify that a multiple regression model is needed (versus only one independent variable)? Provide an example of a situation that would require multiple regression.
Solution:
A polynomial term, whether quadratic (squared) or cubic (cubed), turns a linear regression model into a curve. But because it is X that is squared or cubed, not the beta coefficient, the model still qualifies as a linear model. This makes it a nice, straightforward way to model curves without having to fit complicated non-linear models.

But how do you know if you need one, that is, when a straight-line model isn’t the best model? First, a quadratic term creates a curve with one “hump”: a U or inverted-U shape. The curve does not need to contain both sides of the U; it can contain just part of it. A cubic has two humps, one facing upward and the other down. The curve goes down, back up, then back down again (or vice versa).

There are three main situations that indicate a linear relationship may not be a good model.

1. Most important is the theoretical one. Some relationships are hypothesized by the researcher to be curvilinear. Clearly, if this is the case, include a polynomial term.
2. The second opportunity is visual inspection of your variables. This is one of the reasons for always doing univariate and bivariate inspections of your data before you begin your regression analyses. A simple scatter plot can reveal a curvilinear relationship.
3. Inspection of residuals. If you try to fit a linear model to curved data, a scatterplot of residuals (Y axis) against the predictor (X axis) will have patches of many positive residuals in the middle, but patches of negative residuals at either end (or vice versa). This is a good sign that a linear model is not appropriate, and a polynomial may do better.

When we have a model with multiple different covariates, each beta [term] can generally be afforded its own interpretation. For example, if:

$\widehat{\text{GPA}}_{\text{college}} = \beta_0 + \beta_1\,\text{GPA}_{\text{high school}} + \beta_2\,\text{class rank} + \beta_3\,\text{SAT},$
then we can assign separate interpretations to each beta/term. For instance, if a student's high school GPA were 1 point higher, all else being equal, we would expect their college GPA to be $\beta_1$ points higher.
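The "all else being equal" reading of a coefficient can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the data are synthetic and every number in it (sample size, ranges, generating coefficients, noise level) is made up purely for illustration, as are the variable names. It fits the three-predictor model by ordinary least squares and confirms that raising high school GPA by one point, with class rank and SAT held fixed, moves the prediction by exactly the fitted β1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic predictors (hypothetical values, not real student data)
hs_gpa = rng.uniform(2.0, 4.0, n)
class_rank = rng.uniform(0.0, 100.0, n)   # treated here as a percentile
sat = rng.uniform(800.0, 1600.0, n)

# Made-up generating coefficients, used only to produce example data
college_gpa = (0.5 + 0.6 * hs_gpa + 0.004 * class_rank
               + 0.0005 * sat + rng.normal(0.0, 0.2, n))

# Design matrix with an intercept column, then ordinary least squares
X = np.column_stack([np.ones(n), hs_gpa, class_rank, sat])
beta, *_ = np.linalg.lstsq(X, college_gpa, rcond=None)

def predict(hs, rank, sat_score):
    """Predicted college GPA from the fitted coefficients."""
    return beta @ np.array([1.0, hs, rank, sat_score])

# Raise high school GPA by one point, holding the other predictors fixed
base = predict(3.0, 50.0, 1200.0)
bumped = predict(4.0, 50.0, 1200.0)
delta = bumped - base

print("fitted beta_1:        ", beta[1])
print("change in prediction: ", delta)
```

The two printed numbers agree (up to floating-point rounding) no matter which values of class rank and SAT are held fixed, which is what makes the separate interpretation of β1 legitimate in a model with no interaction or power terms.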
It is important to note, however, that it is not always permissible to interpret a model in this manner. One obvious case is when there is an interaction amongst some of the variables, as it would not be possible for the individual term to differ and still have all else held constant--of necessity, the interaction term would change as well. Thus, when there is an interaction, we do not interpret main effects but only simple effects, as is well understood.
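To make the interaction case concrete, here is a short worked equation. The two-covariate model and the symbols $x_1$, $x_2$, and $\beta_3$ are assumptions introduced only for illustration; they do not come from the GPA example above.

```latex
\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2
\qquad\Longrightarrow\qquad
\frac{\partial \hat{y}}{\partial x_1} = \beta_1 + \beta_3 x_2
```

The effect of $x_1$ on the prediction is $\beta_1$ only when $x_2 = 0$; at any other value of $x_2$ it is a different number. That is why $\beta_1$ is read as a simple effect rather than a main effect.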
The situation with power terms is directly analogous, but unfortunately, does not seem to be widely understood. Consider the following model:
$\hat{y} = \beta_0 + \beta_1 x + \beta_2 x^2$
(In this situation, $x$ is intended to represent a prototypical continuous covariate.) It is not possible for $x$ to change without $x^2$ changing also, and vice versa. Simply put, when there are polynomial terms in a model, the various terms based on the same underlying covariate are not afforded separate interpretations. The $x^2$ ($x$, $x^{17}$, etc.) term does not have any independent meaning. The fact that a $p$-power polynomial term is 'significant' in a model indicates that there are up to $p-1$ 'bends' in the function relating $x$ and $y$. It is unfortunate, but unavoidable, that when curvature exists, the interpretation becomes more complicated, and possibly less intuitive. To assess the change in $\hat{y}$ as $x$ changes, we will have to use calculus. The derivative of the above model is:
$\frac{d\hat{y}}{dx} = \beta_1 + 2\beta_2 x$
which is the instantaneous rate of change in the expected value of $y$ as $x$ changes, all else being equal. This is not so clean as the interpretation of the very top model; importantly, the instantaneous rate of change in $y$ depends on the level of $x$ from which the change is assessed. Furthermore, the rate of change in $y$ is an instantaneous rate; that is, it is itself continuously changing throughout the interval from $x_{\text{old}}$ to $x_{\text{new}}$. This is simply the nature of a curvilinear relationship.
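Both the residual check and the level-dependent slope can be seen in a small simulation. This is a minimal sketch, assuming NumPy; the data are synthetic, and the generating curve, noise level, and the cut points used to define the "middle" of the x range are arbitrary choices made for illustration. It fits a straight line to curved data and summarizes the residual pattern, then fits the quadratic model and evaluates the derivative β1 + 2β2x at two different values of x.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic curved data: an inverted-U trend plus noise (made-up numbers)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 + 3.0 * x - 0.25 * x**2 + rng.normal(0.0, 0.5, x.size)

# Straight-line fit; np.polyfit returns the highest-degree coefficient first
slope_lin, intercept_lin = np.polyfit(x, y, 1)
resid_lin = y - (intercept_lin + slope_lin * x)

# Compare residuals in the middle of the x range with those at the ends
middle = (x > 3.0) & (x < 7.0)
ends = ~middle
mean_resid_middle = resid_lin[middle].mean()
mean_resid_ends = resid_lin[ends].mean()
print("mean residual, middle of range:", mean_resid_middle)
print("mean residual, ends of range:  ", mean_resid_ends)

# Quadratic fit: y_hat = b0 + b1 * x + b2 * x**2
b2, b1, b0 = np.polyfit(x, y, 2)

def slope_at(x0):
    """Instantaneous rate of change of y_hat at x0: b1 + 2 * b2 * x0."""
    return b1 + 2.0 * b2 * x0

print("slope at x = 2:", slope_at(2.0))
print("slope at x = 8:", slope_at(8.0))
```

For this inverted-U data the straight-line residuals average above zero in the middle and below zero at the ends, which is the pattern described in point 3 above. The fitted slope is positive at x = 2 and negative at x = 8, so no single number summarizes "the effect of x".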