How do we decide which variables to include on the right-hand side? List 4 important
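1) Junk variables may be statistically significant due to random
chance. Suppose your acceptable significance level is .05. If you
introduce ten junk variables, there is about a 40 percent chance
that at least one will be significant, just due to random chance
(if the ten tests are independent, the chance that none is
significant is 0.95^10, or about 0.60, so the chance that at least
one is significant is about 0.40).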
If you don’t know what is junk and what is not, you will often find
yourself claiming that junk variables really matter. If someone
tries to reproduce your findings using different data, they will
usually be unable to reproduce your junk result. Your shoddy
methods are then exposed for all to see.
2) A junk variable that is correlated with a valid predictor may,
by sheer luck, also have a strong correlation with the LHS
variable. This could make the valid predictor appear insignificant,
and you may toss it out of the model. (This is related to
multicollinearity, which I will discuss later in this note.) The
bigger the kitchen sink, the better the chance that this happens.
The bottom line: when you add junk variables, “stuff” happens.
“Stuff” is not good.
3) Adding some variables to your model can affect how you interpret
the coefficients on others. This occurs when one RHS variable is,
itself, a function of another. This is not as serious a problem as
(1) and (2), but does require you to be careful when you describe
your findings. The next section shows you how the interpretation of
one variable can change as you add others.
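
As a quick check on the arithmetic in point (1), here is a minimal Python sketch. It is only an illustration, not part of the original note. It assumes the ten tests are independent and that a junk variable's p-value is uniformly distributed between 0 and 1, which is the standard behavior when a variable truly has no effect. The function names are made up for this example.

```python
import random

ALPHA = 0.05   # acceptable significance level from point (1)
N_JUNK = 10    # number of junk variables added to the model


def prob_any_significant(alpha, n_tests):
    """Chance that at least one of n independent junk tests is 'significant'."""
    return 1.0 - (1.0 - alpha) ** n_tests


def simulate_any_significant(alpha, n_tests, n_trials, seed=0):
    """Monte Carlo version of the same calculation.

    Assumption: each junk variable's p-value is an independent
    uniform draw on [0, 1].
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # One "regression": does any junk variable fall below alpha?
        if any(rng.random() < alpha for _ in range(n_tests)):
            hits += 1
    return hits / n_trials


exact = prob_any_significant(ALPHA, N_JUNK)
simulated = simulate_any_significant(ALPHA, N_JUNK, n_trials=100_000)

print(f"exact:     {exact:.3f}")      # prints "exact:     0.401"
print(f"simulated: {simulated:.3f}")
```

Changing N_JUNK shows how quickly the problem grows as the kitchen sink gets bigger. If the junk variables are correlated with one another, the independence assumption fails and the exact figure will differ.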