The classical linear regression model (CLRM) assumptions are the following:
1. The model parameters are linear, meaning the regression coefficients don't enter the function being estimated as exponents (although the variables can have exponents).
2. The values for the independent variables are derived from a random sample of the population, and they contain variability.
3. The explanatory variables don't have perfect collinearity (that is, no independent variable can be expressed as a linear function of any other independent variables).
4. The error term has zero conditional mean, meaning that the average error is zero at any specific value of the independent variable(s).
5. The model has no heteroskedasticity (meaning the variance of the error is the same regardless of the independent variable's value).
6. The model has no autocorrelation (the error term doesn't exhibit a systematic relationship over time).
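A minimal sketch of a model that satisfies these assumptions may help fix notation. The data and library calls below (NumPy and statsmodels) are assumptions of the example, not part of the original notes.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x = rng.normal(size=200)              # regressor drawn with variability (assumption 2)
    u = rng.normal(size=200)              # errors: zero mean, constant variance, independent
    y = 1.0 + 2.0 * x + u                 # parameters enter linearly (assumption 1)

    X = sm.add_constant(x)
    fit = sm.OLS(y, X).fit()
    print(fit.params)                     # estimates of intercept and slope, close to 1 and 2
    print(fit.resid.mean())               # residuals average to roughly zero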
Violation of the classical assumptions, one by one
Assumption 1: X fixed in repeated samples. Since we cannot usually control X by experiments, we have to say our results are "conditional on X."
There are no identifiable biases associated with the failure of this assumption, which seems to be mainly handy for the mathematical proof of the Gauss-Markov theorem.
Assumption 2: E(u_t) = 0.
This is more of a convenient simplification. The only thing that happens if E(u_t) = f is that the estimate of a is biased, with expectation a + f instead of a.
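A quick simulation makes the point; the particular values (a = 1, b = 2, f = 0.5) are made up for illustration and are not from the notes. With a shifted error mean, the slope estimate is unaffected and only the intercept absorbs f.

    import numpy as np

    rng = np.random.default_rng(1)
    a, b, f = 1.0, 2.0, 0.5
    x = rng.normal(size=100_000)
    u = f + rng.normal(size=100_000)      # disturbance with mean f instead of 0
    y = a + b * x + u

    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(coef)                           # roughly [a + f, b] = [1.5, 2.0]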
Assumption 3: Independence of disturbance terms.
The consequences of this violation are:
1. OLS is no longer the most efficient estimator.
2. Standard errors are no longer unbiased, so hypothesis tests may be invalid.
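A sketch of consequence 2, assuming AR(1) disturbances and an AR(1) regressor (both are assumptions of the example, not claims from the notes): the conventional standard errors and the autocorrelation-robust (Newey-West/HAC) ones from statsmodels disagree, so tests based on the former can mislead.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    n, rho = 500, 0.8
    x = np.zeros(n)
    u = np.zeros(n)
    for t in range(1, n):                 # persistent regressor and persistent disturbances
        x[t] = rho * x[t - 1] + rng.normal()
        u[t] = rho * u[t - 1] + rng.normal()
    y = 1.0 + 2.0 * x + u

    X = sm.add_constant(x)
    plain = sm.OLS(y, X).fit()
    hac = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 10})
    print(plain.bse)                      # conventional standard errors
    print(hac.bse)                        # autocorrelation-robust (Newey-West) standard errors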
Assumption 4: No heteroskedastic disturbances.
The consequences are similar to those of autocorrelation:
1. OLS is no longer the most efficient estimator.
2. Standard errors are no longer unbiased, so hypothesis tests may be invalid.
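An analogous sketch for heteroskedasticity, again with an assumed data-generating process (error variance growing with |x|): the conventional and heteroskedasticity-robust (HC1) standard errors diverge.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    n = 500
    x = rng.normal(size=n)
    u = rng.normal(size=n) * (0.5 + np.abs(x))   # error variance depends on x
    y = 1.0 + 2.0 * x + u

    X = sm.add_constant(x)
    plain = sm.OLS(y, X).fit()
    robust = sm.OLS(y, X).fit(cov_type="HC1")
    print(plain.bse)                      # standard errors assuming homoskedasticity
    print(robust.bse)                     # heteroskedasticity-robust standard errors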
Assumption 5: Errors uncorrelated with the regressors.
In some ways this is more serious since, unlike 3 and 4, it results in bias in the parameter estimates.
So we may get bias because:
1. The measurement error in Y is correlated with X.
2. The excluded factors may be correlated with X.
3. The "errors" in X are correlated with u.
In addition, if there are other causal relationships, so that X = f(Y), we may get correlation between X and u. This is known as simultaneous equations bias.
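A sketch of the second source of bias (an excluded factor correlated with X); the coefficients 2 and 1.5 and the correlation structure are assumptions of the example. Leaving z out of the regression pushes it into the error term, which is then correlated with x, and the estimated slope on x is biased.

    import numpy as np

    rng = np.random.default_rng(4)
    n = 100_000
    z = rng.normal(size=n)                        # excluded factor
    x = 0.8 * z + rng.normal(size=n)              # x is correlated with z
    y = 1.0 + 2.0 * x + 1.5 * z + rng.normal(size=n)

    X_short = np.column_stack([np.ones(n), x])    # z omitted: it ends up in the error term
    X_full = np.column_stack([np.ones(n), x, z])
    b_short, *_ = np.linalg.lstsq(X_short, y, rcond=None)
    b_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)
    print(b_short)                        # slope on x biased away from 2 (roughly 2 + 1.5*cov(x,z)/var(x))
    print(b_full)                         # including z recovers roughly [1.0, 2.0, 1.5]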