True or false, and explain: homoskedasticity of the error term is crucially needed to show that the OLS estimators are unbiased.
Homoskedastic
What Is Homoskedastic?
Homoskedastic (also spelled "homoscedastic") refers to a condition in which the variance of the residual, or error term, in a regression model is constant. That is, the spread of the error term stays the same as the value of the predictor variable changes. A lack of homoskedasticity, by contrast, may suggest that the regression model needs additional predictor variables to explain the performance of the dependent variable.
How Homoskedastic Works
Homoskedasticity is one assumption of linear regression modeling. If the variance of the errors around the regression line varies substantially, the regression model may be poorly defined. The opposite of homoskedasticity is heteroskedasticity, just as the opposite of "homogeneous" is "heterogeneous." Heteroskedasticity (also spelled “heteroscedasticity”) refers to a condition in which the variance of the error term in a regression equation is not constant.
"When considering that variance is the measured difference between the predicted outcome and the actual outcome of a given situation, determining homoskedasticity can help to determine which factors need to be adjusted for accuracy."
Special Considerations
A simple regression model, or equation, consists of four terms. On the left side is the dependent variable. It represents the phenomenon the model seeks to "explain." On the right side are a constant, a predictor variable, and a residual, or error, term. The error term shows the amount of variability in the dependent variable that is not explained by the predictor variable.
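In symbols, the four terms can be written as a single equation. This is only a sketch that borrows the notation used in the later part of this answer: Y for the dependent variable, X for the predictor variable, u for the error term, and β0 and β1 for the constant and the slope. The second line applies it to the test-score example; the names score and hours are labels chosen here for illustration.

```latex
Y = \beta_0 + \beta_1 X + u
\qquad\text{e.g.}\qquad
\text{score} = \beta_0 + \beta_1 \cdot \text{hours} + u
```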
Example of Homoskedastic
For example, suppose you wanted to explain student test scores using the amount of time each student spent studying. In this case, the test scores would be the dependent variable and the time spent studying would be the predictor variable.
The error term would show the amount of variance in the test scores that was not explained by the amount of time studying. If that variance is uniform, or homoskedastic, then that would suggest the model may be an adequate explanation for test performance—explaining it in terms of time spent studying.
But the variance may be heteroskedastic. A plot of the error terms may show that a large amount of study time corresponded very closely with high test scores, but that test scores for low study time varied widely and even included some very high scores. So the variance of scores would not be well explained by a single predictor variable, the amount of time studying. In this case, some other factor is probably at work, and the model may need to be enhanced in order to identify it or them.
Further investigation may reveal that some students had seen the answers to the test ahead of time or that they had previously taken a similar test, and therefore didn't need to study for this particular test.
To improve on the regression model, the researcher would, therefore, add another explanatory variable indicating whether a student had seen the answers prior to the test. The regression model would then have two explanatory variables: time studying, and whether the student had prior knowledge of the answers. With these two variables, more of the variance of the test scores would be explained and the variance of the error term might then be homoskedastic, suggesting that the model was well-defined.
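The test-score story can be illustrated with a small simulation. The sketch below is not based on real data: the coefficients, the 30% share of students who saw the answers, and the helper names (simulate_scores, ols_residuals, spread_by_half) are all assumptions made up for illustration. It fits the model once with study time only and once with the extra indicator, then compares the residual spread for low and high study time.

```python
import numpy as np

def simulate_scores(n=2000, seed=0):
    """Hypothetical data: some students with little study time saw the answers."""
    rng = np.random.default_rng(seed)
    hours = rng.uniform(0, 10, n)
    # Assumption: only students who studied less than 5 hours may have seen the answers.
    saw_answers = ((rng.random(n) < 0.3) & (hours < 5)).astype(float)
    noise = rng.normal(0, 3, n)  # homoskedastic noise with standard deviation 3
    score = 40 + 5 * hours + 30 * saw_answers + noise
    return hours, saw_answers, score

def ols_residuals(y, *columns):
    """Residuals from an OLS fit of y on a constant and the given columns."""
    X = np.column_stack([np.ones_like(y), *columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def spread_by_half(hours, resid):
    """Standard deviation of residuals for low (<5) and high (>=5) study time."""
    return resid[hours < 5].std(), resid[hours >= 5].std()

hours, saw_answers, score = simulate_scores()
low1, high1 = spread_by_half(hours, ols_residuals(score, hours))
low2, high2 = spread_by_half(hours, ols_residuals(score, hours, saw_answers))
print(f"study time only:     low={low1:.2f}  high={high1:.2f}")
print(f"with answers dummy:  low={low2:.2f}  high={high2:.2f}")
```

With study time as the only predictor, the residual spread in the low-study group is much larger than in the high-study group. Once the indicator is added, the two spreads are roughly equal, which is the pattern the example describes.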
How big is the difference between the OLS estimator and the true parameter? To answer this question, we make an additional assumption called homoskedasticity:
$\mathrm{Var}(u \mid X) = \sigma^2$. (23)
This means that the variance of the error term $u$ is the same, regardless of the predictor variable $X$. If assumption (23) is violated, e.g. if $\mathrm{Var}(u \mid X) = \sigma^2 h(X)$, then we say the error term is heteroskedastic.
• Assumption (23) certainly holds if $u$ and $X$ are assumed to be independent. However, (23) is a weaker assumption.
• Assumption (23) implies that $\sigma^2$ is also the unconditional variance of $u$, referred to as the error variance: $\mathrm{Var}(u) = E(u^2) - (E(u))^2 = \sigma^2$. Its square root $\sigma$ is the standard deviation of the error.
• It follows that $\mathrm{Var}(Y \mid X) = \sigma^2$.
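The difference between the two cases can be made visible by simulating errors and comparing their sample variance across ranges of X. The following is a minimal sketch under stated assumptions: σ = 2, X uniform on [1, 5], and h(X) = X as one possible form of heteroskedasticity. None of these values come from the notes, and binned_error_variance is a hypothetical helper.

```python
import numpy as np

def binned_error_variance(x, u, n_bins=4):
    """Sample variance of u within equal-count bins of x (ordered from low to high x)."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    # Map each x to its bin index; clip so the maximum lands in the last bin.
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)
    return np.array([u[idx == b].var() for b in range(n_bins)])

rng = np.random.default_rng(1)
n, sigma = 40_000, 2.0
x = rng.uniform(1.0, 5.0, n)

# Homoskedastic: Var(u | X) = sigma^2 for every value of X.
u_homo = rng.normal(0.0, sigma, n)
# Heteroskedastic: Var(u | X) = sigma^2 * h(X), with h(X) = X assumed here.
u_hetero = rng.normal(0.0, sigma * np.sqrt(x))

var_homo = binned_error_variance(x, u_homo)
var_hetero = binned_error_variance(x, u_hetero)
print("homoskedastic  :", np.round(var_homo, 2))
print("heteroskedastic:", np.round(var_hetero, 2))
```

In the first case the binned variances stay close to σ² = 4 in every bin, while in the second they grow with X.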
Variance of the OLS estimator
How large is the variation of the OLS estimator around the true parameter?
• The difference $\hat{\beta}_1 - \beta_1$ is 0 on average.
• Measure the variation of the OLS estimator around the true parameter through the expected squared difference, i.e. the variance: $\mathrm{Var}(\hat{\beta}_1) = E((\hat{\beta}_1 - \beta_1)^2)$ (24)
• Similarly for $\hat{\beta}_0$: $\mathrm{Var}(\hat{\beta}_0) = E((\hat{\beta}_0 - \beta_0)^2)$.
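The statement that the difference is 0 on average does not rely on homoskedasticity. As a sketch, assume that equation (22), which is not reproduced in this excerpt, has the standard form shown in the first line below, and assume the zero conditional mean condition E(u_i | X) = 0. Both are assumptions made here, not statements taken from the notes.

```latex
\hat{\beta}_1 - \beta_1 = \frac{1}{N s_x^2} \sum_{i=1}^{N} (x_i - \bar{x})\, u_i ,
\qquad s_x^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2

E(\hat{\beta}_1 - \beta_1 \mid X)
  = \frac{1}{N s_x^2} \sum_{i=1}^{N} (x_i - \bar{x})\, E(u_i \mid X) = 0
```

Only the conditional mean of the error enters this step; its variance does not appear. Under these assumptions the statement in the question is therefore false: homoskedasticity is used to obtain the variance of the OLS estimator, not to show that the estimator is unbiased.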
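The variance of the slope estimator $\hat{\beta}_1$ follows from (22):
$$\mathrm{Var}(\hat{\beta}_1) = \frac{1}{N^2 (s_x^2)^2} \sum_{i=1}^{N} (x_i - \bar{x})^2 \, \mathrm{Var}(u_i) = \frac{\sigma^2}{N^2 (s_x^2)^2} \sum_{i=1}^{N} (x_i - \bar{x})^2 = \frac{\sigma^2}{N s_x^2}$$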
Dependence on the sample size $N$:
• The variance of the slope estimator is larger the smaller the number of observations $N$ (and smaller the larger $N$).
• Increasing $N$ by a factor of 4 reduces the variance to 1/4 of its value, provided $s_x^2$ stays the same.
Dependence on the error variance $\sigma^2$:
• The variance of the slope estimator is larger the larger the error variance $\sigma^2$.
Dependence on the design, i.e. the predictor variable $X$:
• The variance of the slope estimator is larger the smaller the variation in $X$, measured by $s_x^2$.
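The variance formula and the three dependencies above can be checked with a short Monte Carlo experiment. This is a sketch under assumptions chosen for illustration: a fixed design, independent normal errors with σ = 2, true parameters β0 = 1 and β1 = 2, and s_x² computed with the 1/N convention. The function names are hypothetical.

```python
import numpy as np

def slope_variance_formula(x, sigma):
    """sigma^2 / (N * s_x^2), with s_x^2 = (1/N) * sum((x_i - mean(x))^2)."""
    s2x = np.mean((x - x.mean()) ** 2)
    return sigma**2 / (len(x) * s2x)

def slope_variance_simulated(x, sigma, beta0=1.0, beta1=2.0, reps=20_000, seed=0):
    """Variance and mean of the OLS slope over repeated samples with x held fixed."""
    rng = np.random.default_rng(seed)
    u = rng.normal(0.0, sigma, size=(reps, len(x)))  # homoskedastic errors
    y = beta0 + beta1 * x + u                        # one simulated sample per row
    xc = x - x.mean()
    slopes = (y @ xc) / (xc @ xc)                    # OLS slope for each sample
    return slopes.var(), slopes.mean()

sigma = 2.0
x = np.linspace(0.0, 10.0, 25)  # N = 25
x4 = np.tile(x, 4)              # N = 100 with the same s_x^2

for label, design in [("N=25", x), ("N=100", x4)]:
    sim_var, sim_mean = slope_variance_simulated(design, sigma)
    print(label, "formula:", slope_variance_formula(design, sigma),
          "simulated:", sim_var, "mean slope:", sim_mean)
```

Repeating the design four times keeps s_x² unchanged, so the formula value for N = 100 is exactly one quarter of the value for N = 25, and the simulated variances should lie close to the formula values. The simulated mean slope stays near the true value 2 in both cases.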