QUESTION 1
(a) Using words and/or equations, briefly explain the following concepts.
(i) Degrees of freedom
(ii) The difference between the stochastic error term and the
residual. Make sure that you define both terms, state how they are
similar, state how they are different and provide examples of an
equation with a stochastic error term and one that contains a
residual.
(b) What are the major consequences of including an irrelevant
variable in a regression equation?
(c) You are a labour economist and wish to explain salaries of
various workers. In the process, you omit an important variable
“experience”. What is the sign of the bias on the coefficient of
the included variable “age”?
(d) Do you think that unbiased estimates are always better than
biased ones? Why or why not?
a)
i) Degrees of freedom:
In statistics, the number of degrees of freedom is the number of values in the final calculation of a statistic that are free to vary. In a regression with n observations, k slope coefficients and an intercept, the residual degrees of freedom are n − (k + 1), since that many residuals remain free once the parameters have been estimated. More generally, the number of independent ways in which a system can vary without violating any constraint imposed on it is called its number of degrees of freedom.
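A minimal numerical sketch of this idea (the data and variable names below are purely illustrative, not taken from the question):

```python
import numpy as np

# Illustrative sketch: residual degrees of freedom in OLS.
# With n observations and (k + 1) estimated parameters (k slopes + an intercept),
# only n - (k + 1) of the residuals are free to vary.
rng = np.random.default_rng(0)

n, k = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # design matrix with intercept
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)             # OLS estimates
residuals = y - X @ beta_hat

df = n - (k + 1)
print("residual degrees of freedom:", df)                    # 30 - 3 = 27
```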
ii) Difference between the stochastic error term and the residual
Both terms measure the part of the dependent variable that the regression does not explain; the residual is the sample counterpart (estimate) of the stochastic error term.

| Stochastic error term | Residual |
| --- | --- |
| The difference between the observed value and its true (population) regression value: ε_i = Y_i − (β_0 + β_1 X_i). | The difference between the observed value and its fitted (sample) regression value: e_i = Y_i − Ŷ_i = Y_i − (β̂_0 + β̂_1 X_i). |
| Because it depends on the unknown population parameters, the error term cannot be observed. | Because it is computed from the estimated sample regression, the residual is observable. |
| The sum of the error terms is not necessarily 0. | The sum of the residuals is always 0 (in OLS with an intercept). |

An example of an equation containing the stochastic error term is Y_i = β_0 + β_1 X_i + ε_i; the corresponding equation containing the residual is Y_i = β̂_0 + β̂_1 X_i + e_i.
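The distinction can also be illustrated with a small simulation (a sketch only; because the data are simulated, the "true" population parameters are known, which is never the case in practice):

```python
import numpy as np

# Sketch: the stochastic error eps_i = Y_i - (beta0 + beta1*X_i) needs the unknown
# population line, while the residual e_i = Y_i - (b0 + b1*X_i) uses the fitted line.
rng = np.random.default_rng(1)

beta0, beta1 = 2.0, 0.5                  # population parameters (unknown in practice)
x = rng.uniform(0, 10, size=50)
eps = rng.normal(0, 1, size=50)          # stochastic error term
y = beta0 + beta1 * x + eps

b1, b0 = np.polyfit(x, y, 1)             # fitted (sample) regression line
residuals = y - (b0 + b1 * x)

print("sum of error terms:", eps.sum())        # not necessarily 0
print("sum of residuals  :", residuals.sum())  # ~0 for OLS with an intercept
```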
b) Sometimes, out of enthusiasm or a desire to make the model appear more realistic, the analyst may include explanatory variables that are not really relevant. Such variables contribute very little to the explanatory power of the model, yet they use up degrees of freedom (reducing n − k), so the precision of the estimates falls and the validity of the inferences drawn may be questionable. For example, the coefficient of determination never decreases when a variable is added, so it may suggest that the model is getting better, which may not really be true.
Let the true model be

Y = Xβ + ε,  E(ε) = 0,  V(ε) = σ²I,

which comprises k explanatory variables. Suppose r additional explanatory variables are added, so that the resulting model becomes

Y = Xβ + Zγ + δ,

where Z is an n × r matrix of n observations on each of the r additional explanatory variables, γ is the r × 1 vector of regression coefficients associated with Z, and δ is the disturbance term. This model is termed the false model.
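As a rough check of the consequences described above, a small simulation (all names and numbers are illustrative) shows that adding an irrelevant regressor costs a degree of freedom yet never lowers R², so the fit appears to improve even though the added variable has no true effect:

```python
import numpy as np

# Sketch: adding an irrelevant regressor z (true coefficient 0) never lowers R^2,
# even though it explains nothing and uses up a degree of freedom.
rng = np.random.default_rng(2)

n = 40
x = rng.normal(size=n)                   # relevant regressor
z = rng.normal(size=n)                   # irrelevant regressor
y = 1.0 + 2.0 * x + rng.normal(size=n)

def r_squared(regressors, y):
    X = np.column_stack([np.ones(len(y)), regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

print("R^2, true model (x only)  :", r_squared(x.reshape(-1, 1), y))
print("R^2, false model (x and z):", r_squared(np.column_stack([x, z]), y))
```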
d) No, unbiased estimates are not always better than biased ones. What matters is the mean square error rather than the variance alone, since MSE = variance + (bias)². There are situations where a biased estimator is preferable to an unbiased one, for example when the biased estimator has a much smaller variance; the choice ultimately depends on how the estimator is used and on the costs or values of the different types of errors.
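A brief simulation sketch of this trade-off (the shrinkage factor 0.8 below is purely illustrative, not a recommended estimator): a deliberately biased estimator of a mean can have a lower mean square error than the unbiased sample mean.

```python
import numpy as np

# Sketch: a biased (shrunken) estimator of a mean can beat the unbiased sample
# mean on mean square error. The 0.8 shrinkage factor is purely illustrative.
rng = np.random.default_rng(3)

true_mean = 1.0
n, reps = 5, 100_000

samples = rng.normal(true_mean, 3.0, size=(reps, n))
unbiased = samples.mean(axis=1)          # sample mean: unbiased, larger variance
biased = 0.8 * unbiased                  # shrinkage: biased toward 0, smaller variance

def mse(est):
    return np.mean((est - true_mean) ** 2)

print("MSE, unbiased estimator:", mse(unbiased))   # ~ 9/5 = 1.80
print("MSE, biased estimator  :", mse(biased))     # ~ 0.04 + 0.64 * 1.8 = 1.19
```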