Suppose we wish to study the effect of education on an individual’s hourly wage using a sample of individuals. For each individual i in our sample, let $w_i$ denote hourly wage, let $e_i$ denote years of post-high-school education, and let $s_i$ denote sex (suppose $s_i = 1$ for females and $s_i = 0$ for males). Consider estimating the relationship:
$$w_i = \alpha + \beta e_i + \varepsilon_i$$
where α and β are unobservable population parameters and $\varepsilon_i$ is the component of wages not attributable to education (i.e., the error term).
a) Describe intuitively how we might approach the problem of figuring out α and β.
b) Denote by $\hat\alpha$ and $\hat\beta$ possible estimates of α and β. Then we can write:
$$w_i = \hat\alpha + \hat\beta e_i + \hat\varepsilon_i, \qquad \hat w_i = \hat\alpha + \hat\beta e_i$$
where $\hat\varepsilon_i$ is the residual and $\hat w_i$ is the fitted/predicted value, both based on $\hat\alpha$ and $\hat\beta$. Set up an appropriate optimization problem from which we can derive optimal choices of $\hat\alpha$ and $\hat\beta$.
c) Show that $\hat\beta^* \equiv \operatorname{Cov}(w_i, e_i)/\operatorname{Var}(e_i)$ is an unbiased estimator of β only if strict exogeneity holds.
(a). A simple way to approach the problem is to estimate α and β from a sample of individuals drawn from the population in question. For each individual we observe the pair $(w_i, e_i)$, and we choose the line $\hat\alpha + \hat\beta e_i$ that best fits the scatter of wages against years of education. These estimates then stand in for the unknown true values of α and β. The more representative the sample is of the population (and, for a random sample, the larger it is), the closer the estimates tend to be to the true parameter values.
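To make the intuition concrete, here is a small simulation sketch. It assumes a made-up population with α = 10, β = 2, education drawn uniformly between 0 and 8 years, and normal noise with standard deviation 5 (none of these numbers come from the problem). It draws samples of increasing size and applies the standard least-squares formulas (the method named in part (b)); the helper names `ols_estimates` and `draw_sample` are hypothetical.

```python
import random


def ols_estimates(e, w):
    """Least-squares (alpha_hat, beta_hat) for the line w = alpha + beta * e."""
    n = len(e)
    e_bar = sum(e) / n
    w_bar = sum(w) / n
    s_ew = sum((ei - e_bar) * (wi - w_bar) for ei, wi in zip(e, w))
    s_ee = sum((ei - e_bar) ** 2 for ei in e)
    beta_hat = s_ew / s_ee
    return w_bar - beta_hat * e_bar, beta_hat


def draw_sample(n, rng, alpha=10.0, beta=2.0, noise_sd=5.0):
    """Simulate n individuals from a HYPOTHETICAL population w = alpha + beta*e + eps."""
    e = [rng.uniform(0, 8) for _ in range(n)]  # years of post-high-school education
    w = [alpha + beta * ei + rng.gauss(0, noise_sd) for ei in e]  # hourly wage
    return e, w


rng = random.Random(0)  # fixed seed so the run is reproducible
for n in (20, 200, 20000):
    e, w = draw_sample(n, rng)
    alpha_hat, beta_hat = ols_estimates(e, w)
    print(f"n={n:>5}: alpha_hat={alpha_hat:6.2f}, beta_hat={beta_hat:5.2f}")
```

Because the simulated errors are independent of education here, the estimates from the larger samples typically sit much closer to the assumed values 10 and 2 than those from the sample of 20.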
(b). The appropriate optimisation problem for deriving the optimal choices of $\hat\alpha$ and $\hat\beta$ is the least-squares problem.
Here, the residual at each data point is
$$\hat\varepsilon_i = w_i - \hat w_i = w_i - \hat\alpha - \hat\beta e_i.$$
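We then square the residuals and sum them over all n individuals (squaring prevents positive and negative residuals from cancelling out) to get:
$$E(\hat\alpha, \hat\beta) = \sum_{i=1}^{n} \hat\varepsilon_i^{2} = \sum_{i=1}^{n} \left(w_i - \hat\alpha - \hat\beta e_i\right)^2$$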
Here, E is a function of $\hat\alpha$ and $\hat\beta$, because the predicted values $\hat w_i$ are computed from them.
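As a small illustration, the objective E can be written directly as a function of the two candidate estimates. The function name and the four-person data set below are made up for illustration; the code only evaluates E at two candidate pairs, it does not minimise it.

```python
def sum_squared_residuals(alpha_hat, beta_hat, e, w):
    """E(alpha_hat, beta_hat) = sum over i of (w_i - alpha_hat - beta_hat * e_i) ** 2."""
    total = 0.0
    for e_i, w_i in zip(e, w):
        w_fitted = alpha_hat + beta_hat * e_i  # fitted value w_hat_i
        residual = w_i - w_fitted  # residual eps_hat_i
        total += residual ** 2
    return total


# Made-up toy data: years of post-high-school education and hourly wage.
e = [0, 2, 4, 6]
w = [12.0, 15.0, 19.0, 22.0]

print(sum_squared_residuals(12.0, 1.5, e, w))  # 2.0
print(sum_squared_residuals(11.9, 1.7, e, w))  # about 0.2
```

Different choices of $\hat\alpha$ and $\hat\beta$ give different values of E; least squares picks the pair with the smallest value.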
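We now need the values of $\hat\alpha$ and $\hat\beta$ that minimise E. The first-order conditions for a minimum are:
$$\frac{\partial E}{\partial \hat\alpha} = -2\sum_{i=1}^{n}\left(w_i - \hat\alpha - \hat\beta e_i\right) = 0$$
and
$$\frac{\partial E}{\partial \hat\beta} = -2\sum_{i=1}^{n} e_i\left(w_i - \hat\alpha - \hat\beta e_i\right) = 0$$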
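Solving these two conditions simultaneously gives the least-squares estimates. The first condition gives $\hat\alpha = \bar w - \hat\beta \bar e$; substituting this into the second gives
$$\hat\beta = \frac{\sum_{i=1}^{n}(e_i - \bar e)(w_i - \bar w)}{\sum_{i=1}^{n}(e_i - \bar e)^2}, \qquad \hat\alpha = \bar w - \hat\beta \bar e,$$
where $\bar w$ and $\bar e$ are the sample means. Because E is a convex quadratic in $\hat\alpha$ and $\hat\beta$ (provided the $e_i$ are not all equal), this solution is the minimum, i.e. the optimal choice.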
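The following sketch computes the standard closed-form solution $\hat\beta = \sum_i (e_i - \bar e)(w_i - \bar w) / \sum_i (e_i - \bar e)^2$, $\hat\alpha = \bar w - \hat\beta \bar e$ on made-up toy data and checks three things numerically: both first-order conditions hold, $\hat\beta$ equals the sample covariance of wages and education divided by the sample variance of education, and nudging either estimate raises E. The helper names are hypothetical; only `statistics.mean` and `statistics.variance` come from the Python standard library.

```python
from statistics import mean, variance


def ols_fit(e, w):
    """Closed-form least-squares estimates (alpha_hat, beta_hat)."""
    e_bar, w_bar = mean(e), mean(w)
    s_ew = sum((e_i - e_bar) * (w_i - w_bar) for e_i, w_i in zip(e, w))
    s_ee = sum((e_i - e_bar) ** 2 for e_i in e)
    beta_hat = s_ew / s_ee
    return w_bar - beta_hat * e_bar, beta_hat


def sample_cov(x, y):
    """Sample covariance with the n - 1 divisor (same divisor as statistics.variance)."""
    x_bar, y_bar = mean(x), mean(y)
    return sum((x_i - x_bar) * (y_i - y_bar) for x_i, y_i in zip(x, y)) / (len(x) - 1)


def sse(alpha_hat, beta_hat, e, w):
    """Objective E(alpha_hat, beta_hat): sum of squared residuals."""
    return sum((w_i - alpha_hat - beta_hat * e_i) ** 2 for e_i, w_i in zip(e, w))


# Made-up toy data (years of education, hourly wage).
e = [0, 2, 4, 6]
w = [12.0, 15.0, 19.0, 22.0]
alpha_hat, beta_hat = ols_fit(e, w)
print(alpha_hat, beta_hat)  # approximately 11.9 and 1.7

# First-order conditions: residuals sum to zero and are orthogonal to e.
resid = [w_i - alpha_hat - beta_hat * e_i for e_i, w_i in zip(e, w)]
assert abs(sum(resid)) < 1e-9
assert abs(sum(e_i * r for e_i, r in zip(e, resid))) < 1e-9

# beta_hat coincides with Cov(w, e) / Var(e); the common n - 1 divisor cancels.
assert abs(beta_hat - sample_cov(e, w) / variance(e)) < 1e-12

# Nudging either estimate away from the solution (four sample perturbations) raises E.
best = sse(alpha_hat, beta_hat, e, w)
for d_alpha, d_beta in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    assert sse(alpha_hat + d_alpha, beta_hat + d_beta, e, w) > best
```

The covariance check is the link to part (c): the least-squares slope is exactly the ratio $\operatorname{Cov}(w_i, e_i)/\operatorname{Var}(e_i)$ computed from the sample.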
(c).
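A sketch of the standard argument, assuming Cov and Var denote the sample covariance and variance (their common $1/(n-1)$ factor cancels) and that the $e_i$ are not all equal. Averaging the model over the sample gives $\bar w = \alpha + \beta \bar e + \bar\varepsilon$, which is used in the substitution step.

```latex
\hat\beta^* = \frac{\sum_{i=1}^{n}(e_i - \bar e)(w_i - \bar w)}{\sum_{i=1}^{n}(e_i - \bar e)^2}

% Substitute w_i - \bar w = \beta (e_i - \bar e) + (\varepsilon_i - \bar\varepsilon)
% and use \sum_i (e_i - \bar e)\,\bar\varepsilon = 0:
\hat\beta^* = \beta + \frac{\sum_{i=1}^{n}(e_i - \bar e)\,\varepsilon_i}{\sum_{i=1}^{n}(e_i - \bar e)^2}

% Take expectations conditional on \mathbf{e} = (e_1, \dots, e_n):
E[\hat\beta^* \mid \mathbf{e}] = \beta + \frac{\sum_{i=1}^{n}(e_i - \bar e)\,E[\varepsilon_i \mid \mathbf{e}]}{\sum_{i=1}^{n}(e_i - \bar e)^2}
```

Under strict exogeneity, $E[\varepsilon_i \mid e_1, \dots, e_n] = 0$ for every i, so the second term vanishes and, by the law of iterated expectations, $E[\hat\beta^*] = \beta$. If strict exogeneity fails, that term is in general nonzero and $\hat\beta^*$ is biased. For instance, sex $s_i$ does not appear in the model, so any effect it has on wages sits inside $\varepsilon_i$; if $s_i$ is also correlated with $e_i$, then $E[\varepsilon_i \mid \mathbf{e}] \neq 0$.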