Suppose we wish to study the effect of education on an individual’s hourly wage using a sample of individuals. For each individual $i$ in our sample, let $w_i$ denote hourly wage, let $e_i$ denote years of post-high school education, and let $s_i$ denote sex (suppose $s_i = 1$ for females and $s_i = 0$ for males). Consider estimating the relationship:
$$w_i = \alpha + \beta e_i + \varepsilon_i$$
where $\alpha$ and $\beta$ are unobservable population parameters and $\varepsilon_i$ is the component of wages not attributable to education (i.e. the error term).
a) Describe intuitively how we might approach the problem of figuring out α and β.
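b) Denote by $\hat{\alpha}$ and $\hat{\beta}$ possible estimates of $\alpha$ and $\beta$. Then we can write:
$$w_i = \hat{\alpha} + \hat{\beta} e_i + \hat{\varepsilon}_i, \qquad \hat{w}_i = \hat{\alpha} + \hat{\beta} e_i$$
where $\hat{\varepsilon}_i$ is the residual and $\hat{w}_i$ is the fitted/predicted value, both based on $\hat{\alpha}$ and $\hat{\beta}$. Set up an appropriate optimization problem from which we can derive optimal choices of $\hat{\alpha}$ and $\hat{\beta}$.
c) Show that $\hat{\beta}^* \equiv \mathrm{Cov}(w_i, e_i) / \mathrm{Var}(e_i)$ is an unbiased estimator of $\beta$ only if strict exogeneity holds.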
(a). A simple way to approach the problem is to estimate the parameters $\alpha$ and $\beta$ using data from a sample that is representative of the population in question. These estimates serve as stand-ins for the actual values of $\alpha$ and $\beta$. The more representative the sample is of the population, the closer the estimates will tend to be to the actual values of the parameters.
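The idea can be illustrated with a small simulation. This is only a sketch under stated assumptions: the "population" is synthetic, the values `TRUE_ALPHA` and `TRUE_BETA` are made up for illustration, and a straight-line least squares fit (the method set up in part (b)) is used as the estimation rule. It draws simple random samples of increasing size and prints the resulting estimates.

```python
import numpy as np


def fit_line(e, w):
    """Return (intercept, slope) of the least squares line of w on e."""
    slope, intercept = np.polyfit(e, w, 1)  # degree-1 fit: highest power first
    return intercept, slope


rng = np.random.default_rng(0)

# Hypothetical population; these parameter values are assumptions, not data.
TRUE_ALPHA, TRUE_BETA = 10.0, 2.0
N = 100_000
educ = rng.integers(0, 9, size=N).astype(float)  # 0 to 8 years of education
wage = TRUE_ALPHA + TRUE_BETA * educ + rng.normal(0.0, 5.0, size=N)

for n in (20, 200, 2000, 20000):
    idx = rng.choice(N, size=n, replace=False)  # simple random sample
    a_hat, b_hat = fit_line(educ[idx], wage[idx])
    print(f"n={n:>6}: alpha_hat={a_hat:.3f}, beta_hat={b_hat:.3f}")
```

With random sampling, the estimates from larger samples typically land closer to the values used to generate the data, although any single small sample can happen to be close or far by chance.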
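(b). The appropriate optimization problem for deriving the optimal choices of $\hat{\alpha}$ and $\hat{\beta}$ is the least squares problem.
The residual at each data point is the gap between the observed wage and the predicted wage:
$$\hat{\varepsilon}_i = w_i - \hat{w}_i = w_i - \hat{\alpha} - \hat{\beta} e_i$$
We then sum the squares of the residuals over the $n$ individuals in the sample to get:
$$E = \sum_{i=1}^{n} \hat{\varepsilon}_i^2 = \sum_{i=1}^{n} \left(w_i - \hat{\alpha} - \hat{\beta} e_i\right)^2$$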
Here, $E$ is a function of the parameters $\hat{\alpha}$ and $\hat{\beta}$, since the predicted values are obtained using the values of $\hat{\alpha}$ and $\hat{\beta}$.
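We now need to find the values of $\hat{\alpha}$ and $\hat{\beta}$ for which $E$ is at its minimum. The first-order conditions for a minimum are:
$$\frac{\partial E}{\partial \hat{\alpha}} = 0, \qquad \frac{\partial E}{\partial \hat{\beta}} = 0$$
We solve these two conditions to get the desired values of $\hat{\alpha}$ and $\hat{\beta}$.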
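As a worked sketch of that step, take $E$ to be the sum of squared residuals $\sum_{i=1}^{n} (w_i - \hat{\alpha} - \hat{\beta} e_i)^2$ over a sample of $n$ individuals, and assume $e_i$ is not the same for everyone, so that the denominator below is nonzero. The sample means $\bar{w}$ and $\bar{e}$ are notation introduced here for convenience; they do not appear in the original problem.

```latex
\begin{align*}
\frac{\partial E}{\partial \hat{\alpha}}
  &= -2 \sum_{i=1}^{n} \left(w_i - \hat{\alpha} - \hat{\beta} e_i\right) = 0
  &&\Rightarrow\quad \hat{\alpha} = \bar{w} - \hat{\beta}\,\bar{e}, \\
\frac{\partial E}{\partial \hat{\beta}}
  &= -2 \sum_{i=1}^{n} e_i \left(w_i - \hat{\alpha} - \hat{\beta} e_i\right) = 0
  &&\Rightarrow\quad \hat{\beta}
   = \frac{\sum_{i=1}^{n} (e_i - \bar{e})(w_i - \bar{w})}{\sum_{i=1}^{n} (e_i - \bar{e})^2}.
\end{align*}
```

The second line uses the first: substituting $\hat{\alpha} = \bar{w} - \hat{\beta}\bar{e}$ into the second condition and rearranging gives the ratio shown. Dividing numerator and denominator by $n$ shows that the slope is the sample covariance of $w_i$ and $e_i$ over the sample variance of $e_i$, the sample counterpart of the expression in part (c).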
These values are the optimal (least squares) estimates.
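The least squares recipe can also be checked numerically. The sketch below is illustrative only: the five data points are made up, and the helper names `ols_simple` and `sum_squared_residuals` are hypothetical. It computes the estimates as sample covariance over sample variance for the slope and the matching intercept, then confirms that nudging either estimate makes the sum of squared residuals larger.

```python
import numpy as np


def ols_simple(e, w):
    """Least squares intercept and slope for w = alpha + beta * e + error."""
    e = np.asarray(e, dtype=float)
    w = np.asarray(w, dtype=float)
    e_dev = e - e.mean()
    # Slope: sum of cross-deviations over sum of squared deviations of e.
    beta_hat = np.sum(e_dev * (w - w.mean())) / np.sum(e_dev ** 2)
    # Intercept: makes the fitted line pass through the point of sample means.
    alpha_hat = w.mean() - beta_hat * e.mean()
    return alpha_hat, beta_hat


def sum_squared_residuals(alpha_hat, beta_hat, e, w):
    """Objective E: sum of squared gaps between observed and fitted wages."""
    e = np.asarray(e, dtype=float)
    w = np.asarray(w, dtype=float)
    return float(np.sum((w - alpha_hat - beta_hat * e) ** 2))


# Made-up data: years of post-high school education and hourly wage.
educ = [0, 1, 2, 3, 4]
wage = [10.0, 13.0, 13.0, 17.0, 17.0]

alpha_hat, beta_hat = ols_simple(educ, wage)
best = sum_squared_residuals(alpha_hat, beta_hat, educ, wage)
print(alpha_hat, beta_hat, best)  # approximately 10.4, 1.8 and 3.6

# Moving either estimate away from the solution increases E.
for d_alpha, d_beta in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    nudged = sum_squared_residuals(alpha_hat + d_alpha, beta_hat + d_beta, educ, wage)
    print(d_alpha, d_beta, nudged > best)
```

The closed-form expressions are used here instead of a general-purpose optimizer because, for a single regressor, the two minimum conditions can be solved by hand.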