Suppose that $Y_i = \beta_0 + \beta_1 X_i + u_i$ and that $E[u_i \mid X_i] = 0$, so that OLS is an unbiased estimator.
a) Show that $Z_i = X_i$ is a valid instrument for $X_i$, i.e., that it is both relevant and exogenous.
b) Show that the 2SLS estimator of $\beta_1$ using $X_i$ as an instrument for $X_i$ is exactly equal to the OLS estimator of $\beta_1$.
c) Let $Z_i = X_i^2$ and assume $X_i$ is normally distributed, $N(\mu, \sigma^2)$. Is $Z_i$ exogenous? Is $Z_i$ relevant? Explain how the answers to these questions depend on the values of $(\mu, \sigma^2)$.
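A quick Monte Carlo check can build intuition for part c) before working through the proof; it is a numerical sketch, not a solution. It assumes $u_i$ is drawn independently of $X_i$ (one simple way to satisfy $E[u_i \mid X_i] = 0$), and the helper names `sample_cov` and `check_square_instrument` are hypothetical. The benchmark in the comment, $\operatorname{cov}(X_i^2, X_i) = 2\mu\sigma^2$, follows from the normal moments $E[X_i^2] = \mu^2 + \sigma^2$ and $E[X_i^3] = \mu^3 + 3\mu\sigma^2$.

```python
import random


def sample_cov(a, b):
    """Sample covariance of two equal-length lists (n - 1 denominator)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)


def check_square_instrument(mu, sigma, n=100_000, seed=0):
    """Draw X ~ N(mu, sigma^2) and an independent error u ~ N(0, 1), so
    E[u | X] = 0 holds by construction (an assumption of this sketch).
    Return the sample covariances cov(Z, X) and cov(Z, u) for Z = X**2."""
    rng = random.Random(seed)
    x = [rng.gauss(mu, sigma) for _ in range(n)]
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = [xi ** 2 for xi in x]
    return sample_cov(z, x), sample_cov(z, u)


for mu, sigma in [(0.0, 1.0), (1.0, 1.0), (2.0, 0.5)]:
    cov_zx, cov_zu = check_square_instrument(mu, sigma)
    # For normal X the population value is cov(X^2, X) = 2 * mu * sigma^2.
    print(f"mu={mu}, sigma^2={sigma ** 2}: cov(Z,X)={cov_zx:.3f} "
          f"(theory {2 * mu * sigma ** 2:.3f}), cov(Z,u)={cov_zu:.3f}")
```

Comparing the three settings shows the sample covariance with $X_i$ tracking $2\mu\sigma^2$, while the covariance with $u_i$ stays near zero in every case.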
Three important threats to internal validity are omitted variable bias, simultaneous causality bias, and errors-in-variables bias.
All three problems result in $E(u \mid X) \neq 0$.
Instrumental variables regression can eliminate bias when $E(u \mid X) \neq 0$ by using an instrumental variable (IV), $Z$:
$Y_i = \beta_0 + \beta_1 X_i + u_i$
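To see what goes wrong when $E(u \mid X) \neq 0$, the sketch below simulates omitted variable bias: an unobserved variable $W_i$ (hypothetical, as are all parameter values here) drives both $X_i$ and the error $u_i$. Under this assumed design, $\operatorname{cov}(X_i, u_i) = 1$ and $\operatorname{var}(X_i) = 2$, so the OLS slope converges to $\beta_1 + 1/2$ rather than $\beta_1$.

```python
import random


def ols_slope(x, y):
    """Slope from an OLS regression of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) * (a - mx) for a in x)
    return sxy / sxx


rng = random.Random(1)
n = 100_000
beta0, beta1 = 1.0, 2.0                       # assumed true coefficients
w = [rng.gauss(0, 1) for _ in range(n)]       # hypothetical omitted variable
x = [wi + rng.gauss(0, 1) for wi in w]        # X depends on W
u = [wi + rng.gauss(0, 1) for wi in w]        # W is left in the error term
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

# cov(X, u) = var(W) = 1 and var(X) = 2, so OLS converges to 2 + 1/2 = 2.5.
print("OLS slope:", round(ols_slope(x, y), 3), "true beta1:", beta1)
```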
An endogenous variable is one that is correlated with $u$.
An exogenous variable is one that is uncorrelated with $u$.
In IV regression, we focus on the case in which $X$ is endogenous and
there is an instrument, $Z$, that is exogenous.
Digression on terminology: “Endogenous” literally means
“determined within the system.” If X is jointly determined with Y,
then a regression of Y on X is subject to simultaneous causality
bias. But this definition of endogeneity is too narrow, because IV
regression can also be used to address omitted variable (OV) bias and
errors-in-variables bias. Thus we use the broader definition of
endogeneity given above.
Two Conditions for a Valid Instrument
$Y_i = \beta_0 + \beta_1 X_i + u_i$
For an instrumental variable (an “instrument”) Z to be valid, it
must satisfy two conditions:
1. Instrument relevance: $\operatorname{corr}(Z_i, X_i) \neq 0$
2. Instrument exogeneity: $\operatorname{corr}(Z_i, u_i) = 0$
Suppose for now that you have such a $Z_i$.
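Because $u_i$ is unobserved in real data, instrument exogeneity cannot be checked directly; in a simulation it can. This sketch (an assumed design with a hypothetical confounder `w`) compares a valid instrument `z_good`, drawn independently of the confounder, with a candidate `z_bad` that shares the confounder with the error term: both are relevant, but only `z_good` is exogenous.

```python
import random


def corr(a, b):
    """Sample correlation coefficient of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) * (p - ma) for p in a)
    sbb = sum((q - mb) * (q - mb) for q in b)
    return sab / (saa * sbb) ** 0.5


rng = random.Random(2)
n = 100_000
w = [rng.gauss(0, 1) for _ in range(n)]        # hypothetical confounder
z_good = [rng.gauss(0, 1) for _ in range(n)]   # independent of w
z_bad = [wi + rng.gauss(0, 1) for wi in w]     # shares w with the error
x = [zg + wi + rng.gauss(0, 1) for zg, wi in zip(z_good, w)]
u = [wi + rng.gauss(0, 1) for wi in w]

# Population values under this design:
#   z_good: corr(Z, X) = 1/sqrt(3) ~ 0.577, corr(Z, u) = 0
#   z_bad:  corr(Z, X) = 1/sqrt(6) ~ 0.408, corr(Z, u) = 0.5
for name, z in [("z_good", z_good), ("z_bad", z_bad)]:
    print(name, "corr(Z,X) =", round(corr(z, x), 3),
          "corr(Z,u) =", round(corr(z, u), 3))
```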
The IV estimator with one X and one Z
Explanation #1: Two Stage Least Squares (TSLS)
As it sounds, TSLS has two stages – two regressions:
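(1) Isolate the part of $X$ that is uncorrelated with $u$ by regressing
$X$ on $Z$ using OLS:
$X_i = \pi_0 + \pi_1 Z_i + v_i \quad (1)$
Because $Z_i$ is uncorrelated with $u_i$, so is $\pi_0 + \pi_1 Z_i$. The coefficients are unknown, but the OLS predicted values $\hat{X}_i = \hat{\pi}_0 + \hat{\pi}_1 Z_i$ estimate this part of $X_i$.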
Two Stage Least Squares, ctd.
(2) Replace $X_i$ by $\hat{X}_i$ in the
regression of interest:
regress $Y$ on $\hat{X}_i$ using OLS:
$Y_i = \beta_0 + \beta_1 \hat{X}_i + u_i \quad (2)$
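The two stages can be sketched directly in code. This is a minimal illustration under an assumed data-generating process (true $\beta_1 = 2$, a hypothetical confounder `w` that enters both $X_i$ and the error, and an instrument `z` independent of it); the `ols` helper is written by hand rather than taken from a library.

```python
import random


def ols(x, y):
    """Return (intercept, slope) from an OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) * (a - mx) for a in x))
    return my - slope * mx, slope


rng = random.Random(3)
n = 100_000
beta0, beta1 = 1.0, 2.0                        # assumed true coefficients
w = [rng.gauss(0, 1) for _ in range(n)]        # hypothetical confounder
z = [rng.gauss(0, 1) for _ in range(n)]        # instrument, independent of w
x = [zi + wi + rng.gauss(0, 1) for zi, wi in zip(z, w)]
y = [beta0 + beta1 * xi + wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]

# Stage 1: regress X on Z and keep the predicted values X-hat.
pi0_hat, pi1_hat = ols(z, x)
x_hat = [pi0_hat + pi1_hat * zi for zi in z]

# Stage 2: regress Y on X-hat; the slope is the TSLS estimate of beta1.
_, beta1_tsls = ols(x_hat, y)
_, beta1_ols = ols(x, y)   # biased: w is in both X and the error
print("OLS:", round(beta1_ols, 3), "TSLS:", round(beta1_tsls, 3))
```

One caution: running stage 2 as an ordinary OLS regression gives the right coefficient, but the standard errors that regression reports are not valid TSLS standard errors, because its residuals are computed with $\hat{X}_i$ instead of $X_i$.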
Two Stage Least Squares: Summary
Suppose $Z_i$ satisfies the two conditions for a valid
instrument:
1. Instrument relevance: $\operatorname{corr}(Z_i, X_i) \neq 0$
2. Instrument exogeneity: $\operatorname{corr}(Z_i, u_i) = 0$
Two-stage least squares:
Stage 1: Regress $X_i$ on $Z_i$ (including an intercept) and obtain the predicted values $\hat{X}_i$.
Stage 2: Regress $Y_i$ on $\hat{X}_i$ (including an intercept); the coefficient on $\hat{X}_i$ is the TSLS estimator, $\hat{\beta}_1^{TSLS}$.
$\hat{\beta}_1^{TSLS}$ is a consistent estimator of $\beta_1$.
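Part b) of the question at the top can also be previewed numerically. In the sketch below (simulated data, hypothetical helper names `ols` and `tsls`), setting $Z_i = X_i$ makes the stage 1 regression of $X_i$ on itself return intercept 0 and slope 1, so $\hat{X}_i = X_i$ and stage 2 is just the OLS regression of $Y_i$ on $X_i$. This illustrates the result; it does not replace the algebraic argument.

```python
import random


def ols(x, y):
    """Return (intercept, slope) from an OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) * (a - mx) for a in x))
    return my - slope * mx, slope


def tsls(z, x, y):
    """2SLS (TSLS) slope: regress x on z, then regress y on the fitted values."""
    pi0, pi1 = ols(z, x)
    x_hat = [pi0 + pi1 * zi for zi in z]
    return ols(x_hat, y)[1]


rng = random.Random(4)
n = 1_000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [1.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]   # assumed design

# With Z = X, stage 1 regresses X on itself (slope 1, intercept 0),
# so X-hat equals X and the TSLS slope coincides with the OLS slope.
print("TSLS with Z = X:", tsls(x, x, y))
print("OLS:            ", ols(x, y)[1])
```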
The IV Estimator, one X and one Z, ctd.
Explanation #2: A direct algebraic derivation
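$Y_i = \beta_0 + \beta_1 X_i + u_i$
Thus,
$$
\begin{aligned}
\operatorname{cov}(Y_i, Z_i) &= \operatorname{cov}(\beta_0 + \beta_1 X_i + u_i,\ Z_i) \\
&= \operatorname{cov}(\beta_0, Z_i) + \operatorname{cov}(\beta_1 X_i, Z_i) + \operatorname{cov}(u_i, Z_i) \\
&= 0 + \operatorname{cov}(\beta_1 X_i, Z_i) + 0 \\
&= \beta_1 \operatorname{cov}(X_i, Z_i)
\end{aligned}
$$
where $\operatorname{cov}(\beta_0, Z_i) = 0$ because $\beta_0$ is a constant, and $\operatorname{cov}(u_i, Z_i) = 0$ by instrument exogeneity. Since $\operatorname{cov}(X_i, Z_i) \neq 0$ by instrument relevance, we can divide by it; thus
$\beta_1 = \operatorname{cov}(Y_i, Z_i) / \operatorname{cov}(X_i, Z_i)$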
The IV Estimator, one X and one Z, ctd.
$\beta_1 = \operatorname{cov}(Y_i, Z_i) / \operatorname{cov}(X_i, Z_i)$
The IV estimator replaces these population covariances with sample
covariances:
$\hat{\beta}_1^{TSLS} = s_{YZ} / s_{XZ}$
where $s_{YZ}$ and $s_{XZ}$ are the sample covariances. This
is the TSLS estimator – just a different derivation!
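A short numerical check of this equivalence, on simulated data from an assumed design (true $\beta_1 = 2$, hypothetical confounder `w`): the ratio $s_{YZ}/s_{XZ}$ and the explicit two-stage computation agree up to floating-point rounding. Algebraically, with stage 1 slope $\hat{\pi}_1 = s_{ZX}/s_{ZZ}$, the sample covariance of $\hat{X}_i$ and $Y_i$ is $\hat{\pi}_1 s_{ZY}$ and the sample variance of $\hat{X}_i$ is $\hat{\pi}_1^2 s_{ZZ}$, so the stage 2 slope reduces to $s_{ZY}/s_{ZX}$.

```python
import random


def sample_cov(a, b):
    """Sample covariance of two equal-length lists (n - 1 denominator)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)


def two_stage(z, x, y):
    """Explicit two stages: OLS of x on z, then OLS of y on the fitted values."""
    pi1 = sample_cov(z, x) / sample_cov(z, z)
    pi0 = sum(x) / len(x) - pi1 * sum(z) / len(z)
    x_hat = [pi0 + pi1 * zi for zi in z]
    return sample_cov(x_hat, y) / sample_cov(x_hat, x_hat)


rng = random.Random(5)
n = 5_000
w = [rng.gauss(0, 1) for _ in range(n)]        # hypothetical confounder
z = [rng.gauss(0, 1) for _ in range(n)]        # instrument, independent of w
x = [zi + wi + rng.gauss(0, 1) for zi, wi in zip(z, w)]
y = [1.0 + 2.0 * xi + wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]

beta1_ratio = sample_cov(y, z) / sample_cov(x, z)   # s_YZ / s_XZ
beta1_stages = two_stage(z, x, y)
print(beta1_ratio, beta1_stages)   # equal up to floating-point rounding
```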