Question

Suppose that $Y_i = \beta_0 + \beta_1 X_i + u_i$ and that $E[u_i \mid X_i] = 0$, so that OLS is an unbiased estimator.

a) Show that $Z_i = X_i$ is a valid instrument for $X_i$, i.e. it is both relevant and exogenous.

b) Show that the 2SLS estimator of $\beta_1$ using $X_i$ as an instrument for $X_i$ is exactly equal to the OLS estimator of $\beta_1$.

c) Let $Z_i = X_i^2$ and assume $X_i$ is normally distributed, $X_i \sim N(\mu, \sigma^2)$. Is $Z_i$ exogenous? Is $Z_i$ relevant? Explain how the answers to these questions depend on the values of $(\mu, \sigma^2)$.

Solutions

Expert Solution

Three important threats to internal validity are:

  • Omitted variable bias from a variable that is correlated with X but is unobserved (so cannot be included in the regression) and for which there are inadequate control variables;
  • Simultaneous causality bias (X causes Y, Y causes X);
  • Errors-in-variables bias (X is measured with error)

All three problems result in $E(u \mid X) \neq 0$.
Instrumental variables regression can eliminate the bias when $E(u \mid X) \neq 0$ by using an instrumental variable (IV), Z:
$Y_i = \beta_0 + \beta_1 X_i + u_i$

  • IV regression breaks X into two parts: a part that might be correlated with u, and a part that is not. By isolating the part that is not correlated with u, it is possible to estimate $\beta_1$.
  • This is done using an instrumental variable, $Z_i$, which is correlated with $X_i$ but uncorrelated with $u_i$.
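
To make the problem concrete, here is a minimal simulation sketch of the omitted-variable case. Everything in it is an illustrative assumption rather than part of the original solution: the true coefficients (beta0 = 1, beta1 = 2), the unobserved factor w, and the helper names cov and ols_slope are all hypothetical. Because w enters both X and u, E(u|X) is not zero and the OLS slope settles near beta1 + cov(X, u)/var(X) instead of beta1.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)

def ols_slope(y, x):
    """OLS slope from regressing y on x with an intercept."""
    return cov(y, x) / cov(x, x)

rng = random.Random(0)        # fixed seed, chosen arbitrarily
n = 20_000
beta0, beta1 = 1.0, 2.0       # hypothetical true coefficients

# w is an unobserved factor that drives both X and the error term u.
w = [rng.gauss(0, 1) for _ in range(n)]
x = [wi + rng.gauss(0, 1) for wi in w]   # var(X) = 2 in the population
u = [wi + rng.gauss(0, 1) for wi in w]   # cov(X, u) = var(w) = 1
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

b_ols = ols_slope(y, x)
# Population bias in this design: cov(X, u) / var(X) = 1 / 2
expected_plim = beta1 + 0.5
print(f"OLS slope: {b_ols:.3f}, true beta1: {beta1}, predicted limit: {expected_plim}")
```

In this design the OLS slope should sit close to 2.5 rather than 2, and collecting more data does not help: the gap is a bias, not sampling noise.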

An endogenous variable is one that is correlated with u.
An exogenous variable is one that is uncorrelated with u.
In IV regression, we focus on the case in which X is endogenous and there is an instrument, Z, which is exogenous.

Digression on terminology: “Endogenous” literally means “determined within the system.” If X is jointly determined with Y, then a regression of Y on X is subject to simultaneous causality bias. But this definition of endogeneity is too narrow, because IV regression can also be used to address omitted variable bias and errors-in-variables bias. Thus we use the broader definition of endogeneity above.
Two Conditions for a Valid Instrument
$Y_i = \beta_0 + \beta_1 X_i + u_i$
For an instrumental variable (an “instrument”) Z to be valid, it must satisfy two conditions:
1. Instrument relevance: $\mathrm{corr}(Z_i, X_i) \neq 0$
2. Instrument exogeneity: $\mathrm{corr}(Z_i, u_i) = 0$
Suppose for now that you have such a $Z_i$.
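
The two conditions can be checked numerically for the candidate instrument $Z_i = X_i^2$ from part (c) of the question. The sketch below is an illustration under stated assumptions: u is simulated independently of X (a stronger condition than $E[u_i \mid X_i] = 0$), and the function name check_instrument and the parameter values are hypothetical. For normal X, $\mathrm{cov}(X_i, X_i^2) = E[X_i^3] - E[X_i]E[X_i^2] = 2\mu\sigma^2$, so relevance should fail when $\mu = 0$ and hold when $\mu \neq 0$ (with $\sigma^2 > 0$). Exogeneity holds for any $(\mu, \sigma^2)$, because $E[u_i \mid X_i] = 0$ makes u uncorrelated with every function of X.

```python
import random

def corr(a, b):
    """Sample correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

def check_instrument(mu, sigma, n=50_000, seed=1):
    """Simulate X ~ N(mu, sigma^2) and return (corr(Z, X), corr(Z, u))
    for the candidate instrument Z = X^2."""
    rng = random.Random(seed)
    x = [rng.gauss(mu, sigma) for _ in range(n)]
    u = [rng.gauss(0, 1) for _ in range(n)]   # assumption: u independent of X
    z = [xi ** 2 for xi in x]
    return corr(z, x), corr(z, u)

# mu = 0: cov(X, X^2) = 2 * mu * sigma^2 = 0, so Z should be irrelevant.
rel_zero_mean, exo_zero_mean = check_instrument(mu=0.0, sigma=1.0)
# mu = 2: cov(X, X^2) = 4, so Z should be strongly relevant.
rel_shifted, exo_shifted = check_instrument(mu=2.0, sigma=1.0)

print(f"mu=0: corr(Z,X)={rel_zero_mean:.3f}, corr(Z,u)={exo_zero_mean:.3f}")
print(f"mu=2: corr(Z,X)={rel_shifted:.3f}, corr(Z,u)={exo_shifted:.3f}")
```

Sample correlations are never exactly zero, so the comparison of interest is between a value near zero and one clearly away from it.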
The IV estimator with one X and one Z
Explanation #1: Two Stage Least Squares (TSLS)
As it sounds, TSLS has two stages – two regressions:
(1) Isolate the part of X that is uncorrelated with u by regressing X on Z using OLS:
$X_i = \pi_0 + \pi_1 Z_i + v_i$ (1)

  • Because $Z_i$ is uncorrelated with $u_i$, $\pi_0 + \pi_1 Z_i$ is uncorrelated with $u_i$. We don’t know $\pi_0$ or $\pi_1$, but we have estimated them, so…
  • Compute the predicted values of $X_i$, denoted $\hat{X}_i$, where $\hat{X}_i = \hat{\pi}_0 + \hat{\pi}_1 Z_i$, $i = 1, \dots, n$.
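
The first stage can be sketched in a few lines. This is an illustration only: the data are simulated, the first-stage coefficients (0.5 and 1.5) are hypothetical, and ols_fit is a helper written for this example. It splits each $X_i$ into a fitted part that depends only on $Z_i$ and a residual that is orthogonal to $Z_i$ by construction.

```python
import random

def ols_fit(y, x):
    """Return (intercept, slope) from an OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

rng = random.Random(2)        # fixed seed, chosen arbitrarily
n = 5_000
z = [rng.gauss(0, 1) for _ in range(n)]
# Hypothetical first-stage relationship: X = 0.5 + 1.5 Z + v
x = [0.5 + 1.5 * zi + rng.gauss(0, 1) for zi in z]

# Stage 1: regress X on Z, then form the predicted values and residuals.
pi0_hat, pi1_hat = ols_fit(x, z)
x_hat = [pi0_hat + pi1_hat * zi for zi in z]
v_hat = [xi - xh for xi, xh in zip(x, x_hat)]

# OLS residuals sum to zero and are orthogonal to the regressor.
resid_sum = sum(v_hat)
resid_dot_z = sum(vi * zi for vi, zi in zip(v_hat, z))
print(f"pi0_hat={pi0_hat:.3f}, pi1_hat={pi1_hat:.3f}")
```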

Two Stage Least Squares, ctd.
(2) Replace $X_i$ by $\hat{X}_i$ in the regression of interest and regress Y on $\hat{X}_i$ using OLS:
$Y_i = \beta_0 + \beta_1 \hat{X}_i + u_i$ (2)

  • Because $\hat{X}_i$ is uncorrelated with $u_i$, the first least squares assumption holds for regression (2). (This requires n to be large so that the first-stage coefficients $\pi_0$ and $\pi_1$ are precisely estimated.)
  • Thus, in large samples, $\beta_1$ can be estimated by OLS using regression (2).
  • The resulting estimator is called the Two Stage Least Squares (TSLS) estimator, $\hat{\beta}_1^{TSLS}$.

Two Stage Least Squares: Summary
Suppose $Z_i$ satisfies the two conditions for a valid instrument:
1. Instrument relevance: $\mathrm{corr}(Z_i, X_i) \neq 0$
2. Instrument exogeneity: $\mathrm{corr}(Z_i, u_i) = 0$
Two-stage least squares:
Stage 1: Regress $X_i$ on $Z_i$ (including an intercept) and obtain the predicted values $\hat{X}_i$.
Stage 2: Regress $Y_i$ on $\hat{X}_i$ (including an intercept); the coefficient on $\hat{X}_i$ is the TSLS estimator, $\hat{\beta}_1^{TSLS}$.
$\hat{\beta}_1^{TSLS}$ is a consistent estimator of $\beta_1$.
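
The two stages summarized above can be sketched as a single function. This is a minimal illustration, not a production estimator: the function names tsls and ols_fit, the simulated data, and the true coefficients (beta0 = 1, beta1 = 2) are all assumptions made for the example. The data are built so that X is endogenous (it shares the unobserved factor w with u), while Z shifts X and is unrelated to u.

```python
import random

def ols_fit(y, x):
    """Return (intercept, slope) from an OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def tsls(y, x, z):
    """Two-stage least squares with one regressor x and one instrument z.
    Returns (intercept, slope) from the second-stage regression."""
    pi0_hat, pi1_hat = ols_fit(x, z)               # Stage 1: X on Z
    x_hat = [pi0_hat + pi1_hat * zi for zi in z]   # predicted values of X
    return ols_fit(y, x_hat)                       # Stage 2: Y on X-hat

rng = random.Random(3)        # fixed seed, chosen arbitrarily
n = 20_000
beta0, beta1 = 1.0, 2.0       # hypothetical true coefficients

z = [rng.gauss(0, 1) for _ in range(n)]   # instrument
w = [rng.gauss(0, 1) for _ in range(n)]   # unobserved factor in both X and u
x = [zi + wi + rng.gauss(0, 1) for zi, wi in zip(z, w)]
u = [wi + rng.gauss(0, 1) for wi in w]
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

b0_ols, b1_ols = ols_fit(y, x)
b0_tsls, b1_tsls = tsls(y, x, z)
print(f"OLS slope:  {b1_ols:.3f}")
print(f"TSLS slope: {b1_tsls:.3f} (true beta1 = {beta1})")
```

One caveat on the design: the function returns point estimates only. The standard errors that an ordinary OLS routine would report for the second stage are not valid, because they ignore that $\hat{X}_i$ was itself estimated, so dedicated IV routines are normally used for inference.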
The IV Estimator, one X and one Z, ctd.
Explanation #2: A direct algebraic derivation
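$Y_i = \beta_0 + \beta_1 X_i + u_i$
Thus,
$$
\begin{aligned}
\mathrm{cov}(Y_i, Z_i) &= \mathrm{cov}(\beta_0 + \beta_1 X_i + u_i, Z_i) \\
&= \mathrm{cov}(\beta_0, Z_i) + \mathrm{cov}(\beta_1 X_i, Z_i) + \mathrm{cov}(u_i, Z_i) \\
&= 0 + \mathrm{cov}(\beta_1 X_i, Z_i) + 0 \\
&= \beta_1 \, \mathrm{cov}(X_i, Z_i)
\end{aligned}
$$
where $\mathrm{cov}(\beta_0, Z_i) = 0$ because $\beta_0$ is a constant and $\mathrm{cov}(u_i, Z_i) = 0$ by instrument exogeneity. Since $\mathrm{cov}(X_i, Z_i) \neq 0$ by instrument relevance, we can divide to obtain
$$\beta_1 = \frac{\mathrm{cov}(Y_i, Z_i)}{\mathrm{cov}(X_i, Z_i)}$$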

The IV estimator replaces these population covariances with sample covariances:
$\hat{\beta}_1^{TSLS} = s_{YZ} / s_{XZ}$
where $s_{YZ}$ and $s_{XZ}$ are the sample covariances. This is the TSLS estimator, just a different derivation!
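
The equivalence of the two derivations can be verified numerically, and the same check speaks to parts (a) and (b) of the question. The algebra: the first-stage slope is $s_{XZ}/s_Z^2$, so the second-stage slope is $s_{Y\hat{X}}/s_{\hat{X}}^2 = \hat{\pi}_1 s_{YZ} / (\hat{\pi}_1^2 s_Z^2) = s_{YZ}/s_{XZ}$. Setting $Z_i = X_i$ gives $s_{YX}/s_X^2$, which is the OLS slope; this instrument is relevant because $\mathrm{corr}(X_i, X_i) = 1$ whenever X varies, and exogenous because $E[u_i \mid X_i] = 0$ implies $\mathrm{cov}(X_i, u_i) = 0$. The sketch below uses simulated data and hypothetical helper names (cov, ols_slope, tsls_slope, iv_ratio).

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)

def ols_slope(y, x):
    """OLS slope from regressing y on x with an intercept."""
    return cov(y, x) / cov(x, x)

def tsls_slope(y, x, z):
    """Slope from the two-stage procedure: regress x on z, then y on fitted x."""
    pi1 = ols_slope(x, z)
    pi0 = sum(x) / len(x) - pi1 * sum(z) / len(z)
    x_hat = [pi0 + pi1 * zi for zi in z]
    return ols_slope(y, x_hat)

def iv_ratio(y, x, z):
    """IV estimator written as a ratio of sample covariances, s_YZ / s_XZ."""
    return cov(y, z) / cov(x, z)

rng = random.Random(4)        # fixed seed, chosen arbitrarily
n = 2_000
z = [rng.gauss(0, 1) for _ in range(n)]
w = [rng.gauss(0, 1) for _ in range(n)]   # unobserved factor in both X and u
x = [zi + wi + rng.gauss(0, 1) for zi, wi in zip(z, w)]
y = [1.0 + 2.0 * xi + wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]

two_stage = tsls_slope(y, x, z)
ratio = iv_ratio(y, x, z)
self_instrumented = tsls_slope(y, x, x)   # Z = X
ols = ols_slope(y, x)

print(f"two-stage: {two_stage:.6f}, covariance ratio: {ratio:.6f}")
print(f"Z = X:     {self_instrumented:.6f}, OLS:              {ols:.6f}")
```

Both pairs should agree up to floating-point rounding, since the identities are algebraic and hold in every sample, not only in large ones.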

