There are three major reasons why you have to worry about endogeneity problems. List them and briefly discuss the method to resolve each problem.
An endogeneity problem occurs when an explanatory variable X in your model is correlated with the error term. This typically happens for one of the following three reasons:
A) Omitted variables
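Suppose that a regression model excludes a key variable because the data are unavailable. If that variable affects Y and is also correlated with an included X, its effect ends up in the error term, so the error term is correlated with X.
One remedy is to obtain a proxy variable that is correlated with the omitted variable and include it in the regression.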
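A small simulation can make the omitted-variable problem and the proxy remedy concrete. The sketch below is only illustrative: the variable names (ability, educ, wage, proxy), the coefficients and the sample size are all assumptions, not part of the question. The true effect of educ on wage is set to 1.

```python
import random

def mean(v):
    return sum(v) / len(v)

def ols_slope(x, y):
    """Slope of a simple regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(x, y):
    """Residuals from a simple regression of y on x (with intercept)."""
    b = ols_slope(x, y)
    mx, my = mean(x), mean(y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]

rng = random.Random(0)  # fixed seed so the run is reproducible
n = 20000

# Hypothetical data: 'ability' is the omitted variable.
ability = [rng.gauss(0, 1) for _ in range(n)]
educ = [0.8 * a + rng.gauss(0, 1) for a in ability]      # correlated with ability
wage = [1.0 * e + 1.0 * a + rng.gauss(0, 1)              # true effect of educ is 1.0
        for e, a in zip(educ, ability)]
proxy = [a + rng.gauss(0, 0.5) for a in ability]         # noisy proxy for ability

# Regression that omits ability: the slope absorbs part of ability's effect.
naive = ols_slope(educ, wage)

# Regression of wage on educ and proxy, computed by partialling the proxy
# out of both variables (Frisch-Waugh) and regressing residual on residual.
with_proxy = ols_slope(residuals(proxy, educ), residuals(proxy, wage))

print("omitting ability:", round(naive, 3))
print("including proxy: ", round(with_proxy, 3))
```

Under these assumptions the slope that omits ability is biased upward, and including the proxy moves it closer to 1. Because the proxy is noisy, some bias remains.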
B) Measurement error
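Data are often measured with error, for example reporting errors or coding errors. When the measurement error is in the dependent variable and is unrelated to the regressors, the zero conditional mean assumption is not violated, so there is no endogeneity (the estimates are only less precise). In contrast, when the measurement error is in an independent variable, the error becomes part of the regression error term and is correlated with the observed regressor, so the problem of endogeneity arises. With classical measurement error the coefficient is biased toward zero (attenuation bias). The usual remedy is to obtain better data or to instrument the mismeasured variable, for example with a second, independent measurement of it.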
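The contrast between the two cases can be checked with a short simulation. This is a sketch under assumed values (true slope 2, unit variances, classical measurement error independent of everything else); none of the numbers come from the question.

```python
import random

def ols_slope(x, y):
    """Slope of a simple regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(1)  # fixed seed so the run is reproducible
n = 20000

x_true = [rng.gauss(0, 1) for _ in range(n)]
y_true = [2.0 * x + rng.gauss(0, 1) for x in x_true]   # assumed true slope: 2.0

# Case 1: measurement error in the dependent variable only.
y_obs = [y + rng.gauss(0, 1) for y in y_true]
slope_y_error = ols_slope(x_true, y_obs)

# Case 2: measurement error in the independent variable only.
x_obs = [x + rng.gauss(0, 1) for x in x_true]
slope_x_error = ols_slope(x_obs, y_true)

print("error in Y:", round(slope_y_error, 3))   # stays near the true slope
print("error in X:", round(slope_x_error, 3))   # shrinks toward zero
```

With these assumed variances the attenuation factor is var(x) / (var(x) + var(error)) = 1/2, so the slope in the second case settles near 1 rather than 2.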
C) Simultaneity in simultaneous equations models
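Simultaneity arises when one or more of the independent variables, X_j, is jointly determined with the dependent variable, Y, typically through an equilibrium mechanism. A standard example is price and quantity in a supply and demand system: a shock to either equation moves both variables, so X_j is correlated with the error term.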
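As an illustration of simultaneity, the sketch below simulates a hypothetical market in which price and quantity are determined jointly. The demand slope (-1), supply slope (+1) and shock variances are assumptions chosen to keep the algebra simple.

```python
import random

def ols_slope(x, y):
    """Slope of a simple regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(2)  # fixed seed so the run is reproducible
n = 20000

price, quantity = [], []
for _ in range(n):
    u_d = rng.gauss(0, 1)   # demand shock
    u_s = rng.gauss(0, 1)   # supply shock
    # Demand: q = -1.0 * p + u_d     Supply: q = +1.0 * p + u_s
    # Setting demand equal to supply gives the equilibrium values:
    p = (u_d - u_s) / 2.0
    q = (u_d + u_s) / 2.0
    price.append(p)
    quantity.append(q)

# Regressing quantity on price recovers neither the demand slope (-1)
# nor the supply slope (+1), because price depends on both shocks.
slope = ols_slope(price, quantity)
print("OLS slope of quantity on price:", round(slope, 3))
```

With equal shock variances the regression slope is close to zero, a mixture of the two structural slopes, which shows why OLS cannot identify either equation here.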
Solutions
We need some way of separating out genuinely exogenous variation in independent variables that might be endogenous. Ideally, we would also like some way of testing the extent to which endogeneity is a problem in our data, and ensuring that the solution we have chosen is a good one. Two groups of solutions:
Ad hoc approaches
If an independent variable is potentially endogenous, it is intuitively appealing to look for a proxy that does not suffer from the same problem. The most common approach is to lag the suspect variables by one or more periods.
Advantages - Very simple to implement. Additional data requirements are limited. Intuitively appealing.
Limitations - Interpretation becomes a little more difficult, since the variable in the regression (the lag or proxy) only stands in for the variable we are interested in. There is a loss of precision in some cases. A lagged variable can also remain endogenous if the error term is correlated over time.
Instrumental variables estimation
The more rigorous way to deal with endogeneity concerns is through instrumental variables (IV) techniques. The most common IV estimator is Two Stage Least Squares (TSLS). Intuitively, IV estimation works as follows: Find a genuinely exogenous variable (instrument) that is strongly correlated with the potentially endogenous regressor. Ensure that the instrument only influences the dependent variable through the potentially endogenous independent variable. In the first stage, regress the endogenous regressor on the instrument and keep the fitted values. In the second stage, regress the dependent variable on those fitted values.
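The two stages can be sketched by hand for the simplest case of one endogenous regressor and one instrument. The data-generating process below is hypothetical (instrument z, unobserved confounder c, true effect of x on y equal to 1), chosen so that z is relevant and affects y only through x.

```python
import random

def mean(v):
    return sum(v) / len(v)

def ols_fit(x, y):
    """Return (intercept, slope) of a simple regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

rng = random.Random(3)  # fixed seed so the run is reproducible
n = 20000

z = [rng.gauss(0, 1) for _ in range(n)]   # instrument: exogenous
c = [rng.gauss(0, 1) for _ in range(n)]   # unobserved confounder
x = [zi + ci + rng.gauss(0, 1) for zi, ci in zip(z, c)]          # endogenous regressor
y = [1.0 * xi + ci + rng.gauss(0, 1) for xi, ci in zip(x, c)]    # true effect: 1.0

# Plain OLS of y on x is biased because x is correlated with c.
_, ols_slope = ols_fit(x, y)

# Stage 1: regress x on z and keep the fitted values.
a0, a1 = ols_fit(z, x)
x_hat = [a0 + a1 * zi for zi in z]

# Stage 2: regress y on the fitted values from stage 1.
_, tsls_slope = ols_fit(x_hat, y)

print("OLS: ", round(ols_slope, 3))    # biased away from 1.0
print("TSLS:", round(tsls_slope, 3))   # close to 1.0
```

Running the second stage by hand gives the correct point estimate, but the standard errors from that second regression are not valid, so in practice a dedicated IV routine is normally used for inference.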