Define endogeneity and explain how it may affect the results of quantitative research. List at least 3 causes of endogeneity. For each of these, describe measures that can be taken to identify and prevent the biases that lead to invalid results.
Endogeneity broadly refers to situations in
which an explanatory variable is correlated with the error term. A
simple example of endogeneity is the following:
A = XB+E
X = AC+W
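Since the current value of X depends on the current value
of A, X must be influenced by current shocks to A. Substituting
the second equation into the first gives:
A = (AC+W)B+E
and substituting the first equation into the second gives:
X = (XB+E)C+W
The error term E appears in the equation for X. Thus, X and E
are correlated. This violates the OLS assumption that the
explanatory variables are uncorrelated with the error term, so
the OLS estimate of B is biased and inconsistent.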
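The simultaneity example above can be checked numerically. The following is a minimal sketch, not part of the original answer: it treats A, X, E and W as scalars, and the values B = 0.5, C = 0.4, the standard normal shocks, and the function names are all assumptions chosen for illustration. It solves the two equations for A and X, then compares the OLS slope with the true B.

```python
import numpy as np

def simulate_simultaneity(n=100_000, b=0.5, c=0.4, seed=0):
    """Draw data from A = X*b + E and X = A*c + W (scalar case, assumed values)."""
    rng = np.random.default_rng(seed)
    e = rng.normal(size=n)  # shock to A
    w = rng.normal(size=n)  # shock to X
    # Solving the two equations jointly gives the reduced form:
    denom = 1.0 - b * c
    x = (c * e + w) / denom
    a = (b * w + e) / denom
    return a, x, e

def ols_slope(y, x):
    """Slope from a simple OLS regression of y on x with an intercept."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

true_b = 0.5
a, x, e = simulate_simultaneity(b=true_b)
slope = ols_slope(a, x)
corr_xe = float(np.corrcoef(x, e)[0, 1])

print(f"true B = {true_b}, OLS slope = {slope:.3f}, corr(X, E) = {corr_xe:.3f}")
```

Under these assumed values the OLS slope settles near 0.9/1.16, about 0.78, rather than the true 0.5, and the correlation between X and E is clearly positive.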
Endogeneity is a key problem for research on inequality. Technically, endogeneity occurs when a predictor variable (x) in a regression model is correlated with the error term (e) in the model. This can occur under a variety of conditions, but two cases are especially common in inequality research: (1) when important variables are omitted from the model (called “omitted variable bias”) and (2) when the outcome variable is a predictor of x and not simply a response to x (called “simultaneity bias”). At least part of the latter problem is often called “selection.”
In many applications, the endogeneity problem is an omitted variable problem.
If the omitted variable is invariant at the level of the fixed effect (for example, constant over time within each unit), then including the fixed effect solves the problem.
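A small simulation can illustrate the fixed effect point. This is a sketch under stated assumptions, not a recipe from the original answer: the panel dimensions, the true coefficient of 2.0, the way the omitted unit-level variable u enters x and y, and the helper name slope are all hypothetical. The fixed effect is applied through the within transformation, that is, by subtracting each unit's mean.

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, n_periods, beta = 500, 10, 2.0  # assumed panel size and true coefficient

# u is the omitted variable: it differs across units but is constant over time.
u = rng.normal(size=(n_units, 1))
x = u + rng.normal(size=(n_units, n_periods))  # x is correlated with u
y = beta * x + 3.0 * u + rng.normal(size=(n_units, n_periods))

def slope(y, x):
    """Slope from a simple OLS regression of y on x with an intercept."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc * yc).sum() / (xc * xc).sum())

# Pooled OLS ignores u, so its effect is wrongly attributed to x.
pooled = slope(y, x)

# Within transformation: subtracting unit means removes anything
# that is constant within a unit, including u.
x_within = x - x.mean(axis=1, keepdims=True)
y_within = y - y.mean(axis=1, keepdims=True)
within = slope(y_within, x_within)

print(f"true beta = {beta}, pooled OLS = {pooled:.3f}, fixed effects = {within:.3f}")
```

With these assumed values the pooled estimate tends toward 3.5, while the fixed effects estimate stays close to 2.0. The same approach would not help if the omitted variable changed over time within a unit.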
A standard way to deal with endogeneity concerns is through
instrumental variables (IV) techniques.
The most common IV estimator is Two Stage Least
Squares (TSLS, also written 2SLS).
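The two stages can be sketched with plain NumPy. This is an illustrative example only: the data generating process, the instrument z, the true coefficient of 1.5, and the helper name ols are assumptions. The instrument is assumed to be valid, meaning it affects x but is unrelated to the omitted variable u and to the error term.

```python
import numpy as np

rng = np.random.default_rng(2)
n, beta = 50_000, 1.5  # assumed sample size and true coefficient

u = rng.normal(size=n)  # omitted variable that makes x endogenous
z = rng.normal(size=n)  # instrument: moves x, unrelated to u (assumed)
x = 0.8 * z + u + rng.normal(size=n)
y = beta * x + 2.0 * u + rng.normal(size=n)

def ols(y, X):
    """OLS coefficients of y on the columns of X via least squares."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)

# Plain OLS of y on x, for comparison.
ols_beta = float(ols(y, np.column_stack([ones, x]))[1])

# Stage 1: regress x on the instrument and keep the fitted values.
Z = np.column_stack([ones, z])
x_hat = Z @ ols(x, Z)

# Stage 2: regress y on the fitted values from stage 1.
tsls_beta = float(ols(y, np.column_stack([ones, x_hat]))[1])

print(f"true beta = {beta}, OLS = {ols_beta:.3f}, TSLS = {tsls_beta:.3f}")
```

With these assumed values OLS tends toward about 2.26, while the TSLS estimate stays close to 1.5. Running the second stage by hand gives the right point estimate, but the standard errors it would report are not the correct TSLS standard errors, so a dedicated IV routine is preferable for inference.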
If you still have any doubts, please feel free to ask in the comments section. Thanks.