How do you avoid endogeneity problems and multicollinearity problems in your empirical work? Explain each.
Endogeneity occurs when a regressor is correlated with the error term. A common cause is an omitted variable, observed or unobserved, that affects the outcome and is related to a variable we did include in the model.
The regression model is given by $y_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki} + \varepsilon_i$, with $E(\varepsilon_i \mid x_{1i}, \dots, x_{ki}) = 0$.
This assumption says that, once we condition on the regressors, whatever we did not include in the model averages out to zero, so $E(y_i \mid x_{1i}, \dots, x_{ki}) = \beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki}$. Endogeneity is a failure of this assumption.
Example: we want to explain wages and we use years of schooling as a covariate. Years of schooling are correlated with unobserved ability, work ethic, and so on, which also affect wages, so schooling is endogenous.
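The wage example can be illustrated with a small simulation. This is only a sketch on made-up data: the data-generating process, the coefficient values, and the helper name `ols` are assumptions chosen for illustration, not estimates from any real dataset. It compares the schooling coefficient when ability is omitted with the one obtained when ability is (unrealistically) observed.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process (all numbers are illustrative assumptions).
ability = rng.normal(size=n)                        # unobserved in practice
schooling = 12 + 2 * ability + rng.normal(size=n)   # correlated with ability
wage = 1.0 + 0.5 * schooling + 1.0 * ability + rng.normal(size=n)


def ols(y, *regressors):
    """Return OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


beta_short = ols(wage, schooling)           # ability omitted: biased
beta_long = ols(wage, schooling, ability)   # ability included: infeasible benchmark

print("schooling coefficient, ability omitted :", round(beta_short[1], 3))
print("schooling coefficient, ability included:", round(beta_long[1], 3))
```

Under these assumed numbers the true return to schooling is 0.5, and the regression that omits ability overstates it, because schooling picks up part of the effect of ability.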
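A standard way to deal with endogeneity is to use an instrument. E.g., suppose we are modeling income as a function of education. Education is endogenous. Quarter of birth is an instrument, albeit a weak one. Or suppose we are modeling the demand for fish. Price is determined jointly by supply and demand, so we need price variation that comes from supply shocks only, not from demand shocks. Rain is an instrument, since it shifts supply but plausibly does not affect demand directly.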
Variables:
We model Y as a function of X1 and X2, where X1 is endogenous. We can model X1 itself. X1 can be divided into two parts, an exogenous part and an endogenous part: X1 = f(X2, Z) + ν, where f(X2, Z) is the exogenous part and ν is the endogenous part. Z are variables that affect Y only through X1. They are referred to as instrumental variables or excluded instruments.
The solution is to get a consistent estimate of the exogenous part and get rid of the endogenous part. An example is two-stage least squares, in which both relationships are linear.
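The two stages can be sketched by hand on simulated data. Everything here is an assumption made for illustration: the variable names (`u`, `z`, `x1`, `x2`), the coefficients, and the helper `ols`. The point estimates from this manual procedure match two-stage least squares, but the standard errors from a naive second-stage regression would not be valid, so in practice a dedicated IV routine is preferable.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical setup: u is an unobserved confounder, z is the excluded instrument.
u = rng.normal(size=n)
x2 = rng.normal(size=n)                             # exogenous control
z = rng.normal(size=n)                              # affects y only through x1
x1 = 0.5 * x2 + 1.0 * z + u + rng.normal(size=n)    # endogenous: depends on u
y = 2.0 + 1.5 * x1 - 1.0 * x2 + 2.0 * u + rng.normal(size=n)


def ols(y, X):
    """Least-squares coefficients for design matrix X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


const = np.ones(n)

# Naive OLS of y on x1 and x2: biased because x1 is correlated with u.
beta_ols = ols(y, np.column_stack([const, x1, x2]))

# Stage 1: regress x1 on the exogenous variables and the instrument,
# and keep the fitted values (the exogenous part of x1).
W = np.column_stack([const, x2, z])
x1_hat = W @ ols(x1, W)

# Stage 2: regress y on the fitted values and the exogenous control.
beta_2sls = ols(y, np.column_stack([const, x1_hat, x2]))

print("OLS  coefficient on x1:", round(beta_ols[1], 3))
print("2SLS coefficient on x1:", round(beta_2sls[1], 3))
```

With the assumed true coefficient of 1.5 on x1, the OLS estimate is pushed upward by the confounder, while the two-stage estimate should land close to 1.5.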
Multicollinearity occurs when independent variables in a regression model are highly correlated with one another. This correlation is a problem because the regression then cannot cleanly separate the effect of one variable from that of another. If the degree of correlation between variables is high enough, it can cause problems when you fit the model and interpret the results.
Multicollinearity causes the following two basic types of problems:
First, the coefficients become very sensitive to small changes in the model. Second, multicollinearity reduces the precision of the estimated coefficients, which weakens the statistical power of your regression model.
The need to reduce multicollinearity depends on its severity and your primary goal for your regression model.
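One common way to gauge severity is the variance inflation factor (VIF), which for each regressor equals 1 / (1 - R²) from a regression of that regressor on all the others. The sketch below is a minimal illustration on simulated data; the function name `vif` and the columns `a`, `b`, `c` are hypothetical, and the often-quoted cutoffs of roughly 5 or 10 are rules of thumb rather than formal tests.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X (X has no intercept column)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        # Regress column j on an intercept plus all remaining columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - resid.var() / target.var()
        out[j] = 1.0 / (1.0 - r2)
    return out


rng = np.random.default_rng(2)
n = 5_000
a = rng.normal(size=n)
b = a + 0.1 * rng.normal(size=n)   # nearly a copy of a
c = rng.normal(size=n)             # unrelated to a and b

vifs = vif(np.column_stack([a, b, c]))
print("VIFs for a, b, c:", np.round(vifs, 1))
```

In this made-up example `a` and `b` should show very large VIFs while `c` stays near 1. If the goal is only prediction, high VIFs may be tolerable; if the goal is to interpret individual coefficients, options include dropping or combining the collinear variables or collecting more data.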