If I receive STATA output (regression) in an exam, and the question is to detect the following issues:
1- Heteroscedasticity
2- multicollinearity
3- Omitted variable
4- Over-specification
How can I easily tell from this output whether any of these issues is present?
For example, I know that one sign of multicollinearity is insignificant t-values.
1. Heteroscedasticity - Heteroscedasticity is the condition in which the variance of the regression error term is not constant across observations.
The Breusch-Pagan test checks whether heteroscedasticity is present by testing a null hypothesis against an alternative. The null hypothesis states that the error variances in the model are all equal (homoscedasticity), whereas the alternative hypothesis states that the error variances are a multiplicative function of one or more variables (heteroscedasticity).
In the Stata output, the decision to reject the null hypothesis is based on the chi-square test statistic and its probability value (Prob > chi2). If the probability value is less than 0.05, the null hypothesis is rejected at the 5% level, i.e. the errors are heteroscedastic.
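As a rough sketch of where this output comes from, the commands below run the Breusch-Pagan test after a regression. The built-in auto dataset and the chosen regressors are assumptions made purely for illustration; they are not part of the exam question.

```stata
* Illustrative only: auto.dta and these variables are assumed, not from the question
sysuse auto, clear
regress price mpg weight foreign

* Breusch-Pagan / Cook-Weisberg test; H0: constant error variance
estat hettest
```

In the resulting table, read the line labelled Prob > chi2: a value below 0.05 points to heteroscedasticity.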
2. Multicollinearity - Multicollinearity is the condition in which the predictor variables in a model are linearly related to one another. In the regression table itself, a typical sign is a high R-squared and a significant overall F-test combined with individually insignificant t-values. To measure multicollinearity, look at the variance inflation factor (VIF), which assesses how much the variance of an estimated regression coefficient increases because the predictors are correlated. If no predictors are correlated, the VIFs will all be 1. A VIF of more than 10 indicates strong multicollinearity among the predictors.
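A minimal sketch of how the VIF table is produced in Stata is shown here. The auto dataset and the regressors are again assumed for illustration only.

```stata
* Illustrative only: auto.dta and these variables are assumed, not from the question
sysuse auto, clear
regress price mpg weight foreign

* Lists VIF and 1/VIF (tolerance) for each predictor, plus the mean VIF
estat vif
```

Any predictor whose VIF exceeds 10 (equivalently, whose 1/VIF is below 0.1) would be flagged under the rule of thumb given above.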
3. Omitted Variable - Omitted variables are variables that significantly influence the dependent variable and so should be included in the model, but are excluded. In Stata, the ovtest command (estat ovtest in current versions) checks for this using the Ramsey RESET test, whose null hypothesis is that the model has no omitted variables. The output reports an F-statistic and the corresponding probability value (Prob > F). If this probability value is less than 0.05, the result is significant and the null hypothesis is rejected, i.e. there is evidence of omitted variables.
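The RESET output described here could be generated roughly as follows. The dataset and variables are hypothetical stand-ins, chosen only to make the example self-contained.

```stata
* Illustrative only: auto.dta and these variables are assumed, not from the question
sysuse auto, clear
regress price mpg weight foreign

* Ramsey RESET test using powers of the fitted values; H0: no omitted variables
estat ovtest
```

Look at Prob > F in the output: a value below 0.05 rejects the null hypothesis of no omitted variables.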
4. Over Specification - Over-specification means the model includes variables that do not belong in it. The Ramsey RESET test can be used as a general check of model specification. The output reports an F-statistic and the corresponding probability value. If this probability value is less than 0.05, the result is significant and the model is not specified correctly.
Alternatively, the effect of dropping each predictor variable from the model can be tested. If dropping a variable does not lead to a significant increase in the residual sum of squares (that is, its t-test or F-test is insignificant), then that variable can be dropped.
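One way to sketch this check in Stata is with the test command after the regression. Here headroom and trunk are hypothetical candidates for irrelevant regressors, and the auto dataset is assumed purely for illustration.

```stata
* Illustrative only: auto.dta and these variables are assumed, not from the question
sysuse auto, clear
regress price mpg weight foreign headroom trunk

* Joint F-test; H0: the coefficients on headroom and trunk are both zero
test headroom trunk
```

If Prob > F is above 0.05, the null hypothesis is not rejected, which suggests these variables add little and the model may be over-specified. For a single variable, the t-statistic and P>|t| column in the regression table give the same information.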