1. Explain why the linear probability model is inadequate as a specification for binary dependent variable estimation.
2. How can we measure whether a probit or logit model we have estimated fits the data well or not?
3. How does R-squared for OLS differ from the pseudo R-squared for binary models?
1. Probabilities must logically lie between 0 and 1, but the LPM can produce fitted probabilities outside this range. The LPM is also inherently heteroskedastic: because the dependent variable is binary, the error variance for observation i equals p_i(1 - p_i), where p_i is the predicted probability, so it changes with the values of the explanatory variables rather than staying constant. OLS inference assumes homoscedastic residuals; this assumption underpins the usual standard errors and the model's reliability across all values of the explanatory variables, and it fails for the LPM. A short sketch of the out-of-range problem follows below.
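For instance, here is a minimal sketch of the first point. The synthetic data and the use of statsmodels are my own choices for illustration, not part of the original answer:

```python
# Minimal sketch (synthetic data, illustrative only): fit an LPM by OLS
# and show that some fitted "probabilities" fall outside [0, 1].
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
# Assume the true data-generating process is a logit model.
p_true = 1 / (1 + np.exp(-(0.5 + 2.0 * x)))
y = rng.binomial(1, p_true)

X = sm.add_constant(x)
lpm = sm.OLS(y, X).fit()      # linear probability model
fitted = lpm.fittedvalues

print("Fitted values below 0:", np.sum(fitted < 0))
print("Fitted values above 1:", np.sum(fitted > 1))
# With binary y, Var(e_i) = p_i * (1 - p_i) depends on x_i,
# so the LPM errors are heteroskedastic by construction.
```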
2. We can run formal tests and comparisons to validate whether a probit or logit model fits the data. Specification tests such as the Hausman test or the Small-Hsiao test (the latter checks the independence-of-irrelevant-alternatives assumption in multinomial logit) are options. We can also compare models using the Akaike (AIC) or Bayesian (BIC) information criteria, and use scalar measures of fit such as McFadden's pseudo R-squared; see the sketch below.
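As a sketch of how the information criteria and McFadden's measure might be obtained in practice (again with synthetic data and statsmodels, which are assumptions of this illustration):

```python
# Minimal sketch: fit logit and probit models on the same synthetic data
# and compare AIC, BIC, and McFadden's pseudo R-squared.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 2.0 * x))))
X = sm.add_constant(x)

logit = sm.Logit(y, X).fit(disp=False)
probit = sm.Probit(y, X).fit(disp=False)

for name, res in [("logit", logit), ("probit", probit)]:
    # res.prsquared is McFadden's pseudo R-squared: 1 - llf / llnull
    print(f"{name}: AIC={res.aic:.1f}  BIC={res.bic:.1f}  "
          f"McFadden R2={res.prsquared:.3f}")
```

Lower AIC/BIC and a higher McFadden's pseudo R-squared indicate a better-fitting specification when comparing candidate models on the same data.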
3. Logistic regression coefficients are maximum likelihood estimates obtained through an iterative process; they are not chosen to minimize the residual sum of squares, so the OLS definition of R-squared does not carry over directly. Instead, goodness of fit can be summarized with a pseudo R-squared. Pseudo R-squared measures resemble the OLS R-squared in structure and scale (most lie between 0 and 1), but each has its own interpretation, and different pseudo R-squareds computed on the same model generally take different values. Each of them generalizes one aspect of the OLS R-squared, so in a sense they all originate from the OLS measure.
For example, Efron's pseudo R-squared squares and sums the prediction errors and divides by the total variability: R² = 1 - Σ(y_i - p̂_i)² / Σ(y_i - ȳ)², where p̂_i is the predicted probability. This mirrors the "squared error divided by total variability" (equivalently, squared correlation) interpretation of the OLS R-squared. The sketch below computes it alongside McFadden's measure.
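A short sketch, under the same synthetic-data assumption as above, computing Efron's pseudo R-squared by hand:

```python
# Minimal sketch: compute Efron's pseudo R-squared by hand for a fitted
# logit model and compare it with McFadden's built-in measure.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 2.0 * x))))
X = sm.add_constant(x)
res = sm.Logit(y, X).fit(disp=False)

p_hat = res.predict(X)  # predicted probabilities
# Efron: 1 - sum((y - p_hat)^2) / sum((y - mean(y))^2),
# i.e. squared prediction error over total variability, as in OLS R-squared.
efron = 1 - np.sum((y - p_hat) ** 2) / np.sum((y - y.mean()) ** 2)
# McFadden: 1 - llf / llnull, based on log-likelihoods instead of squared errors.
print(f"Efron R2={efron:.3f}  McFadden R2={res.prsquared:.3f}")
```

The two values typically differ on the same model, which illustrates the point above: each pseudo R-squared borrows a different aspect of the OLS measure.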