In linear regression, why can we view the lack-of-fit test as a comparison between the model we chose and the saturated model?
Suppose two alternative models are under consideration, where one is simpler (more parsimonious) than the other. In the lack-of-fit setting, one of the two is the saturated model. Another common situation is to compare 'nested' models, where one model is obtained from the other by setting some of the parameters to zero. Suppose we now test one such model against the other.
Here the null and alternative hypotheses differ from those in the section above. To test the null hypothesis that an arbitrary group of k coefficients from the model is equal to zero (i.e., those predictors have no relationship with the response), we need to fit two models:
the reduced model, which omits the k coefficients in question, and
the current model, which includes them.
The likelihood-ratio statistic is
ΔG2 = (−2 log L from reduced model) − (−2 log L from current model),
and the degrees of freedom is k (the number of coefficients in question).
To perform the test, we look at the "Model Fit Statistics" section and examine the values of "−2 Log L" for "Intercept Only" and for "Intercept and Covariates." Here, the reduced model is the "intercept-only" model (i.e., no predictors) and "intercept and covariates" is the current model we fitted. For our running example, this is equivalent to testing the "intercept-only" model vs. the full (saturated) model, since we have only one covariate.
For our example, ΔG2 = 5176.510 − 5147.390 ≈ 29.12 (reported as 29.1207 in the output) with df = 2 − 1 = 1. Notice that this matches the deviance we got in the earlier text above.
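The arithmetic above can be checked with a short sketch. It uses only the two −2 Log L values quoted in the text. The function name `lr_test_df1` is hypothetical, and the p-value shortcut via `math.erfc` is valid only for 1 degree of freedom, which is the case in this example.

```python
import math

def lr_test_df1(neg2logl_reduced, neg2logl_current):
    """Likelihood-ratio statistic and p-value for nested models
    that differ by exactly ONE coefficient (df = 1)."""
    delta_g2 = neg2logl_reduced - neg2logl_current
    # For 1 degree of freedom only: P(chi2_1 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(delta_g2 / 2.0))
    return delta_g2, p_value

# -2 Log L values quoted in the text:
# intercept-only model vs. intercept-and-covariates model.
stat, p = lr_test_df1(5176.510, 5147.390)
print(f"Delta G2 = {stat:.3f}, p-value = {p:.2e}")
```

For tests with k > 1 coefficients, the statistic is computed the same way, but the p-value needs a general chi-square tail probability with k degrees of freedom (for example, from a statistics library).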
Another way to calculate the test statistic is
ΔG2 = G2 from reduced model − G2 from current model,
where the G2's are the overall goodness-of-fit statistics.
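One way to see why the two formulas agree, and why a lack-of-fit test is a comparison with the saturated model, is sketched below. It assumes G2 denotes the deviance, that is, twice the log-likelihood gap between the saturated model and the model in question. The symbols \ell_{sat}, \ell_{red}, and \ell_{cur} (maximized log-likelihoods of the saturated, reduced, and current models) are notation introduced here for illustration.

```latex
\begin{align*}
G^2_{\text{red}} &= -2\,(\ell_{\text{red}} - \ell_{\text{sat}}), \qquad
G^2_{\text{cur}} = -2\,(\ell_{\text{cur}} - \ell_{\text{sat}}), \\
\Delta G^2 &= G^2_{\text{red}} - G^2_{\text{cur}}
            = (-2\,\ell_{\text{red}}) - (-2\,\ell_{\text{cur}}).
\end{align*}
```

The saturated-model term cancels in the difference, so both routes give the same ΔG2. Each G2 on its own is already a likelihood-ratio comparison of a model against the saturated model, which is the sense in which an overall lack-of-fit test compares the chosen model with the saturated one.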
The value of −2 Log L is useful for comparing two nested models that differ by an arbitrary set of coefficients.
Also notice that the ΔG2 we calculated for this example equals the likelihood ratio statistic reported in the "Testing Global Null Hypothesis: BETA=0" section:
Test: Likelihood Ratio   Chi-Square: 29.1207   DF: 1   Pr > ChiSq: <.0001
Testing the Joint Significance of All Predictors.
Here we test the null hypothesis that the set of coefficients is simultaneously zero:
H0 : β1 = β2 = ... = βk = 0 versus the alternative that at least one of the coefficients β1, ..., βk is not zero.
Equivalently, the null hypothesis is that the intercept-only model is correct, versus the alternative that the current model (in this case the saturated model) is correct.
In the SAS output, three different chi-square statistics for this test are displayed in the section "Testing Global Null Hypothesis: BETA=0," corresponding to the likelihood ratio, score, and Wald tests. Recall their definitions from the very first lessons.
The Hosmer-Lemeshow Statistic
An alternative statistic for measuring overall goodness-of-fit is the Hosmer-Lemeshow statistic.
This is a Pearson-like χ2 that is computed after the data are grouped by similar predicted probabilities. It is more useful when the model has more than one predictor and/or continuous predictors.
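The grouping idea can be illustrated with a minimal sketch. It assumes the common form of the statistic: observations are sorted by fitted probability, split into roughly equal-sized groups, and observed and expected event counts are compared within each group. The function name `hosmer_lemeshow`, the default of 10 groups, and the toy data are assumptions for illustration. The fitted probabilities are assumed to lie strictly between 0 and 1, and this is not a reproduction of the SAS computation.

```python
def hosmer_lemeshow(y, p, groups=10):
    """Return (statistic, df) for binary outcomes y (0/1)
    and fitted probabilities p (each strictly between 0 and 1)."""
    pairs = sorted(zip(p, y))  # order observations by fitted probability
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        # Consecutive, near-equal-sized slices of the sorted data.
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        n_g = len(chunk)
        if n_g == 0:
            continue
        observed = sum(yi for _, yi in chunk)   # observed events in group
        expected = sum(pi for pi, _ in chunk)   # expected events in group
        p_bar = expected / n_g                  # mean fitted probability
        stat += (observed - expected) ** 2 / (n_g * p_bar * (1.0 - p_bar))
    # The statistic is conventionally referred to a chi-square
    # distribution with (groups - 2) degrees of freedom.
    return stat, groups - 2

# Toy data (hypothetical): two groups whose observed counts
# match the expected counts, so the statistic is essentially zero.
y_toy = [1, 0, 0, 0, 0, 1, 1, 1, 1, 0]
p_toy = [0.2] * 5 + [0.8] * 5
hl_stat, hl_df = hosmer_lemeshow(y_toy, p_toy, groups=2)
print(hl_stat, hl_df)
```

A large value of the statistic relative to its reference chi-square distribution suggests lack of fit. With real data, 10 groups (deciles of predicted risk) is a common choice.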