The goodness of fit of a statistical model describes how well it fits a set of observations. Measures of goodness of fit typically summarize the discrepancy between observed values and the values expected under the model in question.
Such measures may be used in statistical hypothesis testing, for example, to test for normality of residuals, to test whether two samples are drawn from identical distributions, or to test whether outcome frequencies follow a specified distribution (Pearson's chi-squared test).
In the analysis of variance, one of the components into which the variance is partitioned may be a lack-of-fit sum of squares. In other words, goodness of fit tells you whether your sample data represent the data you would expect to find in the actual population.
Please in a minimum of 200 words:
What good is this information to us? Why would we need to know something like this?
Goodness of fit investigates whether it is reasonable to regard a random sample as coming from a particular specified distribution, i.e. whether a particular model provides a "good fit" to the data. The chi-squared test is one way of checking overall goodness of fit. For example, if the deaths within a population have been classified by cause of death, this would enable us to test whether the numbers dying from each cause are consistent with the numbers predicted from an assumed set of proportions. It can also be used to compare experience with a standard table (e.g. a mortality/life table).
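As a minimal sketch of that cause-of-death example, the Pearson chi-squared statistic can be computed directly from hypothetical observed counts and an assumed set of proportions (all numbers below are made up for illustration):

```python
# Pearson's chi-squared goodness-of-fit statistic for hypothetical
# cause-of-death counts against assumed standard-table proportions.
observed = [420, 310, 170, 100]          # deaths by cause (hypothetical)
proportions = [0.40, 0.30, 0.20, 0.10]   # assumed proportions (hypothetical)
total = sum(observed)
expected = [p * total for p in proportions]

# X^2 = sum of (O - E)^2 / E, compared against a chi-squared
# distribution with (number of categories - 1) = 3 degrees of freedom.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi2, 3))  # 5.833
```

Since 5.833 is below 7.815, the 5% critical value for 3 degrees of freedom, these particular counts would be judged consistent with the assumed proportions.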
Examining the residuals from the fitted model can reveal inadequacies in the model. A residual is defined as the difference between the observed result and the expected result. Plotting the residuals for each treatment may reveal a pattern (a lack of randomness, suggesting some uncontrolled factor at work), reveal non-normality (if serious, a transformation of the data may be required, e.g. a log transformation if there is evidence of a pronounced positive skew), or reveal that the error variance is not independent of the treatment (again, a transformation may help make the variance homogeneous among treatments; in particular, if the variance increases with the mean, a log transformation is recommended).
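A small sketch of that residual check, using hypothetical treatment data chosen so the spread grows with the mean:

```python
import math

# Hypothetical observations for three treatments; each residual is the
# observation minus its own treatment mean.
treatments = {
    "A": [10.0, 12.0, 11.0],
    "B": [20.0, 23.0, 26.0],
    "C": [40.0, 48.0, 44.0],
}

residuals = {}
for name, values in treatments.items():
    mean = sum(values) / len(values)
    residuals[name] = [v - mean for v in values]

print(residuals)
# The residual spread widens as the treatment mean rises (A < B < C),
# the pattern for which a log transformation of the raw data is suggested.
logged = {name: [math.log(v) for v in values]
          for name, values in treatments.items()}
```

In practice these residuals would be plotted by treatment; the dictionary comprehension at the end just shows the transformation step itself.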
In the analysis of variance, the whole point of partitioning the variability is to see how much of the overall variance is made up of the expected "within treatment" variance, SSR, and how much is made up of the variance between the means, SSB. The larger the between-means variance, the less likely it is that we can assume the treatments all have the same mean.
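That partition can be verified numerically. The sketch below, using hypothetical treatment data and the text's labels (SSR for within-treatment, SSB for between-means), checks that the total sum of squares equals SSB + SSR:

```python
# One-way ANOVA sum-of-squares partition on hypothetical data.
groups = [[10.0, 12.0, 11.0], [20.0, 23.0, 26.0], [40.0, 48.0, 44.0]]

all_values = [v for g in groups for v in g]
grand_mean = sum(all_values) / len(all_values)

# SSB: spread of the group means around the grand mean, weighted by group size.
ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
# SSR: spread of each observation around its own group mean.
ssr = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
# SST: spread of every observation around the grand mean.
sst = sum((v - grand_mean) ** 2 for v in all_values)

print(ssb, ssr, sst)          # 1674.0 52.0 1726.0
assert abs(sst - (ssb + ssr)) < 1e-9
```

Here SSB dominates SSR, which is exactly the situation where assuming a common mean for all treatments becomes implausible.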