Explain the different hypothesis tests one could use when assessing the distribution of a categorical variable (e.g. smoking status) with only two levels (e.g. levels: smoker and non-smoker) vs. more than two levels (e.g. levels: heavy smoker, moderate smoker, occasional smoker, non-smoker).
Be precise. Use the language of the textbook to identify the appropriate test and how you would conduct it. NOTE: Minimum of 150 words for primary post and 50 words for each of three replies to your peers.
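The hypothesis test that should be used in both cases is the chi-square goodness-of-fit test. The procedure is the same whether the variable has two levels or more than two; only the number of categories k, and hence the degrees of freedom, changes.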
This test is used to check the validity of a distribution assumed for a random phenomenon, here the hypothesized proportions of each smoking-status category.
It evaluates the null hypothesis H0 (that the data are governed by the assumed distribution) against the alternative H1 (that the data are not drawn from the assumed distribution).
Let p1, p2, ..., pk denote the probabilities hypothesized for k possible outcomes.
In n independent trials, let Y1, Y2, ..., Yk denote the observed counts of the k outcomes; these are compared with the counts expected under H0, np1, np2, ..., npk.
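For the smoking-status example, the hypotheses can be written in terms of these hypothesized probabilities. The values p_i are whatever distribution is being checked; none are given in the question. Here k = 2 (smoker, non-smoker) or k = 4 (heavy, moderate, occasional, non-smoker):

$$
H_0:\ P(\text{category } i) = p_i \ \text{ for all } i = 1, \dots, k
\qquad \text{vs.} \qquad
H_1:\ P(\text{category } i) \neq p_i \ \text{ for at least one } i,
\qquad \sum_{i=1}^{k} p_i = 1 .
$$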
The chi-square test statistic is

$$Q_{k-1} = \sum_{i=1}^{k} \frac{(Y_i - np_i)^2}{np_i} = \frac{(Y_1 - np_1)^2}{np_1} + \frac{(Y_2 - np_2)^2}{np_2} + \cdots + \frac{(Y_k - np_k)^2}{np_k}.$$
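The statistic above translates directly into a few lines of Python. This is a minimal sketch, not a textbook procedure. The function name `chi_square_statistic` is hypothetical, and the counts are made up purely for illustration (100 respondents in each case).

```python
def chi_square_statistic(observed, probs):
    """Goodness-of-fit statistic: sum over categories of (Y_i - n*p_i)^2 / (n*p_i)."""
    if len(observed) != len(probs):
        raise ValueError("need one hypothesized probability per category")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("hypothesized probabilities must sum to 1")
    n = sum(observed)  # total number of independent trials
    q = 0.0
    for y, p in zip(observed, probs):
        expected = n * p  # expected count n*p_i under H0
        q += (y - expected) ** 2 / expected
    return q

# Hypothetical counts, invented for illustration only.
# Two levels (k = 2): smoker, non-smoker
q_two = chi_square_statistic([30, 70], [0.25, 0.75])  # 25/25 + 25/75 = 4/3
# Four levels (k = 4): heavy, moderate, occasional, non-smoker
q_four = chi_square_statistic([12, 18, 20, 50], [0.10, 0.20, 0.20, 0.50])  # 0.4 + 0.2 = 0.6
print(q_two, q_four)
```

The same function handles both the two-level and the four-level variable; only the length of the input lists changes.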
H0 is rejected if this value exceeds the upper critical value $\chi^2_{\alpha}(k-1)$ of the chi-square distribution with k - 1 degrees of freedom, where $\alpha$ is the desired level of significance.
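Here is a minimal sketch of this decision rule at α = 0.05. It assumes the critical values come from a standard chi-square table; the dictionary below hard-codes only the first few, rounded to three decimals. `reject_h0` is a hypothetical helper, and the statistic values passed to it are illustrative, not real data.

```python
# Upper 5% critical values chi^2_0.05(df) from a standard chi-square table,
# rounded to three decimals; only df = 1..4 are listed in this sketch.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

def reject_h0(q, k):
    """Reject H0 at alpha = 0.05 when q exceeds chi^2_0.05(k - 1)."""
    df = k - 1  # degrees of freedom: number of categories minus one
    return q > CHI2_CRIT_05[df]

print(reject_h0(4 / 3, k=2))  # two levels, df = 1: 1.333 < 3.841, do not reject
print(reject_h0(9.2, k=4))    # four levels, df = 3: 9.2 > 7.815, reject
```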
As with most test statistics, the larger the differences between the observed and expected counts, the larger the statistic becomes, and the stronger the evidence against H0.
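Under the null hypothesis, the test statistic approximately follows the theoretical chi-square distribution with k - 1 degrees of freedom when n is large; a common rule of thumb is that every expected count np1, ..., npk should be at least 5.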
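The quality of this approximation can be checked by simulation. The sketch below uses a hypothetical helper `simulate_rejection_rate` and made-up hypothesized probabilities with all expected counts of at least 10. It draws many samples of size n under H0 and records how often the statistic exceeds the 5% chi-square critical value. If the chi-square approximation is good, that share should be close to 0.05.

```python
import random

def simulate_rejection_rate(probs, n, crit, reps=10_000, seed=1):
    """Share of samples drawn under H0 whose statistic exceeds `crit`."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    k = len(probs)
    expected = [n * p for p in probs]  # n*p_i under H0
    exceed = 0
    for _ in range(reps):
        # Draw n independent trials from the hypothesized distribution.
        counts = [0] * k
        for c in rng.choices(range(k), weights=probs, k=n):
            counts[c] += 1
        q = sum((y - e) ** 2 / e for y, e in zip(counts, expected))
        if q > crit:
            exceed += 1
    return exceed / reps

# Hypothetical H0 distributions; crit is chi^2_0.05(k - 1) from a table.
print(simulate_rejection_rate([0.25, 0.75], n=100, crit=3.841))              # k = 2
print(simulate_rejection_rate([0.10, 0.20, 0.20, 0.50], n=100, crit=7.815))  # k = 4
```

Both printed shares should land near 0.05. Because the counts are discrete, they will not match it exactly.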
This means that once the value of the chi-square statistic and the number of degrees of freedom are known, the probability of obtaining a value at least that large (the p-value) can be calculated from the chi-square distribution.
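For an integer number of degrees of freedom, the upper-tail probability of the chi-square distribution has a closed form. This means the p-value can be computed with only Python's standard library. The sketch below relies on that closed form, and `chi2_sf` is a hypothetical name. In practice a statistics package such as `scipy.stats.chi2.sf` would normally be used instead.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for X ~ chi-square(df), df a positive integer."""
    if not isinstance(df, int) or df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df = 2m + 1:
    # erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_{j=1}^{m} x^(j-1) / (1*3*...*(2j-1))
    term, total = 1.0, 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total

# p-values for two illustrative statistic values (hypothetical data)
print(chi2_sf(4 / 3, df=1))  # two levels
print(chi2_sf(0.6, df=3))    # four levels
```

For instance, a statistic of 4/3 with 1 degree of freedom gives a p-value of about 0.25, so H0 would not be rejected at α = 0.05.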
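The number of degrees of freedom is the number of categories minus one: k - 1 = 1 for the two-level variable (smoker, non-smoker) and k - 1 = 3 for the four-level variable (heavy, moderate, occasional, non-smoker).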
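For the two-level case (smoker vs. non-smoker), k = 2 and the statistic has 1 degree of freedom. Since p2 = 1 - p1 and Y2 = n - Y1, the two terms share the same squared difference. The statistic therefore reduces to the square of the one-proportion z statistic:

$$
\begin{aligned}
Y_2 - np_2 &= (n - Y_1) - n(1 - p_1) = -(Y_1 - np_1),\\
Q_1 &= \frac{(Y_1 - np_1)^2}{np_1} + \frac{(Y_1 - np_1)^2}{n(1 - p_1)}
     = \frac{(Y_1 - np_1)^2}{np_1(1 - p_1)} = Z^2,
\qquad Z = \frac{Y_1 - np_1}{\sqrt{np_1(1 - p_1)}} .
\end{aligned}
$$

Because $z_{\alpha/2}^2 = \chi^2_{\alpha}(1)$, rejecting H0 when $Q_1$ exceeds $\chi^2_{\alpha}(1)$ is the same as rejecting when $|Z|$ exceeds $z_{\alpha/2}$ in the two-sided z-test for a single proportion. With more than two levels there is no single proportion to test, so the chi-square goodness-of-fit test with k - 1 degrees of freedom is used directly.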