The chi-square test is easy to understand when broken into the following steps.
The chi-square statistic is commonly used for testing relationships between categorical variables. The null hypothesis of the chi-square test is that no relationship exists between the categorical variables in the population; they are independent.
There are three types of chi-square test, as follows.
1) Independence: Use the test for independence to decide whether two variables (factors) are independent or dependent. In this case there will be two qualitative survey questions or experiments and a contingency table will be constructed. The goal is to see if the two variables are unrelated (independent) or related (dependent). The null and alternative hypotheses are:
H0: The two variables (factors) are independent.
H1: The two variables (factors) are dependent.
Example:
Observed frequencies:
 | Non-Smoker | Smoker | Total |
Athlete | 14 | 4 | 18 |
Non-athlete | 0 | 10 | 10 |
Total | 14 | 14 | 28 |
Expected frequencies if the two variables were independent (row total * column total / grand total):
 | Non-Smoker | Smoker | Total |
Athlete | (18*14)/28 = 9 | (18*14)/28 = 9 | 18 |
Non-athlete | (10*14)/28 = 5 | (10*14)/28 = 5 | 10 |
Total | 14 | 14 | 28 |
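The expected frequencies can also be computed programmatically. The following is a minimal sketch in plain Python, using the observed counts from the example; the function name expected_counts is only an illustrative choice.

```python
# Observed counts from the example.
# Rows: athlete, non-athlete. Columns: non-smoker, smoker.
observed = [[14, 4],
            [0, 10]]


def expected_counts(table):
    """Expected counts under independence:
    row total * column total / grand total for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


expected = expected_counts(observed)
print(expected)  # [[9.0, 9.0], [5.0, 5.0]]
```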
Test statistic
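We now have the observed and expected frequencies and need to determine whether they differ significantly. The discrepancy between the observed and expected frequencies is summarized by the test statistic, denoted χ^2. For each cell of the table we compute:
(observed − expected)^2 / expected
and then we sum these values over all cells to obtain the test statistic:
χ^2 = Σ (observed − expected)^2 / expected
In our example, χ^2 = (14 − 9)^2/9 + (4 − 9)^2/9 + (0 − 5)^2/5 + (10 − 5)^2/5 = 2.78 + 2.78 + 5 + 5 = 15.56.
To find the critical value we also need the degrees of freedom:
df = (number of rows − 1) ⋅ (number of columns − 1)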
In our example, the degrees of freedom is thus df = (2−1) ⋅ (2−1) = 1 since there are two rows and two columns in the contingency table (totals do not count as a row or column).
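As a rough check of the arithmetic, the sketch below sums (observed − expected)^2 / expected over all four cells of the example and computes the degrees of freedom from the table shape. The function names are illustrative, not taken from any library.

```python
# Observed and expected counts from the example (totals excluded).
observed = [[14, 4],
            [0, 10]]
expected = [[9.0, 9.0],
            [5.0, 5.0]]


def chi_square_statistic(obs, exp):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(obs, exp)
               for o, e in zip(obs_row, exp_row))


def degrees_of_freedom(table):
    """(number of rows - 1) * (number of columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)


stat = chi_square_statistic(observed, expected)
df = degrees_of_freedom(observed)
print(round(stat, 2), df)  # 15.56 1
```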
We now have all the necessary information to find the critical value in the chi-square table (α = 0.05 and df = 1). To find the critical value we look at the row df = 1 and the column χ^2_0.050 (since α = 0.05). The critical value is 3.84146.
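If no table is at hand, the critical value for this particular case can be reproduced with the Python standard library. The sketch below relies on an assumption that holds only for df = 1: a chi-square variable with one degree of freedom is the square of a standard normal variable, so the critical value is the square of the two-sided normal quantile. For other degrees of freedom a table or a statistics package is needed.

```python
from statistics import NormalDist

alpha = 0.05

# Assumption: this shortcut is valid only for df = 1, where
# chi-square(1) is the square of a standard normal variable.
z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided normal quantile
critical_value = z ** 2

print(round(critical_value, 5))  # 3.84146
```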
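Conclusion and interpretation:
test statistic = 15.56 > critical value = 3.84146
Because the test statistic is larger than the critical value, we reject the null hypothesis at the specified significance level. In this example, we reject the hypothesis that being an athlete and smoking are independent at the 5% level; the data suggest that the two variables are related.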
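The decision rule can be written out in a few lines. The sketch below compares the statistic with the critical value and, as a cross-check that is not part of the steps above, also computes a p-value. The p-value formula used here is an assumption that is valid only for df = 1.

```python
import math

test_statistic = 15.56    # value from the example
critical_value = 3.84146  # table value for alpha = 0.05, df = 1
alpha = 0.05

# Decision by critical value: reject H0 if the statistic exceeds it.
reject_by_critical_value = test_statistic > critical_value

# Cross-check by p-value. Assumption: valid only for df = 1,
# where P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(test_statistic / 2))
reject_by_p_value = p_value < alpha

print(reject_by_critical_value, reject_by_p_value)
```

Both rules lead to the same decision, since a statistic above the critical value corresponds to a p-value below α.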
..................................................................................................................................................................
2) Goodness-of-Fit: Use the goodness-of-fit test to decide whether a population with an unknown distribution "fits" a known distribution. In this case there will be a single qualitative survey question or a single outcome of an experiment from a single population. Goodness-of-Fit is typically used to see if the population is uniform (all outcomes occur with equal frequency), the population is normal, or the population is the same as another population with a known distribution. The null and alternative hypotheses are:
H0: The population fits the given distribution.
H1: The population does not fit the given distribution.
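To make the goodness-of-fit test concrete, here is a small sketch that tests whether a die is fair (a uniform distribution). The roll counts are hypothetical and invented purely for illustration. It also assumes the usual rule that, when no parameters are estimated from the data, the degrees of freedom are the number of categories minus 1.

```python
# Hypothetical data: counts of faces 1..6 in 60 rolls of a die.
observed = [8, 9, 12, 11, 6, 14]

# H0: the die is fair, so every face is expected equally often.
n = sum(observed)
expected = [n / len(observed)] * len(observed)

# Same statistic as before: sum of (observed - expected)^2 / expected.
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Assumption: df = number of categories - 1 (no estimated parameters).
df = len(observed) - 1

critical_value = 11.0705  # chi-square table value for alpha = 0.05, df = 5
reject = stat > critical_value

print(round(stat, 2), df, reject)
```

With these made-up counts the statistic stays below the critical value, so the hypothesis of a fair die would not be rejected.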
.............................................................................................................................................................
3) Homogeneity: Use the test for homogeneity to decide if two populations with unknown distributions have the same distribution as each other. In this case there will be a single qualitative survey question or experiment given to two different populations. The null and alternative hypotheses are:
H0: The two populations follow the same distribution.
H1: The two populations have different distributions.
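The homogeneity test uses the same arithmetic as the test for independence, applied to a table with one row per population. The sketch below uses hypothetical answers to one three-option survey question asked in two populations; the counts are invented for illustration. The critical value is computed with a shortcut that is an assumption valid only for df = 2.

```python
import math

# Hypothetical data: answers (option A, B, C) from two populations.
observed = [[30, 50, 20],   # population 1
            [40, 40, 20]]   # population 2

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected counts if both populations share one distribution.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

stat = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Assumption: valid only for df = 2, where P(chi-square > x) = exp(-x / 2),
# so the critical value is -2 * ln(alpha).
alpha = 0.05
critical_value = -2 * math.log(alpha)
reject = stat > critical_value

print(round(stat, 2), df, round(critical_value, 3), reject)
```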
Overview:
The goodness-of-fit test is typically used to determine whether data fit a particular distribution. The test of independence uses a contingency table to determine whether two factors are independent. The test for homogeneity determines whether two populations come from the same distribution, even if this distribution is unknown.