1. Define the following terms:
A. Contingency table
B. Chi-square test
2. List the assumptions required to perform a chi-square test.
Solution:
1) Contingency Table
A two-way table (also called a contingency table) is a useful tool for examining relationships between categorical variables. The entries in the cells of a two-way table can be frequency counts or relative frequencies (just like a one-way table).
      | Dance | Sports | TV | Total |
Men   | 2     | 10     | 8  | 20    |
Women | 16    | 6      | 8  | 30    |
Total | 18    | 16     | 16 | 50    |
The two-way table above shows the favorite leisure activities for 50 adults: 20 men and 30 women. Because entries in the table are frequency counts, the table is a frequency table.
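As a small illustration, the table's margins and relative frequencies can be recomputed from the six cell counts. This is only a sketch: the nested-dictionary layout and the helper names `margins` and `relative_frequencies` are assumptions made for this example, not part of the original solution.

```python
# Observed counts from the two-way table above (rows: Men, Women).
observed = {
    "Men":   {"Dance": 2,  "Sports": 10, "TV": 8},
    "Women": {"Dance": 16, "Sports": 6,  "TV": 8},
}


def margins(table):
    """Return (row_totals, column_totals, grand_total) for a nested dict of counts."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for col, count in cells.items():
            col_totals[col] = col_totals.get(col, 0) + count
    return row_totals, col_totals, sum(row_totals.values())


def relative_frequencies(table):
    """Convert every cell count to a proportion of the grand total."""
    _, _, n = margins(table)
    return {row: {col: count / n for col, count in cells.items()}
            for row, cells in table.items()}


row_totals, col_totals, n = margins(observed)
print(row_totals)  # {'Men': 20, 'Women': 30}
print(col_totals)  # {'Dance': 18, 'Sports': 16, 'TV': 16}
print(n)           # 50
```

Dividing each count by the grand total, as `relative_frequencies` does, turns the frequency table into a relative frequency table.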
B)
1) Chi-Square Goodness of Fit Test
A chi-square goodness of fit test attempts to answer the following question: Are sample data consistent with a hypothesized distribution?
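The test is appropriate when the following conditions are met: the sampling method is simple random sampling, the variable under study is categorical, and the expected number of sample observations in each level of the variable is at least 5.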
Here is how to conduct the test.
Define hypotheses. For a chi-square goodness of fit test, the hypotheses take the following form.
H0: The data are consistent with a specified distribution.
Ha: The data are not consistent with a specified distribution.
Typically, the null hypothesis specifies the proportion of observations at each level of the categorical variable. The alternative hypothesis is that at least one of the specified proportions is not true.
Expected frequency counts. The expected count at each level of the categorical variable is

$E_i = n p_i$

where $E_i$ is the expected frequency count for the $i$th level of the categorical variable, $n$ is the total sample size, and $p_i$ is the hypothesized proportion of observations in level $i$.

Test statistic. The test statistic is a chi-square random variable defined by

$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$

where $O_i$ is the observed frequency count for the $i$th level of the categorical variable, and $E_i$ is the expected frequency count for the $i$th level of the categorical variable.

Interpret results. If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting the null hypothesis when the P-value is less than the significance level.
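The goodness of fit calculation can be sketched in a few lines of Python. Several things here are assumptions for illustration only: the observed counts reuse the column totals of the leisure table (Dance 18, Sports 16, TV 16), the hypothesized proportions of one third each are made up, the degrees of freedom are taken as the number of levels minus one (the usual convention, not stated in the text above), and `chi2_sf` is a hypothetical helper that evaluates the chi-square upper-tail probability with the standard library instead of a statistics package.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{k=0}^{df/2 - 1} y^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= y / k
            total += term
        return math.exp(-y) * total
    # Odd df: start from df = 1 (erfc) and add one term for each extra 2 df.
    total = math.erfc(math.sqrt(y))
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (k - 0.5) / math.gamma(k + 0.5)
    return total


def goodness_of_fit(observed, proportions):
    """Return (expected counts, chi-square statistic, df, P-value)."""
    n = sum(observed)
    expected = [n * p for p in proportions]           # E_i = n * p_i
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                            # assumption: DF = levels - 1
    return expected, statistic, df, chi2_sf(statistic, df)


observed = [18, 16, 16]              # Dance, Sports, TV column totals
proportions = [1 / 3, 1 / 3, 1 / 3]  # hypothetical null distribution
alpha = 0.05                         # hypothetical significance level

expected, statistic, df, p_value = goodness_of_fit(observed, proportions)
print(f"chi-square = {statistic:.3f}, df = {df}, P-value = {p_value:.3f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

With these inputs the statistic is about 0.16 and the P-value is about 0.92, far above 0.05, so the sketch fails to reject the null hypothesis of equal proportions.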
2)
Chi-Square Test for Independence
A chi-square test for independence is applied when you have two categorical variables from a single population. It is used to determine whether there is a significant association between the two variables.
The test consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results.
$DF = (r - 1)(c - 1)$

where $r$ is the number of levels for one categorical variable, and $c$ is the number of levels for the other categorical variable.

$E_{r,c} = \frac{n_r \, n_c}{n}$

where $E_{r,c}$ is the expected frequency count for level $r$ of Variable A and level $c$ of Variable B, $n_r$ is the total number of sample observations at level $r$ of Variable A, $n_c$ is the total number of sample observations at level $c$ of Variable B, and $n$ is the total sample size.

$\chi^2 = \sum \frac{(O_{r,c} - E_{r,c})^2}{E_{r,c}}$

where $O_{r,c}$ is the observed frequency count at level $r$ of Variable A and level $c$ of Variable B, and $E_{r,c}$ is the expected frequency count at level $r$ of Variable A and level $c$ of Variable B.
3)
Chi-Square Test for Homogeneity
The chi-square test of homogeneity is applied to a single categorical variable. It is used to compare the distribution of frequency counts across different populations. It answers the following question: Are frequency counts distributed identically across different populations?
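The test procedure is appropriate when the following conditions are met: for each population, the sampling method is simple random sampling; the variable under study is categorical; and the expected frequency count in each cell of the population-by-level table is at least 5.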
This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results.
$H_0: P_{\text{level 1 of population 1}} = P_{\text{level 1 of population 2}} = \cdots = P_{\text{level 1 of population } r}$
$DF = (r - 1)(c - 1)$

where $r$ is the number of populations, and $c$ is the number of levels for the categorical variable.

$E_{r,c} = \frac{n_r \, n_c}{n}$

where $E_{r,c}$ is the expected frequency count for population $r$ at level $c$ of the categorical variable, $n_r$ is the total number of observations from population $r$, $n_c$ is the total number of observations at level $c$, and $n$ is the total sample size.

$\chi^2 = \sum \frac{(O_{r,c} - E_{r,c})^2}{E_{r,c}}$

where $O_{r,c}$ is the observed frequency count in population $r$ for level $c$ of the categorical variable, and $E_{r,c}$ is the expected frequency count in population $r$ for level $c$ of the categorical variable.
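The degrees of freedom, expected counts, and test statistic use the same arithmetic for the independence and homogeneity tests, so one function can sketch both. The example applies it to the leisure-activity table from the contingency table section. Treating men and women as two separate populations is an assumption made purely for illustration (the source describes them as one group of 50 adults), and the function name `chi_square_two_way` is hypothetical.

```python
def chi_square_two_way(table):
    """Chi-square computation for a two-way table of observed counts.

    `table` is a list of rows (one per population or per level of Variable A).
    Returns (expected counts, chi-square statistic, degrees of freedom).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # E_{r,c} = (n_r * n_c) / n for every cell.
    expected = [[nr * nc / n for nc in col_totals] for nr in row_totals]

    # Sum of (O - E)^2 / E over all cells.
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

    # DF = (r - 1) * (c - 1)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, statistic, df


# Rows: Men, Women. Columns: Dance, Sports, TV.
observed = [[2, 10, 8],
            [16, 6, 8]]

expected, statistic, df = chi_square_two_way(observed)
for label, row in zip(["Men", "Women"], expected):
    print(label, [round(e, 1) for e in row])
print(f"chi-square = {statistic:.3f}, df = {df}")
```

Every expected count in this table is above 5. The statistic comes out near 10.3 with 2 degrees of freedom, which exceeds 5.991, the critical value for 2 degrees of freedom at a 0.05 significance level, so under these assumptions the null hypothesis would be rejected.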