When and how do you use a chi-square distribution to test if two variables are independent? What is an example of how to use the contingency table to find expected frequencies?
The chi-square test of independence is used to determine whether there is a statistically significant relationship between two nominal (categorical) variables. The frequencies of the categories of one variable are compared across the categories of the other variable. The data can be displayed in a contingency table in which each row represents a category of one variable and each column represents a category of the other. For example, say a researcher wants to examine the relationship between gender (male vs. female) and empathy (high vs. low); the chi-square test of independence can be used to examine this relationship. The null hypothesis for this test is that there is no relationship between gender and empathy. The alternative hypothesis is that there is a relationship between gender and empathy (e.g., a higher proportion of females than males have high empathy).
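As a rough sketch of how raw responses become a contingency table, the Python below cross-tabulates (row category, column category) pairs into a table of counts. The helper name `contingency_table` and the survey responses are hypothetical, invented only for illustration.

```python
from collections import Counter


def contingency_table(pairs):
    """Cross-tabulate (row_category, column_category) pairs into a table of counts."""
    counts = Counter(pairs)                   # how often each (row, column) combination occurs
    rows = sorted({r for r, _ in pairs})      # categories of the first variable
    cols = sorted({c for _, c in pairs})      # categories of the second variable
    # Missing combinations get a count of 0 (Counter returns 0 for absent keys).
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table


# Hypothetical survey responses: (gender, empathy level) for each respondent.
responses = [
    ("female", "high"), ("female", "high"), ("female", "low"),
    ("male", "high"), ("male", "low"), ("male", "low"),
    ("female", "high"), ("male", "low"),
]
rows, cols, table = contingency_table(responses)
print(rows)   # ['female', 'male']
print(cols)   # ['high', 'low']
print(table)  # [[3, 1], [1, 3]]
```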
As an example, consider testing whether Party Affiliation and Opinion on Tax Reform are independent.
Given the observed counts, we need to compute the expected counts under the null hypothesis that the two categorical variables are independent. This is done using the marginal totals and the overall total: to find the expected count for a cell, multiply that cell's row total by its column total and divide by the overall total. Formulaically, for the cell in row $i$ and column $j$:

$$E_{ij} = \frac{R_i \times C_j}{n}$$

where $R_i$ is the total of row $i$, $C_j$ is the total of column $j$, and $n$ is the overall total.
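The expected-count rule can be sketched in Python as follows. The function name `expected_counts` and the 2 × 3 table are hypothetical (they are not the observed counts from the Party Affiliation example); the table is stored as a list of rows.

```python
def expected_counts(observed):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]  # zip(*...) walks the columns
    n = sum(row_totals)                                # overall total
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical 2 x 3 table of observed counts (illustrative only).
observed = [
    [50, 30, 20],
    [30, 30, 40],
]
for row in expected_counts(observed):
    print(row)
# [40.0, 30.0, 30.0]
# [40.0, 30.0, 30.0]
```

The expected table keeps the same row and column totals as the observed one, which is a quick sanity check on the arithmetic.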
To conduct the test, we compute a chi-square test statistic that compares each cell's observed count $O_{ij}$ with its expected count $E_{ij}$, summing over all cells of the table:

$$\chi^{2*} = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
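A minimal sketch of the statistic, again on a hypothetical 2 × 3 table; the block repeats the expected-count helper so that it runs on its own. The function names are illustrative, not from any particular library.

```python
def expected_counts(observed):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_statistic(observed):
    """Sum of (O - E)^2 / E over every cell of the table."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical 2 x 3 table of observed counts (illustrative only).
observed = [
    [50, 30, 20],
    [30, 30, 40],
]
stat = chi_square_statistic(observed)
print(round(stat, 3))  # 11.667
```

Each term $(O - E)^2 / E$ is non-negative, so a large statistic means the observed counts sit far from what independence predicts.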
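We make the decision either by comparing the test statistic to a critical value (rejection-region approach) or by finding the probability of obtaining a test statistic at least as extreme as the one observed (p-value approach). The critical value is $\chi^2_\alpha$ with $(r - 1)(c - 1)$ degrees of freedom, where $r$ and $c$ are the numbers of rows and columns in the table, and we reject the null hypothesis if $\chi^{2*} > \chi^2_\alpha$. The p-value is $P(\chi^2 > \chi^{2*})$, computed from the chi-square distribution with $(r - 1)(c - 1)$ degrees of freedom, and we reject the null hypothesis if the p-value is less than $\alpha$.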
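Both decision rules can be sketched with only the Python standard library. Here `chi2_sf` computes the upper-tail probability through the regularized incomplete gamma function (a power series for small arguments and a continued fraction for large ones, a standard numerical approach), and `chi2_critical` inverts it by bisection. All function names and the input statistic 11.667 are assumptions for illustration. If SciPy is available, `scipy.stats.chi2.sf(x, df)` and `scipy.stats.chi2.ppf(1 - alpha, df)` serve the same purposes.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for X ~ chi-square(df), via the regularized upper incomplete gamma Q(df/2, x/2)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Common factor z^a * e^(-z) / Gamma(a), computed in log space to avoid overflow.
    prefactor = math.exp(-z + a * math.log(z) - math.lgamma(a))
    if z < a + 1:
        # Power series for the lower regularized gamma P(a, z); Q = 1 - P.
        term = total = 1.0 / a
        denom = a
        for _ in range(10_000):
            denom += 1
            term *= z / denom
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * prefactor
    # Continued fraction (modified Lentz method) for Q(a, z).
    tiny = 1e-300
    b = z + 1 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2
        d = an * d + b
        d = d if abs(d) > tiny else tiny
        c = b + an / c
        c = c if abs(c) > tiny else tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1) < 1e-15:
            break
    return h * prefactor


def chi2_critical(alpha, df):
    """Upper-alpha critical value: the x with P(X > x) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:  # widen the bracket until the tail drops below alpha
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def chi_square_decision(stat, n_rows, n_cols, alpha=0.05):
    """Apply both decision rules for a test of independence on an n_rows x n_cols table."""
    df = (n_rows - 1) * (n_cols - 1)
    critical = chi2_critical(alpha, df)  # rejection-region approach
    p_value = chi2_sf(stat, df)          # p-value approach
    return df, critical, p_value, stat > critical


# Hypothetical statistic from a hypothetical 2 x 3 table (illustrative only).
df, critical, p_value, reject = chi_square_decision(11.667, n_rows=2, n_cols=3)
print(df, round(critical, 3), round(p_value, 4), reject)  # 2 5.991 0.0029 True
```

Because the upper-tail probability decreases as the statistic grows, $\chi^{2*} > \chi^2_\alpha$ holds exactly when the p-value is below $\alpha$, so the two approaches reach the same decision.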
So, for this problem our hypotheses are:
Null hypothesis: Party Affiliation and Opinion on Tax Reform are independent.
Alternative hypothesis: Party Affiliation and Opinion on Tax Reform are not independent.
From the chi-square table, the critical value is $\chi^2_\alpha = 7.377$.
Since $\chi^{2*} = 22.152 > 7.377 = \chi^2_\alpha$, we reject the null hypothesis and conclude that Party Affiliation and Opinion on Tax Reform are not independent.
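The problem's table dimensions and significance level are not stated in the text, so this check rests on two assumptions: the table has 2 rows and 3 columns (df = 2), and α = 0.025, the level whose df = 2 critical value matches the tabulated 7.377 up to rounding. For df = 2 the chi-square distribution is exponential with mean 2, so $P(\chi^2 > x) = e^{-x/2}$ and the critical value is $-2 \ln \alpha$, which makes a hand check possible:

```python
import math

# Assumption: a 2 x 3 table, so df = (2 - 1)(3 - 1) = 2.
# For df = 2, P(X > x) = exp(-x / 2) and the upper-alpha critical value is -2 * ln(alpha).
chi2_star = 22.152  # test statistic reported in the text
alpha = 0.025       # assumption: inferred from the tabulated critical value 7.377

critical = -2 * math.log(alpha)
p_value = math.exp(-chi2_star / 2)

print(round(critical, 3))                     # 7.378
print(f"{p_value:.2e}")                       # 1.55e-05
print(chi2_star > critical, p_value < alpha)  # True True
```

Under these assumptions both approaches agree: the statistic is far beyond the critical value, and the p-value is tiny.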