In: Statistics and Probability
Provide a simple research example to illustrate how chi-square could be used in an analysis.
It must discuss how the frequency distribution should be set up.
For example, you have to make categories of the data in order to run a chi-square.
Introduction:
Consider that a researcher wants to test the effectiveness of a vaccine in treating a disease.
Two categories of vaccination are considered- vaccinated and non-vaccinated. Again, two categories of the status of the disease are considered- diseased and non-diseased.
The null and alternative hypotheses for the test are:
H0: Vaccination and disease status are independent.
H1: Vaccination and disease status are not independent, but related.
A chi-squared is suitable to be used in this situation. It would be a chi-square test of independence.
Explanation:
Data must be collected on a total four classes- diseased and vaccinated, diseased and non-vaccinated, non-diseased and vaccinated, and non-diseased and non-vaccinated. Consider that the observed frequencies of these four classes are respectively, a, b, c, and d.
The frequency distribution of the data would be of the form:
Consider the observation in the ith row and jth column of the table, that is, in the cell (i, j). Then, the expected frequency of the cell (i, j), if it is assumed that the two categories are independent, is [(total of row i) ∙ (total of column j)] / (grand total).
For example, for the 1st cell, that is, row 1 and column 1, the expected frequency will be:
Similarly, the expected frequencies of all the cells are calculated below:
Now, the formula for the chi square test statistic is:
In general, for r categories of the row variable and c categories of the column variable, the degrees of freedom for the test would be df = (r – 1) ∙ (c – 1).
It can be observed that there are r = 2 rows, indicating the 2 statuses of vaccination- Vaccinated, and Non-vaccinated, and there are c = 2 columns, indicating 2 statuses of disease- Diseased, and Non-diseased.
Thus, the degrees of freedom for the test would be:
df
= (r – 1) ∙ (c – 1)
= (2 – 1) ∙ (2 – 1)
= 1 ∙ 1
= 1.
Thus, the degree of freedom is 1.
The p-value is, p-value = P (χdf2 ≥ χ2).
Decision rule using p-value for level of significance, α:
Reject H0 at significance level α, if p-value ≤ α. Otherwise, fail to reject H0.