Use the following data to create the contingency tables.
AGE
Male
16 17 17 19 19 19 18 17 18 17 16 19 19 19 17 16 17 16 19 19 24 31 23 44 21 42 23 43 43 33 30 41 35 40 24 43 22 30 25 32
43 51 55 80 61 58 65 52 67 75 90 63 71 74
Female
17 16 17 19 19 18 17 19 16 18 19 17 19 17 18 19 19 16 33 23 46 46 23 21 46 47 48 47 48 30 35 24 48 49 47 25 84 54 77 63 51 72 90 57 69 81
1. In the first table, use gender (male and female) as your row variable and age (<20, 20-50, and >50) for
your column variable. Run a Chi-square test of independence and find the test statistic, p-value, and
degrees of freedom.
2. In the second table, use gender (male and female) as your row variable and age (<18, 18-25, 26-45,
and >45) for your column variable. Run a Chi-square test of independence and find the test statistic,
p-value, and degrees of freedom.
3. Compare the results and comment on problems that may occur when categorizing continuous
variables.
1)
The Chi-Square test of independence is used to determine whether there is a significant association between two categorical variables, here Gender and Age group.
The test is performed in the following steps.
Step 1: The hypotheses are defined as,
Null hypothesis, H0: There is no association between gender and age group.
Alternative hypothesis, Ha: There is an association between gender and age group.
Step 2: The significance level for the test is α = 0.05.
Step 3: The Chi-Square test statistic is obtained as follows.
The observed values are,
Gender | <20 | 20-50 | >50 | Total |
Male | 20 | 21 | 13 | 54 |
Female | 18 | 18 | 10 | 46 |
Total | 38 | 39 | 23 | 100 |
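The observed counts can be reproduced by binning the raw ages. The following is a minimal sketch using only the Python standard library; the helper name `bin_counts` and its `upper_bounds` argument are illustrative choices, not part of the original solution.

```python
from bisect import bisect_left

# Ages copied from the problem statement.
male = [16, 17, 17, 19, 19, 19, 18, 17, 18, 17, 16, 19, 19, 19, 17, 16, 17, 16, 19, 19,
        24, 31, 23, 44, 21, 42, 23, 43, 43, 33, 30, 41, 35, 40, 24, 43, 22, 30, 25, 32,
        43, 51, 55, 80, 61, 58, 65, 52, 67, 75, 90, 63, 71, 74]
female = [17, 16, 17, 19, 19, 18, 17, 19, 16, 18, 19, 17, 19, 17, 18, 19, 19, 16,
          33, 23, 46, 46, 23, 21, 46, 47, 48, 47, 48, 30, 35, 24, 48, 49, 47, 25,
          84, 54, 77, 63, 51, 72, 90, 57, 69, 81]


def bin_counts(ages, upper_bounds):
    """Count ages per category (hypothetical helper).

    upper_bounds holds the inclusive upper limit of every category except
    the last, which is open-ended (for example [19, 50] for <20, 20-50, >50).
    """
    counts = [0] * (len(upper_bounds) + 1)
    for age in ages:
        # bisect_left gives the index of the first bound that is >= age
        counts[bisect_left(upper_bounds, age)] += 1
    return counts


for label, ages in (("Male", male), ("Female", female)):
    print(label, bin_counts(ages, [19, 50]))
# Male [20, 21, 13]
# Female [18, 18, 10]
```

Passing `[17, 25, 45]` instead of `[19, 50]` produces the categories <18, 18-25, 26-45, and >45 asked for in the second part of the question.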
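Step 4: The expected values are obtained using the formula,
$E = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$
For example, the expected count for males under 20 is $54 \times 38 / 100 = 20.52$.
The expected values are,
Gender | <20 | 20-50 | >50 | Total |
Male | 20.52 | 21.06 | 12.42 | 54 |
Female | 17.48 | 17.94 | 10.58 | 46 |
Total | 38 | 39 | 23 | 100 |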
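As a quick check, the expected counts can be computed from the row and column totals of the observed table. This is an illustrative sketch only; the variable names are assumptions and are not taken from the original solution.

```python
# Observed counts for the <20, 20-50, >50 categories (rows: Male, Female).
observed = [[20, 21, 13],
            [18, 18, 10]]

row_totals = [sum(row) for row in observed]        # [54, 46]
col_totals = [sum(col) for col in zip(*observed)]  # [38, 39, 23]
grand_total = sum(row_totals)                      # 100

# Expected count for each cell = row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

for label, row in zip(("Male", "Female"), expected):
    print(label, [round(e, 2) for e in row])
# Male [20.52, 21.06, 12.42]
# Female [17.48, 17.94, 10.58]
```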
Step 5: Now the Chi-Square value is obtained using the formula,
$\chi^2 = \sum \frac{(O - E)^2}{E}$
where the sum runs over all six cells of the table.
Observed, O | Expected, E | O - E | (O - E)^2 | (O - E)^2 / E |
20 | 20.52 | -0.5200 | 0.2704 | 0.0132 |
21 | 21.06 | -0.0600 | 0.0036 | 0.0002 |
13 | 12.42 | 0.5800 | 0.3364 | 0.0271 |
18 | 17.48 | 0.5200 | 0.2704 | 0.0155 |
18 | 17.94 | 0.0600 | 0.0036 | 0.0002 |
10 | 10.58 | -0.5800 | 0.3364 | 0.0318 |
Sum | | | | 0.0879 |
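The test statistic is 0.0879 with degrees of freedom = (r-1)(c-1) = (2-1)(3-1) = 2. The P-value obtained from the chi-square distribution with 2 degrees of freedom is approximately 0.957.
Since the P-value is greater than the 0.05 significance level, the null hypothesis is not rejected: there is no evidence of an association between gender and age group.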
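The calculation for the first table can be sketched in a few lines of standard-library Python. The function name `chi_square_statistic` is a hypothetical helper, and the P-value line relies on the fact that, for exactly 2 degrees of freedom, the upper-tail probability of the chi-square distribution is exp(-x/2); it does not apply to other degrees of freedom.

```python
import math


def chi_square_statistic(observed):
    """Return (statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count for the cell
            stat += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


stat, df = chi_square_statistic([[20, 21, 13], [18, 18, 10]])

# Closed-form upper tail, valid only because df == 2 here.
p_value = math.exp(-stat / 2)

print(round(stat, 4), df, round(p_value, 3))
# 0.0879 2 0.957
```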
2)
The Chi-Square test statistic is obtained as follows, using the same hypotheses and significance level as in part 1.
The observed values are,
Gender | <18 | 18-25 | 26-45 | >45 | Total |
Male | 10 | 17 | 14 | 13 | 54 |
Female | 8 | 15 | 3 | 20 | 46 |
Total | 18 | 32 | 17 | 33 | 100 |
The expected values are obtained using the formula,
$E = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$
The expected values are,
Gender | <18 | 18-25 | 26-45 | >45 | Total |
Male | 9.72 | 17.28 | 9.18 | 17.82 | 54 |
Female | 8.28 | 14.72 | 7.82 | 15.18 | 46 |
Total | 18 | 32 | 17 | 33 | 100 |
Now the Chi-Square value is obtained using the formula,
$\chi^2 = \sum \frac{(O - E)^2}{E}$
Observed, O | Expected, E | O - E | (O - E)^2 | (O - E)^2 / E |
10 | 9.72 | 0.2800 | 0.0784 | 0.0081 |
17 | 17.28 | -0.2800 | 0.0784 | 0.0045 |
14 | 9.18 | 4.8200 | 23.2324 | 2.5308 |
13 | 17.82 | -4.8200 | 23.2324 | 1.3037 |
8 | 8.28 | -0.2800 | 0.0784 | 0.0095 |
15 | 14.72 | 0.2800 | 0.0784 | 0.0053 |
3 | 7.82 | -4.8200 | 23.2324 | 2.9709 |
20 | 15.18 | 4.8200 | 23.2324 | 1.5305 |
Sum | | | | 8.3632 |
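The test statistic is 8.3632 with degrees of freedom = (r-1)(c-1) = (2-1)(4-1) = 3. The P-value obtained from the chi-square distribution with 3 degrees of freedom is approximately 0.039.
Since the P-value is less than the 0.05 significance level, the null hypothesis is rejected: with these categories there is evidence of an association between gender and age group.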
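The second table can be checked the same way. This sketch is an assumption-laden illustration rather than the original author's method: it uses the closed-form upper tail of the chi-square distribution that holds for exactly 3 degrees of freedom, erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2), so that only the standard library is needed. The name `chi2_upper_tail_df3` is hypothetical.

```python
import math

# Observed counts for the <18, 18-25, 26-45, >45 categories (rows: Male, Female).
observed = [[10, 17, 14, 13],
            [8, 15, 3, 20]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum of (O - E)^2 / E over all eight cells.
stat = sum(
    (o - r * c / n) ** 2 / (r * c / n)
    for row, r in zip(observed, row_totals)
    for o, c in zip(row, col_totals)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)


def chi2_upper_tail_df3(x):
    """Upper-tail probability of a chi-square variable with exactly 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


p_value = chi2_upper_tail_df3(stat)
print(round(stat, 4), df, round(p_value, 3))
```

For tables with other dimensions, a statistics library routine for the chi-square distribution would be the more general choice.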
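3)
The null hypothesis is not rejected in the first part but is rejected in the second part, even though both tests use the same 100 observations. The only difference between them is the age categorization.
The main problem with categorizing a continuous variable is that the number and location of the cut points are arbitrary, and the conclusion can depend on them. In the first table the wide 20-50 category hides a difference that the second table exposes: 14 males but only 3 females fall in the 26-45 range. Categorization also discards information, because all ages within a category are treated as identical.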