2. A random sample of 395 people was surveyed, and each person was asked to report the highest level of education they had attained. The survey data are summarized in the following table:
       | High School | Bachelors | Masters | Ph.D. | Total
Female | 60          | 54        | 46      | 41    | 201
Male   | 40          | 44        | 53      | 57    | 194
Total  | 100         | 98        | 99      | 98    | 395
a. Are gender and education level dependent at the 5% level of significance? (6mks)
b. State and explain two methods of studying correlation. (4mks)
Solution
Part (a)
The solution is based on the chi-square test for independence.
Final answers are given below; the back-up theory and details of the calculations follow at the end.
DF       | 3
α        | 0.05
χ2crit   | 7.8147
χ2cal    | 8.0061
p-value  | 0.0459
Decision | Reject H0
Since the null hypothesis of independence is rejected, we conclude that gender and education level are NOT independent, i.e. they are dependent at the 5% level of significance. Answer 1
Back-up Theory and Details of calculations
Suppose we have a contingency table with r rows representing the r levels (grades) of one attribute and c columns representing the c levels (grades) of another attribute. Here r = 2 (gender) and c = 4 (education level).
The chi-square test of independence tests whether the two attributes are associated.
Hypotheses
Null H0: The two attributes, namely gender and education level, are independent.
Vs
Alternative H1: The two attributes are not independent.
Test Statistic
$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},$$ where $O_{ij}$ and $E_{ij}$ are, respectively, the observed and expected frequencies of the $(i, j)$th cell of the contingency table.
Under H0, $E_{ij} = \frac{O_{i\cdot} \times O_{\cdot j}}{O_{\cdot\cdot}}$, where $O_{i\cdot}$, $O_{\cdot j}$ and $O_{\cdot\cdot}$ are, respectively, the ith row total, the jth column total and the grand total.
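As a minimal sketch (in Python, which the original solution does not use), the expected-frequency formula can be applied directly to the observed counts. The names OBSERVED and expected_counts are illustrative, not from the source; the printed rows should reproduce the Eij table in the calculations to rounding.

```python
# Observed counts from the survey.
# Rows: Female, Male. Columns: High School, Bachelors, Masters, Ph.D.
OBSERVED = [
    [60, 54, 46, 41],
    [40, 44, 53, 57],
]


def expected_counts(observed):
    """Return E_ij = (row_i total * column_j total) / grand total under H0."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


expected = expected_counts(OBSERVED)
for row in expected:
    print(" ".join(f"{e:8.4f}" for e in row))
```

A useful sanity check is that each row (and column) of expected counts sums to the corresponding observed margin, as the Ei. and E.j totals show.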
Calculations
Observed frequencies (i1 = Female, i2 = Male; j1 to j4 = High School, Bachelors, Masters, Ph.D.):
Oij | j1  | j2 | j3 | j4 | Oi.
i1  | 60  | 54 | 46 | 41 | 201
i2  | 40  | 44 | 53 | 57 | 194
O.j | 100 | 98 | 99 | 98 | 395
Expected frequencies, Eij = (Oi. x O.j)/O..:
Eij | j1      | j2      | j3      | j4      | Ei.
i1  | 50.8861 | 49.8684 | 50.3772 | 49.8684 | 201
i2  | 49.1139 | 48.1316 | 48.6228 | 48.1316 | 194
E.j | 100     | 98      | 99      | 98      | 395
Cell contributions, χ2ij = (Oij - Eij)^2/Eij:
χ2ij  | j1     | j2     | j3     | j4     | Total
i1    | 1.6323 | 0.3423 | 0.3803 | 1.5771 | 3.9321
i2    | 1.6912 | 0.3547 | 0.3941 | 1.6340 | 4.0740
Total | 3.3236 | 0.6970 | 0.7744 | 3.2111 | 8.0061
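A minimal Python sketch of the cell-by-cell calculation: it recomputes each expected count from the margins and then sums (Oij - Eij)^2/Eij. The names OBSERVED and chi_square_contributions are illustrative; the printed values should agree with the χ2ij table above to rounding.

```python
OBSERVED = [
    [60, 54, 46, 41],  # Female
    [40, 44, 53, 57],  # Male
]


def chi_square_contributions(observed):
    """Return the matrix of (O_ij - E_ij)^2 / E_ij and the total chi-square statistic."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    contributions = []
    for i, row in enumerate(observed):
        cells = []
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under H0
            cells.append((o - e) ** 2 / e)
        contributions.append(cells)
    statistic = sum(sum(cells) for cells in contributions)
    return contributions, statistic


contributions, statistic = chi_square_contributions(OBSERVED)
for cells in contributions:
    print(" ".join(f"{c:.4f}" for c in cells), f"| {sum(cells):.4f}")
print(f"chi-square = {statistic:.4f}")
```

If SciPy is available, scipy.stats.chi2_contingency returns the statistic, p-value, degrees of freedom and expected counts in one call; its Yates continuity correction only applies when there is 1 degree of freedom, so it does not affect this 2 x 4 table.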
Distribution
Under H0, $\chi^2 \sim \chi^2_n$, where $n = (r - 1)(c - 1)$. Here $n = (2 - 1)(4 - 1) = 3$.
Critical Value
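Given the level of significance α, the critical value is the upper 100α% point (here the upper 5% point) of $\chi^2_n$, i.e. the value $\chi^2_{crit}$ for which $P(\chi^2_n > \chi^2_{crit}) = \alpha$.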
p-value
p-value $= P(\chi^2_n > \chi^2_{cal})$
The critical value and p-value were obtained with the Excel statistical functions CHIINV(0.05, 3) and CHIDIST(8.0061, 3), respectively; both are given in the summary table above.
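The original uses Excel. As a hedged, dependency-free alternative, the upper-tail probability and critical value can be computed in Python from the standard closed-form series for the chi-square distribution with integer degrees of freedom, with the critical value found by bisection. The function names chi2_sf and chi2_critical are hypothetical (chosen to echo SciPy's chi2.sf and chi2.ppf naming); this is a sketch, not a replacement for a statistics library.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: Q = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: Q = erfc(sqrt(x/2)) + 2*phi(sqrt(x)) * sum_{r=1}^{(df-1)/2} x^(r-1/2) / (1*3*...*(2r-1))
    s = math.sqrt(x)
    tail = math.erfc(s / math.sqrt(2))  # equals 2 * (1 - Phi(sqrt(x)))
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)  # standard normal density at sqrt(x)
    term, total = s, 0.0  # first term (r = 1) is x^(1/2) / 1
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return tail + 2 * phi * total


def chi2_critical(alpha, df, lo=0.0, hi=1000.0, tol=1e-10):
    """Upper alpha point: the c with P(X > c) = alpha, found by bisection."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid  # tail area still too large, so c lies further right
        else:
            hi = mid
    return (lo + hi) / 2


df = (2 - 1) * (4 - 1)  # (r - 1)(c - 1) for the 2 x 4 table
critical = chi2_critical(0.05, df)
p_value = chi2_sf(8.0061, df)
print(f"critical value = {critical:.4f}, p-value = {p_value:.4f}")
# prints: critical value = 7.8147, p-value = 0.0459
```

With SciPy installed, scipy.stats.chi2.ppf(0.95, 3) and scipy.stats.chi2.sf(8.0061, 3) should give the same two numbers.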
Decision
Since χ2cal = 8.0061 > χ2crit = 7.8147, or equivalently, p-value = 0.0459 < α = 0.05, H0 is rejected.
Part (b)
Two methods of studying correlation:
1. Chi-square test for independence, which ascertains whether there is an association between two attributes.
2. Correlation and regression analysis, which first ascertains whether there is an association and, if so, measures its strength and direction with a correlation coefficient and describes the relationship with a regression equation. Answer 2
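Method 2 applies to paired quantitative observations rather than to a categorical table like the survey above. A minimal sketch, using hypothetical toy data that lie exactly on y = 2x + 1 (not from the survey), computes Karl Pearson's correlation coefficient and the least-squares regression line of y on x; the function names are illustrative.

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson's product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares regression of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical toy data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [2 * x + 1 for x in xs]
print(pearson_r(xs, ys))           # 1.0 (perfect positive linear relationship)
print(least_squares_line(xs, ys))  # (1.0, 2.0)
```

The coefficient answers "is there a linear association, and how strong?", while the fitted line describes the relationship itself, matching the two steps named in method 2.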