1. From Statistics and Data Analysis from Elementary to
Intermediate by Tamhane and
Dunlop, pg 339. The following table gives the eye color and hair
color of 592 students.
Eye Color \ Hair Color | Black | Brown | Red | Blond | Row Total |
Brown | 68 | 119 | 26 | 7 | 220 |
Blue | 20 | 84 | 17 | 94 | 215 |
Hazel | 15 | 54 | 14 | 10 | 93 |
Green | 5 | 29 | 14 | 16 | 64 |
Column Total | 108 | 286 | 71 | 127 | 592 |
a) What test should we use to test that the eye color and hair
color are associated? Give the null
and alternative hypothesis.
b) Conduct the test at α = 0.05. What do you conclude? Give more
than reject or fail to reject
H0.
2. The following data look at how long it takes to get to work. Let x = commuting
distance (miles) and y = commuting time (minutes)
x 15 16 17 18 19 20
y 42 35 45 42 49 46
a) Give a scatterplot of this data and comment on the direction, form, and strength of this relationship.
b) Determine the least-squares estimate equation for this data
set.
c) Give the coefficient of determination, R^2, and comment on what it
means.
d) Give the residual plot based on the least-squares estimate
equation.
e) Test whether this least-squares estimate equation specifies a useful
relationship between commuting
distance and commuting time.
1.
a) What test should we use to test that the eye color and hair
color are associated? Give the null
and alternative hypothesis.
We use the chi-square test of independence (association of attributes) between eye color and hair color.
Null Hypothesis: There is no association between eye color and hair color.
Alternative Hypothesis: There is an association between eye color and hair color.
b) Conduct the test at α = 0.05. What do you conclude? Give
more than reject or fail to reject
H0.
Level of significance: $\alpha = 0.05$
Test statistic:
$$\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
where $O_{ij}$ is the observed frequency in cell $(i, j)$ and $E_{ij}$ is the expected frequency in that cell, given by $E_{ij} = R_i C_j / T$, where $R_i$ is the total of row $i$, $C_j$ is the total of column $j$, and $T$ is the total of all the values. We first calculate the expected frequencies and then the chi-square statistic.
The observed frequencies:
OBS | Black | Brown | Red | Blond | Total |
Brown | 68 | 119 | 26 | 7 | 220 |
Blue | 20 | 84 | 17 | 94 | 215 |
Hazel | 15 | 54 | 14 | 10 | 93 |
Green | 5 | 29 | 14 | 16 | 64 |
Total | 108 | 286 | 71 | 127 | 592 |
The expected frequencies:
EXPECTED | Black | Brown | Red | Blond | Total |
Brown | 40.13514 | 106.2838 | 26.38514 | 47.19595 | 220 |
Blue | 39.22297 | 103.8682 | 25.78547 | 46.12331 | 215 |
Hazel | 16.96622 | 44.92905 | 11.15372 | 19.95101 | 93 |
Green | 11.67568 | 30.91892 | 7.675676 | 13.72973 | 64 |
Total | 108 | 286 | 71 | 127 | 592 |
The chi-square contribution, (O - E)^2 / E, of each cell:
CHISQ | Black | Brown | Red | Blond | Total |
Brown | 19.34591 | 1.521419 | 0.005622 | 34.23417 | 55.10712 |
Blue | 9.421078 | 3.80046 | 2.993334 | 49.69672 | 65.91159 |
Hazel | 0.227865 | 1.831378 | 0.726335 | 4.96329 | 7.748867 |
Green | 3.816879 | 0.119094 | 5.210887 | 0.375399 | 9.522259 |
Total | 32.81173 | 7.27235 | 8.936177 | 89.26958 | 138.2898 |
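The critical value of $\chi^2$ at $\alpha = 0.05$ with $(4-1)(4-1) = 9$ df is 16.9190.
The calculated value of $\chi^2$ is 138.2898, which is greater than the critical value, so we reject the null hypothesis. Hence, we conclude that eye color and hair color are associated. The largest contributions to the statistic come from the Blond column (89.27 of the total), where blue-eyed students are far more common and brown-eyed students far less common than independence would predict.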
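The calculation above can be checked with a short script. This is a minimal sketch in plain Python, not the method used to produce the tables; the variable names are illustrative, and the critical value 16.9190 is simply copied from the text rather than computed.

```python
# Observed counts: rows = eye color (Brown, Blue, Hazel, Green),
# columns = hair color (Black, Brown, Red, Blond).
observed = [
    [68, 119, 26, 7],
    [20, 84, 17, 94],
    [15, 54, 14, 10],
    [5, 29, 14, 16],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: (row total * column total) / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pearson chi-square statistic: sum of (O - E)^2 / E over all cells.
chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Critical value at alpha = 0.05 with 9 df, taken from the text (not computed here).
CRITICAL_VALUE = 16.9190

print(f"chi-square = {chi_sq:.4f}, df = {df}")
print("reject H0" if chi_sq > CRITICAL_VALUE else "fail to reject H0")
```

The critical value is hard-coded to keep the sketch free of third-party dependencies; a statistics library could supply it, and a p-value, instead.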
2. The given data:
x | y |
15 | 42 |
16 | 35 |
17 | 45 |
18 | 42 |
19 | 49 |
20 | 46 |
a) The scatter plot.
The scatter plot shows an upward trend, so the direction is positive: commuting time tends to increase with commuting distance. The form is roughly linear, but the relationship is weak, since the points are scattered some distance from the line.
b) Determine the least-squares estimate equation for this data set.
For the calculation of the least square estimates:
Obs | x | y | x^2 | y^2 | x*y |
1 | 15 | 42 | 225 | 1764 | 630 |
2 | 16 | 35 | 256 | 1225 | 560 |
3 | 17 | 45 | 289 | 2025 | 765 |
4 | 18 | 42 | 324 | 1764 | 756 |
5 | 19 | 49 | 361 | 2401 | 931 |
6 | 20 | 46 | 400 | 2116 | 920 |
Total | 105 | 259 | 1855 | 11295 | 4562 |
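With $n = 6$, the slope and intercept are
$$b_1 = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{6(4562) - (105)(259)}{6(1855) - 105^2} = \frac{177}{105} = 1.6857$$
$$b_0 = \bar{y} - b_1\bar{x} = 43.1667 - 1.6857(17.5) = 13.6667$$
Therefore the estimated linear equation is $\hat{y} = 13.6667 + 1.6857x$.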
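As a check on the hand calculation, the least-squares slope and intercept can be computed directly from the column totals. The following is a minimal sketch in plain Python; the names `s_xx`, `s_xy`, `slope`, and `intercept` are illustrative choices, not notation from the original solution.

```python
x = [15, 16, 17, 18, 19, 20]  # commuting distance (miles)
y = [42, 35, 45, 42, 49, 46]  # commuting time (minutes)
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xx = sum(xi * xi for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# Corrected sums of squares and cross-products.
s_xx = sum_xx - sum_x ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Least-squares estimates: slope = S_xy / S_xx, intercept = y_bar - slope * x_bar.
slope = s_xy / s_xx
intercept = sum_y / n - slope * sum_x / n

print(f"y_hat = {intercept:.4f} + {slope:.4f} x")
```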
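c) Give the coefficient of determination, R^2, and comment on what it means.
The coefficient of determination is $R^2 = r^2$, where the correlation $r$ is given by
$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{29.5}{\sqrt{(17.5)(114.8333)}} = 0.6581$$
with $S_{xy} = \sum xy - \sum x \sum y / n = 29.5$, $S_{xx} = \sum x^2 - (\sum x)^2 / n = 17.5$, and $S_{yy} = \sum y^2 - (\sum y)^2 / n = 114.8333$.
The coefficient of determination is $R^2 = 0.6581^2 = 0.4331$. This implies that 43.31% of the variation in commuting time is explained by the linear model in commuting distance.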
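The correlation and coefficient of determination can be reproduced from the same sums. This is a small sketch in plain Python, assuming the usual definition of the correlation as the cross-product sum divided by the square root of the product of the two sums of squares; the variable names are illustrative.

```python
import math

x = [15, 16, 17, 18, 19, 20]  # commuting distance (miles)
y = [42, 35, 45, 42, 49, 46]  # commuting time (minutes)
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Corrected sums of squares and cross-products about the means.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

r = s_xy / math.sqrt(s_xx * s_yy)  # sample correlation
r_squared = r ** 2                 # coefficient of determination

print(f"r = {r:.4f}, R^2 = {r_squared:.4f}")
```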
d) Give the residual plot based on the least-squares estimate equation.
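The residual plot itself is not reproduced here. As a rough sketch, the residuals (observed y minus fitted y) that such a plot would show against x can be computed as follows. The slope and intercept are recomputed from the data rather than taken from the text, and all names are illustrative.

```python
x = [15, 16, 17, 18, 19, 20]  # commuting distance (miles)
y = [42, 35, 45, 42, 49, 46]  # commuting time (minutes)
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares fit.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Residual = observed y minus fitted y, one per observation.
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

for xi, ei in zip(x, residuals):
    print(f"x = {xi}: residual = {ei:+.4f}")
```

For this data the residuals come out at roughly 3.05, -5.64, 2.68, -2.01, 3.30, and -1.38. A residual plot places these on the vertical axis against x on the horizontal axis, with a reference line at zero.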
e) Test whether this least-squares estimate equation specifies a
useful relationship between commuting
distance and commuting time.
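For this, we construct the ANOVA table and calculate the F-value for the regression. The null hypothesis is that the slope is zero, $H_0: \beta_1 = 0$, against $H_1: \beta_1 \neq 0$.
The total sum of squares is
$$TSS = \sum y^2 - \frac{(\sum y)^2}{n} = 11295 - \frac{259^2}{6} = 114.83333$$
The regression sum of squares is
$$RSS = \frac{S_{xy}^2}{S_{xx}} = \frac{29.5^2}{17.5} = 49.72857$$
where $S_{xy} = \sum xy - \sum x \sum y / n = 29.5$ and $S_{xx} = \sum x^2 - (\sum x)^2 / n = 17.5$.
The error sum of squares is ESS = TSS - RSS = 114.83333 - 49.72857 = 65.10476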
We shall form the ANOVA table:
Source | df | SS | MS | F | Significance F |
Regression | 1 | 49.72857 | 49.72857 | 3.055295 | 0.15539 |
Residual | 4 | 65.10476 | 16.27619 | | |
Total | 5 | 114.8333 | | | |
From the table above, the p-value is 0.1554 > 0.05. Hence we fail to reject the null hypothesis and conclude that there is not enough evidence of a linear relationship between x and y. Therefore, the equation does not specify a useful relationship between commuting distance and commuting time.
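The ANOVA quantities can be reproduced with a few lines of plain Python. This is a sketch rather than the procedure used to build the table: it uses the text's labels (TSS, RSS for regression, ESS for error) and, instead of computing the p-value, compares F with the tabulated upper 5% point of the F distribution with 1 and 4 degrees of freedom, about 7.71, which is a standard table value and not a figure from the text.

```python
x = [15, 16, 17, 18, 19, 20]  # commuting distance (miles)
y = [42, 35, 45, 42, 49, 46]  # commuting time (minutes)
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

tss = sum((yi - y_bar) ** 2 for yi in y)  # total sum of squares
rss = s_xy ** 2 / s_xx                    # regression sum of squares
ess = tss - rss                           # error (residual) sum of squares

df_regression = 1
df_error = n - 2

# F = regression mean square / error mean square.
f_value = (rss / df_regression) / (ess / df_error)

# Assumed table value: upper 5% point of F(1, 4), not taken from the text.
F_CRITICAL = 7.71

print(f"TSS = {tss:.5f}, RSS = {rss:.5f}, ESS = {ess:.5f}")
print(f"F = {f_value:.6f}")
print("reject H0" if f_value > F_CRITICAL else "fail to reject H0")
```

Comparing F with a critical value leads to the same decision as comparing the p-value with 0.05.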