I’m working on an ANOVA homework problem. I have individual incomes for 5 groups defined by years of education (<12, 12, 13-15, 16, and 16+). Here is the problem:
Use an extra sum of squares F-test (BYOA: Build Your Own ANOVA!) to use all the data (to increase the degrees of freedom and thus the power of the test!) to compare only the bachelor’s degree group (16) income to the more than bachelor’s degree group (>16) income. Show your final ANOVA table and your 6-step complete analysis. You will need to assume that the standard deviations of the log-transformed data are again equal to proceed here. A two-sample t-test between these two groups (assuming equal standard deviations on logged data) yields a p-value of .1648 (try it!), but it only uses 778 degrees of freedom (from a pooled t-test). Make note again of how many degrees of freedom were used to estimate the pooled standard deviation in your extra sum of squares test. You may use SAS or R.
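The test the problem describes compares two nested one-way models: a full model with a separate mean for each of the five education groups, and a reduced model in which the 16 and >16 groups share one mean. A sketch of the statistic in standard notation (SSE = error sum of squares; n = 2584, inferred from the Corrected Total df of 2583 in the SAS output):

$$
F = \frac{(\mathrm{SSE}_{\text{reduced}} - \mathrm{SSE}_{\text{full}})\,/\,(df_{\text{reduced}} - df_{\text{full}})}{\mathrm{SSE}_{\text{full}}\,/\,df_{\text{full}}},
\qquad df_{\text{full}} = n - 5 = 2579,\quad df_{\text{reduced}} = n - 4 = 2580
$$

Under $H_0\!: \mu_{16} = \mu_{>16}$ (with equal SDs on the log scale), $F$ follows an $F_{1,\,2579}$ distribution, so the pooled SD in the denominator is estimated with 2579 df rather than 778.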
I need help! I’m using SAS. For the first step, I ran the ANOVA on the logged data to determine whether any of the group means differ. The test concluded that they do.
Dependent Variable: logincome2005
Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
---|---|---|---|---|---|
Model | 4 | 217.653784 | 54.413446 | 62.87 | <.0001 |
Error | 2579 | 2232.120383 | 0.865498 | | |
Corrected Total | 2583 | 2449.774168 | | | |

R-Square | Coeff Var | Root MSE | logincome2005 Mean |
---|---|---|---|
0.088846 | 8.913094 | 0.930322 | 10.43770 |
Then I put all the non-16 subjects in one group and ran the ANOVA to compare the 16-year group with the combined group of the others. The test concluded there was a difference.
Dependent Variable: logincome2005
Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
---|---|---|---|---|---|
Model | 1 | 62.214640 | 62.214640 | 67.28 | <.0001 |
Error | 2582 | 2387.559527 | 0.924694 | | |
Corrected Total | 2583 | 2449.774168 | | | |

R-Square | Coeff Var | Root MSE | logincome2005 Mean |
---|---|---|---|
0.025396 | 9.212857 | 0.961610 | 10.43770 |
Next, I grouped all the non-16+ subjects together and ran the ANOVA to compare the 16+ group with the combined group of the others. The test concluded there was a difference.
Dependent Variable: logincome2005
Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
---|---|---|---|---|---|
Model | 1 | 92.614028 | 92.614028 | 101.45 | <.0001 |
Error | 2582 | 2357.160140 | 0.912920 | | |
Corrected Total | 2583 | 2449.774168 | | | |

R-Square | Coeff Var | Root MSE | logincome2005 Mean |
---|---|---|---|
0.037805 | 9.154018 | 0.955469 | 10.43770 |
I then “built” my own ANOVA tables: the first compares the 16-group run against the original five-group ANOVA, and the second compares the 16+ group run against the original.
16 years educ different (comparing to original result)
Source | df | SS | MS | F | Pr > F |
---|---|---|---|---|---|
Model (Full) | 3 | 155.44 | 51.8133333 | 59.8653238 | 0 |
Error (from Full) | 2579 | 2232.12 | 0.86549826 | | |
Total (From Reduced) | 2582 | 2387.56 | | | |
16+ years educ different (comparing to original result)
Source | df | SS | MS | F | Pr > F |
---|---|---|---|---|---|
Model (Full) | 3 | 125.04 | 41.68 | 48.1572317 | 0 |
Error (from Full) | 2579 | 2232.12 | 0.86549826 | | |
Total (From Reduced) | 2582 | 2357.16 | | | |
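The hand-built tables follow the usual extra-sum-of-squares recipe. As a sanity check, here is a minimal Python sketch (standard library only; the helper names `extra_ss_f_test`, `f_sf` and `betainc_reg` are hypothetical, not SAS or R functions) that takes the error SS and error df of a reduced and a full model and returns the extra SS, F, and upper-tail p-value. The p-value comes from a continued-fraction evaluation of the regularized incomplete beta function, assumed accurate enough for these df; in practice R's `pf(F, df1, df2, lower.tail = FALSE)` or `1 - PROBF(F, ndf, ddf)` in a SAS DATA step does the same job.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_sf(f, d1, d2):
    """Upper-tail p-value P(F > f) for an F(d1, d2) distribution."""
    return betainc_reg(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def extra_ss_f_test(sse_reduced, df_reduced, sse_full, df_full):
    """Extra-sum-of-squares F-test of a reduced model nested in a full model."""
    extra_ss = sse_reduced - sse_full
    extra_df = df_reduced - df_full
    mse_full = sse_full / df_full  # pooled variance, estimated on df_full df
    f = (extra_ss / extra_df) / mse_full
    return extra_ss, extra_df, mse_full, f, f_sf(f, extra_df, df_full)


if __name__ == "__main__":
    # Error SS / df copied from the SAS output in this post.
    runs = [("overall 5-group test", 2449.774168, 2583),
            ("16-vs-rest BYOA table", 2387.559527, 2582)]
    for label, sse_r, df_r in runs:
        ss, d, mse, f, p = extra_ss_f_test(sse_r, df_r, 2232.120383, 2579)
        print(f"{label}: extra SS={ss:.4f} on {d} df, MSE={mse:.6f}, "
              f"F={f:.2f}, p={p:.3g}")
```

Plugging in the intercept-only model reproduces SAS's F ≈ 62.87, and plugging in the 16-vs-rest run reproduces the F ≈ 59.87 in the first hand-built table. Note that this particular reduced model (16 vs. everyone else) tests whether the four non-16 groups share a common mean; it is not the 16 vs. >16 comparison. For that, the reduced model keeps <12, 12 and 13-15 as separate groups and merges only 16 with >16, so the extra df is 1 and the denominator is still the full-model MSE on 2579 df.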
I’m stuck on what to do next in order to compare only the 16-year group against the 16+ group. Guidance would be appreciated.
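To get the reduced-model SSE for the 16 vs. >16 question, the data need to be refit with a recoded group variable in which 16 and >16 share one level (in SAS, recode education that way and rerun the same one-way ANOVA). The sketch below shows that logic in Python on hypothetical toy numbers, not the homework data; `within_group_sse` and `merged_groups_f` are illustrative names.

```python
from collections import defaultdict


def within_group_sse(values, labels):
    """Return (SSE, number of groups) for a one-way 'separate means' model."""
    groups = defaultdict(list)
    for y, g in zip(values, labels):
        groups[g].append(y)
    sse = 0.0
    for ys in groups.values():
        mean = sum(ys) / len(ys)
        sse += sum((y - mean) ** 2 for y in ys)
    return sse, len(groups)


def merged_groups_f(values, labels, merge, merged_label="16 or >16"):
    """Extra-SS F for H0: the groups in `merge` share one mean.

    Full model: every label keeps its own mean.
    Reduced model: the labels in `merge` are relabelled as one group.
    Returns (F, numerator df, denominator df).
    """
    n = len(values)
    sse_full, k_full = within_group_sse(values, labels)
    reduced = [merged_label if g in merge else g for g in labels]
    sse_red, k_red = within_group_sse(values, reduced)
    df_full, df_red = n - k_full, n - k_red
    f = ((sse_red - sse_full) / (df_red - df_full)) / (sse_full / df_full)
    return f, df_red - df_full, df_full


if __name__ == "__main__":
    # Hypothetical toy log-incomes, NOT the homework data.
    y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    g = ["12", "12", "12", "16", "16", ">16", ">16"]
    print(merged_groups_f(y, g, {"16", ">16"}))  # F = 16/3 on 1 and 4 df
```

With the real data the full model has five groups, so the pooled SD is estimated from n − 5 = 2584 − 5 = 2579 df, compared with the 778 df of the two-sample pooled t-test.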
After you ran the first ANOVA (the overall five-group test above, F = 62.87 on 4 and 2579 df, p < .0001), you should have run post-hoc tests. Post-hoc tests are used to locate where the differences between groups lie once the overall F-test is significant. Since at least one of the five group means differs, a post-hoc test will show where that difference occurs. A Bonferroni adjustment, Tukey's HSD, or a similar procedure is generally used. These compare each group with every other group:
<12 with 12, <12 with 13-15, <12 with 16, <12 with 16+
12 with 13-15, 12 with 16, 12 with 16+
and so on.
In this way you will also get a comparison of 16 with 16+.
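For the single 16 vs. 16+ comparison, the pooled-variance t statistic uses the error mean square from the five-group ANOVA (MSE = 0.865498, Root MSE = 0.930322, 2579 df), and its square equals the extra-sum-of-squares F for merging just those two groups. A sketch, with the group means and sizes left symbolic because they are not shown in the post:

$$
t = \frac{\bar y_{16} - \bar y_{16+}}{\sqrt{\mathrm{MSE}\left(\dfrac{1}{n_{16}} + \dfrac{1}{n_{16+}}\right)}},
\qquad \mathrm{MSE} = 0.865498,\quad df = 2579,
\qquad t^2 = F_{1,\,2579}
$$

A Bonferroni adjustment over all $\binom{5}{2} = 10$ pairwise comparisons would compare each p-value with $0.05/10 = 0.005$; the unadjusted single comparison is exactly the extra-sum-of-squares test the homework asks for.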