As part of a long-term study of individuals aged 65 years or older, sociologists and physicians at the Wentworth Medical Center in upstate New York investigated the relationship between geographic location and depression. A sample of 60 individuals, all in reasonably good health, was selected: 20 were residents of Florida, 20 were residents of New York, and 20 were residents of North Carolina. Each individual sampled was given a standardized test to measure depression; higher test scores indicate higher levels of depression. The dataset can be found in the worksheet titled “MEDICAL1”.
a) What is your response variable in this experiment, what is the factor, and what are the factor levels (treatments)?
b) You are interested in whether there is a difference in Depression Score depending on the State. What are your null and alternative hypotheses for this?
c) Conduct an ANOVA at the alpha = 0.05 level to see if the State has an effect on the depression score. Show the ANOVA table including F statistic and p value. What is your conclusion?
d) To see which states are significantly different in the previous ANOVA (part c), you now want to conduct a multiple comparisons procedure. Run a Least Significant Difference test at the level alpha = 0.1 and determine which states, if any, are significantly different from each other.
e) Finally, we talked about comparison-wise vs experiment-wise Type 1 error in class. Run the LSD test (see part d) again, now including the Bonferroni adjustment. Consult the specification of the LSD test (?LSD.test) on how to do this. Using the alpha = 0.1 level, what changes in your conclusion when you compare the results with part d?
State | Depression_Score |
Florida | 3 |
Florida | 7 |
Florida | 7 |
Florida | 3 |
Florida | 8 |
Florida | 8 |
Florida | 8 |
Florida | 5 |
Florida | 5 |
Florida | 2 |
Florida | 6 |
Florida | 2 |
Florida | 6 |
Florida | 6 |
Florida | 9 |
Florida | 7 |
Florida | 5 |
Florida | 4 |
Florida | 7 |
Florida | 3 |
New.York | 8 |
New.York | 11 |
New.York | 9 |
New.York | 7 |
New.York | 8 |
New.York | 7 |
New.York | 8 |
New.York | 4 |
New.York | 13 |
New.York | 10 |
New.York | 6 |
New.York | 8 |
New.York | 12 |
New.York | 8 |
New.York | 6 |
New.York | 8 |
New.York | 5 |
New.York | 7 |
New.York | 7 |
New.York | 8 |
North.Carolina | 10 |
North.Carolina | 7 |
North.Carolina | 3 |
North.Carolina | 5 |
North.Carolina | 11 |
North.Carolina | 8 |
North.Carolina | 4 |
North.Carolina | 3 |
North.Carolina | 7 |
North.Carolina | 8 |
North.Carolina | 8 |
North.Carolina | 7 |
North.Carolina | 3 |
North.Carolina | 9 |
North.Carolina | 8 |
North.Carolina | 12 |
North.Carolina | 6 |
North.Carolina | 3 |
North.Carolina | 8 |
North.Carolina | 11 |
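If the MEDICAL1 worksheet is not at hand, the table above can be typed into R directly. This is a minimal sketch, assuming the rows are entered in the order shown; the vector names and the `medical1` data frame are illustrative choices, not part of the original worksheet. The per-state means and standard deviations it prints should agree with the `$means` output in parts d and e.

```r
# Depression scores transcribed from the MEDICAL1 table above,
# 20 residents per state, in the same order as the table rows.
florida <- c(3, 7, 7, 3, 8, 8, 8, 5, 5, 2, 6, 2, 6, 6, 9, 7, 5, 4, 7, 3)
new_york <- c(8, 11, 9, 7, 8, 7, 8, 4, 13, 10, 6, 8, 12, 8, 6, 8, 5, 7, 7, 8)
north_carolina <- c(10, 7, 3, 5, 11, 8, 4, 3, 7, 8, 8, 7, 3, 9, 8, 12, 6, 3, 8, 11)

# Long-format data frame: one row per individual, mirroring the table layout
# (hypothetical object name `medical1`)
medical1 <- data.frame(
  State = factor(rep(c("Florida", "New.York", "North.Carolina"), each = 20)),
  Depression_Score = c(florida, new_york, north_carolina)
)

# Per-state sample size, mean and standard deviation
group_n <- tapply(medical1$Depression_Score, medical1$State, length)
group_mean <- tapply(medical1$Depression_Score, medical1$State, mean)
group_sd <- tapply(medical1$Depression_Score, medical1$State, sd)
print(cbind(n = group_n, mean = group_mean, sd = group_sd))
```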
a) The response variable in this experiment is "Depression Score", the factor is "State", and the factor levels (treatments) are "Florida", "New York" and "North Carolina".
b) The null and alternative hypotheses for this are:
H0: The mean "Depression Score" (the average value of the response variable) is the same for all three states.
H1: The mean "Depression Score" is not the same for at least one of the states.
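In symbols, writing μ_FL, μ_NY and μ_NC for the population mean depression scores of the three states (notation introduced here for convenience), the hypotheses read:

```latex
H_0:\ \mu_{\text{FL}} = \mu_{\text{NY}} = \mu_{\text{NC}}
\qquad\text{vs.}\qquad
H_1:\ \mu_i \neq \mu_j \ \text{for at least one pair of states } (i, j)
```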
c) The R-Code is:
depression_score <- c(3, 7, 7, 3, 8, 8, 8, 5, 5, 2, 6, 2, 6, 6, 9, 7, 5, 4, 7, 3,        # Florida
                      8, 11, 9, 7, 8, 7, 8, 4, 13, 10, 6, 8, 12, 8, 6, 8, 5, 7, 7, 8,    # New York
                      10, 7, 3, 5, 11, 8, 4, 3, 7, 8, 8, 7, 3, 9, 8, 12, 6, 3, 8, 11)    # North Carolina
state <- factor(rep(c("Florida", "New York", "North Carolina"), each = 20))
model <- aov(depression_score ~ state)  # one-way ANOVA of score on state
summary(model)
            Df Sum Sq Mean Sq F value  Pr(>F)
state        2   61.0  30.517   5.241 0.00814 **
Residuals   57  331.9   5.823
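The F statistic in the table is the ratio of the two mean squares, with k = 3 states and N = 60 individuals (so 2 and 57 degrees of freedom):

```latex
F = \frac{MS_{\text{state}}}{MS_{\text{residual}}}
  = \frac{SS_{\text{state}}/(k-1)}{SS_{\text{residual}}/(N-k)}
  = \frac{30.517}{5.823}
  \approx 5.241,
\qquad
p = P\left(F_{2,\,57} > 5.241\right) \approx 0.00814
```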
qf(0.95, df1 = 2, df2 = 57)  # R-Code for obtaining the tabulated (critical) F-Value
Thus, Tabulated F-Value = 3.158843 < Calculated F-Value = 5.241.
Conclusion:
Since the calculated F-Value (5.241) exceeds the tabulated F-Value (3.158843), we reject the null hypothesis and conclude that the mean "Depression Score" is not the same for at least one of the states.
Equivalently, the p-value = 0.00814 < 0.05 (level of significance), so we reject the null hypothesis: State has a significant effect on the depression score.
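To see where the numbers in the ANOVA table come from, the same quantities can be computed directly from the between-state and within-state sums of squares. This is a minimal sketch in base R; variable names such as `ss_state` and `ss_resid` are illustrative. It uses `qf()` for the critical value, as above, and `pf(..., lower.tail = FALSE)` for the upper-tail p-value.

```r
# One-way ANOVA computed by hand, as a check on the aov() table.
scores <- c(3, 7, 7, 3, 8, 8, 8, 5, 5, 2, 6, 2, 6, 6, 9, 7, 5, 4, 7, 3,
            8, 11, 9, 7, 8, 7, 8, 4, 13, 10, 6, 8, 12, 8, 6, 8, 5, 7, 7, 8,
            10, 7, 3, 5, 11, 8, 4, 3, 7, 8, 8, 7, 3, 9, 8, 12, 6, 3, 8, 11)
group <- factor(rep(c("Florida", "New York", "North Carolina"), each = 20))

k <- nlevels(group)       # number of states (3)
N <- length(scores)       # total sample size (60)
grand_mean <- mean(scores)
group_means <- tapply(scores, group, mean)
group_sizes <- tapply(scores, group, length)

# Between-group (state) and within-group (residual) sums of squares
ss_state <- sum(group_sizes * (group_means - grand_mean)^2)
ss_resid <- sum((scores - group_means[as.character(group)])^2)

# Mean squares and the F ratio
ms_state <- ss_state / (k - 1)
ms_resid <- ss_resid / (N - k)
f_stat <- ms_state / ms_resid

f_crit <- qf(0.95, df1 = k - 1, df2 = N - k)                         # critical F at alpha = 0.05
p_value <- pf(f_stat, df1 = k - 1, df2 = N - k, lower.tail = FALSE)  # upper-tail probability

cat(sprintf("F = %.3f, critical F = %.4f, p = %.5f\n", f_stat, f_crit, p_value))
if (f_stat > f_crit) cat("Reject H0 at alpha = 0.05\n")
```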
d) The R-Code:
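library(agricolae)  # provides LSD.test()
model <- aov(depression_score ~ state)  # ANOVA model from part c
out <- LSD.test(model, trt = "state", alpha = 0.1, p.adj = "none")
out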
$statistics
MSerror Df Mean CV t.value LSD
5.822807 57 6.866667 35.14149 1.672029 1.27588
$parameters
test p.ajusted name.t ntr alpha
Fisher-LSD none state 3 0.1
$means
depression_score std r LCL UCL Min Max Q25 Q50 Q75
Florida 5.55 2.139233 20 4.647816 6.452184 2 9 3.75 6.0 7.00
New York 8.00 2.200478 20 7.097816 8.902184 4 13 7.00 8.0 8.25
North Carolina 7.05 2.837252 20 6.147816 7.952184 3 12 4.75 7.5 8.25
$comparison
NULL
$groups
depression_score groups
New York 8.00 a
North Carolina 7.05 a
Florida 5.55 b
attr(,"class")
[1] "group"
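Hence, with LSD = 1.27588, the differences between the state means are: New York vs Florida = 2.45 and North Carolina vs Florida = 1.50, both larger than the LSD, while New York vs North Carolina = 0.95 is smaller. So the states "New York" and "North Carolina" are not significantly different from each other, while "Florida" is significantly different from both of the other states at alpha = 0.1.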
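The LSD value reported by `LSD.test()` can be reproduced from the ANOVA quantities: LSD = t(1 - alpha/2, df_error) × sqrt(MSerror × (1/n1 + 1/n2)). This is a minimal sketch, assuming equal group sizes of 20 and taking MSerror (331.9/57) and the state means from the output above; the variable names are illustrative.

```r
# Fisher's LSD computed by hand from the ANOVA quantities (illustrative sketch).
ms_error <- 331.9 / 57      # residual mean square from the ANOVA table
df_error <- 57
n_per_group <- 20
alpha <- 0.1

means <- c("Florida" = 5.55, "New York" = 8.00, "North Carolina" = 7.05)

# Two-sided critical t value and the least significant difference
t_crit <- qt(1 - alpha / 2, df_error)
lsd <- t_crit * sqrt(ms_error * (1 / n_per_group + 1 / n_per_group))

# All pairwise absolute differences between state means, compared to the LSD
state_pairs <- combn(names(means), 2)
diffs <- abs(means[state_pairs[1, ]] - means[state_pairs[2, ]])
result <- data.frame(
  comparison = paste(state_pairs[1, ], "vs", state_pairs[2, ]),
  difference = unname(diffs),
  significant = unname(diffs > lsd)
)
print(round(lsd, 5))
print(result)
```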
e) The R-Code:
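library(agricolae)  # provides LSD.test()
model <- aov(depression_score ~ state)  # same ANOVA model as in parts c and d
out <- LSD.test(model, trt = "state", alpha = 0.1, p.adj = "bonferroni")
out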
$statistics
MSerror Df Mean CV t.value MSD
5.822807 57 6.866667 35.14149 2.180883 1.664173
$parameters
test p.ajusted name.t ntr alpha
Fisher-LSD bonferroni state 3 0.1
$means
depression_score std r LCL UCL Min Max Q25 Q50 Q75
Florida 5.55 2.139233 20 4.647816 6.452184 2 9 3.75 6.0 7.00
New York 8.00 2.200478 20 7.097816 8.902184 4 13 7.00 8.0 8.25
North Carolina 7.05 2.837252 20 6.147816 7.952184 3 12 4.75 7.5 8.25
$comparison
NULL
$groups
depression_score groups
New York 8.00 a
North Carolina 7.05 ab
Florida 5.55 b
attr(,"class")
[1] "group"
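Hence, with the Bonferroni-adjusted minimum significant difference MSD = 1.664173, only the New York vs Florida difference (2.45) exceeds it; North Carolina vs Florida (1.50) and New York vs North Carolina (0.95) do not. So "New York" and "Florida" are significantly different from each other, while "North Carolina" is not significantly different from either of them (it shares a group letter with both).
Compared with part d, the Florida vs North Carolina difference is no longer significant. The Bonferroni adjustment controls the experiment-wise Type 1 error rate over the three pairwise comparisons, which raises the critical t value from 1.672029 to 2.180883 and the critical difference from 1.27588 to 1.664173.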
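The effect of the Bonferroni adjustment can be reproduced by hand. This is a sketch, assuming the adjustment divides alpha by the number of pairwise comparisons m = choose(3, 2) = 3 before taking the two-sided t quantile, which matches the `t.value` of 2.180883 reported above; means and MSerror are taken from the output, and the variable names are illustrative.

```r
# Unadjusted LSD vs Bonferroni-adjusted MSD (illustrative sketch).
ms_error <- 331.9 / 57
df_error <- 57
n_per_group <- 20
alpha <- 0.1
n_comparisons <- choose(3, 2)   # 3 pairwise comparisons among 3 states

means <- c("Florida" = 5.55, "New York" = 8.00, "North Carolina" = 7.05)
se_diff <- sqrt(ms_error * (2 / n_per_group))  # standard error of a difference in means

# Part d uses alpha per comparison; part e splits alpha across all comparisons
lsd_unadjusted <- qt(1 - alpha / 2, df_error) * se_diff
msd_bonferroni <- qt(1 - alpha / (2 * n_comparisons), df_error) * se_diff

state_pairs <- combn(names(means), 2)
diffs <- unname(abs(means[state_pairs[1, ]] - means[state_pairs[2, ]]))
print(data.frame(
  comparison = paste(state_pairs[1, ], "vs", state_pairs[2, ]),
  difference = diffs,
  sig_lsd = diffs > lsd_unadjusted,
  sig_bonferroni = diffs > msd_bonferroni
))
```

Only the Florida vs North Carolina row changes between the two columns, which is the change in conclusion described above.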