Please answer a statistics problem (please answer all subquestions). Thank you very much.
NCbirths, cont'd. This time we will use a subset of the dataset after removing premature births (36 weeks or sooner) and continue to use MomRace as the explanatory variable to predict birth weight.
To get NCbirths:
install.packages("Stat2Data")
library(Stat2Data)
data("NCbirths")
(a) Use the following R command to remove premature births and create a dataset noPremies.
noPremies<-subset(NCbirths, NCbirths$Premie==0) ##subset(dataName, condition)
(b) Construct a side-by-side boxplot of birth weights separated by MomRace.
(c) Use tapply() or mean() to estimate group effects (αk)
(d) Conduct an ANOVA F-test to assess whether at least one group effect is statistically significant.
i) Copy and paste ANOVA table generated using R,
ii) check assumptions,
iii) Write all the 5 steps of the hypothesis test.
(e) Modify the R command provided below to apply Fisher’s LSD to investigate which racial groups differ significantly from which others. Copy and paste the R output. Summarize your conclusion.
#install.packages("agricolae")
#install outside R Markdown: run the install in the RStudio console or in an R script.
library(agricolae)
LSD.test(yourANOVAmodel, "groupVariableName", group=FALSE, console=T)
(f) This is a fairly large sample (over 1200 observations after removing premature births), so even relatively small differences in group means might yield significant results. Do you think that the differences in mean birth weight among these racial groups are important in a practical sense? (Hint: do you think the estimated group effects (αk) from part (c) are meaningful in reality?) Explain briefly.
install.packages("Stat2Data")
library(Stat2Data)
data("NCbirths")
#a
noPremies<-subset(NCbirths, NCbirths$Premie==0)
##subset(dataName, condition)
#b
boxplot(BirthWeightOz ~ MomRace, data = noPremies)
#c
tapply(noPremies$BirthWeightOz, noPremies$MomRace, mean)
#d
model <- aov(BirthWeightOz ~ MomRace, data = noPremies)
summary(model)
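Note that tapply() in part (c) returns the group means; the estimated group effects (αk) are usually taken as each group mean minus the grand mean. A minimal sketch of that step is below. It runs on a simulated stand-in data frame (`births`, made-up values, not the real NCbirths data) so that it works without Stat2Data; the names `births`, `grand_mean`, `group_means`, `group_sizes`, and `alpha_hat` are illustrative only. Swap in noPremies to get the actual estimates.

```r
# Simulated stand-in for noPremies (made-up values, NOT the real NCbirths data).
# Replace `births` with noPremies once Stat2Data is loaded.
set.seed(1)
births <- data.frame(
  MomRace = factor(rep(c("black", "hispanic", "other", "white"),
                       times = c(30, 20, 10, 40))),
  BirthWeightOz = rnorm(100, mean = 120, sd = 15)
)

grand_mean  <- mean(births$BirthWeightOz)                            # overall mean
group_means <- tapply(births$BirthWeightOz, births$MomRace, mean)    # mean per group
group_sizes <- tapply(births$BirthWeightOz, births$MomRace, length)  # n per group

# Estimated group effect: group mean minus grand mean
alpha_hat <- group_means - grand_mean
print(round(alpha_hat, 2))
```

With the grand mean defined this way, the effects weighted by group size sum to zero, which is a quick sanity check on the arithmetic.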
Output
b)
c)
d)
H0: mu1 = mu2 = mu3 = mu4
Ha: at least one mean is different
Test statistic: F = 8.401
Since the p-value is approximately 0, which is less than alpha, we reject H0.
There is a significant difference in means.
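Part (d)(ii) asks for an assumption check, which could be sketched as follows: compare the group standard deviations (a common rule of thumb is that the largest is less than about twice the smallest), then look at a residuals-versus-fitted plot and a normal Q-Q plot of the residuals. The code runs on a simulated stand-in data frame (`births`, made-up values, not the real NCbirths data), and the names `fit`, `group_sds`, and `sd_ratio` are illustrative; with the real data, use noPremies and the fitted model from above.

```r
# Simulated stand-in for noPremies (made-up values, NOT the real NCbirths data).
set.seed(2)
births <- data.frame(
  MomRace = factor(rep(c("black", "hispanic", "other", "white"),
                       times = c(30, 20, 10, 40))),
  BirthWeightOz = rnorm(100, mean = 120, sd = 15)
)
fit <- aov(BirthWeightOz ~ MomRace, data = births)

# Equal variance: ratio of largest to smallest group SD (rule of thumb: below 2)
group_sds <- tapply(births$BirthWeightOz, births$MomRace, sd)
sd_ratio  <- max(group_sds) / min(group_sds)
print(round(group_sds, 2))
print(round(sd_ratio, 2))

# Residuals vs fitted: look for similar vertical spread in every group
plot(fitted(fit), residuals(fit),
     xlab = "Fitted values (group means)", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normality: points should fall roughly along the reference line
qqnorm(residuals(fit))
qqline(residuals(fit))
```

Independence of the observations cannot be checked from a plot; it depends on how the births were sampled.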
e)
install.packages("agricolae")
library(agricolae)
LSD.test(model, "MomRace", group=FALSE, console=TRUE)
The black vs. hispanic and black vs. white comparisons differ significantly,
as their p-values < alpha.
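For part (f), one possible way to judge practical importance is to look at the size of the estimated effects in ounces and at the share of the variation in birth weight explained by MomRace (eta-squared, the between-group sum of squares over the total sum of squares). The sketch below is a suggestion, not part of the assigned commands. It uses a simulated stand-in data frame (`births`, made-up values, not the real NCbirths data), and the names `fit`, `aov_table`, `ss`, and `eta_sq` are illustrative; rerun it on noPremies for the real figures.

```r
# Simulated stand-in for noPremies (made-up values, NOT the real NCbirths data).
set.seed(3)
births <- data.frame(
  MomRace = factor(rep(c("black", "hispanic", "other", "white"),
                       times = c(30, 20, 10, 40))),
  BirthWeightOz = rnorm(100, mean = 120, sd = 15)
)
fit <- aov(BirthWeightOz ~ MomRace, data = births)

# ANOVA table as a data frame: row 1 is MomRace, row 2 is Residuals
aov_table <- summary(fit)[[1]]
ss <- aov_table[["Sum Sq"]]

# Eta-squared: proportion of total variation explained by the grouping
eta_sq <- ss[1] / sum(ss)
print(round(eta_sq, 4))
```

A small eta-squared together with group effects of only a few ounces would suggest that a significant F-test mostly reflects the large sample size rather than a practically important difference.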