Use RStudio to answer this question.
Before opening the dataset needed for this problem, you’ll need to load the “car” package:
> library(car)
Now you can import the “Robey” dataset and use it to answer the question below. Name the data frame abc:
> abc <- Robey
Remember to include any R code you use along with your answers!
The Robey dataset contains fertility rates from a sample of countries. You want to see if total fertility rate (tfr), which is the average number of children per woman, differs across the four regions (region) of countries in the dataset. There is also a variable that indicates the percent of married women of childbearing age who use contraception (contraceptors).
a) Is there a difference in mean total fertility rate across the four regions? Carry out a one-way ANOVA to answer this question, and include all steps for full credit.
b) Which pairs of regions have significantly different mean total fertility rates?
c) Now run an ANOVA model (main effects only) to determine if region is still significant while controlling for contraceptors. You do not need to write hypotheses or check assumptions. Write a full conclusion for the main effect of region in this new multivariate model.
a)
H0: The mean total fertility rate (tfr) is the same in all four regions.
H1: At least one region has a different mean total fertility rate.
Run Levene's test to check the homogeneity-of-variance assumption before conducting the ANOVA.
> leveneTest(tfr ~ region, data = abc)
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  3  0.6144 0.6092
      46
From the output above we can see that the p-value (0.6092) is not less than the significance level of 0.05, so we fail to reject the null hypothesis of equal variances. There is no evidence that the variance of tfr differs across the four regions, and we can proceed under the assumption of homogeneity of variances.
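To make clear what the test above computes: with center = median, Levene's test is a one-way ANOVA on the absolute deviations of each observation from its group median. The following is a minimal sketch of that calculation by hand, assuming the car package is installed; the names dev_df and manual are illustrative and not part of the original answer.

```r
library(car)   # also makes the Robey data available
abc <- Robey

# Absolute deviation of each country's tfr from its region's median
dev_df <- data.frame(
  region  = abc$region,
  abs_dev = abs(abc$tfr - ave(abc$tfr, abc$region, FUN = median))
)

# One-way ANOVA on the absolute deviations
manual <- anova(lm(abs_dev ~ region, data = dev_df))
manual
```

The F value and p-value in the region row should agree with the leveneTest() output above.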
Run the one-way ANOVA with the following command in RStudio:
> summary(anova.test <- aov(tfr ~ region, data = abc))
            Df Sum Sq Mean Sq F value   Pr(>F)
region       3  44.30  14.768   11.47 9.72e-06 ***
Residuals   46  59.21   1.287
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The p-value of the ANOVA F test is 9.72e-06, which is less than the significance level of 0.05. So we reject H0 and conclude that there is significant evidence that at least one region has a different mean total fertility rate.
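As a check on how the F statistic in the ANOVA table arises, the sketch below recomputes it from the printed sums of squares and degrees of freedom. The inputs are the rounded values from the table, so the result matches only to rounding; the variable names are illustrative.

```r
# Values copied from the one-way ANOVA table (rounded as printed)
ss_region <- 44.30; df_region <- 3
ss_resid  <- 59.21; df_resid  <- 46

# Mean squares are sums of squares divided by their degrees of freedom
ms_region <- ss_region / df_region
ms_resid  <- ss_resid / df_resid

# F is the ratio of the two mean squares; the p-value is the upper tail
f_value <- ms_region / ms_resid
p_value <- pf(f_value, df_region, df_resid, lower.tail = FALSE)

c(F = f_value, p = p_value)
```

The F value should come out close to the 11.47 reported by summary(), with a p-value of the same order as 9.72e-06.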
b)
We will conduct a Tukey HSD test to see which pairs of regions have significantly different mean total fertility rates.
> TukeyHSD(anova.test)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = tfr ~ region, data = abc)
$region
                          diff        lwr        upr     p adj
Asia-Africa          -2.315556 -3.5082625 -1.1228486 0.0000283
Latin.Amer-Africa    -1.805556 -2.8446002 -0.7665109 0.0001708
Near.East-Africa     -1.055556 -2.4811130  0.3700019 0.2127807
Latin.Amer-Asia       0.510000 -0.7090392  1.7290392 0.6821544
Near.East-Asia        1.260000 -0.3016200  2.8216200 0.1526528
Near.East-Latin.Amer  0.750000 -0.6976605  2.1976605 0.5174900
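The adjusted p-values for the Asia-Africa and Latin.Amer-Africa comparisons are below the 0.05 significance level, and their 95% confidence intervals exclude zero. Thus Africa's mean total fertility rate differs significantly from Asia's and from Latin America's. Both differences are negative, so the Asian and Latin American means are lower than the African mean. None of the other four pairs differs significantly.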
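Rather than reading the significant pairs off the table by eye, they can be filtered programmatically. This is a minimal sketch, assuming the same one-way model as above; the names fit, tukey, and sig_pairs are illustrative.

```r
library(car)   # also makes the Robey data available
abc <- Robey

fit   <- aov(tfr ~ region, data = abc)
tukey <- TukeyHSD(fit)$region   # matrix with columns diff, lwr, upr, p adj

# Keep only the comparisons whose adjusted p-value is below 0.05
sig_pairs <- tukey[tukey[, "p adj"] < 0.05, , drop = FALSE]
sig_pairs
```

With drop = FALSE the result stays a matrix even if only one pair is significant, so the row names (the pair labels) are always preserved.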
c)
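Run the main-effects model with region and contraceptors using the following command in RStudio. Because contraceptors is a numeric covariate, this is an analysis of covariance rather than a two-way ANOVA with two factors.
> summary(anova.test <- aov(tfr ~ region + contraceptors, data = abc))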
              Df Sum Sq Mean Sq F value   Pr(>F)
region         3  44.30   14.77   46.92 6.67e-14 ***
contraceptors  1  45.05   45.05  143.12 1.42e-15 ***
Residuals     45  14.16    0.31
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
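The p-value for region in this table is 6.67e-14, which is less than the significance level of 0.05, so region is significant in the table as printed. However, summary() of an aov fit reports sequential (Type I) sums of squares, and region is entered first. Its sum of squares (44.30) is identical to the one-way model, so this F test is not adjusted for contraceptors; only the residual mean square has changed. A conclusion about region while controlling for contraceptors should rest on a test in which region is adjusted for contraceptors, for example by entering contraceptors first in the formula or by using Type II sums of squares.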
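Since summary() on an aov object uses sequential sums of squares, the region row above is computed before contraceptors enters the model. The sketch below shows one way to obtain a test of region adjusted for contraceptors, assuming Type II sums of squares from car's Anova() are acceptable for the assignment. The names fit2, type2, and seq_last are illustrative, and lm() is used here because it fits the same linear model as aov().

```r
library(car)   # provides Anova() and the Robey data
abc <- Robey

fit2 <- lm(tfr ~ region + contraceptors, data = abc)

# Type II: each main effect is tested after adjusting for the other
type2 <- Anova(fit2, type = 2)
type2

# Equivalent check for region: enter contraceptors first, so the
# sequential test for region is adjusted for contraceptors
seq_last <- anova(lm(tfr ~ contraceptors + region, data = abc))
seq_last
```

The region row of either table gives the F test to report for the main effect of region while controlling for contraceptors. In a main-effects model the two approaches give the same sum of squares for region.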