Question

Use RStudio to answer this question.

Before opening the dataset needed for this problem, you’ll need to load the “car” package:

> library(car)

Now you can import the “Robey” dataset and use it to answer the question below. Name the data frame abc:

> abc <- Robey

Remember to include any R code you use along with your answers!

The Robey dataset contains fertility rates from a sample of countries. You want to see if total fertility rate (tfr), which is the average number of children per woman, differs across the four regions (region) of countries in the dataset. There is also a variable that indicates the percent of married women of childbearing age who use contraception (contraceptors).

a) Is there a difference in mean total fertility rate across the four regions? Carry out a one-way ANOVA to answer this question, and include all steps for full credit.

b) Which pairs of regions have significantly different mean total fertility rates?

c) Now run an ANOVA model (main effects only) to determine if region is still significant while controlling for contraceptors. You do not need to write hypotheses or check assumptions. Write a full conclusion for the main effect of region in this new multivariate model.

Expert Solution

a)

H0: The mean total fertility rate (tfr) is the same for all four regions.

H1: At least one region has a different mean total fertility rate.

Before running the ANOVA, check the homogeneity-of-variance assumption with Levene's test:

> leveneTest(tfr ~ region, data = abc)
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  3  0.6144 0.6092
      46

The p-value (0.6092) is greater than the 0.05 significance level, so we fail to reject the null hypothesis of equal variances: there is no significant evidence that the variance of tfr differs across regions. The homogeneity-of-variance assumption for the ANOVA therefore appears reasonable.
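
With center = median, Levene's test (also known as the Brown-Forsythe test) is simply a one-way ANOVA on each observation's absolute deviation from its own group median. The base-R sketch below makes that explicit; levene_median is a hypothetical helper written for illustration, not a function from car, and the toy vectors are made up.

```r
# Hypothetical helper: Levene's test with center = median (Brown-Forsythe),
# computed as a one-way ANOVA on absolute deviations from the group medians.
levene_median <- function(y, g) {
  g <- factor(g)
  # Absolute distance of each observation from the median of its own group
  dev <- abs(y - ave(y, g, FUN = median))
  tab <- anova(lm(dev ~ g))
  c(F = tab[["F value"]][1], p = tab[["Pr(>F)"]][1])
}

# Toy example: two made-up groups of three observations each
levene_median(c(1, 2, 3, 10, 12, 14), c("a", "a", "a", "b", "b", "b"))
```

Applied to this problem as levene_median(abc$tfr, abc$region), it should reproduce the F value and p-value printed by leveneTest() above, since leveneTest() builds the test the same way with its default center = median.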

Run the one-way ANOVA in RStudio:

> summary(anova.test <- aov(tfr ~ region, data = abc))
          Df Sum Sq Mean Sq F value   Pr(>F)
region     3  44.30  14.768   11.47 9.72e-06 ***
Residuals 46  59.21   1.287
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The ANOVA p-value (9.72e-06) is less than the 0.05 significance level, so we reject H0 and conclude that there is significant evidence that the mean total fertility rate differs for at least one region (F(3, 46) = 11.47, p < 0.001).
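
To see where the F value and p-value come from, the sketch below rebuilds them from the sums of squares and degrees of freedom in the ANOVA table: F is the region mean square divided by the residual mean square, and the p-value is the upper-tail probability of an F(3, 46) distribution. Because the printed sums of squares are rounded, the result matches the table (F = 11.47, p = 9.72e-06) only approximately.

```r
# Values copied from the one-way ANOVA table above
ss_region <- 44.30; df_region <- 3
ss_resid  <- 59.21; df_resid  <- 46

ms_region <- ss_region / df_region  # mean square for region
ms_resid  <- ss_resid / df_resid    # residual mean square (MSE)
f_stat    <- ms_region / ms_resid   # F = MS(region) / MS(residual)

# Upper-tail probability of the F(3, 46) distribution
p_value <- pf(f_stat, df_region, df_resid, lower.tail = FALSE)
c(F = f_stat, p = p_value)
```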

b)

Use Tukey's HSD test to see which pairs of regions have significantly different mean total fertility rates:

> TukeyHSD(anova.test)
Tukey multiple comparisons of means
95% family-wise confidence level

Fit: aov(formula = tfr ~ region, data = abc)

$region
                          diff        lwr        upr     p adj
Asia-Africa          -2.315556 -3.5082625 -1.1228486 0.0000283
Latin.Amer-Africa    -1.805556 -2.8446002 -0.7665109 0.0001708
Near.East-Africa     -1.055556 -2.4811130  0.3700019 0.2127807
Latin.Amer-Asia       0.510000 -0.7090392  1.7290392 0.6821544
Near.East-Asia        1.260000 -0.3016200  2.8216200 0.1526528
Near.East-Latin.Amer  0.750000 -0.6976605  2.1976605 0.5174900

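Only the Asia-Africa and Latin.Amer-Africa comparisons have adjusted p-values below the 0.05 significance level (equivalently, only their 95% family-wise confidence intervals exclude 0). Because diff is the first region's mean minus the second's, Africa's mean total fertility rate is significantly higher than Asia's (by about 2.32 children per woman) and Latin America's (by about 1.81). No other pair of regions differs significantly.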
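
The decision rule for the Tukey table can be applied mechanically. The sketch below copies the printed rows into a matrix (the object name tukey is an assumption for illustration) and shows that "adjusted p-value below 0.05" and "95% family-wise interval excludes 0" pick out the same pairs.

```r
# Rows of the TukeyHSD() table above: diff, lwr, upr, p adj
tukey <- rbind(
  "Asia-Africa"          = c(-2.315556, -3.5082625, -1.1228486, 0.0000283),
  "Latin.Amer-Africa"    = c(-1.805556, -2.8446002, -0.7665109, 0.0001708),
  "Near.East-Africa"     = c(-1.055556, -2.4811130,  0.3700019, 0.2127807),
  "Latin.Amer-Asia"      = c( 0.510000, -0.7090392,  1.7290392, 0.6821544),
  "Near.East-Asia"       = c( 1.260000, -0.3016200,  2.8216200, 0.1526528),
  "Near.East-Latin.Amer" = c( 0.750000, -0.6976605,  2.1976605, 0.5174900)
)
colnames(tukey) <- c("diff", "lwr", "upr", "p adj")

# Pairs whose adjusted p-value is below 0.05
by_p <- rownames(tukey)[tukey[, "p adj"] < 0.05]
# Pairs whose 95% family-wise interval lies entirely above or below 0
by_ci <- rownames(tukey)[tukey[, "lwr"] > 0 | tukey[, "upr"] < 0]
by_p
```

With the fitted model available, the same filter can be applied directly to the TukeyHSD() result, e.g. tk <- TukeyHSD(anova.test)$region followed by rownames(tk)[tk[, "p adj"] < 0.05].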

c)

Fit the main-effects model with region and contraceptors in RStudio (because contraceptors is numeric, this is an analysis of covariance rather than a true two-way ANOVA):

> summary(anova.test <- aov(tfr ~ region + contraceptors, data = abc))
              Df Sum Sq Mean Sq F value   Pr(>F)
region         3  44.30   14.77   46.92 6.67e-14 ***
contraceptors  1  45.05   45.05  143.12 1.42e-15 ***
Residuals     45  14.16    0.31
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

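In this table the p-value for region is 6.67e-14. Note, however, that summary() on an aov fit reports sequential (Type I) sums of squares: because region is entered first, its row tests region before contraceptors is taken into account (its sum of squares, 44.30, is identical to the one-way model). To test the main effect of region while controlling for contraceptors, region has to be tested after contraceptors, for example with the Type II tests from the car package, Anova(aov(tfr ~ region + contraceptors, data = abc)), or by putting contraceptors first in the formula. If that adjusted p-value for region is less than 0.05, we reject H0 and conclude that there is significant evidence that the mean total fertility rate differs for at least one region even after controlling for contraceptors; if it is not, region is no longer significant once contraceptive use is accounted for.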
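
The difference between sequential and adjusted tests shows up on any data where the covariate is related to the factor. The sketch below uses simulated, hypothetical data (not the Robey data; the names dat, region, x and y are made up): summary() tests region before x, while base R's drop1(..., test = "F") tests each term after the other, which for a main-effects model like this should match the Type II tests from car's Anova().

```r
# Hypothetical simulated data: a 4-level factor and a numeric covariate whose
# mean depends on the factor; the response depends only on the covariate.
set.seed(42)
region <- factor(rep(c("A", "B", "C", "D"), each = 12))
x <- rnorm(48, mean = 10 * as.integer(region), sd = 5)
y <- 6 - 0.08 * x + rnorm(48, sd = 0.5)
dat <- data.frame(y = y, region = region, x = x)

fit <- aov(y ~ region + x, data = dat)

# Sequential (Type I) table: the region row ignores x, so its sum of
# squares equals the one from the one-way model y ~ region
summary(fit)

# Each term tested after the other term (adjusted F tests)
adj <- drop1(fit, test = "F")
adj
```

For part c), the corresponding call would be drop1(aov(tfr ~ region + contraceptors, data = abc), test = "F"); its region row, rather than the one from summary(), is the test of region controlling for contraceptors.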

orchestra answered 1 year ago
