Question

In: Statistics and Probability

The Book of R (Question 20.2) Please answer using R code. Continue using the survey data...

The Book of R (Question 20.2) Please answer using R code.

Continue using the survey data frame from the package MASS for the next few exercises.

  1. The survey data set has a variable named Exer , a factor with k = 3 levels describing the amount of physical exercise time each student gets: none, some, or frequent. Obtain a count of the number of students in each category and produce side-by-side boxplots of student height split by exercise.
  2. Assuming independence of the observations and normality as usual, fit a linear regression model with height as the response variable and exercise as the explanatory variable (dummy coding). What’s the default reference level of the predictor? Produce a model summary.
  3. Draw a conclusion based on the fitted model from (b)—does it appear that exercise frequency has any impact on mean height? What is the nature of the estimated effect?
  4. Predict the mean heights of one individual in each of the three exercise categories, accompanied by 95 percent prediction intervals.
  5. Do you arrive at the same result and interpretation for the height-by-exercise model if you construct an ANOVA table using aov ?
  6. Is there any change to the outcome of (e) if you alter the model so that the reference level of the exercise variable is “none”? Would you expect there to be?

Now, turn back to the ready-to-use mtcars data set. One of the variables in this data frame is qsec , described as the time in seconds it takes to race a quarter mile; another is gear , the number of forward gears (cars in this data set have either 3, 4, or 5 gears).

  1. Using the vectors straight from the data frame, fit a simple linear regression model with qsec as the response variable and gear as the explanatory variable and interpret the model summary.
  2. Explicitly convert gear to a factor vector and refit the model. Compare the model summary with that from (g). What do you find?
  3. Explain, with the aid of a relevant plot in the same style as the right image of Figure 20-6 why you think there is a difference between the two models (g) and (h).

Solutions

Expert Solution

> library(MASS)
> #a)
> library(plyr)
> count(survey,vars='Exer')
Exer freq
1 Freq 115
2 None 24
3 Some 98

> par(mfrow=c(1,3))
> attach(survey)

> s1=survey[which(Exer=='Freq'),]
> s2=survey[which(Exer=='None'),]
> s3=survey[which(Exer=='Some'),]
> boxplot(s1$Height,xlab="Freq")
> boxplot(s2$Height,xlab="None")
> boxplot(s3$Height,xlab="Some")


> #b)
> linmodel=lm(Height~Exer)
> summary(linmodel)

Call:
lm(formula = Height ~ Exer)

Residuals:
Min 1Q Median 3Q Max
-24.607 -6.397 -1.607 6.103 25.393

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 174.6067 0.9396 185.836 < 2e-16 ***
ExerNone -5.5787 2.3489 -2.375 0.01847 *
ExerSome -4.2098 1.4094 -2.987 0.00316 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 9.628 on 206 degrees of freedom
(28 observations deleted due to missingness)
Multiple R-squared: 0.05333, Adjusted R-squared: 0.04414
F-statistic: 5.802 on 2 and 206 DF, p-value: 0.003536

>
> #c)
> #Based on the p-value for the model (=0.003536), the exercise frequency does have an impact on the height since the null corresponding to equality of means in rejected at 5% level.

>
> #d) lwr and upr give the 95% prediction intervals and fit gives the mean
> predict(linmodel, s1, interval="predict")
fit lwr upr
7 174.6067 155.5349 193.6784
> predict(linmodel, s2, interval="predict")
fit lwr upr
2 169.028 149.5777 188.4783
> predict(linmodel, s3, interval="predict")
fit lwr upr
1 170.3969 151.3027 189.4911

> #e)
> aov(Height~Exer)
Call:
aov(formula = Height ~ Exer)

Terms:
Exer Residuals
Sum of Squares 1075.657 19094.894
Deg. of Freedom 2 206

Residual standard error: 9.627755
Estimated effects may be unbalanced
28 observations deleted due to missingness
> summary(aov(Height~Exer))
Df Sum Sq Mean Sq F value Pr(>F)   
Exer 2 1076 537.8 5.802 0.00354 **
Residuals 206 19095 92.7   
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
28 observations deleted due to missingness
> #No change from the result obtained in the linear model. p-value is also the same.

Question 2

> attach(mtcars)

#a)

> linmod2=lm(qsec~gear)
> linmod2

Call:
lm(formula = qsec ~ gear)

Coefficients:
(Intercept) gear
19.452 -0.449

> summary(linmod2)

Call:
lm(formula = qsec ~ gear)

Residuals:
Min 1Q Median 3Q Max
-2.7074 -1.0629 -0.2164 1.2436 2.3436

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 19.4523 1.5131 12.856 8.96e-13 ***
gear -0.4490 0.4039 -1.112 0.276
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.517 on 26 degrees of freedom
Multiple R-squared: 0.04537, Adjusted R-squared: 0.008657
F-statistic: 1.236 on 1 and 26 DF, p-value: 0.2765

#From the p-value, we see that the null is accepted and hence gear is not a significant variable when qsec is the response.

> #b)
> linmod3=lm(qsec~factor(gear))
> summary(linmod3)

Call:
lm(formula = qsec ~ factor(gear))

Residuals:
Min 1Q Median 3Q Max
-2.29308 -0.66058 -0.04727 0.81568 2.51692

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 17.7031 0.3549 49.880 <2e-16 ***
factor(gear)4 0.9042 0.5242 1.725 0.0969 .
factor(gear)5 -1.8031 0.7317 -2.464 0.0210 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.28 on 25 degrees of freedom
Multiple R-squared: 0.3468, Adjusted R-squared: 0.2945
F-statistic: 6.635 on 2 and 25 DF, p-value: 0.004881

#The difference arises because in the first case the gear column is treated as a continuous variable and in the second case it is a categorical variable with 3 levels.


Related Solutions

Answer IN R CODE please. Using the data below, Create a scatterplot of y vs x...
Answer IN R CODE please. Using the data below, Create a scatterplot of y vs x (show this) and fit it a simple linear regression model using y as the response and plot the regression line (with the data). Show this as well. Test whether x is a significant predictor and create a 95% CI around the slope coefficient. What does the coefficient of determinations represent? For x=20, create a CI for E(Y|X=20). Show this. For x=150, can you use...
*Please answer the following question using R code* 3. A bank wants to get new customers...
*Please answer the following question using R code* 3. A bank wants to get new customers for their credit card. They try two different approaches in their marketing campaign. The first promises a "cash back" reward, and the second promises low interest rates. A sample of 500 people is mailed the first brochure; of these, 125 get the credit card. A separate sample of 500 people is mailed the second brochure; 150 get the credit card. Are the two campaigns...
Answer the following bootstrap question by showing the R code : A set of data X...
Answer the following bootstrap question by showing the R code : A set of data X contains the following numbers: 119.7 104.1 92.8 85.4 108.6 93.4 67.1 88.4 101.0 97.2 95.4 77.2 100.0 114.2 150.3 102.3 105.8 107.5 0.9 94.1 We generated n = 20 observations Xi = 10 Wi+100, where Wi has a contaminated normal distribution with proportion of contamination 20% and σc = 4. Suppose we are interested in testing: H0 : μ = 90 versus H1 :...
Answer IN R CODE to get the following. Using the data below, Create a scatterplot of...
Answer IN R CODE to get the following. Using the data below, Create a scatterplot of y vs x Fit a simple linear regression model using y as the response and plot the regression line (with the data) Test whether x is a significant predictor and create a 95% CI around the slope coefficient. Report and interpret the coefficient of determination. For x=20, create a CI for E(Y|X=20). For x=150, can you use the model to estimate E(Y|X=150)? Discuss. Does...
Use R to answer the following question. Copy and paste the code and answer from R...
Use R to answer the following question. Copy and paste the code and answer from R into your paper. On the average,five cars arrive at a particular car wash every hour. Let X count the number of cars that arrive from 10 AM to 11 AM. Then X ∼pois(lambda = 5). Also, μ = σ2 = 5.  What is the probability that no car arrives during this period?  Suppose the car wash above is in operation from 8AM...
Can You please Answer the question using R studio and R cloud Telomeres are complexes of...
Can You please Answer the question using R studio and R cloud Telomeres are complexes of DNA and protein that cap chromosomal ends. They consist of the same short DNA sequence TTAGGG repeated over and over again. They tend to shorted with cell divisions and with advancing cellular age, which will lead to chromosome instability or apoptosis (programmed cell death). Eukaryotic cells have the ability to reverse telomere shortening by expressing telomerase, an enzyme that extends the telomeres of chromosomes....
Please use RStudio to answer the question and give the R command: please load data use...
Please use RStudio to answer the question and give the R command: please load data use data: library(MASS) data(cats) Use the “cats” data set to test for the variance of the body weight in male and female cats
ANSWER USING R CODE Using the dataset 'LakeHuron' which is a built in R dataset describing...
ANSWER USING R CODE Using the dataset 'LakeHuron' which is a built in R dataset describing the level in feet of Lake Huron from 1872- 1972. To assign the values into an ordinary vector,x, we can do the following 'x <- as.vector(LakeHuron)'. From there, we can access the data easily. Assume the values in X are a random sample from a normal population with distribution X. Also assume the X has an unknown mean and unknown standard deviation. With this...
<<<<<<<<<<Please USE DATA IN QUESTION #9 TO ANSWER QUESTION 10#. #####please answere question 10 using question...
<<<<<<<<<<Please USE DATA IN QUESTION #9 TO ANSWER QUESTION 10#. #####please answere question 10 using question 9 ............. 9. A company that manufactures bicycles receives steel rods in large shipments from a supplier. From past experience, the manager knows that inches and that the lengths .12 σ = 0 are approximately normally distributed. He takes a random sample of 12 rods. The lengths of the sampled rods are as follows: 12.12 11.90 11.94 12.05 12.07 11.97 12.01 12.08 12.05 12.12...
(Be sure to paste the R Console Output and code!!!) Using the following data and R,...
(Be sure to paste the R Console Output and code!!!) Using the following data and R, write a brief paragraph about whether the in-home treatment is equally effective as the out-of-home treatment for two separate groups. Here are the data. The outcome variable is level of anxiety after treatment on a scale from 1 to 10. In-Home Treatment Out-of-Home Treatment 3 7 4 6 1 7 1 8 1 7 3 6 3 5 6 6 5 4 1 2...
ADVERTISEMENT
ADVERTISEMENT
ADVERTISEMENT