Use y=faithful$eruptions; x=faithful$waiting to set the eruption durations and waiting time between eruptions of the R data set faithful in objects y and x, respectively, and complete the following parts.
1. Make a scatterplot of the (x,y) data. Does it support the assumption that the data follow the simple linear regression model? (Include the plot with your answer.)
2. Fit the simple linear regression model and construct a 90% CI for: a) the slope, and b) the mean eruption duration following a 70 min waiting time.
3. Do a 90% CI for the marginal mean of eruption duration in two ways: a) ignoring the regression structure (i.e., ignoring the data on waiting time between eruptions), and b) taking into consideration the regression structure. Which CI is shorter?
4. Do an 80% PI for the next eruption duration following a 70 min waiting time.
5. Do an 80% PI for the next eruption duration (without any information on the waiting time).
6. Do a 95% CI for the proportion of eruption durations that last at least 4 min. [NOTE: The R command phat=sum(y>=4)/length(y) sets the sample proportion into the object phat.]
For all parts, include the R commands and R output with your answer (Copy it from the R console and paste it).
data("faithful")
x<-faithful$waiting
y<-faithful$eruptions
Does it support the assumption that the data follow the simple linear regression model?
Yes. The points follow a roughly linear, increasing trend, so a simple linear regression model is reasonable for these data.
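Part 1 asks for the plot itself, so a minimal sketch of the plotting commands may help. The axis labels, title, and red fitted line are illustrative choices, not part of the original answer:

```r
# Scatterplot of eruption duration against waiting time, with the
# least-squares line overlaid as a visual check of linearity.
x <- faithful$waiting
y <- faithful$eruptions
plot(x, y,
     xlab = "Waiting time (min)",
     ylab = "Eruption duration (min)",
     main = "Old Faithful: duration vs. waiting time")
abline(lm(y ~ x), col = "red")  # fitted simple linear regression line
```

The points fall into two clusters (short waits with short eruptions, long waits with long eruptions), which is worth mentioning alongside the overall linear trend.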
2.
model1<-lm(y~x)
summary(model1)
a)
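90% confidence interval for the slope (the slope is a single parameter of the line, so it is not evaluated at x = 70; confint has no such argument and silently ignores it)
confint(model1,"x",level = 0.90)
The console output pasted with this answer was produced with level = 0.80, so the limits below (columns 10 % and 90 %) form an 80% interval rather than the requested 90% one:
> confint(model1,x=70,level = 0.80)
            10 %       90 %
(Intercept) -2.0797513 -1.6682807
x            0.0727778  0.0784781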
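The slope interval can also be reproduced by hand from the coefficient table, which makes the role of the confidence level explicit. A minimal sketch, refitting the same model (the names fit, b1, se_b1, tcrit, ci_manual, and ci_confint are introduced here for illustration only):

```r
# Refit the model so the block is self-contained.
x <- faithful$waiting
y <- faithful$eruptions
fit <- lm(y ~ x)

# Slope estimate and its standard error from the coefficient table.
b1    <- coef(summary(fit))["x", "Estimate"]
se_b1 <- coef(summary(fit))["x", "Std. Error"]

# 90% CI: estimate +/- t quantile (n - 2 df) times the standard error.
tcrit <- qt(1 - 0.10 / 2, df = df.residual(fit))
ci_manual <- c(b1 - tcrit * se_b1, b1 + tcrit * se_b1)

# Same interval from confint(); "x" selects the slope row.
ci_confint <- confint(fit, "x", level = 0.90)
print(ci_manual)
print(ci_confint)
```

Changing 0.10 to 0.20 in the qt() call (or level to 0.80 in confint) gives the corresponding 80% interval.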
b)
The mean eruption duration following a 70 min waiting time.
newx<- data.frame(
x = c(70))
predict(model1, newdata = newx, interval = "confidence",level = 0.90)
> newx<- data.frame(
+ x = c(70))
> predict(model1, newdata = newx, interval = "confidence",level = 0.90)
fit lwr upr
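1 3.584106 3.499065 3.669146
Note that the fitted value printed here (3.584106) differs from the fit at x = 70 reported in part 4 (3.41994), although both calls use model1 and the same newx. The two outputs cannot both come from this model, so this interval should be regenerated.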
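As a cross-check on the predict() call, the CI for the mean response at a given x can be computed directly from the standard formula, fitted value plus or minus a t quantile times its standard error. The sketch below refits the model and uses illustrative names (fit, x0, yhat0, s_eps, se_mean) that do not appear in the original answer:

```r
x <- faithful$waiting
y <- faithful$eruptions
fit <- lm(y ~ x)
n <- length(x)
x0 <- 70  # waiting time of interest

# Fitted mean at x0 and the residual standard error.
yhat0 <- sum(coef(fit) * c(1, x0))
s_eps <- summary(fit)$sigma

# Standard error of the estimated mean response at x0.
se_mean <- s_eps * sqrt(1 / n + (x0 - mean(x))^2 / sum((x - mean(x))^2))

# 90% CI uses the t distribution with n - 2 degrees of freedom.
tcrit <- qt(0.95, df = n - 2)
ci_manual <- c(fit = yhat0,
               lwr = yhat0 - tcrit * se_mean,
               upr = yhat0 + tcrit * se_mean)

ci_predict <- predict(fit, newdata = data.frame(x = x0),
                      interval = "confidence", level = 0.90)
print(ci_manual)
print(ci_predict)
```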
3. Do a 90% CI for the marginal mean of eruption duration in two ways: a) ignoring the regression structure (i.e., ignoring the data on waiting time between eruptions), and b) taking into consideration the regression structure. Which CI is shorter?
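a)
> m<-mean(y)
> s<-sd(y)
> n<-length(y)
> m-qt(1-0.10/2,df=n-1)*s/sqrt(n)
> m+qt(1-0.10/2,df=n-1)*s/sqrt(n)
The interval originally reported here, (1.57268, 5.348947), was computed as m ± qnorm(1-0.10/2)*s. That omits the division by sqrt(n), so it reflects the spread of individual eruption durations rather than a 90% CI for the mean; the two commands above give the lower and upper limits of the CI.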
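Part 3b is not worked out in this answer. One possible reading, offered here as an assumption rather than the intended solution, uses the fact that the least-squares line passes through (mean(x), mean(y)): the CI for the mean response at x = mean(x) is then a regression-based CI for the marginal mean (treating mean(x) as fixed). A sketch comparing it with the interval from part 3a, with illustrative names ci_a, ci_b, len_a, and len_b:

```r
x <- faithful$waiting
y <- faithful$eruptions
n <- length(y)
fit <- lm(y ~ x)

# (a) Ignoring waiting time: one-sample t interval for the mean.
ci_a <- mean(y) + c(-1, 1) * qt(0.95, df = n - 1) * sd(y) / sqrt(n)

# (b) Using the regression: CI for the mean response at x = mean(x).
ci_b <- predict(fit, newdata = data.frame(x = mean(x)),
                interval = "confidence", level = 0.90)

print(ci_a)
print(ci_b)

# Compare the interval lengths.
len_a <- diff(ci_a)
len_b <- ci_b[1, "upr"] - ci_b[1, "lwr"]
print(c(len_a = len_a, len_b = unname(len_b)))
```

Under this reading both intervals are centered at mean(y), and the regression-based one is shorter roughly whenever the residual standard error is smaller than sd(y), that is, when waiting time explains a useful share of the variation in duration.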
4. Do an 80% PI for the next eruption duration following a 70 min waiting time.
newx<- data.frame(
x = c(70))
predict(model1, newdata = newx, interval = "prediction",level = 0.80)
> predict(model1, newdata = newx, interval = "prediction",level = 0.80)
fit lwr upr
1 3.41994 2.780896 4.058985
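Parts 5 and 6 are not answered above. The following is a hedged sketch of one standard approach to each, not necessarily the method the course intends: a normal-theory prediction interval based on the sample mean and standard deviation, and a large-sample (Wald) interval for the proportion. The names pi_80 and ci_p are illustrative; phat follows the note in the question.

```r
y <- faithful$eruptions
n <- length(y)

# Part 5: 80% PI for the next eruption duration, no waiting-time information.
# Assumes durations are approximately normal, which is questionable
# for these data because the durations are bimodal.
pi_80 <- mean(y) + c(-1, 1) * qt(0.90, df = n - 1) * sd(y) * sqrt(1 + 1 / n)
print(pi_80)

# Part 6: 95% large-sample (Wald) CI for the proportion of durations >= 4 min.
phat <- sum(y >= 4) / n
ci_p <- phat + c(-1, 1) * qnorm(0.975) * sqrt(phat * (1 - phat) / n)
print(phat)
print(ci_p)
```

The sqrt(1 + 1/n) factor in part 5 is what separates a prediction interval for a single new observation from the CI for the mean, which uses sqrt(1/n) alone.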