Question

Use y=faithful$eruptions; x=faithful$waiting to set the eruption durations and waiting time between eruptions of the R data set faithful in objects y and x, respectively, and complete the following parts.

1. Make a scatterplot of the (x,y) data. Does it support the assumption that the data follow the simple linear regression model? (Include the plot with your answer.)

2. Fit the simple linear regression model and construct a 90% CI for: a) the slope, and b) the mean eruption duration following a 70 min waiting time.

3. Do a 90% CI for the marginal mean of eruption duration in two ways: a) ignoring the regression structure (i.e., ignoring the data on waiting time between eruptions), and b) taking into consideration the regression structure. Which CI is shorter?

4. Do an 80% PI for the next eruption duration following a 70 min waiting time.

5. Do an 80% PI for the next eruption duration (without any information on the waiting time).

6. Do a 95% CI for the proportion of eruption durations that last at least 4 min. [NOTE: The R command phat=sum(y>=4)/length(y) sets the sample proportion into the object phat.]

For all parts, include the R commands and R output with your answer (Copy it from the R console and paste it).

Solutions

Expert Solution

data("faithful")
x<-faithful$waiting
y<-faithful$eruptions

Does it support the assumption that the data follows the simple linear regression model?

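Yes. The scatterplot shows a roughly linear, increasing trend (longer waiting times go with longer eruptions), so a simple linear regression model is a reasonable first approximation for this data. The points do fall into two clusters (short waits with short eruptions, long waits with long eruptions), which should be kept in mind when interpreting the fit.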
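The plot itself is not included above. A minimal sketch of how it could be produced is shown here; the axis labels, title, and the overlaid least-squares line are illustrative choices, not part of the original answer.

```r
# faithful ships with base R (package 'datasets'), so no extra packages are needed
x <- faithful$waiting     # waiting time between eruptions (min)
y <- faithful$eruptions   # eruption duration (min)

# Scatterplot of duration against waiting time
plot(x, y,
     xlab = "Waiting time (min)",
     ylab = "Eruption duration (min)",
     main = "Old Faithful: duration vs. waiting time")

# Overlay the least-squares line to judge linearity by eye
abline(lm(y ~ x), col = "red")

# Sample correlation as a numeric summary of the linear association
r <- cor(x, y)
print(r)
```

A correlation close to 1 together with a point cloud that follows the line supports the linear model; the two visible clusters are a feature of this data set rather than a plotting artifact.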

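2

Fit the simple linear regression model.

model1<-lm(y~x)
summary(model1)

a)

90% confidence interval for the slope. The slope is a single coefficient, so its interval does not depend on any particular value of x, and confint() has no x argument; the coefficient is selected with parm instead.

confint(model1,parm="x",level = 0.90)

The console output below was produced with level = 0.80, so it shows the 80% limits (columns 10 % and 90 %) for both coefficients. The 90% interval from the command above is wider.

> confint(model1,level = 0.80)
                  10 %       90 %
(Intercept) -2.0797513 -1.6682807
x            0.0727778  0.0784781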
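As a cross-check on confint(), the 90% interval for the slope can also be built by hand as estimate ± t × standard error, with n − 2 residual degrees of freedom. This is only a sketch; the object names est, se, tcrit and ci_slope are introduced here for illustration.

```r
x <- faithful$waiting
y <- faithful$eruptions
model1 <- lm(y ~ x)

# Slope estimate and its standard error from the coefficient table
coefs <- coef(summary(model1))
est <- coefs["x", "Estimate"]
se  <- coefs["x", "Std. Error"]

# t critical value for a two-sided 90% interval, n - 2 degrees of freedom
tcrit <- qt(1 - 0.10 / 2, df = df.residual(model1))

ci_slope <- c(lower = est - tcrit * se, upper = est + tcrit * se)
print(ci_slope)

# Should agree with the built-in interval
print(confint(model1, parm = "x", level = 0.90))
```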

b)

The mean eruption duration following a 70 min waiting time.

newx<- data.frame(
x = c(70))
predict(model1, newdata = newx, interval = "confidence",level = 0.90)

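> newx<- data.frame(
+ x = c(70))
> predict(model1, newdata = newx, interval = "confidence",level = 0.90)
       fit      lwr      upr
1 3.584106 3.499065 3.669146

The fitted value at x = 70 is b0 + 70*b1 = 3.41994, as reported in part 4 for the same model and the same newx. The fit of 3.584106 shown here therefore does not correspond to x = 70, and the command should be rerun to obtain the correct interval.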
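The interval for the mean response can be verified against the textbook formula, fit ± t × s_e × sqrt(1/n + (x0 − x̄)²/Sxx). The sketch below assumes that formula and introduces the names x0, s_e, Sxx, fit0 and ci_mean purely for illustration.

```r
x <- faithful$waiting
y <- faithful$eruptions
model1 <- lm(y ~ x)

x0  <- 70                              # waiting time of interest (min)
n   <- length(x)
s_e <- summary(model1)$sigma           # residual standard error
Sxx <- sum((x - mean(x))^2)

# Point estimate of the mean duration at x0
fit0 <- sum(coef(model1) * c(1, x0))

# Standard error of the estimated mean response at x0
se_mean <- s_e * sqrt(1 / n + (x0 - mean(x))^2 / Sxx)

tcrit <- qt(1 - 0.10 / 2, df = n - 2)
ci_mean <- c(fit = fit0, lwr = fit0 - tcrit * se_mean, upr = fit0 + tcrit * se_mean)
print(ci_mean)

# Built-in equivalent
p <- predict(model1, newdata = data.frame(x = x0),
             interval = "confidence", level = 0.90)
print(p)
```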

3. Do a 90% CI for the marginal mean of eruption duration in two ways: a) ignoring the regression structure (i.e., ignoring the data on waiting time between eruptions), and b) taking into consideration the regression structure. Which CI is shorter?

a)

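> m<-mean(y)
> s<-sd(y)
> n<-length(y)
> m-qt(1-0.10/2,n-1)*s/sqrt(n)
> m+qt(1-0.10/2,n-1)*s/sqrt(n)

The 90% CI for the mean is m ± t(0.95, n-1)*s/sqrt(n), reported as (lower, upper). The limits 1.57268 and 5.348947 come from m ± qnorm(1-0.10/2)*s, which omits the division by sqrt(n) and therefore describes the spread of individual eruption durations, not the uncertainty in the mean.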
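Part 3 b) is not worked above. One common reading, assumed here, is to estimate the marginal mean of y by the fitted mean response at x = mean(x); the least-squares line passes through (mean(x), mean(y)), so the point estimate is the same as in a) and only the standard error changes. The names ci_a, ci_b, width_a and width_b are illustrative.

```r
x <- faithful$waiting
y <- faithful$eruptions
model1 <- lm(y ~ x)

n <- length(y)
m <- mean(y)
s <- sd(y)

# a) Ignoring waiting time: one-sample t interval for the mean
half_a <- qt(1 - 0.10 / 2, df = n - 1) * s / sqrt(n)
ci_a <- c(lwr = m - half_a, upr = m + half_a)

# b) Using the regression: CI for the mean response at x = mean(x)
ci_b <- predict(model1, newdata = data.frame(x = mean(x)),
                interval = "confidence", level = 0.90)

width_a <- unname(ci_a["upr"] - ci_a["lwr"])
width_b <- unname(ci_b[1, "upr"] - ci_b[1, "lwr"])

print(ci_a)
print(ci_b)
print(c(width_a = width_a, width_b = width_b))
```

Under this reading the regression-based interval is the shorter one, because the residual standard error of the fit (about 0.50) is well below the standard deviation of y (about 1.14).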

4)

Do an 80% PI for the next eruption duration following a 70 min waiting time.

newx<- data.frame(
x = c(70))
predict(model1, newdata = newx, interval = "prediction",level = 0.80)

> predict(model1, newdata = newx, interval = "prediction",level = 0.80)
fit lwr upr
1 3.41994 2.780896 4.058985
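Parts 5 and 6 are not worked above. The sketch below assumes the usual normal-theory prediction interval m ± t × s × sqrt(1 + 1/n) for part 5 and the large-sample (Wald) interval for a proportion in part 6; the names pi80 and ci_p are illustrative. Because eruption durations are bimodal, the normality assumption behind the part 5 interval is questionable, so it should be read with caution.

```r
y <- faithful$eruptions
n <- length(y)
m <- mean(y)
s <- sd(y)

# Part 5: 80% prediction interval for the next duration, no waiting-time information
half_pi <- qt(1 - 0.20 / 2, df = n - 1) * s * sqrt(1 + 1 / n)
pi80 <- c(lwr = m - half_pi, upr = m + half_pi)
print(pi80)

# Part 6: 95% CI for the proportion of durations of at least 4 min (Wald interval)
phat <- sum(y >= 4) / length(y)
half_p <- qnorm(1 - 0.05 / 2) * sqrt(phat * (1 - phat) / n)
ci_p <- c(lwr = phat - half_p, upr = phat + half_p)
print(phat)
print(ci_p)
```

The built-in prop.test(sum(y >= 4), length(y)) gives a score-based interval for the same proportion, which will differ slightly from the Wald limits.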

