Question

Drilling down beneath a lake in Alaska yields chemical evidence of past changes in climate. Biological silicon, left by the skeletons of single-celled creatures called diatoms, measures the abundance of life in the lake. A rather complex variable based on the ratio of certain isotopes relative to ocean water gives an indirect measure of moisture, mostly from snow. As we drill down, we look farther into the past. Here are data from 2300 to 12,000 years ago:

Isotope (%)   Silicon (mg/g)   Isotope (%)   Silicon (mg/g)   Isotope (%)   Silicon (mg/g)
−19.90        95               −20.71        154              −21.63        226
−19.84        108              −20.80        263              −21.63        237
−19.46        118              −20.86        269              −21.19        188
−20.20        139              −21.28        296              −19.37        339


(b) Find the single outlier in the data. This point strongly influences the correlation. What is the correlation with this point? (Round your answer to two decimal places.)


What is the correlation without this point? (Round your answer to two decimal places.)


(c) Is the outlier also strongly influential for the regression line? Calculate the regression line with the outlier. (Round your slope to two decimal places, round your y-intercept to one decimal place.)
ŷ =__________ − ___________ x

Calculate the regression line without the outlier. (Round your slope to two decimal places, round your y-intercept to one decimal place.)
ŷ = __________ − ___________ x

2. Runners are concerned about their form when racing. One measure of form is the stride rate, the number of steps taken per second. As running speed increases, the stride rate should also increase. In a study of 21 of the best American female runners, researchers measured the stride rate for different speeds. The following table gives the speeds (in feet per second) and the mean stride rates for these runners.

Speed 15.87 16.93 17.58 18.66 20.15 21.27 22.06
Stride rate 3.05 3.19 3.21 3.31 3.48 3.59 3.65


(b) Find the equation of the regression line of stride rate on speed. Draw this line on your plot. (Round your answers to four decimal places.)

stride rate =__________ + __________ speed


(c) For each of the speeds given, obtain the predicted value of the stride rate and the residual. Verify that the residuals add to zero. (Round your answers to three decimal places.)

     predicted value      residual
15.87          
16.93          
17.58          
18.66          
20.15          
21.27          
22.06          

Solutions

Expert Solution

The answers below were computed in R.

1.

R-code

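# Isotope (%) and biological silicon (mg/g) for all 12 observations
iso=c(-19.90,-19.84,-19.46,-20.20,-20.71,-20.80,-20.86,-21.28,-21.63,-21.63,-21.19,-19.37)
sili=c(95,108,118,139,154,263,269,296,226,237,188,339)
# Scatter plot with isotope on the x-axis and silicon on the y-axis
plot(iso,sili,type="p")
# Least-squares fit; this formula treats isotope as the response and silicon as the predictor
summary(lm(iso~sili))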

OUTPUT:

The scatter plot shows that the point (-19.37, 339) is the single outlier in this data set.

Call:
lm(formula = iso ~ sili)

Residuals:
     Min       1Q   Median       3Q      Max
-0.97916 -0.46230 -0.04487  0.33694  1.66021

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -19.892089   0.642933 -30.940 2.92e-11 ***
sili         -0.003357   0.002965  -1.132    0.284
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.7923 on 10 degrees of freedom
Multiple R-squared: 0.1137, Adjusted R-squared: 0.02502
F-statistic: 1.282 on 1 and 10 DF, p-value: 0.2839

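The correlation with the outlier is r = −√0.1137 ≈ −0.34: its magnitude is the square root of Multiple R-squared, and its sign is negative because the slope is negative.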

Remove the outlier and refit the model.

R-code:

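# Same data with the outlier (-19.37, 339) removed, leaving 11 observations
iso=c(-19.90,-19.84,-19.46,-20.20,-20.71,-20.80,-20.86,-21.28,-21.63,-21.63,-21.19)
sili=c(95,108,118,139,154,263,269,296,226,237,188)
# Scatter plot with isotope on the x-axis and silicon on the y-axis
plot(iso,sili,type="p")
# Refit; as before, isotope is the response and silicon is the predictor
summary(lm(iso~sili))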

OUTPUT:

Call:
lm(formula = iso ~ sili)

Residuals:
     Min       1Q   Median       3Q      Max
-0.65575 -0.42593  0.06214  0.36672  0.63025

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -19.124401   0.429562 -44.521 7.27e-12 ***
sili         -0.008185   0.002125  -3.851   0.0039 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4806 on 9 degrees of freedom
Multiple R-squared: 0.6224, Adjusted R-squared: 0.5804
F-statistic: 14.83 on 1 and 9 DF, p-value: 0.003898

The correlation without the outlier is r = −√0.6224 ≈ −0.79, again taking the negative sign of the slope.
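The correlation can also be computed directly with cor() instead of being read off the regression summary. A minimal sketch, assuming the same data as above and that the outlier is the 12th observation, (-19.37, 339); the names r_with and r_without are only illustrative:

```r
# Data from the problem: isotope (%) and biological silicon (mg/g)
iso  <- c(-19.90, -19.84, -19.46, -20.20, -20.71, -20.80,
          -20.86, -21.28, -21.63, -21.63, -21.19, -19.37)
sili <- c(95, 108, 118, 139, 154, 263, 269, 296, 226, 237, 188, 339)

# Correlation using all 12 points
r_with <- cor(iso, sili)

# Correlation after dropping the 12th observation (the outlier)
r_without <- cor(iso[-12], sili[-12])

# Both values rounded to two decimal places
print(round(c(with = r_with, without = r_without), 2))
```

Squaring each value should reproduce the Multiple R-squared reported in the corresponding summary, and the sign of each correlation matches the sign of the fitted slope.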

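Without the outlier: ŷ = −19.124 − 0.008x

With the outlier: ŷ = −19.892 − 0.003x (rounded to two decimal places the slope becomes 0, so it is rounded to three decimal places)

Here ŷ is the isotope value and x is silicon, matching the formula iso~sili used in the fits. The slope more than doubles once the point is removed, so the outlier is also strongly influential for the regression line.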
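The fits above use isotope as the response, whereas the scatter plot puts isotope on the x-axis and the answer template asks for a slope to two decimal places and an intercept to one. If the intended model is silicon (y) on isotope (x), which is an assumption here rather than something the question text states, the two lines can be sketched as follows; the data frame d and the names fit_with and fit_without are illustrative:

```r
# Data from the problem: isotope (%) and biological silicon (mg/g)
iso  <- c(-19.90, -19.84, -19.46, -20.20, -20.71, -20.80,
          -20.86, -21.28, -21.63, -21.63, -21.19, -19.37)
sili <- c(95, 108, 118, 139, 154, 263, 269, 296, 226, 237, 188, 339)
d <- data.frame(iso, sili)

# Assumed direction: silicon is the response, isotope the explanatory variable
fit_with    <- lm(sili ~ iso, data = d)          # all 12 points
fit_without <- lm(sili ~ iso, data = d[-12, ])   # outlier (row 12) removed

# Intercept and slope for each fit, side by side
print(rbind(with    = round(coef(fit_with), 2),
            without = round(coef(fit_without), 2)))
```

On these data this gives approximately ŷ = −493.8 − 33.85x with the outlier and ŷ = −1382.3 − 76.04x without it, so under this reading the outlier is strongly influential as well.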

2.

R-code:

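# Mean stride rate (steps per second) and speed (feet per second)
s.rate=c(3.05,3.19,3.21,3.31,3.48,3.59,3.65)
speed=c(15.87,16.93,17.58,18.66,20.15,21.27,22.06)
# Scatter plot with the fitted regression line drawn on top
plot(speed,s.rate,type="p",main="Regression equation of stride rate on speed")
abline(lm(s.rate~speed))
# Coefficients of the regression of stride rate on speed
summary(lm(s.rate~speed))
# Predicted values and residuals, rounded to three decimal places
round(predict(lm(s.rate~speed)),3)
round(resid(lm(s.rate~speed)),3)
# Sum of the residuals (rounded to the nearest integer)
round(sum(resid(lm(s.rate~speed))))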

OUTPUT:

stride rate = 1.5232 + 0.0967 speed

> round(predict(lm(s.rate~speed)),3)
    1     2     3     4     5     6     7
3.058 3.161 3.224 3.328 3.472 3.580 3.657
> round(resid(lm(s.rate~speed)),3)
     1      2      3      4      5      6      7
-0.008  0.029 -0.014 -0.018  0.008  0.010 -0.007

> round(sum(resid(lm(s.rate~speed))))
[1] 0

This verifies that the residuals add to zero; the rounded residuals listed above also sum to 0.000.
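Rounding the sum to the nearest integer would also report 0 for any total smaller than 0.5 in absolute value, so a tighter check may be preferable. The sketch below recomputes the slope and intercept from the standard least-squares formulas, compares them with lm(), and tests the residual sum against a small tolerance; the tolerance 1e-10 and the names b, a and sum_res are illustrative choices:

```r
# Mean stride rate (steps per second) and speed (feet per second)
s.rate <- c(3.05, 3.19, 3.21, 3.31, 3.48, 3.59, 3.65)
speed  <- c(15.87, 16.93, 17.58, 18.66, 20.15, 21.27, 22.06)

# Least-squares formulas: slope = Sxy / Sxx, intercept = ybar - slope * xbar
b <- sum((speed - mean(speed)) * (s.rate - mean(s.rate))) /
     sum((speed - mean(speed))^2)
a <- mean(s.rate) - b * mean(speed)

# The same line fitted with lm(), for comparison
fit <- lm(s.rate ~ speed)
print(round(c(intercept = a, slope = b), 4))
print(round(coef(fit), 4))

# Residuals are observed minus predicted; their sum should be zero
# up to floating-point error
sum_res <- sum(resid(fit))
print(abs(sum_res) < 1e-10)
```

The residuals add to zero for any least-squares line that includes an intercept, so the check holds regardless of the data.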

