1. Drilling down beneath a lake in Alaska yields chemical evidence of past changes in climate. Biological silicon, left by the skeletons of single-celled creatures called diatoms, measures the abundance of life in the lake. A rather complex variable based on the ratio of certain isotopes relative to ocean water gives an indirect measure of moisture, mostly from snow. As we drill down, we look farther into the past. Here are data from 2300 to 12,000 years ago:
Isotope (%) | Silicon (mg/g) | Isotope (%) | Silicon (mg/g) | Isotope (%) | Silicon (mg/g) |
---|---|---|---|---|---|
−19.90 | 95 | −20.71 | 154 | −21.63 | 226 |
−19.84 | 108 | −20.80 | 263 | −21.63 | 237 |
−19.46 | 118 | −20.86 | 269 | −21.19 | 188 |
−20.20 | 139 | −21.28 | 296 | −19.37 | 339 |
(b) Find the single outlier in the data. This point strongly
influences the correlation. What is the correlation with this
point? (Round your answer to two decimal places.)
What is the correlation without this point? (Round your answer to
two decimal places.)
(c) Is the outlier also strongly influential for the regression
line? Calculate the regression line with the outlier. (Round your
slope to two decimal places, round your y-intercept to one
decimal place.)
ŷ =__________ − ___________ x
Calculate the regression line without the outlier. (Round your
slope to two decimal places, round your y-intercept to one
decimal place.)
ŷ = __________ − ___________ x
2. Runners are concerned about their form when racing. One measure of form is the stride rate, the number of steps taken per second. As running speed increases, the stride rate should also increase. In a study of 21 of the best American female runners, researchers measured the stride rate for different speeds. The following table gives the speeds (in feet per second) and the mean stride rates for these runners.
Speed | 15.87 | 16.93 | 17.58 | 18.66 | 20.15 | 21.27 | 22.06 |
Stride rate | 3.05 | 3.19 | 3.21 | 3.31 | 3.48 | 3.59 | 3.65 |
(b) Find the equation of the regression line of stride rate on
speed. Draw this line on your plot. (Round your answers to four
decimal places.)
stride rate = __________ + __________ speed
(c) For each of the speeds given, obtain the predicted value of the
stride rate and the residual. Verify that the residuals add to
zero. (Round your answers to three decimal places.)
Speed | Predicted value | Residual |
---|---|---|
15.87 | | |
16.93 | | |
17.58 | | |
18.66 | | |
20.15 | | |
21.27 | | |
22.06 | | |
Both problems are answered below using R.
1.
R-code
iso=c(-19.90,-19.84,-19.46,-20.20,-20.71,-20.80,-20.86,-21.28,-21.63,-21.63,-21.19,-19.37)
sili=c(95,108,118,139,154,263,269,296,226,237,188,339)
plot(iso,sili,type="p")
summary(lm(iso~sili))
OUTPUT:
The scatter plot shows that the point (-19.37, 339) is the single outlier: it combines the highest isotope value with the highest silicon value, while the other points show silicon falling as isotope rises.
Call:
lm(formula = iso ~ sili)
Residuals:
Min 1Q Median 3Q Max
-0.97916 -0.46230 -0.04487 0.33694 1.66021
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -19.892089 0.642933 -30.940 2.92e-11 ***
sili -0.003357 0.002965 -1.132 0.284
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.7923 on 10 degrees of freedom
Multiple R-squared: 0.1137, Adjusted R-squared: 0.02502
F-statistic: 1.282 on 1 and 10 DF, p-value: 0.2839
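The correlation with the outlier is r = -0.34. Multiple R-squared (0.1137) is the square of the correlation, and r takes the sign of the slope, which is negative here, so r = -sqrt(0.1137) ≈ -0.34.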
Remove the outlier and refit the model.
R-code:
iso=c(-19.90,-19.84,-19.46,-20.20,-20.71,-20.80,-20.86,-21.28,-21.63,-21.63,-21.19)
sili=c(95,108,118,139,154,263,269,296,226,237,188)
plot(iso,sili,type="p")
summary(lm(iso~sili))
OUTPUT:
Call:
lm(formula = iso ~ sili)
Residuals:
Min 1Q Median 3Q Max
-0.65575 -0.42593 0.06214 0.36672 0.63025
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -19.124401 0.429562 -44.521 7.27e-12 ***
sili -0.008185 0.002125 -3.851 0.0039 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4806 on 9 degrees of freedom
Multiple R-squared: 0.6224, Adjusted R-squared: 0.5804
F-statistic: 14.83 on 1 and 9 DF, p-value: 0.003898
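The correlation without the outlier is r = -0.79. Again Multiple R-squared (0.6224) is the square of the correlation and the slope is negative, so r = -sqrt(0.6224) ≈ -0.79.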
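Both correlations can also be computed directly instead of being read off the regression summaries. The following is a minimal sketch using cor(); the names d, out, r_with and r_without are illustrative and do not come from the original answer.

```r
# Isotope and silicon values from the table, outlier in the last row
d <- data.frame(
  iso  = c(-19.90, -19.84, -19.46, -20.20, -20.71, -20.80,
           -20.86, -21.28, -21.63, -21.63, -21.19, -19.37),
  sili = c(95, 108, 118, 139, 154, 263, 269, 296, 226, 237, 188, 339)
)
out <- 12  # row of the outlier (-19.37, 339)

# Correlation is symmetric, so the order of the two variables does not matter
r_with    <- cor(d$iso, d$sili)
r_without <- cor(d$iso[-out], d$sili[-out])

print(round(c(with_outlier = r_with, without_outlier = r_without), 2))
```

Because the correlation does not depend on which variable is treated as the response, this gives the same values whichever way the regression is fitted.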
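The two fits above regress isotope on silicon (lm(iso ~ sili)):
With outlier: predicted isotope = -19.892 - 0.003 silicon (rounded to two decimals the slope becomes 0, so three decimals are shown)
Without outlier: predicted isotope = -19.124 - 0.008 silicon
Part (c), however, treats silicon as the response y and isotope as the explanatory variable x. Fitting lm(sili ~ iso) to the same data gives:
With outlier: ŷ = -493.8 - 33.85 x
Without outlier: ŷ = -1382.3 - 76.04 x
The slope more than doubles in size when the point is removed, so the outlier is also strongly influential for the regression line.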
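Part (c) asks for the line with silicon as the response and isotope as the explanatory variable, which corresponds to lm(sili ~ iso) rather than lm(iso ~ sili). The following is a sketch of that fit with and without the outlier, under the assumption that the outlier is the last observation; the names d, out, fit_with and fit_without are illustrative.

```r
# Isotope and silicon values from the table, outlier in the last row
d <- data.frame(
  iso  = c(-19.90, -19.84, -19.46, -20.20, -20.71, -20.80,
           -20.86, -21.28, -21.63, -21.63, -21.19, -19.37),
  sili = c(95, 108, 118, 139, 154, 263, 269, 296, 226, 237, 188, 339)
)
out <- 12  # row of the outlier (-19.37, 339)

# Regress silicon (response) on isotope (explanatory)
fit_with    <- lm(sili ~ iso, data = d)
fit_without <- lm(sili ~ iso, data = d[-out, ])

# Intercept and slope for each fit
print(round(coef(fit_with), 2))
print(round(coef(fit_without), 2))
```

Comparing the two coefficient vectors shows how far a single point moves the fitted line.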
2.
R-code:
s.rate=c(3.05,3.19,3.21,3.31,3.48,3.59,3.65)
speed=c(15.87,16.93,17.58,18.66,20.15,21.27,22.06)
plot(speed,s.rate,type="p",main="Regression equation of stride rate on speed")
abline(lm(s.rate~speed))
summary(lm(s.rate~speed))
round(predict(lm(s.rate~speed)),3)
round(resid(lm(s.rate~speed)),3)
round(sum(resid(lm(s.rate~speed))))
OUTPUT:
stride rate = 1.5232 + 0.0967 speed
> round(predict(lm(s.rate~speed)),3)
1 2 3 4 5 6 7
3.058 3.161 3.224 3.328 3.472 3.580 3.657
> round(resid(lm(s.rate~speed)),3)
1 2 3 4 5 6 7
-0.008 0.029 -0.014 -0.018 0.008 0.010 -0.007
> round(sum(resid(lm(s.rate~speed))))
[1] 0
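This verifies that the residuals sum to zero (the rounded residuals listed above also add to 0.000), as expected for a least-squares line fitted with an intercept.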
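Note that round() with no digits argument rounds to a whole number, so round(sum(...)) would print 0 even for a sum as large as 0.4. A stricter check compares the sum against a small tolerance. The sketch below does that and also recomputes the coefficients from the usual formulas b = r * s_y / s_x and a = ybar - b * xbar; the names fit, a, b and res_sum and the tolerance 1e-10 are illustrative choices.

```r
s.rate <- c(3.05, 3.19, 3.21, 3.31, 3.48, 3.59, 3.65)
speed  <- c(15.87, 16.93, 17.58, 18.66, 20.15, 21.27, 22.06)

fit <- lm(s.rate ~ speed)

# Slope and intercept from the correlation, standard deviations and means
b <- cor(speed, s.rate) * sd(s.rate) / sd(speed)
a <- mean(s.rate) - b * mean(speed)
print(round(c(intercept = a, slope = b), 4))

# Residuals should sum to zero up to floating-point error
res_sum <- sum(resid(fit))
print(abs(res_sum) < 1e-10)
```

The hand-computed a and b should agree with coef(fit), which ties the R output back to the formulas used in the course.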