The Excel file gives the city and highway gas mileage for 21 two-seater cars, including the Leaf hybrid car:
Type | City | Hwy |
T | 17 | 24 |
T | 20 | 28 |
T | 20 | 28 |
T | 17 | 25 |
T | 18 | 25 |
T | 12 | 20 |
T | 11 | 16 |
T | 10 | 16 |
T | 17 | 23 |
T | 60 | 66 |
T | 9 | 15 |
T | 9 | 13 |
T | 15 | 22 |
T | 12 | 17 |
T | 22 | 28 |
T | 16 | 23 |
T | 13 | 19 |
T | 20 | 26 |
T | 20 | 29 |
T | 15 | 23 |
T | 26 | 32 |
M | 12 | 19 |
M | 21 | 29 |
M | 19 | 27 |
M | 19 | 28 |
M | 16 | 23 |
M | 18 | 26 |
M | 16 | 23 |
M | 18 | 23 |
M | 25 | 32 |
M | 23 | 31 |
M | 20 | 29 |
M | 18 | 26 |
M | 14 | 22 |
(a) Make a scatterplot of highway mileage y against city mileage x
for all 21 cars. There is a strong positive linear association. The
Leaf lies far from the other points. Does the Leaf extend the
linear pattern of the other cars, or is it far from the line they
form?
(b) Find the correlation between city and highway mileage both
without and with the Leaf. Based on your answer to (a), explain why
r changes in this direction when you add the Leaf.
Please, give a detailed answer. Thank you!
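a)
Yes, the Leaf extends the linear pattern of the other cars. It has by far the highest city and highway mileage (60 and 66), but it lies roughly along the line the other 20 cars form rather than off to one side of it.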
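One rough way to check (a) numerically rather than by eye is to fit a least-squares line to the other 20 two-seaters and see where it lands at the Leaf's city mileage. The sketch below assumes the Leaf is the single 60/66 row and that the 21 rows marked T are the two-seaters; the variable names are only illustrative.

```python
# City/highway mileage for the 21 rows marked T in the table.
city = [17, 20, 20, 17, 18, 12, 11, 10, 17, 60, 9, 9, 15, 12, 22, 16, 13, 20, 20, 15, 26]
hwy = [24, 28, 28, 25, 25, 20, 16, 16, 23, 66, 15, 13, 22, 17, 28, 23, 19, 26, 29, 23, 32]

LEAF = (60, 66)  # assumption: the Leaf is the lone 60/66 row

# Drop the Leaf and fit a least-squares line to the remaining cars.
others = [(x, y) for x, y in zip(city, hwy) if (x, y) != LEAF]
n = len(others)
x_bar = sum(x for x, _ in others) / n
y_bar = sum(y for _, y in others) / n
ss_xx = sum((x - x_bar) ** 2 for x, _ in others)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in others)
slope = ss_xy / ss_xx
intercept = y_bar - slope * x_bar

# Extrapolate that line to the Leaf's city mileage and compare with its actual highway mileage.
predicted = intercept + slope * LEAF[0]
print(f"line of the other {n} cars: hwy = {intercept:.2f} + {slope:.2f} * city")
print(f"predicted hwy at city = {LEAF[0]}: {predicted:.1f} (actual {LEAF[1]})")
```

Under these assumptions the extrapolated line gives roughly 71.6 at a city mileage of 60, against an actual 66. The Leaf sits more than 30 units of city mileage beyond the next-highest car, yet it lands within a few units of the line, which is consistent with it extending the pattern.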
b)
Without Leaf (the other 20 two-seaters, dropping the 60/66 row):
Quantity | x (City) | y (Hwy) |
Sum | 319 | 452 |
Mean | 15.95 | 22.60 |
sample size, n = 20
x̅ = Σx / n = 319 / 20 = 15.95
ȳ = Σy / n = 452 / 20 = 22.60
SSxx = Σ(x-x̅)² = 408.95
SSyy = Σ(y-ȳ)² = 530.80
SSxy = Σ(x-x̅)(y-ȳ) = 454.60
estimated slope, ß1 = SSxy/SSxx = 454.60 / 408.95 = 1.1116
intercept, ß0 = ȳ - ß1 * x̅ = 4.8695
so, regression line is Ŷ = 4.8695 + 1.1116 * x
SSE = (SSxx * SSyy - SSxy²)/SSxx = 25.454
std error, Se = √(SSE/(n-2)) = 1.189
correlation coefficient, r = SSxy/√(SSxx * SSyy) = 0.9757
With Leaf (all 21 two-seaters, i.e. the rows marked T):
Quantity | x (City) | y (Hwy) |
Sum | 379 | 518 |
Mean | 18.05 | 24.67 |
sample size, n = 21
x̅ = Σx / n = 379 / 21 = 18.05
ȳ = Σy / n = 518 / 21 = 24.67
SSxx = Σ(x-x̅)² = 2256.9524
SSyy = Σ(y-ȳ)² = 2324.6667
SSxy = Σ(x-x̅)(y-ȳ) = 2275.3333
estimated slope, ß1 = SSxy/SSxx = 2275.3333 / 2256.9524 = 1.0081
intercept, ß0 = ȳ - ß1 * x̅ = 6.4721
so, regression line is Ŷ = 6.4721 + 1.0081 * x
SSE = (SSxx * SSyy - SSxy²)/SSxx = 30.803
std error, Se = √(SSE/(n-2)) = 1.273
correlation coefficient, r = SSxy/√(SSxx * SSyy) = 0.9934
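r increases when the Leaf is added, from about 0.976 for the other 20 two-seaters to about 0.993 for all 21. As noted in (a), the Leaf lies far from the other points but roughly along the line they form. It therefore stretches the range of both x and y a great deal while adding little scatter about the line, and that makes the linear association look stronger.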
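As a cross-check on (b), here is a minimal Python sketch that computes r for the two-seaters with and without the Leaf, using the same sums-of-squares formula r = SSxy/√(SSxx * SSyy). It assumes the 21 rows marked T are the two-seaters and that the Leaf is the 60/66 row; `pearson_r` is a helper written for this example, not a library function.

```python
from math import sqrt

# City/highway mileage for the 21 rows marked T in the table.
city = [17, 20, 20, 17, 18, 12, 11, 10, 17, 60, 9, 9, 15, 12, 22, 16, 13, 20, 20, 15, 26]
hwy = [24, 28, 28, 25, 25, 20, 16, 16, 23, 66, 15, 13, 22, 17, 28, 23, 19, 26, 29, 23, 32]


def pearson_r(xs, ys):
    """Correlation computed as SSxy / sqrt(SSxx * SSyy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return ss_xy / sqrt(ss_xx * ss_yy)


# Assumption: the Leaf is the only car with a city mileage of 60.
leaf = city.index(60)
city_wo = city[:leaf] + city[leaf + 1:]
hwy_wo = hwy[:leaf] + hwy[leaf + 1:]

r_without = pearson_r(city_wo, hwy_wo)
r_with = pearson_r(city, hwy)
print(f"without Leaf (n={len(city_wo)}): r = {r_without:.4f}")
print(f"with Leaf    (n={len(city)}): r = {r_with:.4f}")
```

The M rows are left out on purpose, since the question is about the 21 two-seaters only. Including them would change both the sample size and the sums.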