Table 2.5 (page 109) gives data on the true calories in 10 foods and the average guesses made by a large group of people. Exercise 2.26 explored the influence of two outlying observations on the correlation.
(a) Make a scatterplot suitable for predicting guessed calories from true calories. Circle the points for spaghetti and snack cake on your plot. These points lie outside the linear pattern of the other 8 points.
(b) Find the least-squares regression line of guessed calories on true calories. Do this twice, first for all 10 data points and then leaving out spaghetti and snack cake.
(c) Plot both lines on your graph. (Make one dashed so you can tell them apart.) Are spaghetti and snack cake, taken together, influential observations? Explain your answer.
Table 2.5
| Food | Guessed calories | Correct calories |
|---|---|---|
| 8 oz. whole milk | 196 | 159 |
| 5 oz. spaghetti with tomato sauce | 394 | 163 |
| 5 oz. macaroni with cheese | 350 | 269 |
| One slice wheat bread | 117 | 61 |
| One slice white bread | 136 | 76 |
| 2-oz. candy bar | 364 | 260 |
| Saltine cracker | 74 | 12 |
| Medium-size apple | 107 | 80 |
| Medium-size potato | 160 | 88 |
| Cream-filled snack cake | 419 | 160 |
a)
From the scatterplot we can see that the two circled observations (spaghetti and snack cake) are outliers: they lie far above the straight-line pattern formed by the other 8 points.
b)
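The two fits can be reproduced with a short script. The following is a minimal sketch in plain Python, assuming the Table 2.5 values as listed; the names `FOODS`, `OUTLIERS`, `least_squares`, and `fit` are illustrative and are not part of the original solution.

```python
# Data from Table 2.5: (food, guessed calories, correct calories).
FOODS = [
    ("8 oz. whole milk", 196, 159),
    ("5 oz. spaghetti with tomato sauce", 394, 163),
    ("5 oz. macaroni with cheese", 350, 269),
    ("One slice wheat bread", 117, 61),
    ("One slice white bread", 136, 76),
    ("2-oz. candy bar", 364, 260),
    ("Saltine cracker", 74, 12),
    ("Medium-size apple", 107, 80),
    ("Medium-size potato", 160, 88),
    ("Cream-filled snack cake", 419, 160),
]

# The two points singled out in the exercise (hypothetical constant name).
OUTLIERS = {"5 oz. spaghetti with tomato sauce", "Cream-filled snack cake"}


def least_squares(xs, ys):
    """Return (intercept, slope, r_squared) for the regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


def fit(rows):
    """Regress guessed calories (response) on correct calories (explanatory)."""
    xs = [correct for _, _, correct in rows]
    ys = [guessed for _, guessed, _ in rows]
    return least_squares(xs, ys)


all_fit = fit(FOODS)
trimmed_fit = fit([row for row in FOODS if row[0] not in OUTLIERS])

for label, (a, b, r2) in (("all 10 foods", all_fit),
                          ("without spaghetti and snack cake", trimmed_fit)):
    print(f"{label}: guessed = {a:.2f} + {b:.4f} * true, R^2 = {r2:.3f}")
```

The printed slopes should agree with the 1.303 and 1.1472 quoted in part (c); the intercepts are not stated in the original solution and come only from this sketch.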
c)
From the regression output we can see that when spaghetti and snack cake are included in the data, the slope of the regression line is 1.303 with R² of about 0.68.
When we remove the two points, the slope is 1.1472 with R² of about 0.97.
That is, these two points, taken together, are influential: removing them changes both the slope and R² substantially.
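The plot asked for in parts (a) and (c) could be drawn along the following lines. This is only a sketch: it assumes matplotlib is installed, and the names `POINTS`, `OUTLIERS`, `fit_line`, and `draw_plot`, as well as the output file name, are hypothetical choices made for illustration.

```python
# (true calories, guessed calories) for each food in Table 2.5.
POINTS = {
    "8 oz. whole milk": (159, 196),
    "5 oz. spaghetti with tomato sauce": (163, 394),
    "5 oz. macaroni with cheese": (269, 350),
    "One slice wheat bread": (61, 117),
    "One slice white bread": (76, 136),
    "2-oz. candy bar": (260, 364),
    "Saltine cracker": (12, 74),
    "Medium-size apple": (80, 107),
    "Medium-size potato": (88, 160),
    "Cream-filled snack cake": (160, 419),
}
OUTLIERS = ("5 oz. spaghetti with tomato sauce", "Cream-filled snack cake")


def fit_line(points):
    """Least-squares (intercept, slope) of guessed on true calories."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def draw_plot(path="calories.png"):
    """Scatterplot with both regression lines; the outliers are circled."""
    # Imported here so the numeric part above runs without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter([x for x, _ in POINTS.values()],
               [y for _, y in POINTS.values()], color="black")

    # Circle spaghetti and snack cake with large hollow markers.
    for name in OUTLIERS:
        x, y = POINTS[name]
        ax.scatter([x], [y], s=300, facecolors="none", edgecolors="red")

    x_range = [0, 300]
    a, b = fit_line(list(POINTS.values()))
    ax.plot(x_range, [a + b * x for x in x_range], "-", label="all 10 foods")

    kept = [p for name, p in POINTS.items() if name not in OUTLIERS]
    a, b = fit_line(kept)
    ax.plot(x_range, [a + b * x for x in x_range], "--",
            label="without spaghetti and snack cake")

    ax.set_xlabel("True calories")
    ax.set_ylabel("Guessed calories")
    ax.legend()
    fig.savefig(path)
```

Calling `draw_plot()` writes the figure to the given path. True calories go on the horizontal axis because they are the explanatory variable, and the dashed line is the fit without the two circled points.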