Here is data with y as the response variable.
(x,y): 59.2 48.5, 64.5 46.8, 47.4 13.8, -37.3 10, 71.4 84.9, 71.6 18.3, 64.4 28.3, 71.1 89.6, 56.2 6.2, 56.7 11.7, 69.1 23.8
Make a scatter plot of this data. Which point is an outlier? Enter as an ordered pair, for example (a,b), with parentheses.
1. Find the regression equation for the data set without the outlier. Enter as an equation of the form y = a + b x, rounded to three decimal places. Do not include the hat in y-hat.
2. Find the regression equation for the data set with the outlier. Enter as an equation of the form y = a + b x, rounded to three decimal places. Do not include the hat in y-hat.
3. Is this outlier an influential point?
a. Yes, the outlier appears to be an influential point.
b. No, the outlier does not appear to be an influential point.
Scatter plot:
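Outlier: (-37.3, 10). Its x-value lies far to the left of the other ten points, whose x-values all fall between 47.4 and 71.6.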
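The outlier was picked by eye from the scatter plot. As a rough numerical check, the sketch below flags any point whose x-value falls outside the 1.5 × IQR fences. That rule is an assumption added here, not part of the original problem, and the helper name `flag_x_outliers` is hypothetical.

```python
import statistics

# Data from the problem statement as (x, y) pairs.
points = [
    (59.2, 48.5), (64.5, 46.8), (47.4, 13.8), (-37.3, 10.0),
    (71.4, 84.9), (71.6, 18.3), (64.4, 28.3), (71.1, 89.6),
    (56.2, 6.2), (56.7, 11.7), (69.1, 23.8),
]

def flag_x_outliers(pts):
    """Return points whose x lies outside the 1.5*IQR fences (assumed rule)."""
    xs = [x for x, _ in pts]
    # statistics.quantiles sorts the data and returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(xs, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [(x, y) for x, y in pts if x < low or x > high]

outliers = flag_x_outliers(points)
print(outliers)
```

Only the x-coordinate is screened here because the unusual feature of this point is its horizontal distance from the rest of the data.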
Without Outlier:
X | Y | XY | X² | Y² |
59.2 | 48.5 | 2871.2 | 3504.64 | 2352.25 |
64.5 | 46.8 | 3018.6 | 4160.25 | 2190.24 |
47.4 | 13.8 | 654.12 | 2246.76 | 190.44 |
71.4 | 84.9 | 6061.86 | 5097.96 | 7208.01 |
71.6 | 18.3 | 1310.28 | 5126.56 | 334.89 |
64.4 | 28.3 | 1822.52 | 4147.36 | 800.89 |
71.1 | 89.6 | 6370.56 | 5055.21 | 8028.16 |
56.2 | 6.2 | 348.44 | 3158.44 | 38.44 |
56.7 | 11.7 | 663.39 | 3214.89 | 136.89 |
69.1 | 23.8 | 1644.58 | 4774.81 | 566.44 |
Ʃx = 631.6
Ʃy = 371.9
Ʃxy = 24765.55
Ʃx² = 40486.88
Ʃy² = 21846.65
Sample size, n = 10
x̅ = Ʃx/n = 631.6/10 = 63.16
y̅ = Ʃy/n = 371.9/10 = 37.19
SSxx = Ʃx² - (Ʃx)²/n = 40486.88 - (631.6)²/10 = 595.024
SSyy = Ʃy² - (Ʃy)²/n = 21846.65 - (371.9)²/10 = 8015.689
SSxy = Ʃxy - (Ʃx)(Ʃy)/n = 24765.55 - (631.6)(371.9)/10 = 1276.346
Slope, b = SSxy/SSxx = 1276.346/595.024 = 2.145032805
y-intercept, a = y̅ - b*x̅ = 37.19 - (2.145032805)*63.16 = -98.29027199 (using the unrounded slope)
Regression equation :
y = -98.290 + 2.145 x
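The hand calculation above can be checked with a short script. This is a minimal sketch of the same sums-of-squares shortcut formulas (SSxx, SSxy), applied to the ten points that remain after dropping the outlier; the function name `fit_line` is a hypothetical helper.

```python
def fit_line(pts):
    """Least-squares fit y = a + b*x via the sums-of-squares shortcut formulas."""
    n = len(pts)
    sum_x = sum(x for x, _ in pts)
    sum_y = sum(y for _, y in pts)
    sum_xy = sum(x * y for x, y in pts)
    sum_x2 = sum(x * x for x, _ in pts)
    ss_xx = sum_x2 - sum_x ** 2 / n          # SSxx = Σx² - (Σx)²/n
    ss_xy = sum_xy - sum_x * sum_y / n       # SSxy = Σxy - (Σx)(Σy)/n
    b = ss_xy / ss_xx                        # slope
    a = sum_y / n - b * (sum_x / n)          # intercept = y-bar - b * x-bar
    return a, b

# The ten (x, y) pairs with the outlier (-37.3, 10) removed.
no_outlier = [
    (59.2, 48.5), (64.5, 46.8), (47.4, 13.8), (71.4, 84.9), (71.6, 18.3),
    (64.4, 28.3), (71.1, 89.6), (56.2, 6.2), (56.7, 11.7), (69.1, 23.8),
]

a, b = fit_line(no_outlier)
print(f"y = {a:.3f} + {b:.3f} x")
```

Rounded to three decimals, the coefficients should agree with the equation obtained by hand.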
--------------------
With Outlier:
X | Y | XY | X² | Y² |
59.2 | 48.5 | 2871.2 | 3504.64 | 2352.25 |
64.5 | 46.8 | 3018.6 | 4160.25 | 2190.24 |
47.4 | 13.8 | 654.12 | 2246.76 | 190.44 |
-37.3 | 10 | -373 | 1391.29 | 100 |
71.4 | 84.9 | 6061.86 | 5097.96 | 7208.01 |
71.6 | 18.3 | 1310.28 | 5126.56 | 334.89 |
64.4 | 28.3 | 1822.52 | 4147.36 | 800.89 |
71.1 | 89.6 | 6370.56 | 5055.21 | 8028.16 |
56.2 | 6.2 | 348.44 | 3158.44 | 38.44 |
56.7 | 11.7 | 663.39 | 3214.89 | 136.89 |
69.1 | 23.8 | 1644.58 | 4774.81 | 566.44 |
Ʃx = 594.3
Ʃy = 381.9
Ʃxy = 24392.55
Ʃx² = 41878.17
Ʃy² = 21946.65
Sample size, n = 11
x̅ = Ʃx/n = 594.3/11 = 54.02727273
y̅ = Ʃy/n = 381.9/11 = 34.71818182
SSxx = Ʃx² - (Ʃx)²/n = 41878.17 - (594.3)²/11 = 9769.761818
SSyy = Ʃy² - (Ʃy)²/n = 21946.65 - (381.9)²/11 = 8687.776364
SSxy = Ʃxy - (Ʃx)(Ʃy)/n = 24392.55 - (594.3)(381.9)/11 = 3759.534545
Slope, b = SSxy/SSxx = 3759.53455/9769.76182 = 0.384813327
y-intercept, a = y̅ - b*x̅ = 34.71818182 - (0.384813327)*54.02727273 = 13.92776727 (using the unrounded slope)
Regression equation :
y = 13.928 + 0.385 x
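As a cross-check on the fit that includes the outlier, the sketch below uses the equivalent deviation form, SSxx = Σ(x - x̅)² and SSxy = Σ(x - x̅)(y - y̅), instead of the shortcut formulas. The two forms are algebraically identical; the helper name `fit_line_centered` is hypothetical.

```python
def fit_line_centered(pts):
    """Least-squares fit y = a + b*x using deviations from the means."""
    n = len(pts)
    x_bar = sum(x for x, _ in pts) / n
    y_bar = sum(y for _, y in pts) / n
    ss_xx = sum((x - x_bar) ** 2 for x, _ in pts)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in pts)
    b = ss_xy / ss_xx          # slope
    a = y_bar - b * x_bar      # intercept
    return a, b

# All eleven (x, y) pairs, including the outlier (-37.3, 10).
all_points = [
    (59.2, 48.5), (64.5, 46.8), (47.4, 13.8), (-37.3, 10.0),
    (71.4, 84.9), (71.6, 18.3), (64.4, 28.3), (71.1, 89.6),
    (56.2, 6.2), (56.7, 11.7), (69.1, 23.8),
]

a_all, b_all = fit_line_centered(all_points)
print(f"y = {a_all:.3f} + {b_all:.3f} x")
```

The deviation form avoids subtracting two large, nearly equal sums, so it tends to be the safer choice in floating-point arithmetic even though both forms give the same answer here.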
-------------------------
3. Answer: a. Yes, the outlier appears to be an influential point. Including it changes the slope from 2.145 to 0.385 and the y-intercept from -98.290 to 13.928.
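One way to make "influential" concrete is to compare the two fitted lines directly. The sketch below takes the unrounded coefficients from the worked solution and reports the relative drop in slope, plus the gap between the two lines' predictions at the ends of the non-outlier x-range (47.4 and 71.6). The choice of these comparison points and the helper name `predict` are assumptions made for illustration.

```python
# Unrounded coefficients taken from the worked solution.
a_without, b_without = -98.29027199, 2.145032805   # fit without the outlier
a_with, b_with = 13.92776727, 0.384813327          # fit with the outlier

def predict(a, b, x):
    """Evaluate the fitted line y = a + b*x at x."""
    return a + b * x

# Fraction by which the slope shrinks once the outlier is included.
rel_slope_drop = (b_without - b_with) / b_without
print(f"relative drop in slope: {rel_slope_drop:.1%}")

# Difference between the two lines at the ends of the non-outlier x-range.
gaps = {}
for x in (47.4, 71.6):
    gaps[x] = predict(a_with, b_with, x) - predict(a_without, b_without, x)
    print(f"x = {x}: prediction with outlier minus without = {gaps[x]:.2f}")
```

A single point that removes most of the slope and shifts predictions by this much across the observed range is behaving as an influential point.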