Here is data with y as the response variable.
x | y |
57.8 | 47.7 |
65.3 | 42.7 |
61 | 30.8 |
54.5 | 26.4 |
-70.7 | -338.8 |
63 | 45.7 |
38.9 | -46.1 |
70.2 | 35.5 |
Make a scatter plot of this data. Which point is an outlier? Enter as an ordered pair, for example (a,b), with parentheses.
Find the regression equation for the data set without the outlier. Enter as an equation of the form y = a + b x, rounded to three decimal places. Do not include the hat in y-hat.
Find the regression equation for the data set with the outlier. Enter as an equation of the form y = a + b x, rounded to three decimal places. Do not include the hat in y-hat.
Is this outlier an influential point?
No, the outlier does not appear to be an influential point.
Yes, the outlier appears to be an influential point.
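Scatter plot: the point (-70.7, -338.8) sits far to the lower left of the other seven points, whose x-values all lie between 38.9 and 70.2 and whose y-values lie between -46.1 and 47.7.
outlier = (-70.7, -338.8)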
----------------------
Without outlier:
X | Y | XY | X² | Y² |
57.8 | 47.7 | 2757.06 | 3340.84 | 2275.29 |
65.3 | 42.7 | 2788.31 | 4264.09 | 1823.29 |
61 | 30.8 | 1878.8 | 3721 | 948.64 |
54.5 | 26.4 | 1438.8 | 2970.25 | 696.96 |
63 | 45.7 | 2879.1 | 3969 | 2088.49 |
38.9 | -46.1 | -1793.29 | 1513.21 | 2125.21 |
70.2 | 35.5 | 2492.1 | 4928.04 | 1260.25 |
Ʃx = | 410.7 |
Ʃy = | 182.7 |
Ʃxy = | 12440.88 |
Ʃx² = | 24706.43 |
Ʃy² = | 11218.13 |
Sample size, n = | 7 |
x̅ = Ʃx/n = 410.7/7 = | 58.6714286 |
y̅ = Ʃy/n = 182.7/7 = | 26.1 |
SSxx = Ʃx² - (Ʃx)²/n = 24706.43 - (410.7)²/7 = | 610.074286 |
SSyy = Ʃy² - (Ʃy)²/n = 11218.13 - (182.7)²/7 = | 6449.66 |
SSxy = Ʃxy - (Ʃx)(Ʃy)/n = 12440.88 - (410.7)(182.7)/7 = | 1721.61 |
Slope, b = SSxy/SSxx = 1721.61/610.07429 = 2.8219678
y-intercept, a = y̅ -b* x̅ = 26.1 - (2.82197)*58.67143 = -139.4689
Regression equation :
y = -139.469 + 2.822 x
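The hand calculation above can be cross-checked with a short script. This is only a sketch in plain Python; the function name `least_squares` and the variable names are chosen here for illustration and do not come from the original solution. It follows the same SSxy/SSxx route as the worked steps.

```python
def least_squares(points):
    """Return (a, b) for the line y = a + b*x using the sums-of-squares formulas."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xy = sum(x * y for x, y in points)
    sum_x2 = sum(x * x for x, _ in points)

    ss_xx = sum_x2 - sum_x ** 2 / n      # SSxx = Σx² - (Σx)²/n
    ss_xy = sum_xy - sum_x * sum_y / n   # SSxy = Σxy - (Σx)(Σy)/n

    b = ss_xy / ss_xx                    # slope
    a = sum_y / n - b * (sum_x / n)      # intercept = ȳ - b·x̄
    return a, b


# The seven points that remain after dropping (-70.7, -338.8).
without_outlier = [
    (57.8, 47.7), (65.3, 42.7), (61, 30.8), (54.5, 26.4),
    (63, 45.7), (38.9, -46.1), (70.2, 35.5),
]

a, b = least_squares(without_outlier)
print(f"y = {a:.3f} + {b:.3f} x")
```

Running it should reproduce the intercept and slope found by hand, up to rounding in the third decimal place.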
--------------------------------------
With outlier:
X | Y | XY | X² | Y² |
57.8 | 47.7 | 2757.06 | 3340.84 | 2275.29 |
65.3 | 42.7 | 2788.31 | 4264.09 | 1823.29 |
61 | 30.8 | 1878.8 | 3721 | 948.64 |
54.5 | 26.4 | 1438.8 | 2970.25 | 696.96 |
-70.7 | -338.8 | 23953.16 | 4998.49 | 114785.44 |
63 | 45.7 | 2879.1 | 3969 | 2088.49 |
38.9 | -46.1 | -1793.29 | 1513.21 | 2125.21 |
70.2 | 35.5 | 2492.1 | 4928.04 | 1260.25 |
Ʃx = | 340 |
Ʃy = | -156.1 |
Ʃxy = | 36394.04 |
Ʃx² = | 29704.92 |
Ʃy² = | 126003.57 |
Sample size, n = | 8 |
x̅ = Ʃx/n = 340/8 = | 42.5 |
y̅ = Ʃy/n = -156.1/8 = | -19.5125 |
SSxx = Ʃx² - (Ʃx)²/n = 29704.92 - (340)²/8 = | 15254.92 |
SSyy = Ʃy² - (Ʃy)²/n = 126003.57 - (-156.1)²/8 = | 122957.669 |
SSxy = Ʃxy - (Ʃx)(Ʃy)/n = 36394.04 - (340)(-156.1)/8 = | 43028.29 |
Slope, b = SSxy/SSxx = 43028.29/15254.92 = 2.8206172
y-intercept, a = y̅ -b* x̅ = -19.5125 - (2.82062)*42.5 = -139.3887
Regression equation :
y = -139.389 + 2.821 x
------------------------
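No, the outlier does not appear to be an influential point. Including it changes the slope only from 2.822 to 2.821 and the intercept only from -139.469 to -139.389, so the fitted line is essentially the same with or without it.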
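One way to check the influence question numerically is to fit the line with and without the point and see how far the point falls from the line fitted without it. The sketch below assumes plain Python; `fit_line` and the other names are illustrative and are not part of the original solution.

```python
def fit_line(points):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    ss_xx = sum((x - mean_x) ** 2 for x, _ in points)
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = ss_xy / ss_xx
    a = mean_y - b * mean_x
    return a, b


all_points = [
    (57.8, 47.7), (65.3, 42.7), (61, 30.8), (54.5, 26.4),
    (-70.7, -338.8), (63, 45.7), (38.9, -46.1), (70.2, 35.5),
]
outlier = (-70.7, -338.8)
rest = [p for p in all_points if p != outlier]

a_without, b_without = fit_line(rest)
a_with, b_with = fit_line(all_points)

# How far the outlier lies from the line fitted WITHOUT it.
predicted = a_without + b_without * outlier[0]
residual = outlier[1] - predicted

print(f"without: y = {a_without:.3f} + {b_without:.3f} x")
print(f"with:    y = {a_with:.3f} + {b_with:.3f} x")
print(f"change in slope:     {b_with - b_without:+.4f}")
print(f"change in intercept: {a_with - a_without:+.4f}")
print(f"residual of outlier against the no-outlier line: {residual:.3f}")
```

The residual is small (roughly 0.2) compared with the spread of the y-values: the point is extreme in x but lies almost exactly on the line through the other seven points, which is why adding it barely moves the fit.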