Consider the following data for two variables, x and y.
x: 4 5 7 8 10 12 12 22
y: 12 14 16 15 18 20 24 19
b. Compute the standardized residuals for these data (to 2 decimals, if necessary). Enter negative values as negative numbers.
Do the data include any outliers?
- Yes, there appear to be 3 outliers
- Yes, there appear to be 2 outliers
- Yes, there appears to be an outlier
- No, there do not appear to be any outliers
c. Compute the leverage values for these data (to 2 decimals). Enter negative values as negative numbers.
Do there appear to be any influential observations in these data?
- Yes, observation 8 is an influential observation
- Yes, observation 6 is an influential observation
- Yes, observation 3 is an influential observation
- No, there do not appear to be any influential observations
Answer:
Leverage measures how far an observation's x value lies from the mean of the x values, and therefore how strongly that observation can pull the fitted regression line toward itself. A point with high leverage forces the regression line to pass close to it. In these data, observation 8 is the influential observation, based on the following rule:
An observation is flagged when its leverage value is greater than 3 times the mean of the leverage values.
The mean of the leverage values is 0.25, so the cutoff is 3 × 0.25 = 0.75.
In the table, observation 8 (whose x value of 22 is extreme) has a leverage value of 0.76, which exceeds 0.75. Observation 8 is therefore an influential observation, and removing it from the data would change the regression results considerably.
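The cutoff can be written out as follows. Here k = 1 is the number of independent variables. The leverage values always sum to k + 1, which is the number of estimated coefficients (intercept and slope). That is a standard property of least-squares leverage. It is added here as background to show where the mean of 0.25 comes from.

```latex
\begin{aligned}
\bar{h} &= \frac{1}{n}\sum_{i=1}^{n} h_{ii} = \frac{k+1}{n} = \frac{2}{8} = 0.25, \qquad 3\bar{h} = 0.75,\\
h_{88} &= \frac{1}{8} + \frac{(22 - 10)^2}{226} = 0.125 + 0.6372 = 0.7622 > 0.75.
\end{aligned}
```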
Create a scatterplot of y on x. In Excel, select the x and y columns, go to the Insert tab, and choose Scatter.
The scatter diagram shows that the point (x = 22, y = 19) lies far to the right of the other x values. The scatter diagram therefore also points to observation 8 as an influential observation.
Observation | x | y | Standard residual (Excel) | (x - mean of x)^2 | Leverage
1 | 4 | 12 | -0.917182874 | 36 | 0.284292
2 | 5 | 14 | -0.382347324 | 25 | 0.235619
3 | 7 | 16 | 0.008262909 | 9 | 0.164823
4 | 8 | 15 | -0.475492842 | 4 | 0.142699
5 | 10 | 18 | 0.254647825 | 0 | 0.125
6 | 12 | 20 | 0.645258058 | 4 | 0.142699
7 | 12 | 24 | 2.003379792 | 4 | 0.142699
8 | 22 | 19 | -1.136525544 | 144 | 0.762168
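To compute the residuals in Excel (regressing y on x), open Data Analysis (on the Data tab, or under Tools in older versions) and choose Regression.
Enter the y column as the Input Y Range and the x column as the Input X Range, check Standardized Residuals, and click OK. The "Standard Residuals" that Excel reports divide each residual by sqrt(SSE/(n - 1)), the standard deviation of the residuals. They do not use leverage. As a result, they differ from the leverage-adjusted standardized residuals e_i / (s * sqrt(1 - h_ii)), with s = sqrt(SSE/(n - 2)), that many regression textbooks use. The difference is most noticeable for the high-leverage observation 8.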
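A minimal Python sketch reproduces both versions of the residuals. Python stands in for Excel here, and the variable names are illustrative. The sketch fits the least-squares line y = b0 + b1*x and then divides each residual in one of two ways:

- by sqrt(SSE/(n - 1)), which matches the Excel column in the table;
- by s*sqrt(1 - h_ii), with s = sqrt(SSE/(n - 2)), the leverage-adjusted definition.

```python
from math import sqrt

# Data from the exercise
x = [4, 5, 7, 8, 10, 12, 12, 22]
y = [12, 14, 16, 15, 18, 20, 24, 19]
n = len(x)

# Least-squares fit of y on x
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)                          # 226
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))    # 96
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Ordinary residuals e_i = y_i - y_hat_i and the error sum of squares
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)

# Excel-style "Standard Residuals": residual / sqrt(SSE / (n - 1))
excel_std = [e / sqrt(sse / (n - 1)) for e in residuals]

# Leverage-adjusted standardized residuals: residual / (s * sqrt(1 - h_ii))
s = sqrt(sse / (n - 2))
leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
standardized = [e / (s * sqrt(1 - h)) for e, h in zip(residuals, leverage)]

for i, (ex, st) in enumerate(zip(excel_std, standardized), start=1):
    print(f"Observation {i}: Excel {ex:6.2f}   leverage-adjusted {st:6.2f}")
```

Running it gives leverage-adjusted standardized residuals of about -1.00, -0.40, 0.01, -0.48, 0.25, 0.65, 2.00 and -2.16. A common rule flags an observation as an outlier when its standardized residual is below -2 or above +2. Under that rule:

- the Excel values flag only observation 7 (2.00);
- the leverage-adjusted values flag observations 7 (about 2.003) and 8 (-2.16).

Observation 7 sits right at the cutoff, so that call is borderline.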
Similarly, the leverage value of each observation can be computed as:
$$h_{ii} = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{j=1}^{n} (x_j - \bar{x})^2}$$
Here n = 8 and the mean of x is 10, so the leverage value of observation 1 is h11 = 1/8 + 36/226 = 0.125 + 0.159292 = 0.284292.
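Here 226, the denominator of the leverage formula, is the sum of the (x - mean of x)^2 column in the table: 36 + 25 + 9 + 4 + 0 + 4 + 4 + 144 = 226.
In Excel, this sum can be computed directly with the DEVSQ function, or as (n - 1) times the sample variance from VAR.S. Both functions are available through Insert Function.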
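A short Python sketch shows the same leverage calculation. Python stands in for the Excel steps, and the variable names are illustrative. It builds the (x - mean of x)^2 column, sums it to get the denominator 226, and applies the leverage formula to every observation. The printed values should match the leverage column of the table.

```python
# Leverage h_ii = 1/n + (x_i - x_bar)^2 / sum_j (x_j - x_bar)^2
x = [4, 5, 7, 8, 10, 12, 12, 22]
n = len(x)
x_bar = sum(x) / n                          # 10.0
sq_dev = [(xi - x_bar) ** 2 for xi in x]    # the (x - mean of x)^2 column
denominator = sum(sq_dev)                   # 226.0, what Excel's DEVSQ returns
leverage = [1 / n + d / denominator for d in sq_dev]

for i, h in enumerate(leverage, start=1):
    print(f"h{i}{i} = {h:.6f}")
```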