On this worksheet, make an XY scatter plot linked to the following data:
X | Y |
92 | 22 |
87 | 23 |
102 | 23 |
80 | 25 |
91 | 27 |
100 | 20 |
95 | 21 |
109 | 19 |
77 | 28 |
100 | 221 |
98 | 25 |
89 | 27 |
97 | 23 |
93 | 22 |
89 | 27 |
91 | 22 |
97 | 21 |
105 | 21 |
88 | 22 |
83 | 24 |
86 | 27 |
89 | 26 |
79 | 30 |
88 | 22 |
94 | 24 |
18) Add a trendline, the regression equation, and R² to the plot.
Add this title: "Scatterplot of X and Y Data"
19) The scatterplot reveals a point outside the point pattern. Copy the data to a new location in the worksheet. You now have 2 sets of data.
Data that are more than 1.5 IQR below Q1 or more than 1.5 IQR above Q3 are considered outliers and must be investigated.
It was determined that the outlying point resulted from a data entry error. Remove the outlier in the copy of the data.
Make a new scatterplot linked to the cleaned data without the outlier, and add a title ("Scatterplot without Outlier"), trendline, and regression equation label.
20) Compare the regression equations of the two plots. How did removal of the outlier affect the slope and R²?
Based on the given data, using Excel, we find that the data has one outlier, at coordinates (100, 221).
Regression equation and R²:
18. The trendline is y = 0.799x - 41.83, with R² = 0.026 (2.6%).
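As a cross-check outside Excel, the trendline can be recomputed with the ordinary least-squares formulas. This is only a sketch in plain Python; the helper name fit_line is made up for illustration and is not part of the worksheet or of Excel. Run on all 25 points, it should give a slope near 0.799, an intercept near -41.83, and R² near 0.026.

```python
# Data copied from the worksheet (all 25 points, including the suspect one).
x = [92, 87, 102, 80, 91, 100, 95, 109, 77, 100, 98, 89, 97,
     93, 89, 91, 97, 105, 88, 83, 86, 89, 79, 88, 94]
y = [22, 23, 23, 25, 27, 20, 21, 19, 28, 221, 25, 27, 23,
     22, 27, 22, 21, 21, 22, 24, 27, 26, 30, 22, 24]


def fit_line(xs, ys):
    """Return (slope, intercept, r_squared) of the least-squares line.

    Hypothetical helper for illustration only.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # squared correlation for a simple linear fit
    return slope, intercept, r_squared


slope, intercept, r2 = fit_line(x, y)
print(f"slope = {slope:.3f}, intercept = {intercept:.2f}, R^2 = {r2:.3f}")
```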
To add the title, set the chart title to "Scatterplot of X and Y Data".
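19. Obtaining the interquartile range:
For Y, Q1 = 22 and Q3 = 27, so IQR = 5 and the outlier fences are 22 - 1.5(5) = 14.5 and 27 + 1.5(5) = 34.5. The point with Y = 221 lies far above the upper fence, so it is an outlier.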
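The 1.5 IQR rule from the question can be checked with a few lines of Python. This sketch assumes the "inclusive" quartile method of the standard library's statistics.quantiles, which interpolates between the sorted values; a different quartile convention could give slightly different fences on other data. The names lower_fence, upper_fence, and outliers are illustrative.

```python
import statistics

# Y values copied from the worksheet (all 25 points).
y = [22, 23, 23, 25, 27, 20, 21, 19, 28, 221, 25, 27, 23,
     22, 27, 22, 21, 21, 22, 24, 27, 26, 30, 22, 24]

# Quartiles by the "inclusive" method (an assumption about the convention).
q1, _median, q3 = statistics.quantiles(y, n=4, method="inclusive")
iqr = q3 - q1

# Fences: anything below Q1 - 1.5*IQR or above Q3 + 1.5*IQR is flagged.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in y if v < lower_fence or v > upper_fence]

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print(f"fences = ({lower_fence}, {upper_fence}), outliers = {outliers}")
```

Only Y = 221 falls outside the fences, which agrees with the point that stands apart in the scatterplot.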
After removing the outlier from the copied data, the new scatterplot's trendline has slope -0.25 and R² = 0.495.
20. Comparing the two scatterplots:
The slope changed from 0.799 to -0.25, flipping from positive to negative, which suggests that the outlier was an influential observation that pulled the fitted regression line towards it. Removing the outlier also increased R² substantially, from 0.026 to 0.495. The model now fits the data better: X explains a larger proportion of the variation in Y (49.5%) than before (2.6%).
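The before-and-after comparison can be reproduced in one pass by fitting the same least-squares line to the full data and to the copy without (100, 221). This is a sketch in plain Python, not the Excel procedure itself; fit_line and the variable names are hypothetical.

```python
# All 25 (x, y) pairs from the worksheet; (100, 221) is the data entry error.
data = [(92, 22), (87, 23), (102, 23), (80, 25), (91, 27), (100, 20),
        (95, 21), (109, 19), (77, 28), (100, 221), (98, 25), (89, 27),
        (97, 23), (93, 22), (89, 27), (91, 22), (97, 21), (105, 21),
        (88, 22), (83, 24), (86, 27), (89, 26), (79, 30), (88, 22), (94, 24)]


def fit_line(points):
    """Return (slope, intercept, r_squared) of the least-squares line."""
    n = len(points)
    mean_x = sum(px for px, _ in points) / n
    mean_y = sum(py for _, py in points) / n
    sxx = sum((px - mean_x) ** 2 for px, _ in points)
    syy = sum((py - mean_y) ** 2 for _, py in points)
    sxy = sum((px - mean_x) * (py - mean_y) for px, py in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy ** 2 / (sxx * syy)


# Drop only the erroneous point; the valid point (100, 20) is kept.
cleaned = [p for p in data if p != (100, 221)]

slope_all, _, r2_all = fit_line(data)
slope_clean, _, r2_clean = fit_line(cleaned)

print(f"with outlier:    slope = {slope_all:.3f}, R^2 = {r2_all:.3f}")
print(f"without outlier: slope = {slope_clean:.3f}, R^2 = {r2_clean:.3f}")
```

A single point is enough to flip the sign of the slope here because its Y value is roughly nine times the typical Y, so it dominates the sums of squares that determine the fit.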