Anscombe's Data
Observation | x1 | y1 | x2 | y2 | x3 | y3 | x4 | y4
1 | 10 | 8.04 | 10 | 9.14 | 10 | 7.46 | 8 | 6.58
2 | 8 | 6.95 | 8 | 8.14 | 8 | 6.77 | 8 | 5.76
3 | 13 | 7.58 | 13 | 8.74 | 13 | 12.74 | 8 | 7.71
4 | 9 | 8.81 | 9 | 8.77 | 9 | 7.11 | 8 | 8.84
5 | 11 | 8.33 | 11 | 9.26 | 11 | 7.81 | 8 | 8.47
6 | 14 | 9.96 | 14 | 8.10 | 14 | 8.84 | 8 | 7.04
7 | 6 | 7.24 | 6 | 6.13 | 6 | 6.08 | 8 | 5.25
8 | 4 | 4.26 | 4 | 3.10 | 4 | 5.39 | 19 | 12.50
9 | 12 | 10.84 | 12 | 9.13 | 12 | 8.15 | 8 | 5.56
10 | 7 | 4.82 | 7 | 7.26 | 7 | 6.42 | 8 | 7.91
11 | 5 | 5.68 | 5 | 4.74 | 5 | 5.73 | 8 | 6.89
Fit a simple linear regression model to each set of (x, y) data, i.e., one model fit to (x1, y1), one model fit to (x2, y2), one model fit to (x3, y3), and one model fit to (x4, y4).
Write down the estimated regression equation for each fitted model, together with the values of the coefficient of determination, $r^2$, and the standard error of the estimate, $s = \sqrt{\mathrm{MSE}}$.
For each set of (x, y) data, create a scatterplot of y (vertical) versus x (horizontal) with the estimated regression line added to the plot.
For each set of (x, y) data, create a scatterplot of the residuals (vertical) versus (horizontal). Based on each plot, do the zero mean and constant variance assumptions about the simple linear regression model error seem reasonable?
For each set of (x, y) data, create a normal probability plot of the standardized residuals. Based on each plot, does the normality assumption about the simple linear regression model error seem reasonable?
For each set of (x, y) data, are there any outliers?
For each set of (x, y) data, are there any high leverage points?
For each set of (x, y) data, are there any influential points?
Post a summary of your group’s analysis. What important “big picture” conclusions can you draw from your analysis?
1) For the set (x1, y1)
Regression equation:
y1 = 3.00 + 0.500 x1
Coefficient of determination: R2 = 66.65%
Standard error of the estimate: S = 1.2366
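The numbers above came from statistical software. As a cross-check, here is a minimal sketch that refits all four data sets with the textbook least-squares formulas in plain Python. The names `DATA` and `fit_line` are made up for this illustration and are not from the assignment.

```python
from math import sqrt

# Anscombe's data, copied from the table in the question.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
DATA = {
    "set 1": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "set 2": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "set 3": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "set 4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
              [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def fit_line(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1, r2, s)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    b1 = sxy / sxx                      # slope
    b0 = y_bar - b1 * x_bar             # intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1 - sse / syy                  # coefficient of determination
    s = sqrt(sse / (n - 2))             # s = sqrt(MSE), with n - 2 degrees of freedom
    return b0, b1, r2, s


for name, (x, y) in DATA.items():
    b0, b1, r2, s = fit_line(x, y)
    print(f"{name}: y = {b0:.2f} + {b1:.3f} x, R2 = {100 * r2:.2f}%, S = {s:.4f}")
```

Running it prints one line per data set, so the four equations, R2 values and S values can be compared side by side.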
Scatterplot of y1 (vertical) versus x1 (horizontal) with the estimated regression line added to the plot.
Scatterplot of the residuals (vertical) for (x1, y1).
From the graph above, the zero mean and constant variance assumptions about the simple linear regression model error seem reasonable.
Normal probability plot of the standardized residuals.
From the graph above, the normality assumption about the simple linear regression model error seems reasonable.
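For anyone who wants to rebuild the normal probability plot by hand, here is a rough sketch of the coordinates for (x1, y1): the sorted standardized residuals paired with approximate normal scores. The plotting positions (i - 0.375)/(n + 0.25) are one common convention and are an assumption here. Statistical packages may use a slightly different formula. The helper name `correlation` is also made up for this sketch.

```python
from math import sqrt
from statistics import NormalDist

x1 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]

n = len(x1)
x_bar = sum(x1) / n
y_bar = sum(y1) / n
sxx = sum((xi - x_bar) ** 2 for xi in x1)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x1, y1)) / sxx
b0 = y_bar - b1 * x_bar

resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x1, y1)]
s = sqrt(sum(e * e for e in resid) / (n - 2))

# Standardized residual: e_i / (s * sqrt(1 - h_i)), where h_i is the leverage.
std_resid = sorted(
    e / (s * sqrt(1 - (1 / n + (xi - x_bar) ** 2 / sxx)))
    for xi, e in zip(x1, resid)
)

# Approximate normal scores (assumed plotting positions, see note above).
normal_scores = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def correlation(u, v):
    """Pearson correlation between two equal-length lists."""
    u_bar = sum(u) / len(u)
    v_bar = sum(v) / len(v)
    suv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    suu = sum((a - u_bar) ** 2 for a in u)
    svv = sum((b - v_bar) ** 2 for b in v)
    return suv / sqrt(suu * svv)


for z, r in zip(normal_scores, std_resid):
    print(f"normal score = {z:+.3f}   standardized residual = {r:+.3f}")

# A value close to 1 means the points fall close to a straight line.
straightness = correlation(normal_scores, std_resid)
print(f"correlation = {straightness:.3f}")
```

The correlation is only a rough numeric summary of how straight the plot is. With 11 points the plot itself is still the thing to look at.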
2) For the set (x2, y2)
Regression equation:
y2 = 3.00 + 0.500 x2
Coefficient of determination: R2 = 66.62%
Standard error of the estimate: S = 1.2372
Scatterplot of y2 (vertical) versus x2 (horizontal) with the estimated regression line added to the plot.
Scatterplot of the residuals (vertical) for (x2, y2).
From the graph above, the zero mean and constant variance assumptions about the simple linear regression model error do not seem reasonable: the residuals follow a clear curved pattern instead of scattering randomly around zero.
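The pattern in the residual plot can also be seen without a graph. Here is a small sketch that lists the (x2, y2) residuals in order of x2. The variable names are illustrative only.

```python
x2 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

n = len(x2)
x_bar = sum(x2) / n
y_bar = sum(y2) / n
sxx = sum((xi - x_bar) ** 2 for xi in x2)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x2, y2)) / sxx
b0 = y_bar - b1 * x_bar

# (x, residual) pairs sorted by x, so any systematic pattern is easy to see.
residuals = sorted((xi, yi - (b0 + b1 * xi)) for xi, yi in zip(x2, y2))

for xi, e in residuals:
    print(f"x2 = {xi:2d}   residual = {e:+.2f}")
```

The residuals are negative at the smallest x2 values, positive through the middle, and negative again at the largest x2 values. That is what a straight line fitted to curved data looks like.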
Normal probability plot of the standardized residuals.
From the graph above, the normality assumption about the simple linear regression model error seems reasonable.
3) For the set (x3, y3)
Regression equation:
y3 = 3.00 + 0.500 x3
Coefficient of determination: R2 = 66.63%
Standard error of the estimate: S = 1.2363
Scatterplot of y3 (vertical) versus x3 (horizontal) with the estimated regression line added to the plot.
Scatterplot of the residuals (vertical) for (x3, y3).
From the graph above, the zero mean and constant variance assumptions about the simple linear regression model error do not seem reasonable: one residual (observation 3) lies far above all the others.
Normal probability plot of the standardized residuals.
From the graph above, the normality assumption about the simple linear regression model error seems reasonable.
4) For the set (x4, y4)
Regression equation:
y4 = 3.00 + 0.500 x4
Coefficient of determination: R2 = 66.67%
Standard error of the estimate: S = 1.2357
Scatterplot of y4 (vertical) versus x4 (horizontal) with the estimated regression line added to the plot.
Scatterplot of the residuals (vertical) for (x4, y4).
From the graph above, the zero mean and constant variance assumptions about the simple linear regression model error do not seem reasonable: ten of the eleven observations share x4 = 8, so the plot cannot show how the residuals behave across the range of x4.
Normal probability plot of the standardized residuals.
From the graph above, the normality assumption about the simple linear regression model error seems reasonable.
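The questions about outliers, high leverage points and influential points can be explored numerically. Below is a minimal sketch that computes the leverage, standardized residual and Cook's distance for every observation. The cutoffs (leverage above 3(2/n), |standardized residual| above 2, Cook's distance above 1) are common rules of thumb assumed here, not values given in the assignment, and the names `diagnostics` and `HIGH_LEVERAGE` are made up for this illustration.

```python
from math import sqrt

X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
DATA = {
    "set 1": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "set 2": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "set 3": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "set 4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
              [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def diagnostics(x, y):
    """Return (observation, leverage, standardized residual, Cook's distance) rows."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = sqrt(sum(e * e for e in resid) / (n - 2))
    rows = []
    for i, (xi, e) in enumerate(zip(x, resid), start=1):
        h = 1 / n + (xi - x_bar) ** 2 / sxx          # leverage
        if 1 - h < 1e-9:
            # Leverage of 1: the line is forced through this point, so the
            # standardized residual and Cook's distance are undefined.
            r = d = None
        else:
            r = e / (s * sqrt(1 - h))                # standardized residual
            d = (r * r / 2) * h / (1 - h)            # Cook's distance, 2 parameters
        rows.append((i, h, r, d))
    return rows


HIGH_LEVERAGE = 3 * 2 / 11   # assumed rule of thumb: 3 * (parameters / n)

for name, (x, y) in DATA.items():
    for i, h, r, d in diagnostics(x, y):
        notes = []
        if h > HIGH_LEVERAGE:
            notes.append(f"high leverage (h = {h:.3f})")
        if r is not None and abs(r) > 2:
            notes.append(f"possible outlier (r = {r:+.2f})")
        if d is not None and d > 1:
            notes.append(f"possibly influential (D = {d:.2f})")
        if notes:
            print(f"{name}, observation {i}: " + ", ".join(notes))
```

With these cutoffs, only observation 3 of set 3 and observation 8 of set 4 get flagged. For set 4, observation 8 has a leverage of 1, so the fitted line is determined entirely by that one point, which is why its residual-based measures are left undefined in the sketch.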