Observation | x1 | y1 | x2 | y2 | x3 | y3 | x4 | y4
1 | 10 | 8.04 | 10 | 9.14 | 10 | 7.46 | 8 | 6.58
2 | 8 | 6.95 | 8 | 8.14 | 8 | 6.77 | 8 | 5.76
3 | 13 | 7.58 | 13 | 8.74 | 13 | 12.74 | 8 | 7.71
4 | 9 | 8.81 | 9 | 8.77 | 9 | 7.11 | 8 | 8.84
5 | 11 | 8.33 | 11 | 9.26 | 11 | 7.81 | 8 | 8.47
6 | 14 | 9.96 | 14 | 8.10 | 14 | 8.84 | 8 | 7.04
7 | 6 | 7.24 | 6 | 6.13 | 6 | 6.08 | 8 | 5.25
8 | 4 | 4.26 | 4 | 3.10 | 4 | 5.39 | 19 | 12.50
9 | 12 | 10.84 | 12 | 9.13 | 12 | 8.15 | 8 | 5.56
10 | 7 | 4.82 | 7 | 7.26 | 7 | 6.42 | 8 | 7.91
11 | 5 | 5.68 | 5 | 4.74 | 5 | 5.73 | 8 | 6.89
Fit a simple linear regression model to each set of (x, y) data, i.e., one model fit to (x1, y1), one model fit to (x2, y2), one model fit to (x3, y3), and one model fit to (x4, y4).
Write down the estimated regression equation for each fitted model, together with the values of the coefficient of determination, $r^2$, and the standard error of the estimate, $s = \sqrt{\text{MSE}}$.
For each set of (x, y) data, create a scatterplot of y (vertical) versus x (horizontal) with the estimated regression line added to the plot.
For each set of (x, y) data, create a scatterplot of the residuals (vertical) versus x (horizontal). Based on each plot, do the zero mean and constant variance assumptions about the simple linear regression model error seem reasonable?
For each set of (x, y) data, create a normal probability plot of the standardized residuals. Based on each plot, does the normality assumption about the simple linear regression model error seem reasonable?
For each set of (x, y) data, are there any outliers?
For each set of (x, y) data, are there any high leverage points?
For each set of (x, y) data, are there any influential points?
Post a summary of your group’s analysis. What important “big picture” conclusions can you draw from your analysis?
I used R software to solve this question.
For data set 1, (x1, y1):
R code and output:
> x1=scan('clipboard');x1
Read 11 items
[1] 10 8 13 9 11 14 6 4 12 7 5
> y1=scan('clipboard');y1
Read 11 items
[1] 8.04 6.95 7.58 8.81 8.33 9.96 7.24 4.26 10.84 4.82 5.68
> plot(x1,y1)
> fit=lm(y1~x1)
> summary(fit)
Call:
lm(formula = y1 ~ x1)
Residuals:
Min 1Q Median 3Q Max
-1.92127 -0.45577 -0.04136 0.70941 1.83882
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.0001 1.1247 2.667 0.02573 *
x1 0.5001 0.1179 4.241 0.00217 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.237 on 9 degrees of freedom
Multiple R-squared: 0.6665, Adjusted R-squared: 0.6295
F-statistic: 17.99 on 1 and 9 DF, p-value: 0.00217
> res=fit$residuals
> plot(x1,res)
> std_res=rstandard(fit) # standardized residuals
> qqnorm(std_res)
> qqline(std_res)
Estimated regression equation:
$\hat{y}_1 = 3.0001 + 0.5001\,x_1$
Coefficient of determination: $r^2$ = 0.6665
Standard error of the estimate: s = 1.237
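The same fit can be repeated for the other three data sets without going through the clipboard. The following is only a minimal sketch: the data are typed in from the table in the question, and the names `xs`, `ys`, and `fit_summary` are my own choices for illustration, not part of the original session.

```r
# Data typed in from the table in the question (11 observations per set).
x123 <- c(10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5)   # x1, x2 and x3 are identical
x4   <- c(8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8)
xs <- list(x1 = x123, x2 = x123, x3 = x123, x4 = x4)
ys <- list(
  y1 = c(8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68),
  y2 = c(9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74),
  y3 = c(7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73),
  y4 = c(6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89)
)

# Fit y_k ~ x_k for k = 1..4 and collect the quantities the question asks for.
fit_summary <- t(sapply(1:4, function(k) {
  d <- data.frame(x = xs[[k]], y = ys[[k]])
  s <- summary(lm(y ~ x, data = d))
  c(intercept = unname(coef(s)[1, 1]),
    slope     = unname(coef(s)[2, 1]),
    r2        = s$r.squared,
    s         = s$sigma)          # residual standard error, sqrt(MSE)
}))
rownames(fit_summary) <- paste0("set", 1:4)

print(round(fit_summary, 4))
```

The first row should agree with the `summary(fit)` output shown above for (x1, y1); the remaining rows give the equation, $r^2$, and s for the other three sets.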
Scatter plot:
Residual plot:
The residual plot shows a random scatter around zero with no clear pattern, so the zero mean and constant variance assumptions seem reasonable for this data set.
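A horizontal reference line at zero makes the residual plot easier to judge. This is a small sketch for data set 1 only, with the data typed in from the table; the object names `fit1` and `res1` are mine.

```r
# Data set 1, typed in from the table in the question.
x1 <- c(10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5)
y1 <- c(8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68)

fit1 <- lm(y1 ~ x1)
res1 <- resid(fit1)

# Residuals (vertical) versus x (horizontal), with a dashed line at zero.
plot(x1, res1, xlab = "x1", ylab = "Residual", main = "Residuals versus x1")
abline(h = 0, lty = 2)

# Least-squares residuals from a model with an intercept always average to
# zero, so the overall mean says nothing here; look instead for curvature
# (zero mean violated locally) or a funnel shape (non-constant variance).
mean(res1)
```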
Normal probability plot for standardized residuals:
The points do not lie on a straight line, hence the normality assumption is not satisfied.
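For the outlier, high leverage, and influential point parts of the question, the usual numerical diagnostics can be computed with base R functions. This is a rough sketch for data set 1 only. The cutoffs used (|standardized residual| > 2, leverage > 2p/n, Cook's distance > 1) are common rules of thumb that I am assuming here; textbooks differ, so use whichever your course specifies. The data frame name `diag1` is my own.

```r
# Data set 1, typed in from the table in the question.
x1 <- c(10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5)
y1 <- c(8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68)

fit1 <- lm(y1 ~ x1)
n <- length(x1)
p <- length(coef(fit1))          # number of estimated coefficients (2)

diag1 <- data.frame(
  obs      = seq_len(n),
  std_res  = rstandard(fit1),       # standardized residuals
  leverage = hatvalues(fit1),       # diagonal of the hat matrix
  cooks_d  = cooks.distance(fit1)   # Cook's distance
)

# Assumed rule-of-thumb cutoffs (not from the original question).
diag1$outlier       <- abs(diag1$std_res) > 2
diag1$high_leverage <- diag1$leverage > 2 * p / n
diag1$influential   <- diag1$cooks_d > 1

print(diag1, digits = 3)
```

Replacing `x1` and `y1` with the other columns of the table gives the same diagnostics for the remaining data sets.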