Use the given data to find the equation of the regression line. Examine the scatterplot and identify a characteristic of the data that is ignored by the regression line.
x    4     12    9     7     3     11    10    8     5     13    6
y    3.64  8.44  9.30  8.10  1.44  9.08  9.36  8.88  5.48  7.44  6.96
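The slope and intercept of the least-squares line follow from the usual formulas b1 = Sxy/Sxx and b0 = ybar - b1*xbar. A minimal sketch in R, assuming the eleven data pairs are x = 4, 12, 9, 7, 3, 11, 10, 8, 5, 13, 6 with y = 3.64, 8.44, 9.30, 8.10, 1.44, 9.08, 9.36, 8.88, 5.48, 7.44, 6.96 (the names Sxx, Sxy, b0 and b1 are chosen here only for illustration):

```r
# Data as read from the table (eleven x-y pairs; assumed values)
x <- c(4, 12, 9, 7, 3, 11, 10, 8, 5, 13, 6)
y <- c(3.64, 8.44, 9.30, 8.10, 1.44, 9.08, 9.36, 8.88, 5.48, 7.44, 6.96)

Sxx <- sum((x - mean(x))^2)                # sum of squares of x about its mean
Sxy <- sum((x - mean(x)) * (y - mean(y)))  # sum of cross-products

b1 <- Sxy / Sxx                # slope
b0 <- mean(y) - b1 * mean(x)   # intercept

cat("y-hat =", round(b0, 3), "+", round(b1, 3), "* x\n")
```

For these values Sxx = 110 and Sxy = 66, so the slope is 0.6 and the intercept is about 2.30.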
R code:
# Data entered exactly as in the table
x=c(4,12,9,7,3,11,10,8,5,13,6)
y=c(3.64,8.44,9.30,8.10,1.44,9.08,9.36,8.88,
5.48,7.44,6.96)
# Fitted values from the least-squares line
Y=2.302+0.6*x
# Scatterplot with the fitted line in red
plot(x,y,lwd=2,type="p",xlab="x", ylab="y")
lines(x,Y,lwd=2,type="l",col=2)
summary(lm(y~x))
anova(lm(y~x))
Main results from summary(lm(y~x)):
Coefficients:
Estimate Std. Error t value
(Intercept) 2.3018 1.4206 1.620
x 0.6000 0.1651 3.633
Residual standard error: 1.732 on 9 degrees of freedom
Multiple R-squared: 0.5946, Adjusted R-squared: 0.5496
F-statistic: 13.2 on 1 and 9 DF, p-value about 0.005
Main results from anova(lm(y~x)):
Analysis of Variance Table
Response: y
Df Sum Sq Mean Sq F value
x 1 39.600 39.600 13.2
Residuals 9 26.999 3.000
The equation of the regression line is y-hat = 2.30 + 0.600x. The slope
is significant (F = 13.2, p-value about 0.005 < 0.05), but the line
explains only about 59% of the variation in y. The scatterplot shows why:
the points do not scatter randomly about the fitted line (see red line).
They follow a clear curved, parabola-like pattern, rising until about
x = 10 and then falling. This nonlinear pattern is the characteristic of
the data that the regression line ignores.
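One way to make the ignored pattern visible is to compare the straight-line fit with a fit that also includes a squared term. The sketch below is only an illustration, taking the data as x = 4, 12, 9, 7, 3, 11, 10, 8, 5, 13, 6 with the corresponding y values; the quadratic model and the object names fit_line and fit_quad are assumptions introduced here, not part of the exercise.

```r
# Data as read from the table (assumed values)
x <- c(4, 12, 9, 7, 3, 11, 10, 8, 5, 13, 6)
y <- c(3.64, 8.44, 9.30, 8.10, 1.44, 9.08, 9.36, 8.88, 5.48, 7.44, 6.96)

fit_line <- lm(y ~ x)            # straight-line model
fit_quad <- lm(y ~ x + I(x^2))   # adds a squared term to allow curvature

# Residuals of the straight-line fit, ordered by x; a run of
# negative, then positive, then negative values signals curvature
print(round(resid(fit_line)[order(x)], 2))

# Share of the variation in y explained by each model
r2 <- c(line = summary(fit_line)$r.squared,
        quad = summary(fit_quad)$r.squared)
print(round(r2, 4))

# F-test of whether the squared term improves on the straight line
print(anova(fit_line, fit_quad))
```

For these data the residuals of the straight-line fit are negative at both ends of the x range and positive in the middle, which is what a parabola-shaped relationship produces.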