The number of defective items produced by a machine (Y) is known to be linearly related to the speed setting of the machine (X). Data is provided below.
a) (3) Fit a linear regression function by ordinary least squares; obtain the residuals and plot the residuals against X. What does the residual plot suggest?
b) (3) Plot the absolute value of the residuals and the squared residuals vs. X. Which plot has a better line?
c) (4) Perform a weighted least squares fit, using the squared residuals to compute the weights. Obtain the weighted least squares estimates of the parameters and their standard errors. Are these values similar to the ones produced in a)? Which results are better, the ones generated in a) or c)? Please explain your answer.
Y | X |
28 | 200 |
75 | 400 |
37 | 300 |
53 | 400 |
22 | 200 |
58 | 300 |
40 | 300 |
96 | 400 |
46 | 200 |
52 | 400 |
30 | 200 |
69 | 300 |
SOLUTION
a)
We shall use R for all numeric computations.
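The calls in this solution refer to a data frame named data10 whose construction is not shown. A minimal sketch, assuming it simply holds the twelve (Y, X) pairs from the table above in the order listed:

```r
# Assumed construction of data10 (not shown in the original solution).
# Y = number of defective items, X = machine speed setting.
data10 <- data.frame(
  Y = c(28, 75, 37, 53, 22, 58, 40, 96, 46, 52, 30, 69),
  X = c(200, 400, 300, 400, 200, 300, 300, 400, 200, 400, 200, 300)
)

# Quick sanity checks on the sample size and the column means.
nrow(data10)
colMeans(data10)
```

Only three distinct speed settings occur (200, 300 and 400), each with four observations.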
> model10 <- lm(Y ~ X, data = data10)
> model10

Call:
lm(formula = Y ~ X, data = data10)

Coefficients:
(Intercept)            X
    -5.7500       0.1875

> summary(model10)

Call:
lm(formula = Y ~ X, data = data10)

Residuals:
    Min      1Q  Median      3Q     Max
-17.250 -11.250  -2.750   9.188  26.750

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.75000   16.73052  -0.344  0.73820
X            0.18750    0.05381   3.484  0.00588 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 15.22 on 10 degrees of freedom
Multiple R-squared: 0.5484, Adjusted R-squared: 0.5032
F-statistic: 12.14 on 1 and 10 DF, p-value: 0.005878
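Since X takes only three values (200, 300 and 400), the residuals fall in three vertical strips. Their spread widens as X increases (from -9.75 to 14.25 at X = 200, up to -17.25 to 26.75 at X = 400), which suggests that the error variance is not constant. The residuals are also large relative to the fitted values.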
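The residual plot asked for in a) can be sketched as follows. This is an illustrative sketch, not the original author's code; the data frame construction and the name resid_sd are assumptions introduced here.

```r
# Assumed construction of data10 from the table in the question.
data10 <- data.frame(
  Y = c(28, 75, 37, 53, 22, 58, 40, 96, 46, 52, 30, 69),
  X = c(200, 400, 300, 400, 200, 300, 300, 400, 200, 400, 200, 300)
)
model10 <- lm(Y ~ X, data = data10)

# Residuals on the vertical axis, speed setting on the horizontal axis.
plot(data10$X, resid(model10),
     xlab = "X (speed setting)", ylab = "Residual")
abline(h = 0, lty = 2)  # reference line at zero

# Standard deviation of the residuals within each speed setting
# (resid_sd is a hypothetical helper name).
resid_sd <- tapply(resid(model10), data10$X, sd)
resid_sd
```

A within-group standard deviation that grows from one speed setting to the next is a numeric counterpart to the widening strips in the plot.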
b)
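> plot(model10$model$X, model10$residuals, xlab = "X", ylab = "Residuals", type = "p")
> summary(lm(model10$residuals ~ data10$X))

Call:
lm(formula = model10$residuals ~ data10$X)

Residuals:
    Min      1Q  Median      3Q     Max
-17.250 -11.250  -2.750   9.188  26.750

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.467e-17  1.673e+01       0        1
data10$X    -3.140e-18  5.381e-02       0        1

Regressing the OLS residuals on X gives a zero intercept and slope (up to rounding error) by construction, so this fit cannot show how the spread changes with X. The absolute and squared residuals are needed for that.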
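The two plots requested in b) could be produced along the following lines. This is a hedged sketch: the data frame construction and the names abs_res, sq_res, abs_fit and sq_fit are assumptions, and comparing R-squared values is only one rough way to judge which trend line fits better.

```r
# Assumed construction of data10 from the table in the question.
data10 <- data.frame(
  Y = c(28, 75, 37, 53, 22, 58, 40, 96, 46, 52, 30, 69),
  X = c(200, 400, 300, 400, 200, 300, 300, 400, 200, 400, 200, 300)
)
model10 <- lm(Y ~ X, data = data10)

abs_res <- abs(resid(model10))  # absolute residuals
sq_res  <- resid(model10)^2     # squared residuals

# Regress each transformed residual on X to get a trend line.
abs_fit <- lm(abs_res ~ X, data = data10)
sq_fit  <- lm(sq_res ~ X, data = data10)

# Side-by-side scatter plots with their fitted lines.
op <- par(mfrow = c(1, 2))
plot(data10$X, abs_res, xlab = "X", ylab = "|residual|")
abline(abs_fit)
plot(data10$X, sq_res, xlab = "X", ylab = "residual^2")
abline(sq_fit)
par(op)

# R-squared of each trend line as a rough comparison.
c(abs = summary(abs_fit)$r.squared, sq = summary(sq_fit)$r.squared)
```

Both trend lines should slope upward if the error variance increases with the speed setting.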
c)
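Line of regression of Y on X, i.e. Y = bo + b1 X. Note that in this table the column labeled X holds the defect counts and the column labeled Y holds the speed settings, so the roles are swapped relative to parts a) and b), and the fit below is unweighted.
X | Y | (Xi - Mean)^2 | (Yi - Mean)^2 | (Xi-Mean)*(Yi-Mean) |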
28 | 200 | 506.25 | 10000 | 2250 |
75 | 400 | 600.25 | 10000 | 2450 |
37 | 300 | 182.25 | 0 | 0 |
53 | 400 | 6.25 | 10000 | 250 |
22 | 200 | 812.25 | 10000 | 2850 |
58 | 300 | 56.25 | 0 | 0 |
40 | 300 | 110.25 | 0 | 0 |
96 | 400 | 2070.25 | 10000 | 4550 |
46 | 200 | 20.25 | 10000 | 450 |
52 | 400 | 2.25 | 10000 | 150 |
30 | 200 | 420.25 | 10000 | 2050 |
69 | 300 | 342.25 | 0 | 0 |
calculation procedure for regression
mean of X = sum ( X / n ) = 50.5
mean of Y = sum ( Y / n ) = 300
sum ( (Xi - Mean)^2 ) = 5129
sum ( (Yi - Mean)^2 ) = 80000
sum ( (Xi-Mean)*(Yi-Mean) ) = 15000
b1 = sum ( (Xi-Mean)*(Yi-Mean) ) / sum ( (Xi - Mean)^2 )
= 15000 / 5129
= 2.925
bo = sum ( Y / n ) - b1 * sum ( X / n )
bo = 300 - 2.925*50.5 = 152.31
The fitted regression equation is Y' = 152.31 + 2.925 X, with bo = 152.31 and b1 = 2.925.
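The hand computation above can be checked with a few lines of R. This sketch keeps the roles used in the table, where the column labeled X is the defect count and the column labeled Y is the speed setting; the variable names defects, speed, sxx and sxy are introduced here for illustration only.

```r
# Roles as in the hand computation: "X" = defects, "Y" = speed.
defects <- c(28, 75, 37, 53, 22, 58, 40, 96, 46, 52, 30, 69)
speed   <- c(200, 400, 300, 400, 200, 300, 300, 400, 200, 400, 200, 300)

# Sums of squares and cross products about the means.
sxx <- sum((defects - mean(defects))^2)                        # 5129
sxy <- sum((defects - mean(defects)) * (speed - mean(speed)))  # 15000

# Slope and intercept from the usual least squares formulas.
b1 <- sxy / sxx
b0 <- mean(speed) - b1 * mean(defects)
round(c(b0 = b0, b1 = b1), 3)
```

The intercept comes out as 152.31 only when the unrounded slope is carried through; using the rounded 2.925 gives a slightly different value.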
Standard error of the regression of Y on X, i.e. Y = bo + b1 X
Xi | Yi | Y'=152.31+2.925*X | Y-Y' | (Y-Y')^2 |
28 | 200 | 234.21 | -34.21 | 1170.324 |
75 | 400 | 371.685 | 28.315 | 801.739 |
37 | 300 | 260.535 | 39.465 | 1557.486 |
53 | 400 | 307.335 | 92.665 | 8586.802 |
22 | 200 | 216.66 | -16.66 | 277.556 |
58 | 300 | 321.96 | -21.96 | 482.242 |
40 | 300 | 269.31 | 30.69 | 941.876 |
96 | 400 | 433.11 | -33.11 | 1096.272 |
46 | 200 | 286.86 | -86.86 | 7544.66 |
52 | 400 | 304.41 | 95.59 | 9137.448 |
30 | 200 | 240.06 | -40.06 | 1604.804 |
69 | 300 | 354.135 | -54.135 | 2930.598 |
Standard error = Sqrt( sum ( Y - Y' )^2 / (n - 2) )
sum ( Y - Y' )^2 = 36131.807
Standard error = Sqrt( 36131.807 / 10 ) = 60.11
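The computation above is an ordinary, unweighted fit. The weighted least squares fit that c) asks for (defects Y regressed on speed X, with weights derived from the squared residuals) could be sketched as below. This is one common recipe, not necessarily the intended one: it assumes the error variance is modeled as a linear function of X, and the names sq_res, var_fit, v_hat, w and model10_wls are hypothetical.

```r
# Assumed construction of data10 from the table in the question.
data10 <- data.frame(
  Y = c(28, 75, 37, 53, 22, 58, 40, 96, 46, 52, 30, 69),
  X = c(200, 400, 300, 400, 200, 300, 300, 400, 200, 400, 200, 300)
)
model10 <- lm(Y ~ X, data = data10)

# Step 1: estimate the variance function by regressing the squared
# OLS residuals on X.
sq_res  <- resid(model10)^2
var_fit <- lm(sq_res ~ X, data = data10)

# Step 2: weights are reciprocals of the fitted variances.
# This only makes sense if every fitted variance is positive.
v_hat <- fitted(var_fit)
stopifnot(all(v_hat > 0))
w <- 1 / v_hat

# Step 3: refit Y on X using these weights.
model10_wls <- lm(Y ~ X, data = data10, weights = w)

# Estimates and standard errors from a) and c) side by side.
cbind(
  OLS    = coef(summary(model10))[, "Estimate"],
  OLS_se = coef(summary(model10))[, "Std. Error"],
  WLS    = coef(summary(model10_wls))[, "Estimate"],
  WLS_se = coef(summary(model10_wls))[, "Std. Error"]
)
```

With this recipe the weighted estimates stay close to the OLS values from a). Since the residual spread grows with X, the standard errors from the weighted fit are generally the more trustworthy of the two, because the OLS standard errors assume a constant error variance.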