Copier maintenance. The Tri-City Office Equipment Corporation sells an imported copier on a franchise basis and performs preventive maintenance and repair service on this copier. The data below have been collected from 45 recent calls on users to perform routine preventive maintenance service; for each call, X is the number of copiers serviced and Y is the total number of minutes spent by the service person. Assume that first-order regression model (1.1) is appropriate.
(a) Obtain the estimated regression function.
(b) Plot the estimated regression function and the data. How well does the estimated regression function fit the data?
(c) Interpret b0 in your estimated regression function. Does b0 provide any relevant information here? Explain.
(d) Obtain a point estimate of the mean service time when X = 5 copiers are serviced.
Use R.
The data set, listed as Y X pairs, is: 20 2 60 4 46 3 41 2 12 1 137 10 68 5 89 5 4 1 32 2 144 9 156 10 93 6 36 3 72 4 100 8 105 7 131 8 127 10 57 4 66 5 101 7 109 7 74 5 134 9 112 7 18 2 73 5 111 7 96 6 123 8 90 5 20 2 28 2 3 1 57 4 86 5 132 9 112 7 27 1 131 9 34 2 27 2 61 4 77 5
################################################################
CODE IN R FOR LINEAR REGRESSION
################################################################
#installing packages (optional: lm() and plot() used below need only base R)
install.packages("boot")
install.packages("car")
install.packages("QuantPsyc")
install.packages("lmtest")
install.packages("MASS")
install.packages("sandwich")
install.packages("nortest")
install.packages("vars")
#importing libraries
library(boot)
library(car)
library(QuantPsyc)
library(lmtest)
library(sandwich)
library(vars)
library(nortest)
library(MASS)
#creating dataframe
Y <-
c(20,60,46,41,12,137,68,89,4,32,144,156,93,36,72,100,105,131,127,57,66,101,109,74,134,112,18,73,111,96,123,90,20,28,3,57,86,132,112,27,131,34,27,61,77)
X <-
c(2,4,3,2,1,10,5,5,1,2,9,10,6,3,4,8,7,8,10,4,5,7,7,5,9,7,2,5,7,6,8,5,2,2,1,4,5,9,7,1,9,2,2,4,5)
data <- data.frame(Y,X,stringsAsFactors=FALSE)
data
#fitting linear regression
fit<- lm(Y ~ X,data=data)
summary(fit)
# Get the predicted or fitted values
fitted(fit)
data$pred_Y <- fitted(fit)
data
#scatterplot
plot(data$X,data$Y, main='scatter plot with regression line',
xlab='number of copiers serviced', ylab='total number of minutes spent')
abline(lm(Y ~ X,data=data), col='red')
################################################################
OUTPUT
################################################################
> Y <-
c(20,60,46,41,12,137,68,89,4,32,144,156,93,36,72,100,105,131,127,57,66,101,109,74,134,112,18,73,111,96,123,90,20,28,3,57,86,132,112,27,131,34,27,61,77)
> X <-
c(2,4,3,2,1,10,5,5,1,2,9,10,6,3,4,8,7,8,10,4,5,7,7,5,9,7,2,5,7,6,8,5,2,2,1,4,5,9,7,1,9,2,2,4,5)
>
> data <- data.frame(Y,X,stringsAsFactors=FALSE)
> data
Y X
1 20 2
2 60 4
3 46 3
4 41 2
5 12 1
6 137 10
7 68 5
8 89 5
9 4 1
10 32 2
11 144 9
12 156 10
13 93 6
14 36 3
15 72 4
16 100 8
17 105 7
18 131 8
19 127 10
20 57 4
21 66 5
22 101 7
23 109 7
24 74 5
25 134 9
26 112 7
27 18 2
28 73 5
29 111 7
30 96 6
31 123 8
32 90 5
33 20 2
34 28 2
35 3 1
36 57 4
37 86 5
38 132 9
39 112 7
40 27 1
41 131 9
42 34 2
43 27 2
44 61 4
45 77 5
> #fitting linear regression
> fit<- lm(Y ~ X,data=data)
> summary(fit)
Call:
lm(formula = Y ~ X, data = data)
Residuals:
     Min       1Q   Median       3Q      Max
-22.7723  -3.7371   0.3334   6.3334  15.4039
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.5802     2.8039  -0.207    0.837
X            15.0352     0.4831  31.123   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 8.914 on 43 degrees of freedom
Multiple R-squared: 0.9575, Adjusted R-squared: 0.9565
F-statistic: 968.7 on 1 and 43 DF, p-value: < 2.2e-16
> fitted(fit)
        1         2         3         4         5         6         7         8
 29.49034  59.56084  44.52559  29.49034  14.45509 149.77232  74.59608  74.59608
        9        10        11        12        13        14        15        16
 14.45509  29.49034 134.73708 149.77232  89.63133  44.52559  59.56084 119.70183
       17        18        19        20        21        22        23        24
104.66658 119.70183 149.77232  59.56084  74.59608 104.66658 104.66658  74.59608
       25        26        27        28        29        30        31        32
134.73708 104.66658  29.49034  74.59608 104.66658  89.63133 119.70183  74.59608
       33        34        35        36        37        38        39        40
 29.49034  29.49034  14.45509  59.56084  74.59608 134.73708 104.66658  14.45509
       41        42        43        44        45
134.73708  29.49034  29.49034  59.56084  74.59608
> data$pred_Y <- fitted(fit)
> data
Y X pred_Y
1 20 2 29.49034
2 60 4 59.56084
3 46 3 44.52559
4 41 2 29.49034
5 12 1 14.45509
6 137 10 149.77232
7 68 5 74.59608
8 89 5 74.59608
9 4 1 14.45509
10 32 2 29.49034
11 144 9 134.73708
12 156 10 149.77232
13 93 6 89.63133
14 36 3 44.52559
15 72 4 59.56084
16 100 8 119.70183
17 105 7 104.66658
18 131 8 119.70183
19 127 10 149.77232
20 57 4 59.56084
21 66 5 74.59608
22 101 7 104.66658
23 109 7 104.66658
24 74 5 74.59608
25 134 9 134.73708
26 112 7 104.66658
27 18 2 29.49034
28 73 5 74.59608
29 111 7 104.66658
30 96 6 89.63133
31 123 8 119.70183
32 90 5 74.59608
33 20 2 29.49034
34 28 2 29.49034
35 3 1 14.45509
36 57 4 59.56084
37 86 5 74.59608
38 132 9 134.73708
39 112 7 104.66658
40 27 1 14.45509
41 131 9 134.73708
42 34 2 29.49034
43 27 2 29.49034
44 61 4 59.56084
45 77 5 74.59608
> plot(data$X,data$Y, main='scatter plot with regression line',
+ xlab='number of copiers serviced', ylab='total number of minutes spent')
> abline(lm(Y ~ X,data=data), col='red')
################################################################
SCATTER PLOT
################################################################
################################################################
ANSWERS
################################################################
(a) Obtain the estimated regression function.
Look at the summary(fit) section of the output:
------------------------------------------------------------------------------------------------
> summary(fit)
Call:
lm(formula = Y ~ X, data = data)
Residuals:
Min 1Q Median 3Q Max
-22.7723 -3.7371 0.3334 6.3334 15.4039
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.5802 2.8039 -0.207 0.837
X 15.0352 0.4831 31.123 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 8.914 on 43 degrees of freedom
Multiple R-squared: 0.9575, Adjusted R-squared: 0.9565
F-statistic: 968.7 on 1 and 43 DF, p-value: < 2.2e-16
------------------------------------------------------------------------------------------------
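If we write our estimated regression function as Ŷ = b0 + b1 X,
where
b0 = intercept
b1 = slope, or regression coefficient of Y on X, which represents the estimated change in the mean of Y for a one-unit increase in X,
then our estimated regression equation is Ŷ = (-0.5802) + (15.0352) * X. That is, each additional copier serviced adds about 15.04 minutes to the estimated mean service time.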
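As a quick check, the two coefficients can be pulled straight from the fitted model instead of being read off the printed summary. This is a minimal sketch that re-enters the data so it runs on its own; the names `copier`, `b0` and `b1` are illustrative choices, not part of the original script.

```r
# Data from the problem statement (Y = minutes spent, X = copiers serviced)
Y <- c(20,60,46,41,12,137,68,89,4,32,144,156,93,36,72,100,105,131,127,57,66,101,109,74,134,112,18,73,111,96,123,90,20,28,3,57,86,132,112,27,131,34,27,61,77)
X <- c(2,4,3,2,1,10,5,5,1,2,9,10,6,3,4,8,7,8,10,4,5,7,7,5,9,7,2,5,7,6,8,5,2,2,1,4,5,9,7,1,9,2,2,4,5)
copier <- data.frame(Y, X)

# Fit the first-order model and extract the least squares estimates
fit <- lm(Y ~ X, data = copier)
b0 <- unname(coef(fit)["(Intercept)"])  # intercept
b1 <- unname(coef(fit)["X"])            # slope

# Print the estimated regression function
cat(sprintf("Y-hat = %.4f + %.4f * X\n", b0, b1))
```

Using `coef()` avoids copying rounded values by hand when the estimates are needed in later calculations.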
(b) Plot the estimated regression function and the data. How well does the estimated regression function fit the data?
The scatter plot with the fitted line was produced above. The points lie close to the line across the whole range of X.
R-squared, also known as the coefficient of determination, is a statistical measure of how close the data are to the fitted regression line. It is commonly used as a measure of the goodness of fit of a model.
R-squared is always between 0 and 1 (0% and 100%). 0% indicates that the model explains none of the variability of the response data around its mean. 100% indicates that the model explains all the variability of the response data around its mean. In general, the higher the R-squared, the better the model fits your data.
In this case we have R-squared = 0.9575 (the value 0.9565 in the output is the adjusted R-squared), that is, 95.75% of the total variability in Y is explained by the linear regression model. So we can conclude that the fit is very good.
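The multiple R-squared reported by summary(fit) can be reproduced from its definition, 1 - SSE/SSTO. The sketch below re-enters the data so it is self-contained; the names `copier`, `sse`, `ssto` and `r2` are illustrative.

```r
# Data from the problem statement (Y = minutes spent, X = copiers serviced)
Y <- c(20,60,46,41,12,137,68,89,4,32,144,156,93,36,72,100,105,131,127,57,66,101,109,74,134,112,18,73,111,96,123,90,20,28,3,57,86,132,112,27,131,34,27,61,77)
X <- c(2,4,3,2,1,10,5,5,1,2,9,10,6,3,4,8,7,8,10,4,5,7,7,5,9,7,2,5,7,6,8,5,2,2,1,4,5,9,7,1,9,2,2,4,5)
copier <- data.frame(Y, X)
fit <- lm(Y ~ X, data = copier)

# Error sum of squares: variation left unexplained by the fitted line
sse <- sum(residuals(fit)^2)
# Total sum of squares: variation of Y around its mean
ssto <- sum((copier$Y - mean(copier$Y))^2)

# Coefficient of determination
r2 <- 1 - sse / ssto
print(r2)
print(summary(fit)$r.squared)  # should agree with r2
```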
(c) Interpret b0 in your estimated regression function. Does b0 provide any relevant information here? Explain.
The intercept (often labeled as the constant) is the point where the fitted line crosses the Y-axis, i.e. the estimated mean value of Y when X = 0. It has a meaningful interpretation only when X = 0 lies within the range of the observed data. If X never equals 0, the intercept merely anchors the line and tells you nothing about the relationship between X and Y.
Here our intercept is b0 = -0.5802. That is, the regression line meets the Y-axis at -0.5802 minutes when X = 0.
In our data X ranges from 1 to 10 and never equals 0; a call on which no copiers are serviced is not a service call, and a negative service time is impossible. So b0 provides no relevant information here. It is also not significantly different from 0 (p-value = 0.837).
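Both points about the intercept can be checked numerically: the observed range of X, and a confidence interval for the intercept. This is a sketch under the usual normal-error assumptions for model (1.1); the 95% level and the names `copier`, `x_range` and `ci_b0` are illustrative choices.

```r
# Data from the problem statement (Y = minutes spent, X = copiers serviced)
Y <- c(20,60,46,41,12,137,68,89,4,32,144,156,93,36,72,100,105,131,127,57,66,101,109,74,134,112,18,73,111,96,123,90,20,28,3,57,86,132,112,27,131,34,27,61,77)
X <- c(2,4,3,2,1,10,5,5,1,2,9,10,6,3,4,8,7,8,10,4,5,7,7,5,9,7,2,5,7,6,8,5,2,2,1,4,5,9,7,1,9,2,2,4,5)
copier <- data.frame(Y, X)
fit <- lm(Y ~ X, data = copier)

# Observed range of X: X = 0 falls outside it
x_range <- range(copier$X)
print(x_range)

# 95% confidence interval for the intercept; if it contains 0,
# the intercept is not distinguishable from 0 at the 5% level
ci_b0 <- confint(fit, parm = "(Intercept)", level = 0.95)
print(ci_b0)
```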
(d) Obtain a point estimate of the mean service time when X = 5 copiers are serviced.
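Look at the pred_Y values in the output. When X = 5, the fitted value of Y is 74.59608 minutes.
We can also calculate this from the regression equation.
Ŷ = (-0.5802) + (15.0352) * X, put X = 5
Or, Ŷ = (-0.5802) + (15.0352) * 5
Or, Ŷ = 74.5958 when X = 5. The small difference from 74.59608 arises because the coefficients in the equation are rounded to four decimal places.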
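The same point estimate can be obtained with predict(), which uses the unrounded coefficients. A minimal sketch, re-entering the data so it runs on its own; `copier` and `y_hat_5` are illustrative names.

```r
# Data from the problem statement (Y = minutes spent, X = copiers serviced)
Y <- c(20,60,46,41,12,137,68,89,4,32,144,156,93,36,72,100,105,131,127,57,66,101,109,74,134,112,18,73,111,96,123,90,20,28,3,57,86,132,112,27,131,34,27,61,77)
X <- c(2,4,3,2,1,10,5,5,1,2,9,10,6,3,4,8,7,8,10,4,5,7,7,5,9,7,2,5,7,6,8,5,2,2,1,4,5,9,7,1,9,2,2,4,5)
copier <- data.frame(Y, X)
fit <- lm(Y ~ X, data = copier)

# Point estimate of the mean service time when 5 copiers are serviced
y_hat_5 <- predict(fit, newdata = data.frame(X = 5))
print(y_hat_5)
```

The column name in `newdata` must match the predictor name used in the model formula (here `X`).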
[Figure: "scatter plot with regression line"; x-axis: number of copiers serviced, y-axis: total number of minutes spent]