y | x1 | x2 |
13 | 20 | 3 |
1 | 15 | 2 |
11 | 23 | 2 |
2 | 10 | 4 |
20 | 30 | 1 |
15 | 21 | 4 |
27 | 38 | 0 |
5 | 18 | 2 |
26 | 24 | 5 |
1 | 16 | 2 |
A manufacturer recorded the number of defective items (y) produced on a given day by each of ten machine operators and also recorded the average output per hour (x1) for each operator and the time in weeks from the last machine service (x2).
a. What is the least-squares prediction equation?
b. Is there evidence to indicate that both independent variables contribute significantly to the prediction of y? Why or why not?
c. Using the original model, how good is it? Give a quantitative answer and then explain your answer in a way that a non-statistician could understand.
d. Use the prediction equation to predict the number of defective items produced for an operator whose average output per hour is 25 and whose machine was serviced three weeks ago.
The regression (least-squares) output is attached in the image below.
Question (a)
The least-squares prediction equation is
Y = 1.4755 * X1 + 3.8192 * X2 + (-29.1721)
where Y is the number of defective items,
X1 is the average output per hour for each operator, and
X2 is the time in weeks since the last machine service.
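The coefficients can be checked directly from the ten rows of data. The following is a minimal sketch, assuming NumPy is available; the variable names (`X`, `coef`, `b0`, `b1`, `b2`) are chosen here for illustration and do not come from the original regression output. It should reproduce the equation above up to rounding.

```python
import numpy as np

# Data from the table: defective items (y), average output per hour (x1),
# weeks since the last machine service (x2)
y = np.array([13, 1, 11, 2, 20, 15, 27, 5, 26, 1], dtype=float)
x1 = np.array([20, 15, 23, 10, 30, 21, 38, 18, 24, 16], dtype=float)
x2 = np.array([3, 2, 2, 4, 1, 4, 0, 2, 5, 2], dtype=float)

# Design matrix with a leading column of ones for the intercept
X = np.column_stack([np.ones_like(x1), x1, x2])

# Ordinary least squares: choose b to minimise ||X b - y||^2
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = coef

print(f"Y = {b1:.4f} * X1 + {b2:.4f} * X2 + ({b0:.4f})")
```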
Question (b)
Yes. Both independent variables, X1 and X2, contribute significantly to the prediction of Y because the p-value of each coefficient is below the 0.05 significance level.
The p-value for X1 is 0.000000087 and the p-value for X2 is 0.00001125, both far smaller than 0.05.
So there is evidence that each of X1 and X2 is a statistically significant predictor of the dependent variable Y.
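The significance of each coefficient comes from its t-statistic (estimate divided by standard error) on n - 3 = 7 degrees of freedom. Below is a minimal sketch, assuming NumPy only; the names `t_stats` and `T_CRIT` are illustrative, and instead of computing p-values it compares each t-statistic with 2.365, the two-sided 5% critical value of t with 7 degrees of freedom taken from standard tables.

```python
import numpy as np

y = np.array([13, 1, 11, 2, 20, 15, 27, 5, 26, 1], dtype=float)
x1 = np.array([20, 15, 23, 10, 30, 21, 38, 18, 24, 16], dtype=float)
x2 = np.array([3, 2, 2, 4, 1, 4, 0, 2, 5, 2], dtype=float)

X = np.column_stack([np.ones_like(x1), x1, x2])
n, k = X.shape  # 10 observations, 3 estimated parameters

# Coefficient estimates via the normal equations
XtX_inv = np.linalg.inv(X.T @ X)
coef = XtX_inv @ X.T @ y

# Residual variance estimate on n - k = 7 degrees of freedom
resid = y - X @ coef
mse = resid @ resid / (n - k)

# Standard errors and t-statistics for intercept, X1, X2
se = np.sqrt(mse * np.diag(XtX_inv))
t_stats = coef / se

T_CRIT = 2.365  # two-sided 5% critical value, t distribution, 7 df (table value)
for name, t in zip(["intercept", "X1", "X2"], t_stats):
    verdict = "significant" if abs(t) > T_CRIT else "not significant"
    print(f"{name}: t = {t:.2f} ({verdict} at the 5% level)")
```

A t-statistic far beyond the critical value corresponds to a very small p-value, which is consistent with the p-values quoted for X1 and X2.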
Question (c)
The fit of the model is judged by the R Square value and the Significance F value.
The R Square value from the summary output is 0.98642, which implies that the model fits the data very well.
The independent variables explain 98.64% of the variance of the dependent variable, which is a very good result for the model.
The Significance F value is 0.000000291, which is far below the 0.05 significance level. Hence the model as a whole is statistically significant.
For a non-statistician: knowing an operator's average output per hour and the number of weeks since the machine was last serviced lets us predict the number of defective items quite accurately. Almost all of the operator-to-operator differences in defects seen in this data are accounted for by these two factors.
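The R Square and overall F figures can be recomputed from the sums of squares. This is a minimal sketch, assuming NumPy only; the names `sse`, `sst`, `r2`, and `f_stat` are illustrative. It reports the F statistic rather than the Significance F p-value, and for reference the 5% critical value of F with (2, 7) degrees of freedom from standard tables is about 4.74.

```python
import numpy as np

y = np.array([13, 1, 11, 2, 20, 15, 27, 5, 26, 1], dtype=float)
x1 = np.array([20, 15, 23, 10, 30, 21, 38, 18, 24, 16], dtype=float)
x2 = np.array([3, 2, 2, 4, 1, 4, 0, 2, 5, 2], dtype=float)

X = np.column_stack([np.ones_like(x1), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

n = len(y)
p = 2  # number of independent variables

# Sums of squares: residual (unexplained) and total
sse = np.sum((y - X @ coef) ** 2)
sst = np.sum((y - y.mean()) ** 2)

# R Square: share of the variance in y explained by the model
r2 = 1 - sse / sst

# Overall F statistic: explained mean square over residual mean square
f_stat = ((sst - sse) / p) / (sse / (n - p - 1))

print(f"R Square = {r2:.5f}")
print(f"F = {f_stat:.1f} on ({p}, {n - p - 1}) degrees of freedom")
```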
Question (d)
The least-squares prediction equation is
Y = 1.4755 * X1 + 3.8192 * X2 + (-29.1721)
Here the given values are X1 = 25 and X2 = 3.
So Y = 1.4755 * 25 + 3.8192 * 3 + (-29.1721)
= 36.8875 + 11.4576 - 29.1721
= 19.1730
The predicted number of defective items is about 19.17, or roughly 19 items.
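The substitution can be double-checked with a few lines of code. This is a minimal sketch using the rounded coefficients from the prediction equation; the function name `predict_defects` is hypothetical and introduced only for illustration.

```python
# Rounded coefficients from the least-squares prediction equation
B0 = -29.1721  # intercept
B1 = 1.4755    # coefficient of X1 (average output per hour)
B2 = 3.8192    # coefficient of X2 (weeks since the last machine service)


def predict_defects(x1: float, x2: float) -> float:
    """Evaluate Y = B1 * X1 + B2 * X2 + B0 for one operator."""
    return B1 * x1 + B2 * x2 + B0


prediction = predict_defects(25, 3)
print(f"Predicted number of defective items: {prediction:.4f}")
```

Because the coefficients are rounded to four decimal places, the result may differ from the full-precision regression output in the last few digits.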