A random sample of 50 customer orders was taken for wash-down motors used in the food processing industry. The order size is the number of motors ordered, and the total cost is the total production cost for that order.
Order Size | Total Cost |
28 | $ 7,531 |
27 | $ 6,329 |
34 | $ 8,413 |
30 | $ 7,793 |
19 | $ 5,360 |
21 | $ 4,838 |
11 | $ 2,551 |
17 | $ 3,899 |
33 | $ 8,326 |
23 | $ 5,465 |
7 | $ 2,283 |
23 | $ 5,413 |
18 | $ 4,238 |
26 | $ 6,911 |
20 | $ 6,315 |
29 | $ 8,243 |
10 | $ 2,866 |
22 | $ 6,775 |
14 | $ 4,289 |
6 | $ 1,475 |
15 | $ 3,590 |
35 | $ 9,439 |
23 | $ 6,760 |
18 | $ 5,170 |
31 | $ 7,780 |
21 | $ 4,896 |
35 | $ 8,816 |
29 | $ 8,116 |
21 | $ 6,212 |
24 | $ 5,551 |
27 | $ 7,080 |
36 | $ 9,826 |
24 | $ 6,129 |
20 | $ 5,094 |
13 | $ 3,568 |
14 | $ 3,738 |
19 | $ 5,332 |
14 | $ 3,286 |
28 | $ 6,664 |
24 | $ 5,990 |
27 | $ 7,093 |
16 | $ 3,975 |
36 | $ 9,046 |
20 | $ 4,906 |
19 | $ 5,324 |
7 | $ 2,734 |
32 | $ 8,138 |
21 | $ 5,376 |
29 | $ 7,763 |
23 | $ 5,964 |
Independent variable: Order size
Dependent variable: Total cost
----
Scatter plot:
The scatter plot indicates a strong positive linear relationship between order size and total cost.
---
Ʃx = | 1119 |
Ʃy = | 292669 |
Ʃxy = | 7297616 |
Ʃx² = | 28031 |
Ʃy² = | 1911167665 |
Sample size, n = | 50 |
x̅ = Ʃx/n = 1119/50 = | 22.38 |
y̅ = Ʃy/n = 292669/50 = | 5853.38 |
SSxx = Ʃx² - (Ʃx)²/n = 28031 - (1119)²/50 = | 2987.78 |
SSyy = Ʃy² - (Ʃy)²/n = 1911167665 - (292669)²/50 = | 198064793.78 |
SSxy = Ʃxy - (Ʃx)(Ʃy)/n = 7297616 - (1119)(292669)/50 = | 747683.78 |
Correlation coefficient, r = SSxy/√(SSxx*SSyy) = 747683.78/√(2987.78*198064793.78) = 0.9719
Since r is close to +1, there is a strong positive linear relationship between order size and total cost.
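As a cross-check on the arithmetic above, the following Python sketch recomputes SSxx, SSyy, SSxy and r from the printed sums. The variable names are illustrative choices and are not part of the original solution.

```python
import math

# Summary sums as printed in the solution
n = 50
sum_x = 1119
sum_y = 292669
sum_xy = 7297616
sum_x2 = 28031
sum_y2 = 1911167665

# Corrected sums of squares and cross-products
ss_xx = sum_x2 - sum_x**2 / n
ss_yy = sum_y2 - sum_y**2 / n
ss_xy = sum_xy - sum_x * sum_y / n

# Pearson correlation coefficient
r = ss_xy / math.sqrt(ss_xx * ss_yy)

print(f"SSxx = {ss_xx:.2f}, SSyy = {ss_yy:.2f}, SSxy = {ss_xy:.2f}")
print(f"r = {r:.4f}")
```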
----
Slope, b = SSxy/SSxx = 747683.78/2987.78 = 250.24727
y-intercept, a = y̅ -b* x̅ = 5853.38 - (250.24727)*22.38 = 252.84616
Regression equation :
ŷ = 252.8462 + (250.2473) x
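The same line can also be fitted directly from the 50 raw (order size, total cost) pairs, which checks the summary sums as well as the slope and intercept. This is a minimal sketch using only the standard library; `fit_line` is a hypothetical helper name, not something from the original solution.

```python
# (order size, total cost in $) pairs, in the order listed in the table
orders = [
    (28, 7531), (27, 6329), (34, 8413), (30, 7793), (19, 5360),
    (21, 4838), (11, 2551), (17, 3899), (33, 8326), (23, 5465),
    (7, 2283), (23, 5413), (18, 4238), (26, 6911), (20, 6315),
    (29, 8243), (10, 2866), (22, 6775), (14, 4289), (6, 1475),
    (15, 3590), (35, 9439), (23, 6760), (18, 5170), (31, 7780),
    (21, 4896), (35, 8816), (29, 8116), (21, 6212), (24, 5551),
    (27, 7080), (36, 9826), (24, 6129), (20, 5094), (13, 3568),
    (14, 3738), (19, 5332), (14, 3286), (28, 6664), (24, 5990),
    (27, 7093), (16, 3975), (36, 9046), (20, 4906), (19, 5324),
    (7, 2734), (32, 8138), (21, 5376), (29, 7763), (23, 5964),
]


def fit_line(pairs):
    """Return (intercept a, slope b) of the least-squares line y = a + b*x."""
    n = len(pairs)
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    sum_xy = sum(x * y for x, y in pairs)
    sum_x2 = sum(x * x for x, _ in pairs)
    ss_xx = sum_x2 - sum_x**2 / n
    ss_xy = sum_xy - sum_x * sum_y / n
    b = ss_xy / ss_xx                 # slope
    a = sum_y / n - b * sum_x / n     # intercept = y-bar - b * x-bar
    return a, b


a, b = fit_line(orders)
print(f"y-hat = {a:.4f} + {b:.4f} x")
```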
----
Slope Hypothesis test:
Null and alternative hypothesis:
Ho: β₁ = 0
Ha: β₁ ≠ 0
Slope, b = 250.2473
Sum of squared errors, SSE = SSyy - SSxy²/SSxx = 198064793.78 - (747683.78)²/2987.78 = 10958971.1
Standard error of the estimate, se = √(SSE/(n-2)) = √(10958971.1/(50-2)) = 477.81994
Test statistic:
t = b/(se/√SSxx) = 250.2473 /(477.8199/√2987.78) = 28.6272
df = n-2 = 48
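p-value = T.DIST.2T(ABS(28.6272), 48) ≈ 0.0000 (smaller than 0.0001)
Conclusion:
Since the p-value is less than α = 0.05, reject the null hypothesis.
The slope is significantly different from zero, so order size is a significant linear predictor of total cost.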
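The test statistic can be reproduced with a few lines of Python. This sketch takes SSxx, SSyy and SSxy as printed earlier and compares |t| with the two-tailed critical value for df = 48 at the 0.05 level (2.0106, from T.INV.2T). It does not compute the p-value, because the Python standard library has no t-distribution function; the variable names are illustrative.

```python
import math

n = 50
ss_xx = 2987.78
ss_yy = 198064793.78
ss_xy = 747683.78
t_crit = 2.0106  # T.INV.2T(0.05, 48), taken from the solution

b = ss_xy / ss_xx                    # estimated slope
sse = ss_yy - ss_xy**2 / ss_xx       # sum of squared errors
s_e = math.sqrt(sse / (n - 2))       # standard error of the estimate
se_b = s_e / math.sqrt(ss_xx)        # standard error of the slope
t_stat = b / se_b                    # test statistic for Ho: slope = 0

reject_h0 = abs(t_stat) > t_crit
print(f"SSE = {sse:.1f}, se = {s_e:.4f}, t = {t_stat:.4f}")
print("Reject Ho" if reject_h0 else "Fail to reject Ho")
```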
-----
Critical value, t_c = T.INV.2T(0.05, 48) = 2.0106
95% Confidence interval for slope:
Lower limit = b - t_c*se/√SSxx = 250.2473 - 2.0106*477.8199/√2987.78 = 232.6711
Upper limit = b + t_c*se/√SSxx = 250.2473 + 2.0106*477.8199/√2987.78 = 267.8234
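The interval limits can be checked with the short sketch below, which plugs in the rounded values printed in this solution (slope, standard error of the estimate, SSxx and the critical value). The variable names are assumptions made for illustration, and the last decimal places may differ slightly from the spreadsheet because of rounding.

```python
import math

b = 250.2473       # estimated slope
s_e = 477.8199     # standard error of the estimate
ss_xx = 2987.78
t_crit = 2.0106    # two-tailed critical value, df = 48, alpha = 0.05

se_b = s_e / math.sqrt(ss_xx)   # standard error of the slope
margin = t_crit * se_b          # margin of error
lower, upper = b - margin, b + margin
contains_zero = lower <= 0 <= upper

print(f"95% CI for the slope: ({lower:.2f}, {upper:.2f})")
```

Because the interval excludes 0, it agrees with the rejection of Ho in the slope test: with 95% confidence, each additional motor adds roughly $232.67 to $267.82 to the total production cost.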
-----
Predicted value of y at x = 10
ŷ = 252.8462 + (250.2473) * 10 = 2755.3188
Predicted value of y at x = 20
ŷ = 252.8462 + (250.2473) * 20 = 5257.7915
Predicted value of y at x = 30
ŷ = 252.8462 + (250.2473) * 30 = 7760.2642
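The three predictions can be reproduced with a small helper built on the fitted equation. `predict_total_cost` is a hypothetical function name; the coefficients are the unrounded values from the slope and intercept calculation.

```python
INTERCEPT = 252.84616   # a, from the solution
SLOPE = 250.24727       # b, from the solution


def predict_total_cost(order_size):
    """Predicted total cost ($) for an order of `order_size` motors."""
    return INTERCEPT + SLOPE * order_size


for size in (10, 20, 30):
    print(f"x = {size}: y-hat = {predict_total_cost(size):.2f}")
```

All three order sizes lie inside the observed range of the sample (6 to 36 motors), so these predictions are interpolations rather than extrapolations.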