In: Statistics and Probability
Eclipse Engineering provides services of structural engineering. They just opened a new branch in Portland, OR. One of their products is called Structural Insulated Panels, or SIPs. It is a type of foam insulation that replaces typical wall and roof framing for residential or commercial buildings. Clients are billed a fixed fee for each project based on square footage of the building, square footage of the panel area, and number of panels. Depending on how long a project takes to complete, there usually ends up being additional cost not billed to the client (fee burned). Eclipse engineering is using simple linear regression to model their costs. We will investigate the data to find the best model that can be used to predict what the fee burned will be for a particular project!
Fee Burned | Panel Area | Floor Area | # Panels |
1759.92 | 2832 | 2482 | 12 |
2225 | 5670 | 2566 | 10 |
2885 | 6740 | 2632 | 23 |
1732.5 | 4478 | 1204 | 20 |
1662.5 | 4303 | 1680 | 11 |
390 | 2493 | 805 | 6 |
683 | 1804 | 1012 | 10 |
1138 | 14459 | 1296 | 9 |
683 | 2350 | 1184 | 18 |
748 | 2198 | 1215 | 10 |
130 | 3215 | 1209 | 9 |
163 | 2497 | 987 | 12 |
1108 | 4168 | 2270 | 14 |
857.5 | 2988 | 1160 | 9 |
928 | 4481 | 1853 | 11 |
458 | 1040 | 326 | 7 |
683 | 3350 | 1610 | 8 |
Please show all working by using Excel formulas/commands. Please write out any explanations where necessary. Please show all formulas and working out. Thank you. Please only use Excel commands and show which was used. Please show where you got your numbers from when talking about equations and etc. Thank you.
SCATTERPLOT
panel area(X1) | floor area(X2) | # panel(X3) | free burned(y) |
2832 | 2482 | 12 | 1759.92 |
5670 | 2566 | 10 | 2225 |
6740 | 2632 | 23 | 2885 |
4478 | 1204 | 20 | 1732.5 |
4303 | 1680 | 11 | 1662.5 |
2493 | 805 | 6 | 390 |
1804 | 1012 | 10 | 683 |
14459 | 1296 | 9 | 1138 |
2350 | 1184 | 18 | 683 |
2198 | 1215 | 10 | 748 |
3215 | 1209 | 9 | 130 |
2497 | 987 | 12 | 163 |
4168 | 2270 | 14 | 1108 |
2988 | 1160 | 9 | 857.5 |
4481 | 1853 | 11 | 928 |
1040 | 326 | 7 | 458 |
3350 | 1610 | 8 | 683 |
LINEAR EQUATION : Y = a +bx
a = interceot
b = slope
b= (som of x.y * mean of x * mean of y)/ standard deviation of x * standard deviation of y
a = y bar - b * x bar
between panel area and fee burned- linear equation
panel area(X1) | fee burned(y) |
2832 | 1759.92 |
5670 | 2225 |
6740 | 2885 |
4478 | 1732.5 |
4303 | 1662.5 |
2493 | 390 |
1804 | 683 |
14459 | 1138 |
2350 | 683 |
2198 | 748 |
3215 | 130 |
2497 | 163 |
4168 | 1108 |
2988 | 857.5 |
4481 | 928 |
1040 | 458 |
3350 | 683 |
linear equation : y= 0.0996*x +667.98
between floor area and fee burned- linear equation
floor area(X2) | free burned(y) |
2482 | 1759.92 |
2566 | 2225 |
2632 | 2885 |
1204 | 1732.5 |
1680 | 1662.5 |
805 | 390 |
1012 | 683 |
1296 | 1138 |
1184 | 683 |
1215 | 748 |
1209 | 130 |
987 | 163 |
2270 | 1108 |
1160 | 857.5 |
1853 | 928 |
326 | 458 |
1610 | 683 |
linear equation : y = 0.8838*x - 252.57
between# panel and fee burned- linear equation
# panel(X3) | free burned(y) |
12 | 1759.92 |
10 | 2225 |
23 | 2885 |
20 | 1732.5 |
11 | 1662.5 |
6 | 390 |
10 | 683 |
9 | 1138 |
18 | 683 |
10 | 748 |
9 | 130 |
12 | 163 |
14 | 1108 |
9 | 857.5 |
11 | 928 |
7 | 458 |
8 | 683 |
linear equation : y = 98472*x -80.088
after removing the outlier in the panel area
14459-panel area | 1138- fee burned |
y = 0.4256*x -383.86
between panel area and fee burned after removal outlier
Y = -383.8617 + 0.4256 X1
Source |
DF |
Sum of Squares | Mean Square | F Statistic | P-value |
---|---|---|---|---|---|
Regression (between ŷi andyi bar) |
1 |
5953159.9054 |
5953159.9054 |
26.8510 |
0.0001391 |
Residual (between yi and ŷi) |
14 |
3103955.7000 |
221711.1214 |
||
Total(between yi andyi bar) |
15 |
9057115.6054 |
603807.7070 |
regression: sum of square = ( y hat - y bar)^2
residual : sum of square = (y - y hat )^2
F = mean square of regression/mean square of residual
Coeff |
SE | t-stat | lower t0.025(14) | upper t0.975(14) |
Stand Coeff |
p-value |
VIF |
|
---|---|---|---|---|---|---|---|---|
intercept | -383.8617 | 304.0027 | -1.2627 | -1035.8825 | 268.1592 | 0.000 | 0.2273 | |
slope | 0.4256 | 0.08212 | 5.1818 | 0.2494 | 0.6017 | 0.8107 | 0.0001391 | 1.0000 |
SE of intercept =
X = independent variable
x bar = mean of the independent variable
Se = square root of [(sum of square of residual / (n-2)]
t test of intercept = intercept/ SE
SE of slope =
Syx = square root of [(sum of square of residual / (n-2)]
slope t test = slope/ SE
Y and X relationship
R square (R2) equals 0.6573. It means
that the predictors (Xi) explain 65.7% of the variance
of Y.
Adjusted R square equals 0.6328.
The coefficient of correlation (R) equals 0.8107.
It means that there is a very strong direct relationship between
the predicted data (ŷ) and the observed data (y).
Goodness of fit
Overall regression: right-tailed, F(1,14) = 26.8510,
p-value = 0.0001391. Since p-value < α (0.05), we reject the
H0.
The linear regression model, Y = b0+
b1X1provides a better fit
All the independent variables (Xi) are significant.
The Y-intercept : two-tailed, T = -1.2627, p-value = 0.2273. Hence
intercept is not significantly different from zero. It is still
most likely recommended not to force b to be zero.
between floor area and fee burned - there is no outlier
Y = -252.5679 + 0.8838 X1
Source |
DF |
Sum of Squares | Mean Square | F Statistic | P-value |
---|---|---|---|---|---|
Regression (between ŷi andyi bar) |
1 |
5480818.3840 |
5480818.3840 |
22.9589 |
0.0002378 |
Residual (between yi and ŷi) |
15 |
3580839.9055 |
238722.6604 |
||
Total(between yi andyibar) |
16 |
9061658.2896 |
566353.6431 |
Coeff |
SE | t-stat | lower t0.025(15) | upper t0.975(15) |
Stand Coeff |
p-value |
VIF |
|
---|---|---|---|---|---|---|---|---|
intercept | -252.5679 | 300.8844 | -0.8394 | -893.8878 | 388.7521 | 0.000 | 0.4144 | |
slope | 0.8838 | 0.1844 | 4.7915 | 0.4906 | 1.2769 | 0.7777 | 0.0002378 | 1.0000 |
Y and X relationship
R square (R2) equals 0.6048. It means
that the predictors (Xi) explain 60.5% of the variance
of Y.
Adjusted R square equals 0.5785.
The coefficient of correlation (R) equals 0.7777.
It means that there is a strong direct relationship between the
predicted data (ŷ) and the observed data (y).
Goodness of fit
Overall regression: right-tailed, F(1,15) = 22.9589,
p-value = 0.0002378. Since p-value < α (0.05), we reject the
H0.
The linear regression model, Y = b0+
b1X1 provides a better fit t
All the independent variables (Xi) are significant.
The Y-intercept : two-tailed, T = -0.8394, p-value = 0.4144. Hence
intercept is not significantly different from zero. It is still
most likely recommended not to force b to be zero.
between # panel and fee burned - there is no outlier
y = -80.0880 + 98.4719 X1
Source |
DF |
Sum of Squares | Mean Square | F Statistic | P-value |
---|---|---|---|---|---|
Regression (between ŷi andyibar) |
1 |
3311716.2806 |
3311716.2806 |
8.6393 |
0.01015 |
Residual (between yi and ŷi) |
15 |
5749942.0089 |
383329.4673 |
||
Total(between yi andyibar) |
16 |
9061658.2896 |
566353.6431 |
Coeff |
SE | t-stat | lower t0.025(15) | upper t0.975(15) |
Stand Coeff |
p-value |
VIF |
|
---|---|---|---|---|---|---|---|---|
intercept | -80.0880 | 419.9374 | -0.1907 | -975.1634 | 814.9873 | 0.000 | 0.8513 | |
slope | 98.4719 | 33.5021 | 2.9393 | 27.0639 | 169.8800 | 0.6045 | 0.01015 | 1.0000 |
Y and X relationship
R square (R2) equals 0.3655. It means
that the predictors (Xi) explain 36.5% of the variance
of Y.
Adjusted R square equals 0.3232.
The coefficient of correlation (R) equals 0.6045.
It means that there is a strong direct relationship between the
predicted data (ŷ) and the observed data (y).
Goodness of fit
Overall regression: right-tailed, F(1,15) = 8.6393,
p-value = 0.01015. Since p-value < α (0.05), we reject the
H0.
The linear regression model, Y = b0+
b1X1, provides a better fit.
All the independent variables (Xi) are significant.
The Y-intercept : two-tailed, T = -0.1907, p-value = 0.8513. Hence
intercept is not significantly different from zero. It is still
most likely recommended not to force b to be zero.
so, the best model is panel area and fee burned relationship because there is a strong correlation after removal of the outlier and F value is highest p-value0.0001391.
6)
with three explanatory variable- panel area(x1) , floor area(x2) , # panel(x3) and fee burned
y = -740.43 + 0.0454749*X1 + 0.659653*X2 + 54.6017X3
Variable | Parameter | S.D. | T-STAT H0: parameter = 0 |
2-tail p-value | 1-tail p-value |
(Intercept) | -740.4 | 336.6 | -2.2000e+00 | 0.04651 | 0.02326 |
X1 | +0.04548 | 0.03758 | +1.2100e+00 | 0.2478 | 0.1239 |
X2 | +0.6596 | 0.1887 | +3.4960e+00 | 0.003946 | 0.001973 |
X3 | +54.6 | 26.01 | +2.1000e+00 | 0.05586 | 0.02793 |
Multiple Linear Regression - Regression Statistics | |
Multiple R | 0.8532 |
R-squared | 0.7279 |
Adjusted R-squared | 0.6651 |
F-TEST (value) | 11.59 |
F-TEST (DF numerator) | (4-1) =3 |
F-TEST (DF denominator) | 13 |
p-value | 0.0005619 |
Multiple Linear Regression - Residual Statistics | |
Residual Standard Deviation | 435.5 |
Sum Squared Residuals | 2.465e+06 |
WITH two explanatory variable - floor area and # panel
y = -650.516 + 0.722435X2 + 54.6614V3X3
Variable | Parameter | S.D. | T-STAT H0: parameter = 0 |
2-tail p-value | 1-tail p-value |
(Intercept) | -650.5 | 333.7 | -1.9500e+00 | 0.07155 | 0.03578 |
X2 | +0.7224 | 0.1844 | +3.9170e+00 | 0.001548 | 0.0007738 |
X3 | +54.66 | 26.43 | +2.0680e+00 | 0.05766 | 0.02883 |
Multiple Linear Regression - Regression Statistics | |
Multiple R | 0.835 |
R-squared | 0.6973 |
Adjusted R-squared | 0.654 |
F-TEST (value) | 16.12 |
F-TEST (DF numerator) | 2 |
F-TEST (DF denominator) | 14 |
p-value | 0.0002329 |
Multiple Linear Regression - Residual Statistics | |
Residual Standard Deviation | 442.6 |
Sum Squared Residuals | 2.743e+06 |
Based on F test and slope test of two models we can say that the second model is best ( two variable) because 69 % explained by this two variable if I include third variable help to explained extra 3 %. and coefficient of panel area is insignificant.
y = -740.43 + 0.0454749*X1 + 0.659653*X2 + 54.6017X3
x1 =5000
x2= 2500
x3 =11
y = -740.43 + 0.0454749*5000 + 0.659653*2500 + 54.6017*11
y = 1736.6957