Question

In: Statistics and Probability

Eclipse Engineering provides services of structural engineering. They just opened a new branch in Portland, OR....

Eclipse Engineering provides services of structural engineering. They just opened a new branch in Portland, OR. One of their products is called Structural Insulated Panels, or SIPs. It is a type of foam insulation that replaces typical wall and roof framing for residential or commercial buildings. Clients are billed a fixed fee for each project based on square footage of the building, square footage of the panel area, and number of panels. Depending on how long a project takes to complete, there usually ends up being additional cost not billed to the client (fee burned). Eclipse engineering is using simple linear regression to model their costs. We will investigate the data to find the best model that can be used to predict what the fee burned will be for a particular project!

  1. Create a scatter plot for each explanatory variable and show the linear equation on each of the 3 scatter plots. (Hint: you will have to rearrange your data to create the scatter plot to list x first, then y for each.)
  2. Remove the one obvious outlier you can see in the Panel Area. (Your scatter plots should update automatically.)
  3. Pull 3 simple linear regression models (with outlier removed) and determine which is the best model.
  4. Pull 2 more models. One with all three explanatory variables and one with some combination of 2 of the variables. Identify your best model.
  5. Write out the sample multiple regression equation and use it to predict the fee burned for taking on a project with panel area = 5,000, floor area = 2500, and panels = 11.
Fee Burned Panel Area Floor Area # Panels
1759.92 2832 2482 12
2225 5670 2566 10
2885 6740 2632 23
1732.5 4478 1204 20
1662.5 4303 1680 11
390 2493 805 6
683 1804 1012 10
1138 14459 1296 9
683 2350 1184 18
748 2198 1215 10
130 3215 1209 9
163 2497 987 12
1108 4168 2270 14
857.5 2988 1160 9
928 4481 1853 11
458 1040 326 7
683 3350 1610 8

Please show all working by using Excel formulas/commands. Please write out any explanations where necessary. Please show all formulas and working out. Thank you. Please only use Excel commands and show which was used. Please show where you got your numbers from when talking about equations and etc. Thank you.

Solutions

Expert Solution

SCATTERPLOT

panel area(X1) floor area(X2) # panel(X3) free burned(y)
2832 2482 12 1759.92
5670 2566 10 2225
6740 2632 23 2885
4478 1204 20 1732.5
4303 1680 11 1662.5
2493 805 6 390
1804 1012 10 683
14459 1296 9 1138
2350 1184 18 683
2198 1215 10 748
3215 1209 9 130
2497 987 12 163
4168 2270 14 1108
2988 1160 9 857.5
4481 1853 11 928
1040 326 7 458
3350 1610 8 683

LINEAR EQUATION : Y = a +bx

a = interceot

b = slope

b= (som of x.y * mean of x * mean of y)/ standard deviation of x * standard deviation of y

a = y bar - b * x bar

between panel area and fee burned- linear equation

panel area(X1) fee burned(y)
2832 1759.92
5670 2225
6740 2885
4478 1732.5
4303 1662.5
2493 390
1804 683
14459 1138
2350 683
2198 748
3215 130
2497 163
4168 1108
2988 857.5
4481 928
1040 458
3350 683

linear equation : y= 0.0996*x +667.98

between floor area and fee burned- linear equation

floor area(X2) free burned(y)
2482 1759.92
2566 2225
2632 2885
1204 1732.5
1680 1662.5
805 390
1012 683
1296 1138
1184 683
1215 748
1209 130
987 163
2270 1108
1160 857.5
1853 928
326 458
1610 683

linear equation : y = 0.8838*x - 252.57

between# panel and fee burned- linear equation

# panel(X3) free burned(y)
12 1759.92
10 2225
23 2885
20 1732.5
11 1662.5
6 390
10 683
9 1138
18 683
10 748
9 130
12 163
14 1108
9 857.5
11 928
7 458
8 683

linear equation : y = 98472*x -80.088

after removing the outlier in the panel area

14459-panel area 1138- fee burned

y = 0.4256*x -383.86

between panel area and fee burned after removal outlier

Y = -383.8617 + 0.4256 X1

Source

DF

Sum of Squares Mean Square F Statistic P-value
Regression
(between ŷi andyi bar)

1

5953159.9054

5953159.9054

26.8510

0.0001391

Residual
(between yi and ŷi)

14

3103955.7000

221711.1214

Total(between yi andyi bar)

15

9057115.6054

603807.7070

regression: sum of square = ( y hat - y bar)^2

residual : sum of square = (y - y hat )^2

F = mean square of regression/mean square of residual

Coeff

SE t-stat lower t0.025(14) upper t0.975(14)

Stand Coeff

p-value

VIF

intercept -383.8617 304.0027 -1.2627 -1035.8825 268.1592 0.000 0.2273
slope 0.4256 0.08212 5.1818 0.2494 0.6017 0.8107 0.0001391 1.0000

SE of intercept =

X = independent variable

x bar = mean of the independent variable

Se = square root of [(sum of square of residual / (n-2)]

t test of intercept = intercept/ SE

SE of slope =

Syx = square root of [(sum of square of residual / (n-2)]

slope t test = slope/ SE

Y and X relationship
R square (R2) equals 0.6573. It means that the predictors (Xi) explain 65.7% of the variance of Y.
Adjusted R square equals 0.6328.
The coefficient of correlation (R) equals 0.8107. It means that there is a very strong direct relationship between the predicted data (ŷ) and the observed data (y).

Goodness of fit
Overall regression: right-tailed, F(1,14) = 26.8510, p-value = 0.0001391. Since p-value < α (0.05), we reject the H0.
The linear regression model, Y = b0+ b1X1provides a better fit

All the independent variables (Xi) are significant.

The Y-intercept : two-tailed, T = -1.2627, p-value = 0.2273. Hence intercept is not significantly different from zero. It is still most likely recommended not to force b to be zero.

between floor area and fee burned - there is no outlier

Y = -252.5679 + 0.8838 X1

Source

DF

Sum of Squares Mean Square F Statistic P-value
Regression
(between ŷi andyi bar)

1

5480818.3840

5480818.3840

22.9589

0.0002378

Residual
(between yi and ŷi)

15

3580839.9055

238722.6604

Total(between yi andyibar)

16

9061658.2896

566353.6431

Coeff

SE t-stat lower t0.025(15) upper t0.975(15)

Stand Coeff

p-value

VIF

intercept -252.5679 300.8844 -0.8394 -893.8878 388.7521 0.000 0.4144
slope 0.8838 0.1844 4.7915 0.4906 1.2769 0.7777 0.0002378 1.0000

Y and X relationship
R square (R2) equals 0.6048. It means that the predictors (Xi) explain 60.5% of the variance of Y.
Adjusted R square equals 0.5785.
The coefficient of correlation (R) equals 0.7777. It means that there is a strong direct relationship between the predicted data (ŷ) and the observed data (y).

Goodness of fit
Overall regression: right-tailed, F(1,15) = 22.9589, p-value = 0.0002378. Since p-value < α (0.05), we reject the H0.
The linear regression model, Y = b0+ b1X1 provides a better fit t

All the independent variables (Xi) are significant.

The Y-intercept : two-tailed, T = -0.8394, p-value = 0.4144. Hence intercept is not significantly different from zero. It is still most likely recommended not to force b to be zero.

between # panel and fee burned - there is no outlier

y = -80.0880 + 98.4719 X1

Source

DF

Sum of Squares Mean Square F Statistic P-value
Regression
(between ŷi andyibar)

1

3311716.2806

3311716.2806

8.6393

0.01015

Residual
(between yi and ŷi)

15

5749942.0089

383329.4673

Total(between yi andyibar)

16

9061658.2896

566353.6431

Coeff

SE t-stat lower t0.025(15) upper t0.975(15)

Stand Coeff

p-value

VIF

intercept -80.0880 419.9374 -0.1907 -975.1634 814.9873 0.000 0.8513
slope 98.4719 33.5021 2.9393 27.0639 169.8800 0.6045 0.01015 1.0000

Y and X relationship
R square (R2) equals 0.3655. It means that the predictors (Xi) explain 36.5% of the variance of Y.
Adjusted R square equals 0.3232.
The coefficient of correlation (R) equals 0.6045. It means that there is a strong direct relationship between the predicted data (ŷ) and the observed data (y).

Goodness of fit
Overall regression: right-tailed, F(1,15) = 8.6393, p-value = 0.01015. Since p-value < α (0.05), we reject the H0.
The linear regression model, Y = b0+ b1X1, provides a better fit.

All the independent variables (Xi) are significant.

The Y-intercept : two-tailed, T = -0.1907, p-value = 0.8513. Hence intercept is not significantly different from zero. It is still most likely recommended not to force b to be zero.

so, the best model is panel area and fee burned relationship because there is a strong correlation after removal of the outlier and F value is highest p-value0.0001391.

6)

with three explanatory variable- panel area(x1) , floor area(x2) , # panel(x3) and fee burned

y = -740.43 + 0.0454749*X1 + 0.659653*X2 + 54.6017X3

Variable Parameter S.D. T-STAT
H0: parameter = 0
2-tail p-value 1-tail p-value
(Intercept) -740.4 336.6 -2.2000e+00 0.04651 0.02326
X1 +0.04548 0.03758 +1.2100e+00 0.2478 0.1239
X2 +0.6596 0.1887 +3.4960e+00 0.003946 0.001973
X3 +54.6 26.01 +2.1000e+00 0.05586 0.02793
Multiple Linear Regression - Regression Statistics
Multiple R 0.8532
R-squared 0.7279
Adjusted R-squared 0.6651
F-TEST (value) 11.59
F-TEST (DF numerator) (4-1) =3
F-TEST (DF denominator) 13
p-value 0.0005619
Multiple Linear Regression - Residual Statistics
Residual Standard Deviation 435.5
Sum Squared Residuals 2.465e+06

WITH two explanatory variable - floor area and # panel

y = -650.516 + 0.722435X2 + 54.6614V3X3

Variable Parameter S.D. T-STAT
H0: parameter = 0
2-tail p-value 1-tail p-value
(Intercept) -650.5 333.7 -1.9500e+00 0.07155 0.03578
X2 +0.7224 0.1844 +3.9170e+00 0.001548 0.0007738
X3 +54.66 26.43 +2.0680e+00 0.05766 0.02883
Multiple Linear Regression - Regression Statistics
Multiple R 0.835
R-squared 0.6973
Adjusted R-squared 0.654
F-TEST (value) 16.12
F-TEST (DF numerator) 2
F-TEST (DF denominator) 14
p-value 0.0002329
Multiple Linear Regression - Residual Statistics
Residual Standard Deviation 442.6
Sum Squared Residuals 2.743e+06

Based on F test and slope test of two models we can say that the second model is best ( two variable) because 69 % explained by this two variable if I include third variable help to explained extra 3 %. and coefficient of panel area is insignificant.

y = -740.43 + 0.0454749*X1 + 0.659653*X2 + 54.6017X3

x1 =5000

x2= 2500

x3 =11

y = -740.43 + 0.0454749*5000 + 0.659653*2500 + 54.6017*11

y = 1736.6957


Related Solutions

Your new client, Barbara, has just formed a new corporation that provides consulting services to couples...
Your new client, Barbara, has just formed a new corporation that provides consulting services to couples contemplating marriage. She has learned from her accountant that there will be items in her business that cause her financial accounting income to be different than her taxable income. Barbara wants to know what these income and expense items are and how she will compute her corporation’s taxable income. She also needs guidance on when her corporation will need to file taxes and make...
A bank provides financial services for customers. A customer approaches a bank branch and can open...
A bank provides financial services for customers. A customer approaches a bank branch and can open bank account(s). Before opening an account, the customer must fill a form for creating a customer record and then submit the form to a staff member. The system will create a new customer personal record based on the captured customer’s details from an online form. Consider the above processing requirements description of the Bank Management System application Your tasks are:            1.Draw the Entity...
The pottery shop opened a new branch this year. The following figures show the sales volume...
The pottery shop opened a new branch this year. The following figures show the sales volume and advertising expenses. Sales LY $900,000 Sales TY $1,050,000 Sales plan for next year $975,000 Advertising costs LY $28,500 Advertising costs TY $32,000 Advertising plan $27,500 a. What do the sales trends reflect? b. What do the advertising costs reflect? c. What is the percent of increase in sales from last year to next year's plan? d. What is the percent of increase or...
Western Technical Services is a small engineering firm in Colorado that provides a variety of technical...
Western Technical Services is a small engineering firm in Colorado that provides a variety of technical and drafting services. Western Technical Services employs three engineers, five drafters, and three secretaries. Western has developed a leading reputation in the design of low-capacity and medium-capacity heat exchangers for electrical equipment. A large backlog of jobs has accumulated, and the firm has decided it must complete the back orders. The firm wants to maximize the number of jobs finished each day to eliminate...
Anderson Engineering provides consulting services and custom installations of proprietary industrial waste water treatment facilities. They...
Anderson Engineering provides consulting services and custom installations of proprietary industrial waste water treatment facilities. They also operate a repair shop for repairs and maintenance of the treatment products. The repair facility costs include salaries, insurance and depreciation, and materials and parts. The supervisor would like to develop a model that predicts repair costs. Data for the past year is below: Month Jobs Repair costs January 55 $22,000 February 45 18,000 March 40 17,600 April 50 20,000 May 65 24,000...
Roseville Engineering provides watershed and design services for its customers. Total overhead costs this coming year...
Roseville Engineering provides watershed and design services for its customers. Total overhead costs this coming year are expected to be $8,000,000 ($2,000,000 in the Watershed Department, and $6,000,000 in the Design Department). If the company chooses to use the plantwide approach, overhead will be allocated using direct labor costs. Direct labor costs are expected to total $4,000,000. The Watershed Department expects to incur direct labor costs of $500,000, and the Design Department expects to work 120,000 direct labor hours. Required:...
A new nail salon business just opened, in their first week of business they decided that...
A new nail salon business just opened, in their first week of business they decided that thy would conduct a promotion in which a customer's bill can be randomly selected to receive a discount. When a customer's bill is printed, a program in the cash register randomly determines whether the customer will receive a discount on the bill. The program was written to generate a discount with a probability of 0.2, that is, giving a discount to 20 percent of...
Advance Engineering Corporation (AEC) provides paving and bridge maintenance services. Paving requires 7 hour of labor...
Advance Engineering Corporation (AEC) provides paving and bridge maintenance services. Paving requires 7 hour of labor and 2 tons of material, and bridge maintenance requires 2 hour of labor and 12 tons of material. Currently, 28 hours of labor and 24 tons of material are available. Each Paving contract contributes $50 to profit, and each Bridge Contract contributes $100 to profit. Formulate and solve an integer programing problem to maximize AEC’s profit. (Please solve using branch and bound method)
You are a new company that have just opened up. You need to implement a privileged...
You are a new company that have just opened up. You need to implement a privileged user training module for privileged users. What should the awareness training include and how will it be enforced?
A small company just opened a new massage station at Philadelphia airport. The company has a...
A small company just opened a new massage station at Philadelphia airport. The company has a stand that offers massages to travelers.Customers can select a length of massage between 5 and 20 minutes and there is a unique rate of $30 independently of the length selected by customers. The average length of massage requested by customers is of 15 minutes with standard deviation of 10 minutes. There are TWO employees delivering massages. The average number of potential customers requesting a...
ADVERTISEMENT
ADVERTISEMENT
ADVERTISEMENT