1) Run a regression model to estimate the cost of a building using average storey height (mean centered) and total floor area (mean centered)
2) Run a regression model to estimate the cost of a building using average storey height (mean centered), total floor area (mean centered), and the type of construction (dummy coded with reinforced concrete as the control group)
Interpret the slopes, intercepts, and regression statistics in the models
Building type | Average floor area (m²) | Total floor area (m²) | Average storey height (cm) | Cost (HK$) |
1 | 1852 | 81478 | 410 | 1467000000 |
1 | 1608 | 64313 | 411 | 1150000000 |
1 | 1430 | 55783 | 403 | 1028000000 |
1 | 1562 | 57794 | 390 | 1100000000 |
1 | 1109 | 37695 | 391 | 728000000 |
1 | 905 | 28048 | 382 | 558000000 |
1 | 1852 | 81478 | 410 | 1467000000 |
1 | 901 | 30617 | 391 | 631000000 |
1 | 1727 | 69062 | 400 | 1223000000 |
1 | 1161 | 37148 | 394 | 761000000 |
1 | 1004 | 37141 | 400 | 713000000 |
1 | 1216 | 38912 | 390 | 784000000 |
1 | 2007 | 88302 | 422 | 1593000000 |
1 | 2983 | 173000 | 440 | 2649000000 |
2 | 1523 | 70080 | 372 | 1210000000 |
2 | 912 | 28286 | 370 | 607000000 |
2 | 1343 | 53715 | 382 | 977000000 |
2 | 1175 | 32908 | 381 | 700000000 |
2 | 1203 | 40902 | 393 | 811000000 |
2 | 1393 | 52951 | 392 | 1001000000 |
2 | 713 | 20681 | 375 | 468000000 |
2 | 1047 | 37681 | 411 | 747000000 |
2 | 1506 | 63270 | 421 | 1156000000 |
2 | 1642 | 70624 | 423 | 1268000000 |
2 | 1848 | 73936 | 403 | 1333000000 |
2 | 1627 | 60190 | 402 | 1162000000 |
2 | 1301 | 40321 | 384 | 864000000 |
2 | 905 | 25330 | 405 | 561000000 |
2 | 1727 | 72514 | 400 | 1303000000 |
2 | 1414 | 52318 | 392 | 1013000000 |
2 | 2001 | 76022 | 431 | 1487000000 |
2 | 400 | 9200 | 380 | 263000000 |
2 | 3100 | 102190 | 454 | 2112000000 |
2 | 1677 | 83860 | 410 | 1519000000 |
2 | 2415 | 130032 | 420 | 2045000000 |
2 | 1555 | 46637 | 410 | 1025000000 |
2 | 792 | 20596 | 420 | 540000000 |
Building type | Construction |
1 | Reinforced concrete |
2 | Steel |
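A minimal sketch of the two models the question asks for, assuming Python with NumPy. The continuous predictors are mean-centered, building type is dummy coded with reinforced concrete (type 1) as the reference group, and both models are fitted by ordinary least squares with `numpy.linalg.lstsq`. The variable names and the `fit_ols` helper are illustrative, not part of the original, and the printed numbers are whatever the data produce, not a reproduction of any SPSS output.

```python
import numpy as np

# Rows from the table: (building type, total floor area m², avg. storey height cm, cost HK$).
# Copied as given, including the repeated first row; average floor area is unused here.
ROWS = [
    (1, 81478, 410, 1467000000), (1, 64313, 411, 1150000000),
    (1, 55783, 403, 1028000000), (1, 57794, 390, 1100000000),
    (1, 37695, 391, 728000000), (1, 28048, 382, 558000000),
    (1, 81478, 410, 1467000000), (1, 30617, 391, 631000000),
    (1, 69062, 400, 1223000000), (1, 37148, 394, 761000000),
    (1, 37141, 400, 713000000), (1, 38912, 390, 784000000),
    (1, 88302, 422, 1593000000), (1, 173000, 440, 2649000000),
    (2, 70080, 372, 1210000000), (2, 28286, 370, 607000000),
    (2, 53715, 382, 977000000), (2, 32908, 381, 700000000),
    (2, 40902, 393, 811000000), (2, 52951, 392, 1001000000),
    (2, 20681, 375, 468000000), (2, 37681, 411, 747000000),
    (2, 63270, 421, 1156000000), (2, 70624, 423, 1268000000),
    (2, 73936, 403, 1333000000), (2, 60190, 402, 1162000000),
    (2, 40321, 384, 864000000), (2, 25330, 405, 561000000),
    (2, 72514, 400, 1303000000), (2, 52318, 392, 1013000000),
    (2, 76022, 431, 1487000000), (2, 9200, 380, 263000000),
    (2, 102190, 454, 2112000000), (2, 83860, 410, 1519000000),
    (2, 130032, 420, 2045000000), (2, 46637, 410, 1025000000),
    (2, 20596, 420, 540000000),
]

data = np.array(ROWS, dtype=float)
btype, area, height, cost = data.T

# Mean-center the continuous predictors, as the question asks.
area_c = area - area.mean()
height_c = height - height.mean()

# Dummy code: steel (type 2) = 1, reinforced concrete (type 1, control group) = 0.
steel = (btype == 2).astype(float)


def fit_ols(y, *predictors):
    """Hypothetical helper: OLS with an intercept column; returns (coefficients, R²)."""
    X = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, r2


coef1, r2_1 = fit_ols(cost, height_c, area_c)
coef2, r2_2 = fit_ols(cost, height_c, area_c, steel)

print("Model 1 [intercept, storey height, floor area]:", coef1, "R² =", round(r2_1, 4))
print("Model 2 [intercept, storey height, floor area, steel]:", coef2, "R² =", round(r2_2, 4))
```

With mean-centered predictors, the model 1 intercept is the estimated cost of a building of average storey height and average total floor area (numerically it equals the mean cost of the 37 buildings). In model 2 the intercept is the estimated cost of a reinforced-concrete building at those average values, and the `steel` coefficient is the estimated cost difference between a steel and a reinforced-concrete building of the same storey height and floor area. Each slope is the change in estimated cost (HK$) per extra centimetre of storey height or per extra square metre of floor area, holding the other predictors fixed.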
First, I used regression analysis in Excel to model the cost of a building as a function of its average storey height and total floor area, using both types of building.
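So the model is $y = \alpha + \beta_1 \cdot (\text{avg. storey height}) + \beta_2 \cdot (\text{total floor area}) + \varepsilon$, where $y$ is the building cost and $\varepsilon$ is the error term.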
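The question asks for mean-centered predictors, whereas the equation above is written for the raw variables. Centering only moves the intercept: the slopes are unchanged, and with an intercept in the model the centered intercept equals the mean cost. A minimal sketch assuming NumPy, using only the first five rows of the table for illustration; the `ols` helper is hypothetical.

```python
import numpy as np

# First five rows of the table (all reinforced concrete), used only to illustrate centering.
area = np.array([81478, 64313, 55783, 57794, 37695], dtype=float)  # total floor area, m²
height = np.array([410, 411, 403, 390, 391], dtype=float)          # avg. storey height, cm
cost = np.array([1467e6, 1150e6, 1028e6, 1100e6, 728e6])           # cost, HK$


def ols(y, *cols):
    """Hypothetical helper: OLS with an intercept; returns [intercept, slope_1, slope_2, ...]."""
    X = np.column_stack([np.ones_like(y), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0]


raw = ols(cost, height, area)
centered = ols(cost, height - height.mean(), area - area.mean())

print("raw coefficients:     ", raw)
print("centered coefficients:", centered)
print("mean cost:            ", cost.mean())
```

The raw intercept is the predicted cost at zero storey height and zero floor area, which is not a meaningful building; the centered intercept is the predicted cost at average values, which is why centering makes the intercept interpretable.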
Please see the SPSS output below:
In the output above, the Model Summary table reports the R² value, which measures the goodness of fit of the model: the higher R², the better the fit. Here R² = 0.98, meaning that 98% of the variation in building cost is explained by the two independent variables, and the remaining 2% is unexplained (residual) variation.
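A small sketch, in plain Python with hypothetical helper names, of what R² measures: one minus the ratio of the residual sum of squares to the total sum of squares of the response about its mean. Adjusted R² (also reported in SPSS's Model Summary table) penalizes extra predictors, which matters when comparing models with different numbers of variables. The toy numbers are made up for illustration and are not from the building data.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SSE / SST."""
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
    sst = sum((yi - mean_y) ** 2 for yi in y)               # total sum of squares
    return 1.0 - sse / sst


def adjusted_r_squared(r2, n, k):
    """Adjusted R² for n observations and k predictors (intercept not counted)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Toy example: mean of y is 2.5, so SST = 5 and SSE = 1, giving R² = 0.8.
y = [1, 2, 3, 4]
y_hat = [1.5, 1.5, 3.5, 3.5]
r2 = r_squared(y, y_hat)
print(r2, adjusted_r_squared(r2, n=len(y), k=1))
```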
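Now, for each coefficient (tested separately), the null hypothesis is $H_0: \alpha = 0$, $\beta_1 = 0$, $\beta_2 = 0$,
against the alternative $H_a: \alpha \neq 0$, $\beta_1 \neq 0$, $\beta_2 \neq 0$.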
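The p-values in a regression Coefficients table come from a t-test on each coefficient, $t = \hat\beta / \mathrm{SE}(\hat\beta)$, with $n - k - 1$ degrees of freedom ($37 - 2 - 1 = 34$ for the first model, assuming all 37 rows were used). A minimal sketch in plain Python, assuming only the standard library, of how a two-sided p-value follows from $t$ and the degrees of freedom $\nu$, via the identity $P(|T| > t) = I_{\nu/(\nu + t^2)}(\nu/2, 1/2)$ with the regularized incomplete beta function evaluated by a continued fraction. The function names are hypothetical, and SPSS's internal routine may differ.

```python
import math


def _betacf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz's method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_incomplete_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log(1.0 - x))
    bt = math.exp(log_bt)
    # Use the continued fraction directly or via the symmetry I_x(a, b) = 1 - I_{1-x}(b, a).
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def t_test_p_value(t, df):
    """Two-sided p-value for a t statistic with df degrees of freedom."""
    return reg_incomplete_beta(df / 2.0, 0.5, df / (df + t * t))


def reject_h0(p_value, alpha=0.05):
    """Reject H0 (coefficient = 0) at significance level alpha when p < alpha."""
    return p_value < alpha


p = t_test_p_value(2.5, 34)
print(p, reject_h0(p))
```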
In the Coefficients table, the p-values for the constant ($\alpha$) and for average storey height ($\beta_1$) are below 0.05, so both are significant at the 5% level (95% confidence), and the p-value for total floor area ($\beta_2$) is also below 0.05. The null hypothesis is therefore rejected for $\alpha$, $\beta_1$ and $\beta_2$: every parameter is statistically significant, and the model fits well. To see whether an even higher R² is possible, we fit another model using only building type 1.
Please see the SPSS output below:
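Here R² = 0.992, i.e. 99.2% of the variation in cost among type-1 buildings is explained by the two independent variables. However, the p-values for $\alpha$ and $\beta_1$ are greater than 0.05, while the p-value for $\beta_2$ is below 0.05, so we cannot reject the null hypothesis for $\alpha$ and $\beta_1$. This is not necessarily a contradiction: a very high R² alongside non-significant individual coefficients often occurs when the predictors are strongly correlated with each other (multicollinearity) and the sample is small (only 14 type-1 observations). So although R² is higher in this case, not all of the parameters are significant, and because this model is fitted to a different subset of the data, its R² is not directly comparable with that of the first model.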
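One way to check whether storey height and total floor area overlap strongly within the type-1 buildings is the variance inflation factor (VIF). With exactly two predictors, both share the same VIF, $1 / (1 - r^2)$, where $r$ is their Pearson correlation. A minimal sketch assuming NumPy; often-quoted rules of thumb treat a VIF above roughly 5 or 10 as a sign that multicollinearity may be inflating the standard errors, but those thresholds are conventions, not tests.

```python
import numpy as np

# Type-1 (reinforced concrete) rows from the table, copied as given (the first row appears twice).
area = np.array([81478, 64313, 55783, 57794, 37695, 28048, 81478,
                 30617, 69062, 37148, 37141, 38912, 88302, 173000], dtype=float)  # m²
height = np.array([410, 411, 403, 390, 391, 382, 410,
                   391, 400, 394, 400, 390, 422, 440], dtype=float)              # cm

# Pearson correlation between the two predictors, then the two-predictor VIF.
r = np.corrcoef(area, height)[0, 1]
vif = 1.0 / (1.0 - r ** 2)
print(f"r = {r:.3f}, VIF = {vif:.2f}")
```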