In: Statistics and Probability
Crossland Construction’s design/build department allows their clients (owners) to select the design and construction team based on their combined experience and track-record. Design/build firms typically have a cash flow problem since they tend to be paid in lump sums when projects are completed or hit milestones. However, their expenses, such as payroll, must be paid regularly. So, such firms need bank lines of credit to finance their initial costs, but in the past year, lines of credit were difficult to negotiate. The data file Crossland Construction contains month-end cash balances for the past 16 months.
a) Plot the data as a time-series graph. Fit a linear line to the data. Discuss what the graph implies concerning the relationship between cash balance and the time variable, month.
b) Fit a linear trend model to the data. Compute the coefficient of determination. Discuss the appropriateness of the linear trend model. What are the strengths and weaknesses of the model?
c) Referring to part b), compute the MAD and MSE for the 16 data points.
d) Plot the data as a time-series graph. Fit a polynomial line to the data. Discuss what the graph implies concerning the relationship between cash balance and the time variable, month.
e) Use the t2 transformation approach and recompute the linear model using the transformed time variable. Discuss whether this model appears to provide a better fit than did the model without the transformation. Compare the coefficients of determination for the two models. Which model seems to be superior, using the coefficient of determination as the criterion?
f) Referring to part e), compute the MAD and MSE for the 16 data values. Discuss how these compare to those that were computed in part c), prior to transformation. Do the measures of fit (R2, MSE, MAD) agree on the best model to use for forecasting purposes?
g) Use the linear trend model (without transformation) for the first 15 months and provide a cash balance forecast for month 16. Then use the transformation model for the first 15 months and provide a cash balance forecast for month 16. Now, compare the accuracy of the forecasts with and without the transformation. Which of the two forecast models would you prefer? Explain your answer.
Month | Cash Balance |
1 | 75 |
2 | 70 |
3 | 77 |
4 | 89 |
5 | 80 |
6 | 92 |
7 | 91 |
8 | 102 |
9 | 106 |
10 | 130 |
11 | 155 |
12 | 160 |
13 | 180 |
14 | 199 |
15 | 240 |
16 |
305 |
This is what you need to do:
a) you need to fit a linear trend line to the graph
b) you need to run a simple linear regression, with month as your independent variable, and cash balance as your dependent variable. The output will give you the R squared value (the coefficient of determination).
c) to calculate MAD and MSE, set up the following columns
Month, Cash Balance, Forecast, Difference, Difference Squared, Absolute Difference
where month is from 1 to 16 cash balances as given in the dataset forecast (use the simple linear regression equation to calculate for each month, i.e. yhat = intercept + slope (month), so you need to substitute each month into the equation to get each month's forecast difference is between the cash balance and forecast difference squared (square of the difference) absolute difference (absolute value of the difference)
So, now you know the values of all the variables that go into the MAD and MSE formulas, so you can calculate MAD and MSE
d) if you click on the edge of the chart, you will get a + symbol on the right corner of the chart, click on that, in the dropdown menu “trendline” option is there, if you click on that, and then on “more options”, you will see a “polynomial option”, choose that, and you will be able to fit a polynomial curve to the data points on the chart (have already done that, see attached excel sheet).
e) create a new variable, x squared (i.e. month squared), i.e. 1 squared, 2 squared, 3 squared, and so on.....
Run a simple linear regression with month squared as the independent variable, and cash balance as the y variable
The output will give you the R squared value (the coefficient of determination).
Compare the R squared value of b) and e)
f) Calculate MAD and MSE for the model in e)
(procedure is the same as what you did in c))
g) run a linear regression with month (first 15 values) as independent variable, and y as dependent variable
In the yhat equation, enter month as 16 and see how this value compares with the actual 16th value in the dataset
run a linear regression with month squared (first 15 values) as independent variable, and y as dependent variable
(a) We will use excel to answer the first part:(based on "this is what you need to do")
use the following excel command to make trend line:
trend line graph:-
(b) For part b, we can use excel as well as data analysis tool pack( which you can install using add-in program in your excel)
i will discuss data tool method. Use the following command:
data analysis tool pack>regression(input Y=cash bal.,input X=months,tick the line fit plot)
click ok.
SUMMARY OUTPUT | |||||||||
Regression Statistics | |||||||||
Multiple R | 0.909284 | ||||||||
R Square | 0.826797 | ||||||||
Adjusted R Square | 0.814425 | ||||||||
Standard Error | 29.24594 | ||||||||
Observations | 16 | ||||||||
ANOVA | |||||||||
df | SS | MS | F | Significance F | |||||
Regression | 1 | 57161.39 | 57161.39 | 66.83003 | 1.06E-06 | ||||
Residual | 14 | 11974.55 | 855.3249 | ||||||
Total | 15 | 69135.94 | |||||||
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% | ||
Intercept | 24.225 | 15.3367 | 1.579545 | 0.136534 | -8.66895 | 57.11895 | -8.66895 | 57.11895 | |
X Variable 1 | 12.96618 | 1.586084 | 8.174964 | 1.06E-06 | 9.564365 | 16.36799 | 9.564365 | 16.36799 | |
RESIDUAL OUTPUT | |||||||||
Observation | Predicted Y | Residuals | |||||||
1 | 37.19118 | 37.80882 | |||||||
2 | 50.15735 | 19.84265 | |||||||
3 | 63.12353 | 13.87647 | |||||||
4 | 76.08971 | 12.91029 | |||||||
5 | 89.05588 | -9.05588 | |||||||
6 | 102.0221 | -10.0221 | |||||||
7 | 114.9882 | -23.9882 | |||||||
8 | 127.9544 | -25.9544 | |||||||
9 | 140.9206 | -34.9206 | |||||||
10 | 153.8868 | -23.8868 | |||||||
11 | 166.8529 | -11.8529 | |||||||
12 | 179.8191 | -19.8191 | |||||||
13 | 192.7853 | -12.7853 | |||||||
14 | 205.7515 | -6.75147 | |||||||
15 | 218.7176 | 21.28235 | |||||||
16 | 231.6868 | 73.3161 | |||||||
(e) analysis of X^2
month(x) | x^2 | cb(y) |
1 | 1 | 75 |
2 | 4 | 70 |
3 | 9 | 77 |
4 | 16 | 89 |
5 | 25 | 80 |
6 | 36 | 92 |
7 | 49 | 91 |
8 | 64 | 102 |
9 | 81 | 106 |
10 | 100 | 130 |
11 | 121 | 155 |
12 | 144 | 160 |
13 | 169 | 180 |
14 | 196 | 199 |
15 | 225 | 240 |
16 | 256 | 305 |
using the same analysis tool pack and by the same method-(for the sake of space)
Regression Statistics
Multiple R | 0.973512 |
R Square | 0.947725 |
Adjusted R Square | 0.943991 |
Standard Error | 16.067 |
Observations | 16 |
here we can see that-
adj R^2 in part b=0.8144(almost 81% variability is explained by model)
adj R ^2 in part e=0.9439(almost 94% variability is explained by the model)
hence we conclude that model 2nd(i.e when we take x as x^2) predicted better than model 1st(when we simply use x)
(g) There must be equal number of dependent and independent values to perform simple linear regression. Although you can perform SLR by putting 16th month as 0.(for the sake of space)
Observation | Predicted Y | Residuals |
1 | 102.3294 | -27.3294 |
2 | 107.2691 | -37.2691 |
3 | 112.2088 | -35.2088 |
4 | 117.1485 | -28.1485 |
5 | 122.0882 | -42.0882 |
6 | 127.0279 | -35.0279 |
7 | 131.9676 | -40.9676 |
8 | 136.9074 | -34.9074 |
9 | 141.8471 | -35.8471 |
10 | 146.7868 | -16.7868 |
11 | 151.7265 | 3.273529 |
12 | 156.6662 | 3.333824 |
13 | 161.6059 | 18.39412 |
14 | 166.5456 | 32.45441 |
15 | 171.4853 | 68.51471 |
16 | 97.38971 | 207.6103 |
Similarly, as we did in the part (e) ,you can perform SLR(simple linear regression) for the first 15 squared month i.e X^2