In: Statistics and Probability
The dataset Golfers2008.xlsx saved in Datasets in Blackboard contains data on the top 40 golfers in 2008. This was the year when Tiger Woods won the U.S. Open in June and then had year-ending surgery.
Using all the explanatory variables, run a regression predicting Earnings per Round.
Determine the best fit model by removing any insignificant x-variables. Rerun the analysis with your best fit model. Make a clear notation of which model is your best-fit model by labeling the worksheet of that model “BEST FIT MODEL”.
Age | Events | Rounds | Cuts Made | Top 10s | Wins | Earnings per Round |
45 | 23 | 82 | 18 | 8 | 3 | $80,501 |
32 | 6 | 23 | 6 | 6 | 4 | $251,087 |
37 | 21 | 79 | 20 | 8 | 2 | $65,682 |
28 | 19 | 70 | 18 | 6 | 1 | $69,403 |
47 | 26 | 97 | 24 | 7 | 3 | $48,080 |
22 | 22 | 81 | 19 | 8 | 2 | $57,485 |
26 | 22 | 78 | 19 | 7 | 2 | $56,701 |
36 | 15 | 51 | 12 | 6 | 2 | $84,579 |
35 | 23 | 85 | 19 | 7 | 1 | $46,815 |
35 | 25 | 96 | 24 | 8 | 1 | $41,079 |
36 | 28 | 108 | 27 | 9 | 0 | $33,395 |
38 | 26 | 94 | 23 | 9 | 0 | $36,763 |
31 | 25 | 81 | 16 | 5 | 1 | $37,400 |
38 | 26 | 88 | 20 | 8 | 0 | $34,320 |
31 | 20 | 64 | 14 | 6 | 1 | $45,002 |
38 | 21 | 72 | 16 | 5 | 1 | $37,270 |
31 | 22 | 80 | 18 | 5 | 0 | $32,697 |
43 | 26 | 98 | 22 | 6 | 0 | $26,340 |
28 | 22 | 70 | 14 | 3 | 1 | $36,660 |
38 | 16 | 50 | 11 | 5 | 1 | $50,746 |
30 | 29 | 110 | 25 | 5 | 1 | $22,841 |
37 | 23 | 84 | 21 | 7 | 0 | $29,579 |
41 | 22 | 74 | 16 | 6 | 0 | $32,950 |
34 | 28 | 95 | 19 | 7 | 0 | $25,313 |
34 | 24 | 83 | 19 | 5 | 1 | $28,901 |
27 | 27 | 94 | 21 | 4 | 1 | $24,515 |
44 | 24 | 83 | 19 | 7 | 0 | $27,539 |
39 | 33 | 116 | 24 | 5 | 0 | $19,301 |
39 | 22 | 74 | 15 | 6 | 0 | $29,984 |
26 | 27 | 87 | 18 | 5 | 0 | $25,389 |
36 | 31 | 103 | 20 | 6 | 1 | $21,413 |
26 | 26 | 86 | 19 | 3 | 1 | $25,188 |
44 | 30 | 107 | 24 | 6 | 0 | $20,060 |
28 | 32 | 119 | 27 | 6 | 0 | $17,599 |
25 | 25 | 82 | 16 | 3 | 1 | $25,486 |
27 | 20 | 67 | 15 | 3 | 1 | $30,815 |
36 | 30 | 114 | 26 | 3 | 0 | $17,893 |
29 | 28 | 89 | 16 | 3 | 0 | $22,465 |
27 | 15 | 50 | 12 | 3 | 1 | $39,583 |
34 | 27 | 91 | 18 | 5 | 0 | $21,648 |
Solution:
Here, we have to use the regression model by using excel for the prediction of dependent or response variable earnings per round based on the all independent variables given in the data set.
Required regression model is given as below:
Regression Statistics |
||||||
Multiple R |
0.889899769 |
|||||
R Square |
0.791921598 |
|||||
Adjusted R Square |
0.754089162 |
|||||
Standard Error |
18716.83797 |
|||||
Observations |
40 |
|||||
ANOVA |
||||||
df |
SS |
MS |
F |
Significance F |
||
Regression |
6 |
43998116543 |
7333019424 |
20.93234451 |
5.89276E-10 |
|
Residual |
33 |
11560560782 |
350320023.7 |
|||
Total |
39 |
55558677325 |
||||
Coefficients |
Standard Error |
t Stat |
P-value |
Lower 95% |
Upper 95% |
|
Intercept |
133191.7361 |
27969.62313 |
4.76201397 |
3.71311E-05 |
76287.11022 |
190096.3619 |
Age |
-217.8285416 |
550.5505735 |
-0.395655825 |
0.694905337 |
-1337.9321 |
902.2750166 |
Events |
-15255.38379 |
4171.90591 |
-3.656694116 |
0.000881233 |
-23743.19014 |
-6767.577442 |
Rounds |
5162.007669 |
1717.492414 |
3.005549037 |
0.005034762 |
1667.743098 |
8656.272241 |
Cuts Made |
-10206.14138 |
3537.932973 |
-2.884775222 |
0.006850496 |
-17404.1201 |
-3008.162659 |
Top 10s |
4668.563854 |
2301.427516 |
2.028551333 |
0.050637046 |
-13.72560943 |
9350.853317 |
Wins |
15009.16535 |
3925.005194 |
3.823986112 |
0.00055291 |
7023.682284 |
22994.64842 |
For above regression model, two independent variables such as age and Top 10s are not statistically significant as their corresponding P-values are greater than the 5% level of significance or alpha value 0.05.
So, we will remove these two independent variables from this regression model and rerun this regression again.
After rerunning the regression model by using excel, we get the following regression model:
Regression Statistics |
||||||
Multiple R |
0.874288986 |
|||||
R Square |
0.76438123 |
|||||
Adjusted R Square |
0.737453371 |
|||||
Standard Error |
19339.57245 |
|||||
Observations |
40 |
|||||
ANOVA |
||||||
df |
SS |
MS |
F |
Significance F |
||
Regression |
4 |
42468010134 |
10617002534 |
28.38626048 |
1.48365E-10 |
|
Residual |
35 |
13090667191 |
374019062.6 |
|||
Total |
39 |
55558677325 |
||||
Coefficients |
Standard Error |
t Stat |
P-value |
Lower 95% |
Upper 95% |
|
Intercept |
144725.6433 |
23535.56187 |
6.149232555 |
4.91279E-07 |
96945.91281 |
192505.3737 |
Events |
-15169.03589 |
4276.771411 |
-3.546842801 |
0.001131595 |
-23851.34338 |
-6486.728396 |
Rounds |
4666.583541 |
1743.158061 |
2.677085713 |
0.011227519 |
1127.784563 |
8205.382518 |
Cuts Made |
-7718.911929 |
3428.149071 |
-2.251626685 |
0.03072514 |
-14678.42449 |
-759.3993655 |
Wins |
15911.90544 |
4023.767478 |
3.954479359 |
0.000356073 |
7743.223232 |
24080.58765 |
The final regression equation is given as below:
Earnings per round = 144725.6433 - 15169.03589*events + 4666.583541*Rounds - 7718.911929*Cuts made + 15911.90544*Wins