Question

The dataset Golfers2008.xlsx saved in Datasets in Blackboard contains data on the top 40 golfers in 2008. This was the year when Tiger Woods won the U.S. Open in June and then had year-ending surgery.

Using all the explanatory variables, run a regression predicting Earnings per Round.

Determine the best-fit model by removing any insignificant x-variables. Rerun the analysis with your best-fit model. Make a clear notation of which model is your best-fit model by labeling the worksheet of that model "BEST FIT MODEL".

Age | Events | Rounds | Cuts Made | Top 10s | Wins | Earnings per Round
45 | 23 | 82 | 18 | 8 | 3 | $80,501
32 | 6 | 23 | 6 | 6 | 4 | $251,087
37 | 21 | 79 | 20 | 8 | 2 | $65,682
28 | 19 | 70 | 18 | 6 | 1 | $69,403
47 | 26 | 97 | 24 | 7 | 3 | $48,080
22 | 22 | 81 | 19 | 8 | 2 | $57,485
26 | 22 | 78 | 19 | 7 | 2 | $56,701
36 | 15 | 51 | 12 | 6 | 2 | $84,579
35 | 23 | 85 | 19 | 7 | 1 | $46,815
35 | 25 | 96 | 24 | 8 | 1 | $41,079
36 | 28 | 108 | 27 | 9 | 0 | $33,395
38 | 26 | 94 | 23 | 9 | 0 | $36,763
31 | 25 | 81 | 16 | 5 | 1 | $37,400
38 | 26 | 88 | 20 | 8 | 0 | $34,320
31 | 20 | 64 | 14 | 6 | 1 | $45,002
38 | 21 | 72 | 16 | 5 | 1 | $37,270
31 | 22 | 80 | 18 | 5 | 0 | $32,697
43 | 26 | 98 | 22 | 6 | 0 | $26,340
28 | 22 | 70 | 14 | 3 | 1 | $36,660
38 | 16 | 50 | 11 | 5 | 1 | $50,746
30 | 29 | 110 | 25 | 5 | 1 | $22,841
37 | 23 | 84 | 21 | 7 | 0 | $29,579
41 | 22 | 74 | 16 | 6 | 0 | $32,950
34 | 28 | 95 | 19 | 7 | 0 | $25,313
34 | 24 | 83 | 19 | 5 | 1 | $28,901
27 | 27 | 94 | 21 | 4 | 1 | $24,515
44 | 24 | 83 | 19 | 7 | 0 | $27,539
39 | 33 | 116 | 24 | 5 | 0 | $19,301
39 | 22 | 74 | 15 | 6 | 0 | $29,984
26 | 27 | 87 | 18 | 5 | 0 | $25,389
36 | 31 | 103 | 20 | 6 | 1 | $21,413
26 | 26 | 86 | 19 | 3 | 1 | $25,188
44 | 30 | 107 | 24 | 6 | 0 | $20,060
28 | 32 | 119 | 27 | 6 | 0 | $17,599
25 | 25 | 82 | 16 | 3 | 1 | $25,486
27 | 20 | 67 | 15 | 3 | 1 | $30,815
36 | 30 | 114 | 26 | 3 | 0 | $17,893
29 | 28 | 89 | 16 | 3 | 0 | $22,465
27 | 15 | 50 | 12 | 3 | 1 | $39,583
34 | 27 | 91 | 18 | 5 | 0 | $21,648

Solution:

We use Excel's Regression tool (Data > Data Analysis > Regression) to predict the dependent (response) variable, Earnings per Round, from all six independent variables in the dataset: Age, Events, Rounds, Cuts Made, Top 10s, and Wins.
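As a cross-check outside Excel, the full model can be fitted with a short pure-Python sketch that solves the ordinary least-squares normal equations (X'X)b = X'y by Gaussian elimination with partial pivoting. This is an illustrative alternative, not Excel's own algorithm; it assumes the data table above is transcribed correctly, and its printed coefficients and R Square can be compared with the Excel output that follows.

```python
# Pure-Python OLS sketch for the full model (no third-party packages assumed).
# Columns: Age, Events, Rounds, Cuts Made, Top 10s, Wins, Earnings per Round ($)
ROWS = [
    (45, 23, 82, 18, 8, 3, 80501), (32, 6, 23, 6, 6, 4, 251087),
    (37, 21, 79, 20, 8, 2, 65682), (28, 19, 70, 18, 6, 1, 69403),
    (47, 26, 97, 24, 7, 3, 48080), (22, 22, 81, 19, 8, 2, 57485),
    (26, 22, 78, 19, 7, 2, 56701), (36, 15, 51, 12, 6, 2, 84579),
    (35, 23, 85, 19, 7, 1, 46815), (35, 25, 96, 24, 8, 1, 41079),
    (36, 28, 108, 27, 9, 0, 33395), (38, 26, 94, 23, 9, 0, 36763),
    (31, 25, 81, 16, 5, 1, 37400), (38, 26, 88, 20, 8, 0, 34320),
    (31, 20, 64, 14, 6, 1, 45002), (38, 21, 72, 16, 5, 1, 37270),
    (31, 22, 80, 18, 5, 0, 32697), (43, 26, 98, 22, 6, 0, 26340),
    (28, 22, 70, 14, 3, 1, 36660), (38, 16, 50, 11, 5, 1, 50746),
    (30, 29, 110, 25, 5, 1, 22841), (37, 23, 84, 21, 7, 0, 29579),
    (41, 22, 74, 16, 6, 0, 32950), (34, 28, 95, 19, 7, 0, 25313),
    (34, 24, 83, 19, 5, 1, 28901), (27, 27, 94, 21, 4, 1, 24515),
    (44, 24, 83, 19, 7, 0, 27539), (39, 33, 116, 24, 5, 0, 19301),
    (39, 22, 74, 15, 6, 0, 29984), (26, 27, 87, 18, 5, 0, 25389),
    (36, 31, 103, 20, 6, 1, 21413), (26, 26, 86, 19, 3, 1, 25188),
    (44, 30, 107, 24, 6, 0, 20060), (28, 32, 119, 27, 6, 0, 17599),
    (25, 25, 82, 16, 3, 1, 25486), (27, 20, 67, 15, 3, 1, 30815),
    (36, 30, 114, 26, 3, 0, 17893), (29, 28, 89, 16, 3, 0, 22465),
    (27, 15, 50, 12, 3, 1, 39583), (34, 27, 91, 18, 5, 0, 21648),
]
NAMES = ["Intercept", "Age", "Events", "Rounds", "Cuts Made", "Top 10s", "Wins"]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


X = [[1.0] + [float(v) for v in r[:6]] for r in ROWS]  # leading 1 = intercept
y = [float(r[6]) for r in ROWS]
coef = ols(X, y)
fitted = [sum(b * x for b, x in zip(coef, row)) for row in X]
resid = [yi - fi for yi, fi in zip(y, fitted)]
ybar = sum(y) / len(y)
r2 = 1 - sum(e * e for e in resid) / sum((yi - ybar) ** 2 for yi in y)

for name, b in zip(NAMES, coef):
    print(f"{name:10s} {b:14.4f}")
print(f"R Square   {r2:.6f}")
```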

Excel's output for the full model (all six predictors) is:

Regression Statistics
Multiple R | 0.889899769
R Square | 0.791921598
Adjusted R Square | 0.754089162
Standard Error | 18716.83797
Observations | 40
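Each figure in this Regression Statistics block follows from the sums of squares in the model's ANOVA table. A minimal sketch, using the reported SS values with n = 40 observations and k = 6 predictors (intercept excluded):

```python
import math

# Sums of squares and sizes copied from the full-model ANOVA table
n, k = 40, 6                  # observations, predictors (excluding intercept)
ssr = 43_998_116_543          # Regression SS
sse = 11_560_560_782          # Residual SS
sst = ssr + sse               # Total SS

r_square = ssr / sst
multiple_r = math.sqrt(r_square)                          # correlation of y with fitted y
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)  # penalizes extra predictors
std_error = math.sqrt(sse / (n - k - 1))                  # root of the residual mean square

print(f"Multiple R        {multiple_r:.6f}")
print(f"R Square          {r_square:.6f}")
print(f"Adjusted R Square {adj_r_square:.6f}")
print(f"Standard Error    {std_error:.2f}")
```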

ANOVA
Source | df | SS | MS | F | Significance F
Regression | 6 | 43998116543 | 7333019424 | 20.93234451 | 5.89276E-10
Residual | 33 | 11560560782 | 350320023.7
Total | 39 | 55558677325
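The ANOVA columns are linked by MS = SS / df and F = MS(Regression) / MS(Residual); Significance F is the upper-tail probability of that F value with (6, 33) degrees of freedom. The sketch below recomputes them. The helper `f_sf_even_df1` is written for this example (not a library function) and relies on the finite-sum form of the regularized incomplete beta function, which only applies when the numerator df is even, as it is here.

```python
def f_sf_even_df1(f, d1, d2):
    """Upper-tail probability P(F > f) for an F(d1, d2) variable with even d1.

    P(F > f) = I_x(d2/2, d1/2) with x = d2 / (d2 + d1*f), and for integer b
    I_x(a, b) = x**a * sum_{j<b} (a)_j / j! * (1 - x)**j.
    """
    if d1 % 2:
        raise ValueError("this closed form needs an even numerator df")
    a, b = d2 / 2, d1 // 2
    x = d2 / (d2 + d1 * f)
    term, total = 1.0, 1.0
    for j in range(1, b):
        term *= (a + j - 1) / j * (1 - x)  # builds (a)_j / j! * (1 - x)**j
        total += term
    return x ** a * total


# Full-model ANOVA inputs (df and SS from the table above)
df_reg, df_res = 6, 33
ss_reg, ss_res = 43_998_116_543, 11_560_560_782

ms_reg = ss_reg / df_reg
ms_res = ss_res / df_res
f_stat = ms_reg / ms_res
significance_f = f_sf_even_df1(f_stat, df_reg, df_res)

print(f"MS Regression  {ms_reg:.1f}")
print(f"MS Residual    {ms_res:.1f}")
print(f"F              {f_stat:.6f}")
print(f"Significance F {significance_f:.5e}")
```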

Variable | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept | 133191.7361 | 27969.62313 | 4.76201397 | 3.71311E-05 | 76287.11022 | 190096.3619
Age | -217.8285416 | 550.5505735 | -0.395655825 | 0.694905337 | -1337.9321 | 902.2750166
Events | -15255.38379 | 4171.90591 | -3.656694116 | 0.000881233 | -23743.19014 | -6767.577442
Rounds | 5162.007669 | 1717.492414 | 3.005549037 | 0.005034762 | 1667.743098 | 8656.272241
Cuts Made | -10206.14138 | 3537.932973 | -2.884775222 | 0.006850496 | -17404.1201 | -3008.162659
Top 10s | 4668.563854 | 2301.427516 | 2.028551333 | 0.050637046 | -13.72560943 | 9350.853317
Wins | 15009.16535 | 3925.005194 | 3.823986112 | 0.00055291 | 7023.682284 | 22994.64842
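Each t Stat is the coefficient divided by its standard error, and each 95% limit is the coefficient plus or minus t* times the standard error, where t* is the two-sided 95% critical value of the t distribution with 33 residual df. The sketch below hard-codes t* = 2.0345153 as a table value (an assumption of this example rather than a number shown in the output) and recomputes those columns.

```python
# Coefficient and standard error pairs copied from the full-model table
coefs = {
    "Intercept": (133191.7361, 27969.62313),
    "Age": (-217.8285416, 550.5505735),
    "Events": (-15255.38379, 4171.90591),
    "Rounds": (5162.007669, 1717.492414),
    "Cuts Made": (-10206.14138, 3537.932973),
    "Top 10s": (4668.563854, 2301.427516),
    "Wins": (15009.16535, 3925.005194),
}
T_CRIT = 2.0345153  # two-sided 95% critical t value for 33 df (from a t table)

results = {}
for name, (b, se) in coefs.items():
    t = b / se                                           # t Stat column
    lower, upper = b - T_CRIT * se, b + T_CRIT * se      # Lower/Upper 95% columns
    results[name] = (t, lower, upper)
    print(f"{name:10s} t = {t:8.4f}  95% CI [{lower:12.2f}, {upper:12.2f}]")
```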

In this model, Age (p = 0.6949) and Top 10s (p = 0.0506) are not statistically significant, because their p-values exceed the 5% significance level (alpha = 0.05); Top 10s misses the cutoff only narrowly. Every other predictor has a p-value below 0.01.

We therefore remove these two variables and rerun the regression.
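The solution drops both non-significant variables in one step. A common alternative is backward elimination, which removes only the predictor with the largest p-value, refits, and repeats until every remaining p-value is at or below alpha; because Top 10s sits just above 0.05, its p-value after refitting without Age alone may differ from the one shown here. The sketch below contrasts the two rules on the reported p-values. `fit_pvalues` is a hypothetical callback standing in for "rerun the regression and return the p-values"; it is not part of any library.

```python
from typing import Callable, Dict, List

ALPHA = 0.05


def backward_eliminate(
    predictors: List[str],
    fit_pvalues: Callable[[List[str]], Dict[str, float]],
    alpha: float = ALPHA,
) -> List[str]:
    """Drop the least significant predictor one at a time until all p <= alpha."""
    current = list(predictors)
    while current:
        pvals = fit_pvalues(current)  # hypothetical refit on the current predictors
        worst = max(current, key=lambda name: pvals[name])
        if pvals[worst] <= alpha:
            break
        current.remove(worst)
    return current


# P-values reported for the full model above
full_model_p = {
    "Age": 0.694905337, "Events": 0.000881233, "Rounds": 0.005034762,
    "Cuts Made": 0.006850496, "Top 10s": 0.050637046, "Wins": 0.00055291,
}

# Rule used in the solution: drop every predictor with p > alpha at once
dropped_at_once = [name for name, p in full_model_p.items() if p > ALPHA]
# One-at-a-time rule: only the largest p-value is removed before refitting
first_to_drop = max(full_model_p, key=full_model_p.get)

print("Dropped at once:", dropped_at_once)
print("First to drop one at a time:", first_to_drop)
```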

Rerunning the regression in Excel with only Events, Rounds, Cuts Made, and Wins gives:

Regression Statistics
Multiple R | 0.874288986
R Square | 0.76438123
Adjusted R Square | 0.737453371
Standard Error | 19339.57245
Observations | 40
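Adjusted R Square is the fair way to compare the two models, since it corrects R Square for the number of predictors k: adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1). A short sketch using the reported R Square values shows the reduced model's adjusted R Square is slightly lower (about 0.7375 versus 0.7541), so the reduction here is driven by the significance criterion the assignment asks for rather than by an improvement in fit.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for n observations and k predictors (intercept excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


n = 40
full = adjusted_r2(0.791921598, n, 6)     # Age, Events, Rounds, Cuts Made, Top 10s, Wins
reduced = adjusted_r2(0.76438123, n, 4)   # Events, Rounds, Cuts Made, Wins
print(f"full {full:.6f}  reduced {reduced:.6f}  difference {full - reduced:.6f}")
```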

ANOVA
Source | df | SS | MS | F | Significance F
Regression | 4 | 42468010134 | 10617002534 | 28.38626048 | 1.48365E-10
Residual | 35 | 13090667191 | 374019062.6
Total | 39 | 55558677325
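Because the reduced model is nested in the full model, the two ANOVA tables also support a partial F-test of whether Age and Top 10s can be dropped together: F = [(SSE_reduced - SSE_full) / q] / [SSE_full / df_full], with q = 2 removed predictors. This test is an addition to the original solution. With a numerator df of 2, the F upper-tail probability has the closed form (1 + 2F/d2)^(-d2/2), so no statistics library is needed; with these numbers F is about 2.18 and the p-value about 0.13, which supports removing both variables jointly.

```python
# Residual SS and residual df from the two ANOVA tables
sse_full, df_full = 11_560_560_782, 33        # all six predictors
sse_reduced, df_reduced = 13_090_667_191, 35  # Age and Top 10s removed

q = df_reduced - df_full  # number of predictors dropped
f_partial = ((sse_reduced - sse_full) / q) / (sse_full / df_full)
# For numerator df = 2: P(F > f) = (1 + 2f/d2) ** (-d2/2), with d2 = df_full
p_value = (1 + 2 * f_partial / df_full) ** (-df_full / 2)
print(f"partial F = {f_partial:.4f}, p = {p_value:.4f}")
```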

Variable | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept | 144725.6433 | 23535.56187 | 6.149232555 | 4.91279E-07 | 96945.91281 | 192505.3737
Events | -15169.03589 | 4276.771411 | -3.546842801 | 0.001131595 | -23851.34338 | -6486.728396
Rounds | 4666.583541 | 1743.158061 | 2.677085713 | 0.011227519 | 1127.784563 | 8205.382518
Cuts Made | -7718.911929 | 3428.149071 | -2.251626685 | 0.03072514 | -14678.42449 | -759.3993655
Wins | 15911.90544 | 4023.767478 | 3.954479359 | 0.000356073 | 7743.223232 | 24080.58765
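An equivalent check on the best-fit criterion: a two-sided 95% t interval excludes 0 exactly when the matching t-test p-value is below 0.05. The sketch reads the reported confidence limits and confirms that none of the remaining intervals contains 0.

```python
# 95% confidence limits copied from the reduced-model table
ci = {
    "Intercept": (96945.91281, 192505.3737),
    "Events": (-23851.34338, -6486.728396),
    "Rounds": (1127.784563, 8205.382518),
    "Cuts Made": (-14678.42449, -759.3993655),
    "Wins": (7743.223232, 24080.58765),
}
# An interval that does not straddle 0 corresponds to p < 0.05
excludes_zero = {name: not (lower <= 0 <= upper) for name, (lower, upper) in ci.items()}
for name, ok in excludes_zero.items():
    print(f"{name:10s} significant at 5%: {ok}")
```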

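All four remaining predictors are significant at the 5% level (every p-value is below 0.05), so this reduced model is the best-fit model and is the worksheet to label "BEST FIT MODEL". The final regression equation is:

Earnings per Round = 144725.6433 - 15169.03589*Events + 4666.583541*Rounds - 7718.911929*Cuts Made + 15911.90544*Wins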
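To use the equation, plug in a golfer's Events, Rounds, Cuts Made, and Wins. The sketch below wraps the reported coefficients in a function (the name is hypothetical) and applies it to the first golfer in the data table, whose actual earnings per round were $80,501; the model predicts about $87,293, a residual of roughly -$6,792.

```python
def predict_earnings_per_round(events: int, rounds: int, cuts_made: int, wins: int) -> float:
    """Earnings per Round from the reduced (best-fit) regression equation."""
    return (144725.6433
            - 15169.03589 * events
            + 4666.583541 * rounds
            - 7718.911929 * cuts_made
            + 15911.90544 * wins)


# First golfer in the table: 23 events, 82 rounds, 18 cuts made, 3 wins
pred = predict_earnings_per_round(23, 82, 18, 3)
print(f"predicted ${pred:,.0f}; residual {80501 - pred:,.0f}")
```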

