Question

In: Math

1) Use worksheet “baseball stats” to perform a multiple regression analysis on the dataset found in...

1) Use worksheet “baseball stats” to perform a multiple regression analysis on the dataset found in BB2011 tab, using Wins as the dependent variable, and League, ERA, Runs Scored, Hits Allowed, Walks Allowed, Saves, and Errors as candidates for the independent variables. Perform the analysis at the 5% significance level.

Team League Wins E.R.A. Runs Scored Hits Allowed Walks Allowed Saves Errors
Baltimore 0 69 4.89 708 1568 535 32 110
Boston 0 90 4.20 875 1366 540 36 92
Chicago White Sox 0 79 4.10 654 1463 439 42 79
Cleveland 0 80 4.23 704 1482 463 38 110
Detroit 0 95 4.04 787 1406 492 52 103
Kansas City 0 71 4.44 730 1487 557 37 95
Los Angeles Angels 0 86 3.57 667 1388 476 39 93
Minnesota 0 63 4.58 619 1564 480 32 119
New York Yankees 0 97 3.73 867 1423 507 47 102
Oakland 0 74 3.71 645 1380 519 39 124
Seattle 0 67 3.90 556 1369 436 39 108
Tampa Bay 0 91 3.58 707 1263 504 32 73
Texas 0 96 3.79 855 1327 461 38 114
Toronto 0 81 4.32 743 1433 540 33 110
Arizona 1 94 3.80 731 1414 442 58 90
Atlanta 1 89 3.48 641 1332 521 52 83
Chicago Cubs 1 71 4.33 654 1439 580 40 134
Cincinnati 1 79 4.16 735 1414 539 39 91
Colorado 1 73 4.43 735 1471 522 41 98
Houston 1 56 4.51 615 1477 560 25 116
Los Angeles Dodgers 1 82 3.54 644 1287 507 40 85
Miami 1 72 3.95 625 1403 500 40 93
Milwaukee 1 96 3.63 721 1348 440 47 111
New York Mets 1 77 4.19 718 1482 514 43 116
Philadelphia 1 102 3.02 713 1320 404 47 74
Pittsburgh 1 72 4.04 610 1513 535 43 112
St. Louis 1 90 3.74 762 1461 448 47 116
San Diego 1 71 3.42 593 1324 521 44 94
San Francisco 1 86 3.20 570 1260 559 52 104
Washington 1 80 3.58 624 1403 477 49 104

a) Create a full write up, where you write your statistical analysis step by step. In your write up, make sure you address the following points.

·        Methodology and steps that you took to get to your final regression equation.

·        Final regression equation output.

·        What is the final regression equation?

·        Interpret all the coefficients in the equation.

·        Speak to whether the signs on the coefficients make sense.

·        Interpret R squared.

·        Include a full residual analysis.

·        What is the residual of the Tampa Bay observation?

b) Now, use tab BB2012 to make predictions of wins in 2012, using the model you created with the 2011 stats.  

·        How many games are the Giants (SFG) expected to win in 2012?

·        Which team is predicted by the model to have the worst record in 2012?

·        Which team is predicted by the model to have the best record in 2012?

Team WINS E.R.A. Runs Scored Hits Allowed Walks Allowed Saves Errors
ARI 81 3.93 734 1432 417 39 91
ATL 94 3.42 700 1310 464 47 89
BAL 93 3.9 712 1433 481 55 109
BOS 69 4.7 734 1449 529 35 100
CHC 61 4.51 613 1399 573 28 105
CHW 85 4.02 748 1365 503 37 70
CIN 97 3.34 669 1356 427 56 89
CLE 68 4.78 667 1503 543 43 97
COL 64 5.22 758 1637 566 36 122
DET 88 3.75 726 1409 438 40 97
HOU 55 4.56 583 1493 540 31 118
KCR 72 4.3 676 1504 542 44 113
LAA 89 4.02 767 1339 483 38 97
LAD 86 3.34 637 1277 539 40 97
MIA 69 4.09 609 1448 495 38 104
MIL 83 4.22 776 1458 525 44 99
MIN 66 4.77 701 1536 465 35 107
NYM 74 4.09 650 1368 488 36 102
NYY 95 3.84 804 1401 431 51 76
OAK 94 3.48 713 1360 462 47 110
PHI 81 3.83 684 1387 409 42 100
PIT 79 3.86 651 1357 490 45 112
SDP 76 4.01 651 1356 539 43 123
SEA 75 3.76 619 1359 449 43 71
SFG 94 3.68 718 1361 489 53 112
STL 88 3.71 765 1420 436 42 107
TBR 90 3.19 697 1233 469 50 113
TEX 93 3.99 808 1378 446 43 86
TOR 73 4.64 716 1439 574 29 102
WSN 98 3.33 731 1296 497 51 96

Solutions

Expert Solution

solution:

Regression line is given as

where

y= dependent/ response variable

= overall mean effect

=effect due to variable x1

=effect due to variable x2

and so on

= random error/ residual effect and

here, the dependent variable or y= Wins

we can minimise the error by using method of least squares, ie,

hence solving further, we get Sum of squares to construct ANOVA and calculate R2 values which are as follows:

ANOVA
df SS MS F Significance F
Regression 7 3465.793 495.1132 34.78095 1.80939E-10
Residual 22 313.174 14.23518
Total 29 3778.967

The R values are:

Regression Statistics89
Multiple R 0.957668
R Square 0.917127
Adjusted R Square 0.890758
Standard Error 3.772954
Observations 30

The regression line is given as:

Wins = 93.70706 - 1.45486 League +-9.65899 E.R.A + 0.084965 Runs Scored -0.02043 Hits Scored -0.03499 Walks Allowed + 0.425586 Saves - 0.04132 Errors +

Here, R2=0.917127 and adjusted R2=0.89075

that indicates that 91%of response variable is explained by the variables of the model.

Now, let us interpret the significance of coefficients

the hypothesis is H01= there is no significance difference between effects of League and similar null hypothesis for other variables respectively. The alrenative hypothesis H11= there is significance between effects of League and similar alternative hypothesis for other variables respectively.

Hence, the decision criteria is to reject H0 if the p-value < 0.05

The following table tells about the coefficients and p-values and confidence intervals of the model :

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 93.70706 20.93209 4.476717 0.000188279 50.29655 137.1176 50.29655 137.1176
League -1.45486 1.720024 -0.84584 0.406749342 -5.02197 2.11225 -5.02197 2.11225
E.R.A. -9.65899 4.368845 -2.21088 0.037743681 -18.7194 -0.59856 -18.7194 -0.59856
Runs Scored 0.084965 0.0097 8.759181 1.26918E-08 0.064848 0.105082 0.064848 0.105082
Hits Allowed -0.02043 0.019296 -1.05872 0.301213942 -0.06045 0.019588 -0.06045 0.019588
Walks Allowed -0.03499 0.020262 -1.72707 0.098172005 -0.07702 0.007027 -0.07702 0.007027
Saves 0.425586 0.143276 2.970398 0.007062248 0.12845 0.722723 0.12845 0.722723
Errors -0.04132 0.054939 -0.75211 0.459958838 -0.15526 0.072616 -0.15526 0.072616

Here, the variables, League, Hit Allowed, Walks Allowed and Errors are significant and ERA, Runs Scored and Saved are insignificant.

The signs of the coefficients do not tell anything about the significance or insignificance of the variable.

The residual for the variables as per the ANOVA table is 14.23518


Related Solutions

i. Use MS Excel Data Analysis ToolPak to perform a multiple regression analysis using Quality as...
i. Use MS Excel Data Analysis ToolPak to perform a multiple regression analysis using Quality as the response variable and Helpfulness and Clarity as the explanatory variables. Write down the corresponding coefficient estimates and provide the regression output. j. Perform an F-test for the overall usefulness of the model in part i) using a 5% significance level. Make sure you follow all the steps for hypothesis testing indicated in the Instructions section and clearly state your conclusion. k. Test manually...
I'm using 2005 NFL stats to come up with a multiple linear regression analysis models with...
I'm using 2005 NFL stats to come up with a multiple linear regression analysis models with the winning percentage being the dependent variable. My question would be, what are the most significant variables that are used in deciding an NFL team's capacity to win? Passing yards, rushing game, defense or field goals are some of my independent variables. But I’m considering adding the defensive stats to the regression. How do I complete the introduction and model subtopics for my presentation?...
I'm using 2005 NFL stats to come up with a multiple linear regression analysis models with...
I'm using 2005 NFL stats to come up with a multiple linear regression analysis models with the winning percentage being the dependent variable. My question would be, what are the most significant variables that are used in deciding an NFL team's capacity to win? Passing yards, rushing game, defense or field goals are some of my independent variables. But I’m considering adding the defensive stats to the regression. How do I complete my presentation subtopics? Presentation: I. Introduction: Summarize your...
Refer to the TV Revenue data set. Perform a complete multiple regression analysis that might be...
Refer to the TV Revenue data set. Perform a complete multiple regression analysis that might be used to predict net revenue using all provided explanatory variables (there are 4 explanatory variables). Complete all steps for the multiple regression as outlined in class and modify the original model if necessary. Use an alpha = .10 for all hypotheses tests. Make sure you show each required step for any hypothesis test. Provide all required Minitab output with your written responses. Obs NetRevenue...
Use the paired dataset given in the below table to conduct a simple regression analysis as...
Use the paired dataset given in the below table to conduct a simple regression analysis as follows: (10 pts) Using these 4 paired data points, fill up the table calculating all of the terms such as (x2, x*y) used in the formulas of this question. Fill up the other column titles on your own and make the calculations. If needed add coloumns. (the table should be written in your answer paper) xi yi xi2 xi*yi 1 2 32 2 1...
1) A multiple regression model to predict nacho sales at a baseball game yields the following...
1) A multiple regression model to predict nacho sales at a baseball game yields the following coefficients: Intercept 1500 Home Team Score 80 Temp. (Degrees F) 100 Home Team Loss? -2000 Assuming that all variables in this model are significant, what would be the expected result of the home team scoring another run? A- There would be no effect B- Nacho sales would increase by 80 C- Nacho sales would decrease by 80 D- Not enough info 2) The regression...
1) Use the following information from a multiple regression analysis to complete parts​ (a) through​ (c)...
1) Use the following information from a multiple regression analysis to complete parts​ (a) through​ (c) below. n =25    b1 =30    b 2 =10    Sb1=8    Sb 2 =6 b) Construct a​ 95% confidence interval estimate of the population​ slope, β1. ____≤β1≤_____ ​(Round to four decimal places as​ needed.) c) At the 0.05 level of​ significance, determine whether each independent variable makes a significant contribution to the regression model. On the basis of these​ results, indicate the independent variables to include...
Question #11 – Regression Analysis Use the data provided to: Perform the “Tests to Check the...
Question #11 – Regression Analysis Use the data provided to: Perform the “Tests to Check the Validity of a Regression” Show both the calculated and critical values Estimate Y when X = 4 (round to 2 decimal places) Use a level of significance of 5% (α = .05). Clearly show the null and alternate hypothesis. Graphs are not required. X Y 3 14 7 26 6 23 4 17 7 28 5 20 8 29 2 11
For the following experiment, you will use the dataset BodyFat in the Sweeney Stats Group. Bodyfat...
For the following experiment, you will use the dataset BodyFat in the Sweeney Stats Group. Bodyfat was collected from a group of 252 CrossFit athletes. The average bodyfat for the American population is 23.4. Test the conclusion that the average bodyfat for Crossfit Athletes is less than the average bodyfat for the American population. Test using = .05. In addition to running the hypothesis test, answer the following question: Would you use this data to claim that CrossFit Athletes are...
The standard project is to use multiple regression analysis to analyze a data set. The data...
The standard project is to use multiple regression analysis to analyze a data set. The data set is a study of student persistent enrolling in the next semester based on Gender, Age, GPA, a 22 questionnaire on self-efficacy, and student enrollment status. The educational researcher wants to study the relationship between student enrollment status as it relates to gender, age, GPA, and the total response to a 22 questionnaire survey. a. The estimated multiple regression analysis equation. b. Does the...
ADVERTISEMENT
ADVERTISEMENT
ADVERTISEMENT