Question

The data set BaseballTimes contains four quantitative variables that might be useful for predicting game Time: Runs, Margin, Pitchers, and Attendance.

Game League Runs Margin Pitchers Attendance Time
CLE-DET AL 14 6 6 38774 168
CHI-BAL AL 11 5 5 15398 164
BOS-NYY AL 10 4 11 55058 202
TOR-TAM AL 8 4 10 13478 172
TEX-KC AL 3 1 4 17004 151
OAK-LAA AL 6 4 4 37431 133
MIN-SEA AL 5 1 5 26292 151
CHI-PIT NL 23 5 14 17929 239
LAD-WAS NL 3 1 6 26110 156
FLA-ATL NL 19 1 12 17539 211
CIN-HOU NL 3 1 4 30395 147
MIL-STL NL 12 12 9 41121 185
ARI-SD NL 11 7 10 32104 164
COL-SF NL 9 5 7 32695 180
NYM-PHI NL 15 1 16 45204 317

From among the four predictors, choose a model for each of the following goals:

a. Maximize the coefficient of determination R^2

b. Maximize the adjusted R^2

c. Minimize Mallows's Cp

d. Considering the models from a-c, what model would you choose to predict game Time, and why?

e. Using a stepwise procedure (forward selection or backward elimination), find the "best" model

Solutions

Expert Solution

Using the data above, we fit a multiple linear regression to predict game Time from the predictors Runs, Margin, Pitchers, and Attendance.

The model is:

Time = B_0 + B_1(Runs) + B_2(Margin) + B_3(Pitchers) + B_4(Attendance) + error

where B_0 is the intercept and B_1, B_2, B_3, and B_4 are the regression coefficients of the four predictors.

Hypotheses:

H0: B_1 = B_2 = B_3 = B_4 = 0

i.e., none of the predictors is linearly related to Time

versus

H1: at least one of B_1, B_2, B_3, B_4 is not zero

i.e., at least one predictor is useful for predicting Time.

Calculation table:

Coefficient Estimate t-value p-value
B_0 (Intercept) 88.0151 4.952 0.00057
B_1 (Runs) 1.5614 0.931 0.3736
B_2 (Margin) -3.7278 -1.793 0.1032
B_3 (Pitchers) 8.7322 3.514 0.005594
B_4 (Attendance) 0.000726 1.424 0.1848

At the 5% level of significance, only the coefficient of Pitchers (B_3, p = 0.005594) is significantly different from zero; the p-values for Runs, Margin, and Attendance all exceed 0.05.

Therefore the reduced model is

Time = B_0 + B_3(Pitchers)

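a) Maximize the coefficient of determination (R^2):

R-squared: 0.8557

That is, the full model with all four predictors explains 85.57% of the variation in Time. Because R^2 never decreases when a predictor is added, the full model has the largest R^2 of any subset of the predictors.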

b) Maximize the adjusted R^2:

Adjusted R-squared: 0.798

That is, after penalizing for the number of predictors, the full model accounts for 79.8% of the variation in Time. Unlike R^2, the adjusted R^2 can decrease when a predictor that contributes little is added, so it can be used to compare models of different sizes.
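The adjusted value can be checked from the reported R^2. The Python sketch below is not part of the original solution; it assumes the usual definition, adjusted R^2 = 1 - (1 - R^2)(N - 1)/(N - K - 1), with N = 15 games and K = 4 predictors, and the function name adjusted_r2 is only illustrative.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with k predictors (plus an intercept) and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# R^2 = 0.8557 is the value reported for the full four-predictor model.
full_model_adj = adjusted_r2(0.8557, n=15, k=4)
print(round(full_model_adj, 3))
```

The same function can be applied to any smaller model once its R^2 is known, which is what makes models of different sizes comparable.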

c) Minimize Mallows's Cp:

Cp = SSE_P / S^2 - N + 2(P + 1)

where

SSE_P = the error sum of squares for the model with P regressors,

S^2 = the residual mean square after regression on the complete set of K regressors, estimated by the mean square error (MSE) of the full model,

N = the sample size,

and P = the number of regressors.

Using the above formula, we calculate Mallows's Cp:

Cp = 3

A Cp value close to the number of fitted coefficients, P + 1, indicates that little bias is introduced into the predicted responses; a value well above P + 1 indicates an underspecified model.
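Parts a to c can be checked by fitting every subset of the four predictors. The Python sketch below is not part of the original solution. It assumes Cp = SSE_P / S^2 - N + 2(P + 1), with S^2 taken as the full model's SSE divided by N - K - 1, and it uses a small hand-written least-squares helper (the name sse is hypothetical) so that no external library is needed. Attendance is rescaled to thousands, which changes no fitted value but keeps the arithmetic well conditioned.

```python
from itertools import combinations

# Data transcribed from the table in the question.
TIME = [168, 164, 202, 172, 151, 133, 151, 239, 156, 211, 147, 185, 164, 180, 317]
PREDICTORS = {
    "Runs": [14, 11, 10, 8, 3, 6, 5, 23, 3, 19, 3, 12, 11, 9, 15],
    "Margin": [6, 5, 4, 4, 1, 4, 1, 5, 1, 1, 1, 12, 7, 5, 1],
    "Pitchers": [6, 5, 11, 10, 4, 4, 5, 14, 6, 12, 4, 9, 10, 7, 16],
    "Attendance": [a / 1000 for a in (38774, 15398, 55058, 13478, 17004, 37431, 26292,
                                      17929, 26110, 17539, 30395, 41121, 32104, 32695, 45204)],
}


def sse(columns, y):
    """Residual sum of squares of a least-squares fit of y on an intercept plus columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gaussian elimination with partial pivoting.
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        for r in range(k + 1, p):
            f = A[r][k] / A[k][k]
            for c in range(k, p + 1):
                A[r][c] -= f * A[k][c]
    # Back substitution for the coefficients.
    beta = [0.0] * p
    for k in reversed(range(p)):
        beta[k] = (A[k][p] - sum(A[k][c] * beta[c] for c in range(k + 1, p))) / A[k][k]
    return sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2 for i in range(n))


n, K = len(TIME), len(PREDICTORS)
mean_y = sum(TIME) / n
sst = sum((t - mean_y) ** 2 for t in TIME)
s2 = sse(list(PREDICTORS.values()), TIME) / (n - K - 1)  # MSE of the full model

results = {}
for size in range(1, K + 1):
    for names in combinations(PREDICTORS, size):
        e = sse([PREDICTORS[v] for v in names], TIME)
        r2 = 1 - e / sst
        adj = 1 - (1 - r2) * (n - 1) / (n - size - 1)
        cp = e / s2 - n + 2 * (size + 1)
        results[names] = (r2, adj, cp)

best_r2 = max(results, key=lambda m: results[m][0])
best_adj = max(results, key=lambda m: results[m][1])
best_cp = min(results, key=lambda m: results[m][2])

for names, (r2, adj, cp) in sorted(results.items(), key=lambda kv: kv[1][2]):
    print(f"{' + '.join(names):40s} R2={r2:.4f} adjR2={adj:.4f} Cp={cp:.2f}")
print("max R2:", best_r2, "| max adj R2:", best_adj, "| min Cp:", best_cp)
```

By construction the full model always has Cp = K + 1 = 5, so only the smaller models carry information under this criterion.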

d) The chosen model is:

Time = B_0 + B_3(Pitchers)

because the coefficients B_0 and B_3 are both statistically significant for predicting Time,

and this model has the lowest AIC, 93.83.
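The chosen one-predictor model can be reproduced from the data with the closed-form formulas for simple linear regression. The Python sketch below is not part of the original solution; it assumes the AIC is computed as N ln(RSS/N) + 2 × (number of coefficients), a form that matches the reported value of 93.83 for this model.

```python
import math

# Pitchers and Time columns transcribed from the table in the question.
PITCHERS = [6, 5, 11, 10, 4, 4, 5, 14, 6, 12, 4, 9, 10, 7, 16]
TIME = [168, 164, 202, 172, 151, 133, 151, 239, 156, 211, 147, 185, 164, 180, 317]

n = len(TIME)
mean_x = sum(PITCHERS) / n
mean_y = sum(TIME) / n

# Least-squares slope and intercept for Time = B_0 + B_3 * Pitchers.
sxx = sum((x - mean_x) ** 2 for x in PITCHERS)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(PITCHERS, TIME))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual sum of squares and the AIC form assumed in the lead-in
# (two coefficients: intercept and slope).
rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(PITCHERS, TIME))
aic = n * math.log(rss / n) + 2 * 2

print(f"intercept={intercept:.2f} slope={slope:.2f} AIC={aic:.2f}")
```

With these coefficients, a game that uses 8 pitchers is predicted to last about 180.5 minutes.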

e) Using the stepwise procedure:

Using the stepwise procedure, the best model is

Time = B_0 + B_3(Pitchers)

because it has the lowest AIC (93.83). On its own, Pitchers explains about 80% of the variation in Time (adjusted R^2 of about 0.78).

Here, B_0 = 94.84 and B_3 = 10.71,

so the fitted model is

Time = 94.84 + 10.71(Pitchers)
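One variant of the stepwise procedure, forward selection by AIC, can be sketched in Python as follows. This is not the original solution's code, and the procedure it used (direction, stopping rule) is not stated, so the path taken here may differ. The sketch assumes AIC = N ln(RSS/N) + 2 × (number of coefficients); the helper names sse and aic are hypothetical, and Attendance is rescaled to thousands for numerical stability.

```python
import math

# Data transcribed from the table in the question.
TIME = [168, 164, 202, 172, 151, 133, 151, 239, 156, 211, 147, 185, 164, 180, 317]
PREDICTORS = {
    "Runs": [14, 11, 10, 8, 3, 6, 5, 23, 3, 19, 3, 12, 11, 9, 15],
    "Margin": [6, 5, 4, 4, 1, 4, 1, 5, 1, 1, 1, 12, 7, 5, 1],
    "Pitchers": [6, 5, 11, 10, 4, 4, 5, 14, 6, 12, 4, 9, 10, 7, 16],
    "Attendance": [a / 1000 for a in (38774, 15398, 55058, 13478, 17004, 37431, 26292,
                                      17929, 26110, 17539, 30395, 41121, 32104, 32695, 45204)],
}
n = len(TIME)


def sse(columns, y):
    """Residual sum of squares of a least-squares fit of y on an intercept plus columns."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gaussian elimination.
    A = [[sum(row[r] * row[c] for row in X) for c in range(p)]
         + [sum(row[r] * yi for row, yi in zip(X, y))] for r in range(p)]
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        for r in range(k + 1, p):
            f = A[r][k] / A[k][k]
            for c in range(k, p + 1):
                A[r][c] -= f * A[k][c]
    beta = [0.0] * p
    for k in reversed(range(p)):
        beta[k] = (A[k][p] - sum(A[k][c] * beta[c] for c in range(k + 1, p))) / A[k][k]
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2 for row, yi in zip(X, y))


def aic(names):
    """AIC of the model containing an intercept and the named predictors."""
    rss = sse([PREDICTORS[v] for v in names], TIME)
    return n * math.log(rss / n) + 2 * (len(names) + 1)


# Forward selection: start from the intercept-only model and add the predictor
# that lowers the AIC the most, stopping when no addition lowers it.
selected = []
current = aic(selected)
while len(selected) < len(PREDICTORS):
    scores = {v: aic(selected + [v]) for v in PREDICTORS if v not in selected}
    best = min(scores, key=scores.get)
    if scores[best] >= current:
        break
    selected.append(best)
    current = scores[best]
    print(f"added {best}: AIC = {current:.2f}")

print("selected model:", selected)
```

Backward elimination can be sketched the same way by starting from all four predictors and dropping the one whose removal lowers the AIC the most.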
