Question

In: Statistics and Probability

the data BaseballTimes contains 4 quantitative variables that might be useful for predicting game "Time". Runs,...

the data BaseballTimes contains 4 quantitative variables that might be useful for predicting game "Time".

Runs, Margin, Pitchers, and Attendance

Game	League	Runs	Margin	Pitchers	Attendance	Time
CLE-DET	AL	14	6	6	38774	168
CHI-BAL	AL	11	5	5	15398	164
BOS-NYY	AL	10	4	11	55058	202
TOR-TAM	AL	8	4	10	13478	172
TEX-KC	AL	3	1	4	17004	151
OAK-LAA	AL	6	4	4	37431	133
MIN-SEA	AL	5	1	5	26292	151
CHI-PIT	NL	23	5	14	17929	239
LAD-WAS	NL	3	1	6	26110	156
FLA-ATL	NL	19	1	12	17539	211
CIN-HOU	NL	3	1	4	30395	147
MIL-STL	NL	12	12	9	41121	185
ARI-SD	NL	11	7	10	32104	164
COL-SF	NL	9	5	7	32695	180
NYM-PHI	NL	15	1	16	45204	317

From among the four predictors choose a model for each of the following goals

a. Maximize the coefficient of determination R^2

b. Maximize the adjusted R^2

c. Minimize Mallow's Cp

d. Considering models a-c, whamt model would you choose to predict game Time and why?

e. Using stepwise procedure(forward backwards, elimination process) find the "best" model

Expert Solution

Using the above data , We run the Multiple linear Regression ,

We have to predict Game Time using the predictors Runs, Margin, Pitchers, and Attendance

The model is :

Time = B_0 + B_1(Runs) + B_2(Margin) + B_3(Pitchers) + B_4(Attendance)

Where B_0 , B_1 , B_2 , B_3 and B_4 are the regression coefficients.

Hypothesis :

Ho : B_0 = B_1 = B_2 = B_3 = B_0 = 0

i.e. All variables are insignificant

V/s

H₁ : at leats one coefficient is not Zero.

i.e . the variables are significant.

Calculation table :

coefficients	estimate	t value	p value
B_0	88.0151	4.952	0.00057
B_1	1.5614	0.931	0.3736
B_2	-3.7278	-1.793	0.1032
B_3	8.7322	3.514	0.005594
B_4	0.000726	1.424	0.1848

at 5% level of significance we reject Ho and conclude that the variable Pitchers (B_3) is significant.

therefor the model is ,

Time = B_0 + B_3(Pitchers)

a) Maximize the coefficient of determination (R²) :-

R-squared: 0.8557

i.e. Our model is the good , the all predictors explained the 85.57% variations on dependent variable.

b) Maximize the adjusted R² :-

Adjusted R-squared: 0.798

i.e That means the 79.8% variation is expalined by only those variable which are statistically significant . (here , our Pitchers variable is statistically significant) .

c) Minimize Mallow's C_p :-

where ,SSE = is the error sum of squares for the model with P regressors,

S² = is the residual mean square after regression on the complete set of K regressors and can be estimated by mean square error MSE

N = is the sample size.

and P = no. pf regressors.

Using the above formula we calculate Mallow's Cp

Cp = 3

Here , Mallows' C_p-statistic = 3 is the size of the bias that is introduced into the predicted responses by having an underspecified model.

d) Choosed Model is :-

Time = B_0 + B_3(Pitchers)

Because , The coefficeients B_0 and B_3 are significantally affect to prdict the Time.

and this model have the less AIC i.e. 93.83

e) Using stepwise procedure :-

using the stepwise procedure the best model is,

Time = B_0 + B_3(Pitchers)

because the AIC=93.83 . and have a adjusted R² =0.798

here , B_0 = 94.84 and B_3 = 10.71

the model is ,

Time = 94.84 + 10.71(Pitchers)

-----ALL THE BEST-----

orchestra answered 2 years ago

for the quantitative variables x and y are given in the table below. These data are...

for the quantitative variables x and y are given in the table below. These data are plotted in the scatter plot shown next to the table. In the scatter plot, sketch an approximation of the least-squares regression line for the data. x y 7.4 6.6 4.0 5.0 2.0 3.9 3.1 5.0 9.4 8.4 4.7 5.9 6.6 6.2 8.5 6.9 7.9 7.3 1.6 3.7 4.4 4.6 2.8 3.9 2.7 3.1 5.7 6.3 5.9 7.3 9.7 8.3 6.9 6.3 8.8 8.4 x...

Time series are particularly useful to track variables such as revenues, costs, and profits over time....

Time series are particularly useful to track variables such as revenues, costs, and profits over time. Time series models help evaluate performance and make predictions. Consider the following and respond in a minimum of 175 words: Time series decomposition seeks to separate the time series (Y) into 4 components: trend (T), cycle (C), seasonal (S), and irregular (I). What is the difference between these components? The model can be additive or multiplicative.When we do use an additive model? When do...

Describe how might data be collected? Be sure to examine and discuss quantitative and qualitative or...

Describe how might data be collected? Be sure to examine and discuss quantitative and qualitative or mixed methods options.

is a dot diagram useful for observing trends in data over time?

What useful accounting data might be captured in a specialized payroll journal, in addition to the...

What useful accounting data might be captured in a specialized payroll journal, in addition to the net wages paid to each employee? In addition to the invoice from the supplier, what other evidence does the accounting department need to collect - and where is this evidence found - in order to safely authorize payment of the invoice?

Regression analysis is usually performed using quantitative data of both dependent and independent variables. It can...

Regression analysis is usually performed using quantitative data of both dependent and independent variables. It can extend to cases where one of the variables (or both) is quantitative. Using scholarly citations, find a study that extends the regression where either dependent, independent, or both are qualitative. Give: a brief description of the study a description of the dependent and independent variables the conclusion.

4. A subway has good service 70% of the time and runs less frequently 30% of...

4. A subway has good service 70% of the time and runs less frequently 30% of the time because of signal problems. When there are signal problems, the amount of time in minutes that you have to wait at the platform is described by the pdf probability density function with signal problems = pT|SP(t) = .1e −.1t. But when there is good service, the amount of time you have to wait at the platform is probability density function with good...

The family college data set contains a sample of 792 cases with two variables, teen and...

The family college data set contains a sample of 792 cases with two variables, teen and parents, and is summarized in Table below. The teen variable is either college or not, where the teenager is labeled as college if she went to college immediately after high school. The parent variable takes the value degree if at least one parent of the teenager completed a college degree. Parents Degree Parents No Degree Total Teen College 231 214 445 Teen Not college...

Section on Data: Categorical and Quantitative Variables 1) How would you rank cities? Various organizations rank...

Section on Data: Categorical and Quantitative Variables 1) How would you rank cities? Various organizations rank cities and produce lists of the 10 or the 100 best based on various measures. Create a list of criteria that you would use to rank cities. Include at least 8 variables, and give reasons for your choices. Say whether each variable is quantitative or categorical.

2. The data set `MLB-TeamBatting-S16.csv` contains MLB Team Batting Data for selected variables. Load the data...

2. The data set `MLB-TeamBatting-S16.csv` contains MLB Team Batting Data for selected variables. Load the data set from the given url using the code below. This data set was obtained from [Baseball Reference](https://www.baseball-reference.com/leagues/MLB/2016-standard-batting.shtml). * Tm - Team * Lg - League: American League (AL), National League (NL) * BatAge - Batters’ average age * RPG - Runs Scored Per Game * G - Games Played or Pitched * AB - At Bats * R - Runs Scored/Allowed * H...

Question

the data BaseballTimes contains 4 quantitative variables that might be useful for predicting game "Time". Runs,...

Solutions

Expert Solution

Related Solutions

for the quantitative variables x and y are given in the table below. These data are...

Time series are particularly useful to track variables such as revenues, costs, and profits over time....

Describe how might data be collected? Be sure to examine and discuss quantitative and qualitative or...

is a dot diagram useful for observing trends in data over time?

What useful accounting data might be captured in a specialized payroll journal, in addition to the...

Regression analysis is usually performed using quantitative data of both dependent and independent variables. It can...

4. A subway has good service 70% of the time and runs less frequently 30% of...

The family college data set contains a sample of 792 cases with two variables, teen and...

Section on Data: Categorical and Quantitative Variables 1) How would you rank cities? Various organizations rank...

2. The data set `MLB-TeamBatting-S16.csv` contains MLB Team Batting Data for selected variables. Load the data...