Question

A study considered a sample of 50 observations used to predict SALES. Included in the analysis were 9 predictor variables (independent variables).

  1. Based on the correlation matrix shown below, is there any concern about Multicollinearity?

Correlations

            X1      X2      X3      X4      X5      X6      X7      X8      X9
X2       0.804
X3       0.625   0.443
X4       0.032   0.032   0.231
X5       0.159   0.214   0.177  -0.194
X6       0.319   0.373   0.308   0.054   0.293
X7      -0.016   0.030   0.079   0.168  -0.309   0.067
X8      -0.026   0.103   0.015   0.151  -0.311   0.059   0.912
X9       0.169  -0.027  -0.104   0.017  -0.248   0.114   0.174   0.223
SALES    0.764   0.630   0.756   0.149   0.171   0.426   0.145   0.141  -0.068

  • If YES, list all pairs of variables involved:
  • What limit did you use?
  • What action must be taken to eliminate multicollinearity? Be very specific.
  • Which variable appears to have the weakest relationship with SALES?
  2. The following is the initial regression printout. Variables X2 and X8 were not included in the regression.
  • Why?
  • Use the F test and a 0.05 level of significance to determine whether the Regression model is significant.

Analysis of Variance

Source        DF   Adj SS    Adj MS   F-Value
Regression     7   145157   20736.7     19.62
Error         42    44385    1056.8
Total         49   189543

  • What is the value of the F Critical Point?

  • How many degrees of freedom did you use?

  • Compute the R-Square:

  • Show your work and compute the Adjusted R-Square:

Term             Coef    SE Coef   T-Value
Constant        -15.8       22.1     -0.71
X1              11.93       2.34
X3               5.75       1.77      3.24
X4           0.000023   0.000096      0.24
X5              -0.59       2.84
X6               4.08       2.07
X7             0.0891     0.0530      1.68
X9            -0.0302     0.0150     -2.02

  • Based on the Table above, what is the regression equation?

Sales =

  • Three T-Values are missing. Compute them. The T-values are:

For X1 =                               X5 =                                X6 =      

Use the t test and a 0.05 level of significance to determine the significance of each independent variable.

  • What is the value of the T Critical Point?
  • How many degrees of freedom did you use?
  • Would you eliminate any variable from the model?
  • Which variable?

Solutions

Expert Solution

Based on the correlation matrix, is there any concern about multicollinearity?

Yes, there is a concern about multicollinearity.

The pairs of variables involved are (X1, X2) and (X7, X8).

For the pair (X7, X8) the correlation is 0.912, which means the two variables are highly correlated with each other.

The pair (X1, X2) has a correlation of 0.804, so this pair is also highly correlated.

-What limit did you use?
The limit used is |r| > 0.70 for any pair of predictors.
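
The screening step can be sketched in a few lines of Python. This is only an illustration: the list `lower`, the constant `LIMIT` and the helper `flag_pairs` are names assumed here, and the values are copied from the predictor rows of the correlation matrix in the question (the SALES row is left out because it is the response).

```python
# Lower-triangular predictor correlations copied from the question.
# Row i holds the correlations of X(i+2) with X1 .. X(i+1).
lower = [
    [0.804],
    [0.625, 0.443],
    [0.032, 0.032, 0.231],
    [0.159, 0.214, 0.177, -0.194],
    [0.319, 0.373, 0.308, 0.054, 0.293],
    [-0.016, 0.030, 0.079, 0.168, -0.309, 0.067],
    [-0.026, 0.103, 0.015, 0.151, -0.311, 0.059, 0.912],
    [0.169, -0.027, -0.104, 0.017, -0.248, 0.114, 0.174, 0.223],
]

LIMIT = 0.70  # rule-of-thumb cutoff used in this solution (an assumption, not a fixed standard)


def flag_pairs(rows, limit):
    """Return (row_variable, column_variable, r) for every pair with |r| > limit."""
    flagged = []
    for i, row in enumerate(rows):
        for j, r in enumerate(row):
            if abs(r) > limit:
                flagged.append((f"X{i + 2}", f"X{j + 1}", r))
    return flagged


pairs = flag_pairs(lower, LIMIT)
for row_var, col_var, r in pairs:
    print(f"{col_var} and {row_var}: r = {r}")
```

With the 0.70 cutoff, only the two pairs named in the answer are flagged; a stricter or looser cutoff would change the list.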

- What action must be taken to eliminate multicollinearity?  Be very specific.

From each highly correlated pair, remove the variable that has the weaker relationship with SALES.

In the pair (X7, X8), X8 has the weaker relationship with SALES (0.141 versus 0.145 for X7), so it is better to remove X8. In the pair (X1, X2), X2 has the weaker relationship with SALES (0.630 versus 0.764 for X1), so remove X2.

-Which variable appears to have the weakest relationship with SALES?
X9 has the weakest relationship with SALES (correlation -0.068, the smallest in absolute value).

-The following is the initial Regression Printout. Variables X2 and X8 were not included in the regression. Why?

The pairs (X1, X2) and (X7, X8) are highly correlated. Within (X1, X2), X2 has the weaker relationship with SALES, and within (X7, X8), X8 has the weaker relationship with SALES. So X2 and X8 were removed.
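
A small Python sketch of this selection rule, assuming the SALES correlations from the last row of the matrix. The names `sales_corr`, `collinear_pairs` and `weaker` are made up for the illustration.

```python
# Correlation of each predictor with SALES (last row of the correlation matrix).
sales_corr = {
    "X1": 0.764, "X2": 0.630, "X3": 0.756, "X4": 0.149, "X5": 0.171,
    "X6": 0.426, "X7": 0.145, "X8": 0.141, "X9": -0.068,
}

# Pairs flagged as highly correlated with each other (|r| > 0.70).
collinear_pairs = [("X1", "X2"), ("X7", "X8")]


def weaker(pair, corr):
    """Return the member of the pair with the smaller |correlation| with SALES."""
    return min(pair, key=lambda name: abs(corr[name]))


to_drop = [weaker(pair, sales_corr) for pair in collinear_pairs]
weakest_overall = min(sales_corr, key=lambda name: abs(sales_corr[name]))

print("Drop from the model:", to_drop)
print("Weakest relationship with SALES:", weakest_overall)
```

The comparison uses absolute values, so a strong negative correlation would count as a strong relationship.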

Source        DF   Adj SS    Adj MS   F-Value
Regression     7   145157   20736.7     19.62
Error         42    44385    1056.8
Total         49   189543

Use the F test and a 0.05 level of significance to determine whether the Regression model is significant.

To test

H0 : bi = 0 for all i = 1,3,4,5,6,7,9    { Model is not significant }

H1 : bi ≠ 0 for at least one i    { Model is significant }

Test statistic F:

F = MSR / MSRES = 20736.7 / 1056.8 = 19.62216

Thus the calculated F-value is 19.62

What is the value of the F Critical Point?

It is given by F(α; df1, df2), the upper-α critical value of the F distribution with df1 = 7 and df2 = 42 degrees of freedom and α = 0.05.

It can be read from a statistical table or computed more accurately with software such as R or Excel.

From R

> qf(1-0.05,df1=7,df2=42)
[1] 2.23707

Thus the value of the F Critical Point is 2.23707

We reject the null hypothesis if the calculated F-value is greater than the F Critical Point.

How many degrees of freedom did you use?

df1 = 7 (regression) and df2 = 42 (error) degrees of freedom, with α = 0.05.

Conclusion -

Since F-value = 19.62 > 2.23707, i.e. F-value > F Critical Point,

we reject the null hypothesis at the 5% level of significance and hence conclude that the model is significant.
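
The arithmetic of the F test can be checked with a short Python sketch. The sums of squares and degrees of freedom come from the ANOVA table, and the critical value is the one reported by R in this solution rather than recomputed here. The variable names are illustrative only.

```python
# Values from the Analysis of Variance table.
ss_regression, df_regression = 145157, 7
ss_error, df_error = 44385, 42

# Critical value taken from the R call qf(1-0.05, df1=7, df2=42) in the solution.
F_CRITICAL = 2.23707

ms_regression = ss_regression / df_regression  # mean square for regression
ms_error = ss_error / df_error                 # mean square for error
f_value = ms_regression / ms_error

# Decision rule: reject H0 when the computed F exceeds the critical value.
model_is_significant = f_value > F_CRITICAL

print(f"MSR = {ms_regression:.1f}, MSE = {ms_error:.1f}, F = {f_value:.2f}")
print("Reject H0:", model_is_significant)
```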

-Compute the R-Square:

Formula :

R-Square = 1 - SSRES / TSS

         = 1 - 44385 / 189543

         = 0.7658315

Thus R-Square = 0.7658315

-Show your work and compute the Adjusted R-Square:

Formula :

Adjusted R-Square = 1 - ( SSRES / df(Error) ) / ( TSS / df(Total) )

        or = 1 - ( SSRES / (n-k-1) ) / ( TSS / (n-1) )

Here k = 7        ( number of regressors )

also   n-k-1 = 50-7-1 = 42    ( given as the Error DF )

               = 1 - ( 44385 / 42 ) / ( 189543 / 49 )

               = 0.7268034

Thus Adjusted R-Square = 0.7268034
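
Both quantities can be reproduced with a few lines of Python. This is a sketch using the sums of squares from the ANOVA table, with n = 50 observations and k = 7 regressors; the variable names are chosen for the illustration.

```python
ss_error = 44385    # error (residual) sum of squares
ss_total = 189543   # total sum of squares
n = 50              # number of observations
k = 7               # number of regressors in the model

df_error = n - k - 1   # 42, matches the Error DF in the table
df_total = n - 1       # 49, matches the Total DF in the table

r_square = 1 - ss_error / ss_total
adj_r_square = 1 - (ss_error / df_error) / (ss_total / df_total)

print(f"R-Square          = {r_square:.7f}")
print(f"Adjusted R-Square = {adj_r_square:.7f}")
```

The adjusted value is lower because it penalizes the model for the seven regressors it uses.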

Based on the Table above, what is the regression equation?

Sales = -15.8+11.93*X1+5.75*X3+0.000023*X4-0.59*X5+4.08*X6+0.0891*X7-0.0302*X9
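
As an illustration, the fitted equation can be written as a small Python function using the seven terms in the coefficient table (X1, X3, X4, X5, X6, X7, X9). The function name `predict_sales` and the input values below are hypothetical and not part of the original problem.

```python
INTERCEPT = -15.8

# Coefficients from the regression printout (X2 and X8 are not in the model).
COEFFICIENTS = {
    "X1": 11.93,
    "X3": 5.75,
    "X4": 0.000023,
    "X5": -0.59,
    "X6": 4.08,
    "X7": 0.0891,
    "X9": -0.0302,
}


def predict_sales(x):
    """Evaluate the fitted equation for a dict mapping variable name to value."""
    return INTERCEPT + sum(COEFFICIENTS[name] * x[name] for name in COEFFICIENTS)


# Hypothetical observation, for illustration only.
example = {"X1": 1.0, "X3": 0.0, "X4": 0.0, "X5": 0.0, "X6": 0.0, "X7": 0.0, "X9": 0.0}
print(predict_sales(example))
```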

Three T-Values are missing. Compute them. The T-values are:

t-VALUE = coef / SE coef

For X1 =  11.93/2.34 = 5.098291              

For X5 =   -0.59/2.84 =-0.2077465

For X6 =     4.08/2.07 = 1.971014
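
The same three ratios can be checked in Python. This is a minimal sketch with the coefficients and standard errors from the printout; the dictionary names are illustrative.

```python
# (coefficient, standard error) for the three terms with missing t-values.
coef_and_se = {
    "X1": (11.93, 2.34),
    "X5": (-0.59, 2.84),
    "X6": (4.08, 2.07),
}

# t-value = coefficient divided by its standard error.
t_values = {name: coef / se for name, (coef, se) in coef_and_se.items()}

for name, t in t_values.items():
    print(f"{name}: t = {t:.6f}")
```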

Term             Coef    SE Coef     T-Value
Constant        -15.8       22.1       -0.71
X1              11.93       2.34    5.098291
X3               5.75       1.77        3.24
X4           0.000023   0.000096        0.24
X5              -0.59       2.84  -0.2077465
X6               4.08       2.07    1.971014
X7             0.0891     0.0530        1.68
X9            -0.0302     0.0150       -2.02

-Use the t test and a 0.05 level of significance to determine the significance of each independent variable.

-What is the value of the T Critical Point?

The value of the T Critical Point is given by t(α/2; df), the two-sided critical value of the t distribution.

Here df = n-(k+1) = 50-(7+1) = 42 degrees of freedom and α = 0.05.

It can be read from a statistical table or computed more accurately with software such as R or Excel.

From R

> qt(1-0.05/2,42)
[1] 2.018082

Thus the value of the T Critical Point is 2.018082

-How many degrees of freedom did you use?

42 degrees of freedom      ( defined in the part above )

-Would you eliminate any variable from the model?

Here the hypotheses to test are

H0 : bj = 0        { j th variable is not significant }

vs

H1 : bj ≠ 0    { j th variable contributes significantly to the model }

The test statistic is the t-value,

where t-value = coef / SE coef

The t-values are obtained for each variable.

We reject the null hypothesis if the absolute t-value is greater than the T Critical Point 2.018082,

i.e. | t-value | > 2.018082

Variables X1, X3 and X9 have | t-value | greater than 2.018082 (5.10, 3.24 and 2.02 respectively).

Hence we conclude that X1, X3 and X9 contribute significantly to the model.

The remaining variables can be removed from the model.

-Which variable?

We remove variables X4, X5, X6 and X7
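
The decision rule can be sketched in Python as follows. The printed t-values are used where the printout gives them, and the three missing ones are computed as coef / SE coef. The critical value is the one reported by R in this solution, and the variable names are illustrative.

```python
# Two-sided critical value from qt(1-0.05/2, 42) in the solution.
T_CRITICAL = 2.018082

# t-values: printed in the regression output, or computed as coef / SE coef.
t_values = {
    "X1": 11.93 / 2.34,
    "X3": 3.24,
    "X4": 0.24,
    "X5": -0.59 / 2.84,
    "X6": 4.08 / 2.07,
    "X7": 1.68,
    "X9": -2.02,  # borderline: |t| is only slightly above the critical value
}

significant = [name for name, t in t_values.items() if abs(t) > T_CRITICAL]
not_significant = [name for name, t in t_values.items() if abs(t) <= T_CRITICAL]

print("Significant at 0.05:", significant)
print("Candidates for removal:", not_significant)
```

Because X9 and X6 both sit close to the cutoff, a slightly different significance level would change which variables are kept.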

