In: Statistics and Probability
The prices of Rawlston, Inc. stock (y) over a period of 12 days, the number of shares (in 100s) of the company's stock sold (x1), the volume of exchange (in millions) on the New York Stock Exchange (x2), and the daily profit of the company (in thousands) (x3) are shown below.
day | y     | x1  | x2    | x3
----|-------|-----|-------|---
1   | 87.50 | 950 | 11.00 | 40
2   | 86.00 | 945 | 11.25 | 45
3   | 84.00 | 940 | 11.75 | 27
4   | 78.00 | 930 | 11.75 | 22
5   | 84.50 | 935 | 12.00 | 34
6   | 84.00 | 935 | 13.00 | 51
7   | 82.00 | 932 | 13.25 | 43
8   | 80.00 | 938 | 14.50 | 41
9   | 78.50 | 925 | 15.00 | 45
10  | 79.00 | 900 | 16.50 | 42
11  | 77.00 | 875 | 17.00 | 35
12  | 76.50 | 870 | 17.50 | 34
f.) At a 0.05 significance level, determine which variables are significantly adding to the model from part c and which are not.
g.) Compare the p-values from the simple regression in part a and p-value for the x1 variable in the multiple regression in part f and explain what their difference means.
h.) Were there any variables shown in part f that were not significant? If so remove them and give a new regression model. Is it better than in parts a and/or c? Worse? Why? If it is better use this new one for the remaining problems.
i.) If in a given day, the number of shares of the company that were sold was 91,500, the volume of exchange on the New York Stock Exchange was 19 million, and the company had a profit of $47,000 what would you expect the price of the stock to be? Use whichever model you decided was the best in part h. Also give a 95% prediction interval for your prediction for Stock Price from the Excel output of whichever model you chose.
Using R
> library(readxl)                     # read.excel() is not a base R function; read_excel() from readxl is the usual choice
> data = read_excel("rawlston.xlsx")  # read the data (file name assumed)
> View(data)
# Fit the full multiple regression model (part c): y on x1, x2, and x3
> Model = lm(y ~ ., data = data)
> summary(Model)
Call:
lm(formula = y ~ ., data = data)
Residuals:
Min 1Q Median 3Q Max
-3.3397 -0.8509 0.2164 1.0543 2.1222
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 149.42046 62.60706 2.387 0.0441 *
x1 -0.05378 0.06033 -0.891 0.3988
x2 -1.94061 0.68283 -2.842 0.0217 *
x3 0.21516 0.08547 2.517 0.0360 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.842 on 8 degrees of freedom
Multiple R-squared: 0.8225, Adjusted R-squared: 0.756
F-statistic: 12.36 on 3 and 8 DF, p-value: 0.00226
f.) At a 0.05 significance level, determine which variables are significantly adding to the model from part c and which are not.
---> x2 (p = 0.0217) and x3 (p = 0.0360) are significant because their p-values are less than 0.05. x1 is not significant because its p-value (0.3988) is greater than 0.05.
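These significance calls can be checked directly from the coefficient table that `summary()` returns; a minimal sketch, re-entering the data from the table above:

```r
# Re-enter the data from the table above
y  <- c(87.50, 86.00, 84.00, 78.00, 84.50, 84.00, 82.00, 80.00, 78.50, 79.00, 77.00, 76.50)
x1 <- c(950, 945, 940, 930, 935, 935, 932, 938, 925, 900, 875, 870)
x2 <- c(11.00, 11.25, 11.75, 11.75, 12.00, 13.00, 13.25, 14.50, 15.00, 16.50, 17.00, 17.50)
x3 <- c(40, 45, 27, 22, 34, 51, 43, 41, 45, 42, 35, 34)
data <- data.frame(y, x1, x2, x3)

# Full model, then pull the p-value column from the coefficient table
Model <- lm(y ~ ., data = data)
pvals <- summary(Model)$coefficients[, "Pr(>|t|)"]
pvals[pvals < 0.05]   # terms significant at the 0.05 level (intercept, x2, x3)
```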
g.) Compare the p-values from the simple regression in part a and p-value for the x1 variable in the multiple regression in part f and explain what their difference means.
---->
> SimpleModel=lm(y~x1,data=data)
> summary(SimpleModel)
Call:
lm(formula = y ~ x1, data = data)
Residuals:
Min 1Q Median 3Q Max
-4.1980 -1.0841 0.7839 1.3754 3.0958
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -20.39035 25.21595 -0.809 0.43754
x1 0.11031 0.02731 4.039 0.00237 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.411 on 10 degrees of freedom
Multiple R-squared: 0.62, Adjusted R-squared: 0.582
F-statistic: 16.31 on 1 and 10 DF, p-value: 0.002365
Conclusion: In the simple regression, x1 is highly significant (p = 0.00237), but in the multiple regression its p-value jumps to 0.3988. The difference arises because x1 is correlated with x2 and x3: once x2 and x3 are in the model, they account for most of the variation in y that x1 explains on its own, so x1 adds little additional information (multicollinearity).
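The overlap among the predictors can be seen in their pairwise correlations; a minimal sketch, re-entering the data from the table above:

```r
# Re-enter the predictors from the table above
x1 <- c(950, 945, 940, 930, 935, 935, 932, 938, 925, 900, 875, 870)
x2 <- c(11.00, 11.25, 11.75, 11.75, 12.00, 13.00, 13.25, 14.50, 15.00, 16.50, 17.00, 17.50)
x3 <- c(40, 45, 27, 22, 34, 51, 43, 41, 45, 42, 35, 34)

# Pairwise correlation matrix: x1 and x2 are strongly negatively correlated (about -0.9),
# so x1 carries little information not already supplied by x2
cor(cbind(x1, x2, x3))
```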
h.) Were there any variables shown in part f that were not significant? If so remove them and give a new regression model. Is it better than in parts a and/or c? Worse? Why? If it is better use this new one for the remaining problems.
----> After removing the non-significant variable x1, the new regression model is:
> model3=lm(y~x2+x3,data=data)
> summary(model3)
Call:
lm(formula = y ~ x2 + x3, data = data)
Residuals:
Min 1Q Median 3Q Max
-3.3505 -0.8100 0.3582 1.1148 2.0752
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 93.73127 3.94960 23.732 2e-09 ***
x2 -1.37057 0.23651 -5.795 0.000261 ***
x3 0.16924 0.06741 2.511 0.033282 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.821 on 9 degrees of freedom
Multiple R-squared: 0.8049, Adjusted R-squared: 0.7616
F-statistic: 18.57 on 2 and 9 DF, p-value: 0.0006396
Conclusion: Yes, x1 was not significant in part f, so it is removed. The reduced model is better than the simple model from part a (adjusted R-squared rises from 0.582 to 0.7616) and slightly better than the full model from part c (0.7616 vs. 0.756), and every remaining coefficient is significant. This model is therefore used for the remaining problems.
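With model3 chosen, part i can be answered with `predict()` and a 95% prediction interval. Note that 91,500 shares corresponds to x1 = 915 (x1 is in hundreds), but x1 is not needed since it was dropped; the new day has x2 = 19 and x3 = 47. A sketch, re-entering the data from the table above:

```r
# Re-enter the data from the table above
y  <- c(87.50, 86.00, 84.00, 78.00, 84.50, 84.00, 82.00, 80.00, 78.50, 79.00, 77.00, 76.50)
x2 <- c(11.00, 11.25, 11.75, 11.75, 12.00, 13.00, 13.25, 14.50, 15.00, 16.50, 17.00, 17.50)
x3 <- c(40, 45, 27, 22, 34, 51, 43, 41, 45, 42, 35, 34)
data <- data.frame(y, x2, x3)

# Reduced model from part h
model3 <- lm(y ~ x2 + x3, data = data)

# New day: x2 = 19 (million), x3 = 47 (thousand dollars of profit)
newday <- data.frame(x2 = 19, x3 = 47)
pred <- predict(model3, newdata = newday, interval = "prediction", level = 0.95)
pred  # point estimate about 75.64 (93.73127 - 1.37057*19 + 0.16924*47), with 95% lwr/upr bounds
```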