Answer in R code, please. Using the data below:
Create a scatterplot of y vs. x (show this), fit a simple linear regression model using y as the response, and plot the regression line with the data (show this as well). Test whether x is a significant predictor and create a 95% CI for the slope coefficient. What does the coefficient of determination represent?
For x=20, create a CI for E(Y|X=20). Show this.
For x=150, can you use the model to estimate E(Y|X=150)? Discuss.
Does the model appear to be linear with respect to x? Explain and discuss; if not, provide an alternative model and repeat steps 1-6.
obs | y | x
1 | 311.8481 | 30.77326
2 | 440.9428 | 32.40036
3 | 41.6744 | 13.89724
4 | 417.7435 | 30.82836
5 | 177.3642 | 21.17247
6 | 639.0727 | 41.70052
7 | 179.9235 | 20.52949
8 | 19.64963 | 16.78782
9 | 1030.218 | 47.05621
10 | 211.6078 | 24.73312
11 | 468.797 | 33.30568
12 | 281.9641 | 27.20706
13 | 360.4149 | 28.98507
14 | 626.3254 | 33.98696
15 | 692.872 | 40.61913
16 | 840.8116 | 44.14024
17 | 71.51774 | 14.71966
18 | 97.75643 | 18.69047
19 | 251.0697 | 26.53534
20 | 81.51288 | 19.51529
21 | 270.3445 | 28.00065
22 | 1221.873 | 49.81578
23 | 110.3152 | 20.3347
24 | 595.4412 | 38.29436
25 | 126.2188 | 13.26268
26 | 11.15999 | 16.73084
27 | 230.5542 | 24.64804
28 | 77.3025 | 15.99319
29 | 1117.463 | 48.8532
30 | 122.5684 | 18.10108
31 | 932.665 | 44.75007
32 | 911.0599 | 44.23208
33 | 255.6625 | 24.33537
34 | 810.0097 | 41.18667
35 | 210.4745 | 20.06741
36 | 9.884425 | 11.10681
37 | 75.98362 | 11.67823
38 | 153.6595 | 20.20392
39 | 578.7254 | 38.05732
40 | 93.28379 | 12.89079
41 | 378.1102 | 27.82776
42 | 203.9408 | 25.8318
43 | 837.9018 | 43.87759
44 | 44.45671 | 11.49288
45 | 1145.79 | 48.94833
46 | 1073.485 | 47.3091
47 | 431.1394 | 30.53461
48 | 343.5504 | 28.65658
49 | 810.0665 | 41.25828
Please provide all relevant work in R code. The commands, the output and any interpretations/conclusions that are necessary.
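R code:
### We carry out a regression analysis; y is the dependent (response) variable and x is the independent (predictor) variable.
## Enter the 49 observations from the table above:
y = c(311.8481, 440.9428, 41.6744, 417.7435, 177.3642, 639.0727, 179.9235,
      19.64963, 1030.218, 211.6078, 468.797, 281.9641, 360.4149, 626.3254,
      692.872, 840.8116, 71.51774, 97.75643, 251.0697, 81.51288, 270.3445,
      1221.873, 110.3152, 595.4412, 126.2188, 11.15999, 230.5542, 77.3025,
      1117.463, 122.5684, 932.665, 911.0599, 255.6625, 810.0097, 210.4745,
      9.884425, 75.98362, 153.6595, 578.7254, 93.28379, 378.1102, 203.9408,
      837.9018, 44.45671, 1145.79, 1073.485, 431.1394, 343.5504, 810.0665)
x = c(30.77326, 32.40036, 13.89724, 30.82836, 21.17247, 41.70052, 20.52949,
      16.78782, 47.05621, 24.73312, 33.30568, 27.20706, 28.98507, 33.98696,
      40.61913, 44.14024, 14.71966, 18.69047, 26.53534, 19.51529, 28.00065,
      49.81578, 20.3347, 38.29436, 13.26268, 16.73084, 24.64804, 15.99319,
      48.8532, 18.10108, 44.75007, 44.23208, 24.33537, 41.18667, 20.06741,
      11.10681, 11.67823, 20.20392, 38.05732, 12.89079, 27.82776, 25.8318,
      43.87759, 11.49288, 48.94833, 47.3091, 30.53461, 28.65658, 41.25828)
y
x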
### Q) Create a scatterplot of y vs x, fit a simple linear regression model
### using y as the response, and plot the regression line with the data.
## Scatterplot:
plot(x, y, main = "Scatterplot of y vs x")
## Fit the regression model:
reg = lm(y ~ x)
reg
## Add the fitted regression line to the scatterplot:
abline(reg)
## Estimated regression equation: yhat = -420.37 + (28.97 * x)
summary(reg)
#### Q) Test whether x is a significant predictor and create a 95% CI around the slope coefficient.
## Test for the coefficient of x (the slope):
## H0: β1 = 0 vs H1: β1 ≠ 0
## Test statistic: t = (b1 - 0) / se(b1), with n - 2 = 47 degrees of freedom
b1 = 28.975    # slope estimate, read from summary(reg)
seb1 = 1.111   # standard error of the slope, read from summary(reg)
t_stat = b1 / seb1   # named t_stat so that base R's t() is not masked
t_stat
## p-value < 2e-16 (from summary(reg))
## Decision rule (p-value approach): reject H0 if the p-value is less than alpha.
## Here the p-value is far below alpha = 0.05, so we reject H0.
## Conclusion: the slope is significantly different from zero at the 5% level; that is, x is a significant predictor of y.
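## The t statistic and p-value above were typed in from the printed summary. A minimal
## sketch of pulling them straight from the fitted object follows. Assumptions: it uses only
## the first ten rows of the table (to keep the example short), and the names x_demo,
## y_demo and fit_demo are illustrative, so the numbers differ from the full 49-row fit.

```r
# Demo data: first ten rows of the table (illustrative subset, not the full data set)
y_demo <- c(311.8481, 440.9428, 41.6744, 417.7435, 177.3642,
            639.0727, 179.9235, 19.64963, 1030.218, 211.6078)
x_demo <- c(30.77326, 32.40036, 13.89724, 30.82836, 21.17247,
            41.70052, 20.52949, 16.78782, 47.05621, 24.73312)

fit_demo <- lm(y_demo ~ x_demo)

# Coefficient table: columns Estimate, Std. Error, t value, Pr(>|t|)
coefs <- summary(fit_demo)$coefficients
b1_demo <- coefs["x_demo", "Estimate"]
se_demo <- coefs["x_demo", "Std. Error"]

# t statistic for H0: beta1 = 0 and its two-sided p-value on n - 2 degrees of freedom
t_demo <- b1_demo / se_demo
p_demo <- 2 * pt(abs(t_demo), df = df.residual(fit_demo), lower.tail = FALSE)

t_demo
p_demo
```

## With the full data, the same lines applied to reg would avoid rounding b1 and se(b1) by hand.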
#### Q) 95% confidence interval for the slope:
## CI for β1: b1 ± (t critical value * se(b1)), with 47 degrees of freedom
t_critical = qt(1 - 0.05/2, 47)
t_critical
lower_level = 28.975 - (t_critical * 1.111)
lower_level
upper_level = 28.975 + (t_critical * 1.111)
upper_level
## 95% confidence interval for β1 = (26.740, 31.210)
## The interval does not contain 0, which agrees with the t test above.
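## The same interval can be obtained without typing in rounded values. The sketch below
## compares the hand formula with confint(). Assumptions: only the first ten rows of the
## table are used, and x_demo, y_demo and fit_demo are illustrative names, so the numbers
## differ from the full-data interval.

```r
# Demo data: first ten rows of the table (illustrative subset, not the full data set)
y_demo <- c(311.8481, 440.9428, 41.6744, 417.7435, 177.3642,
            639.0727, 179.9235, 19.64963, 1030.218, 211.6078)
x_demo <- c(30.77326, 32.40036, 13.89724, 30.82836, 21.17247,
            41.70052, 20.52949, 16.78782, 47.05621, 24.73312)

fit_demo <- lm(y_demo ~ x_demo)

# Built-in 95% confidence interval for the slope
ci_builtin <- confint(fit_demo, "x_demo", level = 0.95)

# Hand formula: b1 +/- t_{0.975, n-2} * se(b1)
coefs <- summary(fit_demo)$coefficients
t_crit <- qt(0.975, df.residual(fit_demo))
ci_manual <- coefs["x_demo", "Estimate"] +
  c(-1, 1) * t_crit * coefs["x_demo", "Std. Error"]

ci_builtin
ci_manual
```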
#### Q) What does the coefficient of determination represent?
## Answer: The coefficient of determination (denoted R^2) is a key output of regression analysis.
## It is the proportion of the variance in the dependent variable that is explained by
## the independent variable through the fitted model. It lies between 0 and 1.
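## To make the definition concrete, R^2 can be computed as 1 - SSE/SST and checked against
## the value stored in summary(). A hedged sketch on the first ten rows of the table only;
## x_demo, y_demo and fit_demo are illustrative names, and the value will not match the
## full-data R^2.

```r
# Demo data: first ten rows of the table (illustrative subset, not the full data set)
y_demo <- c(311.8481, 440.9428, 41.6744, 417.7435, 177.3642,
            639.0727, 179.9235, 19.64963, 1030.218, 211.6078)
x_demo <- c(30.77326, 32.40036, 13.89724, 30.82836, 21.17247,
            41.70052, 20.52949, 16.78782, 47.05621, 24.73312)

fit_demo <- lm(y_demo ~ x_demo)

sse <- sum(resid(fit_demo)^2)            # unexplained (residual) variation
sst <- sum((y_demo - mean(y_demo))^2)    # total variation in y
r2_manual <- 1 - sse / sst

r2_model <- summary(fit_demo)$r.squared  # value reported by summary()
r2_cor <- cor(x_demo, y_demo)^2          # in simple linear regression, R^2 = r^2

c(manual = r2_manual, model = r2_model, squared_cor = r2_cor)
```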
#### Q) For x = 20, create a CI for E(Y|X=20). We can use predict() directly:
new.dat = data.frame(x = 20)
predict(reg, newdata = new.dat, interval = "confidence")
## 95% CI for E(Y|X=20): lower limit = 126.2353, upper limit = 192.0172
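## For reference, the interval returned by predict() is
## yhat0 ± t(0.975, n-2) * s * sqrt(1/n + (x0 - xbar)^2 / Sxx).
## The sketch below checks this by hand. Assumptions: first ten rows of the table only,
## with illustrative names x_demo, y_demo and fit_demo; the limits differ from the
## full-data limits.

```r
# Demo data: first ten rows of the table (illustrative subset, not the full data set)
y_demo <- c(311.8481, 440.9428, 41.6744, 417.7435, 177.3642,
            639.0727, 179.9235, 19.64963, 1030.218, 211.6078)
x_demo <- c(30.77326, 32.40036, 13.89724, 30.82836, 21.17247,
            41.70052, 20.52949, 16.78782, 47.05621, 24.73312)

fit_demo <- lm(y_demo ~ x_demo)

x0 <- 20
n <- length(x_demo)
s <- summary(fit_demo)$sigma                 # residual standard error
sxx <- sum((x_demo - mean(x_demo))^2)        # sum of squares of x about its mean

yhat0 <- sum(coef(fit_demo) * c(1, x0))      # fitted mean response at x0
half_width <- qt(0.975, n - 2) * s *
  sqrt(1 / n + (x0 - mean(x_demo))^2 / sxx)
ci_manual <- c(fit = yhat0, lwr = yhat0 - half_width, upr = yhat0 + half_width)

# Same interval from predict(); the newdata column must be named like the predictor
ci_predict <- predict(fit_demo, newdata = data.frame(x_demo = x0),
                      interval = "confidence", level = 0.95)

ci_manual
ci_predict
```

## The (x0 - xbar)^2 term shows why the interval widens as x0 moves away from the mean of x.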
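#### Q) For x = 150, can you use the model to estimate E(Y|X=150)? Discuss.
new.dat1 = data.frame(x = 150)
predict(reg, newdata = new.dat1, interval = "confidence")
## Lower limit = 3653.956, upper limit = 4197.698
## Discussion: R returns an interval, but x = 150 lies far outside the observed range of x
## (about 11.1 to 49.8). This is extrapolation: the data give no evidence that the fitted
## line still holds there, so this estimate should not be relied upon.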
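## Whether a new x value falls inside the observed data can be checked before predicting.
## A minimal sketch using the 49 x values from the table; x_all, x_new and
## is_extrapolation are illustrative names.

```r
# All 49 observed x values from the table
x_all <- c(30.77326, 32.40036, 13.89724, 30.82836, 21.17247, 41.70052, 20.52949,
           16.78782, 47.05621, 24.73312, 33.30568, 27.20706, 28.98507, 33.98696,
           40.61913, 44.14024, 14.71966, 18.69047, 26.53534, 19.51529, 28.00065,
           49.81578, 20.3347, 38.29436, 13.26268, 16.73084, 24.64804, 15.99319,
           48.8532, 18.10108, 44.75007, 44.23208, 24.33537, 41.18667, 20.06741,
           11.10681, 11.67823, 20.20392, 38.05732, 12.89079, 27.82776, 25.8318,
           43.87759, 11.49288, 48.94833, 47.3091, 30.53461, 28.65658, 41.25828)

x_new <- 150
rng <- range(x_all)   # smallest and largest observed x

# TRUE when the requested x lies outside the range the model was fitted on
is_extrapolation <- x_new < rng[1] || x_new > rng[2]

rng
is_extrapolation
```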
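#### Q) Does the model appear to be linear with respect to x? Explain and discuss.
## Overall F test: H0: β1 = 0 (the model is not significant) vs H1: β1 ≠ 0 (the model is significant)
## Test statistic: F = 680.5 on 1 and 47 degrees of freedom; p-value ≈ 0 (from summary(reg))
## Decision rule (p-value approach): reject H0 if the p-value is less than alpha.
## Here the p-value is less than alpha, so we reject H0.
## Conclusion: there is enough evidence to conclude that the overall model is significant.
## Note: a significant F test shows that y is linearly associated with x; it does not by itself
## confirm that a straight line is the correct form. For that, inspect the scatterplot and
## a plot of the residuals against x for curvature.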
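## One way to examine the form of the relationship, and one candidate alternative if the
## residuals show curvature, is sketched below. Assumptions: the quadratic term is an
## illustrative choice, not something stated in the original answer (a transformation such
## as sqrt(y) is another option); only the first ten rows of the table are used; and
## x_demo, y_demo, fit_lin and fit_quad are illustrative names.

```r
# Demo data: first ten rows of the table (illustrative subset, not the full data set)
y_demo <- c(311.8481, 440.9428, 41.6744, 417.7435, 177.3642,
            639.0727, 179.9235, 19.64963, 1030.218, 211.6078)
x_demo <- c(30.77326, 32.40036, 13.89724, 30.82836, 21.17247,
            41.70052, 20.52949, 16.78782, 47.05621, 24.73312)

# Straight-line fit and a residual plot; a U-shaped pattern suggests curvature
fit_lin <- lm(y_demo ~ x_demo)
plot(x_demo, resid(fit_lin), xlab = "x", ylab = "Residual",
     main = "Residuals vs x (linear fit)")
abline(h = 0, lty = 2)

# Candidate alternative: add a squared term (still linear in the coefficients)
fit_quad <- lm(y_demo ~ x_demo + I(x_demo^2))
summary(fit_quad)

# Partial F test: does the squared term improve on the straight line?
anova(fit_lin, fit_quad)
```

## If the alternative is preferred on the full data, the earlier steps (coefficient tests,
## confidence intervals, R^2, and the interval at x = 20) can be repeated with the new model
## in place of reg.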