Please answer in R code. Using the data below:
Create a scatterplot of y vs x (show this), fit a simple linear regression model using y as the response, and plot the regression line with the data (show this as well). Test whether x is a significant predictor and create a 95% CI around the slope coefficient. What does the coefficient of determination represent?
For x=20, create a CI for E(Y|X=20). Show this.
For x=150, can you use the model to estimate E(Y|X=150)? Discuss.
Does the model appear to be linear with respect to x? Explain and discuss; if not, provide an alternative model and repeat steps 1-6.
|    | y | x |
|----|---|---|
| 1  | 311.8481 | 30.77326 |
| 2  | 440.9428 | 32.40036 |
| 3  | 41.6744 | 13.89724 |
| 4  | 417.7435 | 30.82836 |
| 5  | 177.3642 | 21.17247 |
| 6  | 639.0727 | 41.70052 |
| 7  | 179.9235 | 20.52949 |
| 8  | 19.64963 | 16.78782 |
| 9  | 1030.218 | 47.05621 |
| 10 | 211.6078 | 24.73312 |
| 11 | 468.797 | 33.30568 |
| 12 | 281.9641 | 27.20706 |
| 13 | 360.4149 | 28.98507 |
| 14 | 626.3254 | 33.98696 |
| 15 | 692.872 | 40.61913 |
| 16 | 840.8116 | 44.14024 |
| 17 | 71.51774 | 14.71966 |
| 18 | 97.75643 | 18.69047 |
| 19 | 251.0697 | 26.53534 |
| 20 | 81.51288 | 19.51529 |
| 21 | 270.3445 | 28.00065 |
| 22 | 1221.873 | 49.81578 |
| 23 | 110.3152 | 20.3347 |
| 24 | 595.4412 | 38.29436 |
| 25 | 126.2188 | 13.26268 |
| 26 | 11.15999 | 16.73084 |
| 27 | 230.5542 | 24.64804 |
| 28 | 77.3025 | 15.99319 |
| 29 | 1117.463 | 48.8532 |
| 30 | 122.5684 | 18.10108 |
| 31 | 932.665 | 44.75007 |
| 32 | 911.0599 | 44.23208 |
| 33 | 255.6625 | 24.33537 |
| 34 | 810.0097 | 41.18667 |
| 35 | 210.4745 | 20.06741 |
| 36 | 9.884425 | 11.10681 |
| 37 | 75.98362 | 11.67823 |
| 38 | 153.6595 | 20.20392 |
| 39 | 578.7254 | 38.05732 |
| 40 | 93.28379 | 12.89079 |
| 41 | 378.1102 | 27.82776 |
| 42 | 203.9408 | 25.8318 |
| 43 | 837.9018 | 43.87759 |
| 44 | 44.45671 | 11.49288 |
| 45 | 1145.79 | 48.94833 |
| 46 | 1073.485 | 47.3091 |
| 47 | 431.1394 | 30.53461 |
| 48 | 343.5504 | 28.65658 |
| 49 | 810.0665 | 41.25828 |
Please provide all relevant work in R code. The commands, the output and any interpretations/conclusions that are necessary.







R code:
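### Regression analysis: y is the dependent (response) variable and x is the independent (predictor) variable.
## Enter the 49 observations from the table above:
y = c(311.8481, 440.9428, 41.6744, 417.7435, 177.3642, 639.0727, 179.9235,
      19.64963, 1030.218, 211.6078, 468.797, 281.9641, 360.4149, 626.3254,
      692.872, 840.8116, 71.51774, 97.75643, 251.0697, 81.51288, 270.3445,
      1221.873, 110.3152, 595.4412, 126.2188, 11.15999, 230.5542, 77.3025,
      1117.463, 122.5684, 932.665, 911.0599, 255.6625, 810.0097, 210.4745,
      9.884425, 75.98362, 153.6595, 578.7254, 93.28379, 378.1102, 203.9408,
      837.9018, 44.45671, 1145.79, 1073.485, 431.1394, 343.5504, 810.0665)
x = c(30.77326, 32.40036, 13.89724, 30.82836, 21.17247, 41.70052, 20.52949,
      16.78782, 47.05621, 24.73312, 33.30568, 27.20706, 28.98507, 33.98696,
      40.61913, 44.14024, 14.71966, 18.69047, 26.53534, 19.51529, 28.00065,
      49.81578, 20.3347, 38.29436, 13.26268, 16.73084, 24.64804, 15.99319,
      48.8532, 18.10108, 44.75007, 44.23208, 24.33537, 41.18667, 20.06741,
      11.10681, 11.67823, 20.20392, 38.05732, 12.89079, 27.82776, 25.8318,
      43.87759, 11.49288, 48.94833, 47.3091, 30.53461, 28.65658, 41.25828)
### Q) Create a scatterplot of y vs x, fit a simple linear regression model
## with y as the response, and plot the regression line with the data.
## scatter plot:
plot(x, y, main = "Scatter plot of y vs x")
## fit regression model:
reg = lm(y ~ x)
reg
## add the fitted regression line to the scatter plot:
abline(reg, col = "red")
## estimated regression equation: yhat = -420.37 + (28.97 * x)
summary(reg)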
#### Q) Test whether x is a significant predictor and create
## a 95% CI around the slope coefficient.
## Test for the coefficient of x (the slope):
## H0: β1 = 0 vs H1: β1 ≠ 0
## test statistic: t = (b1 - 0) / se(b1), with n - 2 = 47 degrees of freedom
b1 = 28.975
seb1 = 1.111
t_stat = b1 / seb1
t_stat
## p-value < 2e-16 (from summary(reg))
## Decision rule (p-value approach): reject H0 if the p-value is less than alpha = 0.05.
## Here the p-value is far below alpha, so we reject H0.
## Conclusion: the slope is significantly different from zero at the 5% level, that is, x is a significant predictor of y.
#### Q1) 95% confidence interval for the slope:
## b1 ± t critical value * se(b1)
t_critical = qt(1 - 0.05/2, 47)
t_critical
lower_level = b1 - (t_critical * seb1)
lower_level
upper_level = b1 + (t_critical * seb1)
upper_level
## 95% confidence interval for β1 = (26.73996, 31.21004)
#### Q) What does the coefficient of determination represent?
## Answer: The coefficient of determination (denoted R^2) is a key output of regression
## analysis. It is the proportion of the variance in the dependent variable
## that is explained by the independent variable through the fitted model. It lies between 0 and 1.
## In simple linear regression R^2 = F / (F + n - 2); with F = 680.5 and n - 2 = 47 this gives R^2 of about 0.935.
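The slope test, the interval and R^2 worked out by hand above can also be read directly from the fitted object, which avoids retyping rounded values. The following is a minimal sketch, assuming the 49 observations are entered exactly as in the question's table; the names `fit`, `slope_row`, `slope_ci` and `r2` are illustrative and not part of the original answer.

```r
# Data from the question's table (49 observations)
y <- c(311.8481, 440.9428, 41.6744, 417.7435, 177.3642, 639.0727, 179.9235,
       19.64963, 1030.218, 211.6078, 468.797, 281.9641, 360.4149, 626.3254,
       692.872, 840.8116, 71.51774, 97.75643, 251.0697, 81.51288, 270.3445,
       1221.873, 110.3152, 595.4412, 126.2188, 11.15999, 230.5542, 77.3025,
       1117.463, 122.5684, 932.665, 911.0599, 255.6625, 810.0097, 210.4745,
       9.884425, 75.98362, 153.6595, 578.7254, 93.28379, 378.1102, 203.9408,
       837.9018, 44.45671, 1145.79, 1073.485, 431.1394, 343.5504, 810.0665)
x <- c(30.77326, 32.40036, 13.89724, 30.82836, 21.17247, 41.70052, 20.52949,
       16.78782, 47.05621, 24.73312, 33.30568, 27.20706, 28.98507, 33.98696,
       40.61913, 44.14024, 14.71966, 18.69047, 26.53534, 19.51529, 28.00065,
       49.81578, 20.3347, 38.29436, 13.26268, 16.73084, 24.64804, 15.99319,
       48.8532, 18.10108, 44.75007, 44.23208, 24.33537, 41.18667, 20.06741,
       11.10681, 11.67823, 20.20392, 38.05732, 12.89079, 27.82776, 25.8318,
       43.87759, 11.49288, 48.94833, 47.3091, 30.53461, 28.65658, 41.25828)

fit <- lm(y ~ x)
s <- summary(fit)

# Row of the coefficient table for the slope:
# Estimate, Std. Error, t value, Pr(>|t|)
slope_row <- coef(s)["x", ]
print(slope_row)

# 95% confidence interval for the slope, computed without manual rounding
slope_ci <- confint(fit, "x", level = 0.95)
print(slope_ci)

# Coefficient of determination, and the same quantity from its definition
r2 <- s$r.squared
sse <- sum(residuals(fit)^2)       # unexplained (residual) sum of squares
sst <- sum((y - mean(y))^2)        # total sum of squares
print(c(r2 = r2, one_minus_sse_over_sst = 1 - sse / sst))
```

Small differences from the hand-computed interval are expected, because the manual version uses the rounded values 28.975 and 1.111.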
  
#### Q) For x = 20, create a CI for E(Y|X=20). The predict() function does this directly:
new.dat = data.frame(x = 20)
predict(reg, newdata = new.dat, interval = "confidence")
## 95% CI for E(Y|X=20): lower limit = 126.2353, upper limit = 192.0172
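#### Q) For x = 150, can you use the model to estimate E(Y|X=150)? Discuss.
new.dat1 = data.frame(x = 150)
predict(reg, newdata = new.dat1, interval = "confidence")
## lower limit = 3653.956, upper limit = 4197.698
## Discussion: R returns an estimate, but x = 150 lies far outside the observed x values
## (about 11.1 to 49.8), so this is extrapolation. The data support the fitted line only within
## the observed range, so this estimate and its interval should not be relied on.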
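Whether a prediction point counts as extrapolation can be checked against the observed range of x before calling `predict()`. The sketch below is one way to do that, assuming the x values from the question's table; `x_new`, `x_range`, `is_extrapolation` and `excess` are illustrative names.

```r
# x values from the question's table (49 observations)
x <- c(30.77326, 32.40036, 13.89724, 30.82836, 21.17247, 41.70052, 20.52949,
       16.78782, 47.05621, 24.73312, 33.30568, 27.20706, 28.98507, 33.98696,
       40.61913, 44.14024, 14.71966, 18.69047, 26.53534, 19.51529, 28.00065,
       49.81578, 20.3347, 38.29436, 13.26268, 16.73084, 24.64804, 15.99319,
       48.8532, 18.10108, 44.75007, 44.23208, 24.33537, 41.18667, 20.06741,
       11.10681, 11.67823, 20.20392, 38.05732, 12.89079, 27.82776, 25.8318,
       43.87759, 11.49288, 48.94833, 47.3091, 30.53461, 28.65658, 41.25828)

x_new <- 150
x_range <- range(x)                # smallest and largest observed x

# TRUE when the new point falls outside the observed range
is_extrapolation <- x_new < x_range[1] || x_new > x_range[2]

# How far beyond the largest x the new point lies,
# measured in widths of the observed range
excess <- (x_new - x_range[2]) / diff(x_range)

print(x_range)
print(is_extrapolation)
print(excess)
```

Here x = 150 lies more than two range-widths beyond the largest observed x, while x = 20 sits well inside the range, which is why the interval at x = 20 is usable and the one at x = 150 is not.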
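#### Q) Does the model appear to be linear with respect to x? Explain and discuss.
## Overall F test: H0: β1 = 0 (the model explains nothing) vs H1: β1 ≠ 0 (the overall model is significant).
## test statistic: F = 680.5 on 1 and 47 degrees of freedom, p-value ≈ 0 (R reports it as < 2.2e-16)
## Decision rule (p-value approach): reject H0 if the p-value is less than alpha.
## Here the p-value is less than alpha, so we reject H0.
## Conclusion: there is enough evidence to conclude that the overall model is significant.
## Note: a significant F test shows a strong linear association; it does not by itself show that
## the relationship is linear. Linearity has to be judged from the scatterplot and from a plot of
## residuals against fitted values. One warning sign here: the fitted line predicts negative y
## for x below about 14.5, although every observed y is positive.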
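To judge linearity directly, the residuals of the straight-line fit can be plotted against the fitted values and the fit compared with a model that allows curvature. The sketch below uses a quadratic in x as one candidate alternative. That choice is an assumption made for illustration (a transformation of y would be another option), and `fit_lin` and `fit_quad` are illustrative names.

```r
# Data from the question's table (49 observations)
y <- c(311.8481, 440.9428, 41.6744, 417.7435, 177.3642, 639.0727, 179.9235,
       19.64963, 1030.218, 211.6078, 468.797, 281.9641, 360.4149, 626.3254,
       692.872, 840.8116, 71.51774, 97.75643, 251.0697, 81.51288, 270.3445,
       1221.873, 110.3152, 595.4412, 126.2188, 11.15999, 230.5542, 77.3025,
       1117.463, 122.5684, 932.665, 911.0599, 255.6625, 810.0097, 210.4745,
       9.884425, 75.98362, 153.6595, 578.7254, 93.28379, 378.1102, 203.9408,
       837.9018, 44.45671, 1145.79, 1073.485, 431.1394, 343.5504, 810.0665)
x <- c(30.77326, 32.40036, 13.89724, 30.82836, 21.17247, 41.70052, 20.52949,
       16.78782, 47.05621, 24.73312, 33.30568, 27.20706, 28.98507, 33.98696,
       40.61913, 44.14024, 14.71966, 18.69047, 26.53534, 19.51529, 28.00065,
       49.81578, 20.3347, 38.29436, 13.26268, 16.73084, 24.64804, 15.99319,
       48.8532, 18.10108, 44.75007, 44.23208, 24.33537, 41.18667, 20.06741,
       11.10681, 11.67823, 20.20392, 38.05732, 12.89079, 27.82776, 25.8318,
       43.87759, 11.49288, 48.94833, 47.3091, 30.53461, 28.65658, 41.25828)

fit_lin  <- lm(y ~ x)              # straight-line model from the answer
fit_quad <- lm(y ~ x + I(x^2))     # candidate alternative with curvature

# Residuals vs fitted for the straight line: a U-shaped band
# (positive at both ends, negative in the middle) suggests curvature
plot(fitted(fit_lin), residuals(fit_lin),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Linear model: residuals vs fitted")
abline(h = 0, lty = 2)

# Nested F test: does adding x^2 reduce the residual sum of squares?
print(anova(fit_lin, fit_quad))
print(summary(fit_quad))

# Data with the fitted quadratic curve
ord <- order(x)
plot(x, y, main = "Quadratic fit")
lines(x[ord], fitted(fit_quad)[ord], col = "blue")

# Repeat of the earlier steps under the alternative model
print(confint(fit_quad))
print(predict(fit_quad, newdata = data.frame(x = 20), interval = "confidence"))
```

If the `anova()` comparison gives a small p-value for the added `I(x^2)` term, the straight-line model is not adequate and the quadratic fit is the better basis for the interval at x = 20. Extrapolating to x = 150 remains unsafe under either model.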