# Problem 5: Use R to compute:
# (a) (6 pts) Set three random vectors of 1000 random numbers as follows:
#     x1: from Unif(0,5),
#     x2: from Binom(70,0.2), and
#     x3: from Exp(1).
#     Define y = x1 + 4*x2 + 8*x3 + 10 + Err, where Err ~ N(0, sd = 10).
# (b) (6 pts) Create a dataframe df containing these four variables
#     and create scatterplots of all pairs of these four variables.
# (c) Create a multiple regression model for y as a linear function of
#     x1, x2, and x3 and print its summary.
# (d) Run your entire code for parts (a)-(c) several times and keep an eye
#     on the summary outputs.
#     - Which of the variables has the 'slope' with the largest variance?
#     - Which of the summary parameters describes this?
#     - Looking at the corresponding scatterplots, can you intuitively
#       explain why?
Part (a): R code
x1 <- runif(1000, 0, 5)       # 1000 draws from Unif(0, 5)
x2 <- rbinom(1000, 70, 0.2)   # 1000 draws from Binom(70, 0.2)
x3 <- rexp(1000, 1)           # 1000 draws from Exp(1)
err <- rnorm(1000, 0, 10)     # Err ~ N(0, sd = 10)
y <- x1 + 4*x2 + 8*x3 + 10 + err
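As an optional sanity check (not asked for in the problem), one can compare the sample moments of the simulated vectors against their theoretical values, e.g. Binom(70, 0.2) has mean 14 and variance 11.2, and Exp(1) has mean and variance 1:

# Optional check: sample mean and variance of each simulated input
# Unif(0,5): mean 2.5, var 25/12; Binom(70,0.2): mean 14, var 11.2; Exp(1): mean 1, var 1
c(mean(x1), var(x1))   # approx. 2.5, 2.08
c(mean(x2), var(x2))   # approx. 14,  11.2
c(mean(x3), var(x3))   # approx. 1,   1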
Part (b): R code
df <- data.frame(y, x1, x2, x3)   # data frame holding all four variables
pairs(df)                         # scatterplot matrix of all pairs of variables
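A correlation matrix is a quick numeric complement to the scatterplot matrix; this is just a sketch of an optional extra step, not part of the required answer:

# Pairwise correlations between y and the predictors
round(cor(df), 3)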
Part (c): R code
fit <- lm(y ~ x1 + x2 + x3)   # multiple regression of y on x1, x2, and x3
summary(fit)
Call:
lm(formula = y ~ x1 + x2 + x3)
Residuals:
Min 1Q Median 3Q Max
-39.845 -6.678 -0.379 6.880 26.515
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.05660 1.52263 7.261 7.71e-13 ***
x1 0.96776 0.22269 4.346 1.53e-05 ***
x2 3.96956 0.09681 41.005 < 2e-16 ***
x3 7.57268 0.34585 21.896 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 10.14 on 996 degrees of freedom
Multiple R-squared: 0.682, Adjusted R-squared: 0.681
F-statistic: 712 on 3 and 996 DF, p-value: < 2.2e-16
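For part (d), instead of re-pasting the code, one could wrap parts (a)-(c) in a small function and call it repeatedly; a minimal sketch, where the name run_once is just an illustrative choice:

# Wrap the simulation and fit so part (d) can be repeated with one call
run_once <- function(n = 1000) {
  x1  <- runif(n, 0, 5)
  x2  <- rbinom(n, 70, 0.2)
  x3  <- rexp(n, 1)
  err <- rnorm(n, 0, 10)
  y   <- x1 + 4*x2 + 8*x3 + 10 + err
  summary(lm(y ~ x1 + x2 + x3))   # returns the summary of the fitted model
}
run_once()   # each call simulates fresh data and prints a new summary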
Part (d): Re-running the code from parts (a)-(c) produces a new summary each time; three additional runs are shown below.
Second run summary:
Call:
lm(formula = y ~ x1 + x2 + x3)
Residuals:
Min 1Q Median 3Q Max
-31.929 -7.056 0.482 7.096 28.405
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.23254 1.55254 6.591 7.08e-11 ***
x1 1.24208 0.21768 5.706 1.52e-08 ***
x2 3.93675 0.09766 40.310 < 2e-16 ***
x3 7.85921 0.32384 24.269 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 10.29 on 996 degrees of freedom
Multiple R-squared: 0.6834, Adjusted R-squared: 0.6825
F-statistic: 716.8 on 3 and 996 DF, p-value: < 2.2e-16
Third run summary:
Call:
lm(formula = y ~ x1 + x2 + x3)
Residuals:
Min 1Q Median 3Q Max
-30.0539 -6.7064 -0.1629 6.5287 29.0719
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.65145 1.42878 7.455 1.95e-13 ***
x1 1.02400 0.21292 4.809 1.75e-06 ***
x2 3.91771 0.08943 43.808 < 2e-16 ***
x3 8.22864 0.33107 24.855 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 9.793 on 996 degrees of freedom
Multiple R-squared: 0.7198, Adjusted R-squared: 0.7189
F-statistic: 852.7 on 3 and 996 DF, p-value: < 2.2e-16
Fourth run summary:
Call:
lm(formula = y ~ x1 + x2 + x3)
Residuals:
Min 1Q Median 3Q Max
-35.798 -7.677 0.038 7.261 28.929
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.9068 1.5575 6.361 3.05e-10 ***
x1 0.8755 0.2260 3.874 0.000114 ***
x2 4.0105 0.1008 39.790 < 2e-16 ***
x3 8.0653 0.3077 26.215 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 10.34 on 996 degrees of freedom
Multiple R-squared: 0.6987, Adjusted R-squared: 0.6978
F-statistic: 769.8 on 3 and 996 DF, p-value: < 2.2e-16
The slope of x3 has the largest variance across runs: its estimate moves the most from run to run (roughly 7.57 to 8.23 here), while x2's stays near 3.9-4.0 and x1's near 0.9-1.2. The summary parameter that describes this is the 'Std. Error' column: the standard error of the x3 slope (about 0.31-0.35) is consistently the largest of the three predictors (about 0.22 for x1 and 0.10 for x2). The scatterplots explain why: x3 ~ Exp(1) is strongly right-skewed, with most values packed near 0, so it has the smallest spread of the three predictors, and a slope is estimated less precisely (larger standard error, hence larger run-to-run variance) when its predictor has little spread.
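To back this up empirically, one could repeat the whole simulation many times and compute the run-to-run variance of each slope estimate; a sketch, where n_rep and coef_mat are just illustrative names:

# Repeat the simulation and fit 200 times; collect the three slope estimates each time
n_rep <- 200
coef_mat <- t(replicate(n_rep, {
  x1 <- runif(1000, 0, 5)
  x2 <- rbinom(1000, 70, 0.2)
  x3 <- rexp(1000, 1)
  y  <- x1 + 4*x2 + 8*x3 + 10 + rnorm(1000, 0, 10)
  coef(lm(y ~ x1 + x2 + x3))[c("x1", "x2", "x3")]
}))
apply(coef_mat, 2, var)   # variance of each slope across runs; x3's should be the largest
apply(coef_mat, 2, sd)    # roughly matches the Std. Error column in the summaries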