
# Problem 5: Use R to compute:

# (a)(6 pts) Set three random vectors of 1000 random numbers as follows:
# x1: from Unif(0,5),
# x2: from Binom(70,0.2), and
# x3: from Exp(1).
# Define y = x1 + 4*x2 + 8*x3 + 10 + Err, where Err ~ N(0,sd=10).


# (b)(6 pts) Create a dataframe df containing these four variables
# and create scatterplots of all pairs of these four variables.


# (c) Create a multiple regression model for y as a linear function of
# x1, x2, and x3 and print its summary.


# (d) Run your entire code for parts (a)-(c) several times and keep an eye
# on the summary outputs.
# - Which of the variables has the 'slope' with the largest variance?
# - Which of the summary parameters describes this? Explain.
# - Also, looking at the corresponding scatterplots, can you
# explain intuitively why?

Solutions

Que.a

R code:

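# Three independent predictors, 1000 draws each (no set.seed, so each run differs)
x1=runif(1000,0,5)      # Unif(0,5)
x2=rbinom(1000,70,0.2)  # Binom(70,0.2)
x3=rexp(1000,1)         # Exp(1), rate 1
err=rnorm(1000,0,10)    # Err ~ N(0, sd = 10)
y=x1+4*x2+8*x3+10+err   # true intercept 10, slopes 1, 4, 8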
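As a quick check of part (a), the sketch below compares the sample mean and variance of each simulated predictor with its theoretical value (Unif(0,5): mean 2.5, variance 25/12; Binom(70,0.2): mean 14, variance 11.2; Exp(1): mean 1, variance 1). These variances matter later in part (d). The draws go into a list named `sim` (a hypothetical name, not part of the problem) so the x1, x2, x3 created above are not overwritten; the sample values will differ slightly on each run.

```r
# Sketch: sample moments of freshly simulated predictors vs. theoretical moments.
# Stored in a list so the x1, x2, x3 from part (a) are left untouched.
sim <- list(
  x1 = runif(1000, 0, 5),      # Unif(0,5)
  x2 = rbinom(1000, 70, 0.2),  # Binom(70,0.2)
  x3 = rexp(1000, 1)           # Exp(1)
)

moments <- rbind(
  theory_mean = c(5 / 2, 70 * 0.2, 1),
  sample_mean = sapply(sim, mean),
  theory_var  = c(5^2 / 12, 70 * 0.2 * 0.8, 1),  # (b-a)^2/12, n*p*(1-p), 1/rate^2
  sample_var  = sapply(sim, var)
)
colnames(moments) <- names(sim)
print(round(moments, 3))
```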

Que.b

df=data.frame(y,x1,x2,x3)  # the four variables from part (a)
pairs(df)                  # scatterplot matrix of all pairs
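A numerical companion to the scatterplot matrix is the correlation matrix; with the df from part (b) this is simply cor(df). The sketch below simulates its own copy of the data inside local() (the name `sim_df` is hypothetical) so it runs on its own without replacing df. Working through the model's variances, the population correlations of y with x1, x2 and x3 come out to roughly 0.08, 0.72 and 0.43, so the y-versus-x2 panel should look the most clearly linear and the y-versus-x1 panel the least.

```r
# Sketch: correlation matrix of a freshly simulated copy of the part (a)-(b) data.
# local() keeps x1, x2, x3, y and df from the main code untouched.
sim_df <- local({
  x1 <- runif(1000, 0, 5)
  x2 <- rbinom(1000, 70, 0.2)
  x3 <- rexp(1000, 1)
  y  <- x1 + 4 * x2 + 8 * x3 + 10 + rnorm(1000, 0, 10)
  data.frame(y, x1, x2, x3)
})

cor_mat <- cor(sim_df)   # on the real data: cor(df)
print(round(cor_mat, 2))
```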

Que.c

fit=lm(y~x1+x2+x3)  # least-squares fit of y on x1, x2 and x3
summary(fit)        # estimates, standard errors, t tests, R-squared

Call:
lm(formula = y ~ x1 + x2 + x3)

Residuals:
    Min      1Q  Median      3Q     Max
-39.845  -6.678  -0.379   6.880  26.515

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.05660    1.52263   7.261 7.71e-13 ***
x1           0.96776    0.22269   4.346 1.53e-05 ***
x2           3.96956    0.09681  41.005  < 2e-16 ***
x3           7.57268    0.34585  21.896  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 10.14 on 996 degrees of freedom
Multiple R-squared: 0.682, Adjusted R-squared: 0.681
F-statistic: 712 on 3 and 996 DF, p-value: < 2.2e-16
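Since y was generated with intercept 10 and slopes 1, 4 and 8, the fitted summary can be checked against those values. The sketch below (object names such as `fit_sim` and `comparison` are hypothetical) pulls the estimate and standard error from coef(summary(...)) and reports how many standard errors each estimate lies from its true value; that z column should usually fall within about ±2. It simulates a fresh data set so it runs on its own; on the df from part (b), lm(y ~ x1 + x2 + x3, data = df) would play the role of `fit_sim`.

```r
# Sketch: compare fitted coefficients with the true values used to simulate y.
sim_df <- local({
  x1 <- runif(1000, 0, 5)
  x2 <- rbinom(1000, 70, 0.2)
  x3 <- rexp(1000, 1)
  y  <- x1 + 4 * x2 + 8 * x3 + 10 + rnorm(1000, 0, 10)
  data.frame(y, x1, x2, x3)
})

fit_sim <- lm(y ~ x1 + x2 + x3, data = sim_df)
est <- coef(summary(fit_sim))[, c("Estimate", "Std. Error")]
truth <- c(10, 1, 4, 8)   # intercept, x1, x2, x3 as used to generate y

# z = number of standard errors between each estimate and its true value
comparison <- cbind(est,
                    truth = truth,
                    z = (est[, "Estimate"] - truth) / est[, "Std. Error"])
print(round(comparison, 3))
```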

Que.d

Summary from the second run:

Call:
lm(formula = y ~ x1 + x2 + x3)

Residuals:
    Min      1Q  Median      3Q     Max
-31.929  -7.056   0.482   7.096  28.405

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.23254    1.55254   6.591 7.08e-11 ***
x1           1.24208    0.21768   5.706 1.52e-08 ***
x2           3.93675    0.09766  40.310  < 2e-16 ***
x3           7.85921    0.32384  24.269  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 10.29 on 996 degrees of freedom
Multiple R-squared: 0.6834, Adjusted R-squared: 0.6825
F-statistic: 716.8 on 3 and 996 DF, p-value: < 2.2e-16

Summary from the third run:

Call:
lm(formula = y ~ x1 + x2 + x3)

Residuals:
     Min       1Q   Median       3Q      Max
-30.0539  -6.7064  -0.1629   6.5287  29.0719

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.65145    1.42878   7.455 1.95e-13 ***
x1           1.02400    0.21292   4.809 1.75e-06 ***
x2           3.91771    0.08943  43.808  < 2e-16 ***
x3           8.22864    0.33107  24.855  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 9.793 on 996 degrees of freedom
Multiple R-squared: 0.7198, Adjusted R-squared: 0.7189
F-statistic: 852.7 on 3 and 996 DF, p-value: < 2.2e-16

Summary from the fourth run:

Call:
lm(formula = y ~ x1 + x2 + x3)

Residuals:
    Min      1Q  Median      3Q     Max
-35.798  -7.677   0.038   7.261  28.929

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   9.9068     1.5575   6.361 3.05e-10 ***
x1            0.8755     0.2260   3.874 0.000114 ***
x2            4.0105     0.1008  39.790  < 2e-16 ***
x3            8.0653     0.3077  26.215  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 10.34 on 996 degrees of freedom
Multiple R-squared: 0.6987, Adjusted R-squared: 0.6978
F-statistic: 769.8 on 3 and 996 DF, p-value: < 2.2e-16

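The slope of x3 has the largest variance. This is described by the "Std. Error" column of the summary: it estimates the standard deviation of each slope estimate, so its square estimates the slope's variance. In all four runs x3 has the largest standard error (about 0.31 to 0.35, versus about 0.21 to 0.23 for x1 and 0.09 to 0.10 for x2), and its estimate also changed the most from run to run (7.57, 7.86, 8.23, 8.07).

Intuitively, a slope is estimated more precisely when its predictor is spread over a wider range. Because the three predictors are generated independently, the standard error of slope j is approximately σ / (√n · SD(x_j)), where σ = 10 is the error standard deviation and n = 1000. Exp(1) has the smallest variance of the three predictors (1, versus 25/12 ≈ 2.08 for Unif(0,5) and 70 · 0.2 · 0.8 = 11.2 for Binom(70,0.2)), so its slope has the largest standard error. In the scatterplots, most x3 values are bunched in a narrow band near zero (about 86% fall below 2) with a thin tail to the right, while x2 is spread over a much wider range; with little horizontal spread, the noise in y has a larger effect on the fitted slope.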
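Rather than rerunning the code by hand, part (d) can be automated. The sketch below (the helper `one_run` and the choice of 500 repetitions are assumptions, not part of the problem) repeats the simulation and fit, then compares the empirical standard deviation of each slope across runs with the approximation σ / (√n · SD(x_j)), which assumes roughly uncorrelated predictors. That approximation works out to about 0.22, 0.09 and 0.32 for x1, x2 and x3, close to the standard errors in the four summaries above.

```r
# Sketch: automate part (d) by repeating the simulation + fit many times.
# one_run() is a hypothetical helper returning the three slope estimates.
one_run <- function(n = 1000) {
  x1 <- runif(n, 0, 5)
  x2 <- rbinom(n, 70, 0.2)
  x3 <- rexp(n, 1)
  y  <- x1 + 4 * x2 + 8 * x3 + 10 + rnorm(n, 0, 10)
  coef(lm(y ~ x1 + x2 + x3))[c("x1", "x2", "x3")]
}

slopes <- t(replicate(500, one_run()))   # 500 rows (runs) x 3 columns (slopes)

# Approximation sigma / (sqrt(n) * sd(x_j)) with sigma = 10, n = 1000, using
# the theoretical SDs of Unif(0,5), Binom(70,0.2) and Exp(1); assumes the
# predictors are (nearly) uncorrelated, which holds here by construction.
sd_x <- c(sqrt(25 / 12), sqrt(70 * 0.2 * 0.8), 1)
slope_spread <- rbind(
  empirical_sd = apply(slopes, 2, sd),
  approx_se    = 10 / (sqrt(1000) * sd_x)
)
colnames(slope_spread) <- c("x1", "x2", "x3")
print(round(slope_spread, 3))
```

Both rows should rank the slopes the same way, with x3 the most variable and x2 the least.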

