Question

Solve using R; include code in your answer.

(a) Generate 25 variables, each of which consists of 25 random samples from a standard normal. Store these variables in a data frame – call it df.train – and randomly select one variable to be the response – rename it y. (The end result should be a data frame with 25 observations on 25 variables but with no relationships between any of the variables.) Now repeat the same steps to create a test set called df.test.
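A minimal base-R sketch for part (a). The helper name `make_df`, the seed value, and the decision to move `y` to the first column and rename the other 24 columns `X1` to `X24` are my own assumptions, not part of the question. Using the same predictor names in both data frames means a model fitted on `df.train` can later be passed to `predict()` on `df.test`, even when a different column is drawn as the response in each set.

```r
set.seed(1)  # arbitrary seed, only so the example is reproducible

# Hypothetical helper: n rows of p independent N(0, 1) variables, one of which
# is picked at random to serve as the response y.
make_df <- function(n = 25, p = 25) {
  df <- as.data.frame(matrix(rnorm(n * p), nrow = n, ncol = p))
  resp <- sample(p, 1)                            # random response column
  df <- df[, c(resp, setdiff(seq_len(p), resp))]  # move it to the front
  names(df) <- c("y", paste0("X", seq_len(p - 1)))
  df
}

df.train <- make_df()  # 25 observations on 25 unrelated variables
df.test  <- make_df()  # independent test set built the same way

str(df.train[, 1:3])
```

Since all 25 columns are i.i.d. standard normal, reordering them does not introduce any relationship between `y` and the predictors.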

(b) Write a loop that will successively linearly regress y on one additional predictor each time through. That is, the first time through the loop you should build a linear model with only one predictor (the first one in your data frame). The ith time through the loop, you should build a linear model where y is regressed on the first i predictors. Record the training and test error each time so that at the end of the procedure you have two vectors (call them MSE.train and MSE.test) that contain the MSEs from each model.
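A sketch of the part (b) loop in base R. So that it runs on its own, it rebuilds the data as described in part (a); the helper `make_df`, the seed, and the `X1` to `X24` naming are assumptions. With `y` in column 1, `df.train[, 1:(i + 1)]` contains `y` plus the first `i` predictors, so `y ~ .` regresses `y` on exactly those. MSE is taken here as the mean squared residual (RSS / n) on each set.

```r
set.seed(1)  # arbitrary seed for reproducibility

# Hypothetical helper: y in column 1, predictors named X1..X24
make_df <- function(n = 25, p = 25) {
  df <- as.data.frame(matrix(rnorm(n * p), nrow = n, ncol = p))
  resp <- sample(p, 1)
  df <- df[, c(resp, setdiff(seq_len(p), resp))]
  names(df) <- c("y", paste0("X", seq_len(p - 1)))
  df
}
df.train <- make_df()
df.test  <- make_df()

n.pred    <- ncol(df.train) - 1   # 24 candidate predictors
MSE.train <- numeric(n.pred)
MSE.test  <- numeric(n.pred)

for (i in seq_len(n.pred)) {
  # y regressed on the first i predictors (columns 2 .. i + 1)
  fit <- lm(y ~ ., data = df.train[, 1:(i + 1)])
  MSE.train[i] <- mean((df.train$y - fitted(fit))^2)
  MSE.test[i]  <- mean((df.test$y - predict(fit, newdata = df.test))^2)
}

round(cbind(size = seq_len(n.pred), MSE.train, MSE.test), 3)
```

On the last pass the model has 24 slopes plus an intercept, i.e. 25 coefficients for 25 observations, so it fits the training data exactly.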

(c) Plot the training and test errors vs the linear model size (number of predictors) on the same plot in different colors. Add a legend to the plot to distinguish them.
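One way to draw the part (c) plot with base R graphics (`plot`, `lines`, `legend`). The script repeats the data generation and loop sketched for parts (a) and (b) so it is self-contained; the helper `make_df`, the seed, the colors, and the legend position are illustrative choices, not requirements.

```r
set.seed(1)  # arbitrary seed for reproducibility

# Hypothetical helper: y in column 1, predictors named X1..X24
make_df <- function(n = 25, p = 25) {
  df <- as.data.frame(matrix(rnorm(n * p), nrow = n, ncol = p))
  resp <- sample(p, 1)
  df <- df[, c(resp, setdiff(seq_len(p), resp))]
  names(df) <- c("y", paste0("X", seq_len(p - 1)))
  df
}
df.train <- make_df()
df.test  <- make_df()

n.pred <- ncol(df.train) - 1
MSE.train <- MSE.test <- numeric(n.pred)
for (i in seq_len(n.pred)) {
  fit <- lm(y ~ ., data = df.train[, 1:(i + 1)])
  MSE.train[i] <- mean(residuals(fit)^2)
  MSE.test[i]  <- mean((df.test$y - predict(fit, newdata = df.test))^2)
}

# Both error curves on one set of axes, distinguished by color and symbol
sizes <- seq_len(n.pred)
plot(sizes, MSE.train, type = "b", pch = 19, col = "blue",
     ylim = range(0, MSE.train, MSE.test),
     xlab = "Number of predictors", ylab = "MSE",
     main = "Training and test MSE vs. model size")
lines(sizes, MSE.test, type = "b", pch = 17, col = "red")
legend("topleft", legend = c("Training MSE", "Test MSE"),
       col = c("blue", "red"), pch = c(19, 17), lty = 1)
```

This also bears on part (d). Because each model is nested in the next, the training MSE can never increase as predictors are added, and with 24 predictors it is essentially zero (25 coefficients, 25 observations). None of the predictors is related to `y`, so the extra fit is pure noise: the test MSE typically stays around or above the variance of `y` (about 1) and tends to rise sharply as the model approaches 24 predictors. Exact values depend on the random draw.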

(d) What happens to the training error as more predictors are added to the model? What about the test error?

Solutions

Expert Solution

R code is given below (nice and challenging question; enjoyed solving it).

