Solve using R, and include the code in your answer.
(a) Generate 25 variables, each consisting of 25 random draws from a standard normal distribution. Store these variables in a data frame called df.train, then randomly select one variable to be the response and rename it y. (The end result should be a data frame with 25 observations on 25 variables, with no relationships between any of the variables.) Now repeat the same steps to create a test set called df.test.
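One possible way to set up the data for part (a) is sketched below. The seed, the helper name make_noise_df, and the predictor names x1 to x24 are assumptions made for illustration and are not part of the question. The response is moved to the first column and the remaining columns are renamed, so that the training and test sets share the same column names even though a different column may be picked as y in each.

```r
set.seed(1)  # assumed seed, used only to make the run reproducible

n <- 25  # observations per variable
p <- 25  # total variables, one of which becomes the response

# Hypothetical helper: builds an n x p data frame of independent N(0, 1)
# draws, picks one column at random as the response y, and names the
# remaining columns x1, ..., x(p - 1).
make_noise_df <- function(n, p) {
  raw  <- as.data.frame(matrix(rnorm(n * p), nrow = n, ncol = p))
  resp <- sample(p, 1)  # index of the randomly chosen response column
  df   <- data.frame(y = raw[[resp]], raw[, -resp, drop = FALSE])
  names(df)[-1] <- paste0("x", seq_len(p - 1))
  df
}

df.train <- make_noise_df(n, p)
df.test  <- make_noise_df(n, p)

dim(df.train)  # 25 observations on 25 variables (y plus 24 predictors)
```

Because one of the 25 variables is used as y, only 24 predictors remain for the models in part (b).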
(b) Write a loop that will successively linearly regress y on one additional predictor each time through. That is, the first time through the loop you should build a linear model with only one predictor (the first one in your data frame). The ith time through the loop, you should build a linear model where y is regressed on the first i predictors. Record the training and test error each time so that at the end of the procedure you have two vectors (call them MSE.train and MSE.test) that contain the MSEs from each model.
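A minimal sketch of the loop in part (b) follows. To keep it runnable on its own, it first builds stand-in data frames in which y is simply the first column; since every column is an independent standard normal draw, this is assumed to be equivalent to picking the response at random. The seed, the helper make_df, and the names x1 to x24 are likewise assumptions.

```r
set.seed(1)  # assumed seed, used only to make the run reproducible

n      <- 25  # observations
n.pred <- 24  # predictors left after one variable is used as y

# Stand-in for the data from part (a): y plus x1, ..., x24, all pure noise.
make_df <- function() {
  df <- as.data.frame(matrix(rnorm(n * (n.pred + 1)), nrow = n))
  names(df) <- c("y", paste0("x", seq_len(n.pred)))
  df
}
df.train <- make_df()
df.test  <- make_df()

MSE.train <- numeric(n.pred)
MSE.test  <- numeric(n.pred)

for (i in seq_len(n.pred)) {
  # Formula y ~ x1 + ... + xi, using the first i predictors
  f   <- reformulate(paste0("x", seq_len(i)), response = "y")
  fit <- lm(f, data = df.train)

  # Training MSE: mean squared residual on the data used for fitting
  MSE.train[i] <- mean(residuals(fit)^2)

  # Test MSE: mean squared prediction error on the independent test set
  pred        <- predict(fit, newdata = df.test)
  MSE.test[i] <- mean((df.test$y - pred)^2)
}
```

With all 24 predictors plus an intercept, the model has 25 coefficients for 25 observations, so the last fit interpolates the training data and its training MSE is zero up to rounding error.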
(c) Plot the training and test errors vs the linear model size (number of predictors) on the same plot in different colors. Add a legend to the plot to distinguish them.
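The plot in part (c) could be drawn along the following lines with base R graphics. The function name plot_mse, the colors, and the axis labels are illustrative choices, not requirements of the question; the function assumes two numeric vectors of equal length, indexed by model size.

```r
# Hypothetical helper: plots training and test MSE against model size
# on shared axes, in different colors, with a legend.
plot_mse <- function(MSE.train, MSE.test) {
  size <- seq_along(MSE.train)  # number of predictors in each model

  # Draw the test error first, with a y-range wide enough for both curves
  plot(size, MSE.test, type = "b", col = "red", pch = 19,
       ylim = range(c(MSE.train, MSE.test)),
       xlab = "Number of predictors", ylab = "MSE",
       main = "Training vs. test error")

  # Overlay the training error on the same axes
  lines(size, MSE.train, type = "b", col = "blue", pch = 19)

  legend("topleft", legend = c("Training MSE", "Test MSE"),
         col = c("blue", "red"), lty = 1, pch = 19)
  invisible(NULL)
}
```

With the vectors recorded in part (b), the call would be plot_mse(MSE.train, MSE.test). If the test error becomes very large for the biggest models, it can hide the shape of the training curve on a shared linear axis.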
(d) What happens to the training error as more predictors are added to the model? What about the test error?