To import the Carseats dataset into RStudio:
library("ISLR")
data(Carseats)
View(Carseats)
Then, provide the necessary code for the following:
a. Split the data into a training set and a test set.
b. Fit a linear model using least squares on the training set to predict Sales using the entire collection of predictors.
Report Cp, BIC, R², and RSS for this model.
c. Use the fitted model to predict responses for the test data and report the test error (RSS) obtained.
d. Compare the performance of (i) best subset selection; (ii) forward subset selection; (iii) backward subset selection.
For each method:
Use the training set to select the best model for each number of predictors.
Plot Cp, BIC, adjusted R², and RSS for all of the models at once.
Select one of the models as your final model. Report the model and the training error (RSS) for your chosen model.
Use your chosen model to predict the test data and report the test error (RSS) obtained.
Answer:
Using the given data, the R code for the problem is:
part a
library(ISLR)
library(randomForest)
library(rpart)
data=Carseats
smp_size <- floor(0.5 * nrow(data))
## set the seed to make your partition reproducible
set.seed(123)
train_ind <- sample(seq_len(nrow(data)), size = smp_size)
train <- data[train_ind, ]
test <- data[-train_ind, ]
part b
# Sales is numeric, so fit a regression tree (method = "anova"), not a classification tree
fit <- rpart(Sales ~ ., method = "anova", data = train)
printcp(fit)  # display the complexity-parameter table
plotcp(fit)   # visualize cross-validation results
summary(fit)  # detailed summary of splits
plot(fit, uniform = TRUE,
     main = "Regression Tree for Carseats")
text(fit, use.n = TRUE, all = TRUE, cex = .8)
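Part b of the question asks for a least-squares linear model with Cp, BIC, R², and RSS. A minimal sketch of that fit follows, assuming the same 50/50 split and seed as in part a. The names lm_fit, d, rss, sigma2, and cp are illustrative and not from the original answer. Cp is computed with the ISLR formula (RSS + 2·d·σ̂²)/n, with σ̂² estimated from this same full model, and BIC comes from R's likelihood-based BIC(), which is on a different scale than the ISLR formula.

```r
library(ISLR)

# Recreate the 50/50 split from part a (seed 123 taken from the answer above)
set.seed(123)
train_ind <- sample(seq_len(nrow(Carseats)), size = floor(0.5 * nrow(Carseats)))
train <- Carseats[train_ind, ]

# Least-squares fit of Sales on all predictors
lm_fit <- lm(Sales ~ ., data = train)

n      <- nrow(train)
d      <- length(coef(lm_fit)) - 1     # number of predictors, excluding the intercept
rss    <- sum(residuals(lm_fit)^2)     # training RSS
r2     <- summary(lm_fit)$r.squared    # training R-squared
sigma2 <- rss / (n - d - 1)            # error-variance estimate from the full model
cp     <- (rss + 2 * d * sigma2) / n   # Cp in the ISLR form
bic    <- BIC(lm_fit)                  # likelihood-based BIC from the stats package

c(Cp = cp, BIC = bic, R2 = r2, RSS = rss)
```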
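part c
fit1 <- randomForest(Sales ~ ., data = train)
print(fit1)
importance(fit1)
# Predict on the test set and report the test error (RSS)
pred1 <- predict(fit1, newdata = test)
sum((test$Sales - pred1)^2)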
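Part c of the question refers to the linear model fitted in part b. A minimal sketch of the test-set prediction and test RSS for that model follows, assuming the same split and seed as in part a. The names lm_fit, pred, and test_rss are illustrative.

```r
library(ISLR)

# Recreate the 50/50 split from part a (seed 123 taken from the answer above)
set.seed(123)
train_ind <- sample(seq_len(nrow(Carseats)), size = floor(0.5 * nrow(Carseats)))
train <- Carseats[train_ind, ]
test  <- Carseats[-train_ind, ]

# Fit on the training half, then predict the held-out half
lm_fit <- lm(Sales ~ ., data = train)
pred   <- predict(lm_fit, newdata = test)

# Test error as the residual sum of squares on the test set
test_rss <- sum((test$Sales - pred)^2)
test_rss
```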
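part d
# mtry = 10 considers all 10 predictors at each split (bagging)
fit2 <- randomForest(Sales ~ ., mtry = 10, data = train)
importance(fit2)
# mtry = 2 considers only 2 predictors at each split
fit3 <- randomForest(Sales ~ ., mtry = 2, data = train)
importance(fit3)
# Here, changing mtry makes no difference to the variable importance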
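Part d of the question asks for best subset, forward, and backward selection. A minimal sketch using regsubsets() from the leaps package follows. It assumes leaps is installed and uses the same split and seed as in part a. Choosing the final model size by minimum BIC is an assumption made for illustration, since the question leaves that choice open. The names fits, test_mat, and results are illustrative.

```r
library(ISLR)
library(leaps)  # provides regsubsets(); assumed to be installed

# Recreate the 50/50 split from part a (seed 123 taken from the answer above)
set.seed(123)
train_ind <- sample(seq_len(nrow(Carseats)), size = floor(0.5 * nrow(Carseats)))
train <- Carseats[train_ind, ]
test  <- Carseats[-train_ind, ]

# One regsubsets() fit per selection method; nvmax = 11 covers every
# dummy-coded predictor column in the model matrix
methods <- c(best = "exhaustive", forward = "forward", backward = "backward")
fits <- lapply(methods, function(m) {
  regsubsets(Sales ~ ., data = train, nvmax = 11, method = m)
})

# Plot Cp, BIC, adjusted R^2, and RSS against model size, one row per method
par(mfrow = c(3, 4), mar = c(4, 4, 2, 1))
for (name in names(fits)) {
  s <- summary(fits[[name]])
  plot(s$cp,    type = "b", xlab = "Number of predictors", ylab = "Cp",          main = name)
  plot(s$bic,   type = "b", xlab = "Number of predictors", ylab = "BIC",         main = name)
  plot(s$adjr2, type = "b", xlab = "Number of predictors", ylab = "Adjusted R2", main = name)
  plot(s$rss,   type = "b", xlab = "Number of predictors", ylab = "RSS",         main = name)
}

# For each method, pick the size with the lowest BIC (an illustrative choice),
# then report training RSS and test RSS for that model
test_mat <- model.matrix(Sales ~ ., data = test)
results <- sapply(fits, function(f) {
  s    <- summary(f)
  k    <- which.min(s$bic)
  beta <- coef(f, id = k)
  pred <- test_mat[, names(beta), drop = FALSE] %*% beta
  c(size = k, train_rss = s$rss[k], test_rss = sum((test$Sales - pred)^2))
})
results

# Coefficients of the chosen best-subset model
coef(fits$best, id = results["size", "best"])
```

The columns of results correspond to the three methods, so the chosen size, training RSS, and test RSS can be compared side by side.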