Use the ns() function to fit a natural cubic spline to predict nox using dis. Perform 5-fold cross-validation to select the best degrees of freedom, considering up to 10 degrees of freedom.
Please use set.seed(1) on the first line of your R code.
Perform cross-validation or another approach in order to select the best degrees of freedom for a regression spline on this data.
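set.seed(1)
library(MASS)      # Boston data (nox, dis)
library(splines)   # bs()
library(tidyverse) # tibble, dplyr, ggplot2
library(ggthemes)  # theme_tufte()

# Randomly assign each of the 506 rows to one of 10 folds
folds <- sample(1:10, size = 506, replace = TRUE)
errors <- matrix(NA, 10, 9)
models <- list()
for (k in 1:10) {
  for (i in 1:9) {
    # Fit on every fold except k; the predictor is dis, not nox.
    # bs() warns and falls back to df = 3 when df < 3.
    models[[i]] <- lm(nox ~ bs(dis, df = i), data = Boston[folds != k, ])
    pred <- predict(models[[i]], Boston[folds == k, ])
    errors[k, i] <- sqrt(mean((Boston$nox[folds == k] - pred)^2))
  }
}
# Average the held-out RMSE over the folds for each df
errors <- apply(errors, 2, mean)
tibble(RMSE = errors) %>%
  mutate(df = row_number()) %>%
  ggplot(aes(df, RMSE, fill = df == which.min(errors))) +
  geom_col() + theme_tufte() + guides(fill = "none") +
  scale_x_continuous(breaks = 1:9) +
  coord_cartesian(ylim = range(errors))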
To plot the training RMSE against the degrees of freedom, fitting on the full data:
# Needs MASS (Boston), splines (bs), tidyverse and ggthemes to be loaded
errors <- numeric(9)
models <- list()
pred_df <- as.data.frame(matrix(NA_real_, nrow(Boston), 9))
for (i in 1:9) {
  # Fit on all rows and record the in-sample (training) RMSE
  models[[i]] <- lm(nox ~ bs(dis, df = i), data = Boston)
  preds <- predict(models[[i]])
  pred_df[[i]] <- preds
  errors[i] <- sqrt(mean((Boston$nox - preds)^2))
}
names(pred_df) <- paste(1:9, 'Degrees of Freedom')
tibble(RMSE = errors) %>%
  mutate(df = row_number()) %>%
  ggplot(aes(df, RMSE, fill = df == which.min(errors))) +
  geom_col() + guides(fill = "none") + theme_tufte() +
  scale_x_continuous(breaks = 1:9) +
  coord_cartesian(ylim = range(errors))
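The code above uses bs() with 10 folds and df from 1 to 9, whereas the question asks for ns(), 5 folds, and df up to 10. A minimal sketch of that variant follows. It assumes the Boston data comes from the MASS package; the names n_folds, max_df, cv_mse, mean_cv and best_df are illustrative and not from the original answer.

```r
set.seed(1)
library(MASS)     # assumed source of the Boston data (nox, dis)
library(splines)  # ns()

n_folds <- 5
max_df  <- 10

# Balanced fold labels: every row lands in exactly one of the 5 folds
folds <- sample(rep(1:n_folds, length.out = nrow(Boston)))

cv_mse <- matrix(NA_real_, n_folds, max_df)
for (k in 1:n_folds) {
  train <- Boston[folds != k, ]
  test  <- Boston[folds == k, ]
  for (d in 1:max_df) {
    # Natural cubic spline in dis with d degrees of freedom
    fit  <- lm(nox ~ ns(dis, df = d), data = train)
    pred <- predict(fit, newdata = test)
    cv_mse[k, d] <- mean((test$nox - pred)^2)
  }
}

# Average held-out MSE per df, then pick the smallest
mean_cv <- colMeans(cv_mse)
best_df <- which.min(mean_cv)
print(round(mean_cv, 5))
print(best_df)
```

The chosen model can then be refit on all rows with lm(nox ~ ns(dis, df = best_df), data = Boston). Differences in CV error between neighbouring df values are often small, so the selected value may change with the seed.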