Load the built-in USArrests sample dataset into R as a data frame with data(USArrests). (Note: the row names are state names, not numeric values!) Use the kmeans() function (part of R's built-in stats package, not a separate package) to cluster the data for each value of k from 2 to 10. You will need to decide whether or not to center/scale the observations; justify your choice. Plot the within-cluster sum of squares against k. What is the optimal number of clusters? Use the tidyverse and the fviz_cluster() plotting function from factoextra to plot the optimal clustering.
Use R. Provide code and plots.
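One possible approach is sketched below; it is a starting point under stated assumptions, not a definitive solution. It scales the data first, because k-means relies on Euclidean distance and Assault has a far larger variance than the other three variables (and UrbanPop is a percentage rather than a rate per 100,000), so unscaled clustering would be driven almost entirely by Assault. The seed, the nstart = 25 setting, and the choice k_opt = 4 are assumptions made for illustration; the elbow is a judgment call and should be read off your own plot.

```r
# Sketch only: the seed, nstart, and k_opt below are assumptions, not given in the question.
library(tidyverse)   # ggplot2, tibble, purrr
library(factoextra)  # fviz_cluster()

data(USArrests)                      # built-in data frame; row names are state names
arrests_scaled <- scale(USArrests)   # center to mean 0 and scale to sd 1, column by column

set.seed(123)                        # arbitrary seed so the random starts are reproducible
k_values <- 2:10

# Fit k-means for each k and keep the total within-cluster sum of squares
wss <- map_dbl(k_values, function(k) {
  kmeans(arrests_scaled, centers = k, nstart = 25)$tot.withinss
})

# Elbow plot: within-cluster sum of squares against k
elbow <- tibble(k = k_values, wss = wss)
elbow_plot <- ggplot(elbow, aes(x = k, y = wss)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = k_values) +
  labs(x = "Number of clusters k",
       y = "Total within-cluster sum of squares")
print(elbow_plot)

# Assumed choice of k; replace with the value where your elbow plot flattens
k_opt <- 4
final_fit <- kmeans(arrests_scaled, centers = k_opt, nstart = 25)

# Cluster plot on the first two principal components, labeled by state
cluster_plot <- fviz_cluster(final_fit, data = arrests_scaled,
                             repel = TRUE, ggtheme = theme_minimal())
print(cluster_plot)
```

The within-cluster sum of squares always tends to fall as k grows, so the "optimal" k is where the curve bends and further clusters give only small reductions. If the bend is ambiguous, state which k you picked and why.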