Machine Learning - clustering
Having generated a dendrogram, can we “prune” it?
Pruning is a technique in machine learning that reduces the size of a clustering tree by removing branches. It lowers the complexity of the tree and can improve predictive accuracy by reducing overfitting.
Having generated a dendrogram, we can prune it in a top-down or a bottom-up fashion. Top-down pruning traverses the nodes from the top level to the bottom level and cuts subtrees starting at the root, while bottom-up pruning traverses the nodes from the bottom level to the top level and cuts subtrees starting at the leaf nodes.
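As an illustrative sketch (the toy data, the cut height, and the leaf labels below are assumptions, not part of the question), the two directions can be contrasted in R with base cut() and dendextend's prune():

library(dendextend)

# Toy hierarchical clustering; the leaf labels default to "1".."5".
hc   <- hclust(dist(c(1, 2, 6, 7, 12)))
dend <- as.dendrogram(hc)

# Top-down: cut the dendrogram at a height near the root; base R's cut()
# returns the upper tree plus the lower subtrees that were split off.
top_down <- cut(dend, h = 4)
plot(top_down$upper)

# Bottom-up: remove individual leaves (here those labelled "1" and "2")
# and let the remaining tree collapse upward.
bottom_up <- prune(dend, c("1", "2"))
plot(bottom_up)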
There are two popular pruning algorithms: reduced error pruning and cost-complexity pruning. In cost-complexity pruning, the function prune(T, t) denotes the tree obtained by pruning the subtrees t from the tree T. Once the series of pruned trees has been created, the best tree is chosen based on its accuracy as measured on a training set or by cross-validation.
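The series-of-trees description above matches cost-complexity pruning as used for decision trees; purely as a sketch of that idea (the rpart model on the built-in iris data is an illustration, not from the question), in R it could look like:

library(rpart)

# Grow a classification tree; rpart records in fit$cptable the nested series
# of pruned subtrees together with their cross-validated error (xerror).
fit <- rpart(Species ~ ., data = iris, method = "class")
printcp(fit)

# Keep the subtree whose complexity parameter gives the lowest
# cross-validated error.
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned  <- rpart::prune(fit, cp = best_cp)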
Having generated a dendrogram, we can also prune the clustering tree based on its leaf labels, for example with the prune() function from the dendextend package in R:
library(dendextend)

# dend15 is assumed to be a small dendrogram, e.g. built with
# hclust() and as.dendrogram().
par(mfrow = c(1, 2))                  # show the two trees side by side
dend15 %>% set("labels_colors") %>%
  plot(main = "main tree", ylim = c(0, 4))
dend15 %>% set("labels_colors") %>%
  prune(c("1", "5")) %>%              # drop the leaves labelled "1" and "5"
  plot(main = "Prune tree", ylim = c(0, 4))
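Here prune() removes the leaves labelled "1" and "5" from dend15 and returns the reduced dendrogram, so the two plots compare the full tree with its pruned version side by side.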