In: Computer Science
Hierarchical clustering is sometimes used to generate K
clusters, K > 1 by taking the clusters at the K th level of the
dendrogram. (Root is at level 1.) By looking at the clusters
produced in this way, we can evaluate the behavior of hierarchical
clustering on different types of data and clusters, and also
compare hierarchical approaches to K-means.
The following is a set of one-dimensional points: {6, 12, 18, 24,
30, 42, 48}.
(a)
For each of the following sets of initial centroids, create two
clusters by assigning each point to the nearest centroid, and then
calculate the total squared error for each set of two clusters.
Show both the clusters and the total squared error for each set of
centroids.
i.{18, 45}
ii. {15, 40}
(b)
Do both sets of centroids represent stable solutions; i.e., if the
K-means algorithm was run on this set of points using the given
centroids as the starting centroids, would there be any change in
the clusters generated?
(c)
What are the two clusters produced by single link?
( d) Which technique, K-means or single link, seems to produce the
"most natural" clustering in this situation? (For K-means, take the
clustering with the lowest squared error.)
(e)
What definition(s) of clustering does this natural clustering
correspond to? (Well-separated, center-based, contiguous, or
density.)
(f)
What well-known characteristic of the K-means algorithm explains
the previous behavior?
Solution:
a.)
i.) According to the question, the clusters are {6 ,12 ,18 ,24 ,30} and {42 ,48} with square root total
(122 + 62 + 02 + 62 + 122)+(32 + 32) = 360+18 =378
ii.) According to the question, the clusters are {6 ,12 ,18 ,24} and {30 ,42 ,48} with square root total
(92 + 32 + 32 + 92 )+ (102+22 + 82) = 180+168 =348
b.) Unexpectedly, the underlying centroids stay unaltered after clustering. So the clusters are stable in both of the cases.
c.)
d.)
K-means chooses the second clustering from the initial part,
while single connection chooses the first clustering in the initial
part.
Of the two, single link cluster produces more natural clusters
(whose centroids are further separated).
e.)
Single linked clustering tend to be more contiguous and is biased towards more dense clusters
f.)
The natural clusters have various sizes, which K - means handle’s ineffectively. It will break up the larger cluster and move the centroids together.