Question

In: Computer Science

Hierarchical clustering is sometimes used to generate K clusters, K > 1, by taking the clusters at the Kth level of the dendrogram. (The root is at level 1.) By looking at the clusters produced in this way, we can evaluate the behavior of hierarchical clustering on different types of data and clusters, and also compare hierarchical approaches to K-means.
The following is a set of one-dimensional points: {6, 12, 18, 24, 30, 42, 48}.

(a)
For each of the following sets of initial centroids, create two clusters by assigning each point to the nearest centroid, and then calculate the total squared error for each set of two clusters. Show both the clusters and the total squared error for each set of centroids.
i. {18, 45}
ii. {15, 40}
(b)
Do both sets of centroids represent stable solutions; i.e., if the K-means algorithm were run on this set of points using the given centroids as the starting centroids, would there be any change in the clusters generated?
(c)
What are the two clusters produced by single link?
(d)
Which technique, K-means or single link, seems to produce the "most natural" clustering in this situation? (For K-means, take the clustering with the lowest squared error.)
(e)
What definition(s) of clustering does this natural clustering correspond to? (Well-separated, center-based, contiguous, or density.)
(f)
What well-known characteristic of the K-means algorithm explains the previous behavior?

Expert Solution

a.)

i.) With centroids {18, 45}, the clusters are {6, 12, 18, 24, 30} and {42, 48}, with total squared error

(12² + 6² + 0² + 6² + 12²) + (3² + 3²) = 360 + 18 = 378

ii.) With centroids {15, 40}, the clusters are {6, 12, 18, 24} and {30, 42, 48}, with total squared error

(9² + 3² + 3² + 9²) + (10² + 2² + 8²) = 180 + 168 = 348
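Both assignments and error totals can be checked with a short script (a minimal sketch; the point set and centroid pairs come straight from the problem statement):

```python
# Assign each point to its nearest centroid, then sum the squared
# distances from each point to its cluster's centroid.
points = [6, 12, 18, 24, 30, 42, 48]

def assign_and_sse(centroids):
    clusters = {c: [] for c in centroids}
    for p in points:
        nearest = min(centroids, key=lambda c: abs(p - c))
        clusters[nearest].append(p)
    sse = sum((p - c) ** 2 for c, pts in clusters.items() for p in pts)
    return clusters, sse

print(assign_and_sse([18, 45]))  # ({18: [6, 12, 18, 24, 30], 45: [42, 48]}, 378)
print(assign_and_sse([15, 40]))  # ({15: [6, 12, 18, 24], 40: [30, 42, 48]}, 348)
```

Note that the squared error here is measured against the given centroids; for these two centroid sets this coincides with the error against the cluster means, since each centroid happens to equal the mean of its cluster.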

b.) In both cases the recomputed centroids (the cluster means) equal the starting centroids: 18 and 45 for (i), and 15 and 40 for (ii). So both sets of centroids represent stable solutions; running K-means from them would not change the clusters.
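Stability can be verified by running one K-means iteration (assignment step, then update step) and checking that the recomputed centroids match the starting ones, as in this sketch:

```python
points = [6, 12, 18, 24, 30, 42, 48]

def one_kmeans_step(centroids):
    # Assignment step: each point goes to its nearest centroid.
    clusters = {c: [] for c in centroids}
    for p in points:
        clusters[min(centroids, key=lambda c: abs(p - c))].append(p)
    # Update step: new centroid = mean of the assigned points.
    return [sum(pts) / len(pts) for pts in clusters.values()]

print(one_kmeans_step([18, 45]))  # [18.0, 45.0] -> unchanged, so stable
print(one_kmeans_step([15, 40]))  # [15.0, 40.0] -> unchanged, so stable
```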

c.)

Single link produces the clusters {6, 12, 18, 24, 30} and {42, 48}. Every gap between consecutive points is 6 except the gap of 12 between 30 and 42, so single link merges all the closer neighbors first and that widest gap is the last link to form.

d.)

K-means chooses the second clustering from part (a), since it has the lower squared error (348 < 378), while single link produces the first clustering.
Of the two, single link produces the more natural clusters, {6, 12, 18, 24, 30} and {42, 48}, whose centroids (18 and 45) are farther apart.

e.)

This natural clustering corresponds to the contiguous (contiguity-based) definition of a cluster, which is the kind of cluster single link tends to find; the two clusters are also the denser regions of the data.

f.)

The natural clusters have different sizes, which K-means handles poorly. K-means tends to break up the larger cluster and pull the centroids closer together, because doing so reduces the total squared error.

