Question

In your own words, summarize the steps of K-means clustering. Make sure to give example(s). What are the advantages and disadvantages of the K-means clustering? Any limitations?

Solutions

Expert Solution

Step 1: Initialization

  • k-means first picks K examples (data points) at random from the dataset to serve as the initial centroids (a centroid is the center of a cluster). It starts from random points simply because it does not yet know where the center of each cluster lies.
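
The initialization step can be sketched in a few lines of Python. This is only an illustration: the function name `init_centroids`, the `seed` parameter, and the sample points are assumptions made for the example, not part of the original answer.

```python
import random

def init_centroids(points, k, seed=None):
    """Pick k distinct data points at random to serve as initial centroids."""
    rng = random.Random(seed)      # a seed makes the random choice repeatable
    return rng.sample(points, k)   # sampling without replacement

# Hypothetical 2-D data set used only for illustration.
points = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0), (3.5, 5.0), (4.5, 5.0)]
centroids = init_centroids(points, k=2, seed=0)
print(centroids)  # two of the six points; which ones depends on the seed
```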

Step 2: Cluster Assignment

  • Each data point is then assigned to the centroid it is closest (most similar) to, and the points that share a centroid form a cluster. If we use the Euclidean distance between data points and centroids, the boundary between two neighboring clusters is the perpendicular bisector of the straight line joining their two centroids. (Source: Introduction to Clustering and K-means Algorithm)
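
As a rough sketch of the assignment step, the snippet below labels each point with the index of its nearest centroid under Euclidean distance. The helper name `assign_clusters` and the sample values are assumptions chosen for the example.

```python
import math

def assign_clusters(points, centroids):
    """Return, for each point, the index of its nearest centroid (Euclidean)."""
    labels = []
    for p in points:
        distances = [math.dist(p, c) for c in centroids]
        labels.append(distances.index(min(distances)))  # closest centroid wins
    return labels

# Hypothetical points and centroids used only for illustration.
points = [(1.0, 1.0), (1.5, 2.0), (5.0, 7.0), (4.5, 5.0)]
centroids = [(1.0, 1.0), (5.0, 7.0)]
labels = assign_clusters(points, centroids)
print(labels)  # [0, 0, 1, 1]
```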

Step 3: Move the centroid

  • The new clusters now need centers. Each centroid moves to the mean of all the examples assigned to its cluster.
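
The update step amounts to a per-cluster average. A minimal sketch follows, assuming every cluster has at least one member; the name `update_centroids` and the choice to raise an error on an empty cluster are assumptions for this example, since the answer above does not say how to handle that case.

```python
def update_centroids(points, labels, k):
    """Return the mean of the points assigned to each of the k clusters."""
    centroids = []
    for j in range(k):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            # Assumption: treat an empty cluster as an error in this sketch.
            raise ValueError(f"cluster {j} is empty")
        # zip(*members) groups the coordinates dimension by dimension.
        centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
    return centroids

# Hypothetical points and labels used only for illustration.
points = [(1.0, 1.0), (1.5, 2.0), (5.0, 7.0), (4.5, 5.0)]
labels = [0, 0, 1, 1]
new_centroids = update_centroids(points, labels, k=2)
print(new_centroids)  # [(1.25, 1.5), (4.75, 6.0)]
```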

We keep repeating steps 2 and 3 until the centroids stop moving; at that point the K-means algorithm has converged.
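
Putting the three steps together, one possible pure-Python sketch of the whole loop is shown below. The function name `kmeans`, the `max_iter` safety cap, the rule for empty clusters, and the sample data are all assumptions made for this example; a library implementation would differ in many details.

```python
import math
import random

def kmeans(points, k, seed=None, max_iter=100):
    """Minimal k-means: returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # Step 1: random data points as centroids
    labels = []
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Step 3: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:  # centroids stopped moving: converged
            break
        centroids = new_centroids
    return centroids, labels

# Two well-separated groups of hypothetical points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2, seed=1)
print(labels)  # the first three points share one label, the last three the other
```

Which group ends up labelled 0 and which 1 depends on the random starting points, so only the grouping itself is meaningful.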

K-Means Advantages :

1) With a large number of variables, K-Means is most of the time computationally faster than hierarchical clustering, provided K is kept small.

2) K-Means produces tighter clusters than hierarchical clustering, especially if the clusters are globular.

K-Means Disadvantages :

1) The value of K is difficult to choose in advance.

2) It does not work well when the clusters are not globular.

3) Different initial partitions can result in different final clusters.

4) It does not work well when the clusters in the original data have different sizes and different densities.
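
Disadvantage 3 can be seen on a tiny example. The sketch below runs k-means twice on the same four points, each time starting from a different pair of data points, and ends with two different partitions. The helper `kmeans_from`, the data, and the use of the within-cluster sum of squared distances (`sse`) to compare the two results are assumptions made for this illustration.

```python
import math

def kmeans_from(points, centroids, max_iter=100):
    """Run k-means from explicit starting centroids; return (centroids, labels, sse)."""
    k = len(centroids)
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: mean of each cluster's members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Within-cluster sum of squared distances: lower means tighter clusters.
    sse = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return centroids, labels, sse

# Four hypothetical points at the corners of a 4-by-1 rectangle.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Start A: one point from the left side and one from the right side.
_, labels_a, sse_a = kmeans_from(points, [(0.0, 0.0), (4.0, 0.0)])
# Start B: both starting points from the left side.
_, labels_b, sse_b = kmeans_from(points, [(0.0, 0.0), (0.0, 1.0)])

print(labels_a, sse_a)  # left/right split, small sse
print(labels_b, sse_b)  # bottom/top split, much larger sse
```

Both runs have converged in the sense that the centroids no longer move, yet the second result is clearly the worse grouping. A common remedy is to run the algorithm from several starting points and keep the result with the lowest `sse`.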

Example

The k-means algorithm is very popular and is used in a variety of applications such as market segmentation, document clustering, image segmentation, and image compression. The goal of a cluster analysis is usually one of the following:

  • Get a meaningful intuition of the structure of the data we’re dealing with.
  • Cluster-then-predict, where a different model is built for each subgroup because we believe behavior varies widely between subgroups. An example is clustering patients into subgroups and building a model for each subgroup to predict the risk of having a heart attack.

Limitations

k-means makes three assumptions:

  • the distribution of each attribute (variable) within a cluster is spherical;
  • all variables have the same variance;
  • the prior probability of all k clusters is the same, i.e., each cluster has roughly the same number of observations.

If any one of these 3 assumptions is violated, k-means can give poor results.

