Question

One way to cluster objects is called k-means clustering. The goal is to find k different clusters, each represented by a "prototype", defined as the centroid of the cluster. The centroid is computed as follows: the jth value in the centroid is the mean (average) of the jth values of all the members of the cluster. Our goal is for every member of a cluster to be closer to that cluster's prototype than to any of the other prototypes. Thus a prototype is a good representative for the items in a cluster.

We start the algorithm with k initial clusters and centroids. We then alternate between two steps:

1. For each sample, find the nearest prototype. This assigns members to each cluster.
2. Set the prototype of each cluster as the centroid of its members.

We stop when the cluster assignment doesn't change, or when we've reached a maximum number of iterations.

Can someone help me with a Python program that computes the Euclidean distance and the cluster prototype (centroid)?

Thank you!

Solutions

Expert Solution

Hi, k-means clustering is a very standard clustering technique, and many open-source libraries support it. I recommend using scikit-learn's (sklearn) implementation. You can also view the sklearn source code on GitHub.

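The class signature is as follows (this is how it appeared in older scikit-learn releases; later releases removed precompute_distances and n_jobs, so check the documentation for your installed version):

class sklearn.cluster.KMeans(n_clusters=8, init='k-means++', n_init=10, max_iter=300, tol=0.0001, precompute_distances='auto', verbose=0, random_state=None, copy_x=True, n_jobs=None, algorithm='auto')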

Let me first explain what this class does:

By default it uses k-means++, an improved variant of k-means that differs from the standard algorithm only in how the initial centroids are chosen.
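
To illustrate what that initialization difference amounts to, here is a rough sketch of k-means++ style seeding in plain Python. It is not scikit-learn's actual code (the library uses a more elaborate variant); the names squared_distance and kmeans_pp_init are made up for this example, and it assumes the data contains at least k distinct points. The first centroid is picked uniformly at random, and each later one is picked with probability proportional to its squared distance from the nearest centroid chosen so far.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng=None):
    """Pick k initial centroids from points (hypothetical helper, illustration only)."""
    if rng is None:
        rng = random.Random()
    # First centroid: chosen uniformly at random from the data.
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Weight each point by its squared distance to the nearest chosen centroid,
        # so far-away points are more likely to become the next centroid.
        weights = [min(squared_distance(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


points = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]
print(kmeans_pp_init(points, 2, rng=random.Random(0)))
```

A point that is already a centroid has weight zero, so it cannot be picked twice. After seeding, the usual assign-and-update iterations run unchanged.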

Keep the rest of the parameters at their defaults.

The code will be as follows:

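from sklearn.cluster import KMeans
import numpy as np

# Six sample points in 2D, one row per point
X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
# Fit k-means with k=2; random_state makes the initialization reproducible
kmeans = KMeans(n_clusters=2, random_state=0).fit(X)
# One row per cluster centroid (prototype)
print(kmeans.cluster_centers_)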

Now let me explain the code.

Here X is our data, an array of points in 2D space. The call to KMeans(n_clusters=2, random_state=0).fit(X) fits the model to that data, and the centroids are computed automatically.

The print statement outputs the centroids, which are stored in kmeans.cluster_centers_.
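
Since the question asks specifically for the Euclidean distance and the centroid computation, here is a minimal from-scratch sketch in plain Python that follows the steps described in the question. The function names (euclidean_distance, centroid, nearest_prototype, k_means) are chosen for this example only, and leaving an empty cluster's prototype unchanged is an assumption, since the question does not say what to do in that case.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two points of equal dimension."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(members):
    """Cluster prototype: the jth value is the mean of the members' jth values."""
    if not members:
        raise ValueError("cannot compute the centroid of an empty cluster")
    n = len(members)
    return [sum(column) / n for column in zip(*members)]


def nearest_prototype(sample, prototypes):
    """Index of the prototype closest to the sample."""
    return min(range(len(prototypes)),
               key=lambda i: euclidean_distance(sample, prototypes[i]))


def k_means(samples, initial_prototypes, max_iterations=100):
    """Alternate assignment and centroid update until assignments stop changing."""
    prototypes = [list(p) for p in initial_prototypes]
    assignment = None
    for _ in range(max_iterations):
        # Step 1: assign each sample to its nearest prototype.
        new_assignment = [nearest_prototype(s, prototypes) for s in samples]
        if new_assignment == assignment:
            break  # assignments did not change, so we are done
        assignment = new_assignment
        # Step 2: move each prototype to the centroid of its members.
        for i in range(len(prototypes)):
            members = [s for s, a in zip(samples, assignment) if a == i]
            if members:  # assumption: an empty cluster keeps its old prototype
                prototypes[i] = centroid(members)
    return prototypes, assignment


samples = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]
prototypes, assignment = k_means(samples, [samples[0], samples[3]])
print(prototypes)   # [[1.0, 2.0], [10.0, 2.0]]
print(assignment)   # [0, 0, 0, 1, 1, 1]
```

The example reuses the six points from the sklearn snippet and picks the first and fourth points as initial prototypes by hand. The result depends on that starting choice, which is why library implementations offer smarter initialization.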

