I need this code to be written in Python:
Given a dataset, D, which consists of (x, y) pairs, and a list of cluster assignments, C, write a function centroid(D, C) that computes the new (x, y) centroid for each cluster. Your function should return a list of the new cluster centroids, with each centroid represented as a list [x, y]:
def centroid(D, C):
The rule for computing the new centroid was not given, so I took it to be the average of all data points assigned to that cluster, as in k-means.
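As a quick check of that averaging rule, here is a minimal sketch that computes the mean of the points belonging to a single cluster. The helper name mean_point and the variable points are my own illustrative names, not part of the assignment.

```python
# Points assigned to one cluster (illustrative values).
points = [[3, 5], [4, 6]]

def mean_point(points):
    """Return the coordinate-wise average of a non-empty list of [x, y] points."""
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

print(mean_point(points))  # [3.5, 5.5]
```

The x values 3 and 4 average to 3.5 and the y values 5 and 6 average to 5.5, so the centroid of this cluster is [3.5, 5.5].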
Program:
def centroid(D, C):
    # the list of new centroids
    new_centroids = []
    # number of data points in each cluster (associated with a centroid)
    number_of_elements_in_cluster = []
    # number of data points
    n1 = len(D)
    # clusters are numbered from 1 to n, so the largest label
    # gives the number of clusters
    max_cluster_number = max(C, default=0)
    # initialize the new centroids as [0, 0]
    # and the number of elements in each cluster as 0
    for i in range(max_cluster_number):
        new_centroids.append([0, 0])
        number_of_elements_in_cluster.append(0)
    # sum the data points in each cluster and count them
    for i in range(n1):
        new_centroids[C[i]-1][0] += D[i][0]
        new_centroids[C[i]-1][1] += D[i][1]
        number_of_elements_in_cluster[C[i]-1] += 1
    # divide each cluster sum by its number of data points to get the mean,
    # which is the new centroid; index by cluster i, not by C[i]
    for i in range(max_cluster_number):
        # a cluster with no points keeps [0, 0] to avoid dividing by zero
        if number_of_elements_in_cluster[i] > 0:
            new_centroids[i][0] /= number_of_elements_in_cluster[i]
            new_centroids[i][1] /= number_of_elements_in_cluster[i]
    return new_centroids
# sample run
D = [[2,0],[3,5],[4,6]]
C = [2,1,1]
print(centroid(D,C))  # in Python 3 this prints [[3.5, 5.5], [2.0, 0.0]]
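If the cluster labels are not guaranteed to run from 1 to n, a dictionary keyed by label avoids the index arithmetic. This is only a sketch under my own assumptions: the name centroids_by_label is hypothetical, the centroids are returned in ascending label order, and a label with no points simply does not appear in the result.

```python
from collections import defaultdict

def centroids_by_label(D, C):
    """Return [x, y] centroids ordered by ascending cluster label.

    Assumes D and C have the same length; labels may be any sortable values.
    """
    if len(D) != len(C):
        raise ValueError("D and C must have the same length")
    sums = defaultdict(lambda: [0.0, 0.0])  # label -> running [sum_x, sum_y]
    counts = defaultdict(int)               # label -> number of points seen
    for (x, y), label in zip(D, C):
        sums[label][0] += x
        sums[label][1] += y
        counts[label] += 1
    # Divide each running sum by its count to get the mean per label.
    return [[sums[k][0] / counts[k], sums[k][1] / counts[k]] for k in sorted(sums)]

print(centroids_by_label([[2, 0], [3, 5], [4, 6]], [2, 1, 1]))  # [[3.5, 5.5], [2.0, 0.0]]
```

On the sample data this gives the same centroids as the list-based version, since the labels there are 1 and 2.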