Using Python, establish random centroids according to k, and create your own algorithm to make clusters: create a function kmeans() that accepts a dataframe and k as parameters, and returns the clusters created.
Here is the code for k-means clustering in Python. Please go through it carefully and try to understand each line.
Before we start, we need to import pandas and the other libraries the code uses.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.cluster import KMeans
Now we load the dataset and look at what kind of data we have.
data=pd.read_csv("Data set name will be here.csv")
data.head() # shows the first 5 rows of the dataset by default
## Now standardizing the data
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
data_scaled = scaler.fit_transform(data)
## Here are the statistics of the scaled data
pd.DataFrame(data_scaled).describe()
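To make the scaling step concrete: StandardScaler, with its default settings, subtracts each column's mean and divides by that column's population standard deviation. The sketch below reproduces that arithmetic with plain NumPy on a small made-up array (the values are hypothetical, not taken from the dataset). Note that describe() reports the sample standard deviation, so the std it shows for scaled data is slightly above 1.

```python
import numpy as np

# Hypothetical toy data: 3 rows, 2 columns on very different scales.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 60.0]])

# Standardize column by column: subtract the mean, divide by the
# population standard deviation (np.std uses ddof=0 by default).
scaled = (X - X.mean(axis=0)) / X.std(axis=0)

print(scaled.mean(axis=0))  # each column mean is approximately 0
print(scaled.std(axis=0))   # each column standard deviation is approximately 1
```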
### Defining the k-means model
kmeans = KMeans(n_clusters=2, init='k-means++')
## Fitting k-means clustering on the scaled data
kmeans.fit(data_scaled)
## Inertia of the fitted model
kmeans.inertia_
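The inertia_ value is the sum of squared distances from each sample to its closest cluster center. Here is a small sketch of that calculation done by hand with NumPy, using made-up points and centers (hypothetical values chosen so the arithmetic is easy to check):

```python
import numpy as np

# Hypothetical toy data: four 2-D points and two cluster centers.
points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
centers = np.array([[0.0, 1.0], [10.0, 1.0]])

# Squared distance from every point to every center, shape (4, 2).
sq_dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)

# Each point contributes its squared distance to the nearest center.
inertia = sq_dists.min(axis=1).sum()
print(inertia)  # 4.0
```

Every point here is exactly 1 unit from its nearest center, so the four squared distances add up to 4.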
## Fitting multiple k-means algorithms and storing the values in an empty list.
SSE = []
for cluster in range(1, 20):
    # n_jobs is left out because recent scikit-learn releases no longer accept it in KMeans
    kmeans = KMeans(n_clusters=cluster, init='k-means++')
    kmeans.fit(data_scaled)
    SSE.append(kmeans.inertia_)
## Converting the results into a dataframe and plotting them
frame = pd.DataFrame({'Cluster':range(1,20), 'SSE':SSE})
plt.figure(figsize=(12,6))
plt.plot(frame['Cluster'], frame['SSE'], marker='o')
plt.xlabel('Number of clusters')
plt.ylabel('Inertia')
End of the code.
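The question asks for a hand-written kmeans() with random starting centroids, while the code above relies on scikit-learn's KMeans. Below is a minimal sketch of such a function, assuming every column of the dataframe is numeric. The max_iter and seed parameters, the handling of empty clusters, and the (labels, centroids) return value are my own assumptions and not part of the question.

```python
import numpy as np
import pandas as pd


def kmeans(df, k, max_iter=100, seed=0):
    """Cluster the rows of a numeric dataframe into k groups.

    Returns (labels, centroids): labels is a Series aligned with df.index,
    centroids is a DataFrame with one row per cluster.
    """
    X = df.to_numpy(dtype=float)
    rng = np.random.default_rng(seed)

    # Random initial centroids: k distinct rows drawn from the data.
    centroids = X[rng.choice(len(X), size=k, replace=False)]

    def assign(cents):
        # Distance from every row to every centroid, shape (n_rows, k),
        # then the index of the nearest centroid for each row.
        dists = np.linalg.norm(X[:, None, :] - cents[None, :, :], axis=2)
        return dists.argmin(axis=1)

    for _ in range(max_iter):
        labels = assign(centroids)
        # Move each centroid to the mean of its rows; keep the old
        # centroid if a cluster ended up with no rows (an assumption).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving
        centroids = new_centroids

    labels = assign(centroids)  # final assignment for the final centroids
    return (pd.Series(labels, index=df.index, name="cluster"),
            pd.DataFrame(centroids, columns=df.columns))


# Usage on a hypothetical toy dataframe with two well-separated groups.
demo = pd.DataFrame({"x": [1.0, 1.2, 0.8, 8.0, 8.2, 7.8],
                     "y": [1.0, 0.9, 1.1, 8.0, 8.1, 7.9]})
labels, centroids = kmeans(demo, k=2)
print(labels.tolist())
print(centroids)
```

On real data you could pass the scaled values instead, for example kmeans(pd.DataFrame(data_scaled), k=2). Because the starting centroids are random, different seeds can give different clusters on less clearly separated data.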