In: Computer Science
Question: In MATLAB, implement a hybrid clustering algorithm that combines hierarchical clustering and k-means clustering. The hybrid algorithm uses hierarchical clustering to produce stable clusters, and k-means then initializes its seeds from the centroids of those stable clusters (instead of using randomly initialized seeds).
Background Information: Both hierarchical clustering and k-means clustering group similar data objects into clusters, but the two algorithms have complementary pros and cons. Hierarchical clustering produces stable clusters, while k-means clustering generates unstable clusters because of its random initial centroids (seeds). On the other hand, once hierarchical clustering wrongly merges data objects, they can never be moved to another group, whereas k-means clustering can re-assign data objects to a different group.
Hierarchical clustering identifies groups in a tree-like structure but suffers from high computational complexity on large datasets, while k-means clustering is efficient but designed to identify homogeneous, spherically shaped clusters. We therefore describe a hybrid non-parametric clustering approach that amalgamates the two methods to identify general-shaped clusters and that can be applied to larger datasets. Specifically, we first partition the dataset into spherical groups using k-means, and next merge these groups using hierarchical methods with a data-driven distance measure as a stopping criterion.
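Before detailing the merged approach, here is a minimal MATLAB sketch of the seeding scheme the question itself asks for, assuming the Statistics and Machine Learning Toolbox (`linkage`, `cluster`, `kmeans`); the function name `hybridCluster`, the Ward linkage, and the iteration cap are illustrative choices, not prescribed by the text:

```matlab
function [idx, C] = hybridCluster(X, k)
% Hybrid clustering: hierarchical clustering supplies stable initial
% clusters; their centroids seed k-means instead of random starts.
    Z    = linkage(X, 'ward');            % agglomerative tree (Ward linkage)
    hIdx = cluster(Z, 'maxclust', k);     % cut the tree into k stable clusters
    C0   = zeros(k, size(X, 2));          % centroid of each hierarchical cluster
    for j = 1:k
        C0(j, :) = mean(X(hIdx == j, :), 1);
    end
    % Seed k-means with the hierarchical centroids (no random initialization)
    [idx, C] = kmeans(X, k, 'Start', C0, 'MaxIter', 200);
end
```

For example, `[idx, C] = hybridCluster(randn(200, 2), 3);` clusters 200 two-dimensional points into 3 groups with deterministic seeding.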
The algorithm has the following steps:
Removing scatter from the dataset: The algorithm first removes scattered observations (outliers) from consideration.
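The text does not specify how scatter is identified; one assumption-laden possibility is a k-nearest-neighbor distance rule such as the following, where the neighbor count (5) and the cutoff quantile (0.95) are purely illustrative:

```matlab
% Scatter removal (one plausible rule; the text does not fix the method):
% flag points whose distance to their 5th nearest neighbor is unusually large.
D      = pdist2(X, X);                    % pairwise Euclidean distances
Ds     = sort(D, 2);                      % row-wise ascending; column 1 is 0 (self)
knnD   = Ds(:, 6);                        % distance to 5th nearest neighbor
keep   = knnD <= quantile(knnD, 0.95);    % drop the most isolated 5% as scatter
Xclean = X(keep, :);
```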
Finding a partition: Our algorithm has two phases. The first focuses on finding a (potentially) large number ($K_0$) of homogeneous spherical groups, while the second merges these groups according to some criterion. We call these the K-means and hierarchical phases. The exact details of these phases are as follows:
The K-means phase: For a given $K_0$ and initialization, the K-means phase uses its namesake algorithm with multiple ($m$) initializations to identify $K_0$ homogeneous spherically-distributed groups. This phase yields $K_0$ groups $C_1, C_2, \ldots, C_{K_0}$ with means $\mu_1, \mu_2, \ldots, \mu_{K_0}$. Each obtained cluster $C_k$ is now considered to be one entity. Therefore, we now have $K_0$ entities labeled $C_1, C_2, \ldots, C_{K_0}$ for consideration.
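A minimal MATLAB sketch of this phase, assuming the cleaned data `Xclean` from the scatter-removal step; the values of `K0` and `m` are illustrative:

```matlab
% K-means phase: find K0 homogeneous spherical groups using m restarts.
K0 = 20;   % illustrative; intentionally larger than the expected cluster count
m  = 10;   % number of random initializations (restarts)
[kIdx, mu] = kmeans(Xclean, K0, 'Replicates', m);  % mu(k,:) is the mean of group C_k
% Each of the K0 groups is now treated as a single entity for merging.
```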
Hierarchical phase: For given $K^*$ and distance $d(\cdot, \cdot)$, we successively merge the K-means groups as follows (a MATLAB sketch of the merging loop appears after the steps):
(a) Set $i^* = 1$ and $d_1^* = 1$. Define $\tilde{C}_j^{(1)} = C_j$ for all $j$.
(b) For $j \in \{1, \ldots, K_0 - i^*\}$, set $\tilde{C}_j^{(i^*+1)} = \tilde{C}_j^{(i^*)}$. Find $k, l$ such that $k < l$ and $d(\tilde{C}_k^{(i^*)}, \tilde{C}_l^{(i^*)}) = \min_{1 \le m < q \le K_0 - i^* + 1} d(\tilde{C}_m^{(i^*)}, \tilde{C}_q^{(i^*)})$. Set $\tilde{C}_k^{(i^*+1)} = \tilde{C}_k^{(i^*)} \cup \tilde{C}_l^{(i^*)}$ and, if $l < K_0 - i^* + 1$, set $\tilde{C}_l^{(i^*+1)} = \tilde{C}_{K_0 - i^* + 1}^{(i^*)}$; define $d_{i^*}^* = d(\tilde{C}_k^{(i^*)}, \tilde{C}_l^{(i^*)})$. Set $i^* = i^* + 1$.
(c) If $i^* = K_0$ or $i^* = K_0 - K^* + 1$, terminate; otherwise return to Step 2(b).
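A MATLAB sketch of this merging loop, operating on the k-means groups from the previous phase; the centroid-distance `dFun` below is a hedged placeholder for the model-based distance developed next (Theorem 1), and `Kstar` is illustrative:

```matlab
% Hierarchical phase: successively merge the K0 k-means groups until K* remain.
% dFun(A, B) returns the distance between two groups given their member rows.
dFun   = @(A, B) norm(mean(A, 1) - mean(B, 1));   % placeholder distance
groups = arrayfun(@(k) Xclean(kIdx == k, :), 1:K0, 'UniformOutput', false);
Kstar  = 4;                               % illustrative target number of clusters
dStar  = [];                              % record of merge distances d*_{i*}
while numel(groups) > Kstar
    nG   = numel(groups);
    best = inf; bk = 1; bl = 2;
    for a = 1:nG-1                        % find the closest pair (k, l), k < l
        for b = a+1:nG
            dab = dFun(groups{a}, groups{b});
            if dab < best, best = dab; bk = a; bl = b; end
        end
    end
    groups{bk}   = [groups{bk}; groups{bl}];  % merge C_l into C_k
    groups(bl)   = [];                        % remaining groups slide into the gap
    dStar(end+1) = best;                      %#ok<AGROW> merge distance d*_{i*}
end
```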
For the hierarchical phase of Step 2, we calculate the distance between two clusters obtained from the K-means step by assuming (non-homogeneous) spherically-dispersed Gaussian-distributed groups in the dataset. Specifically, we let $X_1, X_2, \ldots, X_{n^*}$ be independent $p$-variate observations with $X_i \sim N_p(\mu_{\zeta_i}, \sigma_{\zeta_i}^2 I)$, where $\zeta_i \in \{1, 2, \ldots, K\}$ for $i = 1, 2, \ldots, n^*$. Here we assume that the $\mu_k$'s are all distinct and that $n_k$ is the number of observations in cluster $k$. Then the density of the $X_i$'s is given by $f(X) = \sum_{k=1}^{K} I(X \in C_k)\,\phi(X; \mu_k, \sigma_k^2 I)$, where $C_k$ is the cluster indexed by the $N_p(\mu_k, \sigma_k^2 I)$ density and $I(X \in C_k)$ is an indicator function specifying whether observation $X$ belongs to the $k$th group having $p$-dimensional multivariate normal density $\phi(X; \mu_k, \sigma_k^2 I) \propto \sigma_k^{-p} \exp\left[-\frac{1}{2\sigma_k^2}(X - \mu_k)'(X - \mu_k)\right]$, $k = 1, \ldots, K$. Define the distance measure
$$D_k(X_i) = \frac{(X_i - \mu_k)'(X_i - \mu_k)}{\sigma_k^2} \tag{1}$$
and the variable
$$Y_{j,l}(X) = D_j(X) - D_l(X), \quad \text{where } X \in C_l, \tag{2}$$
and $Y_{l,j}(X)$ similarly. Under the spherically-dispersed Gaussian models formulated above, $Y_{j,l}(X)$ is a random variable representing the difference between the squared scaled distances of $X \in C_l$ to the center of $C_j$ and to the center of $C_l$. Then $p_{lj} = \Pr[Y_{j,l}(X) < 0]$ is the probability that an observation from $C_l$ is classified into $C_j$; it is calculated as follows.
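As an empirical counterpart to (1) and (2), the sketch below estimates $p_{lj}$ as the fraction of points in $C_l$ that are closer, in scaled distance, to the center of $C_j$; the group indices `l`, `j` and the spherical variance estimates `sig2(k)` are assumed available (the excerpt does not show how $\sigma_k^2$ is estimated):

```matlab
% Scaled squared distance D_k(x) from (1) for a row vector x:
Dk = @(x, muK, s2) ((x - muK) * (x - muK)') / s2;
% Empirical counterpart of p_lj = Pr[Y_{j,l}(X) < 0]: the fraction of points
% of C_l whose scaled distance to the center of C_j is smaller than to C_l.
Xl = Xclean(kIdx == l, :);         % members of C_l (indices l, j assumed set)
Y  = zeros(size(Xl, 1), 1);
for i = 1:size(Xl, 1)
    Y(i) = Dk(Xl(i, :), mu(j, :), sig2(j)) - Dk(Xl(i, :), mu(l, :), sig2(l));
end
plj = mean(Y < 0);                 % sample estimate of p_lj
```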
Theorem 1
Let $X \sim N_p(\mu_l, \Sigma_l)$, with $\Sigma_l$ a positive-definite matrix. Further, let $Y_{j,l}(X) = D_j(X) - D_l(X)$, where $D_k(X) = (X - \mu_k)'\Sigma_k^{-1}(X - \mu_k)$ for $k \in \{j, l\}$. Let $\lambda_1, \lambda_2, \ldots, \lambda_p$ be the eigenvalues of $\Sigma_{j|l} \equiv \Sigma_l^{1/2}\Sigma_j^{-1}\Sigma_l^{1/2}$ with corresponding eigenvectors $\gamma_1, \gamma_2, \ldots, \gamma_p$. Then $Y_{j,l}(X)$ is distributed as
$$\sum_{i=1}^{p} I(\lambda_i \neq 1)\left[(\lambda_i - 1)U_i - \frac{\lambda_i \delta_i^2}{\lambda_i - 1}\right] + \sum_{i=1}^{p} I(\lambda_i = 1)\,\delta_i(2Z_i + \delta_i),$$
where the $U_i$'s are independent non-central $\chi^2$ random variables with one degree of freedom and non-centrality parameter $\lambda_i^2\delta_i^2/(\lambda_i - 1)^2$, with $\delta_i = \gamma_i'\Sigma_l^{-1/2}(\mu_l - \mu_j)$, for $i \in \{1, 2, \ldots, p\} \cap \{i : \lambda_i \neq 1\}$, independent of the $Z_i$'s, which are independent standard normal random variables, for $i \in \{1, 2, \ldots, p\} \cap \{i : \lambda_i = 1\}$.
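Theorem 1 can be used numerically; the sketch below evaluates $p_{lj} = \Pr[Y_{j,l}(X) < 0]$ by Monte Carlo from the stated representation (exact evaluation of the quadratic-form distribution is also possible but not shown). Inputs are the column mean vectors and covariance matrices of the two groups; `nSim` and the tolerance are illustrative:

```matlab
function plj = overlapProb(muL, SigL, muJ, SigJ, nSim)
% Monte Carlo evaluation of p_lj = Pr[Y_{j,l}(X) < 0] via Theorem 1.
    SlHalf   = sqrtm(SigL);                    % Sigma_l^{1/2}
    [G, Lam] = eig(SlHalf / SigJ * SlHalf);    % eigen-decomposition of Sigma_{j|l}
    lam      = diag(Lam);
    delta    = G' * (SlHalf \ (muL - muJ));    % delta_i = gamma_i' Sigma_l^{-1/2}(mu_l - mu_j)
    ne1      = abs(lam - 1) > 1e-8;            % indices with lambda_i ~= 1
    Y        = zeros(nSim, 1);
    for i = find(ne1)'                         % non-central chi-square terms
        ncp = lam(i)^2 * delta(i)^2 / (lam(i) - 1)^2;   % non-centrality parameter
        U   = ncx2rnd(1, ncp, nSim, 1);                 % non-central chi^2, 1 df
        Y   = Y + (lam(i) - 1) * U - lam(i) * delta(i)^2 / (lam(i) - 1);
    end
    for i = find(~ne1)'                        % standard-normal terms (lambda_i = 1)
        Z = randn(nSim, 1);
        Y = Y + delta(i) * (2 * Z + delta(i));
    end
    plj = mean(Y < 0);                         % estimate of Pr[Y_{j,l}(X) < 0]
end
```

In the spherical setting of the text, one would pass $\Sigma_k = \sigma_k^2 I$ for $k \in \{j, l\}$.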
Forming multiple partitions and choosing the optimal $P^*$: Repeat Step 2 $N = ML$ times with $M$ different values of $K_0$ and $L$ different values of $K^*$ to form multiple partitions, then determine the optimal hierarchical partition $P^*$. A hedged sketch of this sweep follows.
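In the sketch below, `hybridKmeansHier` is a hypothetical wrapper around the two phases above (returning one label per row of `Xclean`), and the mean silhouette width is a stand-in for the optimality criterion, which this excerpt does not specify:

```matlab
% Form N = M*L candidate partitions over a grid of (K0, K*) values.
K0grid = [10 15 20];          % M = 3 illustrative K0 values
Ksgrid = [2 3 4 5];           % L = 4 illustrative K* values
bestScore = -inf;
for K0 = K0grid
    for Kstar = Ksgrid
        labels = hybridKmeansHier(Xclean, K0, Kstar);   % hypothetical wrapper for Step 2
        score  = mean(silhouette(Xclean, labels));      % stand-in quality measure
        if score > bestScore
            bestScore = score; Pstar = labels;          % current optimal partition P*
        end
    end
end
```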