What is clustering:
A cluster is a collection of data points grouped together
because they share certain similarities.
Data clustering approaches group similar data into clusters.
The grouped data often reveal meaningful structure.
Data points that are close to each other tend to be related in
some way, and this relationship can be used to group the data
into clusters.
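To make "grouping by similarity" concrete, here is a minimal sketch that
assigns each point to the nearest of a few given centers, using Euclidean
distance as the similarity measure. The function name `assign_to_nearest`,
the sample points, and the hand-picked centers are illustrative assumptions,
not part of the text above.

```python
import math

def assign_to_nearest(points, centers):
    """Group each point with the center it is closest to (Euclidean distance)."""
    clusters = {i: [] for i in range(len(centers))}
    for p in points:
        # Index of the center with the smallest distance to p.
        nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        clusters[nearest].append(p)
    return clusters

# Hypothetical data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.1), (4.8, 5.3)]
centers = [(1.0, 1.0), (5.0, 5.0)]  # assumed centers, chosen by hand
groups = assign_to_nearest(points, centers)
```

Points that lie close together end up in the same group, which is the
intuition that clustering algorithms build on.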
How the K-Means Clustering Algorithm works:
- In k∗-means, we use the random
initialization method to choose k∗ starting centers, and first
assign all points to k∗ clusters.
- Then we compute the mean feature values
of each cluster, as shown in lines 1-3. Next, k∗-means performs
hierarchical clustering, combining k-means adjustment iterations
(lines 4-22) with merging of the nearest clusters associated with
the top-n distances (lines 23-25).
- Line 6 describes the proposed
cluster pruning strategy. We use a collection CS to store the
neighbor clusters of a specific cluster that remain after remote
clusters are pruned by Lemma 2.
- Then, for each point in Ci, we only
need to examine the adjustable clusters within the search space
CS.
- Once the percentage of moved points is
lower than the given θ in the first round, the optimized k-means
update principle is started, and the radius is updated during this
process (lines 7-14).
- For efficiency, we
maintain a value r in each cluster to update its radius (lines
9-13). At the end of each iteration, each cluster's mean m and its
radius are replaced directly by their updated values.
Lines 23-25 show the top-n nearest-cluster merging, which
reduces the number of clusters.
- Therefore, the parameter n is not
fixed, but ranges from the given n down to 1. In each refining
round of k∗-means, we use a decreasing strategy to determine the
value of n, and a top-1 merge may be performed in the final round
so that the number of clusters reaches k.
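
The steps above can be illustrated with a simplified sketch. It covers only
the overall loop: random initialization of k∗ centers, k-means adjustment,
and merging of the top-n nearest cluster pairs until k clusters remain. The
pruning by Lemma 2, the θ threshold, and the radius bookkeeping are omitted
because the notes do not give enough detail to reproduce them. All function
names, the "decrease n by 1 per round" rule, and the size-weighted merge are
assumptions made for illustration, not the original algorithm listing.

```python
import math
import random

def mean(points):
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))

def assign(points, centers):
    """Return, for each point, the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points]

def kmeans_adjust(points, centers, max_iter=20):
    """Plain k-means refinement: reassign points, then recompute means."""
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == c]
            # Keep the old center if a cluster lost all of its points.
            new_centers.append(mean(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return centers

def merge_top_n(points, centers, n):
    """Merge up to n of the closest center pairs; each cluster merges at most once."""
    labels = assign(points, centers)
    sizes = [labels.count(c) for c in range(len(centers))]
    pairs = sorted((math.dist(centers[i], centers[j]), i, j)
                   for i in range(len(centers))
                   for j in range(i + 1, len(centers)))
    used, merged = set(), []
    for _, i, j in pairs:
        if len(merged) == n:
            break
        if i in used or j in used:
            continue
        used.update((i, j))
        total = sizes[i] + sizes[j]
        if total == 0:
            merged.append(centers[i])
        else:
            # Size-weighted mean of the two centers (assumed merge rule).
            merged.append(tuple((a * sizes[i] + b * sizes[j]) / total
                                for a, b in zip(centers[i], centers[j])))
    rest = [c for idx, c in enumerate(centers) if idx not in used]
    return rest + merged

def k_star_means(points, k, k_star, n, seed=0):
    """Start with k_star clusters and merge down to k (requires 1 <= k <= k_star)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k_star)  # random initialization
    while True:
        centers = kmeans_adjust(points, centers)
        if len(centers) <= k:
            break
        # Never merge more pairs than needed to reach k.
        step = min(n, len(centers) - k)
        centers = merge_top_n(points, centers, step)
        n = max(1, n - 1)  # assumed decrease strategy: n shrinks toward 1
    return centers, assign(points, centers)

# Hypothetical data: two well-separated blobs of six points each.
blob_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (0.2, 0.8)]
blob_b = [(x + 100.0, y + 100.0) for x, y in blob_a]
data = blob_a + blob_b
final_centers, final_labels = k_star_means(data, k=2, k_star=8, n=2)
```

Starting from more clusters than needed and merging the closest ones is what
lets n shrink from its initial value to 1 as the cluster count approaches k.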
----------------------------------------------------------------------------------------------------------------------