In: Computer Science
BIRCH Hierarchical algorithm:
BIRCH stands for Balanced Iterative Reducing and Clustering using Hierarchies.It is basically used as a clustering technique. Also it reduces the clusters and it does so in a iterative manner.
Shape of clusters: It is used for scalabe-shaped clusters.
Input Parameters: Parameters which would be specified N d-dimensional data points that would be required. N is the number any integer number 0 to infinite and d is the 2-dimension or 3-dimension any kind of data points.
It plays a very important role in scalibility of very large data set.Example if you have 50000 or 100000 data in a cluster then we use this BIRCH technique to reduce data.
Technique is suggested where data is taken as an average. Example first the raw data is divided into average or taken they are mean and then it is put for sampling purpose.
This BIRCH technique is designed for two prime features a) Clustering feature(F) = <n,LS,SS>
b) CF tree feature
CF tree is a height-balanced tree that stores the clustering features for HC.
Clusters of data points is represented by a triple <n,LS,SS>
n=number of items in the sub clusters
LS=Linear sum of pointers
SS=Square sum of pointers
ALGORITHM:
Step 1: It stores the data into the memory. Whenever you have the raw data in very first scan the CF tree is build with these data point.
sub-phases-
a) Fast: processing on sub-clusters and not on data point.
b) Accurate: outliers are separated.
c) Less order sensitive: Initial order of data done by CF tree.
Step 2: Condense Data-
Data resizing for the optimization for the Step 3 i.e. global clustering.
More outliers are removed.
Crowded sub-clusters are grouped together.
Condensing is optional.
Step 3: Global clustering-
Here we have to use some clustering algorithm ( Hierarchical clustering, K-means). If we have some natural clusters then node spanning across the entire cluster. So it's like if you are trying t fit in a particular memory map then it won't fit. So for reducing that we use global clustering.
Step 4: Cluster refining-
Once the outlier are removed then we have refine it for a particular extends.
a) Extra passes for re-arrangement for centroid.
b) Optional RC refining
c) Converge to minimum and discards more outliers.
Here CF tree hold limited or finite number of data points. So for this it is not considered as a ideal representation technique. User can not see or view BIRCH as a natural kind of cluster due to it's limited number of capability to hold that many data points. So