After performing anomaly detection, data miner A wants to find clusters of outliers. Data miner B claims that this does not make any sense and suggests that A re-read the definition of an anomaly. Do you think it is meaningful to cluster anomalies? Explain.
Before drawing a conclusion on a data mining problem, we need to keep two things in mind: first, there is no one-size-fits-all solution to a data science problem, and second, garbage in equals garbage out. The former means that if data miner A wants to find clusters of outliers, doing so may be meaningful, depending on the number of data points and the impact of the study. An anomaly is defined by its deviation from normal behaviour, not by its isolation from other anomalies, so several anomalies can still resemble one another. The latter means that, depending on the budget, dataset, and time constraints of the project, it can be essential to go ahead and cluster the anomalies.
One study [TR CS] suggests that outward anonymity and the wide prevalence of saving sensitive information on computers have attracted a large number of criminals and hacker hobbyists. This has resulted in an overwhelming increase in attacks and raised recent interest in anomaly detection for outliers, in which a model of normal behaviour is built and significant deviations from that model are flagged as anomalous.
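To make the idea concrete, here is a minimal Python sketch of flagging deviations from a model of normal behaviour and then grouping the flagged points. It is not the method of [TR CS]. The function names, the synthetic data, the cutoff of 3.5 and the grouping radius of 1.0 are all assumptions chosen for illustration. The "model" is simply the coordinate-wise median of the data, and the grouping is a basic single-linkage pass.

```python
import math
import statistics


def flag_anomalies(points, threshold=3.5):
    """Return points whose distance from the median centre is unusually large.

    Assumption: 'normal behaviour' is modelled by the coordinate-wise median,
    and spread is measured by the median absolute deviation (MAD) of distances.
    """
    center = tuple(statistics.median(coord) for coord in zip(*points))
    dists = [math.dist(p, center) for p in points]
    med = statistics.median(dists)
    mad = statistics.median(abs(d - med) for d in dists)
    scale = mad or 1e-12  # avoid division by zero when all distances are equal
    return [p for p, d in zip(points, dists) if (d - med) / scale > threshold]


def group_anomalies(anomalies, radius):
    """Single-linkage grouping: anomalies within `radius` share a cluster."""
    clusters = []
    for p in anomalies:
        # Existing clusters that contain at least one point close to p.
        near = [c for c in clusters if any(math.dist(p, q) <= radius for q in c)]
        merged = [p]
        for c in near:
            merged.extend(c)
        clusters = [c for c in clusters if c not in near]
        clusters.append(merged)
    return clusters


# Hypothetical data: a dense grid of normal points plus two small outlier groups.
normal = [(x * 0.1, y * 0.1) for x in range(-5, 6) for y in range(-5, 6)]
outliers = [(10.0, 10.0), (10.2, 10.1), (9.9, 10.3), (-8.0, 7.0), (-8.1, 7.2)]
points = normal + outliers

anomalies = flag_anomalies(points)
clusters = group_anomalies(anomalies, radius=1.0)
for cluster in clusters:
    print(len(cluster), "anomalies near", cluster[0])
```

In this toy data the flagged points are anomalous with respect to the normal grid, yet they still fall into separate groups. This is the sense in which clustering anomalies can be meaningful: each group may correspond to a different underlying cause.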
Further inferences can be drawn from the training data and from the steps involved in data cleaning, since outlier detection is vital at that stage.
References:
[TR CS] Muhammad H. Arshad and Philip K. Chan: Identifying Outliers via Clustering for Anomaly Detection, 2003-19