Question

In: Statistics and Probability

After performing anomaly detection, data miner A wants to find clusters of outliers. Data miner B...

After performing anomaly detection, data miner A wants to find clusters of outliers. Data miner B claims that this does not make any sense and suggests that A re-read the definition of an anomaly. Do you think it is meaningful to cluster anomalies? Explain.

Solutions

Expert Solution

Before proceeding towards the conclusion of a data mining problem, we need to keep two things is mind; first there is no one site fits all solution for a data science problem and garbage in is equal to garbage out. The former means that if data miner A wants to find clusters of outliers, it may be possible because depending on the number of data points and the impact of the study. The later is means that depending on the budget, dataset and timely constraints of the project, it is essential to move on with the cluster anomalies.

One of the studies [TR CS] suggest that outward anonymity, and wide prevalence of saving sensitive information on computers has attracted a large number of criminals and hacker hobbyists. This has resulted in an overwhelming increase in attacks from outlier vulnerabilities and raised recent interest in Anomaly Detection for outliers, in which a model is built of normal behaviour and significant deviations from the model are flagged anomalous.

More inference can be done based on the training data and the steps involved in the data cleaning processes as outlier detection is vital during that stage.

References:

[TR CS] Muhammad H. Arshad and Philip K. Chan: Identifying Outliers via Clustering for Anomaly Detection, 2003-19


Related Solutions

After performing anomaly detection, data miner A wants to find clusters of outliers. Data miner B...
After performing anomaly detection, data miner A wants to find clusters of outliers. Data miner B claims that this does not make any sense and suggests that A re-read the definition of an anomaly. Do you think it is meaningful to cluster anomalies? Explain.
A data miner wants to identify how price and advertising drive sales for the company and...
A data miner wants to identify how price and advertising drive sales for the company and wants to forecast but does not like to use algorithms. Which of the methods below represents the best solution.         a. regression               b. Clustering               c. segmentation           d. Neural Nets
Please find if there are any outliers in this data set using the Z-score.
Miles 40001 53402 53500 59817 59902 63436 64090 64342 64544 65605 66998 67998 69568 69922 71978 72069 73341 74276 74425 77098 77202 77437 77539 79294 82256 82464 85092 85288 85586 85861 86813 88798 89323 89341 89641 92609 92857 94219 95066 95774 97831 101769 102534 105662 109465 116269 116803 118444 121352 138114 Please find if there are any outliers in this data set using the Z-score. This must be done in excel so please show the formulas used!
Use the data below and find the clusters using a single link technique. Use Euclidean distance...
Use the data below and find the clusters using a single link technique. Use Euclidean distance and draw the dendrogram. X Y P1 0.35 0.48 P2 0.17 0.33 P3 0.3 0.28 P4 0.21 0.18 P5 0.08 0.29
1.After performing an ANOVA test, with (3,4) degrees of freedom, for data collected during an experiment...
1.After performing an ANOVA test, with (3,4) degrees of freedom, for data collected during an experiment trying to determine if there is at least one difference between groups. You get a calculated F value of 7.52. Using the table below, find the appropriate critical F value. What should be your conclusion(s), based on those 2 F values and α? Check all that apply. Critical values of F (α= 0.05) Degrees of freedom (x,y) 1 2 3 4 5 1 161.45...
How to find confidence intervals LCL and UCL for 90%, 95%, 99% after performing t-test :two...
How to find confidence intervals LCL and UCL for 90%, 95%, 99% after performing t-test :two sample assuming equal variances? (mean b- mean a)- CI? following (mean b- mean a)+CI? Or ?
A trucking company wants to find out if their drivers are still alert after driving long...
A trucking company wants to find out if their drivers are still alert after driving long hours. So, they give a test for alertness to two groups of drivers. They give the test to 395 drivers who have just finished driving 4 hours or less and they give the test to 565 drivers who have just finished driving 8 hours or more. The results of the tests are given below. Passed Failed Drove 4 hours or less 290 105 Drove...
A trucking company wants to find out if their drivers are still alert after driving long...
A trucking company wants to find out if their drivers are still alert after driving long hours. So, they give a test for alertness to two groups of drivers. They give the test to 400 drivers who have just finished driving 4 hours or less and they give the test to 585 drivers who have just finished driving 8 hours or more. The results of the tests are given below.                                                 Passed       Failed Drove 4 hours or less                 310   ...
A trucking company wants to find out if their drivers are still alert after driving long...
A trucking company wants to find out if their drivers are still alert after driving long hours. So, they give a test for alertness to two groups of drivers. They give the test to 400 drivers who have just finished driving 4 hours or less and they give the test to 585 drivers who have just finished driving 8 hours or more. The results of the tests are given below. Passed Failed Drove 4 hours or less 310 90 Drove...
Use the accompanying data set to complete the following actions. a. Find the quartiles. b. Find...
Use the accompanying data set to complete the following actions. a. Find the quartiles. b. Find the interquartile range. c. Identify any outliers. 58  65  55  64  61  57  61  63  59  55  57  64  59  54  79
ADVERTISEMENT
ADVERTISEMENT
ADVERTISEMENT