Anomaly Detection
To avoid false discoveries, data mining introduces many statistical testing methods to ensure reliable data mining results. Select at least two statistical methods and discuss how they find unreliable patterns and relationships in data mining results. Also mention how these methods can be applied to typical data mining tasks to help ensure that the resulting models and patterns are valid.
Need 300 words with no plagiarism.
The digital world is developing day by day. New devices are being installed everywhere, and the volume of data produced keeps growing. Billions upon billions of records are stored in cloud storage systems today, and retrieving accurate, relevant data from this ocean of data is always a challenge. Engineers have developed many techniques to store and manage these data, but the daily increase in the heterogeneity of data makes the task harder every day.
Many data mining techniques are available, but none is completely accurate and error-free; the practical goal is to minimize error and maximize precision. Data anomalies are a major challenge for data mining tools. The artificial intelligence and programs working behind them may fail to remove redundant data from a set, perhaps because of its high similarity to the original data. Today, many tools are evolving to tackle this problem; a simple statistical check illustrates the basic idea (see the sketch below).
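As a minimal, hedged illustration of flagging anomalies statistically, the Java snippet below applies a z-score test to toy values. The data and the threshold of 2.0 are assumptions for this example only; 3.0 is the more common heuristic, but a tiny sample bounds the attainable z-scores.

import java.util.Arrays;

public class ZScoreSketch {
    public static void main(String[] args) {
        // Toy data: one value clearly deviates from the rest
        double[] values = {10.1, 9.8, 10.3, 10.0, 42.0, 9.9};

        double mean = Arrays.stream(values).average().orElse(0.0);
        double variance = Arrays.stream(values)
                .map(v -> (v - mean) * (v - mean))
                .average().orElse(0.0);
        double sd = Math.sqrt(variance);

        // Flag points far from the mean; 2.0 is used here because a
        // tiny sample caps attainable z-scores (3.0 is more common)
        for (double v : values) {
            double z = sd > 0 ? Math.abs(v - mean) / sd : 0.0;
            if (z > 2.0) {
                System.out.println("Possible anomaly: " + v + " (z = " + z + ")");
            }
        }
    }
}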
One very popular tool for anomaly detection is Weka. Weka works with a collection of highly engineered algorithms, which can be applied to a dataset directly or called as functions from our own Java code. Weka offers many facilities such as preprocessing, mining, classification, regression, attribute selection, experiments, and visualization. Together, these help retrieve data more precisely and reduce anomalies.
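As a hedged sketch of calling Weka from Java code, the snippet below loads an ARFF file (the path "data.arff" is a placeholder) and applies Weka's InterquartileRange filter, which appends outlier-indicator attributes; exact defaults may differ across Weka versions.

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.InterquartileRange;

public class WekaOutlierSketch {
    public static void main(String[] args) throws Exception {
        // Load a dataset; "data.arff" is a placeholder path
        Instances data = DataSource.read("data.arff");

        // InterquartileRange appends Outlier/ExtremeValue indicator
        // attributes based on per-attribute IQR bounds
        InterquartileRange iqr = new InterquartileRange();
        iqr.setInputFormat(data);
        Instances flagged = Filter.useFilter(data, iqr);

        System.out.println("Attributes after flagging: " + flagged.numAttributes());
    }
}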
The next tool is Shogun, a free, open-source toolbox for data mining implemented in C++. Shogun focuses on applying support vector machines to datasets for regression and classification. One special feature of Shogun is its full implementation of hidden Markov models. The tool allows easy, continuous combination of multiple data representations, algorithm classes, and general-purpose tools.
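Shogun's own API is not shown here; instead, as a generic, hedged sketch of what an HMM toolkit computes, the Java snippet below implements the forward algorithm. A sequence that receives a very low probability under the trained model can be flagged as anomalous. All matrix values are toy assumptions.

public class HmmForwardSketch {
    // Forward algorithm: probability of an observation sequence under
    // an HMM with transitions A, emissions B, and initial distribution pi
    static double forward(double[] pi, double[][] A, double[][] B, int[] obs) {
        int n = pi.length;
        double[] alpha = new double[n];
        for (int s = 0; s < n; s++) alpha[s] = pi[s] * B[s][obs[0]];
        for (int t = 1; t < obs.length; t++) {
            double[] next = new double[n];
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                for (int i = 0; i < n; i++) sum += alpha[i] * A[i][j];
                next[j] = sum * B[j][obs[t]];
            }
            alpha = next;
        }
        double total = 0.0;
        for (double a : alpha) total += a;
        return total; // a very low value suggests an anomalous sequence
    }

    public static void main(String[] args) {
        // Toy two-state model with two observation symbols
        double[] pi = {0.6, 0.4};
        double[][] A = {{0.7, 0.3}, {0.4, 0.6}};
        double[][] B = {{0.9, 0.1}, {0.2, 0.8}};
        System.out.println(forward(pi, A, B, new int[]{0, 1, 0}));
    }
}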
Other tools include RapidMiner, Dataiku DSS (Community Edition), and ELKI.
All these tools are specialized for data mining, for retrieving data with maximum accuracy, and for removing anomalies. They make learning and research easier and faster in the scientific world. These methods can be applied to typical data mining tasks to help ensure that the resulting models and patterns are valid: the tools above provide the features used in classic data mining, can be combined with one another, and can be run over multiple datasets, which makes the resulting models far more reliable. Standard statistical validation, such as cross-validation, ties all of this together (a sketch follows).
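For instance, one standard way to check that a mined model is valid rather than a false discovery is k-fold cross-validation. The hedged Weka sketch below evaluates a J48 decision tree with 10 folds; "data.arff" is again a placeholder, and the last attribute is assumed to be the class.

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CrossValidationSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder dataset; the last attribute is assumed to be the class
        Instances data = DataSource.read("data.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // 10-fold cross-validation: repeatedly train on 9 folds,
        // test on the held-out fold, and average the results
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new J48(), data, 10, new Random(1));

        System.out.println(eval.toSummaryString());
    }
}

A model whose cross-validated accuracy is barely above chance is likely a spurious pattern, which is exactly the kind of unreliable result these validation methods are designed to catch.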