Question

In: Computer Science

Anomaly Detection

To avoid false discoveries, data mining introduces many statistical testing methods for establishing reliable data mining results. Select at least two statistical methods and discuss how they find unreliable patterns and relationships in data mining results. Also mention how these methods can be applied to typical data mining tasks to help ensure that the resulting models and patterns are valid.

Need 300 words with no plagiarism.

Solutions

Expert Solution

The digital world is developing day by day. New devices are being installed everywhere, and the amount of data produced keeps increasing; billions upon billions of records are stored in cloud storage systems today. Retrieving accurate, relevant data from this ocean of data is always a challenge. Engineers have developed many techniques to store and manage these data, but the growing heterogeneity of the data makes the task harder every day.

Many data mining techniques are available, but none is completely relevant and error free; the practical goal is to minimize error and maximize precision. Data anomalies are a huge challenge for data mining tools: the artificial intelligence and programs working behind the scenes may fail to remove redundant data from a set, perhaps because of its high similarity to the original data. Today, many tools are evolving to tackle this problem.
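To make this concrete, here is a minimal sketch of one of the simplest statistical checks for anomalies, the z-score test: values lying more than a chosen number of standard deviations from the mean are flagged as suspect. The class name, dataset, and threshold below are illustrative assumptions, not taken from any particular tool.

    import java.util.ArrayList;
    import java.util.List;

    public class ZScoreDetector {
        // Flag indices whose z-score exceeds the threshold.
        static List<Integer> flagAnomalies(double[] values, double threshold) {
            double mean = 0;
            for (double v : values) mean += v;
            mean /= values.length;
            double var = 0;
            for (double v : values) var += (v - mean) * (v - mean);
            double std = Math.sqrt(var / values.length); // population std deviation
            List<Integer> anomalies = new ArrayList<>();
            for (int i = 0; i < values.length; i++) {
                if (std > 0 && Math.abs(values[i] - mean) / std > threshold) {
                    anomalies.add(i);
                }
            }
            return anomalies;
        }

        public static void main(String[] args) {
            double[] data = {10.1, 9.8, 10.3, 9.9, 42.0, 10.0}; // made-up sample
            System.out.println(flagAnomalies(data, 2.0));       // prints [4]
        }
    }

Real tools apply far more robust variants of this idea, but the principle is the same: a pattern is treated as unreliable when it is statistically improbable under the assumed distribution of the data.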

One very popular tool for anomaly detection is Weka. Weka ships with a collection of highly engineered algorithms that can be applied to a dataset directly or called as functions from our own Java code. Weka offers facilities such as preprocessing, mining, classification, regression, attribute selection, experiments, and visualization. Together, these help retrieve data more precisely and reduce anomalies.
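Since Weka's algorithms can be called directly from Java, a standard way to check that a mined pattern is reliable is ten-fold cross-validation through the Evaluation class. The sketch below assumes a local ARFF file (the filename is a placeholder) and uses the J48 decision tree learner; these choices are illustrative, not prescribed by the question.

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class WekaCrossValidation {
        public static void main(String[] args) throws Exception {
            Instances data = new DataSource("weather.arff").getDataSet(); // placeholder path
            data.setClassIndex(data.numAttributes() - 1); // last attribute is the class
            J48 tree = new J48();                         // C4.5 decision tree learner
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(tree, data, 10, new Random(1)); // 10-fold CV
            System.out.println(eval.toSummaryString());
        }
    }

Because the model is repeatedly tested on data it never saw during training, cross-validation exposes patterns that fit noise rather than structure: a model that merely memorized quirks of the training data will score noticeably worse on the held-out folds.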

The next tool is Shogun, a free, open-source toolbox for data mining implemented in C++. Shogun's main focus is applying support vector machines to datasets for regression and classification purposes. One special feature of Shogun is its full implementation of hidden Markov models. The toolbox allows multiple data representations, algorithm classes, and general-purpose tools to be combined easily and continuously.
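Shogun exposes hidden Markov models through its own API, which is not reproduced here; instead, the sketch below shows the core computation an HMM performs, the forward algorithm, which yields the likelihood of an observation sequence. Sequences with very low likelihood under a model trained on normal data are natural anomaly candidates. All numbers are made up for illustration.

    public class HmmForward {
        // pi: initial state probabilities; A: state transition probabilities;
        // B: emission probabilities; obs: observed symbol indices.
        static double sequenceLikelihood(double[] pi, double[][] A, double[][] B, int[] obs) {
            int n = pi.length;
            double[] alpha = new double[n];
            for (int s = 0; s < n; s++) alpha[s] = pi[s] * B[s][obs[0]];
            for (int t = 1; t < obs.length; t++) {
                double[] next = new double[n];
                for (int j = 0; j < n; j++) {
                    double sum = 0;
                    for (int i = 0; i < n; i++) sum += alpha[i] * A[i][j];
                    next[j] = sum * B[j][obs[t]];
                }
                alpha = next;
            }
            double total = 0;
            for (double a : alpha) total += a;
            return total;
        }

        public static void main(String[] args) {
            double[] pi = {0.6, 0.4};
            double[][] A = {{0.7, 0.3}, {0.4, 0.6}};
            double[][] B = {{0.9, 0.1}, {0.2, 0.8}};
            System.out.println(sequenceLikelihood(pi, A, B, new int[]{0, 0, 1}));
        }
    }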

Other tools include RapidMiner, Dataiku DSS (Community edition), ELKI, etc.

All such tools are specialized for data mining: they retrieve data with maximum accuracy and remove anomalies, making learning and research processes easier and faster in the scientific world. These methods can be applied to typical data mining tasks to help ensure that the resulting models and patterns are valid, because the tools combine the features of classic data mining and apply them across multiple data sets, which produces far more reliable results.
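One concrete way these statistical testing methods tie into validation: after cross-validating two candidate models, a paired t-test on the per-fold scores indicates whether the observed difference is statistically significant or could be mere chance. The sketch below uses made-up fold accuracies; note that the folds of a cross-validation are not fully independent, which is why tools such as Weka's Experimenter use a corrected variant of this test.

    public class PairedTTest {
        // t-statistic for paired samples, df = n - 1.
        static double tStatistic(double[] a, double[] b) {
            int n = a.length;
            double[] d = new double[n];
            double mean = 0;
            for (int i = 0; i < n; i++) { d[i] = a[i] - b[i]; mean += d[i]; }
            mean /= n;
            double var = 0;
            for (double di : d) var += (di - mean) * (di - mean);
            var /= (n - 1);                   // sample variance of the differences
            return mean / Math.sqrt(var / n);
        }

        public static void main(String[] args) {
            // hypothetical per-fold accuracies from 10-fold cross-validation
            double[] modelA = {0.91, 0.89, 0.93, 0.90, 0.92, 0.88, 0.91, 0.90, 0.92, 0.89};
            double[] modelB = {0.85, 0.84, 0.88, 0.86, 0.87, 0.83, 0.85, 0.86, 0.84, 0.85};
            System.out.println("t = " + tStatistic(modelA, modelB) + " (df = 9)");
        }
    }

If |t| exceeds the critical value for the chosen significance level, the difference between the models is unlikely to be a false discovery.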


Related Solutions

There is a strong linkage between statistical data analysis and data mining. Some people think of data mining as automated and scalable methods for statistical data analysis. Do you agree or disagree with this perception? Present one statistical analysis method that can be automated and/or scaled up nicely by integration with current data mining methodology.
What is a statistical hypothesis? Define the following terms with reference to testing of a hypothesis: null and alternative hypothesis, critical region, significance level, types of hypothesis tests, and the two types of errors in hypothesis tests.
Many standard statistical methods that you will study in Part II of this book are intended for use with distributions that are symmetric and have no outliers. These methods start with the mean and standard deviation, x and s. For example, standard methods would typically be used for the IQ and GPA data here data457.dat. (a) Find x and s for the IQ data. (Round your answers to two decimal places.) x = s = (b) Find the median IQ...
Many standard statistical methods that you will study in Part II of this book are intended for use with distributions that are symmetric and have no outliers. These methods start with the mean and standard deviation, x and s. For example, standard methods would typically be used for the IQ and GPA data here data215.dat. (a) Find x and s for the IQ data. (Round your answers to two decimal places.) X= s= Here are the numbers obs gpa iq...
When carrying out hypothesis testing using statistical methods, we are trying to infer something about what’s going on in a population, based on what’s going on in a random sample from that population. If we want to learn about the population mean annual spending by households on video games, describe a hypothesis testing procedure for testing the hypothesis that this spending level equals 100 dollars.
Define big data and data mining. What purpose does collecting huge amounts of data serve? Consider Twitter. Do you believe big data is accurate and reliable? Why or why not? What type(s) of sampling methods could be used with big data? What sampling errors could occur and how could they be avoided? Has collecting big data been helpful for businesses? Why or why not? What do you see happening in the future with big data?
What subject deals with different methods of developing useful information from large data bases? Options: data mining; data manipulation; big data; data warehousing.
Question 3.4 in Statistical Methods and Data Analysis by Lyman Ott and Michael Longnecker 3.4 The regulations of the board of health in a particular state specify that the fluoride level must not exceed 1.5 parts per million (ppm). The 25 measurements given here represent the fluoride levels for a sample of 25 days. Although fluoride levels are measured more than once per day, these data represent the early morning readings for the 25 days sampled. .75 .86 .84 .85...
I am working problem 11.32 on page 610 of Ott and Longnecker's 'Statistical Methods and Data Analysis. I cannot figure out how the answer for the regression equation is: y = -1.73 + 1.32x and not y = 3.21 + 0.46x if you would include all points. If you exclude some outliers you can approach the former solution, but I can never attain the solution. Also, the narrative provided by Chegg to accompany the solution is too small to be...