There is a strong linkage between statistical data analysis and data mining. Some people think of data mining as automated and scalable methods for statistical data analysis. Do you agree or disagree with this perception? Present one statistical analysis method that can be automated and/or scaled up nicely by integration with current data mining methodology.
The differences between data mining and statistics can be compared as follows:
Data Mining | Statistics |
Explores and gathers data first, then builds a model to detect patterns and form theories. | Starts from a theory and tests it using statistical methods. |
Data used is numeric or non-numeric. | Data used is numeric. |
Inductive Process (Generation of new theory from data) | Deductive Process (Does not involve making any predictions) |
Data collection is less important. | Data collection is more important. |
Data cleaning is done as part of data mining. | Statistical methods are applied to data that is already clean. |
Needs less user interaction to validate the model and is hence easy to automate. | Needs user interaction to validate the model and is hence difficult to automate. |
Suitable for large data sets | Suitable for smaller data sets |
Uses algorithms that learn from data without explicitly programmed rules. | Formalizes relationships in data in the form of mathematical equations. |
Uses heuristic thinking (rules used to form judgments and make decisions). | Has no scope for heuristic thinking. |
Classification, clustering, neural networks, association, estimation, sequence-based analysis, visualization | Descriptive statistics, inferential statistics |
Financial data analysis, retail industry, telecommunication industry, biological data analysis, certain scientific applications, etc. | Demography, actuarial science, operations research, biostatistics, quality control, etc. |
Standard descriptive analytics can be automated to generate insights at once for better visualization and decision making. Inferential statistics, including the t-test, z-test, and ANOVA, have also been automated, which can be helpful for data mining techniques. Sampling distributions, however, are tough to estimate and automate.
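As a rough illustration of how descriptive summaries and a z-test could be automated over many columns, here is a minimal Python sketch using only the standard library. The function names (`describe`, `two_sample_z_test`, `automated_report`), the column names, and the toy data are all hypothetical. The z-test uses sample standard deviations as a large-sample approximation, and the toy samples are far too small for that in practice; they only keep the example short.

```python
import math
from statistics import NormalDist, mean, stdev


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": mean(values),
        "stdev": stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def two_sample_z_test(a, b):
    """Two-sided z-test for a difference in means.

    Assumption: sample standard deviations stand in for the population
    ones, which is only reasonable for fairly large samples.
    """
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    z = (mean(a) - mean(b)) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def automated_report(table, group_a, group_b, alpha=0.05):
    """Run describe() and the z-test on every column with no manual steps.

    `table` maps a column name to {group label: list of numbers}.
    """
    report = {}
    for column, groups in table.items():
        a, b = groups[group_a], groups[group_b]
        z, p = two_sample_z_test(a, b)
        report[column] = {
            "summary": {group_a: describe(a), group_b: describe(b)},
            "z": z,
            "p_value": p,
            "significant": p < alpha,
        }
    return report


# Hypothetical toy data: two columns measured for two customer segments.
data = {
    "basket_size": {
        "segment_1": [10, 12, 11, 13, 12, 11, 10, 12],
        "segment_2": [20, 22, 21, 23, 22, 21, 20, 22],
    },
    "visits": {
        "segment_1": [5, 6, 5, 7, 6, 5, 6, 6],
        "segment_2": [5, 6, 6, 7, 5, 5, 6, 6],
    },
}

for name, result in automated_report(data, "segment_1", "segment_2").items():
    print(name, round(result["z"], 2), result["significant"])
```

The same loop works unchanged for hundreds of columns, which is the sense in which such tests are easy to automate. Deciding whether the test's assumptions hold for each column still needs human judgment.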