Question

In: Computer Science

1. List the steps of data mining processes and the corresponding major methods.

2. What are the common accuracy metrics for data-mining algorithms?

3. Search the available literature for additional metrics that measure algorithms for accuracy, suitability for a particular purpose, etc.

Solutions

Expert Solution

Steps In The Data Mining Process

The data mining process is divided into two parts: data preprocessing and data mining. Data preprocessing involves data cleaning, data integration, data reduction, and data transformation. The data mining part covers the mining itself, pattern evaluation, and knowledge representation.

Why do we preprocess the data?

Many factors determine the usefulness of data, such as accuracy, completeness, consistency, and timeliness. Data has quality if it satisfies its intended purpose, which makes preprocessing crucial in the data mining process. The major steps involved in data preprocessing are explained below.

#1) Data Cleaning

Data cleaning is the first step in data mining. It matters because dirty data, if used directly in mining, can confuse the mining procedure and produce inaccurate results.

This step removes or repairs noisy and incomplete data in the collection. Many methods that clean data automatically are available, but they are not robust.

This step carries out the routine cleaning work by:

(i) Fill The Missing Data:

Missing data can be handled by methods such as:

  • Ignoring the tuple.
  • Filling in the missing value manually.
  • Using a measure of central tendency, such as the mean or median.
  • Filling in the most probable value.
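
As a minimal sketch of the central-tendency option above, the snippet below replaces missing entries with the mean or median of the observed values. The function name `fill_missing`, the use of `None` to mark a missing value, and the sample ages are assumptions made for illustration only.

```python
from statistics import mean, median

def fill_missing(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        # Nothing to infer a fill value from, so return the data unchanged.
        return list(values)
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in values]

# Illustrative data only: None marks a missing age.
ages = [23, None, 31, 40, None, 26]
print(fill_missing(ages, "mean"))
print(fill_missing(ages, "median"))
```

The median is usually the safer choice when the attribute is skewed, because a few extreme values pull the mean away from the typical case.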

(ii) Remove The Noisy Data: Noise is random error in the measured data.

Methods to remove noise are:

Binning: Binning methods sort the values and distribute them into buckets, or bins. Each value is then smoothed by consulting its neighbors, that is, the other values in the same bin.

  • Smoothing by bin means: each value in a bin is replaced by the mean of the bin.
  • Smoothing by bin medians: each value in a bin is replaced by the bin median.
  • Smoothing by bin boundaries: the minimum and maximum values in a bin are the bin boundaries, and each value is replaced by the closest boundary.

Other routine cleaning tasks are:

  • Identifying the Outliers
  • Resolving Inconsistencies
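
The three binning variants described above can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: it uses equal-frequency bins, breaks ties toward the lower boundary, and the function names and price values are made up for the example.

```python
from statistics import mean, median

def equal_frequency_bins(values, bin_size):
    """Sort the values and split them into bins of bin_size items each."""
    ordered = sorted(values)
    return [ordered[i:i + bin_size] for i in range(0, len(ordered), bin_size)]

def smooth_by_mean(bins):
    """Replace every value in a bin with the bin mean."""
    return [[mean(b)] * len(b) for b in bins]

def smooth_by_median(bins):
    """Replace every value in a bin with the bin median."""
    return [[median(b)] * len(b) for b in bins]

def smooth_by_boundaries(bins):
    """Replace every value with the closer of the bin's min and max."""
    smoothed = []
    for b in bins:
        lo, hi = min(b), max(b)
        # Assumption: a value equally far from both boundaries goes to the lower one.
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed

# Illustrative prices only.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
bins = equal_frequency_bins(prices, 3)
print(bins)
print(smooth_by_mean(bins))
print(smooth_by_median(bins))
print(smooth_by_boundaries(bins))
```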

#2) Data Integration

When multiple heterogeneous data sources such as databases, data cubes or files are combined for analysis, this process is called data integration. This can help in improving the accuracy and speed of the data mining process.

Different databases use different naming conventions for their variables, which causes redundancies when the sources are combined. Additional data cleaning can be performed to remove the redundancies and inconsistencies that integration introduces, without affecting the reliability of the data.

Data integration can be performed using data migration tools such as Oracle Data Service Integrator and Microsoft SQL Server.
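
To make the naming-convention problem concrete, here is a small sketch that maps two sources onto a shared schema and merges records by key, which also collapses duplicates. The source names, column names, and mappings (`CRM_MAP`, `BILLING_MAP`) are hypothetical and do not correspond to any particular tool.

```python
# Hypothetical mappings from each source's column names to a shared schema.
CRM_MAP = {"cust_id": "customer_id", "full_name": "name"}
BILLING_MAP = {"customer_number": "customer_id", "amount_due": "balance"}

def rename(record, mapping):
    """Return a copy of record with its keys translated to the shared schema."""
    return {mapping.get(key, key): value for key, value in record.items()}

def integrate(crm_rows, billing_rows):
    """Merge both sources on customer_id; repeated keys collapse into one record."""
    merged = {}
    for row in (rename(r, CRM_MAP) for r in crm_rows):
        merged.setdefault(row["customer_id"], {}).update(row)
    for row in (rename(r, BILLING_MAP) for r in billing_rows):
        merged.setdefault(row["customer_id"], {}).update(row)
    return list(merged.values())

# Illustrative records only; the first source contains a duplicate.
crm = [{"cust_id": 1, "full_name": "Ana"}, {"cust_id": 1, "full_name": "Ana"}]
billing = [{"customer_number": 1, "amount_due": 12.5}]
print(integrate(crm, billing))
```

A real integration job would also have to reconcile conflicting values for the same attribute; this sketch simply lets the later source overwrite the earlier one.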

#3) Data Reduction

This technique is applied to obtain the data relevant for analysis from the full collection. The reduced representation is much smaller in volume while maintaining the integrity of the original data. Data reduction can draw on methods such as Naive Bayes, decision trees, and neural networks, for example to identify the attributes that matter most for the analysis.

Some strategies of data reduction are:

  • Dimensionality Reduction: Reducing the number of attributes in the dataset.
  • Numerosity Reduction: Replacing the original data volume by smaller forms of data representation.
  • Data Compression: Compressed representation of the original data.
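
Two of the strategies above can be sketched in a few lines. Dropping attributes that never vary is one very simple form of dimensionality reduction, and random sampling is one simple form of numerosity reduction; both are chosen here only for brevity, and the function names, threshold, and rows are illustrative assumptions.

```python
import random
from statistics import pvariance

def drop_low_variance(rows, threshold=0.0):
    """Dimensionality reduction: keep numeric attributes whose variance exceeds threshold."""
    names = list(rows[0].keys())
    keep = [n for n in names if pvariance([r[n] for r in rows]) > threshold]
    return [{n: r[n] for n in keep} for r in rows]

def sample_rows(rows, k, seed=0):
    """Numerosity reduction: simple random sample of k rows without replacement."""
    return random.Random(seed).sample(rows, k)

# Illustrative rows only; region_code is constant and carries no information.
rows = [
    {"age": 23, "income": 40, "region_code": 1},
    {"age": 31, "income": 55, "region_code": 1},
    {"age": 40, "income": 72, "region_code": 1},
    {"age": 26, "income": 48, "region_code": 1},
]
reduced = drop_low_variance(rows)
print(reduced)
print(sample_rows(reduced, 2))
```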

#4) Data Transformation

In this process, data is transformed into a form suitable for mining. Data is consolidated so that the mining process is more efficient and the patterns are easier to understand. Data transformation involves data mapping and code generation.

Strategies for data transformation are:

  • Smoothing: Removing noise from data using clustering, regression techniques, etc.
  • Aggregation: Summary operations are applied to data.
  • Normalization: Scaling of data to fall within a smaller range.
  • Discretization: Raw values of a numeric attribute are replaced by intervals, for example, ages replaced by age ranges.
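
Two of the strategies above, normalization and discretization, are easy to show in code. The sketch below uses min-max scaling, which is only one of several normalization schemes, and the age boundaries and labels are assumptions chosen for the example.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so they fall within [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so map them all to the lower bound.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]

def discretize(value, edges, labels):
    """Return the label of the first interval whose upper edge exceeds value.

    labels must contain exactly one more entry than edges.
    """
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]

# Illustrative data and boundaries only.
ages = [12, 25, 47, 70]
print(min_max_normalize(ages))
print([discretize(a, [18, 65], ["youth", "adult", "senior"]) for a in ages])
```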

#5) Data Mining

Data mining is the process of identifying interesting patterns and knowledge in a large amount of data. In this step, intelligent methods are applied to extract the data patterns. The results are expressed as patterns and models, which are built using techniques such as classification and clustering.
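
As one small example of the clustering techniques mentioned above, here is a sketch of k-means on one-dimensional data. It assumes the caller supplies the initial centroids and a fixed number of iterations; the function name and sample values are illustrative, and a practical implementation would add a convergence check.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Cluster numbers by repeatedly assigning each to its nearest centroid."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Illustrative data only: two visibly separate groups.
centers, groups = kmeans_1d([1, 2, 3, 10, 11, 12], centroids=[0, 5])
print(centers)
print(groups)
```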

#6) Pattern Evaluation

This step identifies the interesting patterns that represent knowledge, based on interestingness measures. Data summarization and visualization methods are used to make the data understandable to the user.
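
The text does not name specific interestingness measures. Support and confidence, which are widely used for association rules, serve here as an assumed example, and the transactions are invented for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    both = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return both / base if base else 0.0

# Illustrative market-basket data only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support(baskets, {"bread", "milk"}))
print(confidence(baskets, {"bread"}, {"milk"}))
```

A rule is typically kept only if both measures exceed thresholds chosen by the analyst.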

#7) Knowledge Representation

In this final step, data visualization and knowledge representation tools are used to present the mined knowledge to the user. The results are presented in the form of reports, tables, etc.
