Please describe the steps taken in Data Preprocessing. Give examples and explain what was done in each of these steps.
Please provide a detailed explanation so I can understand easily.
Course: Data Mining
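These are the steps involved in data preprocessing and in the wider data mining process. Steps (i) to (iv) are the preprocessing itself, and steps (v) and (vi) are what follows it:-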
(i) Data cleaning:- In this step we remove irrelevant and noisy data, such as duplicate records, obvious errors and outliers. We also fill in missing values. Missing values are null values, and most algorithms can't operate on nulls. For numerical attributes a null is usually replaced with the mean or median of the column, or with a constant such as 0 when 0 is a sensible default.
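To make the missing-value part concrete, here is a minimal sketch in plain Python. The `ages` list is made-up sample data, and filling with the mean is just one common choice, not the only correct one.

```python
from statistics import mean

# Hypothetical sample column; None marks a missing (null) value.
ages = [25, None, 31, 40, None]

# Compute the mean from the values that are actually present.
known = [a for a in ages if a is not None]
fill_value = mean(known)

# Replace each missing entry with the mean instead of leaving it null.
cleaned = [a if a is not None else fill_value for a in ages]
print(cleaned)
```

If the column had extreme outliers, the median (`statistics.median`) would usually be a safer fill value than the mean.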
(ii) Data integration:- After data cleaning, we combine data coming from different sources. For example, some data may be in Excel sheets and some in MySQL tables, so we need to merge them into one consistent data set, matching records on a common key.
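A small sketch of integration, assuming the rows have already been read from both sources into Python dictionaries. The field names (`customer_id`, `name`, `city`) and the values are hypothetical.

```python
# Hypothetical rows as they might arrive from two different sources.
sheet_rows = [  # e.g. exported from an Excel sheet
    {"customer_id": 1, "name": "Asha"},
    {"customer_id": 2, "name": "Ben"},
]
table_rows = [  # e.g. fetched from a MySQL table
    {"customer_id": 1, "city": "Pune"},
    {"customer_id": 2, "city": "Leeds"},
]

# Index one source by the shared key so matching rows are easy to find.
by_id = {row["customer_id"]: row for row in table_rows}

# Merge each sheet row with the table row that has the same key.
integrated = [{**row, **by_id.get(row["customer_id"], {})} for row in sheet_rows]
print(integrated)
```

Rows with no match in the second source keep only their original fields here; in a real project you would decide whether to keep or drop such rows.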
(iii) Data selection:- Once the data is in the repository after integration, its volume may be very large, so processing all of it at once is often impractical. We select only the attributes and records that are relevant to the analysis.
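As a rough illustration of selection, the sketch below keeps only two attributes and only the records from one city. The records, the attribute names and the filter condition are all assumptions made up for the example.

```python
# Hypothetical integrated records with more attributes than we need.
records = [
    {"id": 1, "age": 25, "city": "Pune", "income": 30000, "fax": None},
    {"id": 2, "age": 41, "city": "Leeds", "income": 52000, "fax": None},
    {"id": 3, "age": 33, "city": "Pune", "income": 45000, "fax": None},
]

# Attributes we assume are relevant to the analysis.
relevant = ("age", "income")

# Keep only the chosen attributes, and only the rows for one city.
selected = [
    {key: rec[key] for key in relevant}
    for rec in records
    if rec["city"] == "Pune"
]
print(selected)
```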
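(iv) Data transformation:- It involves normalization, concept hierarchy generation and attribute construction. Normalization here means scaling attribute values into a common range, such as 0 to 1, so that attributes with large values don't dominate the others. This is different from database normalization, which reduces redundant data. A concept hierarchy replaces low-level values with higher-level ones, for example city with country, so the data is easier to analyze. Attribute construction is creating new attributes from the original ones, for example creating full name from the first name and last name attributes.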
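The three transformations can be sketched as follows. All of the data is made up, min-max scaling is only one of several normalization methods, and the city-to-country mapping is a hypothetical one-level hierarchy.

```python
# 1) Min-max normalization: scale values into the range 0 to 1.
incomes = [20000, 50000, 80000]
lo, hi = min(incomes), max(incomes)
normalized = [(x - lo) / (hi - lo) for x in incomes]

# 2) Concept hierarchy: replace a low-level value (city)
#    with a higher-level one (country).
city_to_country = {"Pune": "India", "Leeds": "UK"}  # hypothetical mapping
cities = ["Pune", "Leeds", "Pune"]
countries = [city_to_country[c] for c in cities]

# 3) Attribute construction: build full_name from two existing attributes.
person = {"first_name": "Asha", "last_name": "Rao"}
person["full_name"] = person["first_name"] + " " + person["last_name"]

print(normalized, countries, person["full_name"])
```

Note that min-max scaling divides by `hi - lo`, so it needs at least two different values in the column.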
(v) Data mining:- This is the core step of the overall process, and it runs on the preprocessed data. Here we apply intelligent methods to extract patterns from the data. These methods include association, classification, clustering, time series analysis and more.
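To give a feel for one of these methods, here is a very small association example that counts which pairs of items are bought together. The transactions and the `min_support` threshold are invented for illustration, and real algorithms such as Apriori are more elaborate than this simple pair count.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "jam", "milk"},
]

# Count how many baskets contain each pair of items.
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Keep only pairs that appear in at least min_support baskets.
min_support = 2
frequent = {pair: n for pair, n in pair_counts.items() if n >= min_support}
print(frequent)
```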
(vi) Pattern evaluation and data representation:- In the last step we evaluate the patterns found by the data mining methods, keep the interesting ones, and present them, for example as reports or charts, so they can be analyzed. This is also called data reporting.
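One common way to evaluate an association pattern is its confidence, which is the fraction of baskets containing the left-hand items that also contain the right-hand items. The sketch below assumes the same kind of made-up basket data, and the function name `confidence` is only illustrative.

```python
def confidence(antecedent, consequent, transactions):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [t for t in transactions if antecedent <= t]
    if not with_antecedent:
        return 0.0  # the rule never applies, so report zero confidence
    with_both = [t for t in with_antecedent if consequent <= t]
    return len(with_both) / len(with_antecedent)

# Hypothetical shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "jam", "milk"},
]

# Evaluate the rule "milk -> bread" and the rule "bread -> milk".
milk_to_bread = confidence({"milk"}, {"bread"}, transactions)
bread_to_milk = confidence({"bread"}, {"milk"}, transactions)
print(milk_to_bread, bread_to_milk)
```

A rule with higher confidence is usually considered more interesting, although in practice it is checked together with other measures such as support.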