Data mining
I have a data column where the required mark is 20, but some of the values entered in the column are more than 40.
How do you advise me to clean that data?
Also, please state what kind of data this is (for instance, noisy data), name the method used to solve it, and explain how to clean it.
Thanks.
ANS:-
TO CLEAN SUCH DATA, WE NEED TO APPLY DATA CLEANING FUNCTIONS.
DATA CLEANING:-
The quality of your data is important for the final analysis. Data that is incomplete, noisy, or inconsistent can affect your results. Data cleaning is the process of finding and correcting or removing corrupt or inaccurate records from a record set, table, or database.
DATA CLEANING FUNCTIONS:-
1. Remove Invalid Values
The first and most important thing to do is remove the useless pieces of data from your dataset. Useless or irrelevant data is data you do not need because it does not match the context of your problem.
For example, you may only need to estimate the average age of your sales staff. In that case, their email addresses are not required. Another example is counting how many customers you contacted in a given month. In that case, you do not need data on the people you contacted in the previous month.
However, before you delete a specific piece of data, make sure it really is irrelevant, because you may need to check its corresponding values over time (to check consistency). If you can get a second opinion from an experienced expert before deleting data, do so.
You do not want to delete certain values and regret the decision later. But once you are sure the data is not needed, remove it.
RANGE:
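Some numeric values must fall within a certain range. For example, the number of products you can transport per day has a lower limit and an upper limit, so the data has a definite start point and end point. Values outside that range are invalid. In the question above, a mark is not expected to exceed 20, so values above 40 are out-of-range values, which is a form of noisy data.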
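A range check like this can be sketched in pandas. This is only an illustration under stated assumptions: the student names and marks are made up, 20 is taken to be the maximum allowed mark, 0 is an assumed minimum, and replacing invalid marks with the median of the valid ones is just one possible repair strategy (dropping the rows is another).

```python
import pandas as pd

# Hypothetical marks; the values are invented for illustration.
marks = pd.DataFrame({
    "student": ["A", "B", "C", "D", "E"],
    "mark": [12, 45, 18, 41, 20],
})

MIN_MARK = 0   # assumed lower bound
MAX_MARK = 20  # upper bound, assumed from the question

# True for every row whose mark lies outside [MIN_MARK, MAX_MARK].
out_of_range = ~marks["mark"].between(MIN_MARK, MAX_MARK)

# Option 1: drop the rows with invalid marks.
dropped = marks[~out_of_range].reset_index(drop=True)

# Option 2: treat invalid marks as missing, then fill them
# with the median of the remaining valid marks.
cleaned = marks.copy()
cleaned["mark"] = marks["mark"].where(~out_of_range)  # invalid -> NaN
cleaned["mark"] = cleaned["mark"].fillna(cleaned["mark"].median())

print(marks[out_of_range])  # the rows that violate the range
print(dropped)
print(cleaned)
```

Which option is appropriate depends on the data: if the original marks can be recovered from the source, correcting them is better than either dropping or imputing.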
Irrelevant columns:
If your DataFrame (a two-dimensional data structure, i.e., data arranged in tabular form in rows and columns) contains irrelevant columns that you will never use, drop them so you can focus on the columns you will be working with. Let’s look at how to deal with such a dataset, taking student data in a pandas DataFrame as an example.
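A minimal sketch of dropping unused columns, assuming a hypothetical student table in which only the name and mark matter for the analysis (the column names and values are invented for illustration):

```python
import pandas as pd

# Hypothetical student data; all values are made up.
students = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen"],
    "email": ["asha@example.com", "ben@example.com", "chen@example.com"],
    "locker_no": [101, 102, 103],
    "mark": [15, 19, 11],
})

# Columns assumed to be irrelevant to the analysis.
unused_columns = ["email", "locker_no"]

# drop() returns a new DataFrame; the original is left unchanged.
students_clean = students.drop(columns=unused_columns)

print(students_clean)
```

Keeping the original DataFrame untouched makes it easy to go back if a dropped column turns out to be needed later.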