- A dataset is a collection of data, typically organized into
database tables. Datasets are used in machine learning and
artificial intelligence to train models.
- Missing or corrupted data can occur in these datasets.
- When some values in a dataset are absent or unusable, they are
called missing or corrupted data.
There are several ways to handle missing or corrupted data:
- Deleting rows or columns with missing values:
This is the simplest way to handle missing or corrupted data.
Rows or columns that contain empty cells are removed, so no
missing data remains. It works best when only a small part of the
data is affected, because every deleted row or column also
discards its valid values.
- Replacing missing values in continuous columns: This is
another way to handle missing or corrupted data. The empty
cells are filled with a value estimated from the other rows or
columns, such as the column mean or median, or the value from the
preceding row, so that no empty cells remain.
- Imputing categorical columns: In a categorical column, missing
values are replaced with the most frequent category, or a new
category is assigned to mark them as missing.
- Predicting missing values: This is another way to handle
missing data. An algorithm is fitted on the rows where the value
is present and then predicts the missing values from the other
columns. The predictions are used to fill the gaps.
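The four approaches above can be sketched in plain Python. This is only an illustrative sketch, not a definitive implementation: the toy `rows` table, its column names, and all four function names are hypothetical, `None` is assumed to mark a missing cell, and the prediction step uses a simple least-squares line as a stand-in for whatever algorithm is chosen in practice.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy dataset: None marks a missing cell.
rows = [
    {"age": 25, "income": 50.0, "city": "Paris"},
    {"age": 30, "income": None, "city": "Paris"},
    {"age": 35, "income": 70.0, "city": None},
    {"age": 40, "income": 80.0, "city": "Rome"},
]

def drop_incomplete(rows):
    """Approach 1: keep only the rows that have no missing cell."""
    return [r for r in rows if all(v is not None for v in r.values())]

def fill_with_mean(rows, col):
    """Approach 2: replace missing numeric cells with the column mean."""
    avg = mean(r[col] for r in rows if r[col] is not None)
    return [{**r, col: avg if r[col] is None else r[col]} for r in rows]

def fill_with_mode(rows, col):
    """Approach 3: replace missing categories with the most frequent one."""
    counts = Counter(r[col] for r in rows if r[col] is not None)
    most_frequent = counts.most_common(1)[0][0]
    return [{**r, col: most_frequent if r[col] is None else r[col]} for r in rows]

def fill_with_line_fit(rows, x_col, y_col):
    """Approach 4: predict missing y_col values from x_col.

    Fits a least-squares line on the rows where y_col is present.
    Assumes x_col itself has no missing values.
    """
    known = [(r[x_col], r[y_col]) for r in rows if r[y_col] is not None]
    x_bar = mean(x for x, _ in known)
    y_bar = mean(y for _, y in known)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in known)
             / sum((x - x_bar) ** 2 for x, _ in known))
    intercept = y_bar - slope * x_bar
    return [
        {**r, y_col: intercept + slope * r[x_col] if r[y_col] is None else r[y_col]}
        for r in rows
    ]

complete_rows = drop_incomplete(rows)
mean_filled = fill_with_mean(rows, "income")
mode_filled = fill_with_mode(rows, "city")
predicted = fill_with_line_fit(rows, "age", "income")
```

On this toy data, deleting incomplete rows keeps only two of the four rows. The mean fills the missing income with about 66.7, while the line fit gives about 60 because it takes the age into account. Each function returns a new list and leaves `rows` unchanged.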
These are the common ways to handle missing or corrupted data.