Provide an example where classification would be used as the analysis. Describe the predictor(s), the response variable, and explain whether prediction or inference would be of primary interest.
The data classification analysis function is the process of assigning columns to meaningful categories that can be used to organize and focus subsequent analysis work.
IBM uses the following attributes to classify data:
Data Class:
A system-defined semantic business-use category for the column.
Data Subclass:
A user-defined semantic business-use category within a data class.
User Class:
A user-defined category independent of the data class.
For the system-inferred data class, each column is assigned one of the following system-defined data classification designations:
Identifier
Product Code
Indicator
Date
Quantity
Text
Large Object
System capability:
The system automatically applies the data classification algorithm to each column whenever it performs column analysis processing.
Decisions and actions:
In this step, a classification decision is made for each column: either accept the system-inferred data class or override it by selecting another.
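The accept-or-override decision can be sketched in a few lines of Python. This is only an illustration, not IBM's implementation: the function name resolve_data_class and the SYSTEM_DATA_CLASSES set are hypothetical, and the set simply repeats the designations listed in this answer.

```python
# Hypothetical sketch of the accept-or-override decision for one column.
# The class names come from the list in this answer; nothing here is an IBM API.
SYSTEM_DATA_CLASSES = {
    "Identifier", "Product Code", "Indicator", "Date",
    "Quantity", "Text", "Large Object",
}


def resolve_data_class(inferred, override=None):
    """Return the final data class for a column.

    With no override, the system-inferred class is accepted as-is.
    With an override, the reviewer's choice replaces the inferred class.
    """
    if inferred not in SYSTEM_DATA_CLASSES:
        raise ValueError(f"unknown inferred data class: {inferred!r}")
    if override is None:
        return inferred  # accept the system-inferred class
    if override not in SYSTEM_DATA_CLASSES:
        raise ValueError(f"unknown override data class: {override!r}")
    return override  # reviewer overrides the inferred class
```

For example, resolve_data_class("Text") keeps the inferred class, while resolve_data_class("Text", override="Product Code") records the reviewer's choice instead.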
The predictor variables are the input values. Some of them are:
Data type
Length
Unique cardinality
These inputs are used to validate the column's values and to infer its data class.
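As a rough illustration of how these three predictors could drive a classification, here is a minimal rule-based sketch in Python. It is not the algorithm IBM uses: the function names, the rule order, and the thresholds (code_max_length, large_object_length, the 0.5 uniqueness ratio) are all assumptions chosen for the example.

```python
from datetime import date


def profile_column(values):
    """Compute the predictors: data type, maximum length, unique cardinality."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        raise ValueError("column has no non-null values to profile")
    if all(isinstance(v, date) for v in non_null):
        data_type = "date"
    elif all(isinstance(v, (int, float)) and not isinstance(v, bool)
             for v in non_null):
        data_type = "numeric"
    else:
        data_type = "string"
    return {
        "data_type": data_type,
        "max_length": max(len(str(v)) for v in non_null),
        "distinct": len(set(non_null)),   # unique cardinality
        "total": len(non_null),
    }


def infer_data_class(values, code_max_length=10, large_object_length=1000):
    """Map the predictors to a data class using assumed, illustrative rules."""
    p = profile_column(values)
    if p["data_type"] == "date":
        return "Date"
    if p["data_type"] == "string" and p["max_length"] >= large_object_length:
        return "Large Object"
    if p["distinct"] <= 2:
        return "Indicator"          # e.g. Y/N or 0/1 flags
    if p["distinct"] == p["total"]:
        return "Identifier"         # every value is unique
    if p["data_type"] == "numeric":
        return "Quantity"
    if (p["max_length"] <= code_max_length
            and p["distinct"] / p["total"] <= 0.5):
        return "Product Code"       # short, frequently repeated values
    return "Text"
```

For example, infer_data_class(["Y", "N", "Y"]) returns "Indicator" because the column has only two distinct values. In this sketch the data class is the response variable and the profile values are the predictors.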