Is the following an example of a classification, regression, or clustering problem? Why? For a data mining project, a student collects information on income, age, sex, profession, and home zip code for fans of the 9 different New York sports teams. She wants to build a model to predict which team someone roots for.
The given example is a classification problem.
Regression and classification are supervised learning approaches that map an input to an output based on example input-output pairs, while clustering is an unsupervised learning approach.
Classification predictive modeling is the task of approximating a mapping function (f) from input variables (X) to discrete output variables (y).
Regression predictive modeling is the task of approximating a mapping function (f) from input variables (X) to a continuous output variable (y).
Clustering is the task of partitioning the dataset into groups, called clusters.
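As a rough illustration of the three definitions above, the kind of output variable often indicates which task is at hand. The sketch below uses a hypothetical helper, task_type, with a deliberately simple rule of thumb (float targets are treated as continuous, anything else as discrete labels, and a missing target as clustering). This is an assumption made for illustration only; in real data, integers can also encode quantities.

```python
def task_type(targets):
    """Guess the learning task from the target values (illustrative heuristic only)."""
    if targets is None:
        # No output variable at all: only the inputs X are available.
        return "clustering"
    if all(isinstance(t, float) for t in targets):
        # Continuous output variable y.
        return "regression"
    # Discrete output variable y (class labels).
    return "classification"


print(task_type(["Yankees", "Mets", "Knicks"]))  # classification
print(task_type([52000.0, 61500.5, 48250.0]))    # regression
print(task_type(None))                           # clustering
```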
Classification and clustering both sort objects into groups based on their features, so they can look like the same process. The key difference is that in classification each input instance comes with a predefined label, whereas in clustering no labels are given and the groups must be discovered from the data.
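To make the contrast concrete, here is a minimal sketch that runs both ideas on the same made-up list of ages. The data, the function names (nearest, fit_classifier, fit_clusters), and the choice of a nearest-mean rule and a two-group 1-D k-means are all assumptions for illustration. With labels, the groups carry team names; without labels, the groups only get arbitrary ids.

```python
def nearest(value, centers):
    """Return the key of the center closest to value."""
    return min(centers, key=lambda k: abs(value - centers[k]))


def fit_classifier(xs, labels):
    """Supervised: use the given labels to compute one mean per class."""
    groups = {}
    for x, label in zip(xs, labels):
        groups.setdefault(label, []).append(x)
    return {label: sum(v) / len(v) for label, v in groups.items()}


def fit_clusters(xs, iterations=10):
    """Unsupervised: 1-D k-means with two groups, identified only as 0 and 1."""
    centers = {0: min(xs), 1: max(xs)}  # simple deterministic start
    for _ in range(iterations):
        groups = {0: [], 1: []}
        for x in xs:
            groups[nearest(x, centers)].append(x)
        # Recompute each center as the mean of its members (skip empty groups).
        centers = {i: sum(v) / len(v) for i, v in groups.items() if v}
    return centers


# Hypothetical toy data: ages of six fans and, for classification only, their teams.
ages = [21, 23, 25, 58, 60, 62]
teams = ["Mets", "Mets", "Mets", "Yankees", "Yankees", "Yankees"]

classifier = fit_classifier(ages, teams)
clusters = fit_clusters(ages)

print(nearest(30, classifier))  # Mets  (a named, predefined label)
print(nearest(30, clusters))    # 0     (an unnamed group id)
```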
In our case, the value to be predicted takes one of 9 discrete values, one for each of the 9 New York sports teams, and it is predicted from the input features income, age, sex, profession, and home zip code.
Since the output variable is discrete and each input instance has a predefined label (the team that fan roots for), it is a classification problem.
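A minimal sketch of what such a model could look like is given below, assuming a k-nearest-neighbors classifier over a handful of invented fan records. The records, the distance function, its scaling constants, and the names FANS, distance, and predict_team are all hypothetical; the question does not say which algorithm the student uses, and only three of the nine teams appear here to keep the example short.

```python
from collections import Counter

# Hypothetical toy records: (income, age, sex, profession, zip_code) -> team.
FANS = [
    ((85000, 34, "F", "engineer", "10001"), "Yankees"),
    ((82000, 37, "M", "engineer", "10001"), "Yankees"),
    ((40000, 22, "M", "student", "11201"), "Nets"),
    ((38000, 24, "F", "student", "11201"), "Nets"),
    ((60000, 45, "M", "teacher", "11368"), "Mets"),
    ((62000, 48, "F", "teacher", "11368"), "Mets"),
]


def distance(a, b):
    """Mixed distance: scaled gaps for numbers, 0/1 mismatch for categories."""
    income_gap = abs(a[0] - b[0]) / 100000  # assumed scale for income
    age_gap = abs(a[1] - b[1]) / 100        # assumed scale for age
    mismatches = sum(x != y for x, y in zip(a[2:], b[2:]))
    return income_gap + age_gap + mismatches


def predict_team(person, k=3):
    """Return the majority team among the k nearest labeled fans."""
    neighbors = sorted(FANS, key=lambda rec: distance(person, rec[0]))[:k]
    votes = Counter(team for _, team in neighbors)
    return votes.most_common(1)[0][0]


print(predict_team((41000, 23, "F", "student", "11201")))  # Nets
```

The output is always one of the team names seen in the labeled data, which is what makes this classification rather than regression or clustering.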