Question

In: Computer Science

Is the following example of classfication, regression, or clustering problems? Why? For a data mining project,...

Is the following example of classfication, regression, or clustering problems? Why? For a data mining project, a student collects information on income, age, sex, profession, and home zip code for fans of the 9 different New York Sports teams. She wants to build a model to predict which team someone roots for.

Solutions

Expert Solution

The given example is of a classification problem.

Regression and classification are supervised learning approaches that map an input to an output based on example input-output pairs, while clustering is a unsupervised learning approach.

Classification predictive modeling is the task of approximating a mapping function (f) from input variables (X) to discrete output variables (y).

Regression predictive modeling is the task of approximating a mapping function (f) from input variables (X) to a continuous output variable (y).

Clustering is the task of partitioning the dataset into groups, called clusters.

Since both Classification and Clustering are used for the categorisation of objects into one or more classes based on the features, they appear to be a similar process as the basic difference is minute. In the case of Classification, there are predefined labels assigned to each input instances according to their properties whereas in clustering those labels are missing.

Classification -

Clustering -

In our case, the output or the value to be predicted are 9 discrete values for 9 different New York Sports teams based on the input features income, age, sex, profession, and home zip code.

Since we have a discrete output variable and we also have the predefined labels assigned to each input instances, it is a classification problem.


Related Solutions

Data Mining Techniques Please discuss whether or not the following problems are data mining tasks. Explain...
Data Mining Techniques Please discuss whether or not the following problems are data mining tasks. Explain why. (a). Retrieve students' records from a relational table with grade = "A". [5 points] (b). From the table of students' information, check if attributes last name and address have any correlations. [5 points] (c). Find all the documents from the text database containing keywords "data mining". [5 points] (d). Divide the text database into several groups, each group containing near-duplicate or similar documents....
(a) Briefly explain the data mining process. (b) What are the different problems that data mining...
(a) Briefly explain the data mining process. (b) What are the different problems that data mining can solve in general? Explain.
What is Clustering? What is the purpose of clustering? What assumptions are needed for clustering? Why...
What is Clustering? What is the purpose of clustering? What assumptions are needed for clustering? Why do you need to transform qualitative variables to be presented by numbers in clustering? What is the purpose of normalizing quantitative variables?
What are some pros and cons to data mining? Provide an example of when data mining...
What are some pros and cons to data mining? Provide an example of when data mining was used and the outcome provided an incorrect assumption or issue. How can these types of situations be avoided in the future?
1) Describe a real-world example that uses one of the Data Mining Tasks and why is...
1) Describe a real-world example that uses one of the Data Mining Tasks and why is this task best suited to this example? PLEASE EXPLAIN IN DETAIL.
Why is data mining a key piece of analytics?
Why is data mining a key piece of analytics?
Provide an example on how data mining can turn a large collection of data into knowledge...
Provide an example on how data mining can turn a large collection of data into knowledge that can help meet a current global challenge in order to improve healthcare outcomes.
What is Data mining application and how it works in telemedicine? with example in how it...
What is Data mining application and how it works in telemedicine? with example in how it works in telemedicine
We've now had an introduction to several different models: Linear regression, logistic regression, k-means, hierarchical clustering,...
We've now had an introduction to several different models: Linear regression, logistic regression, k-means, hierarchical clustering, GMM, Naive Bayes, and decision trees. For this assignment, I would like you to choose three models from the above list and describe two problems that each of the models could potentially be used to solve. You can do one big post with all three models and six solvable problems or do three separate posts if you prefer.   Short Explanation of Decision Trees Decision...
The standard project is to use multiple regression analysis to analyze a data set. The data...
The standard project is to use multiple regression analysis to analyze a data set. The data set is a study of student persistent enrolling in the next semester based on Gender, Age, GPA, a 22 questionnaire on self-efficacy, and student enrollment status. The educational researcher wants to study the relationship between student enrollment status as it relates to gender, age, GPA, and the total response to a 22 questionnaire survey. a. The estimated multiple regression analysis equation. b. Does the...
ADVERTISEMENT
ADVERTISEMENT
ADVERTISEMENT