Question

2. ___ is an unsupervised data mining technique requiring no a priori hypothesis or model about initial patterns or relationships that may exist within the data.

A) Regression Analysis

B) Clustering Analysis

C) Neural Networks

D) Decision Trees

3. Which of the following are potential data quality concerns?

A) Dirty data

B) Missing values

C) Inconsistent data

D) Data not integrated

E) All of the above

6. An analyst is investigating how customers' yearly incomes influence their average spending on health insurance. The analyst received the following information:

SST = 19.35 & SSE = 3.66

What is the R square value of this model?

Expert Solution

2. B) Clustering Analysis

Clustering is an unsupervised machine learning task that automatically divides the data into clusters, or groups of similar items. It does this without being told in advance what the groups should look like, which is why no a priori hypothesis or model is required.
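To make the idea concrete, here is a minimal k-means sketch in Python. k-means is only one of many clustering algorithms, and the question does not name a particular one; the sample points, the choice of k = 2, and the rule of using the first k points as starting centroids are assumptions made for illustration. Note that the function never receives labels: it discovers the two groups purely from distances between points.

```python
import math


def kmeans(points, k, iterations=100):
    """Cluster 2-D points into k groups with Lloyd's k-means algorithm.

    Assumption for this sketch: the first k points are used as the initial
    centroids so that the result is deterministic.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
        if new_labels == labels:
            break  # assignments stopped changing: the algorithm has converged
        labels = new_labels
    return labels, centroids


# Hypothetical, unlabeled sample data: two visually separate groups of points.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (0.5, 1.2), (9.5, 8.8)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Real implementations usually start from random or k-means++ centroids and run several restarts; the deterministic start here only keeps the output reproducible.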

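3. E) All of the above

Dirty data (for example, typos, invalid entries, or duplicate records), missing values, inconsistent data, and data that has not been integrated across sources all reduce the reliability of any analysis built on them, so every option listed is a potential data quality concern. Inconsistency is a particularly common warning sign when data comes from multiple sources: the same customer may be recorded with different spellings or formats, or the same record may exist multiple times in a database. Duplicate records inflate counts and distort aggregates, which can mislead the decisions of data-driven businesses.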
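A minimal sketch of how some of these problems can be detected programmatically, using plain Python dictionaries. The function `profile_quality`, the field names, and the sample records are hypothetical; the sketch checks only for missing values, duplicate keys, and conflicting values across duplicates, and does not attempt to detect unintegrated sources.

```python
from collections import Counter


def profile_quality(records, key):
    """Report missing values, duplicate keys, and conflicting values.

    `records` is a list of dicts; `key` names the field that should uniquely
    identify a record (a hypothetical customer id in the example below).
    """
    # Missing values: fields whose value is None or an empty string.
    missing = [
        (i, field)
        for i, rec in enumerate(records)
        for field, value in rec.items()
        if value is None or value == ""
    ]
    # Duplicates: the same key appearing in more than one record.
    counts = Counter(rec[key] for rec in records)
    duplicates = sorted(k for k, n in counts.items() if n > 1)
    # Inconsistent data: duplicate records that disagree on some field.
    inconsistent = []
    for k in duplicates:
        rows = [rec for rec in records if rec[key] == k]
        for field in rows[0]:
            if len({row.get(field) for row in rows}) > 1:
                inconsistent.append((k, field))
    return {"missing": missing, "duplicates": duplicates, "inconsistent": inconsistent}


# Hypothetical customer records merged from two sources.
customers = [
    {"id": 1, "name": "Ann Lee", "state": "CA"},
    {"id": 2, "name": "Bo Chen", "state": None},
    {"id": 1, "name": "Ann Lee", "state": "California"},
]
report = profile_quality(customers, key="id")
print(report)
# {'missing': [(1, 'state')], 'duplicates': [1], 'inconsistent': [(1, 'state')]}
```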

6. SST = 19.35 & SSE = 3.66

R square value of this model:

R-square is the square of the correlation between the response values and the predicted response values. It is also called the square of the multiple correlation coefficient and the coefficient of multiple determination.

R-square is defined as

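$$R^2 = 1 - \frac{\sum_{i=1}^{n} w_i (y_i - f_i)^2}{\sum_{i=1}^{n} w_i (y_i - \bar{y})^2} = 1 - \frac{SSE}{SST}$$

where $y_i$ are the observed responses, $f_i$ the predicted (fitted) responses, $\bar{y}$ the mean of the observed responses, and $w_i$ optional weights (all equal to 1 for an unweighted fit).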
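The weighted formula above translates directly into a small Python function. This is a sketch, not a library API: the name `r_squared` and the sample data are hypothetical, and it assumes the mean in the SST term is the weighted mean of the observed values (which is the ordinary mean when every weight is 1).

```python
def r_squared(y, f, w=None):
    """Weighted coefficient of determination, R^2 = 1 - SSE/SST.

    y: observed responses, f: predicted (fitted) responses,
    w: optional weights; every weight defaults to 1 (an unweighted fit).
    Assumption: the mean used in SST is the weighted mean of y.
    """
    if w is None:
        w = [1.0] * len(y)
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    # Error (residual) sum of squares: observed vs. predicted.
    sse = sum(wi * (yi - fi) ** 2 for wi, yi, fi in zip(w, y, f))
    # Total sum of squares: observed values about their mean.
    sst = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, y))
    return 1 - sse / sst


# Hypothetical observations and predictions from some fitted model.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(round(r_squared(observed, predicted), 4))  # 0.98
```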

R square = 1 - SSE/SST = 1 - 3.66/19.35 = 1 - 0.189147 ≈ 0.8109

For a least-squares fit with an intercept, R-square lies between 0 and 1, with a value closer to 1 indicating that a greater proportion of variance is accounted for by the model. Here, an R-square value of 0.8109 means that the fit explains about 81.09% of the total variation in the data about the average.
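As a quick arithmetic check, the calculation for this question can be reproduced in a few lines of Python (the helper name `r_squared_from_sums` is hypothetical). Before rounding, R-square is about 0.810853, which rounds to 0.8109 at four decimal places.

```python
def r_squared_from_sums(sst, sse):
    """R^2 from the total (SST) and error (SSE) sums of squares."""
    return 1 - sse / sst


sst, sse = 19.35, 3.66  # values given in the question
r2 = r_squared_from_sums(sst, sse)
print(f"SSE/SST = {sse / sst:.6f}")     # SSE/SST = 0.189147
print(f"R^2 = {r2:.4f}")                # R^2 = 0.8109
print(f"Variance explained: {r2:.2%}")  # Variance explained: 81.09%
```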

