Question

25. In Data Mining, ___ is a set of input variables used to predict an observation's outcome class or continuous outcome value.

26. During each iteration of cluster analysis, the distances between new clusters are determined until any two clusters are sufficiently close to be linked using an algorithm called ___.

27. In the CRISP-DM process for data mining, which phase is the cleaning of the data so it is ready for modeling tools?

Solutions

Expert Solution

25. In Data Mining, ___ is a set of input variables used to predict an observation's outcome class or continuous outcome value.

Ans : Features (also called predictors)

Explanation : The features are the input variables that describe each observation. A model uses them to predict the observation's outcome class (classification) or continuous outcome value (regression). A hypothesis is a different concept: it is the function a learning algorithm chooses to map the input variables to the target variable or predicted value. When designing a learning algorithm, we first decide how to represent the hypothesis, and in most supervised learning algorithms the goal is to find the hypothesis that best maps the inputs to the correct outputs.
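
To make the difference between the input variables and the mapping from inputs to outputs concrete, here is a minimal Python sketch. The loan-style records, the column names, and the hand-written rule in `hypothesis` are all hypothetical and only illustrate the vocabulary; a real learning algorithm would fit the mapping from data rather than hard-code it.

```python
# Hypothetical toy data: each dict is one observation.
observations = [
    {"age": 25, "income": 28_000, "owns_home": 0, "defaulted": 1},
    {"age": 47, "income": 85_000, "owns_home": 1, "defaulted": 0},
    {"age": 33, "income": 52_000, "owns_home": 0, "defaulted": 0},
]

FEATURE_NAMES = ["age", "income", "owns_home"]  # the input variables
TARGET_NAME = "defaulted"                       # the outcome class to predict


def split_features_target(rows, feature_names, target_name):
    """Separate the input variables (X) from the outcome (y)."""
    X = [[row[name] for name in feature_names] for row in rows]
    y = [row[target_name] for row in rows]
    return X, y


def hypothesis(features):
    """A hand-written (assumed) rule that maps one feature row to a class."""
    age, income, owns_home = features
    return 1 if income < 30_000 and not owns_home else 0


X, y = split_features_target(observations, FEATURE_NAMES, TARGET_NAME)
predictions = [hypothesis(row) for row in X]
```

Here `X` holds the input variables for every observation, `y` holds the known outcomes, and `hypothesis` is one candidate mapping from a row of `X` to a predicted outcome.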

26. During each iteration of cluster analysis, the distances between new clusters are determined until any two clusters are sufficiently close to be linked using an algorithm called ___.

Ans : Agglomerative Hierarchical clustering algorithm

Explanation : This is one type of hierarchical clustering algorithm. It starts by treating every individual point as its own cluster, then repeatedly links the two closest clusters until only one cluster remains.

For more information, hierarchical clustering algorithms come in two types:

  1. Agglomerative Hierarchical clustering algorithm or AGNES (agglomerative nesting)

  2. Divisive Hierarchical clustering algorithm or DIANA (divisive analysis).
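
The agglomerative procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: it assumes Euclidean distance and single linkage (cluster distance is the distance between the closest pair of members), and the function names, the `n_clusters` stopping parameter, and the sample points are hypothetical. Other linkage rules (complete, average, Ward) would change only `single_linkage`.

```python
import math


def single_linkage(cluster_a, cluster_b):
    """Cluster distance = distance between the two closest members."""
    return min(math.dist(p, q) for p in cluster_a for q in cluster_b)


def agglomerative(points, n_clusters=1):
    """Repeatedly link the two closest clusters until n_clusters remain.

    Returns the final clusters and the list of merges performed, each
    merge recorded as (cluster_a, cluster_b, distance).
    """
    clusters = [[p] for p in points]  # every point starts as its own cluster
    merges = []
    while len(clusters) > n_clusters:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters, merges


# Hypothetical 2-D sample points.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5), (10.0, 0.0)]
clusters, merges = agglomerative(points, n_clusters=3)
```

With the default `n_clusters=1` the loop runs until a single cluster remains, which matches the description in the explanation. Stopping earlier is a common practical variation.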

27. In the CRISP-DM process for data mining, which phase is the cleaning of the data so it is ready for modeling tools?

Ans : Phase 3 - Data Preparation Phase

This phase of CRISP-DM involves the following tasks:

1. Select data

2. Clean data

3. Construct data

4. Integrate data

5. Format data
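
As a rough illustration of how the five tasks above fit together, the following Python sketch walks two small tables through them in order. Everything here is assumed for the example: the `customers` and `orders` tables, the column names, the derived `over_50` attribute, and the choice to drop rows with a missing age (imputing a value would be an equally valid cleaning decision).

```python
# Hypothetical raw tables.
customers = [
    {"id": 1, "name": " Ann ", "age": "34", "notes": "vip"},
    {"id": 2, "name": "Bob", "age": "", "notes": ""},
    {"id": 3, "name": "Cy", "age": "51", "notes": ""},
]
orders = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 35.5},
    {"customer_id": 3, "amount": 12.0},
]


def prepare(customers, orders):
    # 1. Select data: keep only the columns relevant to modeling.
    selected = [{"id": c["id"], "name": c["name"], "age": c["age"]}
                for c in customers]

    # 2. Clean data: trim whitespace, convert types, drop rows missing age.
    cleaned = [
        {"id": r["id"], "name": r["name"].strip(), "age": int(r["age"])}
        for r in selected
        if r["age"].strip()
    ]

    # 3. Construct data: derive a new attribute from an existing one.
    for r in cleaned:
        r["over_50"] = 1 if r["age"] > 50 else 0

    # 4. Integrate data: merge per-customer order totals from a second table.
    totals = {}
    for o in orders:
        key = o["customer_id"]
        totals[key] = totals.get(key, 0.0) + o["amount"]
    for r in cleaned:
        r["total_spent"] = totals.get(r["id"], 0.0)

    # 5. Format data: emit a header plus numeric rows in a fixed column order.
    header = ["id", "age", "over_50", "total_spent"]
    rows = [[r[col] for col in header]
            for r in sorted(cleaned, key=lambda r: r["id"])]
    return header, rows


header, rows = prepare(customers, orders)
```

The resulting `rows` table contains only clean, numeric, consistently ordered columns, which is the form most modeling tools expect as input.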

