Question

In: Computer Science

To evaluate the predictive performance of a model constructed from a two-class data set, k-fold cross-validations...

To evaluate the predictive performance of a model constructed from a two-class data set, k-fold cross-validations are frequently applied. Describe the concept of cross-validation, and two performance measures, sensitivity and specificity, respectively.

Expert Solution

Cross Validation:

Cross-validation is a technique in which we train our model using the subset of data-set and then evaluate using complementary subset of the data-set.

We use cross-validation technique when we couldn’t fit the model on the training data and can’t say that the model will work accurately for the real data. For this, we must assure that our model got the correct patterns from the data, and it is not getting up too much noise.

The steps involved in cross-validation are:

1.Reserve some portion of sample data-set.

2.Using the rest data-set train the model.

3.Test the model using the reserve portion of the data-set.

Two performance measures:

The true performance measures are

1. F1Score

2. Recall

1.F1 Score:

Its a harmonic mean between recall and precision. Its range is [0,1].This metric usually tells us how precise and robust our classifier is. It is used to measure the tests accuracy.

2. Classification Accuracy:

It is the ratio of number of correct predictions to the total number of input samples.

It works well only if there are equal number of samples belonging to each class.

Classification accuracy is good but it gives False Positive sense of achieving high accuracy. The problem arises due to the possibility of miss-classification of minor class samples are very high.

Sensitivity:

Sensitivity is the proportion of truely positive cases that were classified as positive; thus, it is a measure of how well your classifier identifies positive cases.

Specificity:

Specificity is the proportion of truly negative cases that were classified as negative; thus, it is a measure of how well your classifier identifies negative cases.

venereology answered 9 months ago

Cross validation: If we perform k-fold cross validation, training a Naive Bayes model, how many models...

Cross validation: If we perform k-fold cross validation, training a Naive Bayes model, how many models will we end up creating?

How does the computational time changes when we decrease the k in k-fold cross validation? Why?...

How does the computational time changes when we decrease the k in k-fold cross validation? Why? Explain. b. In which procedures, we can apply k-fold cross validation. Consider all the procedures that we learned.

This is for Predictive Analytics. 1. Read the iris data set into a data frame. 2....

This is for Predictive Analytics. 1. Read the iris data set into a data frame. 2. Print the first few lines of the iris dataset. 3. Output all the entries with Sepal Length > 5. 4. Plot a box plot of Petal Length with a color of your choice. 5. Plot a histogram of Sepal Width. 6. Plot a scatter plot showing the relationship between Petal Length and Petal Width. 7. Find the mean of Sepal Length by species. Hint:...

Variables in Wooldridge's data set (description): Cross-sectional data set from Wooldridge 1. return % change stock...

Variables in Wooldridge's data set (description): Cross-sectional data set from Wooldridge 1. return % change stock price, 90-94 2. dkr debt/capital, 1990 3. eps earnings per share, 1990 4. netinc net income, 1990 (millions $) 5. salary CEO salary, 1990 (thousands $) Dataset: return dkr eps netinc salary -20.84211 4 48.1 1144 1090 -9.138381 27.3 -85.3 35 1923 86.21795 36.8 -44.1 127 1012 131.8367 46.4 192.4 367 579 -8.189655 36.2 -60.4 214 600 -26.00733 18.7 -79.8 118 735 52.27273 34.4...

Below is a set of data representing measurement to the nearest millimeter of triceps skin-fold of...

Below is a set of data representing measurement to the nearest millimeter of triceps skin-fold of seventh-grade boys: 4,7,13,12,10,6,7,17,9,8,13,21,6,9,10,9,8,7,5,9,15,17,19 8,9,10,7,4,20,10,9,15,12,18,12,8,9,9,11,10,7,8,15,6,19,10 9,9,7,13,12,8,22,10,5,8,9,11,13,15,19,17,11,9,8,9,10,11,7 9,14,7,16,10,9,8,9,6,14,11,12,9,8,15,6,10,17 Organize data into a grouped frequency distribution. Draw a histogram of the data Draw frequency polygon of the data Calculate cumulative frequency and cumulative percentiles for the data From the grouped frequency distribution, find the mode, median, and rang Transform the raw scores of 7, 10, 13, and 16 into corresponding T-scores, and percentiles.

1. The followings are the performance measures generated from multiple runs of the k-Means model on...

1. The followings are the performance measures generated from multiple runs of the k-Means model on the same dataset with different configurations. Which one presents the best k-Means model. Select one: a. SSE: 303; Davies-Bouldin index: 0.29 b. SSE: 706: Davies-Bouldin index: 0.85 c. SSE: 866; Davies-Bouldin index: 0.87 2. You have to specify which of the following parameters before the run of the DBSCAN algorithm. Select one: a. epsilon b. epsilon and MinPoints c. Number of clusters (k) d....

source: /** * A class to model an item (or set of items) in an *...

source: /** * A class to model an item (or set of items) in an * auction: a batch. */ public class BeerBatch { // A unique identifying number. private final int number; // A description of the batch. private String description; // The current highest offer for this batch. private Offer highestOffer; /** * Construct a BeerBatch, setting its number and description. * @param number The batch number. * @param description A description of this batch. */ public BeerBatch(int...

One standard that corporations use to evaluate their performance against their competitors is the set of...

One standard that corporations use to evaluate their performance against their competitors is the set of rankings developed by Fortune magazine. These include the Fortune 500, the 100 Best Companies to Work For, and other lists. The public also uses these rankings to decide to what companies they should give their business. Respond to the following in a minimum of 175 words: Discuss who gains and who loses when an economy opens for trade. Explain what determines exchange rates in...

Part 2 BeerBatch Class /** * A class to model an item (or set of items)...

Part 2 BeerBatch Class /** * A class to model an item (or set of items) in an * auction: a batch. */ public class BeerBatch { // A unique identifying number. private final int number; // A description of the batch. private String description; // The current highest offer for this batch. private Offer highestOffer; /** * Construct a BeerBatch, setting its number and description. * @param number The batch...

1. A good predictive model is one that fits the data closely whereas a good explanatory...

1. A good predictive model is one that fits the data closely whereas a good explanatory model is one that predicts new cases accurately. A. True B. False 2. The specificity of a classifier is its ability to detect the important class members correctly and sensitivity is its ability to rule out C0 members correctly. A. True B. False 3. This method of finding the best subset of predictors relies on partial, iterative search through the space of all possible...