How can we use concordant pairs (the c-index) to perform cross-validation for a logistic regression model?

Expert Solution

Calculate the predicted probability in logistic regression model.
Divide the data into two datasets. One dataset contains observations having actual value of dependent variable with value 1 (i.e. event) and corresponding predicted probability values. And the other dataset contains observations having actual value of dependent variable 0 (non-event) against their predicted probability scores.
Compare each predicted value in first dataset with each predicted value in second dataset.
Total Number of pairs to compare = x * y
x: Number of observations in first dataset (actual values of 1 in dependent variable)
y: Number of observations in second dataset (actual values of 0 in dependent variable).
In this step, we are performing cartesian product (cross join) of events and non-events. For example, you have 100 events and 1000 non-events. It would create 100k (100*1000) pairs for comparison.
A pair is concordant if 1 (observation with the desired outcome i.e. event) has a higher predicted probability than 0 (observation without the outcome i.e. non-event).
A pair is discordant if 0 (observation without the desired outcome i.e. non-event) has a higher predicted probability than 1 (observation with the outcome i.e. event).
A pair is tied if 1 (observation with the desired outcome i.e. event) has same predicted probability than 0 (observation without the outcome i.e. non-event).
The final percent values are calculated using the formula below -

Percent Concordant = (Number of concordant pairs)/Total number of pairs
Percent Discordance = (Number of discordant pairs)/Total number of pairs
Percent Tied = (Number of tied pairs)/Total number of pairs
Area under curve (c statistics) = Percent Concordant + 0.5 * Percent Tied

Percent Concordant : Percentage of pairs where the observation with the desired outcome (event) has a higher predicted probability than the observation without the outcome (non-event).

Percent Discordant : Percentage of pairs where the observation with the desired outcome (event) has a lower predicted probability than the observation without the outcome (non-event).

Percent Tied : Percentage of pairs where the observation with the desired outcome (event) has same predicted probability than the observation without the outcome (non-event).

c statistics (AUC) : c-statistics is also called area under curve (AUC). It is calculated by adding Concordance Percent and 0.5 times of Tied Percent

In general, higher percentages of concordant pairs and lower percentages of discordant and tied pairs indicate a more desirable model.

orchestra answered 2 years ago

Cross validation: If we perform k-fold cross validation, training a Naive Bayes model, how many models...

Cross validation: If we perform k-fold cross validation, training a Naive Bayes model, how many models will we end up creating?

One can use a fitted logistic regression model to predict the probability/odd of an interest of...

One can use a fitted logistic regression model to predict the probability/odd of an interest of choice represented by an dependent variable given a set of the values of the independent variables in the model. True or False

define the logistic regression model.

A multivariate logistic regression model has been built to predict the propensity of shoppers to perform...

A multivariate logistic regression model has been built to predict the propensity of shoppers to perform a repeat purchase of a free gift that they are given. The descriptive features used by the model are the age of the customer, the socio-economic band to which the customer belongs (a, b, or c), the average amount of money the customer spends on each visit to the shop, and the average number of visits the customer makes to the shop per week....

What is binary logistic regression, and how to use it?

Logistic Regression In logistic regression we are interested in determining the outcome of a categorical variable....

Logistic Regression In logistic regression we are interested in determining the outcome of a categorical variable. In most cases, we deal with binomial logistic regression with the binary response variable, for example yes/no, passed/failed, true/false, and others. Recall that logistic regression can be applied to classification problems when we want to determine a class of an event based on the values of its features. In this assignment we will use the heart data located at http://archive.ics.uci.edu/ml/datasets/Statlog+%28Heart%29 Here is the...

A logistic regression model describes how the probability of voting for the Republican candidate in a...

A logistic regression model describes how the probability of voting for the Republican candidate in a U.S. presidential election depends on x_1=family income, x_2=number of years of education, and s=sex (1=male, 0 = female), the prediction equation is "logit"[P ̂ (y=1)] = -2.40+0.02(x_1) +0.08(x_2)+0.20s. For this sample, ranges from 6 to 157 with a standard deviation of 25, and ranges from 7 to 20 with a standard deviation of 3. Interpret the sign of the coefficients of income and years...

Develop the best logistic regression model that can predict the wage by using the combination of...

Develop the best logistic regression model that can predict the wage by using the combination of any following variables: total unit (X2), constructed unit (X3), equipment used (X4), city location (X5) and total cost of a project (X6). Make sure that you partition your data with 60% training test, 40% validation test, and default seed of 12345 before running the logistic regression (15 points) Wage - X1 Total Unit - X2 Contracted Units - X3 Equipment Used - X4 City...

How can marketing managers use regression model to forecast sales?

Question

How can we use concordant pairs (the c-index) to perform cross-validation for a logistic regression model?

Solutions

Expert Solution

Related Solutions

Cross validation: If we perform k-fold cross validation, training a Naive Bayes model, how many models...

One can use a fitted logistic regression model to predict the probability/odd of an interest of...

define the logistic regression model.

A multivariate logistic regression model has been built to predict the propensity of shoppers to perform...

What is binary logistic regression, and how to use it?

What is binary logistic regression, and how to use it?

Logistic Regression In logistic regression we are interested in determining the outcome of a categorical variable....

A logistic regression model describes how the probability of voting for the Republican candidate in a...

Develop the best logistic regression model that can predict the wage by using the combination of...

How can marketing managers use regression model to forecast sales?