- Calculate the predicted probability for each observation from
the logistic regression model.
- Divide the data into two datasets. The first dataset contains
the observations whose actual value of the dependent variable is 1
(i.e. event), along with their predicted probabilities. The
second dataset contains the observations whose actual value of the
dependent variable is 0 (non-event), along with their predicted
probabilities.
- Compare each predicted value in the first dataset with each
predicted value in the second dataset.
- Total number of pairs to compare = x * y
x: Number of observations in the first dataset (actual value of 1
in the dependent variable)
y: Number of observations in the second dataset (actual value of 0
in the dependent variable)
In this step, we are performing a Cartesian product (cross
join) of events and non-events. For example, if you have 100
events and 1,000 non-events, it would create 100,000 (100 * 1,000)
pairs for comparison.
- A pair is concordant if 1 (observation with the desired outcome
i.e. event) has a higher predicted probability than 0 (observation
without the outcome i.e. non-event).
- A pair is discordant if 0 (observation without the desired
outcome i.e. non-event) has a higher predicted probability than 1
(observation with the outcome i.e. event).
- A pair is tied if 1 (observation with the desired outcome i.e.
event) has the same predicted probability as 0 (observation
without the outcome i.e. non-event).
- The final percent values are calculated using the formulas below
(multiply each ratio by 100 to express it as a percentage):
Percent Concordant = (Number of concordant pairs) / Total number
of pairs
Percent Discordant = (Number of discordant pairs) / Total number
of pairs
Percent Tied = (Number of tied pairs) / Total number of pairs
Area under curve (c statistic) = Percent Concordant + 0.5 *
Percent Tied
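
The steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not a reference implementation: the function name `concordance_stats`, the argument names `y_true` and `y_prob`, and the sample data are all made up for this example, and the predicted probabilities are assumed to come from an already fitted logistic regression model. It uses a brute-force cross join, so it is meant for small datasets.

```python
from itertools import product


def concordance_stats(y_true, y_prob):
    """Return pair counts, pair shares and the c statistic (hypothetical helper)."""
    # Split the predicted probabilities by actual outcome.
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]

    # Total number of pairs to compare = x * y.
    total_pairs = len(events) * len(non_events)
    if total_pairs == 0:
        raise ValueError("need at least one event and one non-event")

    concordant = discordant = tied = 0
    # Cartesian product (cross join) of events and non-events.
    for p_event, p_non_event in product(events, non_events):
        if p_event > p_non_event:
            concordant += 1
        elif p_event < p_non_event:
            discordant += 1
        else:
            tied += 1

    pct_concordant = concordant / total_pairs
    pct_discordant = discordant / total_pairs
    pct_tied = tied / total_pairs
    return {
        "total_pairs": total_pairs,
        "concordant": concordant,
        "discordant": discordant,
        "tied": tied,
        "percent_concordant": pct_concordant,
        "percent_discordant": pct_discordant,
        "percent_tied": pct_tied,
        "c_statistic": pct_concordant + 0.5 * pct_tied,
    }


# Made-up data: 2 events and 3 non-events give 2 * 3 = 6 pairs.
y_true = [1, 1, 0, 0, 0]
y_prob = [0.9, 0.6, 0.6, 0.7, 0.1]
stats = concordance_stats(y_true, y_prob)
print(stats)
```

With this sample data, 4 of the 6 pairs are concordant, 1 is discordant and 1 is tied, so the c statistic works out to 4/6 + 0.5 * 1/6 = 0.75. The values are returned as fractions between 0 and 1.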
Percent Concordant : Percentage of pairs where
the observation with the desired outcome (event) has a higher
predicted probability than the observation without the outcome
(non-event).
Percent Discordant : Percentage of pairs where the
observation with the desired outcome (event) has a lower predicted
probability than the observation without the outcome
(non-event).
Percent Tied : Percentage of pairs where the
observation with the desired outcome (event) has the same
predicted probability as the observation without the outcome
(non-event).
c statistic (AUC) : The c statistic is also called
area under curve (AUC). It is calculated by adding Percent
Concordant and 0.5 times Percent Tied.
In general, higher percentages of concordant pairs and lower
percentages of discordant and tied pairs indicate a more desirable
model.