This case study concerns a bank's efforts to calculate credit risk scores. (These are the opposite of credit scores: the higher the value, the riskier the customer.) A loan officer at a bank needs to identify characteristics that indicate people who are likely to default on loans, and to use those characteristics to distinguish good credit risks from bad ones. The loan officer also needs to better quantify an individual’s credit risk level.
Information on 700 past customers is given in the file along with data for the following variables:
Age: customer age in years.
Employment: years that the customer has been with his/her current employer
Address: years that the customer has lived at his/her current address
Income: household annual income (in $1,000s)
Debt_to_Income: debt to income ratio (x100)
Risk_Score: credit risk score (the higher the score, the more risky)
Default_Indicator: an indicator of whether the customer had previously defaulted
Test the hypothesis that the average credit risk score of customers who have previously defaulted on their loans is higher than those who haven’t defaulted on their loans. (Divide the credit risk score into two groups those who previously defaulted and those who didn’t and then compare the two groups.)
You would like to build a regression model to predict the credit risk score based on all the other variables. What is the regression equation?
What is the predicted credit risk score of a customer aged 40, who has been with their employer for 10 years, has lived at the same address for 3 years, has an income of $200,000, has a debt to income ratio of 2.5, and has previously defaulted on a loan? Will the prediction be accurate? Explain.
Compute the coefficient of determination (R²) and fully interpret its meaning.
Interpret the meaning of the coefficients of the independent variables income and address.
Which independent variables are significant? Defend your answer.
Age in years | Years with current employer | Years at current address | Household income in thousands | Debt to income ratio (x100) | Previously Defaulted | Credit Risk Score |
41 | 17 | 12 | 176 | 9.3 | 1 | 808.3943274 |
27 | 10 | 6 | 31 | 17.3 | 0 | 198.2974762 |
40 | 15 | 14 | 55 | 5.5 | 0 | 10.0361081 |
41 | 15 | 14 | 120 | 2.9 | 0 | 22.13828376 |
24 | 2 | 0 | 28 | 17.3 | 1 | 781.5883142 |
41 | 5 | 5 | 25 | 10.2 | 0 | 216.7089415 |
39 | 20 | 9 | 67 | 30.6 | 0 | 185.9601084 |
43 | 12 | 11 | 38 | 3.6 | 0 | 14.70865349 |
24 | 3 | 4 | 19 | 24.4 | 1 | 748.0412036 |
36 | 0 | 13 | 25 | 19.7 | 0 | 815.0570131 |
27 | 0 | 1 | 16 | 1.7 | 0 | 350.309226 |
25 | 4 | 0 | 23 | 5.2 | 0 | 239.0539023 |
52 | 24 | 14 | 64 | 10 | 0 | 9.790173473 |
37 | 6 | 9 | 29 | 16.3 | 0 | 364.4940475 |
48 | 22 | 15 | 100 | 9.1 | 0 | 11.87390385 |
36 | 9 | 6 | 49 | 8.6 | 1 | 96.70407786 |
36 | 13 | 6 | 41 | 16.4 | 1 | 212.0503906 |
43 | 23 | 19 | 72 | 7.6 | 0 | 1.404870603 |
39 | 6 | 9 | 61 | 5.7 | 0 | 104.1453903 |
41 | 0 | 21 | 26 | 1.7 | 0 | 91.9180135 |
39 | 22 | 3 | 52 | 3.2 | 0 | 4.373536462 |
47 | 17 | 21 | 43 | 5.6 | 0 | 3.047352362 |
28 | 3 | 6 | 26 | 10 | 0 | 293.9321797 |
29 | 8 | 6 | 27 | 9.8 | 0 | 106.7996198 |
21 | 1 | 2 | 16 | 18 | 1 | 629.7774553 |
25 | 0 | 2 | 32 | 17.6 | 0 | 861.3134014 |
45 | 9 | 26 | 69 | 6.7 | 0 | 16.46115799 |
43 | 25 | 21 | 64 | 16.7 | 0 | 1.437993467 |
33 | 12 | 8 | 58 | 18.4 | 0 | 276.7066755 |
26 | 2 | 1 | 37 | 14.2 | 0 | 503.3218674 |
45 | 3 | 15 | 20 | 2.1 | 0 | 76.41958523 |
30 | 1 | 10 | 22 | 10.5 | 0 | 433.6994251 |
27 | 2 | 7 | 26 | 6 | 0 | 288.7388759 |
25 | 8 | 4 | 27 | 14.4 | 0 | 231.1006843 |
25 | 8 | 1 | 35 | 2.9 | 0 | 74.95719559 |
26 | 6 | 7 | 45 | 26 | 0 | 950.0535168 |
30 | 10 | 4 | 22 | 16.1 | 0 | 211.9564036 |
32 | 12 | 1 | 54 | 14.4 | 0 | 335.9969153 |
28 | 1 | 8 | 24 | 17.1 | 1 | 643.9032953 |
45 | 23 | 5 | 50 | 4.2 | 0 | 2.268753579 |
23 | 7 | 2 | 31 | 6.6 | 0 | 132.8782071 |
34 | 17 | 3 | 59 | 8 | 0 | 31.76854323 |
42 | 7 | 23 | 41 | 4.6 | 0 | 31.90347933 |
39 | 19 | 5 | 48 | 13.1 | 0 | 28.07933138 |
26 | 0 | 0 | 14 | 7.5 | 1 | 511.04996 |
21 | 0 | 1 | 16 | 6.8 | 0 | 453.6168743 |
35 | 13 | 15 | 35 | 4.5 | 0 | 10.78188877 |
47 | 4 | 2 | 26 | 10.4 | 0 | 281.6573725 |
23 | 0 | 2 | 21 | 11.4 | 1 | 621.7847698 |
(first part) Test the hypothesis that the average credit risk score of customers who have previously defaulted on their loans is higher than those who haven’t defaulted on their loans. (Divide the credit risk score into two groups those who previously defaulted and those who didn’t and then compare the two groups.)
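A two-sample t-test assuming equal variances was run in MS Excel on the 49 listed customers. The null hypothesis is that the mean credit risk scores of the two groups are equal; the alternative is that the mean score of customers who previously defaulted (1) is higher than that of customers who did not (0). The one-tailed p-value (0.000129) is below the usual significance level alpha=0.05, so the null hypothesis is rejected: the average credit risk score is higher for customers who previously defaulted. The t statistic is negative because the mean of the defaulted group is subtracted from the mean of the non-defaulted group.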
t-Test: Two-Sample Assuming Equal Variances | ||
previously defaulted(0) | previously defaulted(1) | |
Mean | 206.9791737 | 561.4770882 |
Variance | 58383.0311 | 62496.89295 |
Observations | 40 | 9 |
Pooled Variance | 59083.26291 | |
Hypothesized Mean Difference | 0 | |
df | 47 | |
t Stat | -3.953071406 | |
P(T<=t) one-tail | 0.000129087 | |
t Critical one-tail(0.05) | 1.677926722 | |
P(T<=t) two-tail | 0.000258174 | |
t Critical two-tail(0.05) | 2.01174048 |
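The pooled-variance figures in the Excel output can be reproduced from the group means, variances, and sizes alone. The following is a minimal sketch using only those summary statistics; the variable names are illustrative, and the critical value is copied from the table rather than computed.

```python
import math

# Summary statistics copied from the Excel output above.
mean_0, var_0, n_0 = 206.9791737, 58383.0311, 40   # previously defaulted = 0
mean_1, var_1, n_1 = 561.4770882, 62496.89295, 9   # previously defaulted = 1

# Degrees of freedom and pooled variance for the equal-variance t-test.
df = n_0 + n_1 - 2
pooled_var = ((n_0 - 1) * var_0 + (n_1 - 1) * var_1) / df

# Standard error of the difference in means, then the t statistic
# (group 0 minus group 1, the same order as in the Excel output).
std_err = math.sqrt(pooled_var * (1 / n_0 + 1 / n_1))
t_stat = (mean_0 - mean_1) / std_err

# One-tailed critical value taken from the table (alpha = 0.05, df = 47).
t_critical_one_tail = 1.677926722
reject_h0 = t_stat < -t_critical_one_tail

print(f"df = {df}, pooled variance = {pooled_var:.2f}, t = {t_stat:.4f}")
print("Reject H0 (defaulted group has the higher mean):", reject_h0)
```

Because the difference is taken as group 0 minus group 1, evidence that the defaulted group scores higher shows up as a large negative t statistic, which is why the comparison is against the negative of the critical value.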
(second part) What is the regression equation?
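credit_risk_score = 216.46 - 2.06*x1 - 25.25*x2 - 5.39*x3 + 3.87*x4 + 19.86*x5 + 103.74*x6
where x1 = age in years, x2 = years with current employer, x3 = years at current address, x4 = household income in thousands, x5 = debt to income ratio (x100), and x6 = previously defaulted (1 = yes, 0 = no). The coefficients are rounded to two decimals from the following regression output, which was generated in MS Excel.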
Regression Statistics | ||||||
Multiple R | 0.891660903 | |||||
R Square | 0.795059166 | |||||
Adjusted R Square | 0.765781904 | |||||
Standard Error | 134.3698582 | |||||
Observations | 49 | |||||
ANOVA | ||||||
df | SS | MS | F | Significance F | ||
Regression | 6 | 2941873.257 | 490312.2 | 27.1562 | 5.73E-13 | |
Residual | 42 | 758320.8694 | 18055.26 | |||
Total | 48 | 3700194.126 | ||||
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | |
Intercept | 216.4603585 | 121.8743419 | 1.776095 | 0.082962 | -29.492 | 462.4127 |
Age in years(x1) | -2.061511871 | 4.069594181 | -0.50656 | 0.61511 | -10.2743 | 6.151262 |
Years with current employer(x2) | -25.24501541 | 3.941245778 | -6.40534 | 1.04E-07 | -33.1988 | -17.2913 |
Years at current address(x3) | -5.391063761 | 4.066615102 | -1.32569 | 0.192109 | -13.5978 | 2.815698 |
Household income(x4) | 3.865940338 | 0.921752112 | 4.194121 | 0.000138 | 2.005769 | 5.726111 |
Debt to income ratio (x5) | 19.86416094 | 3.148379715 | 6.309328 | 1.43E-07 | 13.51047 | 26.21785 |
Previously Defaulted(x6) | 103.7410698 | 55.6767152 | 1.863276 | 0.069426 | -8.61909 | 216.1012 |
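(third part) What is the predicted credit risk score of a customer aged x1=40, who has been with their employer for x2=10 years, has lived at the same address for x3=3 years, has an income of $200,000 (x4=200, since income is measured in thousands), has a debt to income ratio of x5=2.5, and has previously defaulted (x6=1) on a loan? Will the prediction be accurate?
credit_risk_score = 216.46 - 2.06*40 - 25.25*10 - 5.39*3 + 3.87*200 + 19.86*2.5 + 103.74*1 = 792.78
(Using the unrounded coefficients from the regression output gives about 791.97.)
The prediction should be treated with caution. The standard error of the regression is about 134 points, so an individual prediction can be far from the actual score, and an income of $200,000 is above the largest income among the listed customers ($176,000), which makes this an extrapolation beyond the data used to fit the model.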
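The substitution can be checked term by term. The sketch below evaluates the fitted equation for this customer twice, once with the two-decimal coefficients used in the equation and once with the unrounded coefficients from the Excel output; the dictionary keys are illustrative names, not part of the original file.

```python
# Customer from the question; income is entered in thousands of dollars.
customer = {"age": 40, "employment": 10, "address": 3,
            "income": 200, "debt_to_income": 2.5, "defaulted": 1}

# Coefficients rounded to two decimals, as in the regression equation.
intercept_rounded = 216.46
coef_rounded = {"age": -2.06, "employment": -25.25, "address": -5.39,
                "income": 3.87, "debt_to_income": 19.86, "defaulted": 103.74}

# Unrounded coefficients copied from the Excel regression output.
intercept_full = 216.4603585
coef_full = {"age": -2.061511871, "employment": -25.24501541,
             "address": -5.391063761, "income": 3.865940338,
             "debt_to_income": 19.86416094, "defaulted": 103.7410698}


def predict(intercept, coefficients, values):
    """Evaluate intercept + sum of coefficient * value over all predictors."""
    return intercept + sum(coefficients[name] * values[name] for name in coefficients)


pred_rounded = predict(intercept_rounded, coef_rounded, customer)
pred_full = predict(intercept_full, coef_full, customer)

print(f"Rounded coefficients:   {pred_rounded:.2f}")
print(f"Unrounded coefficients: {pred_full:.2f}")
```

With the rounded coefficients the sum comes to about 792.78, and with the unrounded coefficients to about 791.97; the gap comes entirely from rounding, mostly in the income term, which is multiplied by 200.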
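(fourth part) Compute the coefficient of determination (R²) and fully interpret its meaning.
coefficient of determination (R²) = 0.795. About 79.5% of the variation in credit risk score among the 49 customers is explained by the six independent variables taken together; the remaining 20.5% is not explained by the model. The adjusted R² of 0.766 corrects this figure for the number of predictors in the model.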
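The coefficient of determination follows directly from the sums of squares in the ANOVA table. A short sketch, assuming only the values printed in the Excel output (the variable names are illustrative):

```python
# Sums of squares copied from the ANOVA table.
ss_regression = 2941873.257
ss_residual = 758320.8694
ss_total = ss_regression + ss_residual

n_obs = 49        # observations used in the regression
n_predictors = 6  # independent variables x1..x6

# R squared: share of total variation explained by the regression.
r_squared = ss_regression / ss_total

# Adjusted R squared penalizes the number of predictors.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# F statistic: mean square regression over mean square residual.
f_stat = (ss_regression / n_predictors) / (ss_residual / (n_obs - n_predictors - 1))

print(f"R squared = {r_squared:.4f}")
print(f"Adjusted R squared = {adj_r_squared:.4f}")
print(f"F = {f_stat:.3f}")
```

These reproduce the "R Square", "Adjusted R Square", and "F" entries of the Excel output to within rounding.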
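(fifth part)
Interpret the meaning of the coefficients of the independent variables income and address.
Which independent variables are significant? Defend your answer.
Income (x4): holding the other variables constant, each additional $1,000 of household income is associated with an increase of about 3.87 points in the predicted credit risk score.
Address (x3): holding the other variables constant, each additional year at the current address is associated with a decrease of about 5.39 points in the predicted credit risk score, although this effect is not statistically significant.
At the usual significance level alpha=0.05, years with current employer (p-value 1.04E-07), household income (p-value 0.000138), and debt to income ratio (p-value 1.43E-07) are significant: their p-values are below 0.05 and their 95% confidence intervals exclude zero. Age (p-value 0.615), years at current address (p-value 0.192), and previously defaulted (p-value 0.069) are not significant: their p-values are above 0.05 and their 95% confidence intervals include zero.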
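The significance decision can be laid out as a simple screen over the regression output. The sketch below copies each coefficient, standard error, and p-value from the Excel table, recomputes the t statistic as coefficient divided by standard error, and keeps the variables whose p-value is below alpha; the data structure and names are illustrative.

```python
# (coefficient, standard error, p-value) copied from the regression output.
terms = {
    "Age in years": (-2.061511871, 4.069594181, 0.61511),
    "Years with current employer": (-25.24501541, 3.941245778, 1.04e-07),
    "Years at current address": (-5.391063761, 4.066615102, 0.192109),
    "Household income": (3.865940338, 0.921752112, 0.000138),
    "Debt to income ratio": (19.86416094, 3.148379715, 1.43e-07),
    "Previously Defaulted": (103.7410698, 55.6767152, 0.069426),
}

alpha = 0.05

# t statistic for each term is its coefficient divided by its standard error.
t_stats = {name: coef / se for name, (coef, se, _) in terms.items()}

# A term is significant at level alpha when its p-value is below alpha.
significant = [name for name, (_, _, p) in terms.items() if p < alpha]

for name, (coef, se, p) in terms.items():
    verdict = "significant" if p < alpha else "not significant"
    print(f"{name}: t = {t_stats[name]:.3f}, p = {p:.3g} -> {verdict}")
```

Changing `alpha` to 0.10 would also admit the previously-defaulted indicator (p-value 0.069), which shows how the conclusion depends on the chosen significance level.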