For the CIR data, regress involact on race and interpret the coefficient. Test the claim that homeowners in zip codes with a high percentage of minority residents are being denied insurance at a higher rate than homeowners in other zip codes. What can regression analysis tell you about the insurance companies' claim that the discrepancy is due to greater risks in some zip codes?
zip race fire theft age volact involact income
60626 10.0 6.2 29 60.4 5.3 0.0 11744
60640 22.2 9.5 44 76.5 3.1 0.1 9323
60613 19.6 10.5 36 73.5 4.8 1.2 9948
60657 17.3 7.7 37 66.9 5.7 0.5 10656
60614 24.5 8.6 53 81.4 5.9 0.7 9730
60610 54.0 34.1 68 52.6 4.0 0.3 8231
60611 4.9 11.0 75 42.6 7.9 0.0 21480
60625 7.1 6.9 18 78.5 6.9 0.0 11104
60618 5.3 7.3 31 90.1 7.6 0.4 10694
60647 21.5 15.1 25 89.8 3.1 1.1 9631
60622 43.1 29.1 34 82.7 1.3 1.9 7995
60631 1.1 2.2 14 40.2 14.3 0.0 13722
60646 1.0 5.7 11 27.9 12.1 0.0 16250
60656 1.7 2.0 11 7.7 10.9 0.0 13686
60630 1.6 2.5 22 63.8 10.7 0.0 12405
60634 1.5 3.0 17 51.2 13.8 0.0 12198
60641 1.8 5.4 27 85.1 8.9 0.0 11600
60635 1.0 2.2 9 44.4 11.5 0.0 12765
60639 2.5 7.2 29 84.2 8.5 0.2 11084
60651 13.4 15.1 30 89.8 5.2 0.8 10510
60644 59.8 16.5 40 72.7 2.7 0.8 9784
60624 94.4 18.4 32 72.9 1.2 1.8 7342
60612 86.2 36.2 41 63.1 0.8 1.8 6565
60607 50.2 39.7 147 83.0 5.2 0.9 7459
60623 74.2 18.5 22 78.3 1.8 1.9 8014
60608 55.5 23.3 29 79.0 2.1 1.5 8177
60616 62.3 12.2 46 48.0 3.4 0.6 8212
60632 4.4 5.6 23 71.5 8.0 0.3 11230
60609 46.2 21.8 4 73.1 2.6 1.3 8330
60653 99.7 21.6 31 65.0 0.5 0.9 5583
60615 73.5 9.0 39 75.4 2.7 0.4 8564
60638 10.7 3.6 15 20.8 9.1 0.0 12102
60629 1.5 5.0 32 61.8 11.6 0.0 11876
60636 48.8 28.6 27 78.1 4.0 1.4 9742
60621 98.9 17.4 32 68.6 1.7 2.2 7520
60637 90.6 11.3 34 73.4 1.9 0.8 7388
60652 1.4 3.4 17 2.0 12.9 0.0 13842
60620 71.2 11.9 46 57.0 4.8 0.9 11040
60619 94.1 10.5 42 55.9 6.6 0.9 10332
60649 66.1 10.7 43 67.5 3.1 0.4 10908
60617 36.4 10.8 34 58.0 7.8 0.9 11156
60655 1.0 4.8 19 15.2 13.0 0.0 13323
60643 42.5 10.4 25 40.8 10.2 0.5 12960
60628 35.1 15.6 28 57.8 7.5 1.0 11260
60627 47.4 7.0 3 11.4 7.7 0.2 10080
60633 34.0 7.1 23 49.2 11.6 0.3 11428
60645 3.1 4.9 27 46.6 10.9 0.0 13731
1. Regressing involact on race in Excel (Data -> Data Analysis -> Regression, with race as the Input X Range and involact as the Input Y Range) gives the following regression output:
Term | Coefficients | Standard Error | t Stat | P-value
Intercept | 0.12921803 | 0.096610589 | 1.337514 | 0.187776061
race | 0.013882353 | 0.00203073 | 6.836139 | 1.78382E-08
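The p-value of the race coefficient is 1.78e-08, far below 0.05 (the cutoff for a 5% significance level), so race is a highly significant predictor of involact, the response variable in this model. The coefficient means that each additional percentage point of minority population in a zip code is associated with an average increase of about 0.0139 in involact; comparing a zip code with 0% minority to one with 100% minority, the fitted difference is 100 × 0.0139 ≈ 1.39. Because the coefficient is positive, zip codes with a higher minority percentage tend to have higher involact.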
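To check the Excel fit from part 1 without the Analysis ToolPak, here is a minimal pure-Python sketch (standard library only) of the same simple least-squares regression. `simple_ols` is a hypothetical helper written for this illustration, and the two columns are copied from the data table above.

```python
import math

# Columns copied from the CIR table above, in the same row order
RACE = [10.0, 22.2, 19.6, 17.3, 24.5, 54.0, 4.9, 7.1, 5.3, 21.5,
        43.1, 1.1, 1.0, 1.7, 1.6, 1.5, 1.8, 1.0, 2.5, 13.4,
        59.8, 94.4, 86.2, 50.2, 74.2, 55.5, 62.3, 4.4, 46.2, 99.7,
        73.5, 10.7, 1.5, 48.8, 98.9, 90.6, 1.4, 71.2, 94.1, 66.1,
        36.4, 1.0, 42.5, 35.1, 47.4, 34.0, 3.1]
INVOLACT = [0.0, 0.1, 1.2, 0.5, 0.7, 0.3, 0.0, 0.0, 0.4, 1.1,
            1.9, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 0.8,
            0.8, 1.8, 1.8, 0.9, 1.9, 1.5, 0.6, 0.3, 1.3, 0.9,
            0.4, 0.0, 0.0, 1.4, 2.2, 0.8, 0.0, 0.9, 0.9, 0.4,
            0.9, 0.0, 0.5, 1.0, 0.2, 0.3, 0.0]


def simple_ols(x, y):
    """Fit y = b0 + b1 * x by least squares.

    Returns (b0, b1, se_b0, se_b1), corresponding to Excel's
    Coefficients and Standard Error columns.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Residual variance estimate with n - 2 degrees of freedom
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)
    se_b1 = math.sqrt(s2 / sxx)
    se_b0 = math.sqrt(s2 * (1 / n + x_bar ** 2 / sxx))
    return b0, b1, se_b0, se_b1


b0, b1, se_b0, se_b1 = simple_ols(RACE, INVOLACT)
t_race = b1 / se_b1  # t Stat for H0: race coefficient = 0
print(f"intercept={b0:.6f} (se {se_b0:.6f})")
print(f"race={b1:.6f} (se {se_b1:.6f}), t={t_race:.3f}")
# Fitted involact gap between a 0% and a 100% minority zip code
print(f"fitted gap, 0% vs 100% minority: {100 * b1:.3f}")
```

The printed coefficients and standard errors should agree with the Excel table above. With SciPy installed, `2 * scipy.stats.t.sf(abs(t_race), len(RACE) - 2)` gives the two-sided p-value shown in the P-value column.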
2. Next, to compare involact (activity in the involuntary insurance market, a proxy for being denied regular coverage) between zip codes with lower and higher minority percentages, sort the data by race (minority %) and split it into two parts: the zip codes with lower race values and those with higher race values. Regressing involact on race and theft separately for each part gives the following output:
Lower race values:
Term | Coefficients | Standard Error | t Stat | P-value
Intercept | 0.007449542 | 0.111657083 | 0.066718047 | 0.94743729
race | 0.03693435 | 0.007752339 | 4.764284466 | 0.000104764
theft | -0.002351253 | 0.00413882 | -0.568097319 | 0.57599354
Higher race values:
Term | Coefficients | Standard Error | t Stat | P-value
Intercept | 0.523885956 | 0.415699173 | 1.260252583 | 0.222084665
race | 0.00938898 | 0.005680172 | 1.652939255 | 0.113955758
theft | -0.002524967 | 0.004508608 | -0.560032407 | 0.581676522
The race coefficient is larger in the lower-race model (0.03693435) than in the higher-race model (0.00938898), so involact increases more slowly with minority % among zip codes that already have a high minority percentage. Together with the positive, highly significant race coefficient from part 1, the data appear consistent with the claim that homeowners in zip codes with a high minority percentage are denied insurance at a higher rate than those in other zip codes. The higher-race model tells a more cautious story, however: its race coefficient has a p-value of 0.114 (> 0.05), so within the high-minority zip codes race is not a significant predictor of involact.
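The split-sample regressions in part 2 were run in Excel. A minimal pure-Python sketch of the same idea follows, assuming that "split into two parts" means sorting by race and putting the first n // 2 = 23 zip codes in the lower group and the remaining 24 in the higher group. The original split may have differed, so the numbers may not match the Excel tables exactly. The helpers `ols` and `invert` are hypothetical, written only for this illustration.

```python
import math

# Columns copied from the CIR table above, in the same row order
RACE = [10.0, 22.2, 19.6, 17.3, 24.5, 54.0, 4.9, 7.1, 5.3, 21.5,
        43.1, 1.1, 1.0, 1.7, 1.6, 1.5, 1.8, 1.0, 2.5, 13.4,
        59.8, 94.4, 86.2, 50.2, 74.2, 55.5, 62.3, 4.4, 46.2, 99.7,
        73.5, 10.7, 1.5, 48.8, 98.9, 90.6, 1.4, 71.2, 94.1, 66.1,
        36.4, 1.0, 42.5, 35.1, 47.4, 34.0, 3.1]
THEFT = [29, 44, 36, 37, 53, 68, 75, 18, 31, 25,
         34, 14, 11, 11, 22, 17, 27, 9, 29, 30,
         40, 32, 41, 147, 22, 29, 46, 23, 4, 31,
         39, 15, 32, 27, 32, 34, 17, 46, 42, 43,
         34, 19, 25, 28, 3, 23, 27]
INVOLACT = [0.0, 0.1, 1.2, 0.5, 0.7, 0.3, 0.0, 0.0, 0.4, 1.1,
            1.9, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 0.8,
            0.8, 1.8, 1.8, 0.9, 1.9, 1.5, 0.6, 0.3, 1.3, 0.9,
            0.4, 0.0, 0.0, 1.4, 2.2, 0.8, 0.0, 0.9, 0.9, 0.4,
            0.9, 0.0, 0.5, 1.0, 0.2, 0.3, 0.0]


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    m = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(m)]
           for i, row in enumerate(a)]
    for c in range(m):
        pivot = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        d = aug[c][c]
        aug[c] = [v / d for v in aug[c]]
        for r in range(m):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[m:] for row in aug]


def ols(predictors, y):
    """OLS with an intercept; returns (coefficients, standard errors), intercept first."""
    n = len(y)
    X = [[1.0] + [float(col[i]) for col in predictors] for i in range(n)]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    sse = sum((yi - sum(b * xij for b, xij in zip(beta, row))) ** 2
              for row, yi in zip(X, y))
    s2 = sse / (n - p)  # residual variance, n - p degrees of freedom
    se = [math.sqrt(s2 * xtx_inv[i][i]) for i in range(p)]
    return beta, se


# Assumption: sort by race; the lower group gets the first n // 2 zip codes
order = sorted(range(len(RACE)), key=lambda i: RACE[i])
half = len(order) // 2
groups = {"lower": order[:half], "higher": order[half:]}

results = {}
for name, idx in groups.items():
    beta, se = ols([[RACE[i] for i in idx], [THEFT[i] for i in idx]],
                   [INVOLACT[i] for i in idx])
    results[name] = (beta, se)
    for term, b, s in zip(["Intercept", "race", "theft"], beta, se):
        print(f"{name:>6} {term:<9} coef={b: .6f} se={s:.6f} t={b / s: .3f}")
```

Sorting by index keeps each zip code's race, theft and involact values together, and Python's stable sort makes the split deterministic when race values tie.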
3. The p-value for theft is very high in both of the models above (0.576 and 0.582), so theft appears to be a very poor predictor of involact once race is in the model. On this evidence, the insurance companies' claim that the discrepancy is due to greater risks in some zip codes does not seem to hold.
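The conclusion in part 3 relies on theft alone. One way to probe the risk explanation further, which is a different approach from the split-sample one used above, is to fit a single model on all 47 zip codes with race plus fire, theft and age (treating these three columns as the risk measures is an assumption) and check whether race remains significant. If it does, the recorded risk variables do not account for the race effect. No results are claimed here; the sketch only shows the computation, and `ols` and `invert` are hypothetical helpers written for this illustration.

```python
import math

# Columns copied from the CIR table above, in the same row order
RACE = [10.0, 22.2, 19.6, 17.3, 24.5, 54.0, 4.9, 7.1, 5.3, 21.5,
        43.1, 1.1, 1.0, 1.7, 1.6, 1.5, 1.8, 1.0, 2.5, 13.4,
        59.8, 94.4, 86.2, 50.2, 74.2, 55.5, 62.3, 4.4, 46.2, 99.7,
        73.5, 10.7, 1.5, 48.8, 98.9, 90.6, 1.4, 71.2, 94.1, 66.1,
        36.4, 1.0, 42.5, 35.1, 47.4, 34.0, 3.1]
FIRE = [6.2, 9.5, 10.5, 7.7, 8.6, 34.1, 11.0, 6.9, 7.3, 15.1,
        29.1, 2.2, 5.7, 2.0, 2.5, 3.0, 5.4, 2.2, 7.2, 15.1,
        16.5, 18.4, 36.2, 39.7, 18.5, 23.3, 12.2, 5.6, 21.8, 21.6,
        9.0, 3.6, 5.0, 28.6, 17.4, 11.3, 3.4, 11.9, 10.5, 10.7,
        10.8, 4.8, 10.4, 15.6, 7.0, 7.1, 4.9]
THEFT = [29, 44, 36, 37, 53, 68, 75, 18, 31, 25,
         34, 14, 11, 11, 22, 17, 27, 9, 29, 30,
         40, 32, 41, 147, 22, 29, 46, 23, 4, 31,
         39, 15, 32, 27, 32, 34, 17, 46, 42, 43,
         34, 19, 25, 28, 3, 23, 27]
AGE = [60.4, 76.5, 73.5, 66.9, 81.4, 52.6, 42.6, 78.5, 90.1, 89.8,
       82.7, 40.2, 27.9, 7.7, 63.8, 51.2, 85.1, 44.4, 84.2, 89.8,
       72.7, 72.9, 63.1, 83.0, 78.3, 79.0, 48.0, 71.5, 73.1, 65.0,
       75.4, 20.8, 61.8, 78.1, 68.6, 73.4, 2.0, 57.0, 55.9, 67.5,
       58.0, 15.2, 40.8, 57.8, 11.4, 49.2, 46.6]
INVOLACT = [0.0, 0.1, 1.2, 0.5, 0.7, 0.3, 0.0, 0.0, 0.4, 1.1,
            1.9, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 0.8,
            0.8, 1.8, 1.8, 0.9, 1.9, 1.5, 0.6, 0.3, 1.3, 0.9,
            0.4, 0.0, 0.0, 1.4, 2.2, 0.8, 0.0, 0.9, 0.9, 0.4,
            0.9, 0.0, 0.5, 1.0, 0.2, 0.3, 0.0]


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    m = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(m)]
           for i, row in enumerate(a)]
    for c in range(m):
        pivot = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        d = aug[c][c]
        aug[c] = [v / d for v in aug[c]]
        for r in range(m):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[m:] for row in aug]


def ols(predictors, y):
    """OLS with an intercept; returns (coefficients, standard errors), intercept first."""
    n = len(y)
    X = [[1.0] + [float(col[i]) for col in predictors] for i in range(n)]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    sse = sum((yi - sum(b * xij for b, xij in zip(beta, row))) ** 2
              for row, yi in zip(X, y))
    s2 = sse / (n - p)  # residual variance, n - p degrees of freedom
    se = [math.sqrt(s2 * xtx_inv[i][i]) for i in range(p)]
    return beta, se


names = ["Intercept", "race", "fire", "theft", "age"]
beta, se = ols([RACE, FIRE, THEFT, AGE], INVOLACT)
df = len(INVOLACT) - len(names)  # residual degrees of freedom
for term, b, s in zip(names, beta, se):
    print(f"{term:<9} coef={b: .6f} se={s:.6f} t={b / s: .3f}")
print("residual df:", df)
```

As in part 1, the two-sided p-value for each t statistic can be obtained with `2 * scipy.stats.t.sf(abs(t), df)` if SciPy is available.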