Question

For data CIR, regress involact on race and interpret the coefficient. Test the claim that homeowners in zip codes with a high percentage of minority residents are being denied insurance at a higher rate than homeowners in other zip codes. What can regression analysis tell you about the insurance companies' claim that the discrepancy is due to greater risks in some zip codes?

zip race fire theft age volact involact income
60626 10.0 6.2 29 60.4 5.3 0.0 11744
60640 22.2 9.5 44 76.5 3.1 0.1 9323
60613 19.6 10.5 36 73.5 4.8 1.2 9948
60657 17.3 7.7 37 66.9 5.7 0.5 10656
60614 24.5 8.6 53 81.4 5.9 0.7 9730
60610 54.0 34.1 68 52.6 4.0 0.3 8231
60611 4.9 11.0 75 42.6 7.9 0.0 21480
60625 7.1 6.9 18 78.5 6.9 0.0 11104
60618 5.3 7.3 31 90.1 7.6 0.4 10694
60647 21.5 15.1 25 89.8 3.1 1.1 9631
60622 43.1 29.1 34 82.7 1.3 1.9 7995
60631 1.1 2.2 14 40.2 14.3 0.0 13722
60646 1.0 5.7 11 27.9 12.1 0.0 16250
60656 1.7 2.0 11 7.7 10.9 0.0 13686
60630 1.6 2.5 22 63.8 10.7 0.0 12405
60634 1.5 3.0 17 51.2 13.8 0.0 12198
60641 1.8 5.4 27 85.1 8.9 0.0 11600
60635 1.0 2.2 9 44.4 11.5 0.0 12765
60639 2.5 7.2 29 84.2 8.5 0.2 11084
60651 13.4 15.1 30 89.8 5.2 0.8 10510
60644 59.8 16.5 40 72.7 2.7 0.8 9784
60624 94.4 18.4 32 72.9 1.2 1.8 7342
60612 86.2 36.2 41 63.1 0.8 1.8 6565
60607 50.2 39.7 147 83.0 5.2 0.9 7459
60623 74.2 18.5 22 78.3 1.8 1.9 8014
60608 55.5 23.3 29 79.0 2.1 1.5 8177
60616 62.3 12.2 46 48.0 3.4 0.6 8212
60632 4.4 5.6 23 71.5 8.0 0.3 11230
60609 46.2 21.8 4 73.1 2.6 1.3 8330
60653 99.7 21.6 31 65.0 0.5 0.9 5583
60615 73.5 9.0 39 75.4 2.7 0.4 8564
60638 10.7 3.6 15 20.8 9.1 0.0 12102
60629 1.5 5.0 32 61.8 11.6 0.0 11876
60636 48.8 28.6 27 78.1 4.0 1.4 9742
60621 98.9 17.4 32 68.6 1.7 2.2 7520
60637 90.6 11.3 34 73.4 1.9 0.8 7388
60652 1.4 3.4 17 2.0 12.9 0.0 13842
60620 71.2 11.9 46 57.0 4.8 0.9 11040
60619 94.1 10.5 42 55.9 6.6 0.9 10332
60649 66.1 10.7 43 67.5 3.1 0.4 10908
60617 36.4 10.8 34 58.0 7.8 0.9 11156
60655 1.0 4.8 19 15.2 13.0 0.0 13323
60643 42.5 10.4 25 40.8 10.2 0.5 12960
60628 35.1 15.6 28 57.8 7.5 1.0 11260
60627 47.4 7.0 3 11.4 7.7 0.2 10080
60633 34.0 7.1 23 49.2 11.6 0.3 11428
60645 3.1 4.9 27 46.6 10.9 0.0 13731

Solutions

1. Regressing involact on race in Excel (Data -> Data Analysis -> Regression, with race as the Input X Range and involact as the Input Y Range) gives the following output:

           Coefficients   Standard Error   t Stat     P-value
Intercept  0.12921803     0.096610589      1.337514   0.187776061
race       0.013882353    0.00203073       6.836139   1.78382E-08
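The t Stat in the race row is the coefficient divided by its standard error. A worked check of the test, assuming a one-sided alternative because the claim is directional (the Excel p-value is two-sided, so the one-sided p-value is half of it):

```latex
\begin{aligned}
H_0 &: \beta_{\text{race}} = 0 \quad\text{vs.}\quad H_1: \beta_{\text{race}} > 0 \\
t &= \frac{\hat{\beta}_{\text{race}}}{\operatorname{SE}(\hat{\beta}_{\text{race}})}
   = \frac{0.013882353}{0.00203073} \approx 6.836, \qquad df = n - 2 = 47 - 2 = 45 \\
t &\approx 6.836 > t_{0.05,\,45} \approx 1.679, \qquad
p_{\text{one-sided}} = \tfrac{1}{2}\,(1.78382 \times 10^{-8}) \approx 8.9 \times 10^{-9} < 0.05
\;\Rightarrow\; \text{reject } H_0
\end{aligned}
```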

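The estimated race coefficient is 0.0139: each additional percentage point of minority residents in a zip code is associated with an increase of about 0.0139 in involact, the number of new FAIR plan (involuntary market) policies and renewals per 100 housing units. Across the full range, the fitted value rises from about 0.13 at 0% minority to about 0.13 + 100 × 0.0139 ≈ 1.52 at 100% minority. The p-value of the race coefficient is 1.7838E-08, far below the 0.05 significance level, so the null hypothesis that the coefficient is zero is rejected: race is a highly significant predictor of involact in this model. Since homeowners turn to the FAIR plan when they cannot get coverage in the voluntary market, this is consistent with the claim that high-minority zip codes are denied insurance at a higher rate.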
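For readers without Excel, a minimal stdlib-only Python sketch of the same simple regression. It hard-codes the race and involact columns from the data table (in row order) and applies the textbook formulas for the least-squares slope, intercept, and their standard errors; p-values are left out because the standard library has no t distribution. The names `simple_ols`, `RACE`, and `INVOLACT` are illustrative, not part of the original solution.

```python
import math

# race (% minority) and involact for the 47 zip codes, in table row order
RACE = [10.0, 22.2, 19.6, 17.3, 24.5, 54.0, 4.9, 7.1, 5.3, 21.5, 43.1, 1.1,
        1.0, 1.7, 1.6, 1.5, 1.8, 1.0, 2.5, 13.4, 59.8, 94.4, 86.2, 50.2,
        74.2, 55.5, 62.3, 4.4, 46.2, 99.7, 73.5, 10.7, 1.5, 48.8, 98.9, 90.6,
        1.4, 71.2, 94.1, 66.1, 36.4, 1.0, 42.5, 35.1, 47.4, 34.0, 3.1]
INVOLACT = [0.0, 0.1, 1.2, 0.5, 0.7, 0.3, 0.0, 0.0, 0.4, 1.1, 1.9, 0.0,
            0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 0.8, 0.8, 1.8, 1.8, 0.9,
            1.9, 1.5, 0.6, 0.3, 1.3, 0.9, 0.4, 0.0, 0.0, 1.4, 2.2, 0.8,
            0.0, 0.9, 0.9, 0.4, 0.9, 0.0, 0.5, 1.0, 0.2, 0.3, 0.0]


def simple_ols(x, y):
    """Least-squares line y = b0 + b1*x; returns (b0, b1, se_b0, se_b1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # residual variance s^2 uses n - 2 degrees of freedom (two estimated parameters)
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)
    se_b1 = math.sqrt(s2 / sxx)
    se_b0 = math.sqrt(s2 * (1 / n + x_bar ** 2 / sxx))
    return b0, b1, se_b0, se_b1


if __name__ == "__main__":
    b0, b1, se_b0, se_b1 = simple_ols(RACE, INVOLACT)
    for name, b, se in (("Intercept", b0, se_b0), ("race", b1, se_b1)):
        # t Stat = coefficient / standard error, as in the Excel output
        print(f"{name:<9}  coef {b:.6f}  SE {se:.6f}  t {b / se:.3f}")
```

Running it should print estimates that agree with the Excel table to the displayed precision.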

2. To compare how involact responds to race in low- and high-minority zip codes, sort the data by race (minority %) and split it into two parts: the zip codes with lower race values and those with higher race values. Regressing involact on race and theft separately for each part gives the following output:

Lower race values:

           Coefficients    Standard Error    t Stat         P-value
Intercept   0.007449542    0.111657083        0.066718047   0.94743729
race        0.03693435     0.007752339        4.764284466   0.000104764
theft      -0.002351253    0.00413882        -0.568097319   0.57599354

Higher race values:

           Coefficients    Standard Error    t Stat         P-value
Intercept   0.523885956    0.415699173        1.260252583   0.222084665
race        0.00938898     0.005680172        1.652939255   0.113955758
theft      -0.002524967    0.004508608       -0.560032407   0.581676522

In both parts the race coefficient is positive, but it is about four times larger in the lower-minority part (0.0369, p ≈ 0.0001) than in the higher-minority part (0.0094, p ≈ 0.114). So involact rises steeply with minority percentage among low-minority zip codes and more slowly among high-minority ones. Together with the significant positive slope on the full data, this supports the claim that homeowners in zip codes with a high percentage of minority residents are denied insurance at a higher rate than those in other zip codes. However, the p-value of 0.114 in the higher-race model means that, within the high-minority zip codes alone, race is not a significant predictor of involact at the 0.05 level.
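The split-and-regress step can be sketched in plain Python as below. This is an illustration under assumptions: the original does not say exactly where the data was split, so the sketch takes the 23 smallest race values as the lower part and the remaining 24 as the higher part, and its output may not match the tables above exactly. The two-predictor regression solves the normal equations with a small Gauss-Jordan inverse; `invert`, `ols`, and `split_by_race` are hypothetical helper names.

```python
import math

# Columns from the CIR table, in the same row order.
RACE = [10.0, 22.2, 19.6, 17.3, 24.5, 54.0, 4.9, 7.1, 5.3, 21.5, 43.1, 1.1,
        1.0, 1.7, 1.6, 1.5, 1.8, 1.0, 2.5, 13.4, 59.8, 94.4, 86.2, 50.2,
        74.2, 55.5, 62.3, 4.4, 46.2, 99.7, 73.5, 10.7, 1.5, 48.8, 98.9, 90.6,
        1.4, 71.2, 94.1, 66.1, 36.4, 1.0, 42.5, 35.1, 47.4, 34.0, 3.1]
THEFT = [29, 44, 36, 37, 53, 68, 75, 18, 31, 25, 34, 14,
         11, 11, 22, 17, 27, 9, 29, 30, 40, 32, 41, 147,
         22, 29, 46, 23, 4, 31, 39, 15, 32, 27, 32, 34,
         17, 46, 42, 43, 34, 19, 25, 28, 3, 23, 27]
INVOLACT = [0.0, 0.1, 1.2, 0.5, 0.7, 0.3, 0.0, 0.0, 0.4, 1.1, 1.9, 0.0,
            0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 0.8, 0.8, 1.8, 1.8, 0.9,
            1.9, 1.5, 0.6, 0.3, 1.3, 0.9, 0.4, 0.0, 0.0, 1.4, 2.2, 0.8,
            0.0, 0.9, 0.9, 0.4, 0.9, 0.0, 0.5, 1.0, 0.2, 0.3, 0.0]


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(a)
    # augment a with the identity matrix: [a | I] -> [I | a^-1]
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(k)]
         for i, row in enumerate(a)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(k):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[k:] for row in m]


def ols(columns, y):
    """Fit y on an intercept plus the predictor columns; returns (coefs, SEs), intercept first."""
    n, k = len(y), len(columns) + 1
    X = [[1.0] + [float(col[i]) for col in columns] for i in range(n)]
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    coefs = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    sse = sum((yi - sum(c * v for c, v in zip(coefs, row))) ** 2 for row, yi in zip(X, y))
    s2 = sse / (n - k)  # residual variance with n - k degrees of freedom
    ses = [math.sqrt(s2 * inv[i][i]) for i in range(k)]
    return coefs, ses


def split_by_race(race, *columns):
    """Sort rows by race and cut them into a lower and a higher part.

    ASSUMPTION: the lower part is the n // 2 smallest race values.
    """
    order = sorted(range(len(race)), key=lambda i: race[i])
    cut = len(order) // 2

    def take(idx):
        return [[col[i] for i in idx] for col in (race, *columns)]

    return take(order[:cut]), take(order[cut:])


if __name__ == "__main__":
    lower, higher = split_by_race(RACE, THEFT, INVOLACT)
    for label, (race, theft, involact) in (("Lower", lower), ("Higher", higher)):
        coefs, ses = ols([race, theft], involact)
        print(f"{label} race values (n = {len(race)}):")
        for name, b, se in zip(("Intercept", "race", "theft"), coefs, ses):
            print(f"  {name:<9} {b: .6f}  SE {se:.6f}  t {b / se: .3f}")
```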

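3. The p-value for theft is high in both of the above models (0.576 and 0.582), so theft is a poor predictor of involact once race is in the model, while race stays significant in the lower-race model even with theft included. Hence the insurance companies' claim that the discrepancy is due to greater risks in some zip codes seems wrong, at least with theft as the measure of risk; the fire and age columns, which could also reflect risk, are not used in this check.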
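A more direct regression check of the risk explanation is to see whether the race coefficient shrinks once risk measures enter the model. The sketch below is not part of the original solution and its results are not reported here: it fits involact on race alone and on race plus fire, theft, and age for all 47 zip codes, treating those three columns as risk proxies (an assumption). If the race coefficient stays positive and significant with the risk measures included, risk alone does not explain the discrepancy; if it drops toward zero, the insurers' explanation gains support. The helpers `invert` and `ols` are hypothetical names, written with the same normal-equations approach as a plain-Python stand-in for a statistics package.

```python
import math

# Columns from the CIR table, in the same row order.
RACE = [10.0, 22.2, 19.6, 17.3, 24.5, 54.0, 4.9, 7.1, 5.3, 21.5, 43.1, 1.1,
        1.0, 1.7, 1.6, 1.5, 1.8, 1.0, 2.5, 13.4, 59.8, 94.4, 86.2, 50.2,
        74.2, 55.5, 62.3, 4.4, 46.2, 99.7, 73.5, 10.7, 1.5, 48.8, 98.9, 90.6,
        1.4, 71.2, 94.1, 66.1, 36.4, 1.0, 42.5, 35.1, 47.4, 34.0, 3.1]
FIRE = [6.2, 9.5, 10.5, 7.7, 8.6, 34.1, 11.0, 6.9, 7.3, 15.1, 29.1, 2.2,
        5.7, 2.0, 2.5, 3.0, 5.4, 2.2, 7.2, 15.1, 16.5, 18.4, 36.2, 39.7,
        18.5, 23.3, 12.2, 5.6, 21.8, 21.6, 9.0, 3.6, 5.0, 28.6, 17.4, 11.3,
        3.4, 11.9, 10.5, 10.7, 10.8, 4.8, 10.4, 15.6, 7.0, 7.1, 4.9]
THEFT = [29, 44, 36, 37, 53, 68, 75, 18, 31, 25, 34, 14,
         11, 11, 22, 17, 27, 9, 29, 30, 40, 32, 41, 147,
         22, 29, 46, 23, 4, 31, 39, 15, 32, 27, 32, 34,
         17, 46, 42, 43, 34, 19, 25, 28, 3, 23, 27]
AGE = [60.4, 76.5, 73.5, 66.9, 81.4, 52.6, 42.6, 78.5, 90.1, 89.8, 82.7, 40.2,
       27.9, 7.7, 63.8, 51.2, 85.1, 44.4, 84.2, 89.8, 72.7, 72.9, 63.1, 83.0,
       78.3, 79.0, 48.0, 71.5, 73.1, 65.0, 75.4, 20.8, 61.8, 78.1, 68.6, 73.4,
       2.0, 57.0, 55.9, 67.5, 58.0, 15.2, 40.8, 57.8, 11.4, 49.2, 46.6]
INVOLACT = [0.0, 0.1, 1.2, 0.5, 0.7, 0.3, 0.0, 0.0, 0.4, 1.1, 1.9, 0.0,
            0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 0.8, 0.8, 1.8, 1.8, 0.9,
            1.9, 1.5, 0.6, 0.3, 1.3, 0.9, 0.4, 0.0, 0.0, 1.4, 2.2, 0.8,
            0.0, 0.9, 0.9, 0.4, 0.9, 0.0, 0.5, 1.0, 0.2, 0.3, 0.0]


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(a)
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(k)]
         for i, row in enumerate(a)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(k):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[k:] for row in m]


def ols(columns, y):
    """Fit y on an intercept plus the predictor columns; returns (coefs, SEs), intercept first."""
    n, k = len(y), len(columns) + 1
    X = [[1.0] + [float(col[i]) for col in columns] for i in range(n)]
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    coefs = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    sse = sum((yi - sum(c * v for c, v in zip(coefs, row))) ** 2 for row, yi in zip(X, y))
    s2 = sse / (n - k)  # residual variance with n - k degrees of freedom
    return coefs, [math.sqrt(s2 * inv[i][i]) for i in range(k)]


if __name__ == "__main__":
    data = {"race": RACE, "fire": FIRE, "theft": THEFT, "age": AGE}
    models = {"race only": ["race"],
              "race + fire + theft + age": ["race", "fire", "theft", "age"]}
    for label, names in models.items():
        coefs, ses = ols([data[name] for name in names], INVOLACT)
        b, se = coefs[1], ses[1]  # race is always the first predictor
        print(f"{label:<26} race coef {b:.5f}  SE {se:.5f}  t {b / se:.2f}")
```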

