9.13
Using the SHHS data in Table 2.10, fit all possible multiple regression models (without interactions) that predict the y variable, serum total cholesterol, from diastolic blood pressure, systolic blood pressure, alcohol, carbon monoxide and cotinine. Scrutinize your results to understand how the x variables act in conjunction. For these data, which is the "best" multiple regression model for cholesterol? What percentage of variation does it explain?
Serum total cholesterol (mmol/l) | Diastolic blood pressure (mmHg) | Systolic blood pressure (mmHg) | Alcohol (g/day) | Cigarettes (no./day) | Carbon monoxide (ppm) | Cotinine (ng/ml) | CHD (1=yes, 2=no) |
5.75 | 80 | 121 | 5.4 | 0 | 6 | 13 | 2 |
6.76 | 83 | 139 | 64.6 | 0 | 4 | 3 | 2 |
6.47 | 76 | 113 | 21.5 | 20 | 21 | 284 | 2 |
7.11 | 79 | 124 | 8.2 | 40 | 57 | 395 | 2 |
5.42 | 100 | 127 | 24.4 | 20 | 29 | 283 | 2 |
7.04 | 79 | 148 | 13.6 | 0 | 3 | 0 | 2 |
5.75 | 79 | 124 | 54.6 | 0 | 3 | 1 | 2 |
7.14 | 100 | 127 | 6.2 | 0 | 1 | 0 | 2 |
6.1 | 79 | 138 | 0 | 0 | 1 | 3 | 2 |
6.55 | 85 | 133 | 2.4 | 0 | 2 | 0 | 2 |
6.29 | 92 | 141 | 0 | 0 | 7 | 0 | 2 |
5.98 | 100 | 183 | 21.5 | 20 | 55 | 245 | 1 |
5.71 | 78 | 119 | 50.2 | 0 | 14 | 424 | 2 |
6.89 | 90 | 143 | 16.7 | 0 | 4 | 0 | 1 |
4.9 | 85 | 132 | 40.6 | 4 | 7 | 82 | 2 |
6.23 | 88 | 139 | 16.7 | 25 | 24 | 324 | 2 |
7.71 | 109 | 154 | 7.2 | 1 | 3 | 11 | 1 |
5.73 | 93 | 136 | 10.8 | 0 | 2 | 0 | 1 |
6.54 | 100 | 149 | 26 | 0 | 3 | 0 | 2 |
7.16 | 73 | 107 | 2.9 | 25 | 29 | 315 | 1 |
6.13 | 92 | 132 | 23.9 | 0 | 2 | 2 | 2 |
6.25 | 87 | 123 | 31.1 | 0 | 7 | 10 | 2 |
5.19 | 97 | 141 | 12 | 0 | 3 | 4 | 1 |
6.05 | 74 | 118 | 23.9 | 0 | 3 | 0 | 2 |
7.12 | 85 | 133 | 24.4 | 0 | 2 | 0 | 2 |
5.71 | 88 | 121 | 45.4 | 0 | 8 | 2 | 2 |
6.19 | 69 | 129 | 24.8 | 15 | 40 | 367 | 1 |
6.73 | 98 | 129 | 52.6 | 15 | 21 | 233 | 2 |
5.34 | 70 | 123 | 38.3 | 1 | 2 | 7 | 2 |
4.79 | 82 | 127 | 23.9 | 0 | 2 | 1 | 2 |
6.78 | 74 | 104 | 4.8 | 0 | 4 | 7 | 2 |
6.1 | 88 | 123 | 86.1 | 0 | 3 | 1 | 1 |
4.35 | 88 | 128 | 15.5 | 20 | 11 | 554 | 2 |
7.1 | 79 | 136 | 7.4 | 10 | 9 | 189 | 1 |
5.85 | 102 | 150 | 4.1 | 0 | 6 | 0 | 2 |
6.74 | 68 | 109 | 1.2 | 15 | 15 | 230 | 2 |
7.55 | 80 | 135 | 92.1 | 25 | 29 | 472 | 2 |
7.86 | 78 | 131 | 23.9 | 6 | 55 | 407 | 1 |
6.92 | 101 | 137 | 2.5 | 0 | 3 | 0 | 2 |
6.64 | 97 | 139 | 119.6 | 40 | 16 | 298 | 2 |
6.46 | 76 | 142 | 62.2 | 40 | 31 | 404 | 1 |
5.99 | 73 | 108 | 0 | 0 | 2 | 4 | 2 |
5.39 | 77 | 112 | 11 | 30 | 11 | 251 | 2 |
6.35 | 81 | 133 | 16.2 | 0 | 3 | 0 | 2 |
5.86 | 88 | 147 | 88.5 | 0 | 3 | 0 | 2 |
5.64 | 65 | 111 | 0 | 20 | 16 | 271 | 2 |
6.6 | 102 | 149 | 65.8 | 0 | 3 | 1 | 2 |
6.76 | 75 | 140 | 12.4 | 0 | 2 | 0 | 2 |
5.51 | 75 | 125 | 0 | 25 | 16 | 441 | 2 |
7.15 | 92 | 131 | 31.1 | 20 | 36 | 434 | 1 |
I fitted a multiple linear regression model in SPSS. None of the predictors is statistically significant, and the R squared value is correspondingly low: the proposed model explains only 14.5% of the total variation in cholesterol. I am attaching the model summary here.
However, the model satisfies all the assumptions of multiple linear regression. I am attaching that output as well.
From the table it is clear that there is no multicollinearity in the data.
The residuals are also normally distributed.
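The exercise asks for all possible models rather than a single fit, so the comparison can also be sketched outside SPSS. The Python snippet below is only an illustration: it fits every non-empty subset of the five predictors by ordinary least squares and ranks the 31 models by adjusted R squared. The choice of adjusted R squared as the ranking criterion, the short column names, and the helper `fit_r2` are my assumptions and do not come from the exercise; the data are the cholesterol, blood pressure, alcohol, carbon monoxide and cotinine columns of the table above (cigarettes and CHD are left out because the question does not list them as predictors).

```python
from itertools import combinations
import numpy as np

# Six values per case: cholesterol, DBP, SBP, alcohol, carbon monoxide, cotinine.
# Two cases per line, copied from the table (cigarettes and CHD omitted).
RAW = """
5.75 80 121 5.4 6 13      6.76 83 139 64.6 4 3
6.47 76 113 21.5 21 284   7.11 79 124 8.2 57 395
5.42 100 127 24.4 29 283  7.04 79 148 13.6 3 0
5.75 79 124 54.6 3 1      7.14 100 127 6.2 1 0
6.1 79 138 0 1 3          6.55 85 133 2.4 2 0
6.29 92 141 0 7 0         5.98 100 183 21.5 55 245
5.71 78 119 50.2 14 424   6.89 90 143 16.7 4 0
4.9 85 132 40.6 7 82      6.23 88 139 16.7 24 324
7.71 109 154 7.2 3 11     5.73 93 136 10.8 2 0
6.54 100 149 26 3 0       7.16 73 107 2.9 29 315
6.13 92 132 23.9 2 2      6.25 87 123 31.1 7 10
5.19 97 141 12 3 4        6.05 74 118 23.9 3 0
7.12 85 133 24.4 2 0      5.71 88 121 45.4 8 2
6.19 69 129 24.8 40 367   6.73 98 129 52.6 21 233
5.34 70 123 38.3 2 7      4.79 82 127 23.9 2 1
6.78 74 104 4.8 4 7       6.1 88 123 86.1 3 1
4.35 88 128 15.5 11 554   7.1 79 136 7.4 9 189
5.85 102 150 4.1 6 0      6.74 68 109 1.2 15 230
7.55 80 135 92.1 29 472   7.86 78 131 23.9 55 407
6.92 101 137 2.5 3 0      6.64 97 139 119.6 16 298
6.46 76 142 62.2 31 404   5.99 73 108 0 2 4
5.39 77 112 11 11 251     6.35 81 133 16.2 3 0
5.86 88 147 88.5 3 0      5.64 65 111 0 16 271
6.6 102 149 65.8 3 1      6.76 75 140 12.4 2 0
5.51 75 125 0 16 441      7.15 92 131 31.1 36 434
"""
data = np.array(RAW.split(), dtype=float).reshape(-1, 6)
y, X = data[:, 0], data[:, 1:]
names = ["dbp", "sbp", "alcohol", "co", "cotinine"]  # hypothetical short labels


def fit_r2(y, X_sub):
    """Fit OLS with an intercept; return (R squared, adjusted R squared)."""
    n, p = X_sub.shape
    A = np.column_stack([np.ones(n), X_sub])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    r2 = 1.0 - ss_res / ss_tot
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return r2, adj


# Enumerate all 2^5 - 1 = 31 non-empty predictor subsets.
results = []
for k in range(1, len(names) + 1):
    for idx in combinations(range(len(names)), k):
        r2, adj = fit_r2(y, X[:, list(idx)])
        results.append((adj, r2, [names[i] for i in idx]))

# Rank by adjusted R squared (an assumed criterion) and show the top five.
results.sort(key=lambda r: r[0], reverse=True)
for adj, r2, subset in results[:5]:
    print(f"adj R2 = {adj:6.3f}   R2 = {r2:6.3f}   {' + '.join(subset)}")
```

Plain R squared can never decrease when a predictor is added, so the full model always has the highest value; adjusted R squared penalizes extra terms and is one common way to pick a "best" subset. Other criteria, such as F-tests between nested models, may order the models differently.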