Run a regression analysis and find the best model to predict White Speck count from the cotton fiber properties given below. Show all the steps you used to arrive at the best model. Submit your answer as a Word or text file.
Harvdate date of cotton | Cotton fiber Length | Cotton fiber Strength | Short fiber content | Cotton fineness | Immature fiber content | Cotton trash count | Cotton dust count | Cotton nep count | y=White Specks |
1 | 1.06 | 31.8 | 21.8 | 196 | 6.36 | 75 | 404 | 253 | 17.8 |
1 | 1.06 | 31.0 | 21.0 | 197 | 6.05 | 76 | 292 | 247 | 11.6 |
1 | 1.07 | 30.3 | 24.0 | 193 | 6.98 | 102 | 390 | 291 | 11.0 |
1 | 1.06 | 30.6 | 20.9 | 196 | 6.58 | 49 | 188 | 297 | 10.2 |
1 | 1.04 | 31.0 | 25.7 | 195 | 7.24 | 67 | 298 | 262 | 10.6 |
1 | 1.05 | 30.5 | 25.0 | 196 | 7.05 | 37 | 181 | 262 | 10.8 |
3 | 1.05 | 30.7 | 20.6 | 198 | 6.02 | 63 | 259 | 247 | 11.0 |
3 | 1.04 | 30.3 | 20.5 | 199 | 5.93 | 29 | 131 | 220 | 7.6 |
3 | 1.05 | 29.5 | 21.0 | 197 | 6.46 | 43 | 194 | 306 | 9.6 |
3 | 1.05 | 29.3 | 19.8 | 198 | 6.17 | 31 | 187 | 258 | 6.0 |
3 | 1.04 | 30.5 | 21.6 | 199 | 6.62 | 59 | 278 | 310 | 14.0 |
3 | 1.05 | 30.2 | 21.9 | 197 | 6.87 | 32 | 172 | 272 | 13.4 |
4 | 1.03 | 30.7 | 23.5 | 198 | 6.47 | 88 | 339 | 275 | 14.8 |
4 | 1.04 | 30.0 | 20.5 | 194 | 5.92 | 69 | 264 | 236 | 16.4 |
4 | 1.03 | 29.5 | 24.9 | 195 | 6.99 | 104 | 382 | 347 | 17.4 |
4 | 1.03 | 29.3 | 21.7 | 196 | 7.11 | 72 | 270 | 297 | 16.4 |
4 | 1.02 | 29.6 | 22.7 | 196 | 6.64 | 115 | 348 | 270 | 17.2 |
4 | 1.01 | 30.6 | 20.7 | 197 | 6.29 | 64 | 270 | 239 | 17.4 |
4 | 1.04 | 30.8 | 24.4 | 193 | 6.44 | 118 | 412 | 300 | 25.2 |
4 | 1.05 | 30.6 | 24.4 | 197 | 6.77 | 94 | 346 | 298 | 18.8 |
4 | 1.04 | 30.0 | 24.1 | 196 | 6.60 | 90 | 323 | 282 | 21.8 |
4 | 1.05 | 29.4 | 21.3 | 195 | 6.44 | 83 | 255 | 261 | 18.0 |
4 | 1.01 | 29.6 | 22.2 | 196 | 5.95 | 120 | 409 | 227 | 11.4 |
4 | 1.02 | 29.9 | 22.6 | 196 | 6.60 | 88 | 311 | 268 | 18.6 |
5 | 1.04 | 30.3 | 24.8 | 196 | 7.35 | 91 | 266 | 295 | 19.8 |
5 | 1.05 | 29.1 | 23.4 | 196 | 7.08 | 72 | 274 | 291 | 13.4 |
5 | 1.03 | 29.3 | 27.0 | 197 | 7.49 | 139 | 514 | 330 | 18.2 |
5 | 1.04 | 28.7 | 23.1 | 196 | 7.37 | 71 | 271 | 310 | 17.0 |
5 | 1.01 | 29.0 | 23.1 | 196 | 6.81 | 79 | 326 | 284 | 13.2 |
5 | 1.00 | 29.2 | 24.4 | 197 | 7.10 | 60 | 272 | 270 | 19.4 |
5 | 1.06 | 30.1 | 23.9 | 196 | 7.01 | 142 | 464 | 310 | 19.0 |
5 | 1.06 | 29.7 | 22.1 | 197 | 6.61 | 92 | 296 | 268 | 20.4 |
5 | 1.04 | 29.8 | 22.6 | 194 | 6.56 | 113 | 347 | 246 | 31.6 |
5 | 1.05 | 29.6 | 21.7 | 193 | 6.40 | 120 | 391 | 290 | 18.8 |
5 | 1.06 | 29.8 | 21.7 | 195 | 6.75 | 140 | 432 | 285 | 17.2 |
5 | 1.06 | 30.2 | 20.1 | 197 | 6.83 | 100 | 351 | 256 | 22.0 |
6 | 1.04 | 28.1 | 22.8 | 197 | 6.46 | 65 | 218 | 327 | 25.2 |
6 | 1.03 | 28.8 | 23.6 | 199 | 6.43 | 63 | 217 | 247 | 26.6 |
6 | 1.03 | 28.8 | 24.4 | 198 | 6.86 | 80 | 294 | 294 | 28.4 |
6 | 1.03 | 28.8 | 22.7 | 197 | 7.26 | 61 | 257 | 313 | 16.2 |
6 | 1.03 | 28.8 | 23.8 | 197 | 6.56 | 81 | 293 | 313 | 17.4 |
6 | 1.01 | 29.0 | 21.6 | 196 | 6.49 | 67 | 262 | 256 | 16.6 |
6 | 1.03 | 29.4 | 23.0 | 198 | 6.35 | 80 | 294 | 267 | 22.8 |
6 | 1.03 | 29.4 | 23.7 | 197 | 6.49 | 60 | 215 | 267 | 10.0 |
6 | 1.05 | 29.3 | 21.1 | 195 | 6.26 | 70 | 241 | 255 | 11.6 |
6 | 1.03 | 29.1 | 24.9 | 197 | 6.89 | 70 | 237 | 266 | 13.4 |
6 | 1.05 | 28.8 | 22.6 | 199 | 7.09 | 100 | 318 | 321 | 18.2 |
6 | 1.04 | 28.7 | 24.0 | 197 | 6.90 | 76 | 261 | 319 | 17.2 |
7 | 1.04 | 30.0 | 25.2 | 195 | 7.02 | 85 | 431 | 277 | 21.4 |
7 | 1.03 | 28.7 | 23.1 | 193 | 6.44 | 66 | 280 | 322 | 18.6 |
7 | 1.04 | 28.8 | 23.0 | 194 | 7.18 | 78 | 376 | 298 | 18.6 |
7 | 1.04 | 28.5 | 21.0 | 196 | 6.67 | 56 | 230 | 298 | 16.0 |
7 | 1.01 | 28.4 | 22.3 | 195 | 6.36 | 69 | 280 | 262 | 12.0 |
7 | 1.03 | 28.7 | 22.1 | 194 | 6.66 | 64 | 257 | 296 | 11.6 |
7 | 1.04 | 29.9 | 22.4 | 195 | 6.83 | 103 | 361 | 276 | 17.0 |
7 | 1.03 | 29.9 | 21.6 | 194 | 6.34 | 64 | 196 | 237 | 18.4 |
7 | 1.04 | 28.3 | 21.3 | 192 | 6.14 | 84 | 251 | 260 | 17.6 |
7 | 1.04 | 28.7 | 21.2 | 193 | 5.87 | 81 | 280 | 297 | 22.2 |
7 | 1.04 | 29.2 | 22.6 | 193 | 6.25 | 88 | 290 | 291 | 20.4 |
7 | 1.04 | 28.4 | 19.9 | 194 | 6.30 | 68 | 212 | 286 | 24.8 |
The regression output is:
R² | 0.358
Adjusted R² | 0.242
R | 0.598
Std. Error | 4.455
n | 60
k | 9
Dep. Var. | y=White Specks
ANOVA table
Source | SS | df | MS | F | p-value
Regression | 552.7251 | 9 | 61.4139 | 3.09 | .0050
Residual | 992.5443 | 50 | 19.8509
Total | 1,545.2693 | 59
Regression output | confidence interval
variables | coefficients | std. error | t (df=50) | p-value | 95% lower | 95% upper | VIF
Intercept | -70.6480
Harvdate date of cotton | 1.5150 | 0.5459 | 2.775 | .0077 | 0.4186 | 2.6114 | 2.846 |
Cotton fiber Length | 12.3279 | 49.7865 | 0.248 | .8054 | -87.6711 | 112.3270 | 1.619 |
Cotton fiber Strength | 1.2766 | 1.3843 | 0.922 | .3609 | -1.5038 | 4.0570 | 3.713 |
Short fiber content | 0.4694 | 0.5532 | 0.849 | .4001 | -0.6417 | 1.5805 | 2.269 |
Cotton fineness | 0.0948 | 0.3752 | 0.253 | .8015 | -0.6588 | 0.8485 | 1.224 |
Immature fiber content | -0.9472 | 2.1974 | -0.431 | .6683 | -5.3609 | 3.4664 | 2.241 |
Cotton trash count | 0.1006 | 0.0530 | 1.900 | .0632 | -0.0057 | 0.2070 | 5.325 |
Cotton dust count | -0.0156 | 0.0184 | -0.845 | .4021 | -0.0526 | 0.0215 | 6.071 |
Cotton nep count | 0.0124 | 0.0301 | 0.411 | .6828 | -0.0481 | 0.0728 | 2.066 |
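The summary statistics in this output follow directly from the ANOVA sums of squares. The short sketch below recomputes them from the table values as a sanity check; the variable names are my own, and the numbers are copied from the output above.

```python
import math

# Values copied from the first ANOVA table above.
ss_regression = 552.7251
ss_residual = 992.5443
n = 60  # observations
k = 9   # predictors

ss_total = ss_regression + ss_residual
df_regression = k
df_residual = n - k - 1  # 50

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

r_squared = ss_regression / ss_total      # share of variation explained
multiple_r = math.sqrt(r_squared)         # the "R" row of the output
f_statistic = ms_regression / ms_residual # overall F test
std_error = math.sqrt(ms_residual)        # residual standard error

print(f"R^2        = {r_squared:.3f}")
print(f"R          = {multiple_r:.3f}")
print(f"F          = {f_statistic:.2f}")
print(f"Std. Error = {std_error:.3f}")
```

The printed values should agree with the R², R, F, and Std. Error rows of the output to the precision shown there.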
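Every predictor has a VIF below 10 (the largest is 6.071, for Cotton dust count), so multicollinearity is not severe enough to justify dropping a variable on that basis, and all nine predictors stay in the initial regression.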
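To make the VIF check more concrete: the VIF of predictor j is 1 / (1 - R_j²), where R_j² comes from regressing that predictor on all the others. The sketch below inverts that formula for the VIF values in the first output. The helper name `implied_r_squared` is my own, and the cutoff of 10 is the rule of thumb used in this answer, not a hard limit.

```python
def implied_r_squared(vif):
    """Invert VIF_j = 1 / (1 - R_j^2) to recover the auxiliary R_j^2."""
    return 1.0 - 1.0 / vif

# VIF column copied from the first regression output above.
vifs = {
    "Harvdate date of cotton": 2.846,
    "Cotton fiber Length": 1.619,
    "Cotton fiber Strength": 3.713,
    "Short fiber content": 2.269,
    "Cotton fineness": 1.224,
    "Immature fiber content": 2.241,
    "Cotton trash count": 5.325,
    "Cotton dust count": 6.071,
    "Cotton nep count": 2.066,
}

VIF_CUTOFF = 10  # rule-of-thumb threshold assumed in this answer

# List predictors from most to least collinear with the others.
for name, vif in sorted(vifs.items(), key=lambda item: item[1], reverse=True):
    flag = "drop?" if vif >= VIF_CUTOFF else "ok"
    print(f"{name:<26} VIF={vif:.3f}  R_j^2={implied_r_squared(vif):.3f}  {flag}")
```

Cotton dust count and Cotton trash count are the most strongly related to the other predictors, but both stay under the cutoff.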
Next, apply backward elimination: at each step, remove the predictor with the highest p-value (the least significant one) and refit the model.
Cotton fiber Length has the highest p-value (0.8054), so it is removed first.
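The selection rule can be sketched in a few lines. This is only an illustration: the function names are mine, the 0.05 stopping threshold is an assumption (no significance level is stated here), and `fit_p_values` is a hypothetical callback standing in for whatever software refits the regression and returns the p-values.

```python
def next_to_drop(p_values, alpha=0.05):
    """Return the predictor with the largest p-value, or None if all are <= alpha."""
    name, p = max(p_values.items(), key=lambda item: item[1])
    return name if p > alpha else None

def backward_eliminate(predictors, fit_p_values, alpha=0.05):
    """Refit and drop the least significant predictor until none exceeds alpha.

    fit_p_values is a hypothetical callback: given a list of predictor names,
    it refits the model and returns {name: p-value}.
    """
    kept = list(predictors)
    while kept:
        worst = next_to_drop(fit_p_values(kept), alpha)
        if worst is None:
            break
        kept.remove(worst)
    return kept

# p-values copied from the first regression output above.
first_fit = {
    "Harvdate date of cotton": 0.0077,
    "Cotton fiber Length": 0.8054,
    "Cotton fiber Strength": 0.3609,
    "Short fiber content": 0.4001,
    "Cotton fineness": 0.8015,
    "Immature fiber content": 0.6683,
    "Cotton trash count": 0.0632,
    "Cotton dust count": 0.4021,
    "Cotton nep count": 0.6828,
}

print(next_to_drop(first_fit))
```

Note that the p-values must be recomputed after every removal, since dropping one predictor changes the estimates for the rest.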
Running the regression again, we get:
R² | 0.357
Adjusted R² | 0.256
R | 0.597
Std. Error | 4.414
n | 60
k | 8
Dep. Var. | y=White Specks
ANOVA table
Source | SS | df | MS | F | p-value
Regression | 551.5079 | 8 | 68.9385 | 3.54 | .0025
Residual | 993.7614 | 51 | 19.4855
Total | 1,545.2693 | 59
Regression output | confidence interval
variables | coefficients | std. error | t (df=51) | p-value | 95% lower | 95% upper | VIF
Intercept | -57.9993
Harvdate date of cotton | 1.4954 | 0.5351 | 2.795 | .0073 | 0.4212 | 2.5697 | 2.786 |
Cotton fiber Strength | 1.3786 | 1.3094 | 1.053 | .2974 | -1.2501 | 4.0072 | 3.384 |
Short fiber content | 0.4193 | 0.5101 | 0.822 | .4149 | -0.6047 | 1.4434 | 1.965 |
Cotton fineness | 0.0803 | 0.3672 | 0.219 | .8278 | -0.6569 | 0.8174 | 1.194 |
Immature fiber content | -0.8658 | 2.1526 | -0.402 | .6892 | -5.1874 | 3.4557 | 2.191 |
Cotton trash count | 0.1035 | 0.0512 | 2.021 | .0486 | 0.0007 | 0.2063 | 5.073 |
Cotton dust count | -0.0165 | 0.0179 | -0.921 | .3613 | -0.0524 | 0.0195 | 5.830 |
Cotton nep count | 0.0149 | 0.0280 | 0.533 | .5966 | -0.0413 | 0.0711 | 1.824 |
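One way to confirm that removing Cotton fiber Length cost almost nothing is a partial F test on the two residual sums of squares. The sketch below uses the values from the 9-predictor and 8-predictor ANOVA tables. For a single dropped variable, the partial F equals the square of that variable's t statistic in the larger model, so its square root should reproduce the t of 0.248 reported for Cotton fiber Length.

```python
import math

# Residual sums of squares copied from the two ANOVA tables above.
sse_full = 992.5443     # 9 predictors, residual df = 50
df_full = 50
sse_reduced = 993.7614  # 8 predictors (Cotton fiber Length removed)
dropped = 1             # number of predictors removed

# Partial F: extra residual variation per dropped predictor,
# relative to the residual mean square of the larger model.
partial_f = ((sse_reduced - sse_full) / dropped) / (sse_full / df_full)

print(f"partial F = {partial_f:.4f}")
print(f"sqrt(F)   = {math.sqrt(partial_f):.3f}")
```

A partial F this far below 1 indicates that the dropped variable explained less variation than would be expected from noise alone.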
Repeat the backward elimination step on this model.
Cotton fineness now has the highest p-value (0.8278), so it is removed next.
Running the regression again, we get:
R² | 0.356
Adjusted R² | 0.270
R | 0.597
Std. Error | 4.374
n | 60
k | 7
Dep. Var. | y=White Specks
ANOVA table
Source | SS | df | MS | F | p-value
Regression | 550.5762 | 7 | 78.6537 | 4.11 | .0012
Residual | 994.6931 | 52 | 19.1287
Total | 1,545.2693 | 59
Regression output | confidence interval
variables | coefficients | std. error | t (df=52) | p-value | 95% lower | 95% upper | VIF
Intercept | -42.7089
Harvdate date of cotton | 1.4823 | 0.5268 | 2.814 | .0069 | 0.4251 | 2.5395 | 2.751 |
Cotton fiber Strength | 1.3847 | 1.2970 | 1.068 | .2906 | -1.2179 | 3.9874 | 3.383 |
Short fiber content | 0.4239 | 0.5050 | 0.839 | .4051 | -0.5895 | 1.4372 | 1.962 |
Immature fiber content | -0.7970 | 2.1099 | -0.378 | .7072 | -5.0308 | 3.4368 | 2.144 |
Cotton trash count | 0.1025 | 0.0506 | 2.028 | .0477 | 0.0011 | 0.2040 | 5.035 |
Cotton dust count | -0.0168 | 0.0177 | -0.947 | .3478 | -0.0523 | 0.0187 | 5.802 |
Cotton nep count | 0.0146 | 0.0277 | 0.528 | .6000 | -0.0410 | 0.0702 | 1.820 |
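Across the three fits so far, R² barely moves while adjusted R² rises, which is the pattern expected when the removed variables contribute little. The sketch below recomputes both from the residual sums of squares in the three ANOVA tables; the helper name `adjusted_r_squared` is my own.

```python
SS_TOTAL = 1545.2693  # total sum of squares, same in every fit
N = 60                # observations

# (number of predictors k, residual SS) copied from the three ANOVA tables above.
fits = [
    (9, 992.5443),
    (8, 993.7614),
    (7, 994.6931),
]

def adjusted_r_squared(ss_residual, k, n=N, ss_total=SS_TOTAL):
    """Adjusted R^2 = 1 - [SSE / (n - k - 1)] / [SST / (n - 1)]."""
    return 1 - (ss_residual / (n - k - 1)) / (ss_total / (n - 1))

for k, sse in fits:
    r2 = 1 - sse / SS_TOTAL
    adj = adjusted_r_squared(sse, k)
    print(f"k={k}: R^2={r2:.3f}  adjusted R^2={adj:.3f}")
```

Adjusted R² penalizes each extra predictor through the residual degrees of freedom, so it can increase when a weak predictor is removed even though plain R² can only decrease.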
Repeat the backward elimination step once more.
Immature fiber content now has the highest p-value (0.7072), so it is removed next.
Refitting after each removal and applying the same rule, the remaining eliminations are:
Cotton nep count, with the highest p-value = 0.6754.
Cotton dust count, with the highest p-value = 0.3527.
Short fiber content, with the highest p-value = 0.4831.
Cotton fiber Strength, with the highest p-value = 0.4368.
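The final model keeps the two predictors left after these removals, Harvdate date of cotton and Cotton trash count:
White Specks = b0 + b1 × (Harvdate date of cotton) + b2 × (Cotton trash count), where b0, b1, and b2 are the coefficient estimates from the last regression run.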
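Seven of the nine predictors are removed in the steps above, which leaves Harvdate date of cotton and Cotton trash count. The sketch below shows one way the two-predictor model could be refit from the data table using ordinary least squares via the normal equations. It is plain Python with no statistics package, the function names are my own, and it does not compute p-values.

```python
# Columns copied from the data table above (60 rows, in table order).
harvdate = [1] * 6 + [3] * 6 + [4] * 12 + [5] * 12 + [6] * 12 + [7] * 12
trash = [
    75, 76, 102, 49, 67, 37, 63, 29, 43, 31, 59, 32,
    88, 69, 104, 72, 115, 64, 118, 94, 90, 83, 120, 88,
    91, 72, 139, 71, 79, 60, 142, 92, 113, 120, 140, 100,
    65, 63, 80, 61, 81, 67, 80, 60, 70, 70, 100, 76,
    85, 66, 78, 56, 69, 64, 103, 64, 84, 81, 88, 68,
]
white_specks = [
    17.8, 11.6, 11.0, 10.2, 10.6, 10.8, 11.0, 7.6, 9.6, 6.0, 14.0, 13.4,
    14.8, 16.4, 17.4, 16.4, 17.2, 17.4, 25.2, 18.8, 21.8, 18.0, 11.4, 18.6,
    19.8, 13.4, 18.2, 17.0, 13.2, 19.4, 19.0, 20.4, 31.6, 18.8, 17.2, 22.0,
    25.2, 26.6, 28.4, 16.2, 17.4, 16.6, 22.8, 10.0, 11.6, 13.4, 18.2, 17.2,
    21.4, 18.6, 18.6, 16.0, 12.0, 11.6, 17.0, 18.4, 17.6, 22.2, 20.4, 24.8,
]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_ols(columns, y):
    """Least-squares fit with an intercept; returns (coefficients, SSE, SST)."""
    rows = [[1.0] + list(vals) for vals in zip(*columns)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return beta, sse, sst

beta, sse, sst = fit_ols([harvdate, trash], white_specks)
b0, b1, b2 = beta
print(f"White Specks = {b0:.4f} + {b1:.4f}*Harvdate + {b2:.4f}*Trash")
print(f"R^2 = {1 - sse / sst:.3f}")
```

The total sum of squares computed here should match the 1,545.2693 shown in the ANOVA tables, which is a quick check that the columns were copied correctly. In practice a statistics package would be used instead, so that p-values and confidence intervals for the two remaining coefficients can be checked as well.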