Question

Run a regression analysis and find the best model to predict White Speck count from the cotton fiber properties given below. Show all the steps you used to arrive at the best model. Submit as a Word or text file.

 
Harvdate (date of cotton) | Cotton fiber Length | Cotton fiber Strength | Short fiber content | Cotton fineness | Immature fiber content | Cotton trash count | Cotton dust count | Cotton nep count | y = White Specks
1 1.06 31.8 21.8 196 6.36 75 404 253 17.8
1 1.06 31.0 21.0 197 6.05 76 292 247 11.6
1 1.07 30.3 24.0 193 6.98 102 390 291 11.0
1 1.06 30.6 20.9 196 6.58 49 188 297 10.2
1 1.04 31.0 25.7 195 7.24 67 298 262 10.6
1 1.05 30.5 25.0 196 7.05 37 181 262 10.8
3 1.05 30.7 20.6 198 6.02 63 259 247 11.0
3 1.04 30.3 20.5 199 5.93 29 131 220 7.6
3 1.05 29.5 21.0 197 6.46 43 194 306 9.6
3 1.05 29.3 19.8 198 6.17 31 187 258 6.0
3 1.04 30.5 21.6 199 6.62 59 278 310 14.0
3 1.05 30.2 21.9 197 6.87 32 172 272 13.4
4 1.03 30.7 23.5 198 6.47 88 339 275 14.8
4 1.04 30.0 20.5 194 5.92 69 264 236 16.4
4 1.03 29.5 24.9 195 6.99 104 382 347 17.4
4 1.03 29.3 21.7 196 7.11 72 270 297 16.4
4 1.02 29.6 22.7 196 6.64 115 348 270 17.2
4 1.01 30.6 20.7 197 6.29 64 270 239 17.4
4 1.04 30.8 24.4 193 6.44 118 412 300 25.2
4 1.05 30.6 24.4 197 6.77 94 346 298 18.8
4 1.04 30.0 24.1 196 6.60 90 323 282 21.8
4 1.05 29.4 21.3 195 6.44 83 255 261 18.0
4 1.01 29.6 22.2 196 5.95 120 409 227 11.4
4 1.02 29.9 22.6 196 6.60 88 311 268 18.6
5 1.04 30.3 24.8 196 7.35 91 266 295 19.8
5 1.05 29.1 23.4 196 7.08 72 274 291 13.4
5 1.03 29.3 27.0 197 7.49 139 514 330 18.2
5 1.04 28.7 23.1 196 7.37 71 271 310 17.0
5 1.01 29.0 23.1 196 6.81 79 326 284 13.2
5 1.00 29.2 24.4 197 7.10 60 272 270 19.4
5 1.06 30.1 23.9 196 7.01 142 464 310 19.0
5 1.06 29.7 22.1 197 6.61 92 296 268 20.4
5 1.04 29.8 22.6 194 6.56 113 347 246 31.6
5 1.05 29.6 21.7 193 6.40 120 391 290 18.8
5 1.06 29.8 21.7 195 6.75 140 432 285 17.2
5 1.06 30.2 20.1 197 6.83 100 351 256 22.0
6 1.04 28.1 22.8 197 6.46 65 218 327 25.2
6 1.03 28.8 23.6 199 6.43 63 217 247 26.6
6 1.03 28.8 24.4 198 6.86 80 294 294 28.4
6 1.03 28.8 22.7 197 7.26 61 257 313 16.2
6 1.03 28.8 23.8 197 6.56 81 293 313 17.4
6 1.01 29.0 21.6 196 6.49 67 262 256 16.6
6 1.03 29.4 23.0 198 6.35 80 294 267 22.8
6 1.03 29.4 23.7 197 6.49 60 215 267 10.0
6 1.05 29.3 21.1 195 6.26 70 241 255 11.6
6 1.03 29.1 24.9 197 6.89 70 237 266 13.4
6 1.05 28.8 22.6 199 7.09 100 318 321 18.2
6 1.04 28.7 24.0 197 6.90 76 261 319 17.2
7 1.04 30.0 25.2 195 7.02 85 431 277 21.4
7 1.03 28.7 23.1 193 6.44 66 280 322 18.6
7 1.04 28.8 23.0 194 7.18 78 376 298 18.6
7 1.04 28.5 21.0 196 6.67 56 230 298 16.0
7 1.01 28.4 22.3 195 6.36 69 280 262 12.0
7 1.03 28.7 22.1 194 6.66 64 257 296 11.6
7 1.04 29.9 22.4 195 6.83 103 361 276 17.0
7 1.03 29.9 21.6 194 6.34 64 196 237 18.4
7 1.04 28.3 21.3 192 6.14 84 251 260 17.6
7 1.04 28.7 21.2 193 5.87 81 280 297 22.2
7 1.04 29.2 22.6 193 6.25 88 290 291 20.4
7 1.04 28.4 19.9 194 6.30 68 212 286 24.8

Solutions

Expert Solution

The regression output is:

R² 0.358
Adjusted R² 0.242
R   0.598
Std. Error   4.455
n   60
k   9
Dep. Var. y=White Specks
ANOVA table
Source SS   df   MS F p-value
Regression 552.7251 9   61.4139 3.09 .0050
Residual 992.5443 50   19.8509
Total 1,545.2693 59  
Regression output confidence interval
variables coefficients std. error    t (df=50) p-value 95% lower 95% upper VIF
Intercept -70.6480
Harvdate date of cotton 1.5150 0.5459 2.775 .0077 0.4186 2.6114 2.846
Cotton fiber Length 12.3279 49.7865 0.248 .8054 -87.6711 112.3270 1.619
Cotton fiber Strength 1.2766 1.3843 0.922 .3609 -1.5038 4.0570 3.713
Short fiber content 0.4694 0.5532 0.849 .4001 -0.6417 1.5805 2.269
Cotton fineness 0.0948 0.3752 0.253 .8015 -0.6588 0.8485 1.224
Immature fiber content -0.9472 2.1974 -0.431 .6683 -5.3609 3.4664 2.241
Cotton trash count 0.1006 0.0530 1.900 .0632 -0.0057 0.2070 5.325
Cotton dust count -0.0156 0.0184 -0.845 .4021 -0.0526 0.0215 6.071
Cotton nep count 0.0124 0.0301 0.411 .6828 -0.0481 0.0728 2.066
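
As a quick sanity check, the summary statistics of this output can be recomputed from the sums of squares in the ANOVA table. The sketch below only uses numbers copied from the output above; the variable names are illustrative.

```python
import math

# Values copied from the ANOVA table of the full (9-predictor) model.
ss_regression = 552.7251
ss_residual = 992.5443
ss_total = 1545.2693
n, k = 60, 9  # observations, predictors

df_residual = n - k - 1                    # 60 - 9 - 1 = 50
ms_regression = ss_regression / k          # mean square, regression
ms_residual = ss_residual / df_residual    # mean square, residual

r_squared = ss_regression / ss_total
adj_r_squared = 1 - ms_residual / (ss_total / (n - 1))
f_stat = ms_regression / ms_residual
std_error = math.sqrt(ms_residual)

print(f"R^2 = {r_squared:.3f}, adjusted R^2 = {adj_r_squared:.3f}")
print(f"F = {f_stat:.2f}, std. error = {std_error:.3f}")
```

The recomputed values agree with the reported R² 0.358, Adjusted R² 0.242, F 3.09 and Std. Error 4.455.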

Since the VIF of every independent variable is below 10, multicollinearity is not severe enough to drop any variable on that basis, so we start with all nine variables in the regression.
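
To make the VIF check concrete: the variance inflation factor of predictor j is 1 / (1 - R_j²), where R_j² comes from regressing that predictor on all the others. The sketch below inverts this relation for the VIF values reported in the first output. The dictionary and variable names are illustrative, and the cutoff of 10 is the rule of thumb used in this solution.

```python
# VIF values copied from the first regression output.
vif = {
    "Harvdate date of cotton": 2.846,
    "Cotton fiber Length": 1.619,
    "Cotton fiber Strength": 3.713,
    "Short fiber content": 2.269,
    "Cotton fineness": 1.224,
    "Immature fiber content": 2.241,
    "Cotton trash count": 5.325,
    "Cotton dust count": 6.071,
    "Cotton nep count": 2.066,
}

# VIF_j = 1 / (1 - R_j^2)  =>  R_j^2 = 1 - 1 / VIF_j
implied_r2 = {name: 1 - 1 / value for name, value in vif.items()}

worst = max(vif, key=vif.get)       # predictor with the largest VIF
all_below_cutoff = all(value < 10 for value in vif.values())

print(worst, round(implied_r2[worst], 3), all_below_cutoff)
```

The largest VIF belongs to Cotton dust count (6.071), which means about 83.5% of its variance is explained by the other predictors: noticeable, but still under the cutoff.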

Next, we apply backward elimination: at each step, we remove the least significant variable (the one with the highest p-value) and refit the model.

We remove Cotton fiber Length first because it has the highest p-value (0.8054).
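
The elimination loop described above can be sketched with NumPy as follows. This is a minimal illustration, not the software used for the output in this solution. The function names `t_statistics` and `backward_eliminate` and the `t_crit` parameter are hypothetical, and the stopping threshold of |t| ≥ 2.0 (roughly p < 0.05 at these degrees of freedom) is an assumption, since the solution does not state a stopping rule. Within one fitted model all t-tests share the same degrees of freedom, so the variable with the highest p-value is the one with the smallest |t|. The demonstration uses synthetic data, not the cotton data.

```python
import numpy as np

def t_statistics(X, y):
    """OLS t-statistics for each column of X (an intercept is added internally)."""
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = n - A.shape[1]                      # n - k - 1
    sigma2 = resid @ resid / dof              # residual mean square
    cov = sigma2 * np.linalg.inv(A.T @ A)     # covariance of the estimates
    t = beta / np.sqrt(np.diag(cov))
    return t[1:]                              # drop the intercept's t-statistic

def backward_eliminate(X, y, names, t_crit=2.0):
    """Repeatedly drop the predictor with the smallest |t| until all reach t_crit."""
    cols = list(range(X.shape[1]))            # indices of predictors still in the model
    while cols:
        t = np.abs(t_statistics(X[:, cols], y))
        weakest = int(np.argmin(t))           # smallest |t| = highest p-value
        if t[weakest] >= t_crit:
            break
        cols.pop(weakest)
    return [names[c] for c in cols]

# Synthetic demonstration: only x0 truly drives y.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = 3.0 * X[:, 0] + rng.normal(size=60)
kept = backward_eliminate(X, y, ["x0", "x1", "x2", "x3"])
print(kept)
```

Applied to the cotton data, `X` would hold the nine predictor columns and `y` the White Specks column.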

Running the regression again, we get:

R² 0.357
Adjusted R² 0.256
R   0.597
Std. Error   4.414
n   60
k   8
Dep. Var. y=White Specks
ANOVA table
Source SS   df   MS F p-value
Regression 551.5079 8   68.9385 3.54 .0025
Residual 993.7614 51   19.4855
Total 1,545.2693 59  
Regression output confidence interval
variables coefficients std. error    t (df=51) p-value 95% lower 95% upper VIF
Intercept -57.9993
Harvdate date of cotton 1.4954 0.5351 2.795 .0073 0.4212 2.5697 2.786
Cotton fiber Strength 1.3786 1.3094 1.053 .2974 -1.2501 4.0072 3.384
Short fiber content 0.4193 0.5101 0.822 .4149 -0.6047 1.4434 1.965
Cotton fineness 0.0803 0.3672 0.219 .8278 -0.6569 0.8174 1.194
Immature fiber content -0.8658 2.1526 -0.402 .6892 -5.1874 3.4557 2.191
Cotton trash count 0.1035 0.0512 2.021 .0486 0.0007 0.2063 5.073
Cotton dust count -0.0165 0.0179 -0.921 .3613 -0.0524 0.0195 5.830
Cotton nep count 0.0149 0.0280 0.533 .5966 -0.0413 0.0711 1.824
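
One way to cross-check the removal of Cotton fiber Length is the partial F-test: when a single variable is dropped, the partial F-statistic equals the square of that variable's t-statistic in the larger model. The sketch below uses only residual sums of squares copied from the two outputs above; the variable names are illustrative.

```python
import math

sse_full = 992.5443      # residual SS with all 9 predictors
sse_reduced = 993.7614   # residual SS after dropping Cotton fiber Length
mse_full = 19.8509       # residual mean square of the 9-predictor model
dropped = 1              # number of variables removed

# Partial F: increase in residual SS per dropped variable, relative to the full-model MSE.
partial_f = (sse_reduced - sse_full) / dropped / mse_full
t_from_f = math.sqrt(partial_f)

print(f"partial F = {partial_f:.4f}, sqrt(F) = {t_from_f:.3f}")
```

The square root of the partial F (0.248) matches the t-statistic reported for Cotton fiber Length in the first output, confirming that dropping it costs almost nothing in fit.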

Continuing the backward elimination, we again remove the variable with the highest p-value.

We remove Cotton fineness because it has the highest p-value (0.8278).

Running the regression again, we get:

R² 0.356
Adjusted R² 0.270
R   0.597
Std. Error   4.374
n   60
k   7
Dep. Var. y=White Specks
ANOVA table
Source SS   df   MS F p-value
Regression 550.5762 7   78.6537 4.11 .0012
Residual 994.6931 52   19.1287
Total 1,545.2693 59  
Regression output confidence interval
variables coefficients std. error    t (df=52) p-value 95% lower 95% upper VIF
Intercept -42.7089
Harvdate date of cotton 1.4823 0.5268 2.814 .0069 0.4251 2.5395 2.751
Cotton fiber Strength 1.3847 1.2970 1.068 .2906 -1.2179 3.9874 3.383
Short fiber content 0.4239 0.5050 0.839 .4051 -0.5895 1.4372 1.962
Immature fiber content -0.7970 2.1099 -0.378 .7072 -5.0308 3.4368 2.144
Cotton trash count 0.1025 0.0506 2.028 .0477 0.0011 0.2040 5.035
Cotton dust count -0.0168 0.0177 -0.947 .3478 -0.0523 0.0187 5.802
Cotton nep count 0.0146 0.0277 0.528 .6000 -0.0410 0.0702 1.820
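
The three outputs so far show why adjusted R² is the more useful yardstick during elimination: R² can only fall when a variable is removed, while adjusted R² rises if the removed variable was contributing less than its degree of freedom. The sketch below recomputes both from the reported residual sums of squares; the variable names are illustrative.

```python
n = 60
ss_total = 1545.2693

# (number of predictors k, residual SS) copied from the three outputs above.
models = [(9, 992.5443), (8, 993.7614), (7, 994.6931)]

results = []
for k, ss_residual in models:
    r2 = 1 - ss_residual / ss_total
    adj_r2 = 1 - (ss_residual / (n - k - 1)) / (ss_total / (n - 1))
    results.append((k, round(r2, 3), round(adj_r2, 3)))
    print(f"k = {k}: R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")
```

R² slips from 0.358 to 0.356 while adjusted R² climbs from 0.242 to 0.270, which supports the two removals made so far.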

Continuing the backward elimination, we again remove the variable with the highest p-value.

We remove Immature fiber content because it has the highest p-value (0.7072).

Running the regression again, we get:

Continuing the backward elimination, we again remove the variable with the highest p-value.

We remove Cotton nep count because it has the highest p-value (0.6754).

Running the regression again, we get:

Continuing the backward elimination, we again remove the variable with the highest p-value.

We remove Cotton dust count because it has the highest p-value (0.3527).

Running the regression again, we get:

Continuing the backward elimination, we again remove the variable with the highest p-value.

We remove Short fiber content because it has the highest p-value (0.4831).

Running the regression again, we get:

Continuing the backward elimination, we again remove the variable with the highest p-value.

We remove Cotton fiber Strength because it has the highest p-value (0.4368).

Running the regression again, we get:

The final model retains the two remaining predictors, Harvdate (date of cotton) and Cotton trash count.