Build a simple linear regression for (1) all 50 states, (2) Eastern Time zone states, (3) Central Time zone states, (4) Mountain Time zone states, and (5) Pacific, Alaska, and Hawaii Time zone states. Compare your results in all five parts and state your judgements. You may use charts and tables in the comparison. Your answers should have values for the coefficient of determination, AOV table, significance levels, residual plots, and the regression fit with their interpretations.
Data source: Kaiser Family Foundation, 4/20/2020, 5:38PM. (ET = Eastern time, CT = Central time, MT = Mountain time, PT = Pacific time. Some states have a mix of two different time zones, which I ignored here.)
States | Time zone | X = Number of COVID-19 Cases | Y = Deaths from COVID-19 |
Alabama | CT | 5,041 | 169 |
Alaska | PT | 321 | 9 |
Arizona | MT | 5,068 | 191 |
Arkansas | CT | 1,923 | 41 |
California | PT | 33,404 | 1205 |
Colorado | MT | 9,730 | 420 |
Connecticut | ET | 19,830 | 1331 |
Delaware | ET | 2,745 | 72 |
District of Columbia | ET | 2,927 | 105 |
Florida | ET | 26,660 | 789 |
Georgia | ET | 18,947 | 733 |
Hawaii | PT | 580 | 10 |
Idaho | MT | 1,672 | 45 |
Illinois | CT | 31,513 | 1349 |
Indiana | ET | 11,686 | 569 |
Iowa | CT | 3,159 | 79 |
Kansas | CT | 2,043 | 101 |
Kentucky | ET | 2,960 | 148 |
Louisiana | CT | 24,523 | 1328 |
Maine | ET | 875 | 35 |
Maryland | ET | 13,684 | 465 |
Massachusetts | ET | 38,077 | 1706 |
Michigan | ET | 32,000 | 2468 |
Minnesota | CT | 2,470 | 143 |
Mississippi | CT | 4,512 | 169 |
Missouri | CT | 5,889 | 200 |
Montana | MT | 433 | 10 |
Nebraska | CT | 1,511 | 28 |
Nevada | PT | 3,830 | 159 |
New Hampshire | ET | 1,390 | 41 |
New Jersey | ET | 88,722 | 4496 |
New Mexico | MT | 1,845 | 55 |
New York | ET | 252,595 | 18611 |
North Carolina | ET | 6,842 | 202 |
North Dakota | CT | 627 | 9 |
Ohio | ET | 12,919 | 509 |
Oklahoma | CT | 2,680 | 143 |
Oregon | PT | 1,957 | 75 |
Pennsylvania | ET | 33,914 | 1348 |
Rhode Island | ET | 5,090 | 155 |
South Carolina | ET | 4,446 | 123 |
South Dakota | CT | 1,685 | 7 |
Tennessee | ET | 7,238 | 152 |
Texas | CT | 19,751 | 507 |
Utah | MT | 3,213 | 27 |
Vermont | ET | 816 | 38 |
Virginia | ET | 8,984 | 300 |
Washington | PT | 12,111 | 643 |
West Virginia | ET | 902 | 24 |
Wisconsin | CT | 4,499 | 230 |
Wyoming | MT | 313 | 2 |
Solution:
Excel Output
1.
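From the Excel output above, the fitted regression model is:
Y (Deaths from COVID-19) = Intercept (B0) + (B1) * X (Number of COVID-19 cases)
Y (Deaths from COVID-19) = -162.097 + 0.071 * (Number of COVID-19 cases)
The slope means that each additional confirmed case is associated with about 0.071 more deaths, that is, roughly 71 deaths per 1,000 cases.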
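For anyone who wants to see where B0 and B1 come from, here is a minimal sketch of the least-squares calculation in plain Python. It is not the Excel run itself: it uses only the seven Mountain Time rows from the table as a small example, and the function name `fit_line` is made up for illustration, so the numbers it prints apply to that subset only.

```python
# Mountain Time rows from the table above: (cases, deaths)
mt = [(5068, 191), (9730, 420), (1672, 45), (433, 10),
      (1845, 55), (3213, 27), (313, 2)]


def fit_line(points):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    # Sums of squares and cross-products about the means
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


b0, b1 = fit_line(mt)
print(f"Deaths = {b0:.3f} + {b1:.4f} * Cases")
```

The same function can be pointed at any other group of rows; only the list of (cases, deaths) pairs changes.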
2.
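In the above result the p-value is reported as 0.000 (that is, below 0.001), which is less than 0.05, so we reject the null hypothesis that the slope is zero. This means the relationship between the number of cases and the number of deaths is statistically significant.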
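The question also asks for an AOV table, which is where this significance test comes from. Below is a rough sketch of how the table is assembled, again on the seven Mountain Time rows as an assumed example subset. The critical value 6.61 is the tabulated F value for alpha = 0.05 with 1 and 5 degrees of freedom; it is hard-coded here rather than computed, and an exact p-value would need a statistics package.

```python
# Mountain Time rows from the table above: (cases, deaths)
mt = [(5068, 191), (9730, 420), (1672, 45), (433, 10),
      (1845, 55), (3213, 27), (313, 2)]

n = len(mt)
xs = [x for x, _ in mt]
ys = [y for _, y in mt]
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept
b1 = sum((x - x_bar) * (y - y_bar) for x, y in mt) / sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in xs]

# Sums of squares for the AOV table
ssr = sum((f - y_bar) ** 2 for f in fitted)          # regression, df = 1
sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # residual,   df = n - 2
sst = sum((y - y_bar) ** 2 for y in ys)              # total,      df = n - 1

msr = ssr / 1
mse = sse / (n - 2)
f_stat = msr / mse

F_CRIT = 6.61  # tabulated F(0.05; 1, 5), valid only for n = 7

print(f"{'Source':<12}{'df':>4}{'SS':>14}{'MS':>14}{'F':>10}")
print(f"{'Regression':<12}{1:>4}{ssr:>14.1f}{msr:>14.1f}{f_stat:>10.2f}")
print(f"{'Residual':<12}{n - 2:>4}{sse:>14.1f}{mse:>14.1f}")
print(f"{'Total':<12}{n - 1:>4}{sst:>14.1f}")
print("reject H0: slope = 0" if f_stat > F_CRIT else "fail to reject H0")
```

In simple regression the F statistic is the square of the t statistic for the slope, so both tests lead to the same conclusion.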
3.
Here the value of the coefficient of determination (R-squared) is 0.92.
R-squared is a statistical measure of how close the data are to the fitted regression line: it is the proportion of the variation in Y that the model explains. It is also known as the coefficient of determination, or the coefficient of multiple determination for multiple regression.
R-squared is always between 0 and 1 (0% and 100%).
Here R-squared is close to 100%: about 92% of the variation in deaths is explained by the number of cases, which means the line fits the data well.
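To make the definition concrete, here is a small sketch that computes R-squared as 1 - SSE/SST. It assumes the seven Mountain Time rows as example data, and `r_squared` is an illustrative helper name, so the printed value describes that subset and not the 0.92 quoted above.

```python
# Mountain Time rows from the table above: (cases, deaths)
mt = [(5068, 191), (9730, 420), (1672, 45), (433, 10),
      (1845, 55), (3213, 27), (313, 2)]


def r_squared(points):
    """Share of the variation in y explained by the least-squares line."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in points)
          / sum((x - x_bar) ** 2 for x, _ in points))
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in points)  # unexplained variation
    sst = sum((y - y_bar) ** 2 for _, y in points)          # total variation
    return 1 - sse / sst


r2 = r_squared(mt)
print(f"R-squared = {r2:.3f} ({r2:.1%} of the variation in deaths explained)")
```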
4.
Residual plot:
In this plot each point is one state, where the prediction made by the model is on the x-axis, and the residual (the prediction error) is on the y-axis. The distance of a point from the horizontal line at 0 shows how far off the prediction was for that state.
Since
Residual = Observed - Predicted
positive values for the residual (on the y-axis) mean the prediction was too low, negative values mean the prediction was too high, and 0 means the prediction was exactly correct.
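As a quick illustration of the sign convention, the sketch below computes Residual = Observed - Predicted for each state in one small group. It assumes the seven Mountain Time rows as example data; the variable names are illustrative only.

```python
# Mountain Time rows from the table above: state -> (cases, deaths)
mt = {
    "Arizona": (5068, 191), "Colorado": (9730, 420), "Idaho": (1672, 45),
    "Montana": (433, 10), "New Mexico": (1845, 55), "Utah": (3213, 27),
    "Wyoming": (313, 2),
}

n = len(mt)
x_bar = sum(x for x, _ in mt.values()) / n
y_bar = sum(y for _, y in mt.values()) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in mt.values())
      / sum((x - x_bar) ** 2 for x, _ in mt.values()))
b0 = y_bar - b1 * x_bar

# Residual = Observed - Predicted, one value per state
residuals = {}
for state, (x, y) in mt.items():
    predicted = b0 + b1 * x
    residuals[state] = y - predicted

for state, r in residuals.items():
    verdict = "prediction too low" if r > 0 else "prediction too high"
    print(f"{state:<11} residual = {r:+8.1f}  ({verdict})")
```

Plotting these residuals against the predicted values (or against the number of cases) gives the residual plot; with an intercept in the model the residuals always sum to zero.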
In a simple model like this, with only two variables, you can get a sense of how accurate the model is just by plotting deaths from COVID-19 against the number of COVID-19 cases. Here the fitted line tracks the data closely, which is consistent with the R-squared of 0.92.
5.
Line Fit Plot:
The R-squared value of 0.92 indicates that the line fits the data well.
In the line fit plot most of the points lie close to the line, which confirms that the line is a good fit for these data.
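The question asks for the same regression on each time-zone group as well as on all rows together. Here is a hedged sketch of how the five fits could be produced side by side for comparison. It uses the rows as listed in the table (51 rows, because the District of Columbia is included), treats West Virginia as ET since it lies in the Eastern Time zone, and the helper name `fit` is illustrative. Its output is not claimed to match the Excel figures quoted above.

```python
# name,zone,cases,deaths for every row of the table (WV taken as ET: an assumption)
ROWS = """
Alabama,CT,5041,169; Alaska,PT,321,9; Arizona,MT,5068,191; Arkansas,CT,1923,41
California,PT,33404,1205; Colorado,MT,9730,420; Connecticut,ET,19830,1331
Delaware,ET,2745,72; District of Columbia,ET,2927,105; Florida,ET,26660,789
Georgia,ET,18947,733; Hawaii,PT,580,10; Idaho,MT,1672,45; Illinois,CT,31513,1349
Indiana,ET,11686,569; Iowa,CT,3159,79; Kansas,CT,2043,101; Kentucky,ET,2960,148
Louisiana,CT,24523,1328; Maine,ET,875,35; Maryland,ET,13684,465
Massachusetts,ET,38077,1706; Michigan,ET,32000,2468; Minnesota,CT,2470,143
Mississippi,CT,4512,169; Missouri,CT,5889,200; Montana,MT,433,10
Nebraska,CT,1511,28; Nevada,PT,3830,159; New Hampshire,ET,1390,41
New Jersey,ET,88722,4496; New Mexico,MT,1845,55; New York,ET,252595,18611
North Carolina,ET,6842,202; North Dakota,CT,627,9; Ohio,ET,12919,509
Oklahoma,CT,2680,143; Oregon,PT,1957,75; Pennsylvania,ET,33914,1348
Rhode Island,ET,5090,155; South Carolina,ET,4446,123; South Dakota,CT,1685,7
Tennessee,ET,7238,152; Texas,CT,19751,507; Utah,MT,3213,27; Vermont,ET,816,38
Virginia,ET,8984,300; Washington,PT,12111,643; West Virginia,ET,902,24
Wisconsin,CT,4499,230; Wyoming,MT,313,2
"""

# Parse the text block into (zone, cases, deaths) records
data = []
for chunk in ROWS.replace("\n", ";").split(";"):
    chunk = chunk.strip()
    if chunk:
        name, zone, cases, deaths = chunk.split(",")
        data.append((zone, int(cases), int(deaths)))


def fit(points):
    """Return (intercept, slope, r_squared) for the line deaths = b0 + b1 * cases."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    syy = sum((y - y_bar) ** 2 for _, y in points)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1, sxy ** 2 / (sxx * syy)


# "PT" covers the Pacific, Alaska, and Hawaii rows, as labeled in the table
groups = {"All": [(x, y) for _, x, y in data]}
for zone in ("ET", "CT", "MT", "PT"):
    groups[zone] = [(x, y) for z, x, y in data if z == zone]

results = {name: fit(points) for name, points in groups.items()}
print(f"{'Group':<6}{'n':>4}{'Intercept':>12}{'Slope':>10}{'R-squared':>11}")
for name, (b0, b1, r2) in results.items():
    print(f"{name:<6}{len(groups[name]):>4}{b0:>12.3f}{b1:>10.4f}{r2:>11.3f}")
```

Comparing the slope and R-squared columns across the five rows of output is one way to frame the comparison the question asks for; the AOV table, residual plot, and line fit plot would then be repeated for each group.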