Question 2 Situation: A think tank wants to examine recent assertions about whether lobster consumption and number of college enrollments predict coastal Western Hemisphere countries’ total GDP. Annual data were collected for the last 40 years. Lobster consumption was measured in 10,000,000 pound units of lobster. Number of college enrollments was measured in 10,000,000 individuals units enrolled in at least one college or university course per the fall and spring semesters for a minimum of two courses per year. GDP was measured in nominal percentage growth. First, run descriptive statistics. Next, run a multiple linear regression where lobster consumption and college enrollments are the predictors (independent variables) and GDP is the dependent variable. Alpha = .05. Are the data normally distributed? Do lobster consumption and college enrollments predict GDP? Report and interpret (explain) your results including the skewness and kurtosis statistics for each variable, overall regression p-value, r-square, predictor coefficients and their p-values, and sample size. Be sure to explain the overall contribution of the model in explaining change in the dependent variable as well as the individual predictor contribution to change in the dependent variable. Based on your results, would you make any recommendations about additional analysis (65% of assignment points).
GDP (% nominal growth) | Lobster (lb) | Enrollments (individuals) | Lobster (10,000,000 lb) | Enrollments (10,000,000 individuals) |
1.18 | 300,000,000 | 13,761,737 | 30.0000 | 1.3762 |
2.28 | 168,005,852 | 35,000,000 | 16.8006 | 3.5000 |
2.37 | 300,000,000 | 42,000,000 | 30.0000 | 4.2000 |
2.37 | 171,886,564 | 38,811,068 | 17.1887 | 3.8811 |
5.85 | 300,000,000 | 56,016,072 | 30.0000 | 5.6016 |
-0.57 | 100,000,000 | 10,000,000 | 10.0000 | 1.0000 |
4.47 | 300,000,000 | 55,216,617 | 30.0000 | 5.5217 |
1.24 | 75,000,000 | 30,000,000 | 7.5000 | 3.0000 |
2.97 | 235,304,976 | 45,000,000 | 23.5305 | 4.5000 |
2.36 | 202,998,799 | 25,000,000 | 20.2999 | 2.5000 |
2.99 | 231,600,131 | 30,000,000 | 23.1600 | 3.0000 |
2.44 | 173,174,519 | 23,297,388 | 17.3175 | 2.3297 |
0.72 | 150,000,000 | 14,691,020 | 15.0000 | 1.4691 |
0.56 | 100,000,000 | 32,000,000 | 10.0000 | 3.2000 |
3.05 | 263,823,576 | 41,360,064 | 26.3824 | 4.1360 |
2.36 | 150,378,241 | 40,000,000 | 15.0378 | 4.0000 |
3.12 | 270,730,403 | 42,440,763 | 27.0730 | 4.2441 |
-1.79 | 120,000,000 | 20,000,000 | 12.0000 | 2.0000 |
3.82 | 400,000,000 | 31,755,949 | 40.0000 | 3.1756 |
7.69 | 300,000,000 | 30,000,000 | 30.0000 | 3.0000 |
5.12 | 395,000,000 | 26,320,490 | 39.5000 | 2.6320 |
6.52 | 300,000,000 | 60,618,467 | 30.0000 | 6.0618 |
2.78 | 188,647,991 | 33,778,258 | 18.8648 | 3.3778 |
-1.25 | 50,000,000 | 20,000,000 | 5.0000 | 2.0000 |
3.92 | 357,750,427 | 55,948,661 | 35.7750 | 5.5949 |
4.74 | 395,000,000 | 42,279,580 | 39.5000 | 4.2280 |
-0.98 | 35,000,000 | 45,000,000 | 3.5000 | 4.5000 |
1.34 | 70,522,208 | 9,807,424 | 7.0522 | 0.9807 |
2.05 | 131,319,757 | 11,892,034 | 13.1320 | 1.1892 |
3.75 | 295,721,911 | 7,649,064 | 29.5722 | 0.7649 |
3.76 | 303,489,724 | 64,061,968 | 30.3490 | 6.4062 |
4.48 | 385,919,736 | 76,124,744 | 38.5920 | 7.6125 |
0.93 | 38,703,734 | 12,044,349 | 3.8704 | 1.2044 |
0.47 | 20,000,000 | 32,000,000 | 2.0000 | 3.2000 |
0.44 | 78,000,000 | 9,257,624 | 7.8000 | 0.9258 |
0.99 | 45,082,034 | 11,725,510 | 4.5082 | 1.1726 |
1.88 | 123,452,084 | 47,975,552 | 12.3452 | 4.7976 |
3.04 | 210,343,787 | 44,833,849 | 21.0344 | 4.4834 |
2.53 | 177,362,455 | 40,000,000 | 17.7362 | 4.0000 |
4.58 | 325,000,000 | 11,238,401 | 32.5000 | 1.1238 |
Descriptive statistics using the Real Statistics add-in in Excel
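The question asks for skewness and kurtosis, but the descriptive statistics output did not survive here. As a rough sketch of how those figures could be recomputed, the block below applies the sample skewness and excess kurtosis formulas that Excel documents for SKEW and KURT to the GDP column. The function name `describe` and the dictionary keys are illustrative choices, not part of the add-in, and only GDP is shown; the other two columns would be handled the same way.

```python
import math

# GDP growth values copied from the data table above (n = 40).
gdp = [
    1.18, 2.28, 2.37, 2.37, 5.85, -0.57, 4.47, 1.24,
    2.97, 2.36, 2.99, 2.44, 0.72, 0.56, 3.05, 2.36,
    3.12, -1.79, 3.82, 7.69, 5.12, 6.52, 2.78, -1.25,
    3.92, 4.74, -0.98, 1.34, 2.05, 3.75, 3.76, 4.48,
    0.93, 0.47, 0.44, 0.99, 1.88, 3.04, 2.53, 4.58,
]


def describe(values):
    """Return basic descriptive statistics for one variable.

    Skewness and kurtosis follow the sample formulas documented for
    Excel's SKEW and KURT (kurtosis is reported as excess kurtosis).
    """
    n = len(values)
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)   # sum of squared deviations
    sd = math.sqrt(ss / (n - 1))                # sample standard deviation
    z3 = sum(((x - mean) / sd) ** 3 for x in values)
    z4 = sum(((x - mean) / sd) ** 4 for x in values)
    skew = n / ((n - 1) * (n - 2)) * z3
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {"n": n, "mean": mean, "ss": ss, "sd": sd,
            "skew": skew, "kurt": kurt}


stats = describe(gdp)
for name, value in stats.items():
    print(f"{name:>5}: {value:.4f}")
```

A common rule of thumb treats skewness and excess kurtosis values close to zero as consistent with normality. The mean and sum of squares printed here can be checked against the Mean row of the box plot table and the Total SS in the regression ANOVA.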
Using Excel: Data > Data Analysis > Regression
Input Y Range: GDP
Input X Range: Lobster, Enrollments
Click OK
SUMMARY OUTPUT
Regression Statistics
Multiple R | 0.807229605 |
R Square | 0.651619635 |
Adjusted R Square | 0.632788264 |
Standard Error | 1.250876045 |
Observations | 40 |
ANOVA
Source | df | SS | MS | F | Significance F |
Regression | 2 | 108.2856149 | 54.14280747 | 34.6028779 | 3.37282E-09 |
Residual | 37 | 57.89356255 | 1.56469088 |
Total | 39 | 166.1791775 |
Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
Intercept | -0.778810642 | 0.479926025 | -1.622772263 | 0.113129217 | -1.751233137 | 0.193611853 |
Lobster | 1.33051E-08 | 1.97528E-09 | 6.735792059 | 6.39218E-08 | 9.30279E-09 | 1.73074E-08 |
Enrollments | 1.67554E-08 | 1.30327E-08 | 1.28564316 | 0.206556211 | -9.65137E-09 | 4.31622E-08 |
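To show how the summary figures in this output relate to one another, the sketch below recomputes R Square, Adjusted R Square, the F statistic and the Standard Error from the ANOVA sums of squares alone. The variable names are illustrative. The inputs are the rounded values printed in the table, so the results agree with Excel only to the displayed precision.

```python
import math

# Values read from the ANOVA table above.
n = 40                       # observations
k = 2                        # predictors (Lobster, Enrollments)
ss_regression = 108.2856149
ss_residual = 57.89356255
ss_total = ss_regression + ss_residual

# Share of the variation in GDP explained by the model.
r_square = ss_regression / ss_total

# Adjusted R-square penalises the number of predictors.
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)

# Mean squares and the overall F test statistic, F(k, n - k - 1).
ms_regression = ss_regression / k
ms_residual = ss_residual / (n - k - 1)
f_stat = ms_regression / ms_residual

# Standard error of the regression (root mean squared residual).
std_error = math.sqrt(ms_residual)

print(f"R-square          = {r_square:.6f}")
print(f"Adjusted R-square = {adj_r_square:.6f}")
print(f"F({k}, {n - k - 1})          = {f_stat:.4f}")
print(f"Standard error    = {std_error:.6f}")
```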
Are the data normally distributed? Do lobster consumption and college enrollments predict GDP? Report and interpret (explain) your results including the skewness and kurtosis statistics for each variable, overall regression p-value, r-square, predictor coefficients and their p-values, and sample size.
Shapiro-Wilk Test
Statistic | GDP | Lobster | Enrollments |
W-stat | 0.987659 | 0.947955336 | 0.956847113 |
p-value | 0.93455 | 0.064502867 | 0.130556972 |
alpha | 0.05 | 0.05 | 0.05 |
normal | yes | yes | yes |
Box Plot
Statistic | GDP | Lobster | Enrollments |
Min | 0 | 20000001.79 | 7649065.79 |
Q1-Min | 2.9225 | 95000000 | 11023691 |
Med-Q1 | 1.2725 | 80823395 | 13327245 |
Q3-Med | 1.37 | 104176605 | 11039034.5 |
Max-Q3 | 3.915 | 100000000 | 33085709.5 |
Mean | 4.30425 | 205980474.5 | 32972668.12 |
Min | -1.79 | 20000000 | 7649064 |
Q1 | 1.1325 | 115000000 | 18672755 |
Median | 2.405 | 195823395 | 32000000 |
Q3 | 3.775 | 300000000 | 43039034.5 |
Max | 7.69 | 400000000 | 76124744 |
Mean | 2.51425 | 205980472.7 | 32972666.33 |
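The box plot table lists two groups of rows. The lower group (Min through Mean) is the ordinary five-number summary plus the mean. The upper group appears to hold the segment lengths used to draw the chart, shifted by 1.79 so that the negative GDP minimum starts at zero. That reading is an inference from the numbers, not something the add-in output states. A minimal sketch for the GDP column, using the inclusive quartile method (which reproduces the Q1, Median and Q3 shown in the table):

```python
import statistics

# GDP growth values copied from the data table above (n = 40).
gdp = [
    1.18, 2.28, 2.37, 2.37, 5.85, -0.57, 4.47, 1.24,
    2.97, 2.36, 2.99, 2.44, 0.72, 0.56, 3.05, 2.36,
    3.12, -1.79, 3.82, 7.69, 5.12, 6.52, 2.78, -1.25,
    3.92, 4.74, -0.98, 1.34, 2.05, 3.75, 3.76, 4.48,
    0.93, 0.47, 0.44, 0.99, 1.88, 3.04, 2.53, 4.58,
]

# Five-number summary with inclusive quartiles.
q1, median, q3 = statistics.quantiles(gdp, n=4, method="inclusive")
lo, hi = min(gdp), max(gdp)

# Assumed shift: move a negative minimum up to zero for charting.
shift = -lo if lo < 0 else 0.0

# Stacked segment lengths, matching the upper rows of the table.
segments = {
    "Min": lo + shift,
    "Q1-Min": q1 - lo,
    "Med-Q1": median - q1,
    "Q3-Med": q3 - median,
    "Max-Q3": hi - q3,
}
mean_shifted = statistics.mean(gdp) + shift

print("Q1, Median, Q3:", q1, median, q3)
for label, length in segments.items():
    print(f"{label:>7}: {length:.4f}")
print(f"Shifted mean: {mean_shifted:.5f}")
```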
The Shapiro-Wilk test is used to test the normality of the data. The null hypothesis for this test is that the data are normally distributed. With alpha = 0.05, a p-value less than 0.05 means the null hypothesis of normality is rejected. A p-value greater than 0.05 means the null hypothesis is not rejected.
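Since all three variables have p-values greater than 0.05 (GDP: 0.9346, Lobster: 0.0645, Enrollments: 0.1306), the null hypothesis is not rejected for any of them. The data are therefore consistent with a normal distribution, although the Lobster p-value is only slightly above alpha.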
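The decision rule described above can be written out in a few lines. The sketch below takes the p-values as given in the Shapiro-Wilk table (it does not recompute the test itself) and compares each to alpha. The dictionary names are illustrative.

```python
alpha = 0.05

# p-values copied from the Shapiro-Wilk table above.
shapiro_p = {
    "GDP": 0.93455,
    "Lobster": 0.064502867,
    "Enrollments": 0.130556972,
}

# H0: the variable is normally distributed.
# Reject H0 only when the p-value falls below alpha.
decisions = {
    name: "reject H0" if p < alpha else "fail to reject H0"
    for name, p in shapiro_p.items()
}

for name, decision in decisions.items():
    print(f"{name:<12} p = {shapiro_p[name]:.4f} -> {decision}")
```

Failing to reject the null hypothesis does not prove normality. It only means the sample gives no strong evidence against it.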
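The fitted linear regression model is given by
GDP = -0.7788 + (1.33051E-08 × Lobster) + (1.67554E-08 × Enrollments)
where Lobster is in pounds and Enrollments is in individuals (the unscaled columns).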
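The ANOVA for the regression gives F(2, 37) = 34.60, p = 3.37E-09, so the overall model is significant at alpha = .05 with n = 40 observations. The coefficient of determination (R-square) is 0.6516, which means 65.16% of the variation in GDP growth is explained by lobster consumption and college enrollments together (adjusted R-square = 0.6328). The individual predictors differ, however. The Lobster coefficient (1.33051E-08, p = 6.39E-08) is significant: each additional 10,000,000 pounds of lobster is associated with about 0.133 percentage points more GDP growth, holding enrollments constant. The Enrollments coefficient (1.67554E-08, p = 0.2066) is not significant because its p-value is greater than 0.05, and its 95% confidence interval (-9.65E-09 to 4.32E-08) includes zero. The intercept (-0.7788, p = 0.1131) is also not significant.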
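To make the individual predictor results easier to read, the sketch below works from the coefficient table: it recomputes each t statistic as coefficient divided by standard error, rescales the coefficients to the 10,000,000-unit scale used in the question, flags significance at alpha = .05, and evaluates the fitted equation for the first data row. The helper `predict_gdp` and the dictionary names are hypothetical. The inputs are the rounded values shown in the output, so results match Excel only approximately.

```python
alpha = 0.05
UNIT = 10_000_000  # measurement unit stated in the question

# Values copied from the regression coefficient table.
intercept = -0.778810642
coef = {"Lobster": 1.33051e-08, "Enrollments": 1.67554e-08}
std_err = {"Lobster": 1.97528e-09, "Enrollments": 1.30327e-08}
p_value = {"Lobster": 6.39218e-08, "Enrollments": 0.206556211}

# t statistic = coefficient / standard error.
t_stat = {name: coef[name] / std_err[name] for name in coef}

# Change in GDP growth (percentage points) per 10,000,000 units.
per_unit = {name: coef[name] * UNIT for name in coef}

# A predictor is significant when its p-value is below alpha.
significant = {name: p_value[name] < alpha for name in coef}


def predict_gdp(lobster_lb, enrollments):
    """Fitted GDP growth for raw (unscaled) predictor values."""
    return (intercept
            + coef["Lobster"] * lobster_lb
            + coef["Enrollments"] * enrollments)


for name in coef:
    print(f"{name:<12} t = {t_stat[name]:.4f}, "
          f"effect per 10,000,000 = {per_unit[name]:.4f}, "
          f"significant = {significant[name]}")

# First data row: 300,000,000 lb of lobster, 13,761,737 enrollments.
first_row = predict_gdp(300_000_000, 13_761_737)
print(f"Fitted GDP for first row: {first_row:.4f} (observed 1.18)")
```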