Steps (all statistical analysis to be done in Excel and/or StatCrunch):
Watch the TED talk by Hans Rosling titled “The best stats you’ve ever seen”. You will need to include comments on this in your paper. Here is a link: http://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen?language=en
Download the Excel data from IvyLearn
Create histograms of each of the variables (one histogram for fertility, one for life expectancy). Use the histograms to identify the shapes of the distribution. StatCrunch will be the easier tool to use for this particular task.
Calculate some descriptive statistics for each of the variables, including but not limited to the mean, median and standard deviation. Organize these numbers nicely in a table.
Using fertility as the predictor variable and life expectancy as the response variable, create a scatter diagram, come up with the least-squares regression line and calculate the linear correlation coefficient as well as the coefficient of determination. Make sure that you understand all interpretations and include them in your paper. Please carefully review the rubric below to see the full list of required interpretations.
Use the regression line to predict life expectancy for the United States given fertility and then compare this to the actual value in the United States.
Name some possible lurking variables that may be at work here.
Explain the difference between correlation and causation and why we cannot say that there is a cause and effect relationship in this situation.
Explain why we cannot use our regression model to predict the life expectancy of one particular individual.
Take a look at the website where this data was pulled from and comment on how the model might have been different if we used the data from 20, 40 or 60 years ago. Navigate to http://gapminder.org and click on “Gapminder World”. Use the x-axis and y-axis dropdown menus to ensure that ‘life expectancy (years)’ is selected on the y-axis and ‘children per woman (total fertility)’ is selected on the x-axis.
Country | 2013 Fertility | 2013 Life Expectancy |
Afghanistan | 4.9 | 56.2 |
Albania | 1.771 | 75.8 |
Algeria | 2.795 | 76.3 |
Angola | 5.863 | 60.4 |
Antigua and Barbuda | 2.089 | 75.2 |
Argentina | 2.175 | 76 |
Armenia | 1.74 | 73.8 |
Aruba | 1.673 | 75.455 |
Australia | 1.882 | 81.8 |
Austria | 1.471 | 80.8 |
Azerbaijan | 1.924 | 72.3 |
Bahamas | 1.888 | 72.5 |
Bahrain | 2.075 | 79 |
Bangladesh | 2.177 | 69.5 |
Barbados | 1.849 | 75.6 |
Belarus | 1.494 | 70.2 |
Belgium | 1.854 | 80.2 |
Belize | 2.676 | 70 |
Benin | 4.845 | 64.9 |
Bhutan | 2.232 | 69.4 |
Bolivia | 3.221 | 71.9 |
Bosnia and Herzegovina | 1.283 | 77.5 |
Botswana | 2.619 | 65.8 |
Brazil | 1.801 | 75 |
Brunei | 1.994 | 78.7 |
Bulgaria | 1.541 | 74.5 |
Burkina Faso | 5.605 | 62 |
Burundi | 6.033 | 59.8 |
Cambodia | 2.861 | 67.8 |
Cameroon | 4.78 | 58.7 |
Canada | 1.67 | 81.5 |
Cape Verde | 2.292 | 74.2 |
Chad | 6.263 | 57.1 |
Channel Islands | 1.459 | 80.324 |
Chile | 1.82 | 79.1 |
China | 1.668 | 76.5 |
Colombia | 2.286 | 75.6 |
Comoros | 4.714 | 63.7 |
Congo, Dem. Rep. | 5.933 | 57.5 |
Congo, Rep. | 4.969 | 61.5 |
Costa Rica | 1.795 | 79.8 |
Cote d'Ivoire | 4.866 | 58.9 |
Croatia | 1.501 | 77.8 |
Cuba | 1.449 | 78.3 |
Cyprus | 1.461 | 82.2 |
Czech Rep. | 1.566 | 78.2 |
Denmark | 1.88 | 79.9 |
Djibouti | 3.387 | 63.4 |
Dominican Rep. | 2.484 | 73.6 |
Ecuador | 2.559 | 74.8 |
Egypt | 2.77 | 70.9 |
El Salvador | 2.184 | 73.9 |
Equatorial Guinea | 4.845 | 58.8 |
Eritrea | 4.696 | 62.1 |
Estonia | 1.604 | 76.6 |
Ethiopia | 4.519 | 62.6 |
Fiji | 2.588 | 66.1 |
Finland | 1.853 | 80.6 |
France | 1.98 | 81.7 |
French Guiana | 3.058 | 77.121 |
French Polynesia | 2.058 | 76.257 |
Gabon | 4.087 | 59.1 |
Gambia | 5.751 | 64.3 |
Georgia | 1.817 | 72.9 |
Germany | 1.419 | 80.7 |
Ghana | 3.857 | 64.9 |
Greece | 1.529 | 79.8 |
Greenland | 2.077 | 71.5 |
Grenada | 2.17 | 71.5 |
Guadeloupe | 2.08 | 80.947 |
Guam | 2.405 | 78.854 |
Guatemala | 3.783 | 72.3 |
Guinea | 4.915 | 60.2 |
Guyana | 2.546 | 64 |
Haiti | 3.148 | 64.3 |
Honduras | 3.001 | 72 |
Hong Kong, China | 1.135 | 83.378 |
Hungary | 1.411 | 75.8 |
Iceland | 2.083 | 82.8 |
India | 2.479 | 66.2 |
Indonesia | 2.338 | 70.5 |
Iran | 1.92 | 78.3 |
Iraq | 4.026 | 71.3 |
Ireland | 1.997 | 80.4 |
Israel | 2.898 | 82.2 |
Italy | 1.487 | 82.1 |
Jamaica | 2.26 | 75.5 |
Japan | 1.419 | 83.3 |
Jordan | 3.244 | 78.1 |
Kazakhstan | 2.455 | 67.8 |
Kenya | 4.382 | 65.2 |
Kiribati | 2.952 | 62 |
Korea, Dem. Rep. | 1.988 | 71.2 |
Korea, Rep. | 1.321 | 80.5 |
Kuwait | 2.6 | 80.3 |
Kyrgyzstan | 3.075 | 68.6 |
Laos | 3.02 | 65.8 |
Latvia | 1.607 | 75.3 |
Lebanon | 1.495 | 78.3 |
Liberia | 4.792 | 63.1 |
Libya | 2.356 | 75.6 |
Lithuania | 1.519 | 75 |
Luxembourg | 1.671 | 81.1 |
Macao, China | 1.083 | 80.4 |
Macedonia, FYR | 1.431 | 76.6 |
Madagascar | 4.468 | 64.3 |
Malawi | 5.389 | 57.3 |
Malaysia | 1.964 | 74.7 |
Maldives | 2.256 | 79.3 |
Mali | 6.847 | 57.2 |
Malta | 1.356 | 82.1 |
Martinique | 1.827 | 81.41 |
Mauritania | 4.67 | 65.1 |
Mauritius | 1.501 | 73.3 |
Mayotte | 3.802 | 79.19 |
Mexico | 2.185 | 75.5 |
Micronesia, Fed. Sts. | 3.294 | 66.8 |
Moldova | 1.456 | 71.9 |
Mongolia | 2.436 | 64.7 |
Montenegro | 1.666 | 75.6 |
Morocco | 2.735 | 74.3 |
Mozambique | 5.188 | 56.2 |
Myanmar | 1.938 | 67.1 |
Namibia | 3.051 | 60.6 |
Nepal | 2.3 | 70.6 |
Netherlands | 1.774 | 80.6 |
Netherlands Antilles | 1.89 | 76.894 |
New Caledonia | 2.127 | 76.306 |
New Zealand | 2.052 | 80.6 |
Nicaragua | 2.498 | 76.4 |
Niger | 7.561 | 61.6 |
Nigeria | 5.976 | 60.1 |
Norway | 1.931 | 81.4 |
Oman | 2.853 | 75.5 |
Pakistan | 3.185 | 65.7 |
Panama | 2.466 | 77.8 |
Papua New Guinea | 3.781 | 59.8 |
Paraguay | 2.864 | 73.7 |
Peru | 2.417 | 77.1 |
Philippines | 3.043 | 70 |
Poland | 1.417 | 76.9 |
Portugal | 1.315 | 79.8 |
Puerto Rico | 1.636 | 78.864 |
Qatar | 2.019 | 81.8 |
Reunion | 2.232 | 79.646 |
Romania | 1.417 | 76 |
Russia | 1.595 | 71.3 |
Rwanda | 4.508 | 65.3 |
Saint Lucia | 1.912 | 74.5 |
Saint Vincent and the Grenadines | 1.997 | 72.7 |
Samoa | 4.147 | 71.8 |
Sao Tome and Principe | 4.075 | 68.4 |
Saudi Arabia | 2.644 | 77.9 |
Senegal | 4.934 | 65.7 |
Serbia | 1.365 | 77.7 |
Seychelles | 2.18 | 73.3 |
Sierra Leone | 4.705 | 57.7 |
Singapore | 1.282 | 81.9 |
Slovak Republic | 1.396 | 76.2 |
Slovenia | 1.509 | 80 |
Solomon Islands | 4.031 | 63.7 |
Somalia | 6.563 | 57.7 |
South Africa | 2.387 | 60.4 |
South Sudan | 4.92 | 57.2 |
Spain | 1.505 | 81.7 |
Sri Lanka | 2.339 | 76.1 |
Sudan | 4.42 | 68.9 |
Suriname | 2.268 | 70.1 |
Sweden | 1.928 | 81.8 |
Switzerland | 1.533 | 82.7 |
Syria | 2.964 | 72.4 |
Taiwan | 1.065 | 79.3 |
Tajikistan | 3.815 | 70.6 |
Tanzania | 5.214 | 62.2 |
Thailand | 1.399 | 74.9 |
Timor-Leste | 5.855 | 71.4 |
Togo | 4.639 | 63 |
Tonga | 3.767 | 70.3 |
Trinidad and Tobago | 1.797 | 71.2 |
Tunisia | 2.008 | 77.1 |
Turkey | 2.041 | 76.3 |
Turkmenistan | 2.326 | 67.5 |
Uganda | 5.867 | 59.8 |
Ukraine | 1.47 | 71.7 |
United Arab Emirates | 1.801 | 76.4 |
United Kingdom | 1.892 | 81 |
United States | 1.976 | 78.9 |
Uruguay | 2.046 | 76.9 |
Uzbekistan | 2.309 | 69.7 |
Vanuatu | 3.382 | 64.6 |
Venezuela | 2.39 | 75.4 |
Vietnam | 1.743 | 76.3 |
Virgin Islands (U.S.) | 2.487 | 80.152 |
West Bank and Gaza | 4.01 | 74.6 |
Western Sahara | 2.363 | 67.764 |
Yemen, Rep. | 4.075 | 67 |
Zambia | 5.687 | 56.7 |
Zimbabwe | 3.486 | 56 |
The histogram of 2013 Fertility shows a positively skewed (right-skewed) distribution, whereas the histogram of 2013 Life Expectancy shows a negatively skewed (left-skewed) distribution.
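The direction of skew can also be checked numerically. The sketch below is only an illustration: it uses ten rows copied from the data table (not the full 198 rows), and the helper names `moment_skewness` and `describe` are made up for this example. A mean above the median and a positive skewness suggest a right-skewed shape; the reverse suggests a left-skewed shape.

```python
import statistics

# Ten rows copied from the data table (illustrative subset, NOT the full data):
# Afghanistan, Albania, Algeria, Angola, Argentina, Australia, Austria,
# Niger, Japan, United States.
fertility = [4.9, 1.771, 2.795, 5.863, 2.175, 1.882, 1.471, 7.561, 1.419, 1.976]
life_expectancy = [56.2, 75.8, 76.3, 60.4, 76, 81.8, 80.8, 61.6, 83.3, 78.9]


def moment_skewness(values):
    """Moment skewness: mean cubed deviation divided by the cubed population SD."""
    mean = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sigma ** 3)


def describe(values):
    """Return mean, median, sample standard deviation and skewness."""
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "skewness": moment_skewness(values),
    }


fertility_summary = describe(fertility)
life_summary = describe(life_expectancy)
print(fertility_summary)
print(life_summary)
```

The same `describe` helper could be run on the full columns to fill in the table of descriptive statistics that the steps ask for; the values printed for this subset will differ from the full-data values.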
From the scatter plot, it is observed that 2013 Life Expectancy (y) is negatively correlated with 2013 Fertility (x).
Pearson correlation of 2013 Fertility and 2013 Life Expectancy = -0.800
Since the p-value (reported as 0.000) is less than 0.05, the correlation between x and y is statistically significant.
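The significance of the correlation can be cross-checked by hand with the usual t statistic for a correlation coefficient, t = r·sqrt(n - 2) / sqrt(1 - r²). The sketch below assumes n = 198 (the number of rows in the data table) and uses an approximate critical value of 1.97 taken from a t table for 196 degrees of freedom; both are assumptions of this example, not part of the reported output.

```python
import math

r = -0.800  # Pearson correlation reported above
n = 198     # number of rows in the data table (assumed sample size)

# Approximate two-sided 5% critical value for 196 degrees of freedom
# (assumed from a standard t table).
CRITICAL_T_APPROX = 1.97


def correlation_t_statistic(r, n):
    """t statistic for testing whether a correlation differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


t_stat = correlation_t_statistic(r, n)
is_significant = abs(t_stat) > CRITICAL_T_APPROX
print(round(t_stat, 2), is_significant)  # -18.67 True
```

The result is close to the slope's T value of -18.68 in the regression output, as expected for a simple linear regression; the small gap comes from r being rounded to three decimals.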
Regression Analysis: 2013 Life Expectancy versus 2013 Fertility
The regression equation is
2013 Life Expectancy = 84.2 - 4.31 × 2013 Fertility
Table 1
Predictor | Coef | SE Coef | T | P |
Constant | 84.1613 | 0.7176 | 117.28 | 0.000 |
2013 Fertility | -4.3090 | 0.2307 | -18.68 | 0.000 |
S = 4.51093 R-Sq = 64.0% R-Sq(adj) = 63.9%
Table 2
Analysis of Variance
Source | DF | SS | MS | F | P |
Regression | 1 | 7101.7 | 7101.7 | 349.00 | 0.000 |
Residual Error | 196 | 3988.3 | 20.3 | | |
Total | 197 | 11090.0 | | | |
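The summary figures in the output (S, R-Sq, R-Sq(adj) and F) all follow from the sums of squares and degrees of freedom in the analysis of variance table. The following sketch recomputes them from the tabled values; the variable names are chosen for this example only.

```python
import math

# Values copied from the analysis of variance table.
ss_regression, df_regression = 7101.7, 1
ss_residual, df_residual = 3988.3, 196
ss_total, df_total = 11090.0, 197

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

f_statistic = ms_regression / ms_residual          # F = MSR / MSE
s = math.sqrt(ms_residual)                         # standard error of the regression
r_squared = ss_regression / ss_total               # share of variation explained
r_squared_adj = 1 - ms_residual / (ss_total / df_total)

print(f"F = {f_statistic:.1f}")
print(f"S = {s:.3f}")
print(f"R-Sq = {r_squared:.1%}, R-Sq(adj) = {r_squared_adj:.1%}")
```

Because the tabled sums of squares are rounded to one decimal, the recomputed values agree with the reported ones only up to rounding.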
From Table 1 we observe that both the intercept (constant) and the slope (regression coefficient) are statistically significant, since both p-values are less than 0.05.
Hence finally we get the regression equation:
2013 Life Expectancy = 84.2 - 4.31 × 2013 Fertility
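For reference, the slope and intercept of a least-squares line come from slope = Sxy / Sxx and intercept = mean(y) - slope × mean(x). The sketch below applies these formulas to ten rows copied from the data table, so its output will not match the full-data coefficients exactly; `fit_least_squares` is a hypothetical helper written for this illustration, not the routine used to produce the output above.

```python
def fit_least_squares(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sxy: sum of cross-products of deviations; Sxx: sum of squared x deviations.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Ten rows copied from the data table (illustrative subset, NOT the full data).
fertility = [4.9, 1.771, 2.795, 5.863, 2.175, 1.882, 1.471, 7.561, 1.419, 1.976]
life_expectancy = [56.2, 75.8, 76.3, 60.4, 76, 81.8, 80.8, 61.6, 83.3, 78.9]

intercept, slope = fit_least_squares(fertility, life_expectancy)
print(f"Life Expectancy = {intercept:.1f} + ({slope:.2f}) * Fertility")
```

Running the same function on all 198 rows should reproduce the reported coefficients up to rounding.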
The regression as a whole is also statistically significant, since the p-value for the F test in Table 2 is less than 0.05.
The coefficient of determination is 64.0%, which means that this regression equation explains 64.0% of the total variation in 2013 Life Expectancy.
From the scatter plot with the fitted line, the line appears to fit the data reasonably well.
Predicted life expectancy for the United States, given its fertility of 1.976: 84.2 - 4.31 × 1.976 = 75.683
The actual value of life expectancy in the United States = 78.900
Residual = 78.900 - 75.683 = 3.217, so the actual value is about 3.2 years above the model's prediction.
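The prediction and residual above can be reproduced with a few lines of code. This is a minimal sketch using the rounded coefficients from the regression equation; the function name `predict_life_expectancy` is made up for the example.

```python
# Rounded coefficients from the regression equation.
INTERCEPT = 84.2
SLOPE = -4.31


def predict_life_expectancy(fertility):
    """Predicted life expectancy (years) for a given fertility rate."""
    return INTERCEPT + SLOPE * fertility


us_fertility = 1.976  # United States row in the data table
us_actual = 78.9      # actual 2013 life expectancy for the United States

predicted = predict_life_expectancy(us_fertility)
residual = us_actual - predicted  # residual = observed - predicted
print(round(predicted, 3), round(residual, 3))  # 75.683 3.217
```

Using the unrounded coefficients from Table 1 (84.1613 and -4.3090) instead gives a prediction of about 75.647 and a residual of about 3.253, so the conclusion is the same either way.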