Question

You will be performing an analysis on a dataset that contains data on fertility and life expectancy for 198 different countries. All data is from the year 2013. The fertility numbers are the average number of children per woman in each of the countries. The life expectancy numbers are the average life expectancy in each of the countries.

You will be turning in a paper that should include section headings, graphics and tables when appropriate, and complete sentences which explain all analysis that was done in addition to all conclusions and results. There is no specified length; however, it is important that you follow all steps below and grade yourself using the rubric provided, since it is the rubric that I will be using to grade your submissions. All work should be your own. Plagiarism will result in a project score of 0.

Steps (all statistical analysis to be done in Excel and/or StatCrunch):

Watch the TED talk by Hans Rosling titled “The best stats you’ve ever seen”. You will need to include comments on this in your paper. Here is a link: http://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen?language=en

Create histograms of each of the variables (one histogram for fertility, one for life expectancy). Use the histograms to identify the shapes of the distribution. StatCrunch will be the easier tool to use for this particular task.

Calculate some descriptive statistics for each of the variables, including but not limited to the mean, median and standard deviation. Organize these numbers nicely in a table.

Using fertility as the predictor variable and life expectancy as the response variable, create a scatter diagram, come up with the least-squares regression line and calculate the linear correlation coefficient as well as the coefficient of determination. Make sure that you understand all interpretations and include them in your paper. Please carefully review the rubric below to see the full list of required interpretations.

Use the regression line to predict life expectancy for the United States given fertility and then compare this to the actual value in the United States.

Name some possible lurking variables that may be at work here.

Explain the difference between correlation and causation and why we cannot say that there is a cause and effect relationship in this situation.

Explain why we cannot use our regression model to predict the life expectancy of one particular individual.

Take a look at the website where this data was pulled from and comment on how the model might have been different if we used the data from 20, 40 or 60 years ago. Navigate to http://gapminder.org and click on “Gapminder World”. Use the x-axis and y-axis dropdown menus to ensure that ‘life expectancy (years)’ is selected on the y-axis and ‘children per woman (total fertility)’ is selected on the x-axis.

Put everything together into an organized paper and submit !!!

Country 2013 Fertility 2013 Life Expectancy
Afghanistan 4.9 56.2
Albania 1.771 75.8
Algeria 2.795 76.3
Angola 5.863 60.4
Antigua and Barbuda 2.089 75.2
Argentina 2.175 76
Armenia 1.74 73.8
Aruba 1.673 75.455
Australia 1.882 81.8
Austria 1.471 80.8
Azerbaijan 1.924 72.3
Bahamas 1.888 72.5
Bahrain 2.075 79
Bangladesh 2.177 69.5
Barbados 1.849 75.6
Belarus 1.494 70.2
Belgium 1.854 80.2
Belize 2.676 70
Benin 4.845 64.9
Bhutan 2.232 69.4
Bolivia 3.221 71.9
Bosnia and Herzegovina 1.283 77.5
Botswana 2.619 65.8
Brazil 1.801 75
Brunei 1.994 78.7
Bulgaria 1.541 74.5
Burkina Faso 5.605 62
Burundi 6.033 59.8
Cambodia 2.861 67.8
Cameroon 4.78 58.7
Canada 1.67 81.5
Cape Verde 2.292 74.2
Chad 6.263 57.1
Channel Islands 1.459 80.324
Chile 1.82 79.1
China 1.668 76.5
Colombia 2.286 75.6
Comoros 4.714 63.7
Congo, Dem. Rep. 5.933 57.5
Congo, Rep. 4.969 61.5
Costa Rica 1.795 79.8
Cote d'Ivoire 4.866 58.9
Croatia 1.501 77.8
Cuba 1.449 78.3
Cyprus 1.461 82.2
Czech Rep. 1.566 78.2
Denmark 1.88 79.9
Djibouti 3.387 63.4
Dominican Rep. 2.484 73.6
Ecuador 2.559 74.8
Egypt 2.77 70.9
El Salvador 2.184 73.9
Equatorial Guinea 4.845 58.8
Eritrea 4.696 62.1
Estonia 1.604 76.6
Ethiopia 4.519 62.6
Fiji 2.588 66.1
Finland 1.853 80.6
France 1.98 81.7
French Guiana 3.058 77.121
French Polynesia 2.058 76.257
Gabon 4.087 59.1
Gambia 5.751 64.3
Georgia 1.817 72.9
Germany 1.419 80.7
Ghana 3.857 64.9
Greece 1.529 79.8
Greenland 2.077 71.5
Grenada 2.17 71.5
Guadeloupe 2.08 80.947
Guam 2.405 78.854
Guatemala 3.783 72.3
Guinea 4.915 60.2
Guyana 2.546 64
Haiti 3.148 64.3
Honduras 3.001 72
Hong Kong, China 1.135 83.378
Hungary 1.411 75.8
Iceland 2.083 82.8
India 2.479 66.2
Indonesia 2.338 70.5
Iran 1.92 78.3
Iraq 4.026 71.3
Ireland 1.997 80.4
Israel 2.898 82.2
Italy 1.487 82.1
Jamaica 2.26 75.5
Japan 1.419 83.3
Jordan 3.244 78.1
Kazakhstan 2.455 67.8
Kenya 4.382 65.2
Kiribati 2.952 62
Korea, Dem. Rep. 1.988 71.2
Korea, Rep. 1.321 80.5
Kuwait 2.6 80.3
Kyrgyzstan 3.075 68.6
Laos 3.02 65.8
Latvia 1.607 75.3
Lebanon 1.495 78.3
Liberia 4.792 63.1
Libya 2.356 75.6
Lithuania 1.519 75
Luxembourg 1.671 81.1
Macao, China 1.083 80.4
Macedonia, FYR 1.431 76.6
Madagascar 4.468 64.3
Malawi 5.389 57.3
Malaysia 1.964 74.7
Maldives 2.256 79.3
Mali 6.847 57.2
Malta 1.356 82.1
Martinique 1.827 81.41
Mauritania 4.67 65.1
Mauritius 1.501 73.3
Mayotte 3.802 79.19
Mexico 2.185 75.5
Micronesia, Fed. Sts. 3.294 66.8
Moldova 1.456 71.9
Mongolia 2.436 64.7
Montenegro 1.666 75.6
Morocco 2.735 74.3
Mozambique 5.188 56.2
Myanmar 1.938 67.1
Namibia 3.051 60.6
Nepal 2.3 70.6
Netherlands 1.774 80.6
Netherlands Antilles 1.89 76.894
New Caledonia 2.127 76.306
New Zealand 2.052 80.6
Nicaragua 2.498 76.4
Niger 7.561 61.6
Nigeria 5.976 60.1
Norway 1.931 81.4
Oman 2.853 75.5
Pakistan 3.185 65.7
Panama 2.466 77.8
Papua New Guinea 3.781 59.8
Paraguay 2.864 73.7
Peru 2.417 77.1
Philippines 3.043 70
Poland 1.417 76.9
Portugal 1.315 79.8
Puerto Rico 1.636 78.864
Qatar 2.019 81.8
Reunion 2.232 79.646
Romania 1.417 76
Russia 1.595 71.3
Rwanda 4.508 65.3
Saint Lucia 1.912 74.5
Saint Vincent and the Grenadines 1.997 72.7
Samoa 4.147 71.8
Sao Tome and Principe 4.075 68.4
Saudi Arabia 2.644 77.9
Senegal 4.934 65.7
Serbia 1.365 77.7
Seychelles 2.18 73.3
Sierra Leone 4.705 57.7
Singapore 1.282 81.9
Slovak Republic 1.396 76.2
Slovenia 1.509 80
Solomon Islands 4.031 63.7
Somalia 6.563 57.7
South Africa 2.387 60.4
South Sudan 4.92 57.2
Spain 1.505 81.7
Sri Lanka 2.339 76.1
Sudan 4.42 68.9
Suriname 2.268 70.1
Sweden 1.928 81.8
Switzerland 1.533 82.7
Syria 2.964 72.4
Taiwan 1.065 79.3
Tajikistan 3.815 70.6
Tanzania 5.214 62.2
Thailand 1.399 74.9
Timor-Leste 5.855 71.4
Togo 4.639 63
Tonga 3.767 70.3
Trinidad and Tobago 1.797 71.2
Tunisia 2.008 77.1
Turkey 2.041 76.3
Turkmenistan 2.326 67.5
Uganda 5.867 59.8
Ukraine 1.47 71.7
United Arab Emirates 1.801 76.4
United Kingdom 1.892 81
United States 1.976 78.9
Uruguay 2.046 76.9
Uzbekistan 2.309 69.7
Vanuatu 3.382 64.6
Venezuela 2.39 75.4
Vietnam 1.743 76.3
Virgin Islands (U.S.) 2.487 80.152
West Bank and Gaza 4.01 74.6
Western Sahara 2.363 67.764
Yemen, Rep. 4.075 67
Zambia 5.687 56.7
Zimbabwe 3.486 56

Solutions

Expert Solution

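The R code below computes the quantities the assignment asks for. The comment after "#" on a line says what that line does, and the output is shown after the code, followed by the histograms and the regression equation.
Before running the code, first paste the given data into Excel as it is.
Then copy only the two numeric columns (fertility and life expectancy), without selecting the header row.
Then run the code to get the required answers. Reading from "clipboard" in this form works on Windows; other platforms may need a different connection.

The R code is as follows:

a <- read.table("clipboard", header = FALSE)  # fertility becomes V1, life expectancy becomes V2
attach(a)                                     # make V1 and V2 available by name
summary(V1)                                   # min, quartiles, median and mean of fertility
summary(V2)                                   # the same for life expectancy
var(V1)                                       # variance of fertility (its square root is the standard deviation)
var(V2)                                       # variance of life expectancy
summary(lm(V2 ~ V1))                          # least-squares regression of life expectancy on fertility
cor(V1, V2)                                   # linear correlation coefficient
hist(V1)                                      # histogram of fertility
hist(V2)                                      # histogram of life expectancy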
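
If going through Excel and the clipboard is not convenient, the two numeric columns can also be extracted from the pasted text directly. The following is only a sketch under stated assumptions: the vector name rows and the three sample lines are illustrative (the full input would hold all 198 data lines), and the pattern assumes every line ends with two numbers separated by single spaces.

```r
# Illustrative sample of the data lines; the real input would hold all 198 rows.
rows <- c("Afghanistan 4.9 56.2",
          "Antigua and Barbuda 2.089 75.2",
          "Congo, Dem. Rep. 5.933 57.5")

# Country names may contain spaces, so capture the last two numeric fields
# and treat everything before them as the country name.
m <- regmatches(rows, regexec("^(.*) ([0-9.]+) ([0-9.]+)$", rows))

a <- data.frame(
  Country = sapply(m, `[`, 2),
  V1 = as.numeric(sapply(m, `[`, 3)),  # fertility
  V2 = as.numeric(sapply(m, `[`, 4)),  # life expectancy
  stringsAsFactors = FALSE
)
print(a)
```

The columns keep the names V1 and V2, so the same commands (summary, var, lm, cor, hist) can be run on a$V1 and a$V2, or on V1 and V2 after attach(a).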


And the output is as follows:

> summary(V1)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.065 1.798 2.277 2.783 3.461 7.561
> summary(V2)
Min. 1st Qu. Median Mean 3rd Qu. Max.
56.00 65.88 74.25 72.17 78.28 83.38
> var(V1)
[1] 1.9415
> var(V2)
[1] 56.29439
> summary(lm(V2~V1))

Call:
lm(formula = V2 ~ V1)

Residuals:
Min 1Q Median 3Q Max
-13.476 -3.074 0.452 3.223 12.468

Coefficients:
Estimate Std. Error t value Pr(>|t|)   
(Intercept) 84.1613 0.7176 117.28 <2e-16 ***
V1 -4.3090 0.2307 -18.68 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.511 on 196 degrees of freedom
Multiple R-squared: 0.6404, Adjusted R-squared: 0.6385
F-statistic: 349 on 1 and 196 DF, p-value: < 2.2e-16

> cor(V1,V2)
[1] -0.8002306
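
The assignment asks for standard deviations, while the output above reports variances; the standard deviation is the square root of the variance. The sketch below derives the standard deviations from the printed values and checks two identities of simple linear regression: the squared correlation equals R-squared, and the slope equals r times the ratio of the standard deviations. The variable names are illustrative, and the numbers are copied from the output above rather than recomputed from the data.

```r
var_fert <- 1.9415     # var(V1) from the output
var_life <- 56.29439   # var(V2) from the output
r <- -0.8002306        # cor(V1, V2) from the output

sd_fert <- sqrt(var_fert)        # standard deviation of fertility
sd_life <- sqrt(var_life)        # standard deviation of life expectancy
r_squared <- r^2                 # coefficient of determination
slope <- r * sd_life / sd_fert   # least-squares slope implied by r and the SDs

cat("SD of fertility:      ", round(sd_fert, 3), "\n")
cat("SD of life expectancy:", round(sd_life, 3), "\n")
cat("r squared:            ", round(r_squared, 4), "\n")
cat("Implied slope:        ", round(slope, 4), "\n")
```

The last two values should agree, up to rounding, with the Multiple R-squared and the V1 coefficient reported by summary(lm(V2~V1)).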

And the histograms are:

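We cannot use our regression model to predict the life expectancy of one particular individual because the model was fitted to country-level averages, not to data on individuals.

In addition, the R-squared value is 0.6404 (adjusted R-squared 0.6385), so fertility explains only about 64% of the variation in life expectancy, even between countries.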

The regression equation is:

Life Expectancy = 84.1613 - (4.3090) * Fertility

For the United States, the fertility is 1.976 and the life expectancy is 78.9.

Using the equation, we get

Predicted life expectancy for the United States = 84.1613 - (4.3090) * 1.976

= 75.6467

This differs from the actual value of 78.9. The difference, 78.9 - 75.6467 = 3.2533, is called the residual (error).
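
The prediction and its residual can also be checked in R. This is a small sketch that uses the coefficients from the regression output and the United States row of the data (fertility 1.976, life expectancy 78.9); the function name predict_life is made up for illustration.

```r
intercept <- 84.1613   # (Intercept) estimate from the regression output
slope <- -4.3090       # V1 (fertility) estimate from the regression output

# Predicted life expectancy for a country with the given fertility.
predict_life <- function(fertility) intercept + slope * fertility

us_fertility <- 1.976
us_actual <- 78.9

us_predicted <- predict_life(us_fertility)
us_residual <- us_actual - us_predicted   # residual = actual - predicted

cat("Predicted:", round(us_predicted, 4), "\n")
cat("Residual: ", round(us_residual, 4), "\n")
```

A positive residual means the actual life expectancy is higher than the value the regression line predicts for that fertility.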

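There is also a strong negative correlation between the two variables (r = -0.80).

Thus countries with higher fertility tend to have lower life expectancy. This is an association between country-level averages; it does not show that higher fertility causes lower life expectancy.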

