Question

You will be performing an analysis on a dataset that contains data on fertility and life expectancy for 198 different countries. All data is from the year 2013. The fertility numbers are the average number of children per woman in each of the countries. The life expectancy numbers are the average life expectancy in each of the countries.

You will be turning in a paper that should include section headings, graphics and tables when appropriate, and complete sentences that explain all analysis that was done in addition to all conclusions and results. There is no specified length; however, it is important that you follow all steps below and grade yourself using the rubric provided, since it is the rubric that I will be using to grade your submissions. All work should be your own. Plagiarism will result in a project score of 0.

Steps (all statistical analysis to be done in Excel and/or StatCrunch):

Watch the TED talk by Hans Rosling titled “The best stats you’ve ever seen”. You will need to include comments on it in your paper. Here is a link: http://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen?language=en

Create histograms of each of the variables (one histogram for fertility, one for life expectancy). Use the histograms to identify the shapes of the distributions. StatCrunch will be the easier tool to use for this particular task.

Calculate some descriptive statistics for each of the variables, including but not limited to the mean, median and standard deviation. Organize these numbers nicely in a table.
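A minimal R sketch of such a table (R is the language used in the solution below). The helper `describe()` is hypothetical, and the two vectors hold only the first five rows of the data table below, used to keep the example self-contained; in the paper the same call would be applied to all 198 values of each variable.

```r
# Hypothetical helper: one row of descriptive statistics for a numeric vector
describe <- function(x) {
  c(n      = length(x),
    mean   = mean(x),
    median = median(x),
    sd     = sd(x),      # sample standard deviation (n - 1 denominator)
    min    = min(x),
    max    = max(x))
}

# First five countries of the data table (Afghanistan to Antigua and Barbuda), toy subset only
fertility <- c(4.9, 1.771, 2.795, 5.863, 2.089)
life_exp  <- c(56.2, 75.8, 76.3, 60.4, 75.2)

# One row per variable, one column per statistic
stats_table <- rbind(Fertility = describe(fertility),
                     `Life expectancy` = describe(life_exp))
print(round(stats_table, 3))
```

Applied to the full columns, the mean and median should match the `summary()` output in the solution, and `sd()` gives the square root of the `var()` values reported there.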

Using fertility as the predictor variable and life expectancy as the response variable, create a scatter diagram, come up with the least-squares regression line and calculate the linear correlation coefficient as well as the coefficient of determination. Make sure that you understand all interpretations and include them in your paper. Please carefully review the rubric below to see the full list of required interpretations.
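The least-squares slope and intercept follow from the textbook formulas b1 = r * s_y / s_x and b0 = y-bar - b1 * x-bar, and for a simple linear regression the coefficient of determination equals r^2. The sketch below uses a hypothetical helper `least_squares()` on the same five-country toy subset and checks the results against R's built-in `lm()`.

```r
# Hypothetical helper: least-squares line, r and r^2 from their textbook formulas
least_squares <- function(x, y) {
  r  <- cor(x, y)                 # linear correlation coefficient
  b1 <- r * sd(y) / sd(x)         # slope
  b0 <- mean(y) - b1 * mean(x)    # intercept: the line passes through (x-bar, y-bar)
  list(intercept = b0, slope = b1, r = r, r_squared = r^2)
}

# Toy subset: fertility is the predictor, life expectancy the response
fertility <- c(4.9, 1.771, 2.795, 5.863, 2.089)
life_exp  <- c(56.2, 75.8, 76.3, 60.4, 75.2)

fit_manual <- least_squares(fertility, life_exp)
fit_lm     <- lm(life_exp ~ fertility)   # R's built-in least-squares fit

# Both approaches should agree up to floating-point precision
all.equal(unname(coef(fit_lm)), c(fit_manual$intercept, fit_manual$slope))
all.equal(summary(fit_lm)$r.squared, fit_manual$r_squared)
```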

Use the regression line to predict life expectancy for the United States given its fertility, and then compare this to the actual value in the United States.
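A sketch of the prediction step in R, assuming the coefficients printed by `summary(lm(V2~V1))` in the solution below (intercept 84.1613, slope -4.3090); `predict_life()` is a hypothetical helper.

```r
# Coefficients as printed in the regression output of the solution
b0 <- 84.1613
b1 <- -4.3090

# Hypothetical helper: y-hat = b0 + b1 * x
predict_life <- function(fertility) b0 + b1 * fertility

us_fertility <- 1.976   # United States, 2013 (from the data table)
us_actual    <- 78.9

us_pred  <- predict_life(us_fertility)   # about 75.65 years
residual <- us_actual - us_pred          # observed minus predicted, about 3.25 years
```

With the fitted model object itself, `predict(lm(V2 ~ V1), newdata = data.frame(V1 = 1.976))` should return essentially the same value, differing only by the rounding of the printed coefficients.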

Name some possible lurking variables that may be at work here.

Explain the difference between correlation and causation and why we cannot say that there is a cause and effect relationship in this situation.
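A small simulation, not based on the fertility data, shows how a lurking variable alone can produce a strong correlation. Here `z` stands in for an assumed lurking factor (for example, overall access to health care), and by construction neither `x` nor `y` has any effect on the other.

```r
set.seed(1)
n <- 500
z <- rnorm(n)                  # assumed lurking variable
x <- z + rnorm(n, sd = 0.3)    # x depends only on z
y <- -z + rnorm(n, sd = 0.3)   # y depends only on z; x does not cause y

cor(x, y)                      # strongly negative, roughly -0.9

# Remove the part of x and y explained by z, then correlate what is left
rx <- resid(lm(x ~ z))
ry <- resid(lm(y ~ z))
cor(rx, ry)                    # close to 0
```

Once the influence of `z` is removed, the correlation between `x` and `y` essentially disappears. Observational data such as the 2013 country figures cannot rule out this kind of explanation, which is why the regression alone does not establish cause and effect.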

Explain why we cannot use our regression model to predict the life expectancy of one particular individual.

Take a look at the website where this data was pulled from and comment on how the model might have been different if we used the data from 20, 40 or 60 years ago. Navigate to http://gapminder.org and click on “Gapminder World”. Use the x-axis and y-axis dropdown menus to ensure that ‘life expectancy (years)’ is selected on the y-axis and ‘children per woman (total fertility)’ is selected on the x-axis.

Put everything together into an organized paper and submit it!

Country	2013 Fertility	2013 Life Expectancy
Afghanistan	4.9	56.2
Albania	1.771	75.8
Algeria	2.795	76.3
Angola	5.863	60.4
Antigua and Barbuda	2.089	75.2
Argentina	2.175	76
Armenia	1.74	73.8
Aruba	1.673	75.455
Australia	1.882	81.8
Austria	1.471	80.8
Azerbaijan	1.924	72.3
Bahamas	1.888	72.5
Bahrain	2.075	79
Bangladesh	2.177	69.5
Barbados	1.849	75.6
Belarus	1.494	70.2
Belgium	1.854	80.2
Belize	2.676	70
Benin	4.845	64.9
Bhutan	2.232	69.4
Bolivia	3.221	71.9
Bosnia and Herzegovina	1.283	77.5
Botswana	2.619	65.8
Brazil	1.801	75
Brunei	1.994	78.7
Bulgaria	1.541	74.5
Burkina Faso	5.605	62
Burundi	6.033	59.8
Cambodia	2.861	67.8
Cameroon	4.78	58.7
Canada	1.67	81.5
Cape Verde	2.292	74.2
Chad	6.263	57.1
Channel Islands	1.459	80.324
Chile	1.82	79.1
China	1.668	76.5
Colombia	2.286	75.6
Comoros	4.714	63.7
Congo, Dem. Rep.	5.933	57.5
Congo, Rep.	4.969	61.5
Costa Rica	1.795	79.8
Cote d'Ivoire	4.866	58.9
Croatia	1.501	77.8
Cuba	1.449	78.3
Cyprus	1.461	82.2
Czech Rep.	1.566	78.2
Denmark	1.88	79.9
Djibouti	3.387	63.4
Dominican Rep.	2.484	73.6
Ecuador	2.559	74.8
Egypt	2.77	70.9
El Salvador	2.184	73.9
Equatorial Guinea	4.845	58.8
Eritrea	4.696	62.1
Estonia	1.604	76.6
Ethiopia	4.519	62.6
Fiji	2.588	66.1
Finland	1.853	80.6
France	1.98	81.7
French Guiana	3.058	77.121
French Polynesia	2.058	76.257
Gabon	4.087	59.1
Gambia	5.751	64.3
Georgia	1.817	72.9
Germany	1.419	80.7
Ghana	3.857	64.9
Greece	1.529	79.8
Greenland	2.077	71.5
Grenada	2.17	71.5
Guadeloupe	2.08	80.947
Guam	2.405	78.854
Guatemala	3.783	72.3
Guinea	4.915	60.2
Guyana	2.546	64
Haiti	3.148	64.3
Honduras	3.001	72
Hong Kong, China	1.135	83.378
Hungary	1.411	75.8
Iceland	2.083	82.8
India	2.479	66.2
Indonesia	2.338	70.5
Iran	1.92	78.3
Iraq	4.026	71.3
Ireland	1.997	80.4
Israel	2.898	82.2
Italy	1.487	82.1
Jamaica	2.26	75.5
Japan	1.419	83.3
Jordan	3.244	78.1
Kazakhstan	2.455	67.8
Kenya	4.382	65.2
Kiribati	2.952	62
Korea, Dem. Rep.	1.988	71.2
Korea, Rep.	1.321	80.5
Kuwait	2.6	80.3
Kyrgyzstan	3.075	68.6
Laos	3.02	65.8
Latvia	1.607	75.3
Lebanon	1.495	78.3
Liberia	4.792	63.1
Libya	2.356	75.6
Lithuania	1.519	75
Luxembourg	1.671	81.1
Macao, China	1.083	80.4
Macedonia, FYR	1.431	76.6
Madagascar	4.468	64.3
Malawi	5.389	57.3
Malaysia	1.964	74.7
Maldives	2.256	79.3
Mali	6.847	57.2
Malta	1.356	82.1
Martinique	1.827	81.41
Mauritania	4.67	65.1
Mauritius	1.501	73.3
Mayotte	3.802	79.19
Mexico	2.185	75.5
Micronesia, Fed. Sts.	3.294	66.8
Moldova	1.456	71.9
Mongolia	2.436	64.7
Montenegro	1.666	75.6
Morocco	2.735	74.3
Mozambique	5.188	56.2
Myanmar	1.938	67.1
Namibia	3.051	60.6
Nepal	2.3	70.6
Netherlands	1.774	80.6
Netherlands Antilles	1.89	76.894
New Caledonia	2.127	76.306
New Zealand	2.052	80.6
Nicaragua	2.498	76.4
Niger	7.561	61.6
Nigeria	5.976	60.1
Norway	1.931	81.4
Oman	2.853	75.5
Pakistan	3.185	65.7
Panama	2.466	77.8
Papua New Guinea	3.781	59.8
Paraguay	2.864	73.7
Peru	2.417	77.1
Philippines	3.043	70
Poland	1.417	76.9
Portugal	1.315	79.8
Puerto Rico	1.636	78.864
Qatar	2.019	81.8
Reunion	2.232	79.646
Romania	1.417	76
Russia	1.595	71.3
Rwanda	4.508	65.3
Saint Lucia	1.912	74.5
Saint Vincent and the Grenadines	1.997	72.7
Samoa	4.147	71.8
Sao Tome and Principe	4.075	68.4
Saudi Arabia	2.644	77.9
Senegal	4.934	65.7
Serbia	1.365	77.7
Seychelles	2.18	73.3
Sierra Leone	4.705	57.7
Singapore	1.282	81.9
Slovak Republic	1.396	76.2
Slovenia	1.509	80
Solomon Islands	4.031	63.7
Somalia	6.563	57.7
South Africa	2.387	60.4
South Sudan	4.92	57.2
Spain	1.505	81.7
Sri Lanka	2.339	76.1
Sudan	4.42	68.9
Suriname	2.268	70.1
Sweden	1.928	81.8
Switzerland	1.533	82.7
Syria	2.964	72.4
Taiwan	1.065	79.3
Tajikistan	3.815	70.6
Tanzania	5.214	62.2
Thailand	1.399	74.9
Timor-Leste	5.855	71.4
Togo	4.639	63
Tonga	3.767	70.3
Trinidad and Tobago	1.797	71.2
Tunisia	2.008	77.1
Turkey	2.041	76.3
Turkmenistan	2.326	67.5
Uganda	5.867	59.8
Ukraine	1.47	71.7
United Arab Emirates	1.801	76.4
United Kingdom	1.892	81
United States	1.976	78.9
Uruguay	2.046	76.9
Uzbekistan	2.309	69.7
Vanuatu	3.382	64.6
Venezuela	2.39	75.4
Vietnam	1.743	76.3
Virgin Islands (U.S.)	2.487	80.152
West Bank and Gaza	4.01	74.6
Western Sahara	2.363	67.764
Yemen, Rep.	4.075	67
Zambia	5.687	56.7
Zimbabwe	3.486	56

Expert Solution

Here I solve the problem in R. Each step is labeled with a comment after "#", and the output is shown after the code, followed by the histograms and the regression equation.
Before running the code, paste the given data into Excel exactly as it is.
Then copy only the second and last columns (fertility and life expectancy), without the header row. Country names contain spaces, so including the first column would break read.table.
Then run the code to get the required answers.

The R-code is as follows:

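# Read the two copied numeric columns from the clipboard (Windows; on macOS use read.table(pipe("pbpaste")))
a <- read.table("clipboard", header = FALSE)
attach(a)            # V1 = 2013 fertility, V2 = 2013 life expectancy
summary(V1)          # five-number summary and mean of fertility
summary(V2)          # five-number summary and mean of life expectancy
var(V1)              # variance of fertility (sd(V1) gives the standard deviation)
var(V2)              # variance of life expectancy (sd(V2) gives the standard deviation)
summary(lm(V2~V1))   # least-squares regression of life expectancy on fertility
cor(V1,V2)           # linear correlation coefficient r
hist(V1)             # histogram of fertility
hist(V2)             # histogram of life expectancy
plot(V1, V2)         # scatter diagram
abline(lm(V2~V1))    # add the least-squares line to the scatter diagram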

And the output is as follows:

> summary(V1)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.065 1.798 2.277 2.783 3.461 7.561
> summary(V2)
Min. 1st Qu. Median Mean 3rd Qu. Max.
56.00 65.88 74.25 72.17 78.28 83.38
> var(V1)
[1] 1.9415
> var(V2)
[1] 56.29439
> summary(lm(V2~V1))

Call:
lm(formula = V2 ~ V1)

Residuals:
Min 1Q Median 3Q Max
-13.476 -3.074 0.452 3.223 12.468

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 84.1613 0.7176 117.28 <2e-16 ***
V1 -4.3090 0.2307 -18.68 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.511 on 196 degrees of freedom
Multiple R-squared: 0.6404, Adjusted R-squared: 0.6385
F-statistic: 349 on 1 and 196 DF, p-value: < 2.2e-16

> cor(V1,V2)
[1] -0.8002306

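The standard deviations are the square roots of the variances: about 1.393 children per woman for fertility and 7.503 years for life expectancy.

And the histograms are:

[Figure: histograms of fertility (V1) and life expectancy (V2)]

The summary output is consistent with the shapes: fertility is skewed to the right (mean 2.783 above median 2.277, with a long upper tail up to 7.561), while life expectancy is skewed to the left (mean 72.17 below median 74.25, with a long lower tail down to 56.00).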

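We cannot use our regression model to predict the life expectancy of one particular individual because every data point is a country average, not a person. The model describes how a country's average life expectancy relates to its average number of children per woman; individual lifetimes within any one country vary far more than these averages do, and the model was never fitted to individual-level data. Applying country-level relationships to individuals is known as the ecological fallacy.

Even at the country level the fit is imperfect: R-squared is 0.6404 (adjusted R-squared 0.6385), so the line explains about 64% of the variation in life expectancy across countries, and the residual standard error is 4.511 years.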
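A simulation with invented numbers (50 hypothetical countries, 200 people each, and an assumed individual-level standard deviation of 15 years) illustrates the point: country averages can lie almost exactly on a line while individual lifetimes scatter widely around them.

```r
set.seed(2)
n_countries <- 50
per_country <- 200

country_x    <- runif(n_countries, 1, 7)   # hypothetical country-level fertility
country_mean <- 84 - 4 * country_x         # country averages lie near a line (assumed)

# Individuals vary widely around their own country's average (sd = 15 years, assumed)
country_id <- rep(seq_len(n_countries), each = per_country)
indiv_x    <- country_x[country_id]
indiv_y    <- country_mean[country_id] + rnorm(n_countries * per_country, sd = 15)

# Correlation using country averages versus using individuals
country_avg <- as.numeric(tapply(indiv_y, country_id, mean))
cor_country <- cor(country_x, country_avg)
cor_indiv   <- cor(indiv_x, indiv_y)
c(country = cor_country, individual = cor_indiv)
```

The country-level correlation comes out close to -1, while the individual-level correlation is much weaker (about -0.4 under these assumptions), so a line fitted to averages overstates how precisely it can predict any one person.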

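The regression equation is:

Life Expectancy = 84.1613 - 4.3090 * Fertility

The slope means that each additional child per woman is associated with an average life expectancy about 4.31 years lower. The intercept (84.16 years at zero fertility) has no practical interpretation, since no country has a fertility rate anywhere near 0 (the minimum in the data is 1.065).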
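As a consistency check, the printed coefficients can be recovered from the other output above, using r = -0.8002, the variances 1.9415 and 56.2944, and the means 2.783 and 72.17:

```latex
\begin{aligned}
b_1 &= r\,\frac{s_y}{s_x} = -0.8002\sqrt{\frac{56.2944}{1.9415}} \approx -0.8002 \times 5.385 \approx -4.309 \\
b_0 &= \bar{y} - b_1\,\bar{x} = 72.17 - (-4.309)(2.783) \approx 84.16 \\
R^2 &= r^2 = (-0.8002)^2 \approx 0.6404
\end{aligned}
```

The small gap between 84.16 and the printed 84.1613 comes from the rounded means in the summary output.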

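For the United States, the fertility rate is 1.976 and the actual life expectancy is 78.9 years.

Now, by using the equation we get:

Life Expectancy for the United States = 84.1613 - 4.3090 * 1.976

= 75.6467 (about 75.65 years)

This is lower than the actual value of 78.9 years. The difference, observed minus predicted (78.9 - 75.65 = 3.25 years), is the residual: the model underestimates US life expectancy by about 3.25 years.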

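The correlation coefficient r = -0.8002 shows a strong negative linear association between the two variables: countries with higher fertility tend to have lower life expectancy.

Because these are observational data, this association alone does not show that higher fertility causes lower life expectancy.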

orchestra answered 3 years ago
