Question


Background

This part is based on a study of premature mortality in Great Britain between 2012 and 2014. The dataset from this study includes information on premature mortality for 378 local authorities in Great Britain from 2012 to 2014. Premature mortality is measured as the number of individuals that die before the age of 70 in a cohort of 100,000. In addition to total premature mortality, the dataset also includes a breakdown by gender and socioeconomic indicators such as income, education and employment for each local authority.

Dataset

You can run the following line of code in R and this will load the data directly from the course website:

pmdata <- read.csv("https://uclspp.github.io/datasets/data/pmgb2012_2014.csv")

Codebook

The codebook describes the variables in the dataset.

Variable          Description
code              Unique identifier for each local authority
country           1 = England, 2 = Scotland, 3 = Wales
pop_density       1 = low, 2 = medium, 3 = high
pmdeaths_total    Number of premature deaths out of 100,000
pmdeaths_female   Number of premature deaths among women, out of 100,000
pmdeaths_male     Number of premature deaths among men, out of 100,000
mean_income       Mean income in the local authority
edu_level3        Qualification: proportion of the population with A level
edu_level4        Qualification: proportion of the population with degree-level education or equivalent

2a. Descriptive Statistics

• Using the appropriate measures, report and interpret the central tendency and dispersion for the following variables:

– edu_level3

– edu_level4

– pop_density

2b. Visualization

• Produce a scatter plot of premature mortality (pmdeaths_total) on the y-axis and degree-level education (edu_level4) on the x-axis

• Provide an explanation of the substantive meaning of the graphs. What do they tell us about the association between premature mortality and levels of education in Great Britain?

• Produce a box plot that compares premature mortality in England, Scotland, Wales.

• What does the plot tell us about how premature mortality varies across the three countries?

2c. Difference in Means

• Calculate the mean difference between premature mortality among men and women in Great Britain.

• Conduct a t-test to establish whether the difference between the premature mortality of men and women is statistically significant at the 95% confidence level.

• Interpret the results of the t-test both statistically and substantively

• Interpret the confidence interval of the difference in means

2d. Linear Regression

• Estimate a linear regression model to analyse the relationship between mean income and premature mortality in each local authority. The dependent variable is pmdeaths_total and the independent variable is mean_income

• Present a table with the output of the regression model

• Interpret the main coefficient of interest (mean_income)

• Interpret the estimated intercept term of the regression model

• Interpret the R² term of the regression model

Expert Solution

pmdata <- read.csv("https://uclspp.github.io/datasets/data/pmgb2012_2014.csv")

head(pmdata)

  code country pop_density pmdeaths_total pmdeaths_female pmdeaths_male mean_income edu_level3 edu_level4
1  416       2           2       20484.33        16368.25      24600.41       36950 0.09638724  0.3315938
2  417       2           1       15327.69        13122.84      17532.53       37008 0.10156433  0.2695638
3  267       1           2       14930.00        11442.00      18418.00       28537 0.11180530  0.2198936
4   92       1           1       16548.00        13522.00      19574.00       31561 0.11522675  0.2277338
5   98       1           1       15267.50        12273.00      18262.00       31396 0.11985204  0.2315130
6  423       2           1       17216.90        14495.01      19938.78       28226 0.10897409  0.2363351

2a)

In statistics, the most common measures of central tendency are the mean, the median and the mode.

Mean is the sum of all measurements divided by the number of observations in the data.

Median is the middle value that separates the higher half from the lower half of the data.

Mode is the most frequent value in the data.
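As a quick illustration of the three definitions above, here is a minimal sketch on a small made-up vector (the values are illustrative and are not taken from pmdata). The helper stat_mode is a hypothetical name introduced here, since base R's mode() returns the storage type rather than the most frequent value.

```r
# Illustrative values only (not taken from pmdata)
x <- c(2, 3, 3, 5, 7)

mean(x)    # sum of the values divided by their count: 20 / 5 = 4
median(x)  # middle value of the sorted data: 3

# Base R's mode() reports the storage type, not the statistical mode,
# so the most frequent value is read off a frequency table instead.
# stat_mode is a hypothetical helper; it returns the value(s) as text.
stat_mode <- function(v) {
  counts <- table(v)
  names(counts)[counts == max(counts)]
}

stat_mode(x)  # "3"
```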

edu_level3

> mean(pmdata$edu_level3,na.rm=TRUE)

[1] 0.1201406

> median(pmdata$edu_level3,na.rm=TRUE)

[1] 0.1190243

Mean for edu_level3 is 0.12 and median is 0.119

edu_level4

> mean(pmdata$edu_level4,na.rm=TRUE)

[1] 0.2679692

> median(pmdata$edu_level4,na.rm=TRUE)

[1] 0.2569474

Mean for edu_level4 is 0.26797 and median is 0.25695

pop_density

> mean(pmdata$pop_density,na.rm=TRUE)

[1] 1.62234

> median(pmdata$pop_density,na.rm=TRUE)

[1] 1.5

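Mean for pop_density is 1.62234 and median is 1.5. Because pop_density is an ordinal category code (1 = low, 2 = medium, 3 = high), the mean is not a meaningful summary here. The median of 1.5 indicates that the middle of the distribution falls between the low and medium categories.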
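The question also asks for dispersion, which the calculations above do not cover. The following is a minimal sketch of how it could be obtained, assuming pmdata has been loaded as shown earlier. The helper names describe_numeric and describe_ordinal are hypothetical, and no output is shown because it has not been run against the dataset here.

```r
# Hypothetical helper: central tendency and dispersion for a numeric variable
describe_numeric <- function(x) {
  c(mean   = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    sd     = sd(x, na.rm = TRUE),     # standard deviation
    iqr    = IQR(x, na.rm = TRUE),    # interquartile range
    min    = min(x, na.rm = TRUE),
    max    = max(x, na.rm = TRUE))
}

# Hypothetical helper: frequency table for an ordinal code such as pop_density.
# Assumes the codes run 1, 2, ..., length(labels).
describe_ordinal <- function(x, labels) {
  f <- factor(x, levels = seq_along(labels), labels = labels)
  list(counts = table(f), proportions = prop.table(table(f)))
}

# Only runs if pmdata is already in the workspace
if (exists("pmdata")) {
  print(describe_numeric(pmdata$edu_level3))
  print(describe_numeric(pmdata$edu_level4))
  print(describe_ordinal(pmdata$pop_density, c("low", "medium", "high")))
}
```

For the two education variables the standard deviation and interquartile range are the natural dispersion measures. For pop_density, the category counts and proportions are easier to interpret than a standard deviation of the codes.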

2b)

plot(pmdata$edu_level4, pmdata$pmdeaths_total, main="Scatterplot",
     xlab="edu_level4", ylab="pmdeaths_total", pch=19)

# Add fit line
abline(lm(pmdata$pmdeaths_total~pmdata$edu_level4), col="red") # regression line (y~x)

The above scatterplot shows an inverse relationship between the two variables (pmdeaths_total and edu_level4): as edu_level4 increases, pmdeaths_total decreases. Local authorities with a larger share of degree-educated residents tend to have lower premature mortality.
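The direction of the association read off the scatterplot can also be quantified with a correlation coefficient. This is a sketch only, assuming pmdata is loaded. The helper name association is hypothetical, and the Pearson coefficient is one choice among several.

```r
# Hypothetical helper: Pearson correlation, ignoring incomplete pairs
association <- function(x, y) {
  cor(x, y, use = "complete.obs", method = "pearson")
}

# Only runs if pmdata is already in the workspace
if (exists("pmdata")) {
  print(association(pmdata$edu_level4, pmdata$pmdeaths_total))
  # cor.test() adds a confidence interval and a p-value for the correlation
  print(cor.test(pmdata$edu_level4, pmdata$pmdeaths_total))
}
```

A negative coefficient would be consistent with the downward-sloping fit line. Its magnitude indicates how tightly the points cluster around that line.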

# Boxplot of pmdeaths_total by Country
boxplot(pmdeaths_total~country,data=pmdata, main="Premature Mortality by Country",
        xlab="1 = England, 2 = Scotland, 3 = Wales", ylab="Number of premature deaths out of 100,000")

The boxplot tells us that median premature mortality is highest for Scotland and lowest for England. (The line inside each box marks the median, not the mean.)
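The reading of the boxplot can be checked numerically by computing the same summaries per country. The sketch below assumes pmdata is loaded and uses the country codes from the codebook. summarise_by_country is a hypothetical helper name.

```r
# Hypothetical helper: median, mean and IQR of pmdeaths_total per country.
# Returns a matrix with one column per country.
summarise_by_country <- function(df) {
  country <- factor(df$country, levels = 1:3,
                    labels = c("England", "Scotland", "Wales"))
  sapply(split(df$pmdeaths_total, country), function(v) {
    c(median = median(v, na.rm = TRUE),
      mean   = mean(v, na.rm = TRUE),
      iqr    = IQR(v, na.rm = TRUE))
  })
}

# Only runs if pmdata is already in the workspace
if (exists("pmdata")) {
  print(summarise_by_country(pmdata))
}
```

The median row corresponds to the line inside each box, and the iqr row to the height of each box.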

2c)

Subsetting on country == 1 gives the mean female premature mortality for England only (the question asks for Great Britain as a whole, which would mean dropping the subset):

> mean(pmdata$pmdeaths_female[pmdata$country==1])

[1] 12322.2

Subtracting this from the corresponding mean for men gives the mean difference between men and women in England:

> mean(pmdata$pmdeaths_male[pmdata$country==1])-mean(pmdata$pmdeaths_female[pmdata$country==1])

[1] 5931.648
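The calculation above covers England only and stops before the t-test that the question asks for. The following is a minimal sketch of how both could be done for all of Great Britain, assuming pmdata is loaded. compare_sexes is a hypothetical helper name. A paired test is used on the assumption that the male and female rates in each row belong to the same local authority. Setting paired = FALSE would give a Welch two-sample test instead. No output is shown because it has not been run against the dataset here.

```r
# Hypothetical helper: male minus female premature mortality across all rows.
# paired = TRUE is an assumption: each local authority supplies both rates.
compare_sexes <- function(df, paired = TRUE) {
  t.test(df$pmdeaths_male, df$pmdeaths_female,
         paired = paired, conf.level = 0.95)
}

# Only runs if pmdata is already in the workspace
if (exists("pmdata")) {
  # Mean difference for Great Britain (no country subset)
  print(mean(pmdata$pmdeaths_male - pmdata$pmdeaths_female, na.rm = TRUE))

  res <- compare_sexes(pmdata)
  print(res$estimate)   # estimated mean difference
  print(res$conf.int)   # 95% confidence interval for the difference
  print(res$p.value)    # compare against 0.05
}
```

If the p-value is below 0.05 and the 95% confidence interval excludes zero, the difference between men and women is statistically significant at the 95% confidence level. The interval itself gives the range of plausible values for the true mean difference.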

2d)

Regression:

> regmodel <- lm(pmdeaths_total ~ mean_income, data = pmdata)

> summary(regmodel)

Call:

lm(formula = pmdeaths_total ~ mean_income, data = pmdata)

Residuals:

    Min      1Q  Median      3Q     Max

-5606.0 -1844.4  -362.7  1371.7 10994.7

Coefficients:

              Estimate Std. Error t value Pr(>|t|)

(Intercept)  2.111e+04  5.135e+02   41.12   <2e-16 ***

mean_income -1.600e-01  1.482e-02  -10.79   <2e-16 ***

---

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2474 on 376 degrees of freedom

Multiple R-squared:  0.2366,  Adjusted R-squared:  0.2346

F-statistic: 116.5 on 1 and 376 DF,  p-value: < 2.2e-16

Hence, the Regression equation is:

Pmdeaths_total = 21110 – 0.16 * mean_income

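Since the p-value for mean_income is less than 0.05, the coefficient is statistically significant at the 5% level. The estimate of -0.16 means that a one-unit increase in mean income is associated with, on average, 0.16 fewer premature deaths per 100,000.

The intercept (21110) is the predicted value of pmdeaths_total when the independent variable (mean_income) is 0.

The R-squared of the model is 0.2366, i.e. only 23.66% of the variation in the dependent variable (pmdeaths_total) is explained by the independent variable (mean_income).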
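To make the regression equation concrete, the sketch below plugs values into it using the rounded coefficients from the output above. The income values of 30000 and 31000 are arbitrary illustrations, and predict_pm is a hypothetical helper name. The last part assumes the fitted object regmodel is still in the workspace.

```r
# Rounded coefficients copied from the summary(regmodel) output
intercept <- 21110   # (Intercept), 2.111e+04
slope     <- -0.16   # mean_income, -1.600e-01

# Hypothetical helper: predicted pmdeaths_total for a given mean income
predict_pm <- function(income) intercept + slope * income

predict_pm(30000)                      # 21110 - 0.16 * 30000, about 16310
predict_pm(31000) - predict_pm(30000)  # effect of 1,000 more income, about -160

# With the fitted model available, the same quantities come from the object
if (exists("regmodel")) {
  print(coef(regmodel))                # unrounded intercept and slope
  print(confint(regmodel))             # 95% confidence intervals
  print(summary(regmodel)$r.squared)   # R-squared
}
```

Reading the slope per 1,000 units of income (about 160 fewer premature deaths per 100,000) is often easier to communicate than the per-unit figure. The intercept is an extrapolation, since no local authority has a mean income of 0.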

orchestra answered 1 year ago
