The data file contains the Body Mass Index (BMI) for a sample of men and a sample of women. Two of the columns, OW_male and OW_female, code the BMI values as 0 if BMI ≤ 25.4 (considered “not overweight”) and 1 if BMI ≥ 25.5 (considered “overweight”).
(a) Test whether there is sufficient evidence to show that the proportion of overweight males is different from the proportion of overweight females in the population. Use the critical value approach and the 0.05 level of significance. Perform the test manually after using Excel to summarize the data (if you use descriptive statistics, the mean coded value in each sample is the sample proportion).
(b) Explain how to find the p-value manually (indicate what probability has to be calculated).
(c) Finally, calculate manually the 95% two-sided confidence interval for the true difference between the proportions of overweight males and overweight females.
(d) Explain how the results in parts (b) and (c) are consistent with your conclusion in part (a).
MedInc | bmi_male | bmi_female | OW_male | OW_female
32908 | 26.9 | 19.4 | 1 | 0
35306 | 29.9 | 23.1 | 1 | 0
34956 | 28.2 | 24.8 | 1 | 0
44511 | 30.5 | 18.4 | 1 | 0
42716 | 25.6 | 29.9 | 1 | 1
34530 | 31.3 | 24.5 | 1 | 0
31157 | 27.6 | 19.8 | 1 | 0
35051 | 23.3 | 19 | 0 | 0
46143 | 25.1 | 22.9 | 0 | 0
55872 | 29.6 | 17.7 | 1 | 0
61669 | 22.1 | 25.6 | 0 | 1
41318 | 24.2 | 25.6 | 0 | 1
32042 | 26.3 | 22.1 | 1 | 0
38921 | 31.3 | 23.9 | 1 | 0
34252 | 22.1 | 27.7 | 0 | 1
38444 | 23.8 | 22.1 | 0 | 0
36718 | 26.2 | 28 | 1 | 1
42227 | 30.3 | 32.3 | 1 | 1
37513 | 23.4 | 29.1 | 0 | 1
42193 | 19.7 | 35.2 | 0 | 1
46632 | 28 | 22.1 | 1 | 0
33481 | 27.2 | 19.1 | 1 | 0
36038 | 27 | 25.2 | 1 | 0
38484 | 21.7 | 18.9 | 0 | 0
47727 | 24.9 | 24.3 | 0 | 0
35954 | 30.5 | 21.9 | 1 | 0
38620 | 25.6 | 28.1 | 1 | 1
39036 | 29.3 | 16.3 | 1 | 0
38932 | 33 | 18.4 | 1 | 0
42326 | 27.1 | 23.8 | 1 | 0
34479 | 30.2 | 22 | 1 | 0
24825 | 29 | | 1 |
41163 | 24.9 | | 0 |
48945 | 22 | | 0 |
38938 | 25.6 | | 1 |
Test at the 0.05 level of significance whether the sample of male BMI observations is enough to show that the mean BMI for males exceeds 25.5. Show your manual calculations (you may use Excel to summarize the sample data).
(b) Explain whether your test satisfies the underlying assumptions, with reference to a boxplot of the sample data.
4. [10 marks]
The data file contains data on the median incomes (medinc) of census dissemination areas in Ottawa.
(a) Treating this set of data as the population, use Excel to calculate the population mean for the medinc variable. Set aside all population information until part (e).
(b) Examine a boxplot and histogram of the population data. Explain if the means of all possible random samples of size 40 from this population would form a normal distribution.
Answer (c) and (d), without the information from (a) and (b).
(c) Now use Excel (Calc Menu – Random Data – Sample from Columns) to draw twenty samples of size n = 40 from the Ottawa medinc population. This procedure must be replicated twenty times (note that if you open up the same sampling dialog box each time from the menu, then you only have to replace the last destination column with the next one). For each sample, use Excel to calculate a 95% confidence interval estimate for the population mean, assuming you do not know the population standard deviation.
(d) For your first sample, confirm the Excel generated interval by calculating the interval manually. Display the sample data using a boxplot and comment on whether the relevant assumption regarding the population distribution is warranted given your sample (state clearly the assumption needed to justify the interval estimation).
(e) Count the number of intervals out of your twenty that contain the true value of the population mean from part (a).
Note: Only one question may be solved per post, so only the question on the proportions of overweight males and females is answered here.
(a) Test whether there is sufficient evidence to show that the proportion of overweight males (proportion of males who are overweight) is different than the proportion of overweight females in the population. Use the critical value approach and the 0.05 level of significance. Perform the test manually after using Excel to summarize the data (if you use descriptive statistics, the mean coded value in each sample is the sample proportion).
(b) Explain how to find the p-value manually (indicate what probability has to be calculated).
Step 1: Find the sample proportion of overweight individuals for males and for females. Each proportion is the mean of the corresponding 0/1 coded column (OW_male or OW_female).
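Step 1 can also be sketched in Python rather than Excel. The lists below transcribe the OW_male and OW_female codes from the table in the question. Assigning the four trailing rows (those with a single BMI value) to the male sample is an assumption, and the names `ow_male`, `ow_female`, and `sample_proportion` are illustrative only.

```python
# 0/1 overweight codes transcribed from the table in the question.
# ASSUMPTION: the four trailing rows with only one BMI value are males,
# giving 35 male and 31 female observations.
ow_male = [1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1,
           0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1]
ow_female = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1,
             1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]


def sample_proportion(codes):
    """Mean of a 0/1 coded column, i.e. the sample proportion of 1s."""
    return sum(codes) / len(codes)


p_male = sample_proportion(ow_male)
p_female = sample_proportion(ow_female)

print(f"males:   {sum(ow_male)} of {len(ow_male)} overweight, p1 = {p_male:.4f}")
print(f"females: {sum(ow_female)} of {len(ow_female)} overweight, p2 = {p_female:.4f}")
```

Under that assumption the counts are 23 of 35 males and 9 of 31 females.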
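Step 2: Hypothesis testing
Let sample 1 be males and sample 2 be females, with population proportions p1 and p2.
H0: p1 = p2 versus Ha: p1 ≠ p2 (two-sided). At the 0.05 level of significance, reject H0 if the test statistic z falls below -1.96 or above 1.96.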
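The critical value approach for a difference in proportions can be sketched as follows. This is a minimal illustration, not the original worked solution: it uses the usual pooled standard error under H0, and the counts (23 of 35 males, 9 of 31 females) assume the four trailing single-BMI rows in the data belong to the male sample. All variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# ASSUMPTION: counts read from the coded columns, with the four trailing
# single-BMI rows treated as males.
x1, n1 = 23, 35   # overweight males, male sample size
x2, n2 = 9, 31    # overweight females, female sample size
alpha = 0.05

p1_hat = x1 / n1
p2_hat = x2 / n2

# Pooled proportion: under H0 both samples share one common proportion.
p_pooled = (x1 + x2) / (n1 + n2)
se_pooled = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))

# Test statistic and two-sided critical value.
z = (p1_hat - p2_hat) / se_pooled
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96

reject_h0 = abs(z) > z_crit
print(f"z = {z:.3f}, critical values = ±{z_crit:.3f}, reject H0: {reject_h0}")
```

With these counts the statistic is roughly 2.98, which lies beyond 1.96, so H0 is rejected.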
Hence we find that there is sufficient evidence to show that the proportion of overweight males (proportion of males who are overweight) is different than the proportion of overweight females in the population.
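For part (b), the probability to calculate is the two-sided tail area of the standard normal distribution beyond the observed statistic, that is, p-value = 2 × P(Z > |z|). A minimal sketch, again assuming counts of 23 of 35 males and 9 of 31 females (the four trailing single-BMI rows treated as males) and using hypothetical variable names:

```python
from math import sqrt
from statistics import NormalDist

# ASSUMPTION: same counts as in the test for part (a).
x1, n1 = 23, 35
x2, n2 = 9, 31

# Pooled two-proportion z statistic.
p_pooled = (x1 + x2) / (n1 + n2)
se_pooled = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z = (x1 / n1 - x2 / n2) / se_pooled

# Two-sided p-value: twice the upper-tail area beyond |z|.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"z = {z:.3f}, two-sided p-value = {p_value:.4f}")
```

By hand, the same tail area comes from a standard normal table: look up P(Z ≤ |z|), subtract it from 1, and double the result.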
(c) Finally calculate manually the 95% 2-sided confidence interval for the true difference between the proportions of overweight males and overweight females.
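A sketch of the interval calculation for part (c). Unlike the test, the confidence interval uses the unpooled standard error, since it does not assume p1 = p2. The counts again assume 23 of 35 males and 9 of 31 females (the four trailing single-BMI rows treated as males), and the variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# ASSUMPTION: counts read from the coded columns, with the four trailing
# single-BMI rows treated as males.
x1, n1 = 23, 35
x2, n2 = 9, 31
confidence = 0.95

p1_hat = x1 / n1
p2_hat = x2 / n2
diff = p1_hat - p2_hat

# Unpooled standard error of the difference in sample proportions.
se_unpooled = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)

# Two-sided critical value (about 1.96 for 95% confidence).
z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z_crit * se_unpooled

lower, upper = diff - margin, diff + margin
print(f"difference = {diff:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

Under these assumptions the interval runs from about 0.14 to about 0.59, so it excludes zero.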
(d) Explain how the results in parts (b) and (c) are consistent with your conclusion in part (a).
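Yes, both results are consistent with part (a). Rejecting H0 with the critical value approach at the 0.05 level means the two-sided p-value is below 0.05. The 95% confidence interval does not contain zero and lies entirely above 0, indicating that the proportion of overweight males is greater than that of overweight females.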