(just how to solve those problems..)
3. You are asked to study the relationship between maternal smoking and low birthweight. You have a Stata dataset of babies’ birth weights and whether the mother smoked during pregnancy. Let Yi be a binary variable that equals 1 if a baby is born with low birthweight. Unless otherwise indicated, assume that {Y1,Y2,...,Yn} are independent and identically distributed. Use the dataset bwght2.dta for this question.
(a) Use Stata to compute the mean of Yi for mothers who didn’t smoke. Using only the mean and number of observations, show how you can compute the sample standard deviation.
(b) Your estimate for the proportion of babies with low birthweight is Ȳ = .014. Provide an estimate for the variance of Ȳ.
(c) Suppose you want to test the null hypothesis that the proportion of babies born with low birthweight in this population equal to .02. Conduct a two-sided test at 5% confidence level manually by constructing the following:
i. The test statistic
ii. The distribution of the test statistic under the null. Explain why you do not need to know the distribution of Ȳ in order to know this distribution of the test statistic. What feature(s) of the setup make it possible to know this distribution?
iii. The rejection rule
iv. The outcome of the test
(d) What is the p-value of the test?
(e) Compute the 95 % confidence interval for the proportion of babies with low birthweight.
(f) Confirm the above test results using the built-in Stata command (Hint: to perform t-test, use the ttest command).
(g) Now use Stata to compute the mean of Yi for mothers who smoked. Test whether mothers who smoke have a different incidence of low birthweight than mothers who don’t smoke. Note: you may need to create a variable that indicates whether a mother smoked (Hint: the first part is gen cigs_10=1 if cigs>0 & cigs<. The extra part of the if command ensures that your new variable is set to missing when cigs is missing. You can type tabulate cigs cigs_10, missing when you are done to confirm that your new variable is reasonable.) Conduct a two-sided test at the 5% confidence level manually by constructing the following:
i. The null hypothesis
ii. The test statistic
[Since you asked only how to solve these problems, I give hints rather than full solutions. If you have any doubts or need further help, feel free to ask.]
(a) Each Yi is a binary random variable taking the value 0 or 1, and the Yi's are independent and identically distributed (iid). So each Yi is a Bernoulli trial with success probability P, where P is the population proportion of low-birthweight babies, and the sum of the Yi's follows a Binomial(n, P) model.
Now observe that the mean of the Yi's is nothing but the sample proportion of low-birthweight babies (p, say).
Let n = number of observations.
Because Yi^2 = Yi for a 0/1 variable, the sum of squared deviations from the mean is np(1-p), so the sample variance is s^2 = np(1-p)/(n-1) (both n and p are available to us). Note that np(1-p) on its own estimates the variance of the binomial count, not of a single Yi.
From the variance, we get the sample standard deviation by taking the square root: s = sqrt( np(1-p)/(n-1) ).
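The identity in (a) can be checked numerically. The following is a minimal Python sketch, not part of the assignment's Stata workflow; the function names and the ten-observation sample are hypothetical and only illustrate that the shortcut based on the mean and n matches the usual sample standard deviation.

```python
import math

def binary_sample_sd(p_hat, n):
    """Sample SD of 0/1 data using only the mean p_hat and the count n."""
    if n < 2:
        raise ValueError("need at least two observations")
    # s^2 = n * p * (1 - p) / (n - 1), because Yi^2 = Yi for binary data
    return math.sqrt(n * p_hat * (1 - p_hat) / (n - 1))

def brute_force_sd(values):
    """Usual sample SD (divisor n - 1), computed from the raw data."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

# Hypothetical data: 3 low-birthweight babies among 10 births.
sample = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
shortcut = binary_sample_sd(sum(sample) / len(sample), len(sample))
direct = brute_force_sd(sample)
print(shortcut, direct)
```

The two printed numbers agree up to floating-point rounding, which is why the mean and n are enough for a binary variable.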
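(b) Use the steps from (a). Since the Yi's are iid, Var(Ȳ) = P(1-P)/n, which is estimated by s^2/n = p(1-p)/(n-1); for large n this is practically p(1-p)/n. With p = 0.014 the numerator is 0.014 x 0.986 = 0.013804; divide it by the sample size that Stata reports.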
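(c) Let the null hypothesis be
H0: P = 0.02 against the alternative H1: P != (not equal to) 0.02
Let the joint distribution (likelihood) of (Y1, Y2, ..., Yn) be f(P) = P^(np) (1-P)^(n(1-p)).
Now compute the likelihood ratio LR = f(P=0.02)/f(P=p), where p = sample proportion.
Under H0, -2 log(LR) is approximately chi-square with 1 degree of freedom for large n, so reject at the 5% level when it exceeds 3.84. Alternatively, and matching Stata's ttest, use the statistic t = (p - 0.02)/sqrt(s^2/n). Because the Yi's are iid and n is large, the central limit theorem makes t approximately N(0,1) under H0 without knowing the exact distribution of Ȳ, so reject when |t| > 1.96. For (d) the p-value is 2(1 - Phi(|t|)), where Phi is the standard normal CDF, and for (e) the 95% confidence interval is p ± 1.96 sqrt(s^2/n).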
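As a rough illustration of parts (b) to (e), here is a Python sketch of the standardized-statistic version of the test, using the normal approximation. The sample size n = 1000 is a hypothetical placeholder (the question does not state n), and the function names are made up for this example; replace n with the count Stata reports.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def one_sample_proportion_test(p_hat, n, p_null, crit=1.96):
    """Two-sided large-sample test of H0: P = p_null for 0/1 data."""
    s2 = n * p_hat * (1 - p_hat) / (n - 1)   # sample variance of Yi
    se = math.sqrt(s2 / n)                   # estimated SD of the sample mean
    stat = (p_hat - p_null) / se             # approx. N(0,1) under H0
    p_value = 2 * (1 - normal_cdf(abs(stat)))
    ci = (p_hat - crit * se, p_hat + crit * se)
    return {"var_mean": s2 / n, "stat": stat, "p_value": p_value,
            "ci": ci, "reject": abs(stat) > crit}

# p_hat = 0.014 and p_null = 0.02 come from the question; n = 1000 is assumed.
result = one_sample_proportion_test(p_hat=0.014, n=1000, p_null=0.02)
print(result)
```

The rejection decision, the p-value, and the confidence interval always agree with each other here: the test rejects at the 5% level exactly when the p-value is below 0.05 and when 0.02 lies outside the 95% interval.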
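(g) Repeat the above steps for two groups of Yi's: one for mothers who smoked (proportion p1, size n1) and one for mothers who didn't smoke (proportion p0, size n0). The null hypothesis is H0: P1 = P0 against H1: P1 != P0. Since the two groups are independent, the test statistic is (p1 - p0)/sqrt(s1^2/n1 + s0^2/n0), which is approximately N(0,1) under H0; reject at the 5% level when its absolute value exceeds 1.96.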
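The two-group comparison in (g) can be sketched the same way. This Python example uses the unequal-variance form of the standard error, which is an assumption on my part (Stata's ttest pools the variances unless told otherwise), and all group proportions and sizes below are hypothetical placeholders rather than values from bwght2.dta.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_group_proportion_test(p1, n1, p0, n0, crit=1.96):
    """Two-sided large-sample test of H0: P1 = P0 for two independent 0/1 samples."""
    var1 = p1 * (1 - p1) / (n1 - 1)   # s1^2 / n1 for binary data
    var0 = p0 * (1 - p0) / (n0 - 1)   # s0^2 / n0 for binary data
    se = math.sqrt(var1 + var0)       # independent groups: variances add
    stat = (p1 - p0) / se             # approx. N(0,1) under H0
    p_value = 2 * (1 - normal_cdf(abs(stat)))
    return stat, p_value, abs(stat) > crit

# Hypothetical inputs: smokers (p1, n1) versus non-smokers (p0, n0).
stat, p_value, reject = two_group_proportion_test(0.03, 200, 0.012, 1600)
print(stat, p_value, reject)
```

Swapping the two groups only flips the sign of the statistic, so the two-sided conclusion does not depend on which group is labelled first.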