3. A rival music streaming company wishes to make inference about the proportion of individuals in the United States who subscribe to Spotify. They plan to take a survey. Let $S_1, \ldots, S_n$ be the yet-to-be-observed survey responses from $n$ individuals, where the event $S_i = 1$ corresponds to the $i$th individual subscribing to Spotify and the event $S_i = 0$ corresponds to the $i$th individual not subscribing to Spotify ($i = 1, \ldots, n$). Assume that $S_1, \ldots, S_n$ are i.i.d. Bernoulli($\pi$).
(a) What distribution does the random variable $S = \sum_{i=1}^{n} S_i$ have? Compute $E(S)$ and $\operatorname{var}(S)$. The formulas should involve $\pi$ and $n$.
(b) Suppose that n = 30 and π = 0.2. Run a Monte Carlo simulation with m = 10000 replications to verify the formulas for E(S) and var(S) from the previous question. That is, simulate 10000 i.i.d. copies of S and compare the observed average of these to the true mean, and the observed (sample) variance to the true variance. Comment.
(c) Let $\bar S = S/n = n^{-1}\sum_{i=1}^{n} S_i$. What are the mean and variance of $\bar S$?
(d) Verify your answers to the previous question by a Monte Carlo simulation with m = 10000 replications.
(e) Is $\bar S$ a continuous random variable? Explain.
(f) Run a Monte Carlo simulation to estimate the probability $P(\bar S - 1/\sqrt{n} \le \pi \le \bar S + 1/\sqrt{n})$ when $\pi = 0.2$ and $n = 10, 20, 80, 160$. Hint: for every $n$ considered, do the following $m = 10000$ times: generate a random variable $\tilde S$ with the same distribution as $\bar S$ and record whether $|\tilde S - 0.2| \le 1/\sqrt{n}$. The Monte Carlo estimate of the desired probability is the number of times this happened divided by the total number of simulations, $m = 10000$.
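a) The random variable $S = \sum_{i=1}^{n} S_i$ has pmf
$$P(S = k) = \binom{n}{k}\pi^k(1-\pi)^{n-k}, \quad k = 0, 1, \ldots, n.$$
That is, a sum of $n$ i.i.d. Bernoulli($\pi$) random variables is Binomial($n$, $\pi$) distributed. The mean and variance of a Binomial($n$, $\pi$) random variable are
$$E(S) = n\pi, \qquad \operatorname{var}(S) = n\pi(1-\pi).$$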
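The binomial mean and variance can also be derived directly from the Bernoulli building blocks: each $S_i$ satisfies $S_i^2 = S_i$, so $E(S_i) = E(S_i^2) = \pi$ and $\operatorname{var}(S_i) = \pi - \pi^2 = \pi(1-\pi)$. Expectation is linear, and variances add because the $S_i$ are independent:

```latex
\begin{aligned}
E(S) &= \sum_{i=1}^{n} E(S_i) = \sum_{i=1}^{n} \pi = n\pi, \\
\operatorname{var}(S) &= \sum_{i=1}^{n} \operatorname{var}(S_i) = \sum_{i=1}^{n} \pi(1-\pi) = n\pi(1-\pi).
\end{aligned}
```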
b) The R code below simulates 10000 replications of $S$, each the sum of 30 i.i.d. Bernoulli(0.2) random variables.
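m <- 10000    # number of Monte Carlo replications
n <- 30       # number of survey responses per replication
p <- 0.2      # success probability pi
sims <- array(dim = c(m, n))   # each row holds one simulated survey
S <- array(dim = c(m))         # S[i] = number of subscribers in survey i
for (i in 1:m) {
  sims[i, ] <- rbinom(n, 1, prob = p)   # n Bernoulli(p) responses
  S[i] <- sum(sims[i, ])
}
meanS <- mean(S)
varS <- var(S)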
The output is
> meanS
[1] 6.0379
> varS
[1] 4.782342
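So the true mean is $E(S) = n\pi = 30 \times 0.2 = 6$, and the observed average is 6.0379.
So the true variance is $\operatorname{var}(S) = n\pi(1-\pi) = 30 \times 0.2 \times 0.8 = 4.8$, and the observed variance is 4.782342.
The simulated mean and variance are close to the theoretical values; the small differences are consistent with Monte Carlo error from 10000 replications.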
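Because a sum of $n$ i.i.d. Bernoulli($\pi$) variables is Binomial($n$, $\pi$), the same experiment can be sketched without the explicit loop by drawing each $S$ directly with `rbinom(m, n, p)`. This is an equivalent shortcut rather than the approach used above, and the seed value is arbitrary.

```r
# Vectorized alternative: draw the m sums directly from Binomial(n, p)
set.seed(1)          # arbitrary seed, only for reproducibility
m <- 10000
n <- 30
p <- 0.2
S <- rbinom(m, size = n, prob = p)   # one Binomial(30, 0.2) draw per replication

# Theoretical values from part (a)
trueMean <- n * p            # n * pi
trueVar  <- n * p * (1 - p)  # n * pi * (1 - pi)

# Side-by-side comparison of theory and simulation
comparison <- data.frame(
  quantity  = c("mean", "variance"),
  theory    = c(trueMean, trueVar),
  simulated = c(mean(S), var(S))
)
print(comparison)
```

Drawing the binomial directly avoids storing the full $m \times n$ matrix of Bernoulli responses, which matters little here but scales better for larger $n$.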
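c) The random variable $\bar S = S/n$ is the sample proportion. Its mean and variance are $E(\bar S) = \pi$ and $\operatorname{var}(\bar S) = \pi(1-\pi)/n$.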
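Since $\bar S = S/n$ is a constant multiple of $S$, its moments follow from part (a) using $E(cX) = cE(X)$ and $\operatorname{var}(cX) = c^2\operatorname{var}(X)$ with $c = 1/n$:

```latex
\begin{aligned}
E(\bar S) &= \frac{1}{n}E(S) = \frac{n\pi}{n} = \pi, \\
\operatorname{var}(\bar S) &= \frac{1}{n^2}\operatorname{var}(S) = \frac{n\pi(1-\pi)}{n^2} = \frac{\pi(1-\pi)}{n}.
\end{aligned}
```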
d) The R code below simulates 10000 replications of $\bar S$ (each the average of 30 i.i.d. Bernoulli(0.2) random variables) and computes the mean and variance of the simulated values.
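m <- 10000    # number of Monte Carlo replications
n <- 30       # number of survey responses per replication
p <- 0.2      # success probability pi
sims <- array(dim = c(m, n))   # each row holds one simulated survey
Sm <- array(dim = c(m))        # Sm[i] = sample proportion in survey i
for (i in 1:m) {
  sims[i, ] <- rbinom(n, 1, prob = p)   # n Bernoulli(p) responses
  Sm[i] <- sum(sims[i, ]) / n
}
meanSm <- mean(Sm)
varSm <- var(Sm)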
The output is:
> meanSm
[1] 0.1995667
> varSm
[1] 0.005378795
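So the true mean is $E(\bar S) = \pi = 0.2$, and the observed average is 0.1995667.
So the true variance is $\operatorname{var}(\bar S) = \pi(1-\pi)/n = 0.2 \times 0.8 / 30 \approx 0.005333$, and the observed variance is 0.005378795.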
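To make "approximately equal" more precise, one can compare the gap between the observed average 0.1995667 and $\pi = 0.2$ with the Monte Carlo standard error of an average of $m$ draws, $\sqrt{\operatorname{var}(\bar S)/m}$. The sketch below simply plugs in the output reported above; the z-score is offered as a rough diagnostic only.

```r
# Monte Carlo standard error check for part (d), using the reported output
m <- 10000
n <- 30
p <- 0.2
meanSm <- 0.1995667          # observed average of the 10000 simulated values
trueMean <- p                # E(S-bar) = pi
trueVar  <- p * (1 - p) / n  # var(S-bar) = pi * (1 - pi) / n

mcse <- sqrt(trueVar / m)    # standard error of the average of m draws
z <- (meanSm - trueMean) / mcse
cat("Monte Carlo SE:", signif(mcse, 3), " z-score:", round(z, 2), "\n")
```

A |z| well below 2 indicates that the discrepancy is about what simulation noise alone would produce.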
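e) No. The random variable $\bar S$ is discrete: it can take only the $n + 1$ values $0, 1/n, 2/n, \ldots, (n-1)/n, 1$, each with positive probability $P(\bar S = k/n) = \binom{n}{k}\pi^k(1-\pi)^{n-k}$ (for $0 < \pi < 1$). A continuous random variable, by contrast, assigns probability zero to every single value.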
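A quick way to see that $\bar S$ is discrete is to tabulate its pmf on the finite support $\{0, 1/n, \ldots, 1\}$ with `dbinom`, using $P(\bar S = k/n) = P(S = k)$. The sketch assumes $n = 30$ and $\pi = 0.2$, as in part (b).

```r
# pmf of S-bar = S/n on its finite support, for n = 30 and p = 0.2
n <- 30
p <- 0.2
k <- 0:n
support <- k / n                        # the n + 1 possible values of S-bar
pmf <- dbinom(k, size = n, prob = p)    # P(S-bar = k/n) = P(S = k)

# All probability mass sits on these 31 points, so S-bar is discrete
print(head(data.frame(value = support, prob = round(pmf, 4)), 10))
cat("number of support points:", length(support), "\n")
cat("total probability:", sum(pmf), "\n")
```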
f) The question is not clear.
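Following the hint in part (f), a minimal sketch draws $\tilde S$ directly as `rbinom(m, n, 0.2) / n` (which has the same distribution as $\bar S$), records whether $|\tilde S - 0.2| \le 1/\sqrt{n}$, and averages that indicator over $m = 10000$ replications for each $n$. The seed is arbitrary, and the exact binomial probability is printed alongside only as a sanity check.

```r
# Monte Carlo estimate of P(S-bar - 1/sqrt(n) <= pi <= S-bar + 1/sqrt(n))
set.seed(2)            # arbitrary seed for reproducibility
m <- 10000
p <- 0.2
ns <- c(10, 20, 80, 160)

estimate <- numeric(length(ns))
exact <- numeric(length(ns))
for (j in seq_along(ns)) {
  n <- ns[j]
  Stilde <- rbinom(m, size = n, prob = p) / n   # m copies of S-bar
  covered <- abs(Stilde - p) <= 1 / sqrt(n)     # event from the hint
  estimate[j] <- mean(covered)                  # proportion of successes

  # Exact probability for comparison: sum the Binomial(n, p) pmf over the
  # values k for which |k/n - p| <= 1/sqrt(n)
  k <- 0:n
  exact[j] <- sum(dbinom(k, size = n, prob = p)[abs(k / n - p) <= 1 / sqrt(n)])
}

print(data.frame(n = ns, mc_estimate = estimate, exact = exact))
```

Because $\operatorname{sd}(\bar S) = 0.4/\sqrt{n}$ when $\pi = 0.2$, the interval $\bar S \pm 1/\sqrt{n}$ always spans 2.5 standard deviations, so the estimated probabilities should all be high for every $n$ considered.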