3. A rival music streaming company wishes to make inference for the proportion of individuals in the United States who subscribe to Spotify. They plan to take a survey. Let [S_1,...,S_n] be the yet-to-be observed survey responses from n individuals, where the event [S_i=1] corresponds to the ith individual subscribing to Spotify and the event [S_i=0] corresponds to the ith individual not subscribing to Spotify (i = 1, ..., n). Assume that [S_1,...,S_n] are i.i.d. Ber(π).
(a) What distribution does the random variable [S=\sum_{i=1}^{n}S_i] have? Compute E(S) and var(S). The formulas should involve π and n.
(b) Suppose that n = 30 and π = 0.2. Run a Monte Carlo simulation with m = 10000 replications to verify the formulas for E(S) and var(S) from the previous question. That is, simulate 10000 i.i.d. copies of S and compare the observed average of these to the true mean, and the observed (sample) variance to the true variance. Comment.
(c) Let [\bar{S}=n^{-1}S=n^{-1}\sum_{i=1}^{n}S_i] . What are the mean and variance of [\bar{S}] ?
(d) Verify your answers to the previous question by a Monte Carlo simulation with m = 10000 replications.
(e) Is [\bar{S}] a continuous random variable? Explain.
(f) Run a Monte Carlo simulation to estimate the probability [P\left ( \bar{S}-1/\sqrt{n}\leq \pi \leq \bar{S}+1/\sqrt{n} \right )] when π = 0.2 and n = 10, 20, 80, 160. Hint: For every n considered, do the following m = 10000 times: generate a random variable [\tilde{S}] with the same distribution as [\bar{S}] and record whether [\left | \tilde{S}-0.2 \right |\leq 1/\sqrt{n}] . The Monte Carlo estimate of the desired probability is the number of times this happened divided by the total number of simulations, m = 10000.
The random variables [S_i\sim Bernoulli\left ( \pi \right ),i=1,2,...,n] are i.i.d.
a) The random variable
[S=\sum_{i=1}^{n}S_i] has pmf
[{\color{Blue} P\left ( S=k \right )=\binom{n}{k}\pi ^k\left ( 1-\pi \right )^{n-k},k=0,1,2,...,n}]
That is, the sum of n i.i.d. Bernoulli(π) random variables is Binomial(n, π) distributed. The mean and variance of this Binomial random variable are
[{\color{Blue} E\left ( S \right )=n\pi ,Var\left ( S \right )=n\pi \left ( 1-\pi \right )}] .
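The Binomial mean and variance quoted above can also be obtained directly from the Bernoulli terms. The following is a short derivation sketch using only the i.i.d. assumption stated in the question and the fact that S_i^2 = S_i for a 0/1 variable:

```latex
\begin{align*}
E(S_i) &= 1\cdot\pi + 0\cdot(1-\pi) = \pi, \\
Var(S_i) &= E(S_i^2) - E(S_i)^2 = \pi - \pi^2 = \pi(1-\pi), \\
E(S) &= \sum_{i=1}^{n} E(S_i) = n\pi
  && \text{(linearity of expectation)}, \\
Var(S) &= \sum_{i=1}^{n} Var(S_i) = n\pi(1-\pi)
  && \text{(independence of the } S_i\text{)}.
\end{align*}
```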
b) The R code below simulates m = 10000 replications, each the sum of n = 30 i.i.d. Bernoulli random variables [S_i\sim Bernoulli\left ( 0.2 \right ),i=1,2,...,30] .
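# Monte Carlo check of E(S) and Var(S), where S is the sum of n Bernoulli(p) draws
m <- 10000  # number of replications
n <- 30     # number of survey responses per replication
p <- 0.2    # true subscription probability
sims <- array(dim = c(m, n))  # row i holds the n Bernoulli draws of replication i
S <- array(dim = c(m))        # S[i] holds the sum for replication i
for (i in 1:m)
{
  sims[i, ] <- rbinom(n, 1, prob = p)  # n i.i.d. Bernoulli(p) responses
  S[i] <- sum(sims[i, ])
}
meanS <- mean(S)  # compare with n * p
varS <- var(S)    # compare with n * p * (1 - p)
meanS
varS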
The output is
> meanS
[1] 6.0379
> varS
[1] 4.782342
So the true mean is [E\left ( S \right )=30\times 0.2={\color{Blue} 6}] and the observed average is [{\color{Blue} 6.0379}] .
So the true variance is [Var\left ( S \right )=30\times 0.2\times 0.8={\color{Blue} 4.8}] and the observed variance is [{\color{Blue} 4.782342}] .
We can see that the theoretical and simulated values are approximately equal.
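As a cross-check on the loop above, the same comparison can be sketched without a loop. This relies on the result from part (a) that S is Binomial(n, π), so `rbinom(m, size = n, prob = p)` draws m copies of S directly. The seed value and the names `check` and `se_mean` are illustrative choices, not part of the original answer.

```r
# Assumption: S ~ Binomial(n, p), so m copies of S can be drawn in one call.
set.seed(1)  # arbitrary seed, used only to make the run reproducible
m <- 10000
n <- 30
p <- 0.2
S <- rbinom(m, size = n, prob = p)

# Simulated and theoretical moments side by side
check <- rbind(
  mean     = c(simulated = mean(S), theoretical = n * p),
  variance = c(simulated = var(S),  theoretical = n * p * (1 - p))
)
print(check)

# Monte Carlo standard error of the simulated mean, to judge how close is "close"
se_mean <- sd(S) / sqrt(m)
print(se_mean)
```

The standard error gives a rough yardstick: a simulated mean within a few multiples of `se_mean` of n * p is consistent with the formula.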
c) The RV [\bar{S}=\frac{1}{n}\sum_{i=1}^{n}S_i=\frac{S}{n}] . The mean and variance of [\bar{S}] are
[E\left (\bar{S} \right )=\frac{E\left ( S \right )}{n}=\frac{n\pi }{n}={\color{Blue} \pi }]
[Var\left (\bar{S} \right )=\frac{Var\left ( S \right )}{n^2}=\frac{n\pi\left ( 1-\pi \right ) }{n^2}={\color{Blue}\frac{\pi\left ( 1-\pi \right )}{n} }]
d) The R code below simulates m = 10000 replications of the sample mean of n = 30 i.i.d. Bernoulli(0.2) random variables and computes the mean and variance of the simulated sample means.
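# Monte Carlo check of the mean and variance of the sample mean S / n
m <- 10000  # number of replications
n <- 30     # number of survey responses per replication
p <- 0.2    # true subscription probability
sims <- array(dim = c(m, n))  # row i holds the n Bernoulli draws of replication i
Sm <- array(dim = c(m))       # Sm[i] holds the sample mean for replication i
for (i in 1:m)
{
  sims[i, ] <- rbinom(n, 1, prob = p)  # n i.i.d. Bernoulli(p) responses
  Sm[i] <- sum(sims[i, ]) / n
}
meanSm <- mean(Sm)  # compare with p
varSm <- var(Sm)    # compare with p * (1 - p) / n
meanSm
varSm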
The output is:
> meanSm
[1] 0.1995667
> varSm
[1] 0.005378795
So the true mean is [E\left ( \bar{S} \right )={\color{Blue} 0.2}] and the observed average is [{\color{Blue} 0.1996}] .
So the true variance is [Var\left ( \bar{S} \right )= \frac{0.2\times 0.8}{30}={\color{Blue} 0.00533}] and the observed variance is [{\color{Blue}0.00538}] .
Again, the theoretical and simulated values are approximately equal.
e) No. The RV [\bar{S}=\frac{1}{n}\sum_{i=1}^{n}S_i=\frac{S}{n}] is discrete, not continuous: it can take only the n + 1 values [\frac{i}{n},i=0,1,...,n] .
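The discreteness claim in part (e) can be illustrated empirically. The sketch below, with an arbitrary seed and the illustrative names `distinct_values` and `all_on_grid`, checks that every simulated value of the sample mean is a multiple of 1/n between 0 and 1. It assumes the same n = 30 and π = 0.2 as in parts (b) and (d).

```r
# Illustration only: simulated sample means should fall on the grid 0, 1/n, ..., 1.
set.seed(1)  # arbitrary seed, used only to make the run reproducible
n <- 30
p <- 0.2
S_bar <- rbinom(10000, size = n, prob = p) / n

distinct_values <- sort(unique(S_bar))

# Multiplying by n should recover whole numbers between 0 and n
k <- round(distinct_values * n)
all_on_grid <- all(abs(distinct_values * n - k) < 1e-8) && all(k >= 0 & k <= n)

print(length(distinct_values))  # at most n + 1 distinct values are possible
print(all_on_grid)
```

A continuous random variable would almost surely produce 10000 distinct values, whereas here the number of distinct values cannot exceed n + 1.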
f) The question is not clear.
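One possible reading of part (f), following the hint in the question, is that the event "π lies between the sample mean minus 1/√n and the sample mean plus 1/√n" is the same as "the sample mean is within 1/√n of 0.2". Under that reading, a minimal R sketch of the simulation could look like this. The seed and the name `coverage` are illustrative, and the sample mean is generated as a Binomial(n, π) draw divided by n, which relies on the result from part (a).

```r
# Sketch under the reading described above; not a definitive solution.
set.seed(1)  # arbitrary seed, used only to make the run reproducible
m  <- 10000                 # replications per sample size
p  <- 0.2                   # true proportion
ns <- c(10, 20, 80, 160)    # sample sizes from the question

coverage <- sapply(ns, function(n) {
  # m draws with the same distribution as the sample mean of n Bernoulli(p) draws
  S_tilde <- rbinom(m, size = n, prob = p) / n
  # proportion of draws within 1/sqrt(n) of the true proportion
  mean(abs(S_tilde - p) <= 1 / sqrt(n))
})
names(coverage) <- ns
print(coverage)
```

Because π(1 − π) ≤ 1/4, the standard deviation of the sample mean is at most 1/(2√n), so the half-width 1/√n is at least two standard deviations. The estimates should therefore come out high for every n, roughly 0.95 or more wherever the normal approximation is reasonable.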