R-Code
3. A rival music streaming company wishes to make inference about the proportion of individuals in the United States who subscribe to Spotify. They plan to take a survey. Let $S_1, \dots, S_n$ be the yet-to-be-observed survey responses from $n$ individuals, where the event $S_i = 1$ corresponds to the $i$th individual subscribing to Spotify and the event $S_i = 0$ corresponds to the $i$th individual not subscribing to Spotify ($i = 1, \dots, n$). Assume that $S_1, \dots, S_n$ are i.i.d. Bernoulli($\pi$).
(a) What distribution does the random variable $S = \sum_{i=1}^{n} S_i$ have? Compute $E(S)$ and $\mathrm{var}(S)$. The formulas should involve $\pi$ and $n$.
(b) Suppose that n = 30 and π = 0.2. Run a Monte Carlo simulation with m = 10000 replications to verify the formulas for E(S) and var(S) from the previous question. That is, simulate 10000 i.i.d. copies of S and compare the observed average of these to the true mean, and the observed (sample) variance to the true variance. Comment.
(c) Let $\bar{S} = n^{-1} S = n^{-1} \sum_{i=1}^{n} S_i$. What are the mean and variance of $\bar{S}$?
(d) Verify your answers to the previous question by a Monte Carlo simulation with m = 10000 replications.
(e) Is $\bar{S}$ a continuous random variable? Explain.
(f) Run a Monte Carlo simulation to estimate the probability $P(\bar{S} - 1/\sqrt{n} \le \pi \le \bar{S} + 1/\sqrt{n})$ when $\pi = 0.2$ and $n = 10, 20, 80, 160$. Hint: For every $n$ considered, do the following $m = 10000$ times: generate a random variable $\tilde{S}$ with the same distribution as $\bar{S}$ and record whether $|\tilde{S} - 0.2| \le 1/\sqrt{n}$. The Monte Carlo estimate of the desired probability is the number of times this happened divided by the total number of simulations, $m = 10000$.
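The random variables $S_1, \dots, S_n$ are i.i.d. Bernoulli($\pi$).
a) The random variable $S = \sum_{i=1}^{n} S_i$
has pmf $P(S = k) = \binom{n}{k} \pi^k (1 - \pi)^{n-k}$, $k = 0, 1, \dots, n$.
That is, a sum of $n$ i.i.d. Bernoulli($\pi$) random variables has a Binomial($n$, $\pi$) distribution. The mean and variance of this Binomial random variable are
$E(S) = n\pi$ and $\mathrm{var}(S) = n\pi(1 - \pi)$.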
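As a quick check on these formulas, the mean and variance can also be computed directly from the Binomial pmf. The sketch below assumes n = 30 and p = 0.2 (the values used in part b); the variable names are illustrative only.

```r
# Exact mean and variance of S ~ Binomial(n, p), computed from the pmf.
# n = 30 and p = 0.2 are assumed here to match part (b).
n <- 30
p <- 0.2
k <- 0:n                              # support of S
pmf <- dbinom(k, size = n, prob = p)  # P(S = k) for each k

mean_exact <- sum(k * pmf)                   # E(S) = sum of k * P(S = k)
var_exact  <- sum((k - mean_exact)^2 * pmf)  # var(S) = E[(S - E(S))^2]

# Compare with the closed-form expressions n*p and n*p*(1-p)
all.equal(mean_exact, n * p)
all.equal(var_exact, n * p * (1 - p))
```

Both comparisons are expected to agree up to floating-point tolerance.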
b) The R code below simulates m = 10000 replications of S, each the sum of n = 30 Bernoulli(0.2) draws.
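m <- 10000  # number of Monte Carlo replications
n <- 30     # number of survey responses per replication
p <- 0.2    # true proportion pi
sims <- array(dim = c(m, n))  # one row of n Bernoulli draws per replication
S <- array(dim = c(m))        # simulated values of S
for (i in 1:m)
{
  sims[i, ] <- rbinom(n, 1, prob = p)  # n Bernoulli(p) draws
  S[i] <- sum(sims[i, ])               # S is the number of subscribers
}
meanS <- mean(S)  # Monte Carlo estimate of E(S)
varS <- var(S)    # Monte Carlo estimate of var(S)
meanS
varS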
The output is
> meanS
[1] 6.0379
> varS
[1] 4.782342
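The true mean is $E(S) = n\pi = 30 \times 0.2 = 6$ and the observed average is 6.0379.
The true variance is $\mathrm{var}(S) = n\pi(1 - \pi) = 30 \times 0.2 \times 0.8 = 4.8$ and the observed variance is 4.782342.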
We can see that the theoretical and simulated values are approximately equal.
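The same check can be done without an explicit loop, because a sum of n Bernoulli(p) draws has the same distribution as a single Binomial(n, p) draw. A minimal sketch, where the seed is an assumption added for reproducibility and the variable names are illustrative:

```r
set.seed(1)  # assumed seed; any value works, results vary slightly with it
m <- 10000   # number of Monte Carlo replications
n <- 30      # Bernoulli trials per replication
p <- 0.2     # success probability

# Each element is one simulated value of S = S_1 + ... + S_n
S_vec <- rbinom(m, size = n, prob = p)

# Simulated versus theoretical mean and variance
c(sim_mean = mean(S_vec), true_mean = n * p)
c(sim_var = var(S_vec), true_var = n * p * (1 - p))
```

This avoids storing the m-by-n matrix of individual draws, which is not needed when only S is of interest.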
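c) The RV is $\bar{S} = S/n$. The mean and variance of $\bar{S}$ are $E(\bar{S}) = E(S)/n = \pi$ and $\mathrm{var}(\bar{S}) = \mathrm{var}(S)/n^2 = \pi(1 - \pi)/n$.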
d) The R code below simulates m = 10000 replications of $\bar{S}$ with n = 30 and computes the sample mean and sample variance of the simulated values.
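m <- 10000  # number of Monte Carlo replications
n <- 30     # number of survey responses per replication
p <- 0.2    # true proportion pi
sims <- array(dim = c(m, n))  # one row of n Bernoulli draws per replication
Sm <- array(dim = c(m))       # simulated values of the sample mean
for (i in 1:m)
{
  sims[i, ] <- rbinom(n, 1, prob = p)  # n Bernoulli(p) draws
  Sm[i] <- sum(sims[i, ]) / n          # sample proportion of subscribers
}
meanSm <- mean(Sm)  # Monte Carlo estimate of the mean of the sample mean
varSm <- var(Sm)    # Monte Carlo estimate of its variance
meanSm
varSm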
The output is:
> meanSm
[1] 0.1995667
> varSm
[1] 0.005378795
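The true mean is $E(\bar{S}) = \pi = 0.2$ and the observed average is 0.1995667.
The true variance is $\mathrm{var}(\bar{S}) = \pi(1 - \pi)/n = 0.2 \times 0.8 / 30 \approx 0.005333$ and the observed variance is 0.005378795.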
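One way to judge whether the gap between the observed average and the true mean is plausible is to compare it with the Monte Carlo standard error, $\sqrt{\mathrm{var}(\bar{S})/m}$. The sketch below plugs in the observed average 0.1995667 from the output above; the variable names are illustrative.

```r
m <- 10000
n <- 30
p <- 0.2

obs_mean  <- 0.1995667        # observed average from the simulation output
true_mean <- p                # mean of the sample mean is pi
true_var  <- p * (1 - p) / n  # variance of the sample mean is pi(1 - pi)/n

# Standard error of an average of m i.i.d. copies of the sample mean
mc_se <- sqrt(true_var / m)

# Gap between observed and true mean, in standard-error units
z <- (obs_mean - true_mean) / mc_se
c(mc_se = mc_se, z = z)
```

Here the gap is less than one standard error in absolute value, which is consistent with simulation noise alone.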
e) The RV $\bar{S}$ is discrete: it takes only the $n + 1$ values $0, 1/n, 2/n, \dots, 1$, so it is not a continuous random variable.
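The discreteness can also be seen empirically: every simulated value of the sample mean lands on the grid k/n. A small sketch, assuming n = 30 and p = 0.2 as in part (d), with an assumed seed and illustrative names:

```r
set.seed(1)  # assumed seed, for reproducibility only
m <- 10000
n <- 30
p <- 0.2

# Simulated sample means: a Binomial(n, p) count divided by n
S_bar <- rbinom(m, size = n, prob = p) / n

observed_values <- sort(unique(S_bar))  # distinct values that occurred
grid <- (0:n) / n                       # all possible values 0, 1/n, ..., 1

all(observed_values %in% grid)  # TRUE: every value is of the form k/n
length(observed_values)         # at most n + 1 distinct values
```

However many replications are run, the number of distinct values can never exceed n + 1.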
f) The question is not clear.
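Following the hint in the question, one reading of part (f) is to estimate, for each n, the fraction of simulated values $\tilde{S}$ that fall within $1/\sqrt{n}$ of $\pi = 0.2$. A minimal sketch under that reading, with an assumed seed and illustrative variable names:

```r
set.seed(1)  # assumed seed, for reproducibility only
m  <- 10000  # Monte Carlo replications per sample size
p  <- 0.2    # true proportion pi
ns <- c(10, 20, 80, 160)

coverage <- sapply(ns, function(n) {
  # m draws with the same distribution as the sample mean of n responses
  s_tilde <- rbinom(m, size = n, prob = p) / n
  # Fraction of draws with |s_tilde - pi| <= 1/sqrt(n)
  mean(abs(s_tilde - p) <= 1 / sqrt(n))
})
names(coverage) <- paste0("n=", ns)
coverage
```

Since the standard deviation of $\bar{S}$ is $\sqrt{\pi(1 - \pi)/n} \le 1/(2\sqrt{n})$, the half-width $1/\sqrt{n}$ is at least two standard deviations, so the estimates are expected to be close to 1 for every n. The exact probabilities can be obtained from `pbinom` for comparison.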