Using R Language
The code below uses a for loop to simulate 10,000 polls of 8 people, in which each person has a 58% probability of supporting the Democratic candidate and a 42% probability of supporting the Republican. The loop runs the code inside it 10,000 times, changing the value of i with each iteration (i is 1 in the first iteration and 10,000 in the last).
# Define a vector of integers that has 10,000 elements.
poll_sims = vector(length = 10000, mode = "integer")

# for loop to simulate 10,000 polls
for (i in 1:10000) {
  # Do a poll of 8 people in which each person has a 58% chance of supporting the
  # Democratic candidate and 42% chance of supporting the Republican.
  poll = sample(c("Democrat", "Republican"), size = 8, replace = T, prob = c(.58, .42))
  # Count the number of people who support the Democrat and store the result in the
  # poll_sims vector as the ith result.
  poll_sims[i] = sum(poll == "Democrat")
}

# Visualise the poll_sims vector using basic R
plot(factor(poll_sims))

# Visualise the poll_sims vector using tidyverse
library(tidyverse)
qplot(factor(poll_sims)) + geom_bar()
1. Run this code on your own and find the fraction of the simulations in which less than half the people (3 or fewer) support the Democratic candidate. Compare this result to your answer in Question 5 of the previous section.
2. Change the code to simulate 10,000 polls of 100 people (rather than 10,000 polls of 8). Find the fraction of simulations in which less than half the people support the Democratic candidate. In other words, use the simulations to approximate the likelihood that a poll of 100 people will incorrectly guess the winner of the election.
3. Graph the simulations so you can visualize the distribution.
4. Change the code again to simulate 10,000 polls of 1,000 people. Find the fraction of simulations in which between 55% and 61% of the people support the Democratic candidate. In other words, use the simulations to approximate the likelihood that a poll of 1,000 people will be off from the true probability by 3% or less.
(P.S. I don't have Question 5 of the previous section.)
Source function
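USA_Polling <- function(){
  poll_sims = vector(length = 10000, mode = "integer")
  # Braces are required so that both statements run on every iteration.
  for (i in 1:10000) {
    poll = sample(c("Democrat", "Republican"), size = 8, replace = TRUE,
                  prob = c(.58, .42))
    poll_sims[i] = sum(poll == "Democrat")
  }
  plot(factor(poll_sims))
  library(tidyverse)
  # ggplot objects are not printed automatically inside a function.
  print(qplot(factor(poll_sims)) + geom_bar())
  # Return the Democrat count from the last simulated poll.
  z = sum(poll == "Democrat")
  return(z)
}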
Now let's call this function 100 times using the following function:
runn_poll <- function(){
  s = 0
  for (i in 1:100)
  {
    result = USA_Polling()
    if (result <= 3)
      s = s + 1
  }
  return(s/100)
}
Now, to get the output, source both files and print the result:
source('~/R/USA_Polling.R')
source('~/R/run_poll.R')
x = runn_poll();
print(x)
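Since poll_sims already holds all 10,000 results, the fraction can also be read directly from that vector instead of from 100 repeated calls. The following is only a minimal sketch: it assumes that Question 5 of the previous section (not shown here) asked for the exact binomial probability, and the names n_polls, n_people, frac_sim, and frac_exact, as well as the fixed seed, are illustrative choices rather than part of the original code.

```r
set.seed(1)  # assumption: fixed seed so the run is reproducible

n_polls  <- 10000
n_people <- 8

# Same loop as in the question: count Democrat supporters in each poll.
poll_sims <- vector(length = n_polls, mode = "integer")
for (i in 1:n_polls) {
  poll <- sample(c("Democrat", "Republican"), size = n_people,
                 replace = TRUE, prob = c(.58, .42))
  poll_sims[i] <- sum(poll == "Democrat")
}

# Fraction of simulated polls with 3 or fewer Democrat supporters.
frac_sim <- mean(poll_sims <= 3)

# Exact binomial probability P(X <= 3), for comparison.
frac_exact <- pbinom(3, size = n_people, prob = 0.58)

print(c(simulated = frac_sim, exact = frac_exact))
```

With 10,000 polls the simulated fraction should land close to the exact value, while a fraction based on only 100 results will vary much more from run to run.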
Answer to Q2
In the first function, change the size argument (4th line) to size = 100, change the test in runn_poll() to if(result<50), and re-run the code:
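USA_Polling <- function(){
  poll_sims = vector(length = 10000, mode = "integer")
  # Braces are required so that both statements run on every iteration.
  for (i in 1:10000) {
    poll = sample(c("Democrat", "Republican"), size = 100, replace = TRUE,
                  prob = c(.58, .42))
    poll_sims[i] = sum(poll == "Democrat")
  }
  plot(factor(poll_sims))
  library(tidyverse)
  # ggplot objects are not printed automatically inside a function.
  print(qplot(factor(poll_sims)) + geom_bar())
  # Return the Democrat count from the last simulated poll.
  z = sum(poll == "Democrat")
  return(z)
}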
I got a result of 0.1. You may get a different result, because the simulation is random.
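As a cross-check for Question 2, here is a short sketch that swaps the sample() loop for rbinom(), which draws the number of Democrat supporters in each poll directly from a binomial distribution. This is a substitute technique rather than the code from the question, and the names frac_wrong and frac_exact and the seed are assumptions made for illustration.

```r
set.seed(2)  # assumption: fixed seed so the run is reproducible

# Each draw is the number of Democrat supporters in one poll of 100 people.
poll_sims <- rbinom(n = 10000, size = 100, prob = 0.58)

# Fraction of polls in which fewer than half (49 or fewer) support the Democrat.
frac_wrong <- mean(poll_sims < 50)

# Exact binomial probability P(X <= 49), for comparison.
frac_exact <- pbinom(49, size = 100, prob = 0.58)

print(c(simulated = frac_wrong, exact = frac_exact))
```

A poll with exactly 50 supporters is a tie, so it is not counted as "less than half" here.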
Answer to Q3
Store the result of the first attempt (polls of 8 people) in r1:
r1 = runn_poll();
Store the result of the second attempt (polls of 100 people) in r2, after changing the test in runn_poll() to if(result<50):
r2 = runn_poll();
Now draw a bar plot of the two results:
H <- c(r1,r2)
barplot(H)
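The bar plot above compares two fractions. To see the distribution of the simulated polls themselves, one option is to tabulate the counts and plot them. This is a sketch only: it assumes polls of 100 people drawn with rbinom() rather than the sample() loop, and the axis labels and the name counts are illustrative.

```r
set.seed(3)  # assumption: fixed seed so the run is reproducible

# Number of Democrat supporters in each of 10,000 polls of 100 people.
poll_sims <- rbinom(n = 10000, size = 100, prob = 0.58)

# How many simulated polls produced each supporter count.
counts <- table(poll_sims)

# One bar per observed supporter count.
barplot(counts,
        xlab = "Democrat supporters out of 100",
        ylab = "Number of simulated polls",
        main = "Distribution of 10,000 simulated polls")
```

The bars should pile up around 58 supporters, with only a small part of the distribution falling below 50.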
Answer to Q4
Now set the size to 1000:
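USA_Polling <- function(){
  poll_sims = vector(length = 10000, mode = "integer")
  # Braces are required so that both statements run on every iteration.
  for (i in 1:10000) {
    poll = sample(c("Democrat", "Republican"), size = 1000, replace = TRUE,
                  prob = c(.58, .42))
    poll_sims[i] = sum(poll == "Democrat")
  }
  plot(factor(poll_sims))
  library(tidyverse)
  # ggplot objects are not printed automatically inside a function.
  print(qplot(factor(poll_sims)) + geom_bar())
  # Return the Democrat count from the last simulated poll.
  z = sum(poll == "Democrat")
  return(z)
}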
We also have to change runn_poll so that it counts only the simulations in which between 55% and 61% of the people support the Democratic candidate:
runn_poll <- function(){
  s = 0
  for (i in 1:100)
  {
    result = USA_Polling()
    # 55% to 61% of 1000 people is 550 to 610 supporters, inclusive.
    if (result >= 550 && result <= 610)
      s = s + 1
  }
  return(s/100)
}
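For comparison, the same fraction can be estimated from all 10,000 simulated polls at once and checked against the exact binomial probability. This is a hedged sketch: it uses rbinom() in place of the sample() loop, treats the 55% and 61% bounds as inclusive, and the names frac_close and frac_exact and the seed are assumptions for illustration.

```r
set.seed(4)  # assumption: fixed seed so the run is reproducible

# Number of Democrat supporters in each of 10,000 polls of 1,000 people.
poll_sims <- rbinom(n = 10000, size = 1000, prob = 0.58)

# Fraction of polls within 3 percentage points of the true 58% (550 to 610).
frac_close <- mean(poll_sims >= 550 & poll_sims <= 610)

# Exact binomial probability P(550 <= X <= 610), for comparison.
frac_exact <- pbinom(610, size = 1000, prob = 0.58) -
  pbinom(549, size = 1000, prob = 0.58)

print(c(simulated = frac_close, exact = frac_exact))
```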