Question

Using R Studio/R programming...

Usually, we will use a random sample to estimate the statistics of the underlying population. If we assume a given population is a standard normal distribution and we want to estimate its mean, which is the better technique to estimate that mean from a sample:
1. Use the mean of one random sample of size 500
2. Use the mean of 300 random samples of size 10

Run your own experiment and use your results as a supporting argument for your response.

How might I run A and B? What is the format? I don't particularly need the answer, I can determine that on my own. I just need help with the "set up". Thanks!

Expert Solution

Let us assume that we have access to a population which is normally distributed.

So we create a population with a large number of entries, say 100,000, so that drawing a sample of 500 from it without replacement does not require a finite population correction (that is, the sample size is no more than 5% of the population).
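As a minimal sketch of this setup, the population can be simulated and the sampling fraction checked against the 5% rule of thumb. The names `n` and `sampling_fraction` are illustrative choices, not part of the exercise.

```r
set.seed(123)                  # seed chosen only for reproducibility

N <- 100000                    # population size
pop <- rnorm(N)                # simulated standard normal population

n <- 500                       # planned sample size
sampling_fraction <- n / N     # share of the population being sampled

# Below 5%, the finite population correction can reasonably be ignored
sampling_fraction < 0.05
```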

We will assume that we do not know the population standard deviation (or, of course, the mean, which we are trying to estimate). In other words, we treat the parameters of the population distribution as unknown.

a) Next we draw one random sample of size 500 (without replacement; we are not doing a bootstrap estimate) and compute the sample mean $\bar{x}$ and the sample standard deviation $s$. Since the sample size n = 500 is greater than 30, we can use the central limit theorem to say that the sample mean is approximately normally distributed.

We can say with 95% confidence that the population mean lies within $\bar{x} \pm z_{0.975}\, s/\sqrt{n}$, where $z_{0.975} \approx 1.96$ is the critical value of the standard normal distribution and $n = 500$.
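To get a feel for how wide this interval will be before drawing any data, here is a rough sketch of the expected half-width. It assumes the sample standard deviation comes out close to 1, the true value for a standard normal population; `s_assumed` and `half_width` are illustrative names.

```r
n <- 500
zc <- qnorm(0.975)                      # critical value, about 1.96
s_assumed <- 1                          # assumption: s is close to the true sd of 1

# Half-width of the 95% interval: critical value times the standard error
half_width <- zc * s_assumed / sqrt(n)
round(half_width, 4)                    # about 0.088
```

So an interval roughly 0.18 wide is what to expect from part a), whatever the seed.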

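b) Next we draw 300 random samples of size 10 each and compute the 300 sample means. From these we take a 95% percentile interval, that is, the range between the 2.5th and 97.5th percentiles of the 300 means.

Note what this interval measures: it shows where 95% of the individual size-10 sample means fall, so it describes how far a single mean of size 10 can stray from the population mean. It is not a confidence interval for the average of all 300 means.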
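One point worth checking: the spread of the 300 means reflects the variability of a single size-10 mean, roughly 1/sqrt(10) for a standard normal population. If "the mean of 300 random samples" is instead read as the average of all 300 sample means, that estimate effectively uses 3000 observations. The sketch below works under that alternative reading, which is an assumption and not something stated in the answer. It draws directly from `rnorm` rather than from a stored population, for brevity.

```r
set.seed(123)

# 300 samples of size 10, one sample per row
samples <- matrix(rnorm(300 * 10), nrow = 300)
xbar <- rowMeans(samples)                      # 300 sample means

# Spread of an individual size-10 sample mean (theory: 1/sqrt(10), about 0.316)
sd_single_mean <- sd(xbar)

# Assumed alternative reading: average the 300 means into one grand mean
grand_mean <- mean(xbar)
se_grand_mean <- sd_single_mean / sqrt(300)    # theory: 1/sqrt(3000), about 0.018

c(sd_single_mean = sd_single_mean, se_grand_mean = se_grand_mean)
```

Under this reading the grand mean has a smaller standard error than the mean of one sample of size 500 (about 1/sqrt(500), or 0.045), so which plan looks better depends on how option 2 is interpreted.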

Finally, we compare the two intervals and see which one is narrower.

R code with comments

----

set.seed(123)

#set the size of the population
N<-100000
#Draw a population of N entries from standard normal(0,1)
pop<-rnorm(N)

#we assume that the population mean and sd are not known.

#part a)
#get 1 sample of size 500 from the population (without replacement)
n<-500
x<-sample(pop,size=n)
#get the right tail critical value for 95% confidence
zc<-qnorm(0.975)
#estimate the standard error of the mean from the sample
se<-sd(x)/sqrt(n)
#calculate the confidence interval
lci<-mean(x)-se*zc
uci<-mean(x)+se*zc
sprintf('We are 95%% confident that the population mean lies in the interval [%.4f,%.4f]',lci,uci)

#part b)
#draw 300*10 values at once (without replacement)
x<-sample(pop,size=300*10)
#format x into a matrix of 300x10, so each row is one sample of size 10
x<-matrix(x,nrow=300)
#calculate the 300 sample means, one per row
xbar<-rowMeans(x)
#get the 2.5th and 97.5th percentiles of the sample means
ci<-quantile(xbar,c(0.025,0.975))
sprintf('95%% of the sample means of size 10 lie in the interval [%.4f,%.4f]',ci[1],ci[2])

----

Running the code, we can see that the single sample of size 500 places the population mean between about -0.1 and 0.08, whereas the 95% percentile interval of the 300 sample means of size 10 runs from about -0.59 to 0.56.

The sample of size 500 gives a much narrower interval for the population mean than the sample means of size 10 do, making it the sampling plan of choice.
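Because a single run depends on the seed, one way to back up this conclusion is to repeat the experiment many times and look at how much each kind of sample mean varies. The following is a minimal sketch; `reps` is an arbitrary choice, and the samples are drawn directly with `rnorm` instead of from a stored population to keep it short.

```r
set.seed(123)
reps <- 1000                                   # number of repetitions (arbitrary)

# Mean of one sample of size 500, repeated reps times
means_500 <- replicate(reps, mean(rnorm(500)))

# Mean of one sample of size 10, repeated reps times
means_10 <- replicate(reps, mean(rnorm(10)))

# Empirical 95% ranges of each sample mean across the repetitions
range_500 <- quantile(means_500, c(0.025, 0.975))
range_10 <- quantile(means_10, c(0.025, 0.975))

rbind(size_500 = range_500, size_10 = range_10)
```

A histogram of each vector, for example `hist(means_500)` and `hist(means_10)`, makes the difference in spread easy to show in a write-up.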

orchestra answered 1 year ago
