Question

Q. In this question, you will do some resampling and show results in graphics. This is related to bootstrap technique. The population distribution is a normal with µ = 10 and σ^2 = 4. The statistic is the sample mean. Hence in theory we know exactly what the density function of the sample mean is.

(a) Simulate a sample, say x, with sample size n=100. Report its mean, sd, min, and max.

(b) Use R functions sample and replicate to resample x 50000 times with replacement. The statistic is the sample mean and the output is booted.data. Find the mean, sd, min, and max of booted.data.

(c) Plot the histogram of booted.data. Please double the number of histogram cells, since the default is too small. Please plot as a density plot, since the theoretical density will be added in the next step. Comment on the shape and center of this distribution.

(d) Plot the histogram of booted.data-mean(x) with twice the default number of cells. Please plot as a density plot. Add the theoretical density function of X̄ − µ to the histogram with a different line type and color. Comment on your findings.

(e) Repeat the procedures from (a) to (d) two additional times to check consistency.

Solutions

Expert Solution

Let X be a random variable having a normal distribution with mean µ = 10 and variance σ^2 = 4 (so σ = 2).

Let X1, ..., Xn be a randomly selected sample of size n=100, with sample mean X̄. Because the population itself is normal, X̄ is exactly normally distributed (no central limit approximation is needed), with mean µ = 10 and standard deviation (the standard error of the mean) σ/√n = 2/√100 = 0.2.

That is, X̄ ~ N(10, 0.2^2).
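As a quick numerical check of this standard error, the sketch below draws many fresh samples of size 100 from the population and compares the standard deviation of their means with σ/√n. The seed, the number of repetitions (20000), and the variable names are illustrative assumptions, not part of the original solution.

```r
# Population parameters from the question
mu <- 10
sigma <- 2        # sigma^2 = 4
n <- 100

# Theoretical standard error of the sample mean
se_theory <- sigma / sqrt(n)

# Illustrative check: means of many independent samples from the population
set.seed(1)       # arbitrary seed, used only for reproducibility
sim_means <- replicate(20000, mean(rnorm(n, mean = mu, sd = sigma)))

# The two values should be close to each other
c(theory = se_theory, simulated = sd(sim_means))
```

Note that this check resamples from the population, which is only possible in a simulation; the bootstrap in parts (b) to (d) resamples from the single observed sample x instead.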

(a) Simulate a sample, say x, with sample size n=100. Report its mean, sd, min, and max.

R code with comments

---

#set the random seed
set.seed(123)

#set the sample size
n<-100

#a) simulate x from normal(10,4)
x<-rnorm(n,mean=10,sd=2)
#report mean, sd, min, and max.
sprintf('The sample mean:%.4f sd:%.4f min:%.4f max:%.4f',mean(x),sd(x),min(x),max(x))

----

[Output not preserved: one line with the mean, sd, min, and max of x]

(b) Use R functions sample and replicate to resample x 50000 times with replacement. The statistic is the sample mean and the output is booted.data. Find the mean, sd, min, and max of booted.data.

R code with comments

----

#b)
#sample x with replacement 50000 times and find the sample mean
booted.data<-replicate(50000,mean(sample(x,size=n,replace=TRUE)))
#report mean, sd, min, and max.
sprintf('The sample mean:%.4f sd:%.4f min:%.4f max:%.4f',
   mean(booted.data),sd(booted.data),min(booted.data),max(booted.data))

----

[Output not preserved: one line with the mean, sd, min, and max of booted.data]

(c) Plot the histogram of booted.data. Please double the number of histogram cells, since the default is too small. Please plot as a density plot, since the theoretical density will be added in the next step. Comment on the shape and center of this distribution.

R code with comments

---

#c)plot the density histogram of booted.data
hist(booted.data,breaks=30,freq=FALSE)

---

[Figure: density histogram of booted.data]

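The histogram is bell shaped and centered at about 10.2. That center is the mean of the observed sample x rather than the population mean µ = 10, because resampling from x reproduces the variability of the mean around mean(x). The bell shape is what the central limit theorem leads us to expect for the mean of 100 resampled values.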
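The value breaks=30 above is a hand-picked stand-in for "twice the default". One way to derive the doubled count from the data is sketched below: compute the default histogram without plotting, count its cells, and request twice that many. The names h_default, n_default, and h_double are illustrative, and hist() treats a single number passed to breaks as a suggestion, so the final count may not be exactly double.

```r
# Recreate the bootstrap means from parts (a) and (b)
set.seed(123)
n <- 100
x <- rnorm(n, mean = 10, sd = 2)
booted.data <- replicate(50000, mean(sample(x, size = n, replace = TRUE)))

# Number of cells hist() chooses by default (computed without plotting)
h_default <- hist(booted.data, plot = FALSE)
n_default <- length(h_default$counts)

# Request twice as many cells and draw on the density scale
h_double <- hist(booted.data, breaks = 2 * n_default, freq = FALSE,
                 main = "Bootstrap means", xlab = "booted.data")

# Compare the default and the doubled cell counts
c(default = n_default, doubled = length(h_double$counts))
```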

(d) Plot the histogram of booted.data-mean(x) with twice the default number of cells. Please plot as a density plot. Add the theoretical density function of X̄ − µ to the histogram with a different line type and color. Comment on your findings.

We have already seen that the mean of a sample of size n=100 has the distribution X̄ ~ N(10, 0.2^2).

Hence the distribution of X̄ − µ is normal with mean = 0 and standard deviation = 0.2.

R code with comments

---

#d) histogram of booted.data-mean(x)
hist(booted.data-mean(x),breaks=30,freq=FALSE)
#add the theoretical distribution of Xbar-mu
curve(dnorm(x,0,0.2),from=min(booted.data)-mean(x),to=max(booted.data)-mean(x),add=TRUE,col="red",lty=2)
----

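[Figure: density histogram of booted.data-mean(x) with the theoretical density of X̄ − µ overlaid as a dashed red line]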

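The theoretical density of X̄ − µ, shown by the dashed red line, closely follows the density histogram of booted.data-mean(x), which supports the theory. The match need not be exact: the spread of the bootstrap means is governed by sd(x)/√n, which only estimates σ/√n = 0.2.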
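To put numbers on the visual comparison, the following sketch lists three spreads side by side: the standard deviation of the bootstrap means, the plug-in standard error sd(x)/√n, and the theoretical σ/√n. The vector name spread and its labels are illustrative assumptions.

```r
# Recreate the bootstrap means from parts (a) and (b)
set.seed(123)
n <- 100
x <- rnorm(n, mean = 10, sd = 2)
booted.data <- replicate(50000, mean(sample(x, size = n, replace = TRUE)))

spread <- c(
  bootstrap   = sd(booted.data),   # spread of the resampled means
  plug_in     = sd(x) / sqrt(n),   # standard error estimated from x
  theoretical = 2 / sqrt(n)        # sigma / sqrt(n) for the population
)
round(spread, 4)
```

The bootstrap value should track the plug-in value closely, while both can differ somewhat from the theoretical 0.2 because they depend on the particular sample x.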

(e) Repeat the procedures from (a) to (d) two additional times to check consistency.

We change the random seed to draw a different sample, then rerun the same steps.

All the code together is

----

#set the random seed
set.seed(124)

#set the sample size
n<-100

#a) simulate x from normal(10,4)
x<-rnorm(n,mean=10,sd=2)
#report mean, sd, min, and max.
sprintf('The sample mean:%.4f sd:%.4f min:%.4f max:%.4f',mean(x),sd(x),min(x),max(x))

#b)
#sample x with replacement 50000 times and find the sample mean
booted.data<-replicate(50000,mean(sample(x,size=n,replace=TRUE)))
#report mean, sd, min, and max.
sprintf('The sample mean:%.4f sd:%.4f min:%.4f max:%.4f',
   mean(booted.data),sd(booted.data),min(booted.data),max(booted.data))


#make way for 2 graphs
par(mfrow=c(2,1))
#c)plot the density histogram of booted.data
hist(booted.data,breaks=30,freq=FALSE)

#d) histogram of booted.data-mean(x)
hist(booted.data-mean(x),breaks=30,freq=FALSE)
#add the theoretical distribution of Xbar-mu
curve(dnorm(x,0,0.2),from=min(booted.data)-mean(x),to=max(booted.data)-mean(x),add=TRUE,col="red",lty=2)
---

[Output and plots for run #2, set.seed(124): not preserved]

[Output and plot for run #3, set.seed(125): not preserved]

The observations made for run #1 still hold for these 2 runs, and hence the results are consistent with theory.
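Instead of editing the seed by hand for each run, the numerical part of (a) and (b) can be wrapped in a function and applied to all three seeds. This is a minimal sketch: the function name run_once, its arguments, and the column names are hypothetical, and the plots from (c) and (d) are left out to keep it short.

```r
# One full run of parts (a) and (b) for a given seed
run_once <- function(seed, n = 100, B = 50000) {
  set.seed(seed)
  x <- rnorm(n, mean = 10, sd = 2)
  booted.data <- replicate(B, mean(sample(x, size = n, replace = TRUE)))
  c(seed = seed,
    sample_mean = mean(x), sample_sd = sd(x),
    boot_mean = mean(booted.data), boot_sd = sd(booted.data))
}

# One row per seed used in runs #1 to #3
results <- t(sapply(c(123, 124, 125), run_once))
round(results, 4)
```

In each row, boot_mean should sit close to sample_mean and boot_sd close to sample_sd/√n, which is the consistency that part (e) asks to check.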

