Question

For this problem, you will run a simulation to investigate how violating the assumption of normally distributed data can affect the properties of a t-test.

a). The gamma distribution is skewed to the right. It contains a parameter called “shape”. The R function for generating data from a gamma distribution is rgamma – you can read the details in R help. Make three histograms, each of a sample of size n = 10,000 drawn from a gamma distribution, with shape = 1, shape = 0.5, and shape = 0.1. Use “breaks = 100” to force each histogram to have lots of bars. Describe what you see happening as the shape parameter gets smaller.

b). Write a simulation that repeatedly draws two samples from a gamma distribution with shape = 1, then compares their means using a t-test. For this simulation, use n = 30 for the size of each sample. Write code that will save both the t-test statistic and p-value each time. Then make a histogram of the test statistics, and report the proportion of p-values less than 0.05. Note that, if the assumptions of the t-test are not violated, the p-value should be less than 0.05 5% of the time.

c). Do the same thing as in part b two more times, using shape = 0.5 and shape = 0.1. Does this seem to have any effect on the distribution of the test statistics, or the proportion of p-values less than 0.05?

d). Run the simulation three more times (once for each value of shape), using samples of size n = 10 rather than n = 30. Show the three histograms and three proportions of p-values less than 0.05. Did this have any noticeable effect on the results?

Solutions

Expert Solution

Answer:-

a). The gamma distribution is skewed to the right. It contains a parameter called “shape”. The R function for generating data from a gamma distribution is rgamma – you can read the details in R help. Make three histograms, each of a sample of size n = 10,000 drawn from a gamma distribution, with shape = 1, shape = 0.5, and shape = 0.1. Use “breaks = 100” to force each histogram to have lots of bars. Describe what you see happening as the shape parameter gets smaller.

R CODE:

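# Draw 10,000 values from each gamma distribution (rate defaults to 1)
d1 <- rgamma(n = 10000, shape = 1)
d2 <- rgamma(n = 10000, shape = 0.5)
d3 <- rgamma(n = 10000, shape = 0.1)

# Show the three histograms side by side in one plotting window
par(mfrow = c(1, 3))
hist(d1, breaks = 100, main = "shape = 1")
hist(d2, breaks = 100, main = "shape = 0.5")
hist(d3, breaks = 100, main = "shape = 0.1")
par(mfrow = c(1, 1))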
R OUTPUT:

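As the shape parameter decreases, the distribution becomes more strongly right-skewed: more of the sample piles up in the first few bars near zero, while a few large values stretch the right tail. With the default rate = 1 the variance equals the shape, so the absolute spread actually shrinks; what increases is the skewness, 2/sqrt(shape), and the spread relative to the mean (the coefficient of variation, 1/sqrt(shape)).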
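
The pattern in the histograms can be checked against the known moments of the gamma distribution: with rate = 1, the variance is equal to the shape and the skewness is 2/sqrt(shape). The following is a minimal sketch; the helper sample_skewness and the seed are assumptions introduced only for this illustration.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical helper: moment-based sample skewness
sample_skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

shapes <- c(1, 0.5, 0.1)

# One row per shape: sample moments next to their theoretical values
moments <- t(sapply(shapes, function(a) {
  x <- rgamma(n = 10000, shape = a)   # rate defaults to 1
  c(shape = a,
    sample_var = var(x), theory_var = a,
    sample_skew = sample_skewness(x), theory_skew = 2 / sqrt(a))
}))
print(round(moments, 3))
```

The theoretical columns follow from the gamma moments; the sample columns change from run to run, and the sample skewness in particular is noisy for small shape values.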

b). Write a simulation that repeatedly draws two samples from a gamma distribution with shape = 1, then compares their means using a t-test. For this simulation, use n = 30 for the size of each sample. Write code that will save both the t-test statistic and p-value each time. Then make a histogram of the test statistics, and report the proportion of p-values less than 0.05. Note that, if the assumptions of the t-test are not violated, the p-value should be less than 0.05 5% of the time.

R CODE:
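reps <- 10000
p <- numeric(reps)   # p-values
s <- numeric(reps)   # t statistics
for (i in 1:reps) {
  # Two independent samples of size 30 from the same gamma distribution
  tt <- t.test(rgamma(n = 30, shape = 1), rgamma(n = 30, shape = 1), alternative = "two.sided")
  p[i] <- tt$p.value
  s[i] <- tt$statistic
}
mean(p < 0.05)   # proportion of p-values below 0.05
hist(s, breaks = 100)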

R OUTPUT:

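The proportion of p-values less than 0.05 is 0.0446, slightly below the nominal 5%. With shape = 1 and n = 30, the t-test therefore does not reject a true null hypothesis more often than it should.

The histogram of the test statistics is roughly symmetric and mesokurtic (bell-shaped), as expected from the Central Limit Theorem with n = 30.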
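
Whether 0.0446 differs meaningfully from the nominal 0.05 depends on the Monte Carlo error of the estimate. A rough check is sketched here, assuming the 10,000 replications are independent and using the normal approximation to the binomial; the variable names are chosen for this illustration only.

```r
reps  <- 10000    # number of simulated t-tests in part b
p_hat <- 0.0446   # proportion of p-values below 0.05 reported above
alpha <- 0.05     # nominal rejection rate

# Standard error of a simulated proportion if the true rate were alpha
mc_se <- sqrt(alpha * (1 - alpha) / reps)

# Distance of the observed proportion from alpha, in standard errors
z <- (p_hat - alpha) / mc_se

round(c(mc_se = mc_se, z = z), 4)
```

Here mc_se is about 0.0022 and z is about -2.5, so the observed proportion sits somewhat further below 5% than simulation noise alone would usually produce. This suggests the test is mildly conservative in this setting rather than anti-conservative.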

c). Do the same thing as in part b two more times, using shape = 0.5 and shape = 0.1. Does this seem to have any effect on the distribution of the test statistics, or the proportion of p-values less than 0.05?

R CODE:

# Run `reps` two-sample t-tests for one shape value, plot the test
# statistics, and return the proportion of p-values below 0.05
run_sim <- function(shape, n = 30, reps = 10000) {
  p <- numeric(reps)
  s <- numeric(reps)
  for (i in 1:reps) {
    tt <- t.test(rgamma(n = n, shape = shape), rgamma(n = n, shape = shape), alternative = "two.sided")
    p[i] <- tt$p.value
    s[i] <- tt$statistic
  }
  hist(s, breaks = 100, main = paste("shape =", shape, ", n =", n))
  mean(p < 0.05)
}

# Each shape gets its own full simulation, histogram, and proportion
run_sim(shape = 0.5)
run_sim(shape = 0.1)

R OUTPUT:

In this run the distribution of the test statistic is not much affected: the histogram is still roughly symmetric and mesokurtic.

The proportion of p-values less than 0.05 is again below 5%, so decreasing the shape parameter did not push the rejection rate above its nominal level.
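
A histogram alone makes it hard to judge the tails, so one option is to compare quantiles of the simulated statistics with those of a reference t distribution. The sketch below does this for shape = 0.1 and n = 30. It makes several assumptions: the seed is arbitrary, the number of replications is reduced to 2,000 for speed, and df = 2n - 2 is used only as an approximation, since the Welch degrees of freedom used by t.test vary from test to test.

```r
set.seed(2)   # arbitrary seed
n <- 30
reps <- 2000  # fewer replications than in the solution, for speed
shape <- 0.1

# Simulated two-sample t statistics under a true null hypothesis
stats <- replicate(reps,
  unname(t.test(rgamma(n, shape = shape), rgamma(n, shape = shape))$statistic))

probs <- c(0.025, 0.25, 0.5, 0.75, 0.975)
comparison <- rbind(
  simulated = quantile(stats, probs),
  reference = qt(probs, df = 2 * n - 2)  # approximation: Welch df varies per test
)
round(comparison, 2)
```

If the simulated outer quantiles lie closer to zero than the reference ones, the test rejects less often than 5%; if they lie further out, it rejects more often.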

d). Run the simulation three more times (once for each value of shape), using samples of size n = 10 rather than n = 30. Show the three histograms and three proportions of p-values less than 0.05. Did this have any noticeable effect on the results?

R CODE:

# Run `reps` two-sample t-tests for one shape value and sample size,
# plot the test statistics, and return the proportion of p-values below 0.05
run_sim <- function(shape, n, reps = 10000) {
  p <- numeric(reps)
  s <- numeric(reps)
  for (i in 1:reps) {
    tt <- t.test(rgamma(n = n, shape = shape), rgamma(n = n, shape = shape), alternative = "two.sided")
    p[i] <- tt$p.value
    s[i] <- tt$statistic
  }
  hist(s, breaks = 100, main = paste("shape =", shape, ", n =", n))
  mean(p < 0.05)
}

# Three simulations with n = 10, one per shape, histograms side by side
shapes <- c(1, 0.5, 0.1)
par(mfrow = c(1, 3))
props <- sapply(shapes, run_sim, n = 10)
par(mfrow = c(1, 1))
names(props) <- paste("shape =", shapes)
props   # three proportions of p-values below 0.05

R OUTPUT:

[1] 0.04757621


In the reported run, the distribution of the test statistic is again not much affected: the histogram is still roughly symmetric and mesokurtic.

The proportion of p-values less than 0.05 is again below 5% (0.0476 in the output above), so reducing the sample size to n = 10 did not push the rejection rate above its nominal level.
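
To compare all six settings side by side, the proportions can be collected in one table. This is a minimal sketch: the helper rejection_rate, the seed, and the reduced number of replications (2,000 per setting) are assumptions made for illustration, so the proportions it prints will be noisier than those from 10,000 replications.

```r
set.seed(3)  # arbitrary seed

# Hypothetical helper: proportion of two-sample t-tests with p < 0.05
# when both samples come from the same gamma distribution
rejection_rate <- function(shape, n, reps = 2000) {
  p_values <- replicate(reps,
    t.test(rgamma(n, shape = shape), rgamma(n, shape = shape))$p.value)
  mean(p_values < 0.05)
}

# All combinations of shape and sample size used in parts b to d
grid <- expand.grid(shape = c(1, 0.5, 0.1), n = c(30, 10))
grid$prop_sig <- mapply(rejection_rate, grid$shape, grid$n)
print(grid)
```

Each row gives one combination of shape and n, so the effect of skewness and of sample size on the rejection rate can be read directly against the nominal 0.05.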
