2. Write a simulation in R that shows the distribution of the t-test statistic when the null hypothesis is true. To do this, you should use a for loop that repeatedly performs t-tests comparing sample means of data that come from distributions with the same population mean and standard deviation. Use rnorm() to take samples, t.test() to perform the t-tests, and use “$statistic” to extract the t-test statistic from the t.test() procedure (e.g. t.test(x,y)$statistic). Make a histogram of the test statistics. If you need help, look back at the notes on for loops.
One assumption of the t-test is that the populations you sample from have the same standard deviation. Violating this assumption can affect the distribution of the t-test statistic. This is especially the case when sample sizes are unequal.
Re-do the simulation from 2, but this time sample from normal distributions with the same mean but where one has a standard deviation of 1 and a sample size of 20, and the other has a standard deviation of 5 and a sample size of 100. Plot a histogram of the test statistics. How does this differ from the histogram in part 2?
Perform the procedure in part a. above, but this time use the “pooled variance” t-test. To do this, add “var.equal=TRUE” as an argument in the t.test function. Plot a histogram of the test statistics. How does this differ from the histogram in part a. above?
Using RStudio.
We need to write a simulation in R that shows the distribution of the t-test statistic when the null hypothesis is true.
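Let $x_1, x_2, \ldots, x_{n_1}$ be a random sample of size $n_1$ from a variable $X$ with mean $\mu_1$, and let $y_1, y_2, \ldots, y_{n_2}$ be a random sample of size $n_2$ from a variable $Y$ with mean $\mu_2$.
Then the null hypothesis is
$H_0 : \mu_1 = \mu_2$
We reject the null hypothesis if the absolute value of the test statistic is greater than the t-table value.
The test statistic, in the Welch form that t.test(x,y) computes by default, is $TS = \dfrac{\bar{x} - \bar{y}}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$, where $\bar{x}$ and $\bar{y}$ are the sample means and $s_1^2$ and $s_2^2$ are the sample variances.
The t-table value is $t_{\alpha/2,\,\nu}$, the upper $\alpha/2$ critical value of the t distribution with $\nu$ degrees of freedom at significance level $\alpha$.
If $|TS| > t_{\alpha/2,\,\nu}$, we reject the null hypothesis.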
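As a quick check of the rejection rule above, here is a minimal sketch that computes the statistic by hand for one pair of samples and compares it with what t.test() returns. It assumes the Welch form of the statistic (the default when var.equal is not set), a significance level of 0.05, and an arbitrary seed; the variable names are illustrative only.

```r
set.seed(1)  # assumed seed, used only so the example is reproducible

x <- rnorm(50, mean = 5, sd = 2)
y <- rnorm(55, mean = 5, sd = 2)

# Welch statistic computed by hand: difference in means over its standard error
welch_by_hand <- (mean(x) - mean(y)) /
  sqrt(var(x) / length(x) + var(y) / length(y))

# The same statistic as reported by t.test() (default var.equal = FALSE)
fit <- t.test(x, y)
ts_from_r <- unname(fit$statistic)

# Two-sided critical value at alpha = 0.05, using the degrees of freedom
# that t.test() reports for this pair of samples
alpha <- 0.05
crit <- qt(1 - alpha / 2, df = unname(fit$parameter))

# Decision rule: reject H0 when |TS| exceeds the critical value
reject <- abs(ts_from_r) > crit
```

The two values of the statistic should agree up to rounding, and the decision in reject should match comparing fit$p.value with alpha.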
We do not need to do this by hand; we will use R to calculate the test statistic values that we need to plot.
We need to simulate the case where the null hypothesis is true, so the two population means must be the same.
Thus we will take two samples of different sizes from normal distributions with the same population mean and standard deviation.
R - Code and output
{
TS=numeric(100)
# define variable TS to hold the 100 test statistics
for(i in 1:100) # simulate 100 times
{
x=rnorm(50,5,2)
# First sample of size 50, mean = 5, sd = 2
y=rnorm(55,5,2)
# Second sample of size 55, mean = 5, sd = 2
TS[i]=t.test(x,y)$statistic
# store the test statistic value
}
hist(TS,xlab="Test Statistic",col=2) # plot histogram of the test statistics
}
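With only 100 replications the histogram can look ragged. The sketch below repeats the same loop with more replications and compares a few empirical quantiles of the statistic with those of a t distribution. The seed, the 5000 replications, and the use of 50 + 55 - 2 = 103 degrees of freedom as an approximation are all assumptions made for illustration.

```r
set.seed(42)     # assumed seed, for reproducibility only
n_sims <- 5000   # assumed; more replications than the 100 used above

TS <- numeric(n_sims)
for (i in seq_len(n_sims)) {
  x <- rnorm(50, mean = 5, sd = 2)
  y <- rnorm(55, mean = 5, sd = 2)
  TS[i] <- t.test(x, y)$statistic
}

# Under H0 the statistic should be close to a t distribution with
# roughly 103 degrees of freedom (an approximation)
probs <- c(0.025, 0.5, 0.975)
empirical <- quantile(TS, probs)
theoretical <- qt(probs, df = 103)
print(round(rbind(empirical, theoretical), 2))

# Share of simulations that would reject H0 at the 5% level
rejection_rate <- mean(abs(TS) > qt(0.975, df = 103))
print(rejection_rate)
```

If the simulation behaves as expected, the rejection rate should be close to 0.05. For a visual check, hist(TS, freq = FALSE) followed by curve(dt(x, df = 103), add = TRUE) overlays the reference density on the histogram.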
Now we repeat the simulation, but this time we sample from normal distributions with the same mean, where one has a standard deviation of 1 and a sample size of 20 and the other has a standard deviation of 5 and a sample size of 100.
Let both have the same mean, equal to 5.
R - Code and output
{
TS=numeric(100)
# define variable TS to hold the 100 test statistics
for(i in 1:100) # simulate 100 times
{
x=rnorm(20,5,1)
# First sample of size 20, mean = 5, sd = 1
y=rnorm(100,5,5)
# Second sample of size 100, mean = 5, sd = 5
TS[i]=t.test(x,y)$statistic
# store the test statistic value
}
hist(TS,xlab="Test Statistic",col=2) # plot histogram of the test statistics
}
How does this differ from the histogram in part 2? In this part the two samples have the same mean but different sample sizes and standard deviations. Because t.test(x,y) uses the Welch test by default (var.equal=FALSE), which does not assume equal standard deviations, the test statistics should still be centred at 0 with a spread close to that in part 2. With only 100 simulations a few more extreme values may appear by chance, but they do not by themselves show that the t-test falsely concludes more often than in the last part that the two means are not the same.
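One way to check whether rejections really become more frequent in this setting is to count them directly instead of judging the histogram by eye. The sketch below does this with the p-value from each default (Welch) t-test; the seed, the 5000 replications, and the 0.05 level are assumptions chosen for illustration.

```r
set.seed(7)      # assumed seed, for reproducibility only
n_sims <- 5000   # assumed number of replications

p_values <- numeric(n_sims)
for (i in seq_len(n_sims)) {
  x <- rnorm(20, mean = 5, sd = 1)    # small sample, small sd
  y <- rnorm(100, mean = 5, sd = 5)   # large sample, large sd
  p_values[i] <- t.test(x, y)$p.value # default Welch test
}

# Proportion of true null hypotheses rejected at the 5% level
rejection_rate <- mean(p_values < 0.05)
print(rejection_rate)
```

A rejection rate near 0.05 would indicate that the default test keeps its nominal error rate here even though the standard deviations differ.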
Now we need to perform the procedure in part a. above, but this time use the “pooled variance” t-test. To do this, we add “var.equal=TRUE” as an argument in the t.test function.
This time the test assumes that the two variances are equal.
We draw the samples from the same distributions as before, rnorm(20,5,1) and rnorm(100,5,5).
R - code and output
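{
TS=numeric(100)
# define variable TS to hold the 100 test statistics
for(i in 1:100) # simulate 100 times
{
x=rnorm(20,5,1)
# First sample of size 20, mean = 5, sd = 1
y=rnorm(100,5,5)
# Second sample of size 100, mean = 5, sd = 5
TS[i]=t.test(x,y,var.equal=TRUE)$statistic
# pooled-variance t-test with var.equal=TRUE
}
hist(TS,xlab="Test Statistic (with var.equal=TRUE)",col=3)
}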
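With “var.equal=TRUE” the test statistic values are much smaller in magnitude than in the last part, where “var.equal=TRUE” was not used, so in none of the 100 simulations is the null hypothesis rejected.
In this case the test statistics range only from about -1.5 to 1.5, which is far narrower.
The reason is that the pooled variance is dominated by the larger sample, which also has the larger standard deviation, so the standard error is overestimated and the statistic shrinks toward 0. The pooled statistic therefore no longer follows the t distribution expected under the null hypothesis: the test almost never rejects, even though the null hypothesis is equally true in both parts.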
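The narrow range of about -1.5 to 1.5 can be checked with a little arithmetic on the standard errors, followed by a larger simulation. This is a sketch under stated assumptions: the population standard deviations are plugged in directly for the arithmetic, and the seed and the 5000 replications are arbitrary choices.

```r
n1 <- 20;  sd1 <- 1
n2 <- 100; sd2 <- 5

# True standard error of mean(x) - mean(y)
true_se <- sqrt(sd1^2 / n1 + sd2^2 / n2)

# Standard error implied by pooling the two population variances
pooled_var <- ((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2)
pooled_se <- sqrt(pooled_var * (1 / n1 + 1 / n2))

# Approximate standard deviation of the pooled statistic under H0
shrink <- true_se / pooled_se

set.seed(11)     # assumed seed, for reproducibility only
n_sims <- 5000   # assumed number of replications
TS_pooled <- numeric(n_sims)
for (i in seq_len(n_sims)) {
  x <- rnorm(n1, mean = 5, sd = sd1)
  y <- rnorm(n2, mean = 5, sd = sd2)
  TS_pooled[i] <- t.test(x, y, var.equal = TRUE)$statistic
}

# Simulated spread and 5%-level rejection rate of the pooled statistic
sd_pooled <- sd(TS_pooled)
rejection_rate <- mean(abs(TS_pooled) > qt(0.975, df = n1 + n2 - 2))
print(c(shrink = shrink, sd_pooled = sd_pooled, rejection_rate = rejection_rate))
```

Here shrink works out to about 0.49, so the pooled statistic has roughly half the spread of a standard t statistic, which is consistent with a histogram confined to about -1.5 to 1.5.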