In: Statistics and Probability
Let's assume, for instance, that you are drawing a sample from a normal distribution with mean = 0, and then you use a t-test to assess whether the mean of the sample you just drew is "significantly different" from zero. Then you repeat this procedure (draw, run t-test) many times. Will at least some of your samples (drawn from a normal with known mean = 0!) look significant? Why? Isn't the p-value supposed to tell good findings from bad ones and true hypotheses from false ones? How will the distribution of p-values look across all your multiple trials (again, no simulations, rather what do you feel it should be and why)? Can we quantify how many "significant" samples we expect to see? Say we run 1000 trials: how many samples will be significant at any given significance level p?
Yes, at least some of our samples (drawn from a normal distribution with known mean = 0!) will look significant.

Each sample is a random sample from a population with mean 0 and variance > 0. So by chance alone, an extreme sample will occasionally be drawn whose sample mean is far enough from zero to produce a significant test result, even though the true mean is exactly zero.

The p-value does not separate true hypotheses from false ones. It is the probability, computed assuming the null hypothesis is true, of observing a test statistic at least as extreme as the one actually obtained. If we reject whenever the p-value falls below a significance level α, we will falsely reject a true null hypothesis with probability α. The smaller the significance level, the smaller the chance of a false rejection, but that chance never reaches zero.
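Although the question asks for reasoning rather than simulation, a short sketch makes the point concrete. The snippet below (a minimal illustration, assuming NumPy and SciPy are available; the sample size of 30 per trial is an arbitrary choice) repeatedly draws samples from a normal distribution with mean 0 and runs a one-sample t-test on each. Roughly a fraction α of the trials come out "significant" even though the null hypothesis is true in every one of them.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_trials, n_per_sample, alpha = 1000, 30, 0.05

# Draw many samples from N(0, 1) and t-test each against mean 0.
pvals = []
for _ in range(n_trials):
    sample = rng.normal(loc=0.0, scale=1.0, size=n_per_sample)
    result = stats.ttest_1samp(sample, popmean=0.0)
    pvals.append(result.pvalue)

# Despite the true mean being 0, about alpha * n_trials tests "reject".
n_significant = sum(p < alpha for p in pvals)
print(n_significant)
```

With 1000 trials and α = 0.05, the printed count lands near 50, fluctuating from seed to seed.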
How will the distribution of p-values look across all your multiple trials (again, no simulations, rather what do you feel it should be and why)?
Here we take independent samples from a normal distribution with mean = 0, so the null hypothesis is true in every trial. For a continuous test statistic, when the null hypothesis is true the p-value is itself uniformly distributed on (0, 1): by construction, P(p-value ≤ p) = p. So across many trials the histogram of p-values should look flat, with no pile-up near zero.

Suppose we run n trials. Each trial has two possible outcomes, "the test is significant" and "the test is not significant", and the trials are independent. The number of significant results at significance level p is therefore binomially distributed with n trials and success probability P(the test is significant) = p.

Hence in 1000 trials we expect about 1000·p samples to be significant at any given significance level p; for example, about 50 at p = 0.05.
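The expected count and its spread follow directly from the binomial distribution. A minimal check (assuming SciPy; the level 0.05 is just an example value):

```python
from scipy.stats import binom

n_trials, p = 1000, 0.05

# Number of significant results ~ Binomial(n_trials, p).
dist = binom(n_trials, p)

print(dist.mean())  # expected count: n * p
print(dist.std())   # typical fluctuation around that count
```

The mean is n·p = 50, and the standard deviation, sqrt(n·p·(1 − p)) ≈ 6.9, explains why any single run of 1000 trials rarely gives exactly 50 significant results.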