In: Statistics and Probability
These are a few short answer questions I am stumped on.
1. What is the sampling distribution of the difference between means? Why can’t you conduct an independent samples t-test without it?
2. What are the assumptions of a two-sample t-test?
3. Why do we “pool” variance for a two-sample t-test? What are the assumptions that make this possible? How does it benefit us?
5. Why is a confidence interval not a probability statement?
9. What is an effect size? What happens to an effect size when sample size increases/decreases? Why?
10. What is power? How is it related to the α-level, sample size, and effect size?
ANSWER 1: The mean of the sampling distribution of the difference between means is μ1 − μ2, which is the difference between the two population means. The variance of the distribution of the sample differences is equal to (σ1²/n1) + (σ2²/n2). Therefore, the standard error of the difference between two means is equal to √(σ1²/n1 + σ2²/n2). To convert to the standard normal distribution, we use the formula z = ((M1 − M2) − (μ1 − μ2)) / √(σ1²/n1 + σ2²/n2).
The sampling distribution of the difference between means can be thought of as the distribution that would result if we repeated the following three steps over and over again: (1) sample n1 scores from Population 1 and n2 scores from Population 2, (2) compute the means of the two samples (M1 and M2), and (3) compute the difference between means, M1 - M2. The distribution of the differences between means is the sampling distribution of the difference between means.
As you might expect, the mean of the sampling distribution of the difference between means is μ(M1−M2) = μ1 − μ2,
which says that the mean of the distribution of differences between sample means is equal to the difference between population means.
For example, say that the mean test score of all 12-year-olds in a population is 34 and the mean of 10-year-olds is 25. If numerous samples were taken from each age group and the mean difference computed each time, the mean of these numerous differences between sample means would be 34 - 25 = 9.
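The three-step procedure above can be sketched with a quick simulation. The population means 34 and 25 come from the example; the common SD of 10 and the sample sizes are assumed values for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Populations from the example: 12-year-olds (mean 34) and 10-year-olds
# (mean 25). The SD of 10 and sample sizes of 30 are assumed for illustration.
mu1, mu2, sd = 34.0, 25.0, 10.0
n1, n2 = 30, 30
reps = 20_000

# Repeat many times: (1) sample from each population, (2) compute the two
# sample means, (3) compute the difference M1 - M2.
diffs = (rng.normal(mu1, sd, (reps, n1)).mean(axis=1)
         - rng.normal(mu2, sd, (reps, n2)).mean(axis=1))

print(diffs.mean())       # close to mu1 - mu2 = 9
print(diffs.std(ddof=0))  # close to sqrt(sd**2/n1 + sd**2/n2), about 2.58
```

The simulated mean of the differences lands near 9, and the spread of the differences matches the standard-error formula √(σ1²/n1 + σ2²/n2).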
In order to test whether there is a difference between population means, we are going to make three assumptions:
1. The two populations have the same variance (homogeneity of variance).
2. The populations are normally distributed.
3. Each value is sampled independently from each other value.
The consequences of violating the first two assumptions are investigated in the simulation in the next section. For now, suffice it to say that small-to-moderate violations of assumptions 1 and 2 do not make much difference. It is important not to violate assumption 3.
We saw the following general formula for significance testing in the section on testing a single mean: t = (statistic − hypothesized value) / (estimated standard error of the statistic).
In this case, our statistic is the difference between sample means and our hypothesized value is 0. The hypothesized value is the null hypothesis that the difference between population means is 0. Without the sampling distribution of the difference between means we would have no standard error to put in the denominator, so the t statistic could not be computed at all; this is why an independent-samples t-test cannot be conducted without it.
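The general formula can be checked directly: computing t by hand from the standard error of the difference gives the same value scipy produces. The sample data below are made up for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Two small illustrative samples (made-up data).
a = rng.normal(34, 10, 25)
b = rng.normal(25, 10, 25)

# t = (statistic - hypothesized value) / estimated standard error,
# where the statistic is M1 - M2 and the hypothesized value is 0.
se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
t_manual = (a.mean() - b.mean() - 0) / se

# scipy's Welch t-test uses this same standard error of the difference.
t_scipy, p = stats.ttest_ind(a, b, equal_var=False)
print(t_manual, t_scipy)
```

The two t values agree exactly, which is the point: the test is nothing more than the general significance-testing formula applied to the sampling distribution of the difference between means.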
2. The assumptions of the two-sample t-test are:
1. The data are continuous (not discrete).
2. The data follow the normal probability distribution.
3. The variances of the two populations are equal. (If not, the Aspin-Welch unequal-variance test is used.)
4. The two samples are independent. There is no relationship between the individuals in one sample as compared to the other (as there is in the paired t-test).
5. Both samples are simple random samples from their respective populations. Each individual in the population has an equal probability of being selected in the sample.
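A rough sketch of how these assumptions might be checked in practice, using common scipy tests (the data and the .05 cutoff are assumed for illustration; assumptions 4 and 5 are design questions that no test on the data can verify):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Illustrative data: two independent samples.
x = rng.normal(10, 2, 40)
y = rng.normal(12, 2, 40)

# Normality (assumption 2): Shapiro-Wilk test on each sample.
_, p_norm_x = stats.shapiro(x)
_, p_norm_y = stats.shapiro(y)

# Equal variances (assumption 3): Levene's test.
_, p_var = stats.levene(x, y)

# If the equal-variance assumption looks doubtful, fall back to the
# Aspin-Welch unequal-variance test (equal_var=False).
equal_var = bool(p_var > 0.05)
t, p = stats.ttest_ind(x, y, equal_var=equal_var)
print(p_norm_x, p_norm_y, p_var, p)
```

Large p-values on the Shapiro-Wilk and Levene tests are consistent with (though they do not prove) the normality and equal-variance assumptions.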
3. We pool variance because, when the two populations are assumed to have equal variances, the two sample variances are both estimates of the same quantity; combining them into a single pooled estimate uses all n1 + n2 − 2 degrees of freedom, which gives a more precise estimate of the common variance and therefore a more powerful test. The assumption that makes this possible is homogeneity of variance, alongside independent random samples from normal populations.

The pooled t-test, or TS-pooled, which is theoretically the correct t-statistic, has fallen into some disfavor because of its 'claimed' sensitivity to departures from the assumption of equal population variances (Peck, Olsen, & Devore). We use a simulation study to disprove this claim. The study consists of 240 comparisons of the two test statistics. For the sake of simplicity, we will describe here the basics of one such comparison. We will also introduce a few terms which we will be using throughout the paper. For a single comparison of the two test statistics, we draw two independent random samples from two simulated populations. The two populations may (or may not) have equal means and/or equal variances. For a particular comparison, if the two populations indeed have equal variances, we designate TS-pooled to be the 'correct' test statistic. Similarly, if the two populations have unequal variances, we designate TS-unpooled to be the 'correct' test statistic. Once the two independent samples from the two simulated populations are drawn, we perform the test of hypothesis of equality of two means using both TS-pooled and TS-unpooled. Since the tests are conducted on samples from known populations, we record the conclusions of both TS-pooled and TS-unpooled as correct or incorrect. Furthermore, we label one of the two test statistics as the 'better' one if the p-value corresponding to that test statistic is closer to the correct conclusion (unless, obviously, both test statistics have exactly the same p-value). For example, if the two populations the samples were drawn from indeed had the same mean, then the test statistic that yielded the bigger p-value is labeled the 'better' one, whereas if the two populations had unequal means, then the test statistic that yielded the smaller p-value is labeled the 'better' one.
In addition, we label the test statistic that yielded the p-value which is farther away (when compared to each other) from the correct conclusion the 'underperformer'. Also, if the two samples were drawn from two populations with the same variance, we refer to it as the "equal-variance setting", whereas if the two samples were drawn from two populations with unequal variances, we refer to it as the "unequal-variance setting". Although we reveal, in section 3, the number of times each test statistic arrives at the correct conclusion, it is important to note that as far as the comparisons are concerned, we are strictly interested in finding the 'better' one (or the 'underperformer') of the two.
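One such comparison can be sketched in a few lines. All population parameters and sample sizes below are assumed values, not the study's actual settings: this is the unequal-variance setting with equal means, so TS-unpooled is designated the 'correct' statistic and the larger p-value is labeled 'better'.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Unequal-variance setting with equal population means (assumed parameters).
s1 = rng.normal(50, 2, 30)
s2 = rng.normal(50, 8, 30)

# Test the same hypothesis with both statistics.
_, p_pooled = stats.ttest_ind(s1, s2, equal_var=True)     # TS-pooled
_, p_unpooled = stats.ttest_ind(s1, s2, equal_var=False)  # TS-unpooled

# The population means are equal, so the statistic with the bigger
# p-value is labeled the 'better' one; the other is the 'underperformer'.
better = "TS-pooled" if p_pooled > p_unpooled else "TS-unpooled"
print(p_pooled, p_unpooled, better)
```

Repeating this comparison over many draws and settings, and tallying which statistic is 'better' each time, is the structure of the 240-comparison study described above.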
5. I have seen posts argue along the lines of "the actually-computed CI either contains the population mean or it doesn't, so its probability is either 1 or 0", but this seems to imply a strange definition of probability that depends on unknown states (e.g., a friend flips a fair coin, hides the result, and I am disallowed from saying there is a 50% chance that it's heads). The frequentist resolution is that the population mean is a fixed, if unknown, constant rather than a random variable, so no probability attaches to any one computed interval; the 95% is a statement about the procedure: over many repeated samples, 95% of the intervals constructed this way will contain the true mean.
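The procedure interpretation is easy to demonstrate: construct many t-based confidence intervals from repeated samples and count how often they cover the true mean. The population parameters and sample size here are assumed for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
mu, sd, n, reps = 50.0, 10.0, 25, 10_000  # assumed population and sample size

tcrit = stats.t.ppf(0.975, df=n - 1)  # critical value for a 95% t-interval
covered = 0
for _ in range(reps):
    sample = rng.normal(mu, sd, n)
    m = sample.mean()
    se = sample.std(ddof=1) / np.sqrt(n)
    lo, hi = m - tcrit * se, m + tcrit * se
    covered += (lo <= mu <= hi)

coverage = covered / reps
print(coverage)  # close to 0.95
```

Each individual interval either covers μ or it doesn't; the 0.95 describes the long-run behavior of the interval-building procedure, which is exactly why the CI is not a probability statement about any single computed interval.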
9. An effect size is a measure of how important a difference is: large effect sizes mean the difference is important; small effect sizes mean the difference is unimportant. It normalizes the average raw gain in a population by the standard deviation of individuals' raw scores, giving you a measure of how substantially the pre- and post-test scores differ. Unlike a p-value, an effect size does not systematically grow or shrink as the sample size increases or decreases, because sample size does not appear in its formula (e.g., Cohen's d = (M1 − M2)/s); a larger sample only makes the estimate of the effect size more precise.
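This independence from sample size can be checked by simulation. The sketch below (with an assumed true standardized difference of 0.5) averages Cohen's d over many replications at several sample sizes:

```python
import numpy as np

rng = np.random.default_rng(5)

def cohens_d(a, b):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n1, n2 = len(a), len(b)
    sp = np.sqrt(((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1))
                 / (n1 + n2 - 2))
    return (a.mean() - b.mean()) / sp

# True standardized difference is 0.5 regardless of sample size.
means = {}
for n in (20, 200, 2000):
    means[n] = np.mean([cohens_d(rng.normal(0.5, 1, n), rng.normal(0, 1, n))
                        for _ in range(2000)])
    print(n, means[n])  # hovers near 0.5 at every n
```

The average d stays near 0.5 whether n is 20 or 2000 (apart from a small, well-known upward bias at very small n); only the spread of the individual estimates shrinks as n grows.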
10. Power is the probability that the null hypothesis will be correctly rejected when it is false. Power increases as the α-level is raised (a less stringent criterion makes rejection easier), as the sample size increases (the standard error shrinks), and as the effect size increases (a bigger difference is easier to detect).
When conducting a power analysis a priori, there are typically three parameters a researcher will need to know to calculate an appropriate sample size. Those parameters are the alpha value, the power, and the effect size. The alpha value is the level at which you decide to reject the null hypothesis; an alpha level of .05 is typically used when the statistical analysis is conducted in the social sciences. According to Howell (2010), a generally accepted power is .80.
Regarding effect size, it is often acceptable to use a medium effect in the sample size calculation; however, it is possible to choose an effect size closer to what has been found in previous studies in order to get a more accurate result.
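The relationship between power and sample size can be estimated by brute force: simulate many experiments at a given effect size and count how often the t-test rejects at α = .05. The effect size of 0.5 (a medium effect) and the sample sizes below are assumed for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

def simulated_power(effect_size, n, alpha=0.05, reps=5000):
    """Fraction of replications in which a two-sample t-test rejects H0."""
    hits = 0
    for _ in range(reps):
        a = rng.normal(effect_size, 1, n)  # true difference = effect_size SDs
        b = rng.normal(0, 1, n)
        _, p = stats.ttest_ind(a, b)
        hits += (p < alpha)
    return hits / reps

# Power rises with sample size for a fixed medium effect (d = 0.5):
p30 = simulated_power(0.5, 30)
p64 = simulated_power(0.5, 64)
print(p30)  # roughly 0.48
print(p64)  # roughly 0.80 -- about 64 per group reaches the accepted .80
```

This is the a priori calculation run in reverse: fixing α, the effect size, and the target power of .80 pins down the required sample size, here roughly 64 per group for a medium effect.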