Power calculations for hypothesis tests are relatively easy to do with modern statistical software. What do you think adequate power should be for an experiment? What issues need to be considered in answering the question?
In hypothesis testing, statistical power is the probability of correctly rejecting the null hypothesis H0 when the alternative H1 is true. In other words, it is the ability of a test to detect a specific effect when that effect is actually present. This follows because, by convention, we take H0 to represent the absence of an effect and H1 to represent its presence.
So, Power = 1 - P(type-2 error), where a type-2 error is the failure to reject H0 when H0 is actually false (that is, when H1 is true).
Being a probability measure, power lies between 0 and 1.
Thus, a higher power means a lower probability of a type-2 error. As a result, a test with higher power is more reliable in detecting the presence of an effect than a test with lower power.
Power close to 0 is undesirable; a value closer to 1 is desirable. A widely used convention is to aim for a power of at least 0.80.
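To make the relation Power = 1 - P(type-2 error) concrete, here is a minimal sketch for one simple case: a one-sided, one-sample z-test with known standard deviation. The choice of test, the function name z_test_power, and the numbers (effect 0.5, sigma 1, n = 25, significance level 0.05) are assumptions made for illustration only; they do not come from the question.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a one-sided one-sample z-test (H1: true mean exceeds the H0 mean).

    effect: assumed true difference between the mean under H1 and under H0
    sigma:  population standard deviation, assumed known
    n:      sample size
    alpha:  significance level, i.e. P(type-1 error)
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # rejection threshold for the z statistic
    shift = effect * sqrt(n) / sigma         # mean of the z statistic when H1 is true
    beta = std_normal.cdf(z_crit - shift)    # P(type-2 error): statistic falls below threshold
    return 1 - beta


# Illustrative values only.
power = z_test_power(effect=0.5, sigma=1.0, n=25)
print(round(power, 3))
```

With these illustrative inputs the power comes out at roughly 0.80. When the effect is set to 0, the function returns the significance level itself, which is a quick sanity check on the formula.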
The issues to be considered are:
1. The level of significance, that is, the probability of a type-1 error (rejecting H0 when H0 is true). This matters because, for a fixed sample size, the probabilities of type-1 and type-2 errors cannot both be reduced at once: raising power by lowering the type-2 error probability tends to raise the type-1 error probability, which can lead to serious misinterpretation of the results. The type-1 error probability should therefore be held at the desired level first, and only then should the type-2 error probability be reduced.
2. The importance of the effect in question should be carefully considered. If the effect is one whose presence must not be missed, power should be increased to reduce the type-2 error probability, even if the type-1 error probability increases in the process.
3. The sample size affects power positively: a larger sample gives higher power. The sample size should therefore be determined carefully before data collection, taking the importance of the effect into account. If the effect is not of paramount importance, the sample size should not be made too large, since a larger sample raises the time and cost of the study.
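The trade-offs in the list above can be sketched numerically. The example below again assumes a one-sided, one-sample z-test with known standard deviation; the helper names power_one_sided_z and required_n and all input values are hypothetical and chosen only to illustrate the direction of each effect.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_one_sided_z(effect, sigma, n, alpha):
    """Power of a one-sided one-sample z-test with known sigma."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf(z_crit - effect * sqrt(n) / sigma)


def required_n(effect, sigma, alpha, target_power):
    """Smallest n reaching target_power for the same test (normal-theory formula)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    z_beta = STD_NORMAL.inv_cdf(target_power)
    return ceil(((z_alpha + z_beta) * sigma / effect) ** 2)


# Issue 1: at a fixed sample size, a stricter significance level lowers power.
power_by_alpha = {a: power_one_sided_z(0.5, 1.0, 25, a) for a in (0.10, 0.05, 0.01)}

# Issue 3: at a fixed significance level, a larger sample raises power.
power_by_n = {n: power_one_sided_z(0.5, 1.0, n, 0.05) for n in (10, 25, 50, 100)}

# Planning step: sample size needed for a chosen power target (illustrative values).
n_needed = required_n(effect=0.5, sigma=1.0, alpha=0.05, target_power=0.80)

for a, p in power_by_alpha.items():
    print(f"alpha={a:.2f}  power={p:.3f}")
for n, p in power_by_n.items():
    print(f"n={n:<4d}    power={p:.3f}")
print("n needed:", n_needed)
```

In practice the required sample size is usually computed before data collection, as issue 3 suggests, by fixing the significance level, the smallest effect worth detecting, and the target power, then solving for n.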
The power adequate for an experiment depends on the nature of the hypothesis we want to test. If it is important to detect an effect even when it is present only to a slight degree, power must be high. If a slight effect is not relevant and the effect matters only when it is large, very high power is not needed: a very highly powered test would lead us to reject H0 even for effects too small to be of practical interest. In such situations we should instead be more careful about setting the type-1 error probability at the necessary level, so that H0 is not rejected falsely.
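The point about slight effects can be illustrated with a short sketch: for a very small true effect, the probability of rejecting H0 grows with the sample size until the test rejects almost every time. The one-sided z-test with known standard deviation, the effect of 0.02 standard deviations, and the sample sizes are all assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def power_one_sided_z(effect, sigma, n, alpha=0.05):
    """Probability that a one-sided one-sample z-test rejects H0 for a given true effect."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    return 1 - std_normal.cdf(z_crit - effect * sqrt(n) / sigma)


tiny_effect = 0.02  # hypothetical effect, assumed too small to matter in practice

# Rejection probability for the same tiny effect at increasing sample sizes.
rejection_prob = {n: power_one_sided_z(tiny_effect, 1.0, n) for n in (100, 10_000, 100_000)}

for n, p in rejection_prob.items():
    print(f"n={n:<7d} P(reject H0)={p:.3f}")
```

Under these assumptions the test rarely rejects H0 at n = 100 but rejects it almost always at n = 100,000, even though the effect is the same. This is why the target power is best stated together with the smallest effect that is worth detecting.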