In: Statistics and Probability
Explain the conceptual and quantitative relationships between Alpha risk and Beta risk when testing hypotheses, and include the role that sample size plays in managing these risk levels.
Don't just write a brief blurb about the definitions of alpha and beta error. In particular, you should be writing about the ways that accepting an Alpha risk increase can impact total risk -- this is what we concentrate on as engineers. Hypothesis testing is a statistical technique, but risk management is an engineering requirement. Also consider what happens to these distributions as we increase the sample sizes being analyzed (Hint: What happens to the standard error as n increases?).
Minimum 400 words
Let's understand the concept of the null hypothesis first.
The null hypothesis is the prior belief about the parameter of
interest. Normally there are good reasons for that belief, so we
only change it (reject it) if we have enough evidence to believe
that it is wrong. The error of rejecting it when it is actually
true should be rare, so we fix a small bound on its probability
in advance. This is the type I error, and the corresponding risk
is called the alpha risk; the risk is the expected loss from
making that error.
There are two types of error:
Type 1: Reject the null when the null is true.
Type 2: Do NOT reject the null when the null is false.
Alpha risk: the risk we take when we reject the null although it
is true.
Beta risk: the risk we take when we do not reject the null
although it is false.
Impact of sample size: increasing the sample size controls the error.
The larger the sample size, the smaller the beta risk for a given
level of alpha risk (or, equivalently, for a given type I error rate).
Since we fix our type I error in advance, the only thing left to
control is the beta risk. Increasing the sample size shrinks the
standard error of the estimate (for a sample mean it falls as
sigma/sqrt(n)), so the sampling distributions under the null and the
alternative overlap less and the beta risk drops: we simply have more
information with which to make the decision.
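
A rough numerical sketch of that effect (my own illustration with assumed numbers, not taken from any particular study): for a one-sided z-test of a mean with an assumed known sigma and an assumed true shift under the alternative, holding alpha fixed while n grows shrinks the standard error sigma/sqrt(n) and, with it, the beta risk.

```python
# Illustration (assumed numbers): one-sided z-test of H0: mu = 0 against
# H1: mu = delta with known sigma.  Alpha is held fixed; beta is computed
# analytically for several sample sizes.
import numpy as np
from scipy.stats import norm

alpha = 0.05        # fixed type I error probability (alpha risk)
sigma = 1.0         # assumed known population standard deviation
delta = 0.5         # assumed true mean under the alternative

z_crit = norm.ppf(1 - alpha)                 # rejection cutoff on the z scale

for n in (10, 30, 100, 300):
    se = sigma / np.sqrt(n)                  # standard error shrinks as n grows
    # Under H1 the test statistic xbar/se is N(delta/se, 1), so
    # beta = P(statistic < z_crit | H1).
    beta = norm.cdf(z_crit - delta / se)
    print(f"n={n:4d}  SE={se:.3f}  beta={beta:.4f}  power={1 - beta:.4f}")
```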
alpha = P_H0(reject H0) is the probability of a type I error.
beta = P_H1(do not reject H0) is the probability of a type II error.
1 - beta = P_H1(reject H0) is the power of the test.
In decision-theoretic terms the risk is the expected loss,
R(theta, delta) = E[L(theta, delta)], where theta is the parameter and
delta is the decision rule. When theta lies in the null region this
risk is the alpha risk; when theta lies in the alternative region it is
the beta risk.
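
These probabilities can also be checked empirically. The following Monte Carlo sketch (my own illustration with hypothetical normal data, sigma = 1 and an assumed true effect of 0.5) estimates alpha under H0 and beta under H1 for the same one-sided z-test and compares them with their nominal values.

```python
# Illustration (assumed numbers): Monte Carlo estimates of alpha and beta.
# alpha_hat ~ P_H0(reject H0), beta_hat ~ P_H1(do not reject H0).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
alpha, n, delta, reps = 0.05, 50, 0.5, 100_000
z_crit = norm.ppf(1 - alpha)
se = 1.0 / np.sqrt(n)                        # sigma assumed to be 1

# Under H0 the true mean is 0: the rejection rate estimates alpha.
xbar_h0 = rng.normal(0.0, 1.0, size=(reps, n)).mean(axis=1)
alpha_hat = np.mean(xbar_h0 / se > z_crit)

# Under H1 the true mean is delta: the non-rejection rate estimates beta.
xbar_h1 = rng.normal(delta, 1.0, size=(reps, n)).mean(axis=1)
beta_hat = np.mean(xbar_h1 / se <= z_crit)

print(f"alpha_hat={alpha_hat:.3f}  beta_hat={beta_hat:.3f}  power={1 - beta_hat:.3f}")
```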
The first step in the scientific process is not observation but
the generation of a hypothesis which may then be
tested critically by observations and experiments. Popper (a
philosopher of science) makes the important claim that the goal of the
scientist’s efforts is not the verification but the falsification
of the initial hypothesis. It is logically impossible to
verify
the truth of a general law by repeated observations, but, at least
in principle, it is possible to falsify such a law by a
single observation. Repeated observations of white swans did not
prove that all swans are white, but the observation
of a single black swan sufficed to falsify that general
statement.
A hypothesis (for example, Tamiflu [oseltamivir], the drug of
choice in H1N1 influenza, is associated with an increased
incidence of acute psychotic manifestations) is either true
or false in the real world. Because the investigator cannot
study all people who are at risk, he must test the
hypothesis in a sample of that target population. No matter
how many data a researcher collects, he can never
absolutely prove (or disprove) his hypothesis. There will
always be a need to draw inferences about phenomena in
the population from events observed in the sample.
Just like a judge’s conclusion, an investigator’s conclusion
may be wrong. Sometimes, by chance alone, a sample is
not representative of the population. Thus the results in
the sample do not reflect reality in the population, and the
random error leads to an erroneous inference. A type I
error (false-positive) occurs if an investigator rejects a
null
hypothesis that is actually true in the population; a type II
error (false-negative) occurs if the investigator fails to
reject a null hypothesis that is actually false in the
population. Although type I and type II errors can never be
avoided entirely, the investigator can reduce their
likelihood by increasing the sample size (the larger the
sample, the less likely it is to differ substantially from the
population).
False-positive and false-negative results can also occur
because of bias (observer, instrument, recall, etc.). (Errors
due to bias, however, are not referred to as type I and type
II errors.) Such errors are troublesome, since they may be
difficult to detect and cannot usually be quantified.
The likelihood that a study will be able to detect an
association between a predictor variable and an outcome
variable depends, of course, on the actual magnitude of
that association in the target population. If it is large (such
as a 90% increase in the incidence of psychosis in people
who are on Tamiflu), it will be easy to detect in the sample.
Conversely, if the size of the association is small (such as a
2% increase in psychosis), it will be difficult to detect in the
sample. Unfortunately, the investigator often does not
know the actual magnitude of the association — one of the
purposes of the study is to estimate it. Instead, the
investigator must choose the size of the association that
he would like to be able to detect in the sample. This
quantity is known as the effect size. Selecting an
appropriate effect size is the most difficult aspect of
sample size planning. Sometimes, the investigator can
use data from other studies or pilot tests to make an
informed guess about a reasonable effect size. When
there are no data with which to estimate it, he can choose
the smallest effect size that would be clinically
meaningful,
for example, a 10% increase in the incidence of psychosis.
Of course, from the public health point of view, even a 1%
increase in psychosis incidence would be important. Thus
the choice of the effect size is always somewhat arbitrary,
and considerations of feasibility are often paramount.
When the number of available subjects is limited, the
investigator may have to work backward to determine
whether the effect size that his study will be able to detect
with that number of subjects is reasonable.
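
One way to do that working backward is sketched below, under assumed numbers of my own choosing (a hypothetical baseline incidence of 10%, 500 subjects per group, two-sided alpha = 0.05, power = 0.80): solve the usual normal-approximation sample-size formula for the smallest detectable difference in proportions.

```python
# Illustration (assumed numbers): smallest detectable absolute difference in
# incidence for a fixed number of subjects, using the common normal
# approximation with the baseline variance taken for both groups.
from math import sqrt
from scipy.stats import norm

p0, n_per_group = 0.10, 500      # assumed baseline incidence and group size
alpha, power = 0.05, 0.80        # two-sided alpha and desired power

z_a = norm.ppf(1 - alpha / 2)
z_b = norm.ppf(power)

mde = (z_a + z_b) * sqrt(2 * p0 * (1 - p0) / n_per_group)
print(f"minimum detectable increase ~ {mde:.3f} "
      f"(from {p0:.0%} to about {p0 + mde:.1%})")
```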
After a study is completed, the investigator uses statistical
tests to try to reject the null hypothesis in favor of its
alternative (much in the same way that a prosecuting
attorney tries to convince a judge to reject innocence in
favor of guilt). Depending on whether the null hypothesis is
true or false in the target population, and assuming that
the study is free of bias, there are four possible outcomes:
two correct inferences and the two kinds of error described above.
The investigator establishes the maximum chance of
making type I and type II errors in advance of the study.
The probability of committing a type I error (rejecting the
null hypothesis when it is actually true) is called α (alpha);
it is also known as the level of statistical significance.
If a study of Tamiflu and psychosis is designed with α =
0.05, for example, then the investigator has set 5% as the
maximum chance of incorrectly rejecting the null
hypothesis (and erroneously inferring that use of Tamiflu
and psychosis incidence are associated in the population).
This is the level of reasonable doubt that the investigator
is willing to accept when he uses statistical tests to
analyze the data after the study is completed.
The probability of making a type II error (failing to reject
the null hypothesis when it is actually false) is called β
(beta). The quantity (1 - β) is called power: the probability
that the study will detect an effect of the specified size or
greater if such an effect truly exists in the population.
If β is set at 0.10, then the investigator has decided that
he
is willing to accept a 10% chance of missing an
association of a given effect size between Tamiflu and
psychosis. This represents a power of 0.90, i.e., a 90%
chance of finding an association of that size. For example,
suppose that there really would be a 30% increase in
psychosis incidence if the entire population took Tamiflu.
Then 90 times out of 100, the investigator would observe
an effect of that size or larger in his study. This does not
mean, however, that the investigator will be absolutely
unable to detect a smaller effect; just that he will have
less
than 90% likelihood of doing so.
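
The dependence of power on the true effect size can be made concrete with a sketch like the one below (my own hypothetical numbers: baseline incidence 10%, 1000 subjects per group, two-sided alpha = 0.05), which shows that a 30% relative increase is detected far more reliably than a 10% increase.

```python
# Illustration (assumed numbers): power of a two-group comparison of incidence
# rates for several true effect sizes, via the normal approximation.
from math import sqrt
from scipy.stats import norm

p0, n_per_group, alpha = 0.10, 1000, 0.05    # assumed baseline, group size, alpha
z_a = norm.ppf(1 - alpha / 2)                # two-sided critical value

for rel_increase in (0.10, 0.30, 0.50):
    p1 = p0 * (1 + rel_increase)             # treated-group rate under H1
    se = sqrt(p0 * (1 - p0) / n_per_group + p1 * (1 - p1) / n_per_group)
    power = norm.sf(z_a - (p1 - p0) / se)    # ignores the negligible far tail
    print(f"relative increase {rel_increase:.0%}: power ~ {power:.2f}")
```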
Ideally alpha and beta errors would be set at zero,
eliminating the possibility of false-positive and false-
negative results. In practice they are made as small as
possible. Reducing them, however, usually requires
increasing the sample size. Sample size planning aims at
choosing a sufficient number of subjects to keep alpha
and beta at acceptably low levels without making the study
unnecessarily expensive or difficult.
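
As a sketch of that planning step (hypothetical targets of my own: alpha = 0.05, power = 0.80, a control rate of 10%, and a smallest worthwhile increase to 12%), the standard normal-approximation formula for comparing two proportions gives the required number of subjects per group.

```python
# Illustration (assumed numbers): subjects needed per group to detect the
# smallest effect considered worthwhile, via the standard normal-approximation
# sample-size formula for two proportions.
from math import ceil
from scipy.stats import norm

alpha, power = 0.05, 0.80
p0, p1 = 0.10, 0.12        # assumed control rate and smallest rate worth detecting

z_a = norm.ppf(1 - alpha / 2)
z_b = norm.ppf(power)
n = (z_a + z_b) ** 2 * (p0 * (1 - p0) + p1 * (1 - p1)) / (p1 - p0) ** 2
print(f"about {ceil(n)} subjects per group")
```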
Many studies set alpha at 0.05 and beta at 0.20 (a power
of 0.80). These are somewhat arbitrary values, and others
are sometimes used; the conventional range for alpha is
between 0.01 and 0.10; and for beta, between 0.05 and
0.20. In general the investigator should choose a low
value of alpha when the research question makes it
particularly important to avoid a type I (false-positive)
error, and he should choose a low value of beta when it is
especially important to avoid a type II error.
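
The trade-off behind that choice can be seen directly: with the sample size and the true effect held fixed, tightening alpha necessarily loosens beta, and vice versa. A small sketch (hypothetical one-sided z-test with n = 50, sigma = 1, and an assumed true effect of 0.4) makes the point.

```python
# Illustration (assumed numbers): with n and the true effect fixed, making
# alpha stricter raises beta, and vice versa (one-sided z-test, sigma = 1).
import numpy as np
from scipy.stats import norm

n, sigma, delta = 50, 1.0, 0.4
se = sigma / np.sqrt(n)

for alpha in (0.10, 0.05, 0.01):
    z_crit = norm.ppf(1 - alpha)
    beta = norm.cdf(z_crit - delta / se)
    print(f"alpha={alpha:.2f} -> beta={beta:.3f}  (power {1 - beta:.3f})")
```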
Hypothesis testing is the sheet anchor of empirical
research and of the rapidly emerging practice of evidence-
based medicine. However, empirical research and, ipso
facto, hypothesis testing have their limits. The empirical
approach to research cannot eliminate uncertainty
completely; at best, it can quantify it. This uncertainty is
of two types: type I error (falsely rejecting a null
hypothesis) and type II error (failing to reject a false null
hypothesis). The acceptable magnitudes
of type I and type II errors are set in advance and are
important for sample size calculations. Another important
point to remember is that we cannot ‘prove’ or ‘disprove’
anything by hypothesis testing and statistical tests. We
can only knock down or reject the null hypothesis and by
default accept the alternative hypothesis. If we fail to reject
the null hypothesis, we retain it by default, without having
proved it true.