Part 1. Contrast the Bonferroni correction with the false discovery rate procedure.
Part 2. Contrast correlation and causation. Can we draw conclusions about correlation from observational studies, from randomized controlled studies, or from both? What about causation?
Part 3. When do we use survival analysis techniques?
Part 4. What are some features of time-to-event data?
Part 1.) The Bonferroni adjustment controls the family-wise error rate, that is, the probability of making at least one false positive call across all tests. In contrast, false discovery rate estimation, as summarized in a q-value, controls the expected proportion of false positives among the tests called significant.
The Bonferroni correction sets the significance cut-off at α/n, where n is the number of tests. In large-scale multiple testing (as often happens in genomics), you may be better served by controlling the false discovery rate (FDR). This is defined as the expected proportion of false positives among all significant results. FDR procedures work by estimating a rejection region so that, on average, the FDR is at most α.
To perform the Bonferroni correction, simply divide the original alpha level (most likely set to 0.05) by the number of tests being performed. The result is a Bonferroni-corrected significance threshold: a single test is classed as significant only if its p-value falls below this new threshold. For example, with 20 tests at α = 0.05, the threshold becomes 0.05/20 = 0.0025.
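The contrast between the two approaches can be sketched in a few lines of Python. The answer above does not name a specific FDR method, so this sketch assumes the Benjamini-Hochberg step-up procedure, one common choice. The function names and the p-values are made up for illustration.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject H0 for each test whose p-value is at or below alpha / n."""
    n = len(p_values)
    threshold = alpha / n
    return [p <= threshold for p in p_values]


def benjamini_hochberg_reject(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (assumed here as the FDR method)."""
    n = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(n), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= k * alpha / n.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / n:
            max_k = rank
    # Reject the hypotheses with the max_k smallest p-values.
    reject = [False] * n
    for idx in order[:max_k]:
        reject[idx] = True
    return reject


# Illustrative p-values from 8 hypothetical tests.
p_values = [0.001, 0.008, 0.012, 0.020, 0.030, 0.200, 0.450, 0.700]
bonf = bonferroni_reject(p_values)
bh = benjamini_hochberg_reject(p_values)
print("Bonferroni rejections:", sum(bonf))
print("Benjamini-Hochberg rejections:", sum(bh))
```

On these made-up values Bonferroni rejects only the smallest p-value, while Benjamini-Hochberg rejects the five smallest. That reflects the trade-off: FDR control tolerates some false positives in exchange for more discoveries.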
Part 2.) Correlation is a statistical measure (expressed as a number) that describes the size and direction of a relationship between two or more variables. A correlation between variables, however, does not automatically mean that the change in one variable is the cause of the change in the values of the other variable.
Causation indicates that one event is the result of the occurrence of the other event; i.e. there is a causal relationship between the two events. This is also referred to as cause and effect.
Theoretically, the difference between the two types of relationships is easy to identify: an action or occurrence can cause another (e.g. smoking causes an increase in the risk of developing lung cancer), or it can correlate with another (e.g. smoking is correlated with alcoholism, but it does not cause alcoholism). In practice, however, it is much harder to establish cause and effect than to establish correlation.
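Therefore, we can draw conclusions about correlation from both observational studies and randomized controlled studies. Causation, however, can generally be concluded only from well-designed randomized controlled studies, because random assignment balances confounding variables across the groups. Observational studies alone can show an association but cannot rule out confounding.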
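A small simulation can make the role of study design concrete. The sketch below is an illustration under stated assumptions, not a real study. A hidden confounder `z` drives both the exposure and the outcome, and the exposure has no causal effect on the outcome at all. All variable names and the data-generating model are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 20000

# Hidden confounder (assumption): influences both exposure and outcome.
z = [rng.gauss(0, 1) for _ in range(n)]

# Observational setting: exposure depends on z; outcome depends on z only.
exposure_obs = [zi + rng.gauss(0, 1) for zi in z]
outcome_obs = [zi + rng.gauss(0, 1) for zi in z]
r_obs = pearson(exposure_obs, outcome_obs)

# Randomized setting: exposure is a coin flip, independent of z.
exposure_rct = [rng.choice([0, 1]) for _ in range(n)]
outcome_rct = [zi + rng.gauss(0, 1) for zi in z]
r_rct = pearson(exposure_rct, outcome_rct)

print("observational correlation:", round(r_obs, 2))
print("randomized correlation:", round(r_rct, 2))
```

Under this model the observational data show a clear correlation (about 0.5 in expectation) even though the exposure does nothing. Once the exposure is randomized, the correlation is close to zero. Randomization breaks the link between the exposure and the confounder.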
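Part 3.) Survival analysis is used when the outcome of interest is the time until an event occurs, for example the lifespan of a particular population under study. It is also called ‘time-to-event’ analysis, as the goal is to estimate the time for an individual or a group of individuals to experience an event of interest. This time is the duration between a defined starting point (the ‘birth’ event) and the event of interest (the ‘death’ event), neither of which needs to be a literal birth or death.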
Survival analysis is also very useful when we attempt to answer questions such as: what is the proportion of a population which will survive past a certain time? Of those that survive, at what rate will they die or fail? Can multiple causes of death or failure be taken into account? How do particular circumstances or characteristics increase or decrease the probability of survival?
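The first question, what proportion survives past a given time, is commonly answered with the Kaplan-Meier estimator. The answer above does not name it, so it is brought in here as an assumption for illustration. This is a minimal pure-Python sketch. The function name and the sample data are hypothetical, and an event flag of 1 means the event was observed while 0 means follow-up ended without it.

```python
def kaplan_meier(durations, event_observed):
    """Return [(time, survival_probability)] at each distinct event time."""
    # Pair each duration with its event flag and process in time order.
    data = sorted(zip(durations, event_observed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        removed = 0
        # Count everyone whose follow-up ends exactly at time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                events += 1
            removed += 1
            i += 1
        if events > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        # Both events and non-events leave the risk set after time t.
        at_risk -= removed
    return curve


# Hypothetical follow-up times (e.g. months) for 8 subjects.
durations = [2, 3, 3, 5, 8, 8, 9, 12]
event_observed = [1, 1, 0, 1, 1, 0, 0, 1]

curve = kaplan_meier(durations, event_observed)
for t, s in curve:
    print(t, round(s, 3))
```

Subjects without an observed event still count in the risk set up to the time they leave the study, so their partial follow-up is used and not discarded.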
Part 4.) Time-to-event (TTE) data is unique because the outcome of interest is not only whether or not an event occurred, but also when that event occurred.
There are unique features of time-to-event variables. First, times to event are always positive and their distributions are often skewed. For example, in a study assessing time to relapse in high-risk patients, the majority of events (relapses) may occur early in the follow-up, with very few occurring later. On the other hand, in a study of time to death in a community-based sample, the majority of events (deaths) may occur later in the follow-up. Standard statistical procedures that assume normality of distributions therefore do not apply.
Nonparametric procedures could be invoked except for the fact that there are additional issues. Specifically, complete data (the actual time to event) is not always available for each participant in a study. In many studies, participants are enrolled over a period of time (months or years) and the study ends on a specific calendar date. Thus, participants who enroll later are followed for a shorter period than participants who enroll early. For participants who have not experienced the event by the end of the study, the true time to event is unknown and only a lower bound is observed; such observations are called censored.
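The effect of staggered enrollment and a fixed end date can be sketched as follows. Each participant contributes an observed follow-up time plus a flag saying whether the event was seen. When the event has not been seen by the end of the study, the time is usually called right-censored. The study end date, the helper name `follow_up` and the participant records are all hypothetical.

```python
from datetime import date

STUDY_END = date(2020, 12, 31)  # hypothetical fixed calendar end date


def follow_up(enrolled, event_date, study_end=STUDY_END):
    """Return (days_observed, event_observed) for one participant."""
    if event_date is not None and event_date <= study_end:
        # The event happened during the study: the exact time is known.
        return (event_date - enrolled).days, True
    # No event seen by study end: only a lower bound on the time is known.
    return (study_end - enrolled).days, False


# (enrollment date, event date or None if no event was recorded)
participants = [
    (date(2018, 1, 1), date(2019, 1, 1)),  # early enrollee, event observed
    (date(2018, 1, 1), None),              # early enrollee, no event
    (date(2020, 7, 1), None),              # late enrollee, short follow-up
]

results = [follow_up(enrolled, event) for enrolled, event in participants]
for days, observed in results:
    print(days, "event" if observed else "censored")
```

The late enrollee is followed for far fewer days than the early enrollee who also had no event. This is why follow-up time and the event indicator need to be analyzed together.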