In the description of the statistical models that relate one variable to another, we used terms that suggest a causal relationship. One variable was called the "explanatory variable" and the other was called the "response". One may get the impression that the explanatory variable is the cause of the statistical behavior of the response. Against this interpretation, some say that all statistics does is examine the joint distribution of the variables, and that causality cannot be inferred from the fact that two variables are statistically related.
What do you think? Can statistical reasoning be used in the determination of causality?
As part of your answer, it may be useful to consider a specific situation where the determination of causality is required. Can any of the tools that were discussed in the book be used in a meaningful way to aid in the process of such determination?
Two or more variables are considered to be related, in a statistical context, if their values change together: as the value of one variable increases or decreases, so does the value of the other (although it may move in the opposite direction). The variable whose values change according to the other variable is called the response, and the other variable is called the explanatory variable.
Causation indicates that one event is the result of the occurrence of the other event; i.e. there is a causal relationship between the two events. This is also referred to as cause and effect. First, we need to look at how a causal relationship between two variables can be established.
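As a rough illustration of what "statistically related" means here, the sketch below computes the sample correlation coefficient for two small data sets. The numbers and the names `x`, `y_same` and `y_opposite` are made up for illustration only. A value near +1 means the two variables move in the same direction, and a value near -1 means they move in opposite directions; neither says anything about which variable, if any, causes the other.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up illustrative values (not from any real study).
x = [1, 2, 3, 4, 5]
y_same = [52, 55, 61, 64, 70]      # tends to rise as x rises
y_opposite = [70, 64, 61, 55, 52]  # tends to fall as x rises

r_same = pearson_r(x, y_same)
r_opposite = pearson_r(x, y_opposite)
print("same direction:    ", round(r_same, 3))
print("opposite direction:", round(r_opposite, 3))
```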
The use of a controlled study is the most effective way of
establishing causality between variables. In a controlled study,
the sample or population is split in two, with both groups being
comparable in almost every way. The two groups then receive
different treatments, and the outcomes of each group are
assessed.
For example, in medical research, one group may receive a placebo
while the other group is given a new type of medication. If the two
groups have noticeably different outcomes, the different
experiences may have caused the different outcomes.
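One possible way to make "noticeably different outcomes" concrete is to compare the two group means relative to their sampling variability. The sketch below does this with a Welch-type t statistic, which is an assumption about the tool chosen here, not something the study design prescribes. The outcome values and the names `placebo` and `medication` are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_1, group_2):
    """Difference in sample means divided by its estimated standard error."""
    m1, m2 = mean(group_1), mean(group_2)
    # statistics.variance is the sample variance (divides by n - 1).
    se = sqrt(variance(group_1) / len(group_1)
              + variance(group_2) / len(group_2))
    return (m1 - m2) / se


# Made-up outcome measurements for two comparable groups.
placebo = [5.1, 4.8, 5.6, 5.0, 5.3, 4.9]
medication = [4.2, 4.0, 4.6, 4.4, 3.9, 4.3]

difference = mean(placebo) - mean(medication)
t_stat = welch_t(placebo, medication)
print("difference in means:", round(difference, 3))
print("t statistic:        ", round(t_stat, 2))
```

A large absolute value of the statistic suggests the difference is unlikely to be due to chance alone. It supports a causal reading only because the groups were made comparable before the treatments were applied.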
For ethical reasons, there are limits to the use of controlled studies; it would not be appropriate to take two comparable groups and have one of them undergo a harmful activity while the other does not. To work around this limitation, observational studies are often used to investigate correlation and causation in the population of interest. These studies look at the groups' behaviours and outcomes and observe any changes over time.
Next, consider how a statistical tool can be used to examine the causal relationship between the response and the explanatory variable.
We can use a cholesterol study to understand the causal relationship between two variables. Assume that there is exactly one type of vegetarian diet and one type of non-vegetarian diet. To simplify notation, let’s call the vegetarian diet “treatment a” and the non-vegetarian diet “treatment b”. For each unit u in the study, there is some time t1 when the unit begins following one of the diets, and some time t2 (e.g., six months later) when the cholesterol is measured. There are two potential outcomes at time t2 for each unit: the cholesterol level that would be observed if the unit were exposed to treatment a at time t1, and the cholesterol level that would be observed if the unit were exposed to treatment b at time t1. Let’s denote these outcomes as Yua and Yub, respectively. The only distinction between Yua and Yub is exposure to different treatments; so, the only explanation of any difference between Yua and Yub is a difference in the effects of the two treatments on cholesterol levels. Thus, if somehow we could simultaneously observe Yua and Yub, the quantity Yua − Yub would tell us exactly how much the cholesterol level for unit u would change if treatment a were used instead of treatment b. Because of this property, Yua − Yub is defined as the causal effect of treatment a relative to treatment b for unit u (Rubin, 1974). When there are n units in the study, one measure of a typical causal effect is the average of these causal effects:
$$\frac{1}{n} \sum_{u=1}^{n} (Y_{ua} - Y_{ub})$$
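Hence the average causal effect described above can serve as a measure of the causal relationship between two variables. Since Yua and Yub can never both be observed for the same unit, in practice this average has to be estimated, for example by comparing the mean outcomes of two groups that were assigned to the treatments at random, as in the controlled study described earlier.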
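The potential-outcomes reasoning can be sketched with a small simulation. Everything numerical below is an assumption made for illustration: the baseline cholesterol distribution, the effect of -15 units, the sample size and the seed are all invented. In the simulation both Yua and Yub are known for every unit, so the average causal effect can be computed directly. It is then compared with what a study could actually observe, namely one outcome per unit after random assignment.

```python
import random
from statistics import mean

rng = random.Random(42)   # fixed seed so the sketch is reproducible
n = 2000                  # number of units (assumed)
ASSUMED_EFFECT = -15.0    # invented effect of treatment a relative to b

# Simulate BOTH potential outcomes for every unit (impossible in practice).
units = []
for _ in range(n):
    y_b = rng.gauss(200.0, 20.0)                       # outcome under b
    y_a = y_b + ASSUMED_EFFECT + rng.gauss(0.0, 5.0)   # outcome under a
    units.append((y_a, y_b))

# Average of the unit-level causal effects Yua - Yub.
average_causal_effect = mean(y_a - y_b for y_a, y_b in units)

# What a randomized study sees: only one outcome per unit.
gets_a = [rng.random() < 0.5 for _ in units]
observed_a = [y_a for (y_a, _), z in zip(units, gets_a) if z]
observed_b = [y_b for (_, y_b), z in zip(units, gets_a) if not z]
estimate = mean(observed_a) - mean(observed_b)

print("average causal effect (unobservable):", round(average_causal_effect, 2))
print("difference in group means (estimate):", round(estimate, 2))
```

Under random assignment the difference in group means should land close to the average causal effect. If units chose their own diets, as in an observational study, the two groups could differ in other ways and the same comparison could be misleading.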