Question



Post pairs of variables that exhibit positive correlation, negative correlation and no correlation. Could any of the proposed correlated variables be the result of causation? How could an experiment be designed to establish causation? Would it be ethical to do such an experiment? What percentage of the variation in the response variable do you think can be explained by the predictor variable? Do you think there are any lurking variables in your situation?

Sample Student Response

Positive Correlation: rain and the rate at which grass grows. Yes, there is causation. We could do an experiment to measure the effect of rain on grass growth rate. We can simply observe this relationship, but if we want to claim causation, we need an experiment. I would guess that about 75% of the variation in grass growth rate could be explained by the amount of rainfall. Lurking variables might be temperature, fertilizer, sunshine, type of soil...

Negative Correlation: the more I study, the less free time I have. Yes, there is potential causation. We could design an experiment to see whether additional studying reduces free time. This may be unethical if it would negatively affect a student's grade, so it might be best to just do an observational study. I would guess that about 50% of the variation in free time could be explained by the amount of study time. Lurking variables might be hours at the workplace, family obligations, sickness, laziness...

No Correlation: a person's head circumference and the number of text messages they send per day. The rest of these questions are moot.

Solutions

Expert Solution

One of the central tenets of the pro-vaccine world is that correlation does not imply causation, but the phrase is misused and frequently abused by many writers. We, the pro-science/pro-vaccine world, tend to dismiss correlation, even when a real correlation can be shown, as evidence of any causal relationship.

Conflating correlation and causation is somewhat different from the logical fallacy of post hoc ergo propter hoc, in which one assumes that because a second event followed the first, the first event caused it. I suspect that most good luck charms and superstitions, like avoiding walking under a ladder, rest on the post hoc ergo propter hoc fallacy. So if I walk under a ladder, then trip over a black cat, then crash into a mirror, I don't immediately blame the initial act of walking under the ladder. I just assume I'm clumsy.

Correlation and causation are a critical part of scientific research. Basically, correlation is the statistical relationship between two sets of data: the closer the relationship, the stronger the correlation (the closer it is to +1 or -1). However, without further data, correlation does not imply causation, that is, that one set of data has some influence over the other.
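To make "the closer the relationship, the stronger the correlation" concrete, here is a from-scratch sketch of Pearson's r: the covariance of x and y divided by the product of their standard deviations. The three y series are simulated with arbitrary noise levels (a tight positive link, a looser negative link, and no link at all), so only the rough size of each r matters, not the exact values.

```python
# Sketch: Pearson's r = cov(x, y) / (sd(x) * sd(y)), computed from scratch.
# The data are simulated; the noise levels are arbitrary choices.
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Sums of squares and cross-products; the 1/(n-1) factors cancel out.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(1000)]
tight = [2 * xi + rng.gauss(0, 0.3) for xi in x]   # strong positive link
noisy = [-2 * xi + rng.gauss(0, 3.0) for xi in x]  # weaker negative link
unrelated = [rng.gauss(0, 1) for _ in x]           # no link at all

for name, y in [("tight", tight), ("noisy", noisy), ("unrelated", unrelated)]:
    print(f"{name:>9}: r = {pearson_r(x, y):+.2f}")
```

Note that a large |r| here reflects how the data were generated; in real data the same number would say nothing about which variable, if either, drives the other.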

An Example

Let's invent a massive study to investigate car accidents after vaccinations. In our imaginary study, we find that the rate of automobile accidents with a child in the back seat after the child is vaccinated is higher than the background rate of accidents with unvaccinated children in the back seat. Does the vaccination itself cause the higher rate of accidents? Well, I suppose you could argue that a recently vaccinated child is still screaming or something, distracting the driver, but the same thing could happen with unvaccinated children screaming because they didn't get their GMO-free, organic, free-range ice cream cone.

But did the vaccine itself cause the accident? Or is it some other factor? Like the driver being stressed from the trip to the pediatrician for the vaccine because she read all those lies from the antivaccination groupies? Or because her child is a bit fussy after vaccination, which does happen? In other words, we have data, but it has no real meaning until we establish a reasonable level of causality.

So when you read an article on one of the antivaccination sites claiming that X number of girls died because of the HPV vaccine, or that because the rate of autism has increased while the number of vaccines has increased, vaccination must have caused the increase in autism, one of us (you know, the pro-science skeptics) will immediately proclaim that correlation does not imply causation.

The problem with that proclamation is that it's too simple. Like everything in science, there is more to understanding the relationship between correlation and causation than simply dismissing correlation.

For example, much of the science behind vaccines really comes down to showing strong correlation. We know that the smallpox vaccine eradicated smallpox not because we had direct evidence of causality between the vaccine and the eradication of smallpox, but because we had overwhelming correlation along with other types of direct evidence that established causality. And it is this other evidence that is actually more powerful in establishing causation, or, alternatively, in showing that a correlation has no relationship to causality.

Evaluating correlation and causation

So how do we know when correlation does not imply causation, or, alternatively, when it does? There are seven additional tests of the correlation data that could be used to move a finding of correlation between two sets of data from an unknown causal relationship to a presumed one.

  1. The data must be strong. If one observes a correlation, then to establish causation, the increase in risk has to be substantial. In early research on the link between smoking and lung cancer, the risk of cancer was found to be 5 to 10 times higher in smokers. If we're looking at vaccinations, to begin to show causation, you'd need to show a substantial increase in risk compared to a control population. The larger the increase in risk, the more strongly your data support causation. In addition, the data must show rates of risk that far surpass the background (general population) risk.
  2. The data must be consistent. If one shows data that could imply causation, the finding must be consistent across a number of studies with different populations (gender, ethnicity, income, age). Again, going back to the early research on smoking and lung cancer, the first two studies looking at the link were done separately on two different continents (and this being the 1940s and 1950s, information sharing was limited at best), yet they showed nearly the same results.
  3. The data must be specific. The data must predict causality very precisely. Again, back to lung cancer and smoking: the data showed that smoking was linked to one type of cancer, in the lungs, at the precise location where smoke enters the body. One cannot show causality with general data, simply because such data are too imprecise.
  4. The data must be temporal. To show causality, the exposure must come before the effect, and ideally the association grows stronger over time. For example, the longer one smoked, the greater the increase in risk of lung cancer.
  5. The data must show a dose-response effect. That is, one can show that as you consume or receive more of X, the specific Y response increases. Back to smoking: the more cigarettes smoked, the higher the risk of lung cancer. The temporal effect mentioned above and the dose-response effect are often interrelated. So, if we were to examine a specific adverse event after vaccination, the rate of that specific event must increase with additional numbers of vaccines.
  6. The causal effect must be plausible. If we look at smoking again, the early researchers could show a plausible mechanistic link between an inhaled carcinogen and malignant change in the lung's cells. This is an important facet of determining causality in biological systems. When one argues that GMO foods may cause cancer, how plausible is that? Is there some plausible mechanism linking GMO food to any of 250 different cancers? In fact, there are precious few environmental factors that cause cancers, and those that are known are not implausible. When someone says that vaccines cause autism, is it even plausible? Is there some mechanism linking stimulation of the immune system through immunization to autism? Plausibility doesn't mean we take the easy way out and just say, "Well, just because we don't know of a mechanism doesn't mean there isn't one." Actually, no, you can't say that. We know a lot about human physiology. It's not a giant mystery wrapped in an enigma. Human physiology is complex and detailed, so we can envision what is plausible and what isn't. And what isn't plausible is a mechanism linking vaccination to autism.
  7. The data must be coherent. Other types of evidence, such as experimental results in other models, ought to support the causality. Going back to smoking, tar from cigarettes was painted on the backs of mice, which induced tumors. Moreover, other evidence found in the 1950s and 1960s showed that smoking was associated with increases in cancers of the lip, throat, tongue and esophagus.
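Two of the seven tests, strength and dose response, reduce to simple arithmetic on risk. The sketch below uses an invented cohort (every count is hypothetical) split into exposure levels, computes each level's relative risk against the unexposed group, and checks that risk rises with every step up in exposure. The `Group` class and the helper names are illustrative, and passing these two checks would not, by itself, establish causation; the other five tests still apply.

```python
# Sketch of two of the checks above (strength and dose response), applied to
# a hypothetical cohort. Every count below is invented for illustration.
from dataclasses import dataclass


@dataclass
class Group:
    label: str
    cases: int  # people in the group who developed the outcome
    total: int  # people in the group

    @property
    def risk(self) -> float:
        return self.cases / self.total


def relative_risks(groups):
    """Risk in each group divided by risk in the first (unexposed) group."""
    baseline = groups[0].risk
    return [g.risk / baseline for g in groups]


def is_dose_response(rrs):
    """True if risk rises with every step up in exposure (not necessarily linearly)."""
    return all(later > earlier for earlier, later in zip(rrs, rrs[1:]))


cohort = [
    Group("none", 10, 10_000),
    Group("light", 30, 10_000),
    Group("moderate", 60, 10_000),
    Group("heavy", 100, 10_000),
]

rrs = relative_risks(cohort)
for g, rr in zip(cohort, rrs):
    print(f"{g.label:>8}: risk = {g.risk:.4f}, relative risk = {rr:.1f}")
print("monotonic dose response:", is_dose_response(rrs))
```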

So correlation by itself does not imply causation. But when it is combined with other evidence, which requires separate studies and analysis, correlation becomes one of the fundamental pieces of evidence that establish causality.

Summary

As I wrote previously, research isn't easy. One can't simply see a correlation and immediately state that one thing causes the other. One can't see an increase in the autism rate alongside an increase in the number of vaccinations, look at those numbers for an hour, and then state that vaccines cause autism. You can't, not without further, more complex data that support the hypothesis.

Does one need each of those seven additional tests to show causality? Yes (although 4 and 5 can be combined). Again, those who try to oversimplify the process are the ones with the agenda. Those who try to make it easy are the ones who are trying to find data that prove their dogma and beliefs, rather than trying to determine what the data actually show. The data should drive the conclusion, as opposed to taking the easy course: searching for data to support a preconceived conclusion.

Research is hard work. And if a researcher, or some random person on the internet, wants to establish causation from correlation, then they need to provide a lot more evidence. It’s not easy, but it can be done.


Related Solutions

For each of the following pairs of variables, would you expect a strong negative/positive correlation, a moderate negative/positive correlation, a weak negative/positive correlation, some other association, or no association (scattered)? 1. The age of a used car and its price. 2. The weight of a new car and its overall miles-per-gallon rating. 3. The height of a person and the height of the person's father. 4. The height and IQ of a person.
For the following scores, with (x, y) pairs (0, 4), (2, 9), (1, 6), (1, 9): a. Draw a scatter plot. Is this a positive or negative correlation? Explain what a positive correlation and a negative correlation are. b. Compute the Pearson correlation.
What’s an example of a perfect positive and a perfect negative correlation.
State whether each of the following is an example of a positive correlation or a negative correlation. a. Higher education level is associated with a larger annual income. b. Increased testosterone is associated with increased aggression. c. The smaller the class size, the more students believe they are receiving a quality education. d. Rising prices of apples are associated with the sale of fewer apples.
A) Differentiate between perfect negative correlation and perfect positive correlation. B) Explain the concept of diversification in investment portfolio. C) What are the two (2) advantages of Payback Period? D) Amaryllis Incorporated is attempting to evaluate the feasibility of investing RM95,000 in a piece of equipment that has a 5-year life. The firm has estimated cash inflows associated with the proposal as shown in the following table. The firm has a 12% cost of capital.
A) Differentiate between perfect negative correlation and perfect positive correlation. B) Explain the concept of diversification in investment portfolio. C) What are the two (2) advantages of Payback Period?
You wish to determine if there is a negative linear correlation between the two variables at a significance level of α = 0.01. You have the following bivariate data set of (x, y) pairs: (67.2, 118), (53.8, 131.4), (68.2, 321.5), (60.5, -67.4), (58.2, 154.3), (60.5, 230.3), (41.2, 157.7), (105.5, 91), (72.8, -80), (80.9, -105.7), (79.6, 125.1), (81.7, -58.5), (67.5, 70.3), (48.5, -81), (17.9, 184.8), (57.4, 149.2), (59, 30.8), (43.1, 104.9), (63.9, -75), (81.4, -1.2), (50.5, -58.2), (90.9, -27.9), (77, 115.3), (54.2, 171.7), (78.9, 208.3)...
What are the properties of a correlation? What is a “negative” relation and what is a “positive” relation? Give examples of each using real variables.
CHAPTER 6: CORRELATION
Key Terms
Positive relationship: occurs insofar as pairs of observations tend to occupy similar relative positions in their respective distributions.
Negative relationship: occurs insofar as pairs of observations tend to occupy dissimilar relative positions in their respective distributions.
Scatterplot: a graph containing a cluster of dots that represents all pairs of observations.
Pearson correlation coefficient: a number between -1 and +1 that describes the linear relationship between pairs...
Observations (x, y): 1: (-22, 22); 2: (-33, 49); 3: (2, 8); 4: (29, -16); 5: (-13, 10); 6: (21, -28); 7: (-13, 27); 8: (-23, 35); 9: (14, -5); 10: (3, -3); 11: (-37, 48); 12: (34, -29); 13: (9, -18); 14: (-33, 31); 15: (20, -16); 16: (-3, 14); 17: (-15, 18); 18: (12, 17); 19: (-20, -11); 20: (-7, -22). Answer the following questions. a. Calculate the covariance between variables X and Y. Is the relationship between the two variables positive or negative?