Discuss in detail why determining true cause and effect is difficult and why it is important. Also discuss how the results of a statistical test can be significant without there being:
1. Association between the two variables
2. A true cause and effect relationship between the two variables
1. Suppose we observe that people who drink more than 4 cups of coffee a day have a decreased chance of developing skin cancer. This does not necessarily mean that coffee confers resistance to cancer; one alternative explanation would be that people who drink a lot of coffee work indoors for long hours and thus have little exposure to the sun, a known risk factor. If this is the case, then the number of hours spent outdoors is a confounding variable, that is, a cause common to both observations. In such a situation, a direct causal link cannot be inferred; the association merely suggests a hypothesis, such as a common cause, but does not offer proof. In addition, when many variables in complex systems are studied, spurious associations can arise: at a 5% significance level, roughly 1 in 20 tests on truly unrelated variables will come out significant by chance alone. Thus, association does not imply causation, and a significant test result does not by itself guarantee a real association.
2. Even if your data have a correlation coefficient of +1 or -1, it is important to note that correlation still does not imply causality. For instance, a scatterplot of popsicle sales and skateboard accidents in a neighborhood may look like a straight line and give you a correlation coefficient of 0.9999..., but buying popsicles clearly doesn't cause skateboard accidents. Rather, more people ride skateboards and more people buy popsicles in hot weather, which is the reason these two factors are correlated.
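The popsicle example can be checked with a small simulation. This is only a sketch with made-up numbers: the temperature range, the slopes, and the noise levels are all assumptions chosen for illustration, not real data. Temperature drives both popsicle sales and accidents, and neither affects the other. The raw correlation comes out strong, but once the effect of temperature is removed from both variables, the remaining correlation is close to zero.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def residuals(ys, xs):
    """What is left of ys after subtracting its least-squares line on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical daily data: temperature is the common cause (assumed values).
temperature = [rng.uniform(10, 35) for _ in range(2000)]
popsicles = [5.0 * t + rng.gauss(0, 10) for t in temperature]
accidents = [0.5 * t + rng.gauss(0, 1) for t in temperature]

# Strong correlation, even though neither variable causes the other.
r_raw = pearson(popsicles, accidents)

# Correlation after removing the temperature effect from both variables.
r_adjusted = pearson(residuals(popsicles, temperature),
                     residuals(accidents, temperature))

print(f"raw correlation:              {r_raw:.3f}")
print(f"after adjusting for weather:  {r_adjusted:.3f}")
```

Adjusting for a confounder this way only works when the confounder has been identified and measured, which is a large part of why establishing true cause and effect is hard in practice.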
It is also important to note that the correlation coefficient only measures linear relationships. A meaningful nonlinear relationship may exist even if the correlation coefficient is 0.
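As a minimal illustration of that last point, the sketch below uses an assumed relationship y = x^2 on points placed symmetrically around zero. Here y is completely determined by x, yet the linear correlation coefficient is zero up to floating-point rounding.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


# x values spread evenly and symmetrically over [-1, 1].
xs = [i / 100 for i in range(-100, 101)]

# y depends on x exactly, but the dependence is not linear.
ys = [x * x for x in xs]

r = pearson(xs, ys)
print(f"correlation between x and x^2: {r:.6f}")
```

The positive and negative halves of the parabola cancel each other out in the covariance, so a coefficient near 0 should be read as "no linear relationship", not "no relationship". Plotting the data is usually the quickest way to spot a case like this.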