1. Direct and Inverse Relationships and examples.
2. Association and Causation in terms of relationships. In particular, does correlation imply causation?
3. Spurious correlation and examples.
4. Measuring correlation - correlation coefficient r and its interpretation.
Direct Relationship: Two variables change in the same direction. If one increases, the other increases; if one decreases, the other decreases.
Inverse Relationship: Two variables change in opposite directions. If one increases, the other decreases; if one decreases, the other increases.
[Figure: graph of a direct relationship]
[Figure: graph of an inverse relationship]
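Direct relationships are written as A = kB, where k is a nonzero constant.
Inverse relationships are written as A = k/B, where k is a nonzero constant.
With k > 0 and positive quantities, these forms produce the same-direction and opposite-direction behavior described above.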
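One way to tell the two forms apart from data: in a direct relationship the ratio A/B always equals k, while in an inverse relationship the product A·B always equals k. The sketch below checks paired (B, A) values for either pattern. The function name `classify_variation` and the tolerance are illustrative choices, not a standard routine, and it assumes no value is zero.

```python
import math

def classify_variation(pairs, rel_tol=1e-9):
    """Classify (B, A) pairs as 'direct' (A = kB), 'inverse' (A = k/B), or 'neither'.

    Assumes every A and B is nonzero so the ratios are defined.
    """
    ratios = [a / b for b, a in pairs]    # constant when A = kB
    products = [a * b for b, a in pairs]  # constant when A = k/B
    if all(math.isclose(r, ratios[0], rel_tol=rel_tol) for r in ratios):
        return "direct", ratios[0]
    if all(math.isclose(p, products[0], rel_tol=rel_tol) for p in products):
        return "inverse", products[0]
    return "neither", None

bs = [1.0, 2.0, 4.0, 5.0]
print(classify_variation([(b, 3 * b) for b in bs]))   # A = 3B   -> ('direct', 3.0)
print(classify_variation([(b, 12 / b) for b in bs]))  # A = 12/B -> ('inverse', 12.0)
print(classify_variation([(b, b + 1) for b in bs]))   # A = B + 1 -> ('neither', None)
```

Note that A = B + 1 rises whenever B rises, so it fits the loose "same direction" description but is not of the form A = kB.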
Direct Relationship Examples:
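The radius of a circle and its area are directly related: if the radius increases, the area gets bigger; if the radius decreases, the area gets smaller. Strictly, the area varies directly with the square of the radius (A = πr²), which is the form A = kB with B = r² and k = π.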
At a fixed hourly rate, pay and the number of hours worked have a direct relationship (pay = hourly rate × hours, so k is the hourly rate). If the hours worked increase, pay increases; if the hours worked decrease, pay decreases.
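Both examples can be checked numerically. The sketch below assumes a hypothetical hourly rate of 15 purely for illustration. The ratio area/r changes with r, while area/r² stays at π (doubling the radius quadruples the area), and pay/hours stays at the hourly rate.

```python
import math

HOURLY_RATE = 15.0  # hypothetical wage, for illustration only

def circle_area(radius):
    """Area of a circle: A = pi * r**2."""
    return math.pi * radius ** 2

def pay_for(hours, rate=HOURLY_RATE):
    """Pay at a fixed hourly rate: pay = rate * hours."""
    return rate * hours

for r in [1.0, 2.0, 3.0]:
    a = circle_area(r)
    # a / r grows with r, while a / r**2 stays equal to pi
    print(f"r={r}: area={a:.2f}, area/r={a / r:.2f}, area/r^2={a / r ** 2:.4f}")

for h in [10, 20, 40]:
    # the ratio pay / hours is always the hourly rate k
    print(f"hours={h}: pay={pay_for(h):.2f}, pay/hours={pay_for(h) / h:.2f}")
```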
Inverse Relationship Examples:
For a fixed distance, speed and travel time are inversely related (time = distance / speed, so k is the distance). As you increase your speed, the travel time decreases; as you decrease your speed, the travel time increases.
The law of demand (one half of the law of supply and demand) describes an inverse relationship. As the price of a product increases, the quantity demanded decreases; as the price decreases, the quantity demanded increases.
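The speed-and-time example can be worked through in a few lines. The sketch assumes a hypothetical fixed trip of 120 km; doubling the speed halves the time, and speed × time always gives back the distance k.

```python
DISTANCE_KM = 120.0  # hypothetical fixed trip length

def travel_time_hours(speed_kmh, distance_km=DISTANCE_KM):
    """Time = distance / speed, which is inverse in speed when distance is fixed."""
    return distance_km / speed_kmh

for speed in [30.0, 60.0, 120.0]:
    t = travel_time_hours(speed)
    # the product speed * time stays equal to the distance
    print(f"speed={speed:.0f} km/h -> time={t:.2f} h, speed*time={speed * t:.0f} km")
```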
Association
When two variables are related, we say that there is an association between them.
When researchers report a correlation, which is one kind of association, they are saying that they found a relationship between two or more variables.
Causation
When one variable has a direct influence on the other, the relationship is called causal.
Causality can only be determined by reasoning about how the data were collected; the data values themselves contain no information that can help you decide. For example, in a randomized experiment the researcher assigns the values of X, so a resulting change in Y can be attributed to X, whereas in an observational study a hidden factor may be driving both.
If two variables are causally related, it is possible to conclude that changes to the explanatory variable, X, will have a direct impact on the response variable, Y: adjusting the value of X will cause Y to change. It is much more difficult to establish causation than it is to establish an association.
“Correlation Does Not Imply Causation”
The phrase seems self-explanatory, but its meaning becomes clear only when you examine it carefully. First, it is important to understand what correlation and causation are. A correlation is a mutual relationship, or connection, between two variables. Causation is the relationship between cause and effect: when a cause results in an effect, that is causation. In other words, a correlation between two events or variables simply indicates that a relationship exists, whereas causation is more specific and says that one event actually causes the other.
When we say that correlation does not imply causation, we mean that just because you can see a connection or mutual relationship between two variables, it does not necessarily mean that one causes the other. One event or variable might cause the other, but we cannot know that from the correlation alone; more research, such as a controlled experiment, would be needed before reaching that conclusion.
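A small simulation can make this concrete. In the sketch below, an invented hidden variable Z drives both X and Y, and X has no effect on Y at all. In the observational data X and Y are strongly correlated; when X is instead set at random, as in a controlled experiment, the correlation disappears because changing X does nothing to Y. The model, noise levels, seed, and sample size are arbitrary assumptions chosen for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)
N = 10_000

# Observational study: hidden Z drives both X and Y; Y does not depend on X.
z = [rng.gauss(0, 1) for _ in range(N)]
x_obs = [zi + rng.gauss(0, 0.5) for zi in z]
y_obs = [zi + rng.gauss(0, 0.5) for zi in z]

# Experiment: the researcher sets X at random, independently of Z.
# Y is produced by the same rule as before, so it still ignores X.
x_exp = [rng.gauss(0, 1) for _ in range(N)]
y_exp = [zi + rng.gauss(0, 0.5) for zi in z]

r_obs = pearson_r(x_obs, y_obs)
r_exp = pearson_r(x_exp, y_exp)
print(f"observational r = {r_obs:.2f}")  # population value is 0.8
print(f"experimental  r = {r_exp:.2f}")  # population value is 0
```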
Spurious Correlation
In statistics, a spurious correlation, or spuriousness, refers to a connection between two variables that appears causal but is not: one variable seems to affect the other when it does not. A spurious correlation is often caused by a third factor that is not apparent at the time of examination, sometimes called a confounding factor. For example, ice cream sales and drowning incidents rise and fall together, but both are driven by hot weather rather than by each other.
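When the confounding factor has been measured, one way to expose it is to remove its influence from both variables and correlate what is left (a partial correlation). In the sketch below, a made-up confounder Z drives both X and Y. The raw correlation is strong, but after regressing each variable on Z with a least-squares line, the residuals are essentially uncorrelated. The model, seed, and helper names are assumptions for illustration only.

```python
import math
import random

def mean(v):
    return sum(v) / len(v)

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def residuals(ys, zs):
    """Residuals of ys after fitting the least-squares line y = a + b*z."""
    mz, my = mean(zs), mean(ys)
    b = sum((z - mz) * (y - my) for z, y in zip(zs, ys)) / sum((z - mz) ** 2 for z in zs)
    a = my - b * mz
    return [y - (a + b * z) for y, z in zip(ys, zs)]

rng = random.Random(1)
N = 10_000
z = [rng.gauss(0, 1) for _ in range(N)]   # confounder
x = [zi + rng.gauss(0, 0.5) for zi in z]  # driven by Z
y = [zi + rng.gauss(0, 0.5) for zi in z]  # driven by Z, not by X

r_raw = pearson_r(x, y)
r_partial = pearson_r(residuals(x, z), residuals(y, z))
print(f"raw r = {r_raw:.2f}, r after controlling for Z = {r_partial:.2f}")
```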
Examples of Spurious Correlations
It is not too challenging to discover interesting correlations, but many of them turn out to be spurious. On Wall Street, two popular spurious correlations involve skirt lengths and sports. The skirt length theory, which originated in the 1920s, holds that skirt lengths and stock market direction are correlated: long skirts signal a falling market, and short skirts signal a rising one. Around late January there is talk about the so-called Super Bowl indicator, which suggests that a win by the AFC team means the stock market will likely go down in the coming year, whereas a victory by the NFC team portends a rise. Since 1966, the indicator has had an accuracy rate of 80%. It is a fun conversation piece but probably not something a serious financial advisor would recommend as an investment strategy for clients.
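Part of why such indicators are easy to find is that many candidates get tried. The sketch below uses purely random data, not any real market history: it generates many meaningless coin-flip "indicators" and scores each against a random up/down market record. On average they are right about half the time, yet the best of them looks impressively accurate by chance alone. The record length (50 years), the number of indicators (1,000), and the seed are arbitrary choices.

```python
import random

rng = random.Random(42)
YEARS = 50           # arbitrary length of the record
N_INDICATORS = 1000  # arbitrary number of candidate indicators

# Random market record: True means the market went up that year.
market_up = [rng.random() < 0.5 for _ in range(YEARS)]

def accuracy(predictions, outcomes):
    """Fraction of years in which the prediction matched the outcome."""
    return sum(p == o for p, o in zip(predictions, outcomes)) / len(outcomes)

# Each indicator is a coin flip per year, unrelated to the market.
scores = [
    accuracy([rng.random() < 0.5 for _ in range(YEARS)], market_up)
    for _ in range(N_INDICATORS)
]

print(f"average accuracy: {sum(scores) / len(scores):.2f}")
print(f"best accuracy:    {max(scores):.2f}")
```

A fair test would score the best indicator on new years it was not selected on, where its expected accuracy falls back to 50%.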
Pearson r correlation: Pearson r is the most widely used correlation statistic for measuring the degree of relationship between linearly related variables. For example, in the stock market, Pearson r can measure how closely two stocks move together. The point-biserial correlation uses the same Pearson formula, except that one of the variables is dichotomous (takes only two values). The following formula is used to calculate the Pearson r correlation:
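$$r_{xy} = \frac{\operatorname{Cov}(x,y)}{\sqrt{\operatorname{Var}(x)\,\operatorname{Var}(y)}} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$
r_xy = Pearson r correlation coefficient between x and y
n = number of observations
x_i = value of x for the ith observation
y_i = value of y for the ith observation
x̄, ȳ = means of x and y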
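The sum form of the formula translates directly into code. The sketch below implements it in plain Python; the function name `pearson_r` and the small data sets are illustrative. Coding a dichotomous variable as 0/1 and passing it to the same function gives the point-biserial correlation.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson r: sum of cross-deviations over the root of the sums of squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return s_xy / math.sqrt(s_xx * s_yy)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 3))  # 6 / sqrt(10 * 6) ≈ 0.775

# Point-biserial: one variable is dichotomous, coded 0/1 (hypothetical data)
group = [0, 0, 0, 1, 1, 1]
score = [3, 4, 5, 6, 7, 8]
print(round(pearson_r(group, score), 3))  # ≈ 0.878
```

The divisors (n or n − 1) in the sample covariance and variances cancel in the ratio, which is why the sum form needs no division by n.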
Interpretation:
The sample Pearson correlation coefficient is denoted r. It is a single number that measures both the strength and the direction of the linear relationship between two continuous variables. Values range from -1 to +1.
Strength: The greater the absolute value of the correlation coefficient, the stronger the relationship.
The extreme values of -1 and +1 indicate a perfectly linear relationship, where a change in one variable is accompanied by a perfectly consistent change in the other; all of the data points fall on a line. In practice, real data almost never show a perfect relationship.
A coefficient of zero represents no linear relationship: as one variable increases, there is no linear tendency for the other to increase or decrease. The variables may still be related in a nonlinear way.
When the value is between 0 and +1 or -1, there is a linear relationship, but the points do not all fall on a line. As r approaches -1 or +1, the relationship gets stronger and the data points fall closer to a line.
Direction: The sign of the correlation coefficient represents the direction of the relationship.
Positive coefficients indicate that when the value of one variable increases, the value of the other variable also tends to increase. Positive relationships produce an upward slope on a scatterplot.
Negative coefficients indicate that when the value of one variable increases, the value of the other variable tends to decrease. Negative relationships produce a downward slope on a scatterplot.
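To connect the numbers to these descriptions, the sketch below computes r for a few small, made-up data sets: an exact upward line, an exact downward line, a noisy upward trend, and a U-shaped pattern. The U shape (y = x² on x values symmetric about zero) is a clear relationship, yet r = 0, because r measures only linear association.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [-3, -2, -1, 0, 1, 2, 3]
datasets = {
    "perfect positive (y = 2x + 1)": [2 * v + 1 for v in x],
    "perfect negative (y = -x)": [-v for v in x],
    "noisy positive": [-1.0, -2.5, 0.5, -0.5, 1.5, 0.0, 2.0],
    "U-shaped (y = x**2)": [v ** 2 for v in x],
}
results = {label: pearson_r(x, y) for label, y in datasets.items()}
for label, r in results.items():
    print(f"{label:32s} r = {r:+.2f}")
```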