In: Statistics and Probability
1. What is meant when we say that two variables have a strong positive (or negative) linear correlation? Is it possible that two variables could be strongly related but have a low linear correlation? Can you give an example?
2. Give a very general description of how the least-squares criterion is involved in the construction of the least squares line.
1) The correlation coefficient (ρ) is a measure that determines
the degree to which two variables' movements are associated. The
most common correlation coefficient, generated by the Pearson
product-moment correlation, may be used to measure the linear
relationship between two variables. However, in a non-linear
relationship, this correlation coefficient may not always be a
suitable measure of dependence.
Correlation coefficients are used to measure the strength of the
relationship between two variables.
Positive correlation is a relationship between two variables in
which both variables move in tandem—that is, in the same
direction.
Negative correlation or inverse correlation is a relationship
between two variables whereby they move in opposite
directions.
Negative correlation is a key concept in portfolio construction, as
it enables the creation of diversified portfolios that can better
withstand volatility and smooth out returns.
Understanding Correlation
The range of values for the correlation coefficient is -1.0 to 1.0;
the value cannot exceed 1.0 or fall below -1.0. A correlation of
-1.0 indicates a perfect negative correlation, and a correlation of
1.0 indicates a perfect positive correlation. Whenever the
correlation coefficient, denoted ρ (or r), is greater than zero,
the relationship is positive; whenever it is less than zero, the
relationship is negative. A value of zero indicates no linear
relationship between the two variables.
If the correlation coefficient of two variables is zero, it
signifies only that there is no linear relationship between them;
the variables may still be strongly related. For example, if
y = x^2 and x takes values symmetric about zero, y is completely
determined by x, yet the linear correlation between x and y is
zero, because the relationship is curvilinear rather than linear.
When the value of ρ is close to zero, generally between -0.1 and
+0.1, the variables are said to have no linear relationship, or a
very weak one. For example, suppose the prices of coffee and of
computers are observed and found to have a correlation of +0.0008;
this indicates essentially no linear relationship between the two
variables.
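To make the curvilinear case concrete, here is a small sketch (plain Python, no libraries) that computes the Pearson coefficient for y = x^2 over x-values symmetric about zero. The variables are perfectly related, yet the linear correlation comes out to zero:

```python
def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]  # y is completely determined by x
r = pearson_r(xs, ys)      # r = 0: no *linear* relationship
```

The covariance term vanishes because the positive and negative deviations cancel exactly over the symmetric range, so r = 0 despite the perfect functional dependence.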
Positive Correlation
A positive correlation, when the correlation coefficient is greater
than 0, signifies that both variables move in the same direction.
When ρ is +1, the two variables being compared have a perfect
positive linear relationship: when one variable moves higher or
lower, the other variable moves in the same direction by a
proportional amount.
The closer the value of ρ is to +1, the stronger the linear
relationship. For example, suppose oil prices are directly related
to the prices of airplane tickets, with a correlation coefficient
of +0.8. The relationship between oil prices and airfares has a
very strong positive correlation since the value is close to +1.
So if the price of oil decreases, airfares follow in tandem, and if
the price of oil increases, so do the prices of airplane tickets.
Negative Correlation
A negative (inverse) correlation occurs when the correlation
coefficient is less than 0, indicating that the variables move in
opposite directions. In short, any reading between 0 and -1 means
that the two securities tend to move in opposite directions. When
ρ is -1, the relationship is said to be perfectly negatively
correlated: if one variable increases, the other decreases
proportionally, and vice versa. However, the degree to which two
securities are negatively correlated can vary over time; they are
almost never exactly correlated all the time.
For example, suppose a study is conducted to assess the
relationship between outdoor temperature and heating bills. The
study concludes that there is a negative correlation between
heating bills and outdoor temperature, with a calculated
correlation coefficient of -0.96. This strong negative correlation
signifies that as the temperature outside decreases, heating bills
increase, and vice versa.
The list below shows what different correlation coefficient values
indicate:
Exactly –1. A perfect negative (downward sloping) linear
relationship
–0.70. A strong negative (downward sloping) linear
relationship
–0.50. A moderate negative (downward sloping) linear relationship
–0.30. A weak negative (downward sloping) linear relationship
0. No linear relationship
+0.30. A weak positive (upward sloping) linear relationship
+0.50. A moderate positive (upward sloping) linear
relationship
+0.70. A strong positive (upward sloping) linear relationship
Exactly +1. A perfect positive (upward sloping) linear
relationship
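These rule-of-thumb cutoffs can be encoded in a small helper. This is just a sketch of the thresholds listed above; the labels and boundaries are conventions, not universal standards:

```python
def describe_r(r):
    """Map a correlation coefficient to the rule-of-thumb labels above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    magnitude = abs(r)
    if magnitude < 0.3:
        return "no (or negligible) linear relationship"
    sign = "positive" if r > 0 else "negative"
    if magnitude == 1.0:
        strength = "perfect"
    elif magnitude >= 0.7:
        strength = "strong"
    elif magnitude >= 0.5:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {sign} linear relationship"

print(describe_r(-0.96))  # strong negative linear relationship
```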
2) The least-squares criterion is a mathematical procedure for finding the best-fitting curve to a given set of points by minimizing the sum of the squares of the offsets ("the residuals") of the points from the curve. The sum of the squared offsets is used instead of the absolute values of the offsets because this allows the residuals to be treated as a continuous, differentiable quantity. However, because the offsets are squared, outlying points can have a disproportionate effect on the fit, a property which may or may not be desirable depending on the problem at hand.
The linear least squares fitting technique is the simplest and most commonly applied form of linear regression and provides a solution to the problem of finding the best fitting straight line through a set of points. In fact, if the functional relationship between the two quantities being graphed is known to within additive or multiplicative constants, it is common practice to transform the data in such a way that the resulting line is a straight line, say by plotting T vs. sqrt(l) instead of T vs. l in the case of analyzing the period T of a pendulum as a function of its length l. For this reason, standard forms for exponential, logarithmic, and power laws are often explicitly computed. The formulas for linear least squares fitting were independently derived by Gauss and Legendre.
For nonlinear least squares fitting to a number of unknown parameters, linear least squares fitting may be applied iteratively to a linearized form of the function until convergence is achieved. However, it is often also possible to linearize a nonlinear function at the outset and still use linear methods for determining fit parameters without resorting to iterative procedures. This approach does commonly violate the implicit assumption that the distribution of errors is normal, but often still gives acceptable results using normal equations, a pseudoinverse, etc. Depending on the type of fit and initial parameters chosen, the nonlinear fit may have good or poor convergence properties. If uncertainties (in the most general case, error ellipses) are given for the points, points can be weighted differently in order to give the high-quality points more weight.
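As a concrete sketch of the transform-then-fit idea described above: suppose data follow y = A e^{Bx}. Taking logarithms gives ln y = ln A + B x, which is linear in x, so ordinary linear least squares recovers B and ln A. The data and parameter values below are invented for illustration, and note that fitting in log space weights the errors differently than a direct nonlinear fit would:

```python
import math

# Synthetic data from y = A * exp(B * x) with A = 2.0, B = 0.5 (illustrative values)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]

# Linearize: ln(y) = ln(A) + B * x, then fit a straight line by least squares
log_ys = [math.log(y) for y in ys]
n = len(xs)
mean_x = sum(xs) / n
mean_ly = sum(log_ys) / n
ss_xx = sum((x - mean_x) ** 2 for x in xs)
ss_xly = sum((x - mean_x) * (ly - mean_ly) for x, ly in zip(xs, log_ys))

B_fit = ss_xly / ss_xx                       # slope of the fitted line = B
A_fit = math.exp(mean_ly - B_fit * mean_x)   # intercept of the line = ln(A)
```

Because the synthetic data lie exactly on the curve, the fit recovers A and B to within floating-point precision; with noisy data the recovered parameters would only approximate the true values.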
Vertical least squares fitting proceeds by finding the sum of the squares of the vertical deviations R^2 of a set of n data points (x_1, y_1), ..., (x_n, y_n),

    R^2 = \sum_{i=1}^{n} [y_i - f(x_i, a_1, \ldots, a_m)]^2,

from a function f(x, a_1, \ldots, a_m) with m adjustable parameters. Note that this procedure does not minimize the actual deviations from the line (which would be measured perpendicular to the given function). In addition, although the unsquared sum of distances might seem a more appropriate quantity to minimize, use of the absolute value results in discontinuous derivatives which cannot be treated analytically. The squared deviations from each point are therefore summed, and the resulting residual is then minimized to find the best-fit line. This procedure results in outlying points being given disproportionately large weighting.
The condition for R^2 to be a minimum is that

    \frac{\partial (R^2)}{\partial a_i} = 0

for i = 1, ..., m. For a linear fit,

    f(a, b) = a + b x,

so

    R^2(a, b) = \sum_{i=1}^{n} [y_i - (a + b x_i)]^2.

These lead to the equations

    n a + b \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i

    a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i.

In matrix form,

    \begin{pmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \sum y_i \\ \sum x_i y_i \end{pmatrix},

so

    \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{pmatrix}^{-1} \begin{pmatrix} \sum y_i \\ \sum x_i y_i \end{pmatrix}.

The 2x2 matrix inverse is

    \begin{pmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{pmatrix}^{-1} = \frac{1}{n \sum x_i^2 - (\sum x_i)^2} \begin{pmatrix} \sum x_i^2 & -\sum x_i \\ -\sum x_i & n \end{pmatrix},

so

    a = \frac{\sum y_i \sum x_i^2 - \sum x_i \sum x_i y_i}{n \sum x_i^2 - (\sum x_i)^2}

    b = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - (\sum x_i)^2}.
These can be rewritten in a simpler form by defining the sums of squares

    ss_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum x_i^2 - n \bar{x}^2

    ss_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum y_i^2 - n \bar{y}^2

    ss_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - n \bar{x} \bar{y}.

Here, ss_{xy}/n is the covariance of x and y, and ss_{xx}/n and ss_{yy}/n are the variances. Note that these quantities can also be interpreted as dot products of the mean-centered data vectors. In terms of the sums of squares, the regression slope b is given by

    b = \frac{ss_{xy}}{ss_{xx}},

and the intercept a is given in terms of b by

    a = \bar{y} - b \bar{x}.

The overall quality of the fit is then parameterized in terms of a quantity known as the correlation coefficient, defined by

    r^2 = \frac{ss_{xy}^2}{ss_{xx} \, ss_{yy}},

which gives the proportion of ss_{yy} that is accounted for by the regression. Let \hat{y}_i be the vertical coordinate of the best-fit line with x-coordinate x_i, so

    \hat{y}_i = a + b x_i;

then the error between the actual vertical point and the fitted point is given by

    e_i = y_i - \hat{y}_i.

Now define s^2 as an estimator for the variance in e_i,

    s^2 = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}.

The standard errors for a and b are then

    SE(a) = s \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{ss_{xx}}}, \qquad SE(b) = \frac{s}{\sqrt{ss_{xx}}}.
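The derivation above translates directly into code. The sketch below (plain Python; the sample data are invented for illustration) computes the slope b = ss_xy/ss_xx, the intercept a = ȳ − b·x̄, the coefficient of determination r^2, and the standard errors of a and b:

```python
def least_squares_line(xs, ys):
    """Fit y = a + b*x by vertical least squares; return a, b, r^2, SE(a), SE(b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_xx = sum((x - mean_x) ** 2 for x in xs)
    ss_yy = sum((y - mean_y) ** 2 for y in ys)
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = ss_xy / ss_xx                   # slope
    a = mean_y - b * mean_x             # intercept
    r2 = ss_xy ** 2 / (ss_xx * ss_yy)   # proportion of ss_yy explained
    # s^2 estimates the residual variance (n - 2 degrees of freedom)
    s2 = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    se_a = (s2 * (1 / n + mean_x ** 2 / ss_xx)) ** 0.5
    se_b = (s2 / ss_xx) ** 0.5
    return a, b, r2, se_a, se_b

# Invented sample data, roughly following y = 2x
a, b, r2, se_a, se_b = least_squares_line([1, 2, 3, 4, 5],
                                          [2.1, 3.9, 6.2, 8.0, 9.9])
```

For data lying nearly on a line, r^2 comes out close to 1 and the standard errors are small, matching the interpretation of r^2 as the proportion of the variation in y accounted for by the regression.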