A friend of yours is analyzing the relationships among a large number of variables. He's decided that the best strategy is to compute correlation coefficients among all of them and see which relationships appear to be significant. Advise him on issues he must consider when using the correlation coefficient to describe the relationships between variables.
Your friend should keep the following points in mind when using the correlation coefficient to describe relationships between variables:
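1. Check the type of relationship - is it linear, quadratic, or some other non-linear form? The correlation coefficient measures only the strength of a linear relationship, so it is meaningful only when the relationship between the two variables is approximately linear. A strong non-linear relationship can still produce a correlation near 0.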
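2. Follow up each correlation with a significance test to check whether it is statistically significant. Because a large number of variable pairs is being tested, some correlations will appear significant purely by chance, so the significance level should be adjusted for multiple comparisons.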
3. Check the magnitude of the correlation - if it is close to 0, or if the scatter plot shows no trend, there is little or no linear correlation. If it is close to 1 or -1, the correlation is strong.
4. Check the direction of the correlation - if it is negative, an inverse relationship exists: one variable tends to decrease as the other increases. If it is positive, the two variables tend to change in the same direction.
5. Correlation doesn't imply causality - very important. If X and Y are highly correlated, we shouldn't conclude that increasing Y causes X to increase, because correlation alone does not establish causality. A latent or lurking variable may be responsible for a "spurious" correlation.
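To make points 1 to 4 concrete, here is a minimal Python sketch using only the standard library. The data are made up for illustration. The helper names pearson_r and permutation_p_value are my own, not from any library. The permutation test is just one possible significance test, chosen here because it needs no extra packages.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the null hypothesis of no association.

    Shuffles y repeatedly and counts how often the shuffled |r| is at
    least as large as the observed |r|.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(pearson_r(x, y))
    y_shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_shuffled)
        if abs(pearson_r(x, y_shuffled)) >= observed:
            hits += 1
    # Add 1 to numerator and denominator so the p-value is never exactly 0.
    return (hits + 1) / (n_perm + 1)


# Illustrative (made-up) data.
x = [-3, -2, -1, 0, 1, 2, 3]
y_linear = [2 * v + 1 for v in x]   # increasing linear relation
y_inverse = [-v for v in x]         # decreasing linear relation
y_quadratic = [v ** 2 for v in x]   # exact, but non-linear, relation

# Points 3 and 4: magnitude and direction.
r_linear = pearson_r(x, y_linear)
r_inverse = pearson_r(x, y_inverse)

# Point 1: y is completely determined by x, yet r is 0 because the
# relation is symmetric and non-linear.
r_quadratic = pearson_r(x, y_quadratic)

# Point 2: significance test for each correlation.
p_linear = permutation_p_value(x, y_linear)
p_quadratic = permutation_p_value(x, y_quadratic)

print("linear:    r =", r_linear, " p =", p_linear)
print("inverse:   r =", r_inverse)
print("quadratic: r =", r_quadratic, " p =", p_quadratic)
```

The quadratic case is the one to watch: the correlation is 0 even though y depends on x exactly, which is why a scatter plot should be checked before trusting the coefficient. Nothing in this sketch addresses point 5, since no calculation on the correlation itself can establish causality.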