One way to judge the strength of a correlation coefficient is to take its absolute value and see how close it is to 1. Some people find that easier because the negative sign confuses them. The closer the coefficient is to -1 or +1, the stronger the correlation; strength does not depend on the direction of the correlation. Many students nevertheless assume that a positive correlation coefficient is stronger than a negative one. Referring to the absolute value eliminates any confusion about the sign. On the topic of a representative sample, it's debatable how many data points you actually need. At what point do you think it's necessary to look at every data point rather than a representative sample? When would you feel comfortable that your sample truly represents the population?
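The absolute-value comparison described above can be sketched in a few lines of Python. This is only an illustration: the helper name `strength` and the example coefficients are made up for the sketch, not taken from any data set.

```python
def strength(r):
    """Return the strength of a correlation coefficient, ignoring its direction."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    return abs(r)


# Hypothetical coefficients, chosen only for illustration.
coefficients = [-0.85, 0.40, -0.10, 0.60]

# Rank from strongest to weakest by distance from 0, not by sign.
ranked = sorted(coefficients, key=strength, reverse=True)
print(ranked)  # [-0.85, 0.6, 0.4, -0.1]
```

Note that -0.85 ranks first even though it is negative: it is simply the coefficient closest to -1 or +1.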
Consider the definition of the sample correlation coefficient, in which the strength of the relationship plays the key role: r = cov(x, y) / (s_x s_y), the sample covariance of x and y divided by the product of the sample standard deviation of x and the sample standard deviation of y. The sign comes from the numerator, the covariance, because the sample standard deviations are never negative. A negative covariance means that x and y still vary together linearly, but x values above the mean of x tend to be paired with y values below the mean of y. The sign therefore tells you the direction of the relationship and conveys no information about its strength. Taking the absolute value treats the two directions symmetrically, as if it did not matter on which side of the mean the paired deviations fall. The definition, however, is the ratio of the joint variation of x and y to the product of the individual sample standard deviations, and the absolute value of that ratio no longer tells you which side of the mean the data points fall on. For that reason the absolute value is not built into the sample correlation coefficient itself.
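To make the definition concrete, here is a minimal sketch that computes the coefficient directly as covariance over the product of the sample standard deviations. The function names and the small data sets are assumptions made up for this example. Flipping the sign of y flips the sign of the covariance, and with it the sign of r, while both standard deviations stay positive.

```python
import math


def sample_covariance(xs, ys):
    """Sample covariance of paired values, using the n - 1 divisor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def sample_std(values):
    """Sample standard deviation, using the n - 1 divisor; never negative."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def pearson_r(xs, ys):
    """Sample correlation: covariance over the product of the standard deviations."""
    return sample_covariance(xs, ys) / (sample_std(xs) * sample_std(ys))


# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys_up = [2, 4, 5, 4, 5]
ys_down = [-y for y in ys_up]  # same spread, opposite direction

r_up = pearson_r(xs, ys_up)
r_down = pearson_r(xs, ys_down)
print(round(r_up, 4), round(r_down, 4))  # 0.7746 -0.7746
```

Because the same n - 1 divisor appears in the covariance and in both standard deviations, it cancels in the ratio, so r itself would come out the same with a divisor of n.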
Our main concern here is to study how the data points vary about the mean. Because the sample mean is itself estimated from the same data, the sample standard deviation of x or y is calculated by dividing by (n - 1) rather than by n. Increasing or decreasing the number of trials of an experiment changes the computed correlation coefficient. More data points give a more stable estimate, and the minimum sample size at which the frequency of the event of interest stabilizes can be calculated mathematically when the population parameters are known.
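The effect of sample size on the stability of the coefficient can be explored with a small simulation. The sketch below rests on assumptions that are not part of the discussion above: x is standard normal, y is x plus independent standard normal noise, and the sample sizes, the 200 repetitions and the seed are arbitrary. It reports how much r varies from sample to sample at each n.

```python
import math
import random
import statistics


def pearson_r(xs, ys):
    """Sample correlation coefficient with the n - 1 divisor throughout."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (s_x * s_y)


def simulate_r(n, rng):
    """Draw one sample of size n from the assumed model and return its r."""
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [x + rng.gauss(0, 1) for x in xs]  # assumed model: y = x + noise
    return pearson_r(xs, ys)


def spread_of_r(n, repetitions=200, seed=0):
    """Standard deviation of r across repeated samples of size n."""
    rng = random.Random(seed)
    return statistics.stdev([simulate_r(n, rng) for _ in range(repetitions)])


for n in (10, 50, 200):
    print(n, round(spread_of_r(n), 3))
```

Under these assumptions the spread shrinks as n grows, which is the sense in which more data points give a more stable coefficient. How small a spread is acceptable remains a judgment call for the question being asked.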