(a) Does a high value of R^2 in a simple regression model imply that the two variables are causally related? Explain.
(b) Compare and contrast the mean and the median as measures of the center of a distribution. Under what circumstances should we use one instead of the other? Explain.
a) R is the correlation coefficient in simple regression. It can be positive or negative, and it measures the strength and direction of the linear association between the two variables. R^2, its square, is the coefficient of determination: the proportion of the variance in the response that the fitted line explains.
A high value of R^2 tells us that the two variables have a strong linear association, but it does not indicate cause and effect (causation). Two variables may show a high R^2 simply because both increase or decrease over time, or because both respond to some third variable. Concluding from this alone that a change in one variable causes a change in the other is incorrect.
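As a rough illustration of the point about time trends, the sketch below builds two made-up series that each grow with time but are otherwise generated independently, then computes R^2 between them. The names `series_a`, `series_b`, and `r_squared`, the slopes, and the noise level are all assumptions chosen for the example, not data from the question.

```python
import random

def r_squared(xs, ys):
    """Squared Pearson correlation, which equals R^2 in simple linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

# Hypothetical data: both series trend upward with time, with independent noise.
rng = random.Random(0)  # fixed seed so the run is repeatable
years = range(30)
series_a = [1.5 * t + rng.gauss(0, 1) for t in years]
series_b = [0.8 * t + 3 + rng.gauss(0, 1) for t in years]

r2 = r_squared(series_a, series_b)
print(f"R^2 between the two series: {r2:.3f}")
```

Neither series is used to generate the other, yet R^2 comes out high because time drives both. The regression alone cannot tell this situation apart from a genuine causal link.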
b) Both the mean and the median are measures of the location (center) of a dataset.
The mean is the arithmetic average of all the values.
The median is the middle value of the sorted dataset (for an even number of values, the average of the two middle values).
Because the mean uses every value, it is pulled toward outliers, whereas the median is largely unaffected by them. So when outliers exist, or when the distribution is strongly skewed, the median should be used.
When there are no outliers (extreme values compared to most of the data in the dataset) and the distribution is roughly symmetric, the mean should be used.
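A small sketch of the outlier effect, using Python's standard `statistics` module. The numbers are invented for illustration only.

```python
import statistics

# Hypothetical dataset without outliers.
values = [42, 45, 47, 50, 52]
mean_clean = statistics.mean(values)      # 47.2
median_clean = statistics.median(values)  # 47

# The same dataset with one extreme value added.
values_with_outlier = values + [400]
mean_outlier = statistics.mean(values_with_outlier)      # 106
median_outlier = statistics.median(values_with_outlier)  # 48.5

print("without outlier:", mean_clean, median_clean)
print("with outlier:   ", mean_outlier, median_outlier)
```

One extreme value moves the mean from 47.2 to 106, while the median shifts only from 47 to 48.5, which is why the median is the safer summary when outliers are present.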