What are confounding variables, and what effect do they have on assessing cause-and-effect relationships?
When would you prefer median to mean as a measure of central tendency?
Why don’t we just sum the deviations from the mean to measure dispersion of a variable?
When is it legitimate to use the empirical rule?
How would you go about identifying outliers in your data? What would you do if you found an outlier?
(a) A confounding variable is an outside variable that influences both the independent variable and the dependent variable. When assessing cause-and-effect relationships, a confounder can create an apparent association between the two (or hide a real one). An observed relationship therefore cannot be attributed to the independent variable alone unless the confounder is controlled for, and any causal conclusion drawn without doing so is unreliable.
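A small simulation can make this concrete. The sketch below is illustrative only: the variables z, x and y, the seed and the sample size are all hypothetical. The confounder z drives both x and y, while x has no effect on y at all. The raw correlation of x and y is still clearly positive, but it disappears once the linear effect of z is removed from both (a partial correlation).

```python
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def residuals(v, z):
    """Residuals of v after a simple least-squares regression on z (removes z's linear effect)."""
    n = len(v)
    mv, mz = sum(v) / n, sum(z) / n
    slope = (sum((zi - mz) * (vi - mv) for zi, vi in zip(z, v))
             / sum((zi - mz) ** 2 for zi in z))
    return [vi - mv - slope * (zi - mz) for vi, zi in zip(v, z)]

rng = random.Random(42)  # fixed seed so the simulation is reproducible
n = 10_000
z = [rng.gauss(0, 1) for _ in range(n)]   # hypothetical confounder
x = [zi + rng.gauss(0, 1) for zi in z]    # "independent" variable, driven by z
y = [zi + rng.gauss(0, 1) for zi in z]    # outcome, driven by z but NOT by x

raw_r = pearson(x, y)                                  # near 0.5: a spurious association
partial_r = pearson(residuals(x, z), residuals(y, z))  # near 0 once z is controlled for
print(f"corr(x, y) = {raw_r:.3f}   corr(x, y | z) = {partial_r:.3f}")
```

Controlling for z here is done by regression adjustment; randomizing x would be another way to break its link with z.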
(b) When the data set contains outliers (or is strongly skewed), the median is preferred to the mean as a measure of central tendency. The mean is pulled toward extreme values, whereas the median depends only on the middle of the ordered data and is much less affected by them.
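As a quick illustration with made-up numbers, adding a single extreme value moves the mean a long way but barely moves the median:

```python
from statistics import mean, median

values = [2, 3, 3, 4, 5]          # hypothetical data
with_outlier = values + [100]     # same data plus one extreme value

print(mean(values), median(values))              # 3.4 3
print(mean(with_outlier), median(with_outlier))  # 19.5 3.5
```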
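(c) If we simply sum the deviations from the mean, the result is always 0, because the positive and negative deviations cancel: Σ(xᵢ − x̄) = Σxᵢ − n·x̄ = 0. The sum therefore says nothing about spread, so it cannot be used to measure the dispersion of a variable. Instead, dispersion is measured with squared deviations (variance, standard deviation) or absolute deviations.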
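A short check on a hypothetical data set shows the cancellation, and how squaring or taking absolute values avoids it (population formulas, dividing by n, are assumed here):

```python
data = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical data
m = sum(data) / len(data)                 # mean = 5.0
deviations = [x - m for x in data]

print(sum(deviations))                    # 0.0: positive and negative deviations cancel

# Squaring or taking absolute values stops the cancellation.
variance = sum(d ** 2 for d in deviations) / len(data)   # population variance
std_dev = variance ** 0.5
mean_abs_dev = sum(abs(d) for d in deviations) / len(data)
print(variance, std_dev, mean_abs_dev)    # 4.0 2.0 1.5
```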
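(d) The empirical rule (about 68%, 95% and 99.7% of values lie within 1, 2 and 3 standard deviations of the mean) is legitimate only when the distribution is normal, or at least approximately normal (bell-shaped and symmetric).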
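To see why the normality condition matters, the sketch below compares the exact share of values within k standard deviations of the mean for a normal distribution and for a skewed one. The exponential distribution with rate 1 (mean = standard deviation = 1) is chosen here purely as an example of a non-normal case.

```python
import math

def share_within_k_sd_normal(k):
    """P(|X - mu| < k*sigma) for any normal distribution."""
    return math.erf(k / math.sqrt(2))

def share_within_k_sd_exponential(k):
    """P(|X - mu| < k*sigma) for an Exponential(rate=1) variable, where mu = sigma = 1."""
    lo = max(0.0, 1 - k)  # the distribution has no mass below 0
    hi = 1 + k
    return math.exp(-lo) - math.exp(-hi)  # CDF difference: (1 - e^-hi) - (1 - e^-lo)

for k in (1, 2, 3):
    print(k, round(share_within_k_sd_normal(k), 4),
          round(share_within_k_sd_exponential(k), 4))
```

The normal column reproduces roughly 68%, 95% and 99.7%, while the exponential column gives about 86%, 95% and 98%, so applying the rule to a skewed distribution would misstate how the data are spread.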
(e) A common way to identify outliers is the 1.5 × IQR rule:
Outliers on the lower side:
Data points less than Q1 − 1.5 × IQR
Outliers on the upper side:
Data points greater than Q3 + 1.5 × IQR,
where
Q1 = First Quartile
Q3 = Third Quartile
IQR = Interquartile Range = Q3 − Q1
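The rule above can be sketched in a few lines. The function name iqr_outliers and the sample data are hypothetical. Quartiles are computed here with the "inclusive" method of Python's statistics.quantiles; other software may use a different quartile convention, which can shift the fences slightly.

```python
from statistics import quantiles

def iqr_outliers(data, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the k * IQR rule."""
    # 'inclusive' interpolates between order statistics; other software may use
    # a different quartile convention and give slightly different fences.
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return lower, upper, [x for x in data if x < lower or x > upper]

sample = [5, 7, -20, 4, 6, 30, 2, 8, 5]   # hypothetical data
print(iqr_outliers(sample))               # (-0.5, 11.5, [-20, 30])
```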
If an outlier is found in a data set, first find out whether an assignable cause (for example, a measurement or data-entry error) is present. If so, correct or remove that cause, and then proceed with the calculations omitting the outlier. If no assignable cause can be found, the deviation is inherent in the system, so the outlier should be kept and the analysis should use statistics or algorithms that are robust to outliers.
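As one example of a robust choice, the median absolute deviation (MAD) can stand in for the standard deviation as a measure of spread. The sketch below uses hypothetical measurements and an unscaled MAD; the helper name median_abs_deviation is an illustrative choice, not a standard-library function.

```python
from statistics import median, stdev

def median_abs_deviation(data):
    """Median of absolute deviations from the median (unscaled MAD)."""
    med = median(data)
    return median([abs(x - med) for x in data])

clean = [10, 11, 12, 13, 14]     # hypothetical measurements
with_outlier = clean + [100]     # same measurements plus one extreme value

print(stdev(clean), stdev(with_outlier))                                # SD jumps from about 1.6 to about 36
print(median_abs_deviation(clean), median_abs_deviation(with_outlier))  # 1 1.5
```

A single retained outlier inflates the standard deviation more than twentyfold, while the MAD changes only from 1 to 1.5.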