In: Statistics and Probability
After examining these data for all the jurisdictions, someone notes that certain areas have an unusually high “percent of 18-64 yr-olds with no high school diploma.” Based on this finding, this individual concludes that the high percentages are due to the rising population of immigrants in those areas. Further, the individual argues that any estimates of the associated “percent of low-income working families” in those areas should be recalculated after removing this sub-population from the data set, as they are causing the area to “look bad”. In addition to thinking critically, use the key rules about linear regression and extrapolation to write a statistically appropriate and socially responsible response to the individual’s conclusion and argument.
Statistically speaking, an outlier should be removed only when the data value is impossible or incorrect. In this case the values may be influenced by immigration, as the individual claims, but they are not incorrect or impossible, so the data should not be removed. Instead, an appropriate study should be initiated that measures income and education with and without the immigrant data, followed by statistical tests of whether the individual's hypothesis is in fact true and significant.
If the results are found to be true and significant, then special mention should be made of this group and of the need to formulate appropriate policies for the uplift of such immigrant groups. Hiding the data from viewers would defeat the entire purpose of the data collection exercise, which is to identify pain points in our demographic areas and report them to policymakers.
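The with-and-without comparison suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: the numbers are made up (they are not the data set from the question), the `flagged` rule marking "unusually high" jurisdictions is hypothetical, and `fit_line` is a helper name introduced here. It fits a least-squares line on all points and again with the flagged points left out, then reports the range of x each fit covers, since using the reduced fit for the flagged areas would mean extrapolating beyond the data it was built on.

```python
import numpy as np

def fit_line(x, y):
    """Return least-squares slope, intercept and correlation r for y on x."""
    slope, intercept = np.polyfit(x, y, 1)
    r = np.corrcoef(x, y)[0, 1]
    return slope, intercept, r

# Hypothetical, made-up values for illustration only.
# x: percent of 18-64 yr-olds with no high school diploma
# y: percent of low-income working families
x = np.array([8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 24.0, 26.0])
y = np.array([20.0, 23.0, 25.0, 28.0, 30.0, 33.0, 41.0, 43.0])

# Hypothetical rule for the jurisdictions the individual wants to drop.
flagged = x > 20.0

full = fit_line(x, y)
reduced = fit_line(x[~flagged], y[~flagged])

print("all jurisdictions:  slope=%.3f intercept=%.3f r=%.3f" % full)
print("flagged removed:    slope=%.3f intercept=%.3f r=%.3f" % reduced)

# The reduced fit only saw x values in this range; predicting for the
# flagged jurisdictions from it would be extrapolation.
print("x range used by reduced fit:", x[~flagged].min(), "to", x[~flagged].max())
print("x range of flagged points:  ", x[flagged].min(), "to", x[flagged].max())
```

Comparing the two fits shows how much the flagged points actually change the line, but it does not by itself test the individual's hypothesis about cause. That would need data on immigration status and a formal significance test.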