Question

In: Computer Science

5. Which of the four methods for handling missing data would tend to lead to an underestimate of the spread (e.g., standard deviation) of the variable? What are some benefits to this method?

6. Calculate the mean, median, and mode (in dollars) of: 10, 7, 20, 12, 75, 15, 9, 18, 4, 12, 8, 14

Solutions

Expert Solution

5) Which of the four methods for handling missing data would tend to lead to an underestimate of the spread (e.g., standard deviation) of the variable? What are some benefits to this method?

1. Listwise or case deletion = Listwise deletion drops every case that has a missing value on any variable. It is the most frequently used method for handling missing data and is therefore the default option in most statistical software packages. If the data are missing completely at random (MCAR), listwise deletion produces unbiased estimates and conservative results. When the MCAR assumption does not hold, listwise deletion may bias the parameter estimates.

2. Pairwise deletion = Pairwise deletion excludes a case only from the analyses that need the particular data point that is missing. If data are missing elsewhere in the data set, the existing values are still used in the statistical testing. Because pairwise deletion uses all of the observed information, it preserves more information than listwise deletion, which discards any case with missing data.

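3. Mean substitution = In mean substitution, the mean value of a variable is used in place of each missing value for that same variable. This allows researchers to use all of the collected data in an incomplete dataset. The theoretical justification is that the mean is a reasonable estimate for a randomly selected observation from a normal distribution. This is the method that tends to underestimate the spread: every imputed value sits exactly at the mean, so it adds nothing to the sum of squared deviations while the sample size grows, and the computed standard deviation shrinks. Its benefits are that it is simple to apply, keeps every case in the analysis, and leaves the mean of the variable unchanged.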

4. Regression imputation = In regression imputation, the other variables are used to predict the missing value, and the predicted value is substituted as if it were an actually observed value. This approach retains far more data than listwise or pairwise deletion and distorts the standard deviation and the shape of the distribution less than mean substitution does, although imputed values that lie exactly on the regression line still tend to understate variability. As with mean substitution, no new information is added, yet the sample size is increased and the standard error is reduced.
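To see why replacing missing values with the mean shrinks the spread, here is a minimal Python sketch that compares the sample standard deviation under listwise deletion and under mean substitution. The data values and the choice of which two entries are missing are made up purely for illustration, and the variable names are assumptions, not part of the original question.

```python
import statistics

# Hypothetical sample with two missing entries (None); values are illustrative only.
observed = [10, 7, 20, 12, None, 15, 9, 18, None, 12, 8, 14]

# Listwise deletion: keep only the entries that were actually observed.
complete = [x for x in observed if x is not None]

# Mean substitution: replace each missing entry with the mean of the observed values.
mean_obs = statistics.mean(complete)
imputed = [mean_obs if x is None else x for x in observed]

# Sample standard deviation under each approach.
sd_complete = statistics.stdev(complete)
sd_imputed = statistics.stdev(imputed)

print("mean (observed only):     ", mean_obs)
print("mean (after substitution):", statistics.mean(imputed))
print("stdev (observed only):     ", sd_complete)
print("stdev (after substitution):", sd_imputed)
```

The mean is unchanged by the substitution, but the standard deviation after substitution is smaller: the sum of squared deviations is the same while the divisor (n - 1) has grown.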

6) Calculate the mean, median, and mode (in dollars) of: 10, 7, 20, 12, 75, 15, 9, 18, 4, 12, 8, 14

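The mean is $17: the twelve values sum to 204, and 204 / 12 = 17.
The median is $12: sorted, the values are 4, 7, 8, 9, 10, 12, 12, 14, 15, 18, 20, 75, and the two middle values (6th and 7th) are both 12.
The mode is $12: it is the only value that appears more than once.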
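These three results can be checked with a short Python sketch using the standard library's `statistics` module. The variable names are chosen here for illustration only.

```python
import statistics

# Dollar amounts from question 6.
amounts = [10, 7, 20, 12, 75, 15, 9, 18, 4, 12, 8, 14]

mean_value = statistics.mean(amounts)      # arithmetic average
median_value = statistics.median(amounts)  # average of the two middle values (even count)
mode_value = statistics.mode(amounts)      # most frequent value

print(f"mean = ${mean_value}, median = ${median_value}, mode = ${mode_value}")
```

The mean ($17) sits well above the median ($12) because the single large value of 75 pulls the average upward, while the median and mode are unaffected by it.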

