Question

When calculating variance, we square the difference between each data point and the mean in a set of numbers. Why do we do this?

A. The deviation scores will sum to zero and cancel each other out if they are not squared first.

B. We need to even out the numbers to make them easier to handle.

C. Squaring the difference between a data point and the mean is the way to calculate deviance.

D. Larger numbers are easier to calculate than smaller numbers.


Solutions

Expert Solution

Variance is built from squared values for a practical reason. To calculate it, you first find the mean of the data, then measure how far each data point lies from that mean, counting points to the right of the mean as positive deviations and points to the left of the mean as negative deviations.

Adding these deviations together and dividing by the number of values would seem to give an average distance from the mean, or "spread".

But here a problem arises. Because the positive and negative deviations balance around the mean, they cancel when you add them together, and the sum is zero every time, no matter how many data points there are or how widely they are spread.
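The cancellation described above can be checked with a small sketch. The data set here is made up for illustration and does not come from the question; any list of numbers should behave the same way, apart from tiny floating-point rounding when the values are not whole numbers.

```python
# Hypothetical sample data, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Step 1: the mean of the data (40 / 8).
mean = sum(data) / len(data)

# Step 2: signed deviations, negative below the mean and positive above it.
deviations = [x - mean for x in data]

# Step 3: adding the signed deviations makes them cancel.
total_deviation = sum(deviations)

print("mean:", mean)
print("deviations:", deviations)
print("sum of deviations:", total_deviation)
```

For this data the mean is 5.0 and the deviations are -3, -1, -1, -1, 0, 0, 2 and 4, which add up to zero even though the values are clearly spread out.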

So we use a mathematical trick that works out nicely: square each deviation before adding them together, which removes the negative signs. Taking absolute values would also remove the signs and might seem simpler, but squared deviations are generally easier to work with mathematically, which is why variance is defined this way.
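As a rough sketch of the squaring step, the snippet below computes the variance by hand and compares it with Python's standard-library `statistics.pvariance`. The data set is hypothetical, and the calculation divides by the number of values, as the explanation above does (the population form of variance). The mean absolute deviation is included only to show the absolute-value alternative side by side.

```python
import math
import statistics

# Hypothetical sample data, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = sum(data) / len(data)

# Square each deviation so that negative signs disappear.
squared_deviations = [(x - mean) ** 2 for x in data]

# Variance: the average of the squared deviations (divide by n).
variance = sum(squared_deviations) / len(data)

# The absolute-value alternative: mean absolute deviation.
mean_abs_deviation = sum(abs(x - mean) for x in data) / len(data)

print("variance (by hand):", variance)
print("variance (statistics.pvariance):", statistics.pvariance(data))
print("mean absolute deviation:", mean_abs_deviation)

# The hand calculation should agree with the library function.
assert math.isclose(variance, statistics.pvariance(data))
```

Both measures are nonzero for this data, unlike the plain sum of signed deviations.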

The benefits of squaring include:

· Squaring never gives a negative value, so the deviations can no longer cancel; the sum is zero only when every data point equals the mean.

· Squaring emphasizes larger differences: a point twice as far from the mean contributes four times as much to the variance.
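To illustrate the second benefit, here is a minimal sketch with two made-up data sets that share the same mean and the same mean absolute deviation. The helper function names are hypothetical, introduced only for this example. Only the variance separates the two sets, because squaring gives extra weight to the points that sit far from the mean.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def mean_abs_deviation(values):
    """Average of the absolute deviations from the mean."""
    m = mean(values)
    return sum(abs(x - m) for x in values) / len(values)

def population_variance(values):
    """Average of the squared deviations from the mean (divide by n)."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / len(values)

# Hypothetical data sets, both with mean 5.
clustered = [3, 3, 7, 7]   # every point is 2 away from the mean
stretched = [1, 5, 5, 9]   # two points on the mean, two points 4 away

for name, values in [("clustered", clustered), ("stretched", stretched)]:
    print(name, mean_abs_deviation(values), population_variance(values))
```

Both sets have a mean absolute deviation of 2.0, but the variance is 4.0 for the first and 8.0 for the second, since the two points at distance 4 each contribute 16 once squared.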

Hence, option A is the most appropriate and strongest reason for squaring.

