Why, in the formula for the standard deviation of a population, do we divide by n, while for the standard deviation of a sample we divide by n-1?
I will give you an intuitive explanation of what happens when we calculate the standard deviation of a sample.
Standard deviation is a measure of the spread of the data: it tells us how much the data is spread out on both sides of the mean value. The smaller the standard deviation, the more the data is concentrated near the mean and the less it is spread out. Conversely, a larger standard deviation means the data is more spread out and less concentrated around the mean value.
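For reference, the two formulas in question are the standard ones, where N and μ are the population size and mean, and n and x̄ are the sample size and mean:

```latex
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}
\qquad\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}
```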
A sample should be a true representative of the population from which it is taken, so its characteristics should reflect the characteristics of the population as accurately as possible.
When you take a sample from a population, the sample values will generally lie closer to the sample mean than to the population mean; in fact, the sample mean is, by construction, the value that minimizes the sum of squared deviations of the sample points. So if you measure the deviations of the sample points from the sample mean and divide by n, you will on average underestimate the population standard deviation: the population standard deviation is slightly larger than this naive sample figure. To correct for that, you make the sample statistic a bit bigger by dividing by a smaller term, (n-1), instead of n. This makes the sample value a little larger and a more accurate representation of the standard deviation of the whole population.
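A quick way to see the underestimation is to simulate it. The sketch below (assuming numpy is available; the distribution, sample size, and number of trials are arbitrary choices for illustration) repeatedly draws small samples from a population with a known variance and compares the average of the divide-by-n estimates with the average of the divide-by-(n-1) estimates. It works with the variance, the square of the standard deviation, since that is where dividing by n-1 gives an exactly unbiased estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

true_sigma = 2.0        # population standard deviation we are trying to recover
n = 5                   # small sample size, where the bias is easiest to see
trials = 100_000        # number of repeated samples

var_div_n = np.empty(trials)          # variance estimates dividing by n
var_div_n_minus_1 = np.empty(trials)  # variance estimates dividing by n - 1

for i in range(trials):
    sample = rng.normal(loc=10.0, scale=true_sigma, size=n)
    dev_sq = (sample - sample.mean()) ** 2   # squared deviations from the *sample* mean
    var_div_n[i] = dev_sq.sum() / n
    var_div_n_minus_1[i] = dev_sq.sum() / (n - 1)

print("true variance:              ", true_sigma ** 2)           # 4.0
print("average of divide-by-n:     ", var_div_n.mean())          # noticeably below 4
print("average of divide-by-(n-1): ", var_div_n_minus_1.mean())  # close to 4
```

On average, the divide-by-n figure comes out about 20% too small here (a factor of (n-1)/n = 4/5 with n = 5), while the divide-by-(n-1) figure lands very close to the true variance.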
That's the whole concept behind using (n-1) in place of 'n' for calculating the standard deviation of the sample.