Briefly explain specifically why categorical and quantitative variables require different methods in order to describe their distributions.
A correct answer will accurately address BOTH types of variables and clearly explain the REASONS these two types of variables REQUIRE different methods.
Categorical and Quantitative variables require different methods in order to describe their distribution.
Why?
Hmm, let’s think. A categorical variable is described by a probability mass function (PMF), which gives the probability that the variable takes each particular value. Since a categorical variable can ONLY take values in a countable set, we can have this luxury. For any set of values, we can add up the individual probabilities to get the probability that the variable lands in that set! So, since we have the luxury of individual probabilities, we represent the distribution by a PMF. E.g. rolling a fair die: we assign probability 1/6 to each of 1, 2, ..., 6, which means the probability that a 4 comes up IS 1/6. The probabilities can be summed countably!
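To make the die example concrete, here is a minimal Python sketch. The names `pmf` and `prob` are made up for illustration, and the die is assumed to be fair. It stores one probability per face and gets the probability of any set of faces by plain addition.

```python
from fractions import Fraction

# PMF of a fair six-sided die (assumption: fair, so each face gets 1/6).
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def prob(event):
    """Probability that the die lands in `event`, any collection of faces.

    Works by summing the individual point probabilities, which is only
    possible because the set of outcomes is countable.
    """
    return sum(pmf.get(face, Fraction(0)) for face in event)

p_four = prob({4})          # a single outcome
p_even = prob({2, 4, 6})    # a set of outcomes
total = prob(pmf.keys())    # the whole sample space
```

Exact fractions are used instead of floats so that the total comes out as exactly 1 with no rounding error.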
A quantitative variable is a whole different game though! It may take values in an uncountable set, such as an interval of real numbers. Then we CAN’T give every point a non-zero probability: at most countably many points can carry positive probability, otherwise the total would exceed 1! Also, we don’t know how to add over an arbitrary uncountable set! (If we can’t list the numbers, we can’t sum over them!) Hence we describe the distribution by a probability density function (PDF). The interpretation: if the PDF is high in a region, then the variable is more likely to take values in that region. Although a single point, or any countable collection of points, still has probability zero, an interval has a probability, which is the integral of the PDF over that interval. That’s why we use a PDF rather than a PMF for a quantitative variable!
E.g. Uniform on [0, 1]. We can’t give every point a non-zero probability since there are uncountably many points! So, in order to say every point is equally likely, we say the PDF is constant on [0, 1] and zero outside. The integral of the PDF over the whole real line must be 1, so that constant is 1.
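The uniform example can be checked numerically with a rough sketch like the one below. The helper names `uniform_pdf` and `integrate` are made up for illustration, and the midpoint Riemann sum is just one simple way to approximate an integral; it is not an exact method.

```python
def uniform_pdf(x):
    """Density of Uniform(0, 1): constant 1 on [0, 1], 0 elsewhere."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def integrate(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] by a midpoint Riemann sum."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

# Probability of an interval = area under the PDF over that interval.
p_interval = integrate(uniform_pdf, 0.2, 0.5)

# A single point is an interval of width zero, so its probability is zero
# even though the density there is 1.
p_point = integrate(uniform_pdf, 0.3, 0.3)

# Integrating over a range that covers the whole support gives about 1.
p_total = integrate(uniform_pdf, -1.0, 2.0)
```

Here `p_interval` comes out close to 0.3 (the length of the interval), `p_point` is 0, and `p_total` is close to 1, up to the approximation error of the Riemann sum.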
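We can’t always define a probability for every set in the power set of the sample space. In the case of a categorical variable, we CAN and we DO. But in the case of a quantitative variable on an uncountable sample space, we CAN’T in general, so probabilities are defined only on a restricted family of (measurable) sets.
You’ll learn more about this in a measure theory course!!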