Three statistics students are having a discussion about selecting the appropriate distribution for a data set. Explain why you agree or disagree with each student and give your own suggestion for the approach the students should take.
Maya: Maya argues that since the students don’t know what the population data looks like, they should simply use the sample probability mass distribution as their population mass distribution.
Greg: Greg says that the sample probability mass distribution is oddly shaped and will almost certainly not be the same as the population mass distribution function. He suggests that it’s best to find a match from the common probability mass functions that the students know about.
Jane: Jane argues that both Greg’s and Maya’s approaches could introduce unknown error into the analysis that they are performing. She reasons that as long as there is going to be error, the students should try both approaches and choose the one that produces the results that they would most like to see.
We can agree, in part, with Maya, Greg, and Jane.
Consider Maya's suggestion to use the sample probability function as the population function. Theorems such as Glivenko-Cantelli guarantee that the empirical CDF converges to the true CDF as the sample size grows, but in a small sample the approximation can be poor. Any value that does not happen to appear in the sample is assigned probability zero, so we may exclude some regions of the sample space.
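The small-sample problem with Maya's approach can be illustrated with a minimal Python sketch. The population (uniform on the integers 0 to 9), the sample sizes, and the helper names are all assumptions made for illustration; none of them come from the original question.

```python
import random
from collections import Counter

def empirical_pmf(sample):
    """Sample probability mass function: {value: relative frequency}."""
    n = len(sample)
    return {value: count / n for value, count in Counter(sample).items()}

def missing_support(pmf, support):
    """Population values to which the sample PMF assigns probability zero."""
    return [x for x in support if x not in pmf]

def max_abs_error(pmf, true_pmf):
    """Largest pointwise gap between the sample PMF and the true PMF."""
    return max(abs(pmf.get(x, 0.0) - p) for x, p in true_pmf.items())

# Hypothetical population: uniform on 0..9 (an assumption for this demo).
support = list(range(10))
true_pmf = {x: 0.1 for x in support}

rng = random.Random(0)  # fixed seed so the run is repeatable
small = [rng.choice(support) for _ in range(5)]
large = [rng.choice(support) for _ in range(100_000)]

small_pmf = empirical_pmf(small)
large_pmf = empirical_pmf(large)

print("values missing from the small sample:", missing_support(small_pmf, support))
print("max error, n = 5:      ", max_abs_error(small_pmf, true_pmf))
print("max error, n = 100000: ", max_abs_error(large_pmf, true_pmf))
```

With only five observations, at least five of the ten possible values must receive probability zero, whatever the draws turn out to be. The large sample should track the true PMF much more closely, which is the convergence the theorems describe.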
Now consider Greg's suggestion. It is sensible to check the sample against the known distribution functions, but if the sample is large and the population does not follow any known distribution, then the sample's cumulative distribution function will approach the true population function, which need not be one of the known functions. In that case it would be a mistake to consider only the known functions.
For this reason we go with Jane's suggestion to check both the sample distribution function and the known distribution functions and see which one fits better, and where. For example, if the population follows a mixture of a known distribution and an unknown one, then considering both types of function will generally give better results than relying on either one alone.
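One way to "check both" on the mixture example is sketched below. This is only a sketch under stated assumptions: the mixture (half Binomial(10, 0.3), half a point mass at 9), the use of a held-out sample, and the total variation distance as the measure of fit are all illustrative choices, not something the question prescribes. The idea is to pick the candidate that agrees better with data it was not fitted to.

```python
import math
import random
from collections import Counter

def empirical_pmf(sample):
    """Sample probability mass function: {value: relative frequency}."""
    n = len(sample)
    return {x: c / n for x, c in Counter(sample).items()}

def binomial_pmf(n_trials, p):
    """PMF of a Binomial(n_trials, p) distribution as a dict."""
    return {k: math.comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k)
            for k in range(n_trials + 1)}

def total_variation(pmf_a, pmf_b):
    """Total variation distance between two PMFs given as dicts."""
    keys = set(pmf_a) | set(pmf_b)
    return 0.5 * sum(abs(pmf_a.get(k, 0.0) - pmf_b.get(k, 0.0)) for k in keys)

def draw_mixture(rng):
    """Hypothetical population: half Binomial(10, 0.3), half a spike at 9."""
    if rng.random() < 0.5:
        return sum(rng.random() < 0.3 for _ in range(10))
    return 9

rng = random.Random(1)  # fixed seed so the run is repeatable
train = [draw_mixture(rng) for _ in range(10_000)]
holdout = [draw_mixture(rng) for _ in range(10_000)]

# Maya's candidate: the sample PMF of the training data.
candidate_empirical = empirical_pmf(train)

# Greg's candidate: a known family (binomial) with p estimated from the mean.
p_hat = sum(train) / (10 * len(train))
candidate_binomial = binomial_pmf(10, p_hat)

# Compare each candidate with data it has not seen.
holdout_pmf = empirical_pmf(holdout)
tv_empirical = total_variation(candidate_empirical, holdout_pmf)
tv_binomial = total_variation(candidate_binomial, holdout_pmf)

print("sample PMF vs hold-out:      ", tv_empirical)
print("fitted binomial vs hold-out: ", tv_binomial)
```

In this setup the fitted binomial cannot reproduce the spike at 9, so with a large sample the sample PMF should agree far better with the held-out data. With a small sample from a population that really is binomial, the comparison could easily go the other way, which is why it is worth checking both.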