Question

Three statistics students are having a discussion about selecting the appropriate distribution for a data set. Explain why you agree or disagree with each student and give your own suggestion for the approach the students should take.

Maya: Maya argues that since the students don't know what the population data looks like, they should simply use the sample probability mass distribution as their population mass distribution.

Greg: Greg says that the sample probability mass distribution is oddly shaped and will almost certainly not be the same as the population mass distribution function. He suggests that it’s best to find a match from the common probability mass functions that the students know about.

Jane: Jane argues that both Greg's and Maya's approaches could introduce unknown error into the analysis they are performing. She reasons that since there is going to be error either way, the students should try both approaches and choose the one that produces the results they would most like to see.

Solutions

We should agree with all three students, Maya, Greg, and Jane, although each of the first two approaches has limitations.

Consider Maya's suggestion to use the sample probability mass function as the population function. By the Glivenko-Cantelli theorem, the empirical CDF converges to the true population CDF as the sample size grows, but in a small sample this may not hold: values that never appear in the sample receive probability zero, so we may exclude some regions of the sample space.
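A minimal Python sketch of Maya's approach, assuming (purely for illustration) that the population is Poisson with λ = 3. The helper names `empirical_pmf`, `poisson_pmf`, `draw_poisson`, and `max_abs_error` are hypothetical. It builds the sample PMF from relative frequencies and measures its largest gap from the true PMF for a small and a large sample.

```python
import math
import random
from collections import Counter


def empirical_pmf(sample):
    """Sample PMF: relative frequency of each observed value (Maya's proposal)."""
    n = len(sample)
    return {x: c / n for x, c in Counter(sample).items()}


def poisson_pmf(k, lam):
    """True PMF of the assumed Poisson(lam) population (illustrative assumption)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def draw_poisson(lam, rng):
    """Draw one Poisson(lam) value with Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def max_abs_error(sample, lam, support):
    """Largest gap between the sample PMF and the true PMF over `support`.
    Values never observed in the sample get probability 0 under the sample PMF."""
    pmf = empirical_pmf(sample)
    return max(abs(pmf.get(k, 0.0) - poisson_pmf(k, lam)) for k in support)


rng = random.Random(0)   # fixed seed so the run is reproducible
lam = 3.0                # assumed population parameter (illustrative only)
small = [draw_poisson(lam, rng) for _ in range(15)]
large = [draw_poisson(lam, rng) for _ in range(20000)]
support = range(0, 16)

print("values 0-9 missing from the small sample:", sorted(set(range(10)) - set(small)))
print("max error, n=15:   ", round(max_abs_error(small, lam, support), 3))
print("max error, n=20000:", round(max_abs_error(large, lam, support), 3))
```

The small-sample error is typically much larger than the large-sample error, which is the convergence behaviour described above; the exact numbers depend on the seed.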

Now consider Greg's suggestion. It is good practice to check the data against known distribution functions, but the population distribution need not belong to any known family. If the sample is large, its empirical cumulative distribution function will approach the population distribution even when that distribution is not a standard one, so restricting attention to known functions alone would be a mistake.
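Greg's approach can be sketched as fitting a few candidate families by maximum likelihood and comparing their log-likelihoods. The two candidates here (Poisson, and geometric on {0, 1, 2, ...}), the helper names, and the sample are hypothetical choices for illustration. Both families have one parameter, so comparing raw log-likelihood is equivalent to comparing AIC.

```python
import math


def poisson_loglik(sample, lam):
    """Log-likelihood of a Poisson(lam) model; lgamma(k + 1) = log(k!)."""
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in sample)


def geometric_loglik(sample, p):
    """Log-likelihood of a geometric model on {0, 1, 2, ...}: P(K = k) = (1 - p)^k * p."""
    return sum(k * math.log(1 - p) + math.log(p) for k in sample)


def fit_known_families(sample):
    """Fit each candidate by maximum likelihood; returns {name: (estimate, log-likelihood)}.
    Assumes a non-negative integer sample with a positive mean."""
    mean = sum(sample) / len(sample)
    lam_hat = mean               # Poisson MLE is the sample mean
    p_hat = 1.0 / (1.0 + mean)   # geometric (failures before first success) MLE
    return {
        "poisson": (lam_hat, poisson_loglik(sample, lam_hat)),
        "geometric": (p_hat, geometric_loglik(sample, p_hat)),
    }


sample = [2, 3, 1, 4, 2, 3, 2, 5, 1, 3]   # hypothetical data
fits = fit_known_families(sample)
best = max(fits, key=lambda name: fits[name][1])
for name, (estimate, ll) in fits.items():
    print(f"{name:9s} estimate={estimate:.3f} loglik={ll:.2f}")
print("best known family:", best)   # "poisson" for this sample
```

Picking the best of the candidates does not show that it fits well: if the population belongs to none of these families, the winner can still be a poor match, which is the objection raised above.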

For this reason we go with Jane's suggestion to examine both the sample distribution function and the known functions, and to see which one fits the data better and in which regions. For example, if the population follows a mixture of a known distribution and an unknown one, considering both types of functions is more likely to give good results than relying on either alone.
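One way to "check both and see which fits better" is to fit each model on part of the data and score it on the rest. This is a swapped-in technique (held-out log-likelihood), not something stated in the solution; the 70/30 Poisson mixture used to generate data, the add-one smoothing, and all helper names are assumptions for illustration. Because the choice rests on held-out fit, it does not depend on which model gives the answer the students would prefer.

```python
import math
import random
from collections import Counter


def draw_poisson(lam, rng):
    """One Poisson(lam) draw via Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def draw_mixture(n, rng):
    """Hypothetical population: 70% Poisson(2), 30% Poisson(9); no single Poisson matches it."""
    return [draw_poisson(2.0 if rng.random() < 0.7 else 9.0, rng) for _ in range(n)]


def heldout_loglik(train, test, support_max, alpha=1.0):
    """Score the sample PMF (add-alpha smoothed on 0..support_max) and a
    Poisson fitted to `train` by their log-likelihood on `test`."""
    counts = Counter(train)
    total = len(train) + alpha * (support_max + 1)   # smoothed PMF sums to 1
    lam = sum(train) / len(train)                    # Poisson MLE from training data only
    empirical = sum(math.log((counts[k] + alpha) / total) for k in test)
    poisson = sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in test)
    return {"empirical": empirical, "poisson": poisson}


rng = random.Random(1)
data = draw_mixture(4000, rng)
train, test = data[:2000], data[2000:]
support_max = max(data)   # smoothing support covers every observed value
scores = heldout_loglik(train, test, support_max)
print(scores, "->", max(scores, key=scores.get))
```

With plenty of data from a population that no single known family matches, the smoothed sample PMF usually scores higher; with a small sample or a population that really is Poisson, the fitted known distribution can win instead.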

