Question


  1. Consider the number of days absent from a random sample of six students during a semester: A= {2, 3, 2, 4, 2, 5}
    1. Compute the arithmetic mean, geometric mean, median, and mode by hand and verify the results using R
      1. Arithmetic Mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{2+3+2+4+2+5}{6} = \frac{18}{6} = 3$

mean(data2$absent)

[1] 3

  1. Geometric Mean: $GM_X = \left(\prod_{i=1}^{n} X_i\right)^{1/n} = (2 \cdot 3 \cdot 2 \cdot 4 \cdot 2 \cdot 5)^{1/6} = 480^{1/6} \approx 2.79816$

>gmean <- prod(data2$absent)^(1/length(data2$absent))

> gmean

[1] 2.798166

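  1. Median: sorted values $X_{(i)}$: 2, 2, 2, 3, 4, 5 with $n = 6$. The median is the $\frac{n+1}{2} = \frac{6+1}{2} = 3.5$th ranked value, i.e. the average of the 3rd and 4th ranked values: $\frac{2+3}{2} = 2.5$ days absent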

>median(data2$absent)

[1] 2.5

  1. Mode: Most frequent value=2

> mode <- names(table(data2$absent)) [table(data2$absent)==max(table(data2$absent))]

> mode

[1] "2"
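The R commands above refer to a data frame `data2` that the post never defines. A minimal sketch, assuming `data2` is simply a data frame with a single column `absent` holding the six observations, reproduces all four measures in one place:

```r
# Assumed setup: the post does not show how data2 was created
data2 <- data.frame(absent = c(2, 3, 2, 4, 2, 5))
x <- data2$absent

arith_mean <- mean(x)             # 18 / 6
geo_mean   <- exp(mean(log(x)))   # equivalent to prod(x)^(1/length(x))
med        <- median(x)           # average of the 3rd and 4th sorted values

# Mode: value(s) with the highest frequency; table() names are character,
# so convert back to numeric
freq     <- table(x)
mode_val <- as.numeric(names(freq)[freq == max(freq)])

print(c(mean = arith_mean, gmean = geo_mean, median = med, mode = mode_val))
```

Storing the mode in `mode_val` rather than `mode` avoids masking the base R function `mode()`.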

  1. Compute the variance, standard deviation, and coefficient of variation by hand and verify the results using R.
    1. Sample Variance: $S^2 = \frac{\sum_{i=1}^{n}(X_i-\bar{X})^2}{n-1} = \frac{(2-3)^2+(3-3)^2+(2-3)^2+(4-3)^2+(2-3)^2+(5-3)^2}{6-1} = \frac{8}{5} = 1.6$

> var(data2$absent)

[1] 1.6

  1. Sample Standard Deviation: $S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n}(X_i-\bar{X})^2}{n-1}} = \sqrt{1.6} = 1.26491$

> sd(data2$absent)

[1] 1.264911

  1. Coefficient of Variation: $CV = \frac{S}{\bar{X}} = \frac{1.26491}{3} = 0.421637$ (the denominator is the mean, 3, not the median, 2.5)

> cv <- abs(sd(data2$absent)/mean(data2$absent))

> cv

[1] 0.421637
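As a cross-check on the hand formulas, the following sketch builds the variance from the squared deviations instead of calling `var()` directly. The variable names are illustrative and do not come from the original post:

```r
x <- c(2, 3, 2, 4, 2, 5)   # days absent
n <- length(x)

dev_sq <- (x - mean(x))^2      # squared deviations: 1 0 1 1 1 4
s2 <- sum(dev_sq) / (n - 1)    # sample variance, divisor n - 1
s  <- sqrt(s2)                 # sample standard deviation
cv <- s / mean(x)              # coefficient of variation, relative to the mean

# Compare the hand formulas with R's built-ins
print(all.equal(s2, var(x)))
print(all.equal(s, sd(x)))
print(cv)
```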

  1. Compute the interquartile range of this data by hand and with R

    > IQR(data2$absent, type=6)

    [1] 2.25
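The by-hand part of this step is not shown. One way to arrive at 2.25, assuming the $(n+1)p$ position rule that `type=6` corresponds to, is sketched below. `q_pos` is a hypothetical helper written for this illustration and only handles positions that fall strictly inside the sample:

```r
x <- sort(c(2, 3, 2, 4, 2, 5))   # 2 2 2 3 4 5
n <- length(x)

# Hypothetical helper: value at rank position (n + 1) * p,
# interpolating linearly between neighbouring order statistics
q_pos <- function(p) {
  h  <- (n + 1) * p
  lo <- floor(h)
  x[lo] + (h - lo) * (x[lo + 1] - x[lo])
}

q1 <- q_pos(0.25)   # position 1.75: between 2 and 2, so 2
q3 <- q_pos(0.75)   # position 5.25: 4 + 0.25 * (5 - 4) = 4.25
iqr_hand <- q3 - q1

print(c(Q1 = q1, Q3 = q3, IQR = iqr_hand))
```

Other quantile definitions (for example R's default `type=7`) place the quartiles at different positions and can give a different interquartile range for a sample this small.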

    1. Display the data above with a well‐formatted and well‐labeled pie chart created with R.  Recall that with a pie chart, the slices sum to 100 percent, so the slice labeled “2 days absent” should be 50 percent of the pie.

      # Pie Chart with Percentages

      slices <- c(50, 16.6666667, 16.6666667, 16.6666667)

      lbls <- c("2 days absent", "3 days absent", "4 days absent", "5 days absent")

      pct <- round(slices/sum(slices)*100)

      lbls <- paste(lbls, pct) # add percents to labels

      lbls <- paste(lbls,"%",sep="") # add % to labels

      pie(slices,labels = lbls, col=rainbow(length(lbls)),

          main="Pie Chart of Absences")
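The slice sizes above are typed in by hand. As an alternative sketch, they can be derived from the data with `table()` and `prop.table()`, so the chart stays correct if the sample changes. The vector name `absent` is an assumption standing in for `data2$absent`:

```r
absent <- c(2, 3, 2, 4, 2, 5)

counts <- table(absent)                              # frequency of each value
pct    <- as.vector(round(100 * prop.table(counts))) # rounded percentages
lbls   <- paste0(names(counts), " days absent ", pct, "%")

pie(counts, labels = lbls, col = rainbow(length(counts)),
    main = "Pie Chart of Absences")
```

Because each of the three smaller slices rounds from 16.67 up to 17, the printed labels add up to 101 percent even though the slices themselves sum to 100 percent.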

      1. Display the data above with a well‐labeled and well‐formatted frequency distribution (histogram) using R.

        x <- data2$absent

        > h<-hist(x, breaks=10, col="magenta", xlab="# of student absences",

        +         main="Histogram of student absences")

        > xfit<-seq(min(x),max(x),length=40)

        > yfit<-dnorm(xfit,mean=mean(x),sd=sd(x))

        > yfit <- yfit*diff(h$mids[1:2])*length(x)

        > lines(xfit, yfit, col="blue", lwd=2)

        1. Use the relationship between the mean and median to determine whether the distribution above is symmetrical, negatively skewed or positively skewed.  Explain.
          1. Mean (3) > Median (2.5), so the distribution is positively skewed: the larger values 4 and 5 pull the mean above the median.
        2. Calculate excess kurtosis using the formula from lecture.  Is this distribution leptokurtic, mesokurtic, or platykurtic?  Explain.  Compared to the normal distribution with the same variance, does this distribution exhibit fatter tails?  Briefly explain
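          1. $\frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{X})^4}{\left[\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{X})^2\right]^2} = \frac{\frac{1}{6}\left[(2-3)^4+(3-3)^4+(2-3)^4+(4-3)^4+(2-3)^4+(5-3)^4\right]}{\left\{\frac{1}{6}\left[(2-3)^2+(3-3)^2+(2-3)^2+(4-3)^2+(2-3)^2+(5-3)^2\right]\right\}^2} = \frac{10/3}{16/9} = \frac{15}{8} = 1.875$
          2. Excess kurtosis = kurtosis - 3 = 1.875 - 3 = -1.125, and -1.125 < 0
          3. This distribution is platykurtic because excess kurtosis < 0 (equivalently, kurtosis < 3).
          4. No. With negative excess kurtosis, it exhibits thinner tails than a normal distribution with the same variance.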
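The kurtosis arithmetic can be checked in a few lines of R. This sketch assumes the lecture formula is the ratio of the fourth central moment to the squared second central moment, both with divisor $n$, which is what the hand calculation uses:

```r
x <- c(2, 3, 2, 4, 2, 5)
d <- x - mean(x)            # deviations from the mean

m2 <- mean(d^2)             # second central moment: 8 / 6
m4 <- mean(d^4)             # fourth central moment: 20 / 6
kurt <- m4 / m2^2           # (10/3) / (16/9) = 15/8
excess_kurt <- kurt - 3     # negative value indicates a platykurtic shape

print(c(kurtosis = kurt, excess = excess_kurt))
```

Add-on packages may apply small-sample corrections or a different divisor, so their kurtosis functions need not return exactly this value.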

        Suppose someone claims that the population mean is 5 days absent (H0: μ = 5). What is the alternative hypothesis?  Can you reject the null hypothesis at the 5‐percent and 1‐percent levels of significance?  Use the critical value, p‐value, and confidence interval approaches both “by hand” and with R.  In the “by hand” approach, you can use R (or a statistical table) to get the critical values.

        Solutions

        Expert Solution

        A= {2, 3, 2, 4, 2, 5}

        Null Hypothesis H0: μ = 5

        Alternative Hypothesis H1: μ ≠ 5

        Sample mean = (2 + 3 + 2 + 4 + 2 + 5) / 6 = 3

        Sample variance = [ (2 - 3)^2 + (3 - 3)^2 + (2 - 3)^2 + (4 - 3)^2 + (2 - 3)^2 + (5 - 3)^2 ] / 5

        = 8 / 5 = 1.6

        Sample standard deviation, s = √1.6 = 1.264911

        Standard error, SE = s / √n = 1.264911 / √6 = 0.5163978

        Test statistic, t = (3 - 5) / 0.5163978 = -3.872983

        Degree of freedom = n-1 = 6 - 1 = 5

        For two tail test, P-value = 2 * P(t < -3.872983) = 0.01172482

        P-value approach - Since the p-value (0.0117) is less than the 0.05 significance level, we reject the null hypothesis H0 at the 5% significance level and conclude that there is strong evidence that μ ≠ 5.

        Since the p-value (0.0117) is greater than the 0.01 significance level, we fail to reject the null hypothesis H0 at the 1% significance level and conclude that there is not strong evidence that μ ≠ 5.

        The two-tailed critical values of t at df = 5 for the 5‐percent and 1‐percent levels of significance are 2.57 and 4.032. That is, we reject H0 if t < -2.57 or t > 2.57 at the 5% significance level, and we reject H0 if t < -4.032 or t > 4.032 at the 1% significance level.

        Critical value approach -

        Since the test statistic lies in the critical region (-3.873 < -2.57), we reject the null hypothesis H0 at the 5% significance level and conclude that there is strong evidence that μ ≠ 5.

        Since the test statistic does not lie in the critical region (-4.032 < -3.873 < 4.032), we fail to reject the null hypothesis H0 at the 1% significance level and conclude that there is not strong evidence that μ ≠ 5.

        For 5% significance level ,

        Margin of error = t * SE = 2.57 * 0.5163978 = 1.327142

        95% confidence interval is,

        (3 - 1.327142, 3 + 1.327142)

        (1.672858, 4.327142)

        Since the hypothesized mean 5 does not lie in the 95% confidence interval, we reject H0 at the 5% significance level and conclude that there is strong evidence that μ ≠ 5.

        For 1% significance level ,

        Margin of error = t * SE = 4.032 * 0.5163978 = 2.082116

        99% confidence interval is,

        (3 - 2.082116, 3 + 2.082116)

        (0.917884, 5.082116)

        Since the hypothesized mean 5 lies in the 99% confidence interval, we fail to reject H0 at the 1% significance level and conclude that there is not strong evidence that μ ≠ 5.

        Using R,

        For 5% significance level ,

        A <- c(2, 3, 2, 4, 2, 5)
        t.test(A, mu = 5)

           One Sample t-test

        data: A
        t = -3.873, df = 5, p-value = 0.01172
        alternative hypothesis: true mean is not equal to 5
        95 percent confidence interval:
        1.672557 4.327443
        sample estimates:
        mean of x
        3

        For 1% significance level ,

        t.test(A, mu = 5, conf.level = 0.99)

           One Sample t-test

        data: A
        t = -3.873, df = 5, p-value = 0.01172
        alternative hypothesis: true mean is not equal to 5
        99 percent confidence interval:
        0.9178103 5.0821897
        sample estimates:
        mean of x
        3
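The three "by hand" approaches can also be reproduced step by step in R, using `qt()` for the critical values and `pt()` for the p-value. This is a sketch with illustrative variable names, not the solution author's code:

```r
A   <- c(2, 3, 2, 4, 2, 5)
mu0 <- 5                               # hypothesized population mean
n   <- length(A)
df  <- n - 1

se      <- sd(A) / sqrt(n)             # standard error of the mean
t_stat  <- (mean(A) - mu0) / se        # test statistic
p_value <- 2 * pt(-abs(t_stat), df)    # two-tailed p-value

for (alpha in c(0.05, 0.01)) {
  t_crit <- qt(1 - alpha / 2, df)              # two-tailed critical value
  ci     <- mean(A) + c(-1, 1) * t_crit * se   # (1 - alpha) confidence interval
  cat(sprintf("alpha = %.2f: t_crit = %.3f, CI = (%.4f, %.4f), reject H0: %s\n",
              alpha, t_crit, ci[1], ci[2], abs(t_stat) > t_crit))
}
```

The hand-computed 95% interval (1.672858, 4.327142) differs from the `t.test` interval in the fourth decimal place only because the hand calculation rounds the critical value to 2.57, whereas `qt(0.975, 5)` is about 2.5706.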

