The table below shows the scores of 15 students on two attempts at the same statistics exam in a course.
Student | Exam - 1st attempt (%) | Exam - 2nd attempt (%)
1 | 59 | 71
2 | 64 | 63
3 | 86 | 87
4 | 74 | 82
5 | 83 | 89
6 | 52 | 40
7 | 57 | 62
8 | 38 | 55
9 | 31 | 70
10 | 74 | 78
11 | 70 | 78
12 | 64 | 59
13 | 40 | 57
14 | 55 | 59
15 | 70 | 65
1) The professor believes that, on average, students will do better on the second attempt than on the first.
a) Choose an appropriate test to determine if students improved on the second attempt compared to their first. Draw appropriate conclusions.
b) Calculate the size/magnitude of this effect.
c) Identify the 95% confidence interval around our measurement and explain what this result tells us about our data.
2) I modify the table so that the column labelled 1st attempt now represents the results from students in Prof A’s statistics class, and the column labelled 2nd attempt represents the results of a completely different group of students taking Prof B’s class.
a) Choose the appropriate test to demonstrate if there is a significant difference in the results in Professor A’s class compared to Professor B.
b) Is my approach to this problem the same as in Question 1? Why or why not?
3) I keep the table changes mentioned in Question 2. The 1st attempt column still represents the results from students in Professor A's statistics class, and the 2nd attempt column represents different students from Professor B's class. Professor A discovers that 3 students in his class cheated, so he eliminates their grades from the group. If I now wanted to compare the performance of class A and class B, should the statistical approach change compared to Question 2? Why or why not? (Note: You do not need to do the calculations; you just need to provide an explanation.)
**********************
I'm going to modify the table a bit. Data now represent the marks
obtained by students in the statistics course at the mid-term exam
and in the final exam in 2018.
Student | Grade in midterm exam (%) | Grade in final exam (%)
1 | 59 | 71
2 | 64 | 63
3 | 86 | 87
4 | 74 | 82
5 | 83 | 89
6 | 52 | 40
7 | 57 | 62
8 | 38 | 55
9 | 31 | 70
10 | 74 | 78
11 | 70 | 78
12 | 64 | 59
13 | 40 | 57
14 | 55 | 59
15 | 70 | 65
4) I would like to know if there is a relationship or link between the grades that the students obtained in the midterm exam and in the final exam.
a) Make an appropriate graphical representation to illustrate these data.
b) What conclusions can we draw only by looking at this graph? Are there any data points that seem problematic?
5) a) What is the strength of the relationship between these two variables?
b) What part of the variance could be explained by the relationship that exists between these variables?
c) Is this relationship statistically significant?
6) In the 2019 winter semester, a student obtained a grade of 64 in her midterm exam. What grade could we predict that she will get in the final exam?
7) a) If I wanted to test the relationship between the midterm exam performance and the final exam performance using a chi-square test, how would the above data table need to be rearranged/modified?
b) Despite my suggestion in 7a) to use a chi-square test, it would actually be a bad idea to use the chi-square test with this type of problem. Why might that be the case? What problem(s) would it cause? (Think of the rules we discussed for using chi-square.)
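1) a) Paired t test: the same 15 students sat both attempts, so the two columns are dependent and the test is carried out on each student's difference in scores. The null hypothesis is that the mean difference is zero; the alternative is that the mean score on the second attempt is higher than on the first.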
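b) The p-value corresponding to the t value is 0.0339 < 0.05, so we reject the null hypothesis at the 5% significance level. The hypothesis cannot be rejected at the 1% or 2% significance level, so we can conclude that the second attempt is better than the first only at the 5% level. Note that 0.0339 is the p-value, not the magnitude of the effect: the p-value measures the strength of the evidence against the null hypothesis. For paired data, the size of the effect is usually reported as the mean difference between attempts, or as Cohen's d, which is the mean difference divided by the standard deviation of the differences.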
c) 95% confidence interval for the difference in the grades of the two attempts:
A confidence interval is more informative than a point estimate because it also shows how precisely the difference is estimated.
We can be 95% confident that the true mean difference lies between -12.795 and 0.509. In other words, if repeated samples were taken and a 95% confidence interval were computed for each sample, about 95% of those intervals would contain the true difference. Because the interval includes 0, a two-sided test at the 5% level would not reject the null hypothesis; the rejection in b) is consistent with this interval only if that test was one-sided, which matches the professor's directional claim.
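The quantities discussed in a) to c) can be recomputed directly from the first table. The following is a minimal sketch using only the Python standard library, not the method the answer above necessarily used. The function name `paired_summary`, the choice to subtract first from second, and the hard-coded critical value 2.1448 (the two-sided 95% value of Student's t with 14 degrees of freedom, taken from a t table) are all assumptions made for illustration. The figures it prints are recomputed from the table and need not coincide with the values quoted above, for example because of the direction of subtraction or rounding.

```python
import math
import statistics

# Scores copied from the first table (15 students, two attempts each).
first = [59, 64, 86, 74, 83, 52, 57, 38, 31, 74, 70, 64, 40, 55, 70]
second = [71, 63, 87, 82, 89, 40, 62, 55, 70, 78, 78, 59, 57, 59, 65]

# Assumption: two-sided 95% critical value of Student's t with 14 df,
# read from a t table rather than computed.
T_CRIT_95_DF14 = 2.1448


def paired_summary(before, after, t_crit):
    """Summarise paired data using the differences after - before.

    Hypothetical helper for illustration only.
    """
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)      # sample SD (n - 1 in the denominator)
    se = sd_diff / math.sqrt(n)            # standard error of the mean difference
    return {
        "n": n,
        "mean_diff": mean_diff,
        "sd_diff": sd_diff,
        "t_stat": mean_diff / se,          # paired t statistic, df = n - 1
        "cohen_d": mean_diff / sd_diff,    # effect size for paired data
        "ci": (mean_diff - t_crit * se, mean_diff + t_crit * se),
    }


result = paired_summary(first, second, T_CRIT_95_DF14)
print(f"n = {result['n']}")
print(f"mean difference (2nd - 1st) = {result['mean_diff']:.3f}")
print(f"SD of differences = {result['sd_diff']:.3f}")
print(f"t statistic (df = {result['n'] - 1}) = {result['t_stat']:.3f}")
print(f"Cohen's d = {result['cohen_d']:.3f}")
print(f"95% CI = ({result['ci'][0]:.3f}, {result['ci'][1]:.3f})")
```

The sketch stops at the t statistic because the standard library has no t-distribution function; the p-value would have to come from a t table or a statistics package. Reversing the order of subtraction flips the sign of the mean difference, the t statistic, and both interval bounds, but not the conclusions.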