1. The data set on sheet #1 gives GPA category and number of hours studied. First construct comparative box plots of hours studied by GPA category. Then conduct a two-sample t-test of whether GPA category influences the number of hours studied. Be prepared to explain the results of the test, the meaning of the box plots, and how they relate to each other. Then redo the analysis by replacing the ordinal GPA category with a numerical dummy variable (Low = 0, High = 1). Run a regression analysis of how study hours (x) influence GPA category (y), and include the scatterplot. Compare the results of the two tests. Be able to state the null and alternative hypotheses.
Student | GPA | Hours per week |
1 | Low | 6 |
2 | Low | 18 |
3 | Low | 16 |
4 | Low | 14 |
5 | High | 0 |
6 | Low | 22 |
7 | Low | 15 |
8 | Low | 12 |
9 | High | 6 |
10 | Low | 7 |
11 | Low | 5 |
12 | High | 20 |
13 | High | 9 |
14 | High | 9 |
15 | Low | 22 |
16 | Low | 23 |
17 | High | 8 |
18 | Low | 7 |
19 | Low | 14 |
20 | Low | 12 |
21 | Low | 0 |
22 | High | 7 |
23 | High | 4 |
24 | Low | 9 |
25 | Low | 0 |
26 | Low | 0 |
27 | High | 6 |
28 | High | 14 |
29 | Low | 10 |
30 | Low | 9 |
31 | High | 5 |
32 | High | 7 |
33 | High | 4 |
34 | High | 16 |
35 | High | 0 |
36 | Low | 20 |
37 | Low | 13 |
38 | High | 0 |
39 | High | 4 |
40 | Low | 6 |
41 | Low | 17 |
42 | Low | 8 |
43 | High | 4 |
44 | Low | 0 |
45 | High | 16 |
46 | Low | 17 |
47 | Low | 4 |
48 | High | 11 |
49 | Low | 14 |
50 | Low | 16 |
51 | High | 11 |
52 | High | 7 |
53 | High | 4 |
54 | Low | 11 |
55 | Low | 8 |
56 | High | 2 |
57 | Low | 0 |
58 | Low | 0 |
59 | High | 13 |
60 | Low | 18 |
61 | Low | 28 |
62 | High | 1 |
63 | Low | 20 |
64 | Low | 13 |
65 | Low | 4 |
66 | Low | 7 |
67 | High | 11 |
68 | Low | 12 |
69 | High | 5 |
70 | Low | 7 |
71 | Low | 22 |
72 | High | 8 |
73 | Low | 19 |
74 | Low | 8 |
75 | High | 2 |
76 | High | 11 |
77 | Low | 18 |
78 | Low | 20 |
79 | High | 7 |
80 | High | 4 |
81 | High | 4 |
82 | High | 16 |
83 | High | 15 |
84 | Low | 9 |
85 | High | 8 |
86 | High | 10 |
87 | Low | 13 |
88 | High | 9 |
89 | Low | 2 |
90 | Low | 22 |
91 | Low | 12 |
92 | High | 6 |
93 | High | 9 |
94 | Low | 20 |
95 | Low | 14 |
96 | High | 7 |
97 | High | 15 |
98 | High | 9 |
99 | High | 2 |
100 | Low | 23 |
Box plot:
The box plots show that GPA-Low students tend to study more hours per week than GPA-High students: the Low box sits higher, with a higher median and a wider spread. We now test whether the difference in mean hours is statistically significant using a two-sample t-test.
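The numbers that a box plot draws can also be computed directly from the table. The sketch below is only illustrative: the names `LOW`, `HIGH` and `five_number_summary` are assumptions introduced here, the two lists are the "Hours per week" column split by GPA category, and the quartiles use the median-of-halves rule, which may differ slightly from the convention used by the software that drew the plot.

```python
from statistics import median

# Hours per week from the table above, split by GPA category (in student order).
LOW = [6, 18, 16, 14, 22, 15, 12, 7, 5, 22, 23, 7, 14, 12, 0, 9, 0, 0, 10,
       9, 20, 13, 6, 17, 8, 0, 17, 4, 14, 16, 11, 8, 0, 0, 18, 28, 20, 13,
       4, 7, 12, 7, 22, 19, 8, 18, 20, 9, 13, 2, 22, 12, 20, 14, 23]
HIGH = [0, 6, 20, 9, 9, 8, 7, 4, 6, 14, 5, 7, 4, 16, 0, 0, 4, 4, 16, 11, 11,
        7, 4, 2, 13, 1, 11, 5, 8, 2, 11, 7, 4, 4, 16, 15, 8, 10, 9, 6, 9, 7,
        15, 9, 2]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Q1 and Q3 are the medians of the lower and upper halves of the sorted
    data, excluding the middle value when the sample size is odd.
    """
    s = sorted(values)
    n = len(s)
    lower = s[: n // 2]
    upper = s[(n + 1) // 2:]
    return s[0], median(lower), median(s), median(upper), s[-1]


for label, hours in (("Low", LOW), ("High", HIGH)):
    print(label, len(hours), five_number_summary(hours))
```

Comparing the two tuples side by side gives the same picture as the box plots: each element of the Low summary from Q1 upwards is at least as large as the corresponding element of the High summary.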
Two-sample t-test:
Null hypothesis H0: $\mu_L = \mu_H$, i.e. there is no difference between the mean number of hours studied by GPA-Low students and the mean number of hours studied by GPA-High students.
Alternative hypothesis H1: $\mu_L > \mu_H$, i.e. the mean number of hours studied by GPA-Low students is greater than the mean number of hours studied by GPA-High students.
(So this is a one-tailed, right-tailed test.)
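Test statistic (pooled-variance two-sample t):
$$t = \frac{\bar{x}_L - \bar{x}_H}{s_p \sqrt{\dfrac{1}{n_L} + \dfrac{1}{n_H}}}$$
where
$$s_p = \sqrt{\frac{(n_L - 1)\,s_L^2 + (n_H - 1)\,s_H^2}{n_L + n_H - 2}}$$
is the pooled standard deviation, and $\bar{x}$, $s$ and $n$ are the sample mean, sample standard deviation and sample size of each group (L = GPA-Low, H = GPA-High).
By the usual definitions of mean and standard deviation we get, from the data above,
$$n_L = 55,\quad \bar{x}_L \approx 12.109,\quad s_L \approx 7.218; \qquad n_H = 45,\quad \bar{x}_H \approx 7.689,\quad s_H \approx 4.842$$
Substituting the above values into the test statistic we get
t = 3.511
and degrees of freedom $n_L + n_H - 2 = 55 + 45 - 2 = 98$.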
To draw the conclusion, we compare the t value (3.511) with the critical value of the t-distribution at the 5% level of significance ($\alpha = 0.05$) with 98 degrees of freedom.
From the t-distribution table we get $t_{0.05,\,98} \approx 1.661$.
Since $3.511 > 1.661$, we reject the null hypothesis at the 5% level of significance.
This means that the mean number of hours studied by GPA-Low students is greater than the mean number of hours studied by GPA-High students.
Or
"GPA category influences the number of hours studied."
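The test can be checked with a few lines of standard-library Python. This is a minimal sketch, assuming the pooled-variance (equal-variance) form of the test, which is what 98 degrees of freedom implies; `LOW`, `HIGH`, `pooled_t` and `T_CRIT` are names introduced here for illustration, and the critical value is read from a t table rather than computed.

```python
from math import sqrt
from statistics import mean, variance

# Hours per week from the table above, split by GPA category (in student order).
LOW = [6, 18, 16, 14, 22, 15, 12, 7, 5, 22, 23, 7, 14, 12, 0, 9, 0, 0, 10,
       9, 20, 13, 6, 17, 8, 0, 17, 4, 14, 16, 11, 8, 0, 0, 18, 28, 20, 13,
       4, 7, 12, 7, 22, 19, 8, 18, 20, 9, 13, 2, 22, 12, 20, 14, 23]
HIGH = [0, 6, 20, 9, 9, 8, 7, 4, 6, 14, 5, 7, 4, 16, 0, 0, 4, 4, 16, 11, 11,
        7, 4, 2, 13, 1, 11, 5, 8, 2, 11, 7, 4, 4, 16, 15, 8, 10, 9, 6, 9, 7,
        15, 9, 2]


def pooled_t(sample1, sample2):
    """Return (t, df) for the equal-variance two-sample t-test."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    # statistics.variance is the sample variance (divides by n - 1).
    pooled_var = ((n1 - 1) * variance(sample1)
                  + (n2 - 1) * variance(sample2)) / df
    std_err = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(sample1) - mean(sample2)) / std_err, df


t_stat, df = pooled_t(LOW, HIGH)
T_CRIT = 1.661  # one-tailed 5% critical value for 98 df, from a t table

print(f"t = {t_stat:.3f}, df = {df}")
print("reject H0" if t_stat > T_CRIT else "fail to reject H0")
```

If the equal-variance assumption is in doubt (the Low group is visibly more spread out), Welch's unequal-variance version of the test is a common alternative; it uses a different standard error and a fractional number of degrees of freedom.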
Scatter Plot:
We can observe from the scatter plot above that the points fall on two horizontal bands (y = 0 and y = 1), so a straight line describes the relationship between number of hours studied and GPA category poorly. Since the dependent variable, GPA category, is binary (0 or 1), we can instead fit a logistic regression.
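For the comparison the question asks for, it may still help to see what an ordinary least-squares line gives on the dummy-coded data. In simple linear regression the t statistic for the slope depends only on the correlation r and the sample size, so it has the same magnitude as the pooled two-sample t statistic (3.511), with a negative sign because High = 1 goes with fewer hours. The sketch below is illustrative only; `LOW`, `HIGH`, `hours` and `gpa` are names introduced here.

```python
from math import sqrt

# Hours per week from the table above, split by GPA category (in student order).
LOW = [6, 18, 16, 14, 22, 15, 12, 7, 5, 22, 23, 7, 14, 12, 0, 9, 0, 0, 10,
       9, 20, 13, 6, 17, 8, 0, 17, 4, 14, 16, 11, 8, 0, 0, 18, 28, 20, 13,
       4, 7, 12, 7, 22, 19, 8, 18, 20, 9, 13, 2, 22, 12, 20, 14, 23]
HIGH = [0, 6, 20, 9, 9, 8, 7, 4, 6, 14, 5, 7, 4, 16, 0, 0, 4, 4, 16, 11, 11,
        7, 4, 2, 13, 1, 11, 5, 8, 2, 11, 7, 4, 4, 16, 15, 8, 10, 9, 6, 9, 7,
        15, 9, 2]

hours = LOW + HIGH                          # x: hours studied per week
gpa = [0] * len(LOW) + [1] * len(HIGH)      # y: dummy variable, Low=0, High=1

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(gpa) / n

# Centered sums of squares and cross-products.
sxx = sum((x - mean_x) ** 2 for x in hours)
syy = sum((y - mean_y) ** 2 for y in gpa)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, gpa))

slope = sxy / sxx
intercept = mean_y - slope * mean_x
r = sxy / sqrt(sxx * syy)

# t statistic for H0: slope = 0, with n - 2 degrees of freedom.
t_slope = r * sqrt(n - 2) / sqrt(1 - r * r)

print(f"y = {intercept:.4f} + ({slope:.4f}) x, r = {r:.3f}, t = {t_slope:.3f}")
```

The agreement between the two t statistics shows that the t-test and the linear regression on the dummy variable carry the same information about the difference between the groups; the line itself is still a poor model for a 0/1 outcome, which is why the logistic model is preferred.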
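Logistic regression model:
The logistic regression model is given by
$$P(y = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$$
where x is the number of hours studied per week, y is the GPA dummy variable (Low = 0, High = 1), and $\beta_0$ and $\beta_1$ are coefficients estimated from the data,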
and we get the model,
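The coefficients of such a model are usually obtained by maximum likelihood. The sketch below is one way to do that with Newton-Raphson in plain Python, assuming the standard two-parameter logistic form with intercept b0 and slope b1; `fit_logistic`, `sigmoid`, `hours` and `gpa` are names introduced here for illustration, and a statistics package would normally be used instead.

```python
from math import exp

# Hours per week from the table above, split by GPA category (in student order).
LOW = [6, 18, 16, 14, 22, 15, 12, 7, 5, 22, 23, 7, 14, 12, 0, 9, 0, 0, 10,
       9, 20, 13, 6, 17, 8, 0, 17, 4, 14, 16, 11, 8, 0, 0, 18, 28, 20, 13,
       4, 7, 12, 7, 22, 19, 8, 18, 20, 9, 13, 2, 22, 12, 20, 14, 23]
HIGH = [0, 6, 20, 9, 9, 8, 7, 4, 6, 14, 5, 7, 4, 16, 0, 0, 4, 4, 16, 11, 11,
        7, 4, 2, 13, 1, 11, 5, 8, 2, 11, 7, 4, 4, 16, 15, 8, 10, 9, 6, 9, 7,
        15, 9, 2]

hours = LOW + HIGH                          # x: hours studied per week
gpa = [0] * len(LOW) + [1] * len(HIGH)      # y: dummy variable, Low=0, High=1


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def fit_logistic(xs, ys, iterations=25):
    """Maximum-likelihood fit of P(y=1|x) = sigmoid(b0 + b1*x) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        p = [sigmoid(b0 + b1 * x) for x in xs]
        # Gradient of the log-likelihood.
        g0 = sum(y - pi for y, pi in zip(ys, p))
        g1 = sum((y - pi) * x for x, y, pi in zip(xs, ys, p))
        # Entries of the 2x2 information matrix X'WX, with W = p(1 - p).
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * x for wi, x in zip(w, xs))
        h11 = sum(wi * x * x for wi, x in zip(w, xs))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system explicitly.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


b0, b1 = fit_logistic(hours, gpa)
print(f"logit P(High) = {b0:.4f} + ({b1:.4f}) * hours")
print(f"P(High | 0 hours)  = {sigmoid(b0):.3f}")
print(f"P(High | 20 hours) = {sigmoid(b0 + 20 * b1):.3f}")
```

A negative slope here means that the estimated probability of being in the High category decreases as weekly study hours increase, which is the same direction of effect found by the t-test.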