Pls answer all three parts for UPVOTE
Data set:
Students Outside US:
Stu ID | Age | GPA | Hrs spent on school work |
3 | 48 | 4.00 | 7 |
6 | 47 | 2.79 | 14 |
9 | 45 | 3.48 | 5 |
12 | 19 | 4.00 | 30 |
15 | 24 | 3.10 | 10 |
18 | 34 | 3.24 | 2 |
21 | 44 | 36.00 | 6 |
24 | 19 | 2.85 | 7 |
27 | 19 | 2.80 | 10 |
30 | 27 | 3.40 | 8 |
33 | 28 | 2.90 | 16 |
36 | 27 | 3.40 | 8 |
39 | 28 | 2.90 | 16 |
42 | 21 | 2.9 | 4 |
45 | 20 | 2.50 | 6 |
48 | 23 | 3.3 | 18 |
51 | 41 | 3.80 | 7 |
54 | 21 | 2.60 | 26 |
57 | 39 | | 5 |
60 | 18 | 3.10 | 12 |
63 | 28 | 3.70 | 20 |
66 | 35 | | 8 |
69 | 37 | 2.80 | 6 |
72 | 21 | N/A | 21 |
75 | 20 | 3.00 | 4 |
78 | 30 | 3.50 | 6 |
81 | 21 | 3.1 | 4 |
84 | 21 | 3.20 | 3 |
87 | 37 | 2.86 | 3 |
90 | 19 | 3.30 | 12 |
Students in US:
Stu ID | Age | GPA | Hrs spent on school work |
175 | 20 | 3.20 | 10 |
178 | 20 | 2.40 | 12 |
181 | 17 | 3.98 | 6 |
184 | 20 | 3.00 | 15 |
187 | 27 | 2.20 | 6 |
190 | 19 | 3.00 | 7 |
193 | | 3.10 | 15 |
196 | 20 | | 6 |
199 | 43 | 3.67 | 12 |
202 | 44 | 3.80 | 10 |
205 | 26 | 3.80 | 4 |
208 | 25 | 2.50 | 5 |
211 | 21 | 3.72 | 10 |
214 | 29 | 2.54 | 4 |
217 | 33 | 3.85 | 21 |
220 | 18 | 3.00 | 5 |
223 | 21 | 3.00 | 6 |
226 | 19 | 3.00 | 5 |
229 | 26 | 3.00 | 4 |
232 | 19 | 2.81 | 4 |
235 | 19 | 3.00 | 6 |
238 | 28 | 3.50 | 10 |
241 | 19 | 4.00 | 12 |
244 | 20 | 3.20 | 2 |
247 | 21 | | 9 |
250 | 20 | 2.51 | 3 |
253 | 20 | 3.20 | 5 |
256 | 23 | | 2 |
259 | 20 | 3.10 | 27 |
262 | 26 | 2.50 | 8 |
a) Construct a 5-number summary and boxplot using the variable “hours spent on school work at home” for both groups.
b) Compare the means for both groups to answer the research questions in the first paragraph. Which group has a higher mean GPA? Which group spends more time on their homework? What conclusions can you draw about students who were born in the USA and those who were born outside the USA based on this analysis?
c) Identify any outliers in both groups
a. For students born outside the USA (the hours below are the Hrs column of the "Students Outside US" table)
The minimum is the smallest value in a data set.
Ordering the data from least to greatest, we get:
2 3 3 4 4 4 5 5 6 6 6 6 7 7 7 8 8 8 10 10 12 12 14 16 16 18 20 21 26 30
So, the minimum is 2.
The first quartile (or lower quartile or 25th percentile) is the median of the bottom half of the numbers. So, to find the first quartile, we need to place the numbers in value order and find the bottom half.
2 3 3 4 4 4 5 5 6 6 6 6 7 7 7 8 8 8 10 10 12 12 14 16 16 18 20 21 26 30
So, the bottom half is
2 3 3 4 4 4 5 5 6 6 6 6 7 7 7
The median of these numbers is 5.
The median is the middle number in a sorted list of numbers. So, to find the median, we need to place the numbers in value order and find the middle number.
Ordering the data from least to greatest, we get:
2 3 3 4 4 4 5 5 6 6 6 6 7 7 7 8 8 8 10 10 12 12 14 16 16 18 20 21 26 30
There are 30 values, so there is no single middle number. The middle pair is the 15th and 16th values (7 and 8), and the median is the average of these two numbers:
Median = (7 + 8) / 2 = 7.5
The third quartile (or upper quartile or 75th percentile) is the median of the upper half of the numbers. So, to find the third quartile, we need to place the numbers in value order and find the upper half.
2 3 3 4 4 4 5 5 6 6 6 6 7 7 7 8 8 8 10 10 12 12 14 16 16 18 20 21 26 30
So, the upper half is
8 8 8 10 10 12 12 14 16 16 18 20 21 26 30
The median of these numbers is 14.
The maximum is the greatest value in a data set.
Ordering the data from least to greatest, we get:
2 3 3 4 4 4 5 5 6 6 6 6 7 7 7 8 8 8 10 10 12 12 14 16 16 18 20 21 26 30
So, the maximum is 30.
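The steps above can be checked with a short Python sketch. It assumes the same convention used in this answer (Q1 and Q3 are the medians of the bottom and top halves of the sorted data); the function name `five_number_summary` is only an illustrative choice, and other software may use a different quartile rule and report slightly different Q1 and Q3 values.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]               # bottom half of the sorted data
    upper = data[len(data) - half:]   # top half (middle value left out when n is odd)
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Sorted hours from the list above
hours = [2, 3, 3, 4, 4, 4, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8, 8, 8,
         10, 10, 12, 12, 14, 16, 16, 18, 20, 21, 26, 30]

summary = five_number_summary(hours)
print(summary)  # (2, 5, 7.5, 14, 30)
```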
For students born in the USA (the hours below are the Hrs column of the "Students in US" table)
The minimum is the smallest value in a data set.
Ordering the data from least to greatest, we get:
2 2 3 4 4 4 4 5 5 5 5 6 6 6 6 6 7 8 9 10 10 10 10 12 12 12 15 15 21 27
So, the minimum is 2.
The first quartile (or lower quartile or 25th percentile) is the median of the bottom half of the numbers. So, to find the first quartile, we need to place the numbers in value order and find the bottom half.
2 2 3 4 4 4 4 5 5 5 5 6 6 6 6 6 7 8 9 10 10 10 10 12 12 12 15 15 21 27
So, the bottom half is
2 2 3 4 4 4 4 5 5 5 5 6 6 6 6
The median of these numbers is 5.
The median is the middle number in a sorted list of numbers. So, to find the median, we need to place the numbers in value order and find the middle number.
Ordering the data from least to greatest, we get:
2 2 3 4 4 4 4 5 5 5 5 6 6 6 6 6 7 8 9 10 10 10 10 12 12 12 15 15 21 27
There are 30 values, so there is no single middle number. The middle pair is the 15th and 16th values (6 and 6), and the median is the average of these two numbers:
Median = (6 + 6) / 2 = 6
The third quartile (or upper quartile or 75th percentile) is the median of the upper half of the numbers. So, to find the third quartile, we need to place the numbers in value order and find the upper half.
2 2 3 4 4 4 4 5 5 5 5 6 6 6 6 6 7 8 9 10 10 10 10 12 12 12 15 15 21 27
So, the upper half is
6 7 8 9 10 10 10 10 12 12 12 15 15 21 27
The median of these numbers is 10.
The maximum is the greatest value in a data set.
Ordering the data from least to greatest, we get:
2 2 3 4 4 4 4 5 5 5 5 6 6 6 6 6 7 8 9 10 10 10 10 12 12 12 15 15 21 27
So, the maximum is 27.
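Part a also asks for a boxplot. One possible way to draw it, sketched here under assumptions: the quartiles are computed with the same median-of-halves rule as above, the whiskers run from the minimum to the maximum (outliers are not marked separately), and matplotlib's `Axes.bxp` is used because it accepts precomputed statistics. The names `box_stats`, `draw_boxplots`, `group_a`, `group_b` and the output file name are hypothetical.

```python
from statistics import median

def box_stats(label, values):
    """Build one stats dict in the layout that matplotlib's Axes.bxp accepts."""
    data = sorted(values)
    half = len(data) // 2
    return {
        "label": label,
        "whislo": data[0],                       # minimum
        "q1": median(data[:half]),               # median of the bottom half
        "med": median(data),
        "q3": median(data[len(data) - half:]),   # median of the top half
        "whishi": data[-1],                      # maximum
        "fliers": [],                            # outliers are not marked in this sketch
    }

# group_a: first sorted list in part a; group_b: second sorted list in part a
group_a = [2, 3, 3, 4, 4, 4, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8, 8, 8,
           10, 10, 12, 12, 14, 16, 16, 18, 20, 21, 26, 30]
group_b = [2, 2, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 6, 7, 8,
           9, 10, 10, 10, 10, 12, 12, 12, 15, 15, 21, 27]

stats = [box_stats("First list", group_a), box_stats("Second list", group_b)]

def draw_boxplots(stats, path="hours_boxplot.png"):
    """Draw side-by-side boxplots from precomputed stats and save them to a file."""
    import matplotlib.pyplot as plt  # imported here so the stats can be built without matplotlib
    fig, ax = plt.subplots()
    ax.bxp(stats, showfliers=False)
    ax.set_ylabel("Hours spent on school work at home")
    fig.savefig(path)
```

Calling `draw_boxplots(stats)` writes the figure to `hours_boxplot.png` and requires matplotlib to be installed.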
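b. Mean hours spent on school work, computed from the two lists in part a.
For students born outside the USA (first list)
Mean = 304 / 30 ≈ 10.13
For students born in the USA (second list)
Mean = 251 / 30 ≈ 8.37
The group born outside the USA has the higher mean.
So students born outside the USA spend more time on their homework, about 1.8 hours more on average.
We conclude that, in this sample, students born outside the USA spend more time on their homework than students born in the USA.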
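The mean hours for the two lists in part a can be reproduced with a few lines of Python. This is only a sketch: `group_a` and `group_b` are hypothetical names for the first and second sorted lists, copied as they appear in part a.

```python
from statistics import mean

# group_a: first sorted list in part a; group_b: second sorted list in part a
group_a = [2, 3, 3, 4, 4, 4, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8, 8, 8,
           10, 10, 12, 12, 14, 16, 16, 18, 20, 21, 26, 30]
group_b = [2, 2, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 6, 7, 8,
           9, 10, 10, 10, 10, 12, 12, 12, 15, 15, 21, 27]

mean_a = mean(group_a)   # 304 / 30
mean_b = mean(group_b)   # 251 / 30

print(round(mean_a, 2), round(mean_b, 2))  # 10.13 8.37
```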
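c. An outlier is any data point that lies more than 1.5 IQRs below the first quartile (Q1) or above the third quartile (Q3) of a data set, where IQR = Q3 - Q1.
High = Q3 + 1.5 IQR
Low = Q1 - 1.5 IQR
For students born outside the USA (Q1 = 5, Q3 = 14)
High = 14 + 1.5(14 - 5) = 27.5
Low = 5 - 1.5(9) = -8.5
Hence 30 is the only outlier for students born outside the USA.
For students born in the USA (Q1 = 5, Q3 = 10)
High = 10 + 1.5(10 - 5) = 17.5
Low = 5 - 1.5(5) = -2.5
So 21 and 27 are outliers for students born in the USA.
Both low fences are negative and hours cannot be negative, so neither group has an outlier on the low side.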
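The 1.5 IQR rule can be applied to both lists from part a with a small Python sketch. It assumes the median-of-halves quartiles used in this answer; `iqr_outliers`, `group_a` and `group_b` are hypothetical names, with `group_a` the first sorted list in part a and `group_b` the second.

```python
from statistics import median

def iqr_outliers(values, k=1.5):
    """Return (low fence, high fence, outliers) using the 1.5 * IQR rule."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])                 # median of the bottom half
    q3 = median(data[len(data) - half:])     # median of the top half
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return low, high, [x for x in data if x < low or x > high]

# group_a: first sorted list in part a; group_b: second sorted list in part a
group_a = [2, 3, 3, 4, 4, 4, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8, 8, 8,
           10, 10, 12, 12, 14, 16, 16, 18, 20, 21, 26, 30]
group_b = [2, 2, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 6, 7, 8,
           9, 10, 10, 10, 10, 12, 12, 12, 15, 15, 21, 27]

result_a = iqr_outliers(group_a)
result_b = iqr_outliers(group_b)
print(result_a)  # (-8.5, 27.5, [30])
print(result_b)  # (-2.5, 17.5, [21, 27])
```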