–Construct a reasonable frequency distribution of High School GPA (HSGPA)
–Construct a histogram
–Present the frequency distribution and histogram
There are a total of 196 HS student GPAs. 1.6, 2, 2.1, 2.1, 2.2, 2.2, 2.2, 2.4, 2.4, 2.5, 2.5, 2.5, 2.5, 2.5, 2.6, 2.7, 2.75, 2.75, 2.75, 2.75, 2.75, 2.8, 2.8, 2.8, 2.9, 2.9, 2.9, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3.1, 3.1, 3.1, 3.2, 3.2, 3.2, 3.2, 3.2, 3.2, 3.2, 3.2, 3.2, 3.2, 3.2, 3.2, 3.2, 3.2, 3.2, 3.23, 3.25, 3.25, 3.25, 3.25, 3.3, 3.3, 3.3, 3.3, 3.3, 3.31, 3.34, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.45, 3.479, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.6, 3.6, 3.6, 3.6, 3.6, 3.6, 3.6, 3.6, 3.63, 3.63, 3.64, 3.65, 3.65, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.729, 3.75, 3.75, 3.75, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.81, 3.81, 3.83, 3.9, 3.9, 3.9, 3.9, 3.9, 3.92, 3.94, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4
Problem Statement: A class has 196 students whose GPAs are available. For this dataset, a frequency distribution and a histogram need to be constructed.
NOTE: Although the question states that there are GPAs for 196 students, a close look at the data provided shows only 188 entries, so GPAs are available for only 188 students. The approach to solving this problem is the same regardless of the number of data points available (188 vs. 196).
The working below uses 188 as the number of data points. If all 196 points mentioned in the question are available, replace 188 with 196 and follow the same steps.
Given: In the problem statement we are provided with GPAs of 188 students.
Step 1: The first step in constructing a frequency distribution and histogram is to arrange the data in ascending order, as this gives a rough idea of how the data is distributed. In the dataset provided, the GPAs are already in ascending order.
Step 2: To obtain a frequency distribution for the data, we have to decide on the number of bins that would suit this dataset. Bins are class intervals arranged in ascending order, where each data point falls into exactly one interval. A key property of the bins is that they should all be the same size. In most cases, deciding on the number of bins is subjective and depends on the understanding and application of the problem statement. However, there are rules that point us toward a suitable number of bins for a dataset, and we will use one of them for this problem statement.
In this problem, let us use Sturges' rule to compute the number of bins required. Sturges' rule is one of the simple and widely used techniques for choosing the number of bins.
Sturges' rule is given by:
Number of bins (n) = 1 + 3.322 * log10(m), where m is the number of data points
Using the formula we get,
n = 1 + 3.322 * log10(188) = 8.5547.
Since the number of bins must be a whole number, we round this down and use 8 bins.
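The calculation above can be checked with a short sketch. The function name sturges_bins is only illustrative, and rounding down with math.floor is an assumption that mirrors the choice of 8 bins made in this working.

```python
import math

def sturges_bins(m: int) -> float:
    """Return the unrounded bin count suggested by Sturges' rule."""
    return 1 + 3.322 * math.log10(m)

raw = sturges_bins(188)    # roughly 8.55 for 188 data points
n_bins = math.floor(raw)   # rounded down to 8, as in this working
print(round(raw, 4), n_bins)
```

If all 196 data points were available, calling sturges_bins(196) would be the only change.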
Step 3: Once the number of bins is decided, count the data points falling in each of the equally sized class intervals, arranged in ascending order. The first bin starts at 1.6 (the lowest GPA), and the highest GPA is 4. The size of each bin is therefore (4 - 1.6)/8 = 0.3 GPA points.
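The bin boundaries that follow from this width can be listed with a small sketch. The helper name bin_edges is hypothetical, and using Fraction instead of floats is a design choice made here so that edges such as 2.2 come out exact.

```python
from fractions import Fraction

def bin_edges(low: str, high: str, n_bins: int) -> list[Fraction]:
    """Return n_bins + 1 equally spaced edges from low to high."""
    lo, hi = Fraction(low), Fraction(high)
    width = (hi - lo) / n_bins          # (4 - 1.6) / 8 = 3/10
    return [lo + i * width for i in range(n_bins + 1)]

edges = bin_edges("1.6", "4", 8)
print([float(e) for e in edges])
# [1.6, 1.9, 2.2, 2.5, 2.8, 3.1, 3.4, 3.7, 4.0]
```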
Counting the data points in each class interval gives the frequency distribution table. An extra column can be added for relative frequency, which is the number of data points within a class interval divided by the total number of samples.
Below is the table:
GPA | Frequency | Relative Frequency |
1.6 to 1.9 | 1 | 0.53% |
1.9 to 2.2 | 6 | 3.19% |
2.2 to 2.5 | 7 | 3.72% |
2.5 to 2.8 | 10 | 5.32% |
2.8 to 3.1 | 25 | 13.30% |
3.1 to 3.4 | 36 | 19.15% |
3.4 to 3.7 | 51 | 27.13% |
3.7 to 4 | 52 | 27.66% |
Total | 188 | 100.00% |
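The counting behind this table can be sketched as follows. The function name frequency_table is illustrative. The sketch assumes each interval includes its upper bound but not its lower bound (except the first, which also includes 1.6), and that every value lies between the first and last edge. To keep it short, the example uses only the 27 lowest GPAs from the dataset rather than all 188.

```python
from bisect import bisect_left

def frequency_table(values, edges):
    """Count values per bin; bins are right-closed, the first also includes its lower edge."""
    # Work in integer thousandths so boundary values such as 2.2 compare exactly.
    scaled_edges = [round(e * 1000) for e in edges]
    counts = [0] * (len(edges) - 1)
    for v in values:
        # bisect_left puts a value equal to an edge into the bin ending at that edge.
        idx = max(bisect_left(scaled_edges, round(v * 1000)) - 1, 0)
        counts[idx] += 1
    total = len(values)
    return [(edges[i], edges[i + 1], c, c / total) for i, c in enumerate(counts)]

edges = [1.6, 1.9, 2.2, 2.5, 2.8, 3.1, 3.4, 3.7, 4.0]
# Excerpt only: the 27 lowest GPAs in the dataset (all values below 3.0)
sample = [1.6, 2, 2.1, 2.1, 2.2, 2.2, 2.2, 2.4, 2.4, 2.5, 2.5, 2.5, 2.5, 2.5,
          2.6, 2.7, 2.75, 2.75, 2.75, 2.75, 2.75, 2.8, 2.8, 2.8, 2.9, 2.9, 2.9]
for lo, hi, count, rel in frequency_table(sample, edges):
    print(f"{lo} to {hi}: {count} ({rel:.2%})")
```

For this excerpt the first four counts are 1, 6, 7 and 10, matching the first four rows of the table. The relative frequencies differ from the table because they are divided by 27 rather than 188.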
Step 4: Once we have the frequency of data points for each class interval (bin), we can plot the histogram. A histogram is a visual representation of the frequency of data points in each class interval, with the class intervals arranged in ascending order.
Below is the histogram for above data points:
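Note: In the graph, a square bracket [ ] indicates that the boundary value is included in the interval, while a round bracket ( ) indicates that it is not included when the frequencies are counted. The table follows the same convention: each interval includes its upper bound but not its lower bound, except the first interval, which also includes 1.6.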
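As a rough stand-in for a plotted histogram, the frequencies from the table can be rendered as text bars. This is a minimal sketch: the function name text_histogram is illustrative, and a true histogram would draw adjacent vertical bars rather than rows of characters.

```python
# Labels and frequencies copied from the frequency distribution table
labels = ["1.6 to 1.9", "1.9 to 2.2", "2.2 to 2.5", "2.5 to 2.8",
          "2.8 to 3.1", "3.1 to 3.4", "3.4 to 3.7", "3.7 to 4"]
freqs = [1, 6, 7, 10, 25, 36, 51, 52]

def text_histogram(labels, freqs):
    """Return one line per bin, with one '#' per data point."""
    return [f"{label:<10} | {'#' * f} {f}" for label, f in zip(labels, freqs)]

for line in text_histogram(labels, freqs):
    print(line)
```

The bars grow steadily longer toward the higher intervals, which shows that most GPAs in this dataset fall between 3.4 and 4.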