The given stemplot shows the percentages of adults aged 18–34 who were considered minorities in each of the states and the District of Columbia.
0 | 8 8 9
1 | 1 5 5 6 7 8 9
2 | 0 2 2 3 3 3 3 3 3 6 7 7 8
3 | 0 1 1 1 4 5 7 9 9
4 | 0 1 1 1 2 5 8 8
5 | 1 1 1 2 2 3 4
6 | 1 7 7 |
7 | 5 |
(a) Make another stemplot of this data by splitting the stems, placing leaves 0 to 4 on the first stem and leaves 5 to 9 on the second stem of the same value.
(b) Create a histogram that uses class intervals that give the same pattern as the stemplot you created with the split stems. It might be a good idea to do this by hand. If you use software, you may have to adjust the settings in order to get a histogram that matches the stemplot exactly.
RStudio-RNotebook
We are given a stem-and-leaf plot of the percentages of adults aged 18–34 who were considered minorities in each of the states and the District of Columbia.
```{r}
df1 <- data.frame(
Stem = c(0,1,2,3,4,5,6,7),
Leaf = c('8 8 9', '1 5 5 6 7 8 9', '0 2 2 3 3 3 3 3 3 6 7 7 8', '0 1 1 1 4 5 7 9 9', '0 1 1 1 2 5 8 8', '1 1 1 2 2 3 4', '1 7 7', '5')
)
pander::pander(df1)
```
### Part a
Split each stem in two, placing leaves 0 to 4 on the first stem and leaves 5 to 9 on the second stem of the same value.
---
Example:
For old stem 2 the leaves are "0 2 2 3 3 3 3 3 3 6 7 7 8".
Leaves "0 2 2 3 3 3 3 3 3" fall in the 0-4 split.
Leaves "6 7 7 8" fall in the 5-9 split.
The frequency of each new stem is the number of leaves on it.
New stem 2(0-4) has 9 leaves.
New stem 2(5-9) has 4 leaves.
---
Applying the same split to every stem:
```{r}
# One row per half-stem. Half-stems with no leaves (5(5-9) and 7(0-4)) are kept
# with a count of 0 so that the gaps in the distribution stay visible.
df2 <- data.frame(
  Stem_old = c(0,1,1,2,2,3,3,4,4,5,5,6,6,7,7),
  Stem_new = c('0(5-9)', '1(0-4)', '1(5-9)', '2(0-4)', '2(5-9)', '3(0-4)', '3(5-9)', '4(0-4)', '4(5-9)', '5(0-4)', '5(5-9)', '6(0-4)', '6(5-9)', '7(0-4)', '7(5-9)'),
  Leaf = c('8 8 9', '1', '5 5 6 7 8 9', '0 2 2 3 3 3 3 3 3', '6 7 7 8', '0 1 1 1 4', '5 7 9 9', '0 1 1 1 2', '5 8 8', '1 1 1 2 2 3 4', '', '1', '7 7', '', '5'),
  Split = c("5-9","0-4","5-9","0-4","5-9","0-4","5-9","0-4","5-9","0-4","5-9","0-4","5-9","0-4","5-9"),
  Freq = c(3, 1, 6, 9, 4, 5, 4, 5, 3, 7, 0, 1, 2, 0, 1)  # numeric, so the counts can be used as bar heights
)
pander::pander(df2)
```
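As a cross-check on the hand-built table, the split can also be computed directly from the leaves. The chunk below is a minimal sketch rather than part of the original solution: the names `leaves`, `pct`, `half`, `split_label` and `split_counts` are illustrative, and the half-stems are labelled `0-4` and `5-9`.

```{r}
# Leaves for each stem, transcribed from the stemplot in the question.
leaves <- list(
  `0` = c(8, 8, 9),
  `1` = c(1, 5, 5, 6, 7, 8, 9),
  `2` = c(0, 2, 2, 3, 3, 3, 3, 3, 3, 6, 7, 7, 8),
  `3` = c(0, 1, 1, 1, 4, 5, 7, 9, 9),
  `4` = c(0, 1, 1, 1, 2, 5, 8, 8),
  `5` = c(1, 1, 1, 2, 2, 3, 4),
  `6` = c(1, 7, 7),
  `7` = c(5)
)

# Rebuild the percentages: value = 10 * stem + leaf.
pct <- unlist(lapply(names(leaves), function(s) 10 * as.numeric(s) + leaves[[s]]))

# Assign each value to a half-stem: leaves 0-4 go to the first half, 5-9 to the second.
stem_part <- pct %/% 10
half <- ifelse(pct %% 10 <= 4, "0-4", "5-9")
split_label <- factor(
  paste0(stem_part, "(", half, ")"),
  levels = paste0(rep(0:7, each = 2), "(", c("0-4", "5-9"), ")")
)

# Number of leaves on each half-stem; empty half-stems appear with a count of 0.
split_counts <- table(split_label)
print(split_counts)
```

Using a factor with all sixteen half-stem levels is a deliberate choice here: `table()` then reports the empty half-stems as zeros instead of silently dropping them.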
---
### Part b
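Plot the histogram with the new stems on the x-axis and the number of leaves on each new stem on the y-axis. Each new stem corresponds to a class interval of width 5 (5–9, 10–14, and so on up to 75–79), so the bars reproduce the pattern of the split stemplot.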
```{r, warning=FALSE}
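library(ggplot2)
# geom_col() with width = 1 draws touching bars at the given heights; as.numeric() guards against Freq being stored as text.
ggplot(df2) + theme_linedraw() +
  geom_col(aes(x = Stem_new, y = as.numeric(as.character(Freq))), width = 1) + xlab("Stem") + ylab("Count of leaves")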
```
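The same picture can also be drawn from the underlying percentages instead of from stem labels. The sketch below is one possible approach, not the original author's method: it assumes each value is rebuilt as 10 × stem + leaf, and it uses base R's `hist()` with class boundaries every 5 units and left-closed intervals (`right = FALSE`), so that the class [5, 10) holds leaves 5 to 9 of stem 0, [10, 15) holds leaves 0 to 4 of stem 1, and so on. The variable names are illustrative.

```{r}
# Stems and leaves transcribed from the stemplot in the question.
stem_values <- 0:7
leaf_strings <- c('8 8 9', '1 5 5 6 7 8 9', '0 2 2 3 3 3 3 3 3 6 7 7 8',
                  '0 1 1 1 4 5 7 9 9', '0 1 1 1 2 5 8 8', '1 1 1 2 2 3 4',
                  '1 7 7', '5')

# Rebuild the 51 percentages: value = 10 * stem + leaf.
pct_values <- unlist(Map(function(s, l) 10 * s + as.numeric(l),
                         stem_values, strsplit(leaf_strings, " ")))

# Class intervals [0, 5), [5, 10), ..., [75, 80]: one class per half-stem.
class_breaks <- seq(0, 80, by = 5)
h <- hist(pct_values, breaks = class_breaks, right = FALSE,
          xlab = "Percent minority (adults aged 18-34)",
          main = "Class width 5, matching the split stemplot")

# Counts per class, in the same order as the half-stems.
print(h$counts)
```

Setting the breaks explicitly is what makes the software output match the stemplot; with default settings `hist()` picks its own class boundaries, which need not line up with the half-stems.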