3 dplyr
Let’s work with the diamonds data set, which ships with the ggplot2 package:
library(ggplot2)
data(diamonds)
head(diamonds)
A) Calculate the average price of a diamond:
[your code here]
B) Use group_by() to group diamonds by color, then use summarise() to calculate the average price and the standard deviation in price by color:
[your code here]
C) Use group_by() to group diamonds by cut, then use summarise() to count the number of observations by cut:
[your code here]
D) Use filter() to remove observations with a depth greater than 62, then use group_by() to group diamonds by clarity, then use summarise() to find the maximum price of a diamond by clarity:
[your code here]
E) Use mutate() and log() to add a new variable called “log_price”, the logarithm of price, to the data:
[your code here]
Solution-A:
Rcode:
library(ggplot2)
library(dplyr)
diamonds %>%
  summarise(Average = mean(price))
Output:
Average
<dbl>
1 3933.
Solution-B:
Rcode:
diamonds %>%
  group_by(color) %>%
  summarise(Avg_price = mean(price),
            std_deviation = sd(price))
Output:
color Avg_price std_deviation
<ord> <dbl> <dbl>
1 D 3170. 3357.
2 E 3077. 3344.
3 F 3725. 3785.
4 G 3999. 4051.
5 H 4487. 4216.
6 I 5092. 4722.
7 J 5324. 4438.
Solution-C:
Rcode:
diamonds %>%
  group_by(cut) %>%
  summarise(counts = n())
Output:
cut counts
<ord> <int>
1 Fair 1610
2 Good 4906
3 Very Good 12082
4 Premium 13791
5 Ideal 21551
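As a side note, the same counts can likely be obtained more compactly with dplyr's count(), which groups and tallies in one step. This is only a sketch of an alternative, not the approach the exercise asks for. The object name cut_counts is an illustrative choice, and the name argument is used here to match the "counts" column above.

```r
library(ggplot2)  # provides the diamonds data set
library(dplyr)

# count() groups by cut and tallies rows in a single call;
# name = "counts" renames the default column "n"
cut_counts <- diamonds %>%
  count(cut, name = "counts")

cut_counts
```

The explicit group_by() plus summarise(n()) form remains the one to use when other summaries are computed in the same call.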
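Solution-D:
Rcode:
# Remove observations with depth greater than 62, i.e. keep depth <= 62
diamonds %>%
  filter(depth <= 62) %>%
  group_by(clarity) %>%
  summarise(max_price = max(price))
Output (as originally posted, produced with filter(diamonds, depth > 62), which keeps the rows the exercise asks to remove; rerun the code above for the corrected values):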
clarity max_price
<ord> <int>
1 I1 18531
2 SI2 18804
3 SI1 18818
4 VS2 18791
5 VS1 18500
6 VVS2 18768
7 VVS1 18777
8 IF 18552
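Solution-E:
No solution for part E was posted, so what follows is a minimal sketch of one way to do it. It assumes the natural logarithm is intended (the default for log()). The object name diamonds_log is an illustrative choice and does not come from the exercise.

```r
library(ggplot2)  # provides the diamonds data set
library(dplyr)

# Add the natural log of price as a new column called log_price;
# the result is stored in a new object so diamonds itself is unchanged
diamonds_log <- diamonds %>%
  mutate(log_price = log(price))

head(diamonds_log)
```

If a base-10 logarithm is wanted instead, log10(price) can be substituted inside mutate().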