R Language
library(tidyverse)
data(diamonds)
(a) How many diamonds have a `Very Good` cut or better?
- Note that cut is an *ordered factor* so the levels are in order.
(b) Which diamond has the highest price per carat (ppc = price / carat)? What is the value?
(c) Find the 95th percentile for diamond price.
- Try the `quantile()` function.
(d) What proportion of the diamonds have a price above the 95th percentile and have the color `D` or `J`?
(e) What proportion of diamonds with a clarity of VS2 have a Fair cut and a table below 56.1?
(f) What is the average price per carat (ppc=price / carat) for each cut?
a) Find the number of diamonds with cut Very Good, Premium and Ideal
length(which(diamonds$cut == "Very Good"))
length(which(diamonds$cut=='Premium'))
length(which(diamonds$cut=='Ideal'))
length(which(diamonds$cut == "Very Good")) + length(which(diamonds$cut == "Premium")) + length(which(diamonds$cut == "Ideal"))
Sum of all: 47424
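Because `cut` is an ordered factor, the three separate counts can likely be collapsed into a single comparison. The sketch below uses a small made-up data frame (`toy` and its values are hypothetical, not taken from `diamonds`) whose `cut` column has the same level order:

```r
# Toy data (hypothetical); the level order mirrors diamonds$cut.
cut_levels <- c("Fair", "Good", "Very Good", "Premium", "Ideal")
toy <- data.frame(
  cut = factor(c("Fair", "Ideal", "Very Good", "Good", "Premium", "Ideal"),
               levels = cut_levels, ordered = TRUE)
)

# Comparison operators follow the level order of an ordered factor,
# so one test covers Very Good, Premium and Ideal at once.
n_very_good_or_better <- sum(toy$cut >= "Very Good")
n_very_good_or_better  # 4
```

Applied to the real data, `sum(diamonds$cut >= "Very Good")` should give the same total as adding the three counts.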
b) Make a column named ppc and then find the diamond with the maximum ppc (price per carat)
diamonds$ppc<-diamonds$price/diamonds$carat
subset(diamonds, ppc == max(ppc))
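A possible alternative to `subset()` is `which.max()`, sketched here on a hypothetical four-row data frame (`toy` and its values are invented for illustration and are not rows of `diamonds`):

```r
# Toy data (hypothetical values).
toy <- data.frame(
  carat = c(0.5, 1.0, 2.0, 0.25),
  price = c(1000, 5000, 9000, 1500)
)

# Price per carat: 2000, 5000, 4500, 6000
toy$ppc <- toy$price / toy$carat

# which.max() returns the index of the first maximum value.
best <- toy[which.max(toy$ppc), ]
best$ppc  # 6000
```

Note that `which.max()` keeps only the first row when several rows tie for the maximum, whereas a filter of the form `ppc == max(ppc)` keeps all of them.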
c) Find the 95th percentile of price
price = diamonds$price
quantile(price, c(.95))
    95%
13107.1
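The `95%` label in the output is the name that `quantile()` attaches to its result. As a small illustration on made-up numbers (the vector `x` is hypothetical), the default method interpolates between neighbouring order statistics, and `names = FALSE` drops the label:

```r
# Five hypothetical prices, already sorted.
x <- c(1, 2, 3, 4, 5)

# Default method (type 7): position 1 + (n - 1) * p = 1 + 4 * 0.95 = 4.8,
# so the result lies 80% of the way from x[4] to x[5].
p95 <- quantile(x, probs = 0.95, names = FALSE)
p95  # 4.8
```

Storing the cutoff in a variable such as `p95` avoids recomputing it in later steps.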
d) Find the number of diamonds with a price greater than the 95th percentile that also have color D or J
length(which(diamonds$price > quantile(price, c(.95)) & (diamonds$color == 'D' | diamonds$color == 'J')))
[1] 437
Thus the proportion, out of all 53940 diamonds, is
437 / 53940 = 0.008101594
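The count-then-divide step can also be written as the mean of a logical vector, since `TRUE` counts as 1 and `FALSE` as 0. A minimal sketch on a hypothetical ten-row data frame (`toy` and its values are invented), using `%in%` in place of the two `==` tests:

```r
# Toy data (hypothetical values).
toy <- data.frame(
  price = c(100, 200, 300, 400, 5000, 6000, 7000, 8000, 9000, 10000),
  color = c("D", "E", "J", "F", "G", "D", "H", "J", "E", "D")
)

# 95th percentile of the toy prices (lies between 9000 and 10000).
cutoff <- quantile(toy$price, 0.95, names = FALSE)

# TRUE where the price exceeds the cutoff AND the color is D or J.
hit <- toy$price > cutoff & toy$color %in% c("D", "J")

sum(hit)   # 1   (only the last row qualifies)
mean(hit)  # 0.1 (1 of 10 rows)
```

Here `mean(hit)` divides by the number of rows in the whole data frame, which matches dividing the count by the total number of diamonds.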
e) Find the number of diamonds with cut Fair, clarity VS2 and a table value less than 56.1
> length(which(diamonds$cut == "Fair" & diamonds$clarity == "VS2" & diamonds$table < 56.1))
[1] 85
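Thus the proportion, out of all 53940 diamonds, is
85 / 53940 = 0.001575825
If the question is read as a proportion among VS2 diamonds only, the denominator would instead be the number of VS2 diamonds, sum(diamonds$clarity == "VS2").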
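The choice of denominator matters for a question phrased as "what proportion of diamonds with clarity VS2 ...". The sketch below, on a hypothetical eight-row data frame (`toy` and its values are invented), contrasts the share of all rows with the share within the VS2 rows only:

```r
# Toy data (hypothetical values).
toy <- data.frame(
  cut     = c("Fair", "Fair", "Ideal", "Fair", "Good", "Fair", "Ideal", "Good"),
  clarity = c("VS2", "VS2", "VS2", "SI1", "VS2", "VS2", "SI1", "IF"),
  table   = c(55, 57, 54, 55, 56, 56, 58, 55)
)

is_vs2 <- toy$clarity == "VS2"                   # 5 of 8 rows
target <- toy$cut == "Fair" & toy$table < 56.1   # Fair cut with a small table

# Share of ALL rows that are VS2 and meet the target: 2 / 8
overall <- mean(is_vs2 & target)

# Share of VS2 rows that meet the target: 2 / 5
conditional <- mean(target[is_vs2])

c(overall = overall, conditional = conditional)  # 0.25 and 0.4
```

Which of the two figures is wanted depends on how the question is interpreted.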
f) Subset the diamonds by cut, then take the mean of ppc within each cut
> a = diamonds[which(diamonds$cut == "Fair"),]
> b = diamonds[which(diamonds$cut == "Good"),]
> c = diamonds[which(diamonds$cut == "Very Good"),]
> d = diamonds[which(diamonds$cut == "Premium"),]
> e = diamonds[which(diamonds$cut == "Ideal"),]
[1] 4358.758
> mean(b$ppc)
[1] 3860.028
> mean(a$ppc)
[1] 3767.256
> mean(c$ppc)
[1] 4014.128
> mean(d$ppc)
[1] 4222.905
> mean(e$ppc)
[1] 3919.7
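The five manual subsets can probably be replaced by a single grouped computation. A minimal base R sketch with `tapply()`, on a hypothetical six-row data frame (`toy` and its values are invented for illustration):

```r
# Toy data (hypothetical values).
toy <- data.frame(
  cut   = c("Fair", "Ideal", "Fair", "Good", "Ideal", "Ideal"),
  carat = c(1.0, 0.5, 2.0, 1.0, 0.25, 1.0),
  price = c(3000, 2000, 10000, 3500, 1500, 5000)
)

# Price per carat: 3000, 4000, 5000, 3500, 6000, 5000
toy$ppc <- toy$price / toy$carat

# tapply() splits ppc by cut and applies mean() to each group.
mean_ppc <- tapply(toy$ppc, toy$cut, mean)
mean_ppc
# Fair: 4000, Good: 3500, Ideal: 5000
```

On the real data, `tapply(diamonds$ppc, diamonds$cut, mean)` should return all five means at once. With the tidyverse loaded, grouping with `group_by(cut)` and then calling `summarise()` is an equivalent route.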