Question

R Language

library(tidyverse)

data(diamonds)

(a) How many diamonds have a `Very Good` cut or better?

    - Note that `cut` is an *ordered factor*, so its levels are in order.

(b) Which diamond has the highest price per carat (ppc = price / carat)? What is the value?

(c) Find the 95th percentile for diamond price.

    - Try the `quantile()` function.

(d) What proportion of the diamonds with a price above the 95th percentile and have the color `D` or `J`?

(e) What proportion of diamonds with a clarity of VS2 have a Fair cut and a table below 56.1?

(f) What is the average price per carat (ppc=price / carat) for each cut?

Solutions

Expert Solution

a) Find the number of diamonds with cut Very Good, Premium and Ideal

length(which(diamonds$cut == "Very Good"))

length(which(diamonds$cut=='Premium'))

length(which(diamonds$cut=='Ideal'))

length(which(diamonds$cut == "Very Good"))+ length(which(diamonds$cut=='Premium'))+length(which(diamonds$cut=='Ideal'))

Sum of all: 47424
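Because `cut` is an ordered factor, as the hint in the question notes, the three counts can also be obtained with a single comparison. A minimal sketch, assuming the level order Fair < Good < Very Good < Premium < Ideal that ships with the data set (the variable name is illustrative):

```r
library(tidyverse)  # attaches ggplot2, which ships the diamonds data
data(diamonds)

# cut is an ordered factor, so >= keeps "Very Good", "Premium", and "Ideal".
n_very_good_or_better <- sum(diamonds$cut >= "Very Good")
n_very_good_or_better
```

This avoids listing every level by hand and keeps working if the threshold level changes.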

b) Make a column named ppc and then find the diamond with the maximum ppc (price per carat)

diamonds$ppc<-diamonds$price/diamonds$carat

subset(diamonds, ppc == max(ppc))

Output:
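To print that row together with its ppc value without modifying `diamonds` in place, a dplyr pipeline is one option. This is a sketch rather than the original answer's method; `top_ppc` is an illustrative name.

```r
library(tidyverse)
data(diamonds)

top_ppc <- diamonds %>%
  mutate(ppc = price / carat) %>%  # price per carat
  slice_max(ppc, n = 1)            # keeps every row tied for the maximum

top_ppc
```

The `ppc` column of the result is the value asked for in the question.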

c) Find the 95th percentile of price

price = diamonds$price 
quantile(price, c(.95))
    95% 
13107.1 

d) Find the number of diamonds with a price greater than the 95th percentile that also have color D or J

 length(which(diamonds$price>quantile(price,c(.95)) & (diamonds$color=='D' | diamonds$color=='J')))
[1] 437

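Dividing by the total number of diamonds (437 / 53940) gives

0.008101594

This is the share of all diamonds. If the question is read as a proportion among only the diamonds priced above the 95th percentile, divide 437 by the number of such diamonds instead.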
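Both readings of the question can be computed directly by taking the mean of a logical vector. A hedged sketch; the variable names are illustrative and not part of the original answer:

```r
library(tidyverse)
data(diamonds)

p95 <- quantile(diamonds$price, 0.95, names = FALSE)
above_p95 <- diamonds$price > p95
d_or_j <- diamonds$color %in% c("D", "J")

# Share of all diamonds that satisfy both conditions
# (count meeting both conditions divided by the number of rows).
prop_of_all <- mean(above_p95 & d_or_j)

# Share of the above-percentile diamonds that have color D or J
# (the conditional reading of the question).
prop_of_expensive <- mean(d_or_j[above_p95])

prop_of_all
prop_of_expensive
```

The mean of a logical vector is the fraction of `TRUE` values, so no explicit division is needed.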

e) Find the number of diamonds with cut Fair, clarity VS2, and a table value less than 56.1

> length(which(diamonds$cut == "Fair" & diamonds$clarity=="VS2" & diamonds$table<56.1))
[1] 85

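Dividing by the total number of diamonds (85 / 53940) gives

0.001575825

This is the share of all diamonds. The question asks about diamonds with a clarity of VS2, so for the proportion among VS2 diamonds only, divide 85 by the number of VS2 diamonds, sum(diamonds$clarity == "VS2"), instead.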
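The choice of denominator can be made explicit in code. A minimal sketch with illustrative variable names, computing the share of all diamonds and the share within VS2 side by side:

```r
library(tidyverse)
data(diamonds)

vs2 <- diamonds$clarity == "VS2"
fair_low_table <- diamonds$cut == "Fair" & diamonds$table < 56.1

# Share of all diamonds that are VS2, Fair, and have table < 56.1.
prop_of_all <- mean(vs2 & fair_low_table)

# Share of VS2 diamonds that are Fair and have table < 56.1.
prop_of_vs2 <- mean(fair_low_table[vs2])

prop_of_all
prop_of_vs2
```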

f) Subset the diamonds by cut, then take the mean ppc within each subset

> a = diamonds[which(diamonds$cut == "Fair"),]
> b = diamonds[which(diamonds$cut == "Good"),]
> c = diamonds[which(diamonds$cut == "Very Good"),]
> d = diamonds[which(diamonds$cut == "Premium"),]
> e = diamonds[which(diamonds$cut == "Ideal"),]

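> mean(a$price)  # assumed: this command is missing in the source; the value appears to be the mean price (not ppc) of Fair diamonds
[1] 4358.758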
> mean(b$ppc)
[1] 3860.028
> mean(a$ppc)
[1] 3767.256
> mean(c$ppc)
[1] 4014.128
> mean(d$ppc)
[1] 4222.905
> mean(e$ppc)
[1] 3919.7
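The five subsets can be replaced by one grouped summary. A sketch using dplyr, assuming nothing beyond the `price`, `carat`, and `cut` columns of the data set; `ppc_by_cut` is an illustrative name:

```r
library(tidyverse)
data(diamonds)

ppc_by_cut <- diamonds %>%
  mutate(ppc = price / carat) %>%  # price per carat for each diamond
  group_by(cut) %>%
  summarise(mean_ppc = mean(ppc))  # one row per cut level

ppc_by_cut
```

The result has one row per cut, in factor-level order, so it lines up with the per-subset means above.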
