Question

3 dplyr
Let’s work with the diamonds data set (it ships with the ggplot2 package):

library(ggplot2)
data(diamonds)
head(diamonds)

A) Calculate the average price of a diamond:

[your code here]

B) Use group_by() to group diamonds by color, then use summarise() to calculate the average price and the standard deviation in price by color:

[your code here]

C) Use group_by() to group diamonds by cut, then use summarise() to count the number of observations by cut:

[your code here]

D) Use filter() to remove observations with a depth greater than 62, then use group_by() to group diamonds by clarity, then use summarise() to find the maximum price of a diamond by clarity:

[your code here]

E) Use mutate() and log() to add a new variable to the data called “log_price”:

[your code here]

Solutions

Expert Solution

Solution-A:

Rcode:

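library(ggplot2)  # provides the diamonds data set
library(dplyr)    # provides %>%, summarise(), group_by(), filter()

# Average price over all diamonds
diamonds %>%
  summarise(Average = mean(price))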

Output:

  Average
    <dbl>
1   3933.

Solution-B:

Rcode:

# Mean and standard deviation of price within each color grade
diamonds %>%
  group_by(color) %>%
  summarise(Avg_price = mean(price),
            std_deviation = sd(price))

Output:

  color Avg_price std_deviation
  <ord>     <dbl>         <dbl>
1 D         3170.         3357.
2 E         3077.         3344.
3 F         3725.         3785.
4 G         3999.         4051.
5 H         4487.         4216.
6 I         5092.         4722.
7 J         5324.         4438.

Solution-C:

Rcode:

# Number of observations in each cut category
diamonds %>%
  group_by(cut) %>%
  summarise(counts = n())

Output:

  cut       counts
  <ord>      <int>
1 Fair        1610
2 Good        4906
3 Very Good  12082
4 Premium    13791
5 Ideal      21551
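
As a cross-check, dplyr's count() combines the grouping and tallying steps into one call. The exercise asks for group_by() and summarise(), so this is only an alternative sketch; cut_counts is a hypothetical name, and the count column is called n by default rather than counts.

```r
library(ggplot2)  # provides the diamonds data set
library(dplyr)

# count() groups by cut and tallies the rows in one step;
# the result column is named n unless name = is supplied
cut_counts <- diamonds %>%
  count(cut)

cut_counts
```

The per-cut totals should match the table above and sum to nrow(diamonds).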

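Solution-D:

Rcode:

# Remove diamonds with depth greater than 62, i.e. keep depth <= 62
diamonds %>%
  filter(depth <= 62) %>%
  group_by(clarity) %>%
  summarise(max_price = max(price))

Output (as posted, from a run that used filter(diamonds, depth > 62) and so kept the rows the question asks to remove; the corrected code may give different values):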

  clarity max_price
  <ord>       <int>
1 I1          18531
2 SI2         18804
3 SI1         18818
4 VS2         18791
5 VS1         18500
6 VVS2        18768
7 VVS1        18777
8 IF          18552
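
Solution-E:

The posted answer stops at part D. A minimal sketch for part E follows, assuming the natural logarithm is intended (the exercise does not name a base; log() in R defaults to base e). The name diamonds_log is hypothetical and chosen only so the original data set is left unchanged.

```r
library(ggplot2)  # provides the diamonds data set
library(dplyr)

# Add log_price as the natural log of price (assumed base);
# the result is stored under a new name, diamonds is not modified
diamonds_log <- diamonds %>%
  mutate(log_price = log(price))

head(diamonds_log)
```

If a base-10 logarithm is wanted instead, log10(price) can replace log(price) in the mutate() call.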

