Test whether the population mean price for clarity “VS1” is different from that for clarity “VVS1 or VVS2”.
Please answer with R code.
Clarity | Price |
VS2 | 1302 |
VS1 | 1510 |
VVS1 | 1510 |
VS1 | 1260 |
VS1 | 1641 |
VS1 | 1555 |
VS1 | 1427 |
VVS2 | 1427 |
VS2 | 1126 |
VS1 | 1126 |
VS1 | 1468 |
VS2 | 1202 |
VS2 | 1327 |
VS2 | 1098 |
VS1 | 1693 |
VS1 | 1551 |
VS1 | 1410 |
VS2 | 1269 |
VS1 | 1316 |
VS2 | 1222 |
VS1 | 1738 |
VS1 | 1593 |
VS1 | 1447 |
VS2 | 1255 |
VS1 | 1635 |
VVS2 | 1485 |
VS2 | 1420 |
VS1 | 1420 |
VS1 | 1911 |
VS1 | 1525 |
VS1 | 1956 |
VVS2 | 1747 |
VS1 | 1572 |
VVS2 | 2942 |
VVS2 | 2532 |
VS1 | 3501 |
VS1 | 3501 |
VVS2 | 3501 |
VS1 | 3293 |
VS1 | 3016 |
VVS2 | 3567 |
VS1 | 3205 |
VS2 | 3490 |
VS1 | 3635 |
VVS2 | 3635 |
VS1 | 3418 |
VS1 | 3921 |
VVS2 | 3701 |
VS1 | 3480 |
VVS2 | 3407 |
VS1 | 3767 |
VVS1 | 4066 |
VVS2 | 4138 |
VS1 | 3605 |
VVS2 | 3529 |
VS1 | 3667 |
VVS2 | 2892 |
VVS2 | 3651 |
VVS2 | 3773 |
VS1 | 4291 |
VVS1 | 5845 |
VVS2 | 4401 |
VVS1 | 4759 |
VVS1 | 4300 |
VS1 | 5510 |
VS1 | 5122 |
VVS2 | 5122 |
VS2 | 3861 |
VVS2 | 5881 |
VS1 | 5586 |
VS2 | 5193 |
VVS2 | 5193 |
VS2 | 5263 |
VVS2 | 5441 |
VS2 | 4948 |
VS2 | 5705 |
VS2 | 6805 |
VVS2 | 6882 |
VS1 | 6709 |
VVS2 | 6682 |
VS1 | 3501 |
VVS1 | 3432 |
VVS1 | 3851 |
IF | 3605 |
VS1 | 3900 |
VVS1 | 3415 |
IF | 4291 |
IF | 6512 |
VS1 | 5800 |
VVS1 | 6285 |
To perform the analysis in R, we first have to load the data into RStudio. There are two ways to do this: enter the data directly, or import it from an Excel/CSV file. I imported it from a CSV file and named the resulting object Clarity_data.
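The import step described above can be sketched with base R's read.csv(). This is only an illustration: the file name Clarity_data.csv and its location are assumptions (the answer does not give them), and the later str() output suggests the original session used a tibble-producing reader rather than read.csv(). To keep the sketch runnable on its own, it first writes a three-row sample file to a temporary directory.

```r
# Hypothetical file name and location; the original answer does not state them.
csv_path <- file.path(tempdir(), "Clarity_data.csv")

# Write a tiny sample (first three rows of the question's table) so the
# example runs without the real file.
writeLines(c("Clarity,Price", "VS2,1302", "VS1,1510", "VVS1,1510"), csv_path)

# Read the CSV into a data frame, keeping Clarity as character.
Clarity_data <- read.csv(csv_path, stringsAsFactors = FALSE)

# Inspect the result.
str(Clarity_data)
```

With the real file, only csv_path would need to change.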
#Printing the data
> print(Clarity_data)
Clarity Price
1 VS2 1302
2 VS1 1510
3 VVS1 1510
4 VS1 1260
5 VS1 1641
6 VS1 1555
7 VS1 1427
8 VVS2 1427
9 VS2 1126
10 VS1 1126
11 VS1 1468
12 VS2 1202
13 VS2 1327
14 VS2 1098
15 VS1 1693
16 VS1 1551
17 VS1 1410
18 VS2 1269
19 VS1 1316
20 VS2 1222
21 VS1 1738
22 VS1 1593
23 VS1 1447
24 VS2 1255
25 VS1 1635
26 VVS2 1485
27 VS2 1420
28 VS1 1420
29 VS1 1911
30 VS1 1525
31 VS1 1956
32 VVS2 1747
33 VS1 1572
34 VVS2 2942
35 VVS2 2532
36 VS1 3501
37 VS1 3501
38 VVS2 3501
39 VS1 3293
40 VS1 3016
41 VVS2 3567
42 VS1 3205
43 VS2 3490
44 VS1 3635
45 VVS2 3635
46 VS1 3418
47 VS1 3921
48 VVS2 3701
49 VS1 3480
50 VVS2 3407
51 VS1 3767
52 VVS1 4066
53 VVS2 4138
54 VS1 3605
55 VVS2 3529
56 VS1 3667
57 VVS2 2892
58 VVS2 3651
59 VVS2 3773
60 VS1 4291
61 VVS1 5845
62 VVS2 4401
63 VVS1 4759
64 VVS1 4300
65 VS1 5510
66 VS1 5122
67 VVS2 5122
68 VS2 3861
69 VVS2 5881
70 VS1 5586
71 VS2 5193
72 VVS2 5193
73 VS2 5263
74 VVS2 5441
75 VS2 4948
76 VS2 5705
77 VS2 6805
78 VVS2 6882
79 VS1 6709
80 VVS2 6682
81 VS1 3501
82 VVS1 3432
83 VVS1 3851
84 IF 3605
85 VS1 3900
86 VVS1 3415
87 IF 4291
88 IF 6512
89 VS1 5800
90 VVS1 6285
#We do a basic EDA (Exploratory Data Analysis) of Clarity Data
> summary(Clarity_data)
Clarity Price
Length:90 Min. :1098
Class :character 1st Qu.:1552
Mode :character Median :3496
Mean :3301
3rd Qu.:4291
Max. :6882
Here we get the five-number summary of Price together with its mean. Clarity is a character column, so only its length, class, and mode are reported.
> dim(Clarity_data)
[1] 90 2
Here we see the dimensions of the data: 90 rows and 2 columns.
> str(Clarity_data)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 90 obs. of 2 variables:
$ Clarity: chr "VS2" "VS1" "VVS1" "VS1" ...
$ Price : num 1302 1510 1510 1260 1641 ...
Here we see the structure of the data: Clarity is a character column and Price is numeric.
> unique(Clarity_data$Clarity)
[1] "VS2" "VS1" "VVS1" "VVS2" "IF"
Here we see the list of unique clarity grades.
> count(Clarity_data$Clarity)
x freq
1 IF 3
2 VS1 40
3 VS2 16
4 VVS1 9
5 VVS2 22
Here we observe the frequency of each clarity grade.
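The count() call above is not a base R function, and the session does not show which package supplied it (the x/freq output resembles what plyr::count() returns, but that is an assumption). A minimal base R sketch of the same frequency table, using table() on a short made-up vector of clarity grades rather than the full data:

```r
# Short illustrative vector of clarity grades (not the full data set).
clarity <- c("VS2", "VS1", "VVS1", "VS1", "IF", "VVS2", "VS1")

# table() counts how often each distinct value occurs.
freq <- table(clarity)
print(freq)

# Convert to a data frame if a tabular result is preferred.
freq_df <- as.data.frame(freq, stringsAsFactors = FALSE)
print(freq_df)
```

On the real data this would be table(Clarity_data$Clarity).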
#To test whether the population mean price for clarity “VS1” differs from that for clarity “VVS1 or VVS2”, we first compute the sample mean price of each group. The subsetting and mean() calls used below are base R, so the crunch package loaded here is not actually required for them.
> library(crunch)
> Clarity_VS1 <- Clarity_data[Clarity_data$Clarity == "VS1",]
> mean(Clarity_VS1$Price)
[1] 2829.55
We observe the mean value for VS1 as 2829.55.
> Clarity_VVS1 <- Clarity_data[Clarity_data$Clarity == "VVS1",]
> mean(Clarity_VVS1$Price)
[1] 4162.556
We observe the mean value for VVS1 as 4162.556.
> Clarity_VVS2 <- Clarity_data[Clarity_data$Clarity == "VVS2",]
> mean(Clarity_VVS2$Price)
[1] 3887.682
We observe the mean value for VVS2 as 3887.682.
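The question treats “VVS1 or VVS2” as a single group, whereas the means above are computed for each grade separately. A minimal sketch of pooling the two grades with %in%, shown on only the first eight rows of the table to keep it short (the name Clarity_VVS is introduced here for illustration):

```r
# First eight rows of the question's table, used only to keep the sketch short.
Clarity_data <- data.frame(
  Clarity = c("VS2", "VS1", "VVS1", "VS1", "VS1", "VS1", "VS1", "VVS2"),
  Price   = c(1302, 1510, 1510, 1260, 1641, 1555, 1427, 1427),
  stringsAsFactors = FALSE
)

# Keep rows whose clarity is either VVS1 or VVS2 (the pooled comparison group).
Clarity_VVS <- Clarity_data[Clarity_data$Clarity %in% c("VVS1", "VVS2"), ]

# Sample mean price of the pooled group.
pooled_mean <- mean(Clarity_VVS$Price)
print(pooled_mean)
```

On the full 90 rows the pooled group has 9 + 22 = 31 diamonds, and weighting the two group means reported above by those counts gives a pooled sample mean of about 3967.48.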
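So after computing the group means in R, we observe that the sample mean price of VS1 (2829.55) is lower than that of VVS1 (4162.556) and VVS2 (3887.682). These are sample means only: concluding that the population means differ requires a formal two-sample test, such as a t-test of VS1 against the pooled VVS1/VVS2 group, which this session does not run.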
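Comparing sample means does not by itself test the population means. A minimal sketch of such a test follows, assuming Welch's two-sample t-test (unequal variances) is appropriate and using a 5% significance level; neither choice is stated in the question. The prices are transcribed from the table, and the names vs1, vvs1, vvs2, vvs, and alpha are introduced here for illustration.

```r
# VS1 prices from the question's table (40 diamonds).
vs1 <- c(1510, 1260, 1641, 1555, 1427, 1126, 1468, 1693, 1551, 1410,
         1316, 1738, 1593, 1447, 1635, 1420, 1911, 1525, 1956, 1572,
         3501, 3501, 3293, 3016, 3205, 3635, 3418, 3921, 3480, 3767,
         3605, 3667, 4291, 5510, 5122, 5586, 6709, 3501, 3900, 5800)

# VVS1 prices (9 diamonds) and VVS2 prices (22 diamonds).
vvs1 <- c(1510, 4066, 5845, 4759, 4300, 3432, 3851, 3415, 6285)
vvs2 <- c(1427, 1485, 1747, 2942, 2532, 3501, 3567, 3635, 3701, 3407,
          4138, 3529, 2892, 3651, 3773, 4401, 5122, 5881, 5193, 5441,
          6882, 6682)

# Pool VVS1 and VVS2 into the single comparison group the question asks for.
vvs <- c(vvs1, vvs2)

# H0: mean(VS1) == mean(VVS1 or VVS2); H1: the two means differ.
# var.equal = FALSE requests Welch's test (an assumption, see lead-in).
result <- t.test(vs1, vvs, alternative = "two.sided", var.equal = FALSE)
print(result)

# Assumed significance level.
alpha <- 0.05
if (result$p.value < alpha) {
  cat("Reject H0: the population mean prices differ.\n")
} else {
  cat("Fail to reject H0: no significant difference in mean price.\n")
}
```

The printed output reports the t statistic, degrees of freedom, p-value, and a confidence interval for the difference in means; the decision follows from comparing the p-value with alpha.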