I only need practical advice on choosing the right data mining method for this problem. I am not sure which model is best.
Choose a data mining method to automate the appraisals. Use
your model to make predictions on the testing dataset. Evaluate the
predictive accuracy of your model using Root Mean Square Error (rmse) and
Correlation (cor) as metrics.
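
The two metrics named above can be sketched in plain Python as follows. This is only an illustration, assuming "cor" means the Pearson correlation between actual and predicted prices; the function names `rmse` and `pearson_cor` and the sample prices are hypothetical, not part of the assignment.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson_cor(actual, predicted):
    """Pearson correlation (assumed meaning of 'cor') between two sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_a * var_p)


# Hypothetical prices in US dollars, for illustration only.
actual_prices = [326.0, 1500.0, 4200.0, 18823.0]
predicted_prices = [400.0, 1450.0, 4000.0, 18000.0]

print("rmse:", rmse(actual_prices, predicted_prices))
print("cor:", pearson_cor(actual_prices, predicted_prices))
```

A lower rmse means the predicted prices are closer to the actual ones in dollar terms, while a cor near 1 means the predictions rank and scale with the actual prices.
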
You are hired as a data analyst for Diamonds Inc., a company that
appraises diamonds. The company would like to automate appraisals
for future shipments and you are responsible for developing the
automated appraisal method. For reference, you are given a dataset
containing 48547 diamonds with the following characteristics
(variables):
| Variable | Description |
|---|---|
| price | price in US dollars (\$326--\$18,823) |
| carat | weight of the diamond (0.2--5.01) |
| cut | quality of the cut (Fair, Good, Very Good, Premium, Ideal) |
| color | diamond color, from J (worst) to D (best) |
| clarity | a measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best)) |
| x | length in mm (0--10.74) |
| y | width in mm (0--58.9) |
| z | depth in mm (0--31.8) |
| depth | total depth percentage = z / mean(x, y) = 2 * z / (x + y) (43--79) |
| table | width of top of diamond relative to widest point (43--95) |
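
Whichever method is chosen, the ordered categories and the depth formula in the variable list can be turned into numeric inputs along these lines. This is a rough sketch under stated assumptions: the colors between J and D are assumed to run alphabetically backwards, the depth ratio is assumed to be scaled by 100 (suggested by the 43--79 range), and a dimension of 0 is assumed to mean a missing measurement. All function names and the example row are hypothetical.

```python
# Ordered levels, worst to best, as given in the variable list.
CUT_LEVELS = ["Fair", "Good", "Very Good", "Premium", "Ideal"]
# Assumption: the grades between J and D follow reverse alphabetical order.
COLOR_LEVELS = ["J", "I", "H", "G", "F", "E", "D"]
CLARITY_LEVELS = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]


def ordinal_code(value, levels):
    """Map a category to its 1-based rank (1 = worst)."""
    return levels.index(value) + 1


def depth_pct(x, y, z):
    """Total depth: 2 * z / (x + y), times 100 (scaling is an assumption)."""
    return 100 * 2 * z / (x + y)


def has_valid_dimensions(row):
    """Assumption: a zero in x, y, or z marks a missing measurement."""
    return row["x"] > 0 and row["y"] > 0 and row["z"] > 0


def encode_row(row):
    """Return the predictors of one diamond as plain numbers."""
    return {
        "carat": row["carat"],
        "cut": ordinal_code(row["cut"], CUT_LEVELS),
        "color": ordinal_code(row["color"], COLOR_LEVELS),
        "clarity": ordinal_code(row["clarity"], CLARITY_LEVELS),
        "x": row["x"],
        "y": row["y"],
        "z": row["z"],
        "depth": row["depth"],
        "table": row["table"],
    }


# Hypothetical diamond, for illustration only.
example = {
    "carat": 0.3, "cut": "Ideal", "color": "E", "clarity": "SI2",
    "x": 4.0, "y": 4.0, "z": 2.4, "depth": 60.0, "table": 55.0,
}

if has_valid_dimensions(example):
    print(encode_row(example))
    print("recomputed depth:", depth_pct(example["x"], example["y"], example["z"]))
```

Integer ranks keep the worst-to-best ordering but treat the steps between levels as equal, which is a modeling choice rather than something the dataset guarantees.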