Code I wrote in R:
#ANAND KUMAR
library(ggplot2)
library(onehot)
?diamonds
data = diamonds
data = as.data.frame(data)
View(data)
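# corrplot() needs a correlation matrix, so correlate the numeric columns only
corrplot::corrplot(cor(data[sapply(data, is.numeric)], use = "complete.obs"))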
# Model 1: price vs carat
summary(lm(price ~ carat, data = data))$r.squared
# A trailing "+" keeps the ggplot chain going; a leading "+" on a new line breaks it
ggplot(data, aes(x = carat, y = price)) + geom_point() +
  ggtitle("Scatter Diagram with estimated regression line on carat vs price") +
  xlab("carat") + ylab("price") +
  stat_smooth(method = "lm", se = FALSE)
# Model 2: price vs x
summary(lm(price ~ x, data = data))$r.squared
ggplot(data, aes(x = x, y = price)) + geom_point() +
  ggtitle("Scatter Diagram with estimated regression line on x vs price") +
  xlab("x") + ylab("price") +
  stat_smooth(method = "lm", se = FALSE)
# Model 3: price vs y
summary(lm(price ~ y, data = data))$r.squared
ggplot(data, aes(x = y, y = price)) + geom_point() +
  ggtitle("Scatter Diagram with estimated regression line on y vs price") +
  xlab("y") + ylab("price") +
  stat_smooth(method = "lm", se = FALSE)
# Model 4: price vs z
summary(lm(price ~ z, data = data))$r.squared
ggplot(data, aes(x = z, y = price)) + geom_point() +
  ggtitle("Scatter Diagram with estimated regression line on z vs price") +
  xlab("z") + ylab("price") +
  stat_smooth(method = "lm", se = FALSE)
# Model 5: price vs carat, x, y and z combined
summary(lm(price ~ z + y + x + carat, data = data))$r.squared
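summary(data$color)
# onehot() expects a data frame, not a bare factor vector, so wrap color in one
# (converted to an unordered factor here as a precaution)
color_df = data.frame(color = factor(data$color, ordered = FALSE))
encoder = onehot(color_df)
head(predict(encoder, color_df))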
inherits(data, "data.frame")
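Model 1 (price vs carat) had an R-squared value of 0.84 and the combined model (model 5) had an R-squared value of 0.85, so both fit the data reasonably well, and adding x, y and z to carat improves the fit only slightly. The depth and table columns were left out because they are significantly correlated with the x, y and z values, and correlated predictors are undesirable in a regression; depth in particular is computed from x, y and z (see ?diamonds).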
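The two points in the conclusion can be checked with a short sketch like the one below. It assumes only that ggplot2 is installed (for the diamonds data); the names d, m1 and m5 are introduced here for illustration and are not part of the original script. It prints the correlations between the excluded columns and x, y, z, then puts the R-squared and adjusted R-squared of model 1 and the combined model side by side.

```r
library(ggplot2)  # provides the diamonds dataset

d <- as.data.frame(diamonds)

# Correlation of the excluded columns (depth, table) with the size columns
round(cor(d[, c("depth", "table")], d[, c("x", "y", "z")]), 2)

# Model 1 (carat only) versus the combined model (carat, x, y, z)
m1 <- lm(price ~ carat, data = d)
m5 <- lm(price ~ carat + x + y + z, data = d)

# R-squared never decreases when predictors are added, so also look at
# adjusted R-squared, which penalises the extra terms
rbind(
  r.squared     = c(model1 = summary(m1)$r.squared,     model5 = summary(m5)$r.squared),
  adj.r.squared = c(model1 = summary(m1)$adj.r.squared, model5 = summary(m5)$adj.r.squared)
)
```

If the adjusted R-squared of the combined model is barely above that of model 1, the simpler carat-only model is probably the easier one to defend.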