2M2_IND3. Prices of diamond jewelry are based on the “4Cs” of diamonds: cut, color, clarity, and carat. A jeweler is trying to estimate the price of diamond earrings based on color, carats, and clarity. The jeweler has collected some data on 22 diamond pieces and the data is shown in Worksheet IND3. The jeweler would like to build a multiple regression model to estimate the price of the pieces based on color, carats, and clarity.
a) Prepare a scatter plot showing the relationship between the price and each of the independent variables.
b) If the jeweler wanted to build a regression model using only one independent variable to predict price, which variable should be used?
c) Why?
d) How do you use the value of Significance F in the model with only one independent variable?
e) If the jeweler wanted to build a regression model using two independent variables to predict price, which variable should be added to the variable selected in the one independent variable model?
f) Why?
g) If the jeweler wanted to build a regression model using three independent variables to predict price, which variable should be added to the variables selected for the two variable model?
h) Why?
i) Based on your best model, how should the jeweler price a diamond with a color of 2.75, a clarity of 3.00, and a weight of 0.85 carats?
j) How do you use the value of Significance F in the multiple regression model?
k) Does there appear to be any multicollinearity among the independent variables?
l) How can you tell if you have multicollinearity?
| Color | Clarity | Carats | Price | 
| 2.50 | 1.50 | 0.50 | 474.99 | 
| 3.50 | 4.00 | 0.50 | 539.99 | 
| 3.50 | 4.50 | 0.70 | 549.99 | 
| 3.00 | 3.50 | 0.75 | 523.99 | 
| 3.00 | 3.50 | 0.75 | 523.99 | 
| 3.50 | 4.00 | 0.75 | 539.99 | 
| 1.50 | 3.50 | 0.75 | 664.99 | 
| 1.50 | 2.00 | 0.75 | 699.99 | 
| 2.50 | 3.50 | 0.75 | 902.99 | 
| 2.50 | 1.50 | 0.75 | 1,128.99 | 
| 2.50 | 1.50 | 0.75 | 1,139.99 | 
| 3.00 | 2.00 | 0.75 | 1,125.00 | 
| 3.50 | 4.00 | 1.00 | 799.99 | 
| 3.50 | 4.50 | 1.00 | 899.99 | 
| 2.50 | 3.50 | 1.00 | 999.99 | 
| 3.00 | 3.50 | 1.00 | 1,082.99 | 
| 3.00 | 3.50 | 1.00 | 1,082.99 | 
| 1.50 | 3.50 | 1.00 | 1,329.99 | 
| 2.50 | 1.50 | 1.00 | 1,329.99 | 
| 1.50 | 3.50 | 1.00 | 1,399.99 | 
| 2.50 | 1.50 | 1.00 | 1,624.99 | 
| 3.50 | 3.00 | 1.00 | 1,625.00 | 
a) Prepare a scatter plot showing the relationship between the price and each of the independent variables.
Scatter plot of "Price" against "Color" in R, with the dependent variable Price on the vertical axis: plot(data$Color, data$Price)

Scatter plot of "Price" against "Clarity" in R: plot(data$Clarity, data$Price)

Scatter plot of "Price" against "Carats" in R: plot(data$Carats, data$Price)
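A minimal sketch of part a) in R, assuming the worksheet has been typed into a data frame called data (the name used in the commands in this answer) with the 22 rows of the table. The side-by-side layout set with par(mfrow = ...) is only one possible choice.

```r
# The 22 rows of Worksheet IND3, typed in from the table above.
data <- data.frame(
  Color   = c(2.5, 3.5, 3.5, 3.0, 3.0, 3.5, 1.5, 1.5, 2.5, 2.5, 2.5,
              3.0, 3.5, 3.5, 2.5, 3.0, 3.0, 1.5, 2.5, 1.5, 2.5, 3.5),
  Clarity = c(1.5, 4.0, 4.5, 3.5, 3.5, 4.0, 3.5, 2.0, 3.5, 1.5, 1.5,
              2.0, 4.0, 4.5, 3.5, 3.5, 3.5, 3.5, 1.5, 3.5, 1.5, 3.0),
  Carats  = c(0.50, 0.50, 0.70, rep(0.75, 9), rep(1.00, 10)),
  Price   = c(474.99, 539.99, 549.99, 523.99, 523.99, 539.99, 664.99,
              699.99, 902.99, 1128.99, 1139.99, 1125.00, 799.99, 899.99,
              999.99, 1082.99, 1082.99, 1329.99, 1329.99, 1399.99,
              1624.99, 1625.00)
)

# One scatter plot per independent variable, with Price on the vertical axis.
old_par <- par(mfrow = c(1, 3))
for (v in c("Color", "Clarity", "Carats")) {
  plot(data[[v]], data$Price, xlab = v, ylab = "Price",
       main = paste("Price vs", v))
}
par(old_par)  # restore the previous plotting layout
```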

b) If the jeweler wanted to build a regression model using only one independent variable to predict price, which variable should be used?
Use Clarity. The three candidate one-variable models are fitted below.
Regression model using Price as the dependent variable and Color as the independent variable.
> model1= lm(data$Price ~ data$Color, data = data)
> summary(model1)
Call:
lm(formula = data$Price ~ data$Color, data = data)
Residuals:
    Min      1Q  Median      3Q     Max 
-8.1859 -3.7140  0.4714  4.1933  9.1287 
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   7.2981     4.6340   1.575    0.131
data$Color    0.6293     1.6609   0.379    0.709
Residual standard error: 5.338 on 20 degrees of freedom
Multiple R-squared:  0.007126,  Adjusted R-squared:  -0.04252 
F-statistic: 0.1435 on 1 and 20 DF,  p-value: 0.7088
Regression model using Price as the dependent variable and Clarity as the independent variable.
> model2= lm(data$Price ~ data$Clarity, data = data)
> summary(model2)
Call:
lm(formula = data$Price ~ data$Clarity, data = data)
Residuals:
    Min      1Q  Median      3Q     Max 
-9.0396 -2.2153 -0.1832  3.3911  7.9604 
Coefficients:
             Estimate Std. Error t value Pr(>|t|)  
(Intercept)    2.0347     3.1941   0.637   0.5313  
data$Clarity   2.2871     0.9944   2.300   0.0323 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.764 on 20 degrees of freedom
Multiple R-squared:  0.2092,    Adjusted R-squared:  0.1696 
F-statistic:  5.29 on 1 and 20 DF,  p-value: 0.03234
Regression model using Price as the dependent variable and Carats as the independent variable.
> model3= lm(data$Price ~ data$Carats, data = data)
> summary(model3)
Call:
lm(formula = data$Price ~ data$Carats, data = data)
Residuals:
    Min      1Q  Median      3Q     Max 
-7.4052 -3.2623  0.0948  3.2876  9.7377 
Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   12.834      5.947   2.158   0.0433 *
data$Carats   -4.572      6.962  -0.657   0.5189  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 5.3 on 20 degrees of freedom
Multiple R-squared:  0.02111,   Adjusted R-squared:  -0.02784 
F-statistic: 0.4312 on 1 and 20 DF,  p-value: 0.5189
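c) Why?
Of the three one-variable models, the second one, with "Clarity" as the independent variable, has the highest R-squared (0.2092, against 0.007126 for Color and 0.02111 for Carats) and the lowest residual standard error (4.764). It is also the only one whose slope is significant at the 5% level (p = 0.0323).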
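d) How do you use the value of Significance F in the model with only one independent variable?
Significance F is the p-value of the overall F-test, shown on the last line of the R summary. For the Clarity model the p-value is 0.03234. Because it is smaller than 0.05, the model is statistically significant at the 5% level, so Clarity is a significant predictor of Price. With only one independent variable, this F-test is equivalent to the t-test on the slope, which is why both report the same p-value.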
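As a quick check of how Significance F relates to the rest of the output, the sketch below recomputes it in R from the rounded values printed for the Clarity model (t value 2.300, F-statistic 5.29 on 1 and 20 DF). It uses only the base functions pf() and pt(); the variable names are illustrative.

```r
# Values copied from summary(model2) above (rounded as printed).
t_value <- 2.300
f_stat  <- 5.29
df1 <- 1
df2 <- 20

# With one independent variable, F is the square of the slope's t value.
f_from_t <- t_value^2

# Significance F is the upper-tail probability of the F distribution.
p_from_f <- pf(f_stat, df1, df2, lower.tail = FALSE)

# The two-sided t-test on the slope gives the same p-value.
p_from_t <- 2 * pt(abs(t_value), df2, lower.tail = FALSE)

print(c(F = f_from_t, p_F = p_from_f, p_t = p_from_t))
```

Because the inputs are rounded, the recomputed p-value agrees with the printed 0.03234 only to about three decimal places.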
e) If the jeweler wanted to build a regression model using two independent variables to predict price, which variable should be added to the variable selected in the one independent variable model?
Regression model using Price as the dependent variable and Color + Clarity as the independent variables.
> model4= lm(data$Price ~ Color+Clarity, data = data)
> summary(model4)
Call:
lm(formula = data$Price ~ Color + Clarity, data = data)
Residuals:
    Min      1Q  Median      3Q     Max 
-8.9019 -2.0739  0.0981  3.2459  7.7171 
Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   3.5094     4.5386   0.773   0.4489  
Color        -0.7619     1.6322  -0.467   0.6460  
Clarity       2.4795     1.0949   2.265   0.0354 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.86 on 19 degrees of freedom
Multiple R-squared:  0.2182,    Adjusted R-squared:  0.1359 
F-statistic: 2.651 on 2 and 19 DF,  p-value: 0.09653
Regression model using Price as the dependent variable and Color + Carats as the independent variables.
> model5= lm(data$Price ~ Color+Carats, data = data)
> summary(model5)
Call:
lm(formula = data$Price ~ Color + Carats, data = data)
Residuals:
   Min     1Q Median     3Q    Max 
-7.550 -3.045 -0.322  3.690  9.819 
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  11.2036     7.9176   1.415    0.173
Color         0.5448     1.6930   0.322    0.751
Carats       -4.3847     7.1469  -0.614    0.547
Residual standard error: 5.423 on 19 degrees of freedom
Multiple R-squared:  0.02641,   Adjusted R-squared:  -0.07607 
F-statistic: 0.2577 on 2 and 19 DF,  p-value: 0.7755
Regression model using Price as the dependent variable and Clarity + Carats as the independent variables.
> model6= lm(data$Price ~ Clarity+Carats, data = data)
> summary(model6)
Call:
lm(formula = data$Price ~ Clarity + Carats, data = data)
Residuals:
    Min      1Q  Median      3Q     Max 
-8.0507 -2.7564 -0.6747  2.6698  8.9493 
Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)    7.085      5.843   1.212   0.2402  
Clarity        2.418      1.001   2.416   0.0259 *
Carats        -6.496      6.298  -1.031   0.3153  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.756 on 19 degrees of freedom
Multiple R-squared:  0.2511,    Adjusted R-squared:  0.1723 
F-statistic: 3.186 on 2 and 19 DF,  p-value: 0.06411
We fit three models, one for each pair of independent variables, and select the "Carats" variable to add to the "Clarity" variable.
f) Why?
Because the Clarity + Carats model has the highest R-squared (0.2511), the highest adjusted R-squared (0.1723), and the lowest residual standard error (4.756) of the three two-variable models.
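The comparison in e) and f) can also be done in one pass. The sketch below assumes the table has been typed into a data frame called data, fits the three two-variable models, and collects R-squared, adjusted R-squared, and the residual standard error for each. The helper name fit_stats is made up for this example, and the numbers it prints depend on the data that are loaded.

```r
# The 22 rows of Worksheet IND3, typed in from the table above.
data <- data.frame(
  Color   = c(2.5, 3.5, 3.5, 3.0, 3.0, 3.5, 1.5, 1.5, 2.5, 2.5, 2.5,
              3.0, 3.5, 3.5, 2.5, 3.0, 3.0, 1.5, 2.5, 1.5, 2.5, 3.5),
  Clarity = c(1.5, 4.0, 4.5, 3.5, 3.5, 4.0, 3.5, 2.0, 3.5, 1.5, 1.5,
              2.0, 4.0, 4.5, 3.5, 3.5, 3.5, 3.5, 1.5, 3.5, 1.5, 3.0),
  Carats  = c(0.50, 0.50, 0.70, rep(0.75, 9), rep(1.00, 10)),
  Price   = c(474.99, 539.99, 549.99, 523.99, 523.99, 539.99, 664.99,
              699.99, 902.99, 1128.99, 1139.99, 1125.00, 799.99, 899.99,
              999.99, 1082.99, 1082.99, 1329.99, 1329.99, 1399.99,
              1624.99, 1625.00)
)

# Hypothetical helper: fit Price on the given variables and return
# the statistics used to compare models.
fit_stats <- function(rhs) {
  s <- summary(lm(reformulate(rhs, response = "Price"), data = data))
  data.frame(model         = paste(rhs, collapse = " + "),
             r_squared     = s$r.squared,
             adj_r_squared = s$adj.r.squared,
             resid_se      = s$sigma)
}

candidates <- list(c("Color", "Clarity"),
                   c("Color", "Carats"),
                   c("Clarity", "Carats"))
comparison <- do.call(rbind, lapply(candidates, fit_stats))
print(comparison)
```

Adjusted R-squared is the safer column to compare when models differ in the number of variables, since plain R-squared never decreases when a variable is added.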
g) If the jeweler wanted to build a regression model using three independent variables to predict price, which variable should be added to the variables selected for the two-variable model?
The "Color" variable.
h) Why?
The two-variable model already contains "Carats" and "Clarity", so "Color" is the only variable left to add for the three-variable model.
i) Based on your best model, how should the jeweler price a diamond with a color of 2.75, a clarity of 3.00, and a weight of 0.85 carats?
The final model is
> model7= lm(data$Price ~ Clarity+Carats+Color, data = data)
> summary(model7)
Call:
lm(formula = data$Price ~ Clarity + Carats + Color, data = data)
Residuals:
    Min      1Q  Median      3Q     Max 
-7.7825 -2.5493 -0.4644  2.6984  8.7111 
Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)    9.481      7.095   1.336   0.1982  
Clarity        2.685      1.106   2.428   0.0259 *
Carats        -7.056      6.467  -1.091   0.2896  
Color         -1.013      1.640  -0.617   0.5447  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.836 on 18 degrees of freedom
Multiple R-squared:  0.2667,    Adjusted R-squared:  0.1444 
F-statistic: 2.182 on 3 and 18 DF,  p-value: 0.1255
So the equation of the model is
Price = 9.481 - 1.013 * Color + 2.685 * Clarity - 7.056 * Carats
Price = 9.481 - 1.013 * 2.75 + 2.685 * 3.00 - 7.056 * 0.85
Price = 8.75265
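The same arithmetic can be checked in R. This is a small sketch that uses the rounded coefficients printed in the summary rather than the fitted model object; the names coefs and new_piece are illustrative.

```r
# Coefficients copied from summary(model7) above (rounded as printed).
coefs <- c(Intercept = 9.481, Clarity = 2.685, Carats = -7.056, Color = -1.013)

# The diamond to be priced: color 2.75, clarity 3.00, 0.85 carats.
# The intercept is multiplied by 1.
new_piece <- c(Intercept = 1, Clarity = 3.00, Carats = 0.85, Color = 2.75)

# Prediction = sum of coefficient * value, matched by name.
predicted <- sum(coefs * new_piece[names(coefs)])
print(predicted)  # 8.75265
```

With the fitted model available, predict(model7, newdata = data.frame(Color = 2.75, Clarity = 3.00, Carats = 0.85)) should give the same prediction from the unrounded coefficients.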
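j) How do you use the value of Significance F in the multiple regression model?
In a multiple regression, Significance F is the p-value of the overall F-test, which tests whether all the slope coefficients are zero at the same time. If it is smaller than 0.05, the independent variables taken together are significant predictors of price. For the three-variable model the p-value is 0.1255, which is greater than 0.05, so that model as a whole is not significant at the 5% level.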
k) Does there appear to be any multicollinearity among the independent variables?
> cor(data)
              Color   Clarity      Carats       Price
Color    1.00000000 0.3763679 -0.08126966  0.08441445
Clarity  0.37636788 1.0000000  0.12648024  0.45737140
Carats  -0.08126966 0.1264802  1.00000000 -0.14527925
Price    0.08441445 0.4573714 -0.14527925  1.00000000
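No multicollinearity appears in this data set. The pairwise correlations among the independent variables are all small: the largest is 0.376, between Color and Clarity. Multicollinearity would show up in this matrix as a correlation close to 1 or -1 between two independent variables.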
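A common numerical check, in addition to reading the correlation matrix, is the variance inflation factor (VIF). The sketch below is one way to get it in base R, using the fact that the VIFs are the diagonal entries of the inverse of the correlation matrix of the independent variables. The correlations are copied from the cor(data) output; the thresholds in the comments are rules of thumb, not fixed cut-offs.

```r
# Correlations among the independent variables, copied from cor(data) above.
vars <- c("Color", "Clarity", "Carats")
R <- matrix(c( 1.00000000, 0.37636788, -0.08126966,
               0.37636788, 1.00000000,  0.12648024,
              -0.08126966, 0.12648024,  1.00000000),
            nrow = 3, byrow = TRUE, dimnames = list(vars, vars))

# VIF of each variable = matching diagonal entry of the inverse correlation matrix.
# Values near 1 indicate little collinearity; values above about 5 to 10
# are usually taken as a warning sign.
vif <- diag(solve(R))
print(round(vif, 2))
```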