The data below relates diamond carats to purchase prices.
Carat | 0.31 | 0.32 | 0.36 | 0.38 | 0.4 | 0.43 | 0.45 | 0.48 | 0.5 |
Price | 1641 | 1468 | 1685 | 1420 | 1797 | 1824 | 2043 | 2342 | 2122 |
Run a linear regression model, with the carat being the independent variable ( X ) and the purchase price being the dependent variable ( Y ).
(a) Find the estimate of the intercept for the linear regression model.
(b) Find the estimate of the slope for the linear regression model.
(c) What is the predicted price of a 0.37 carat diamond?
(d) What fraction of the dependent variable’s variation can be explained by this linear regression model?
SOLUTION:
As given in the question, carat is the independent variable (X) and purchase price is the dependent variable (Y).
The regression equation has the form
Y = a + bX
where a is the intercept and b is the slope of the regression line.
Solution(b)
The slope of the regression line is calculated as
Slope = (n*Summation(XY) - Summation(X)*Summation(Y)) / (n*Summation(X^2) - (Summation(X))^2)
where n = 9 is the number of observations.
The required sums are tabulated below; the last row gives the column totals.
Carat (X) | Price (Y) | X^2 | Y^2 | XY |
0.31 | 1641 | 0.0961 | 2692881 | 508.71 |
0.32 | 1468 | 0.1024 | 2155024 | 469.76 |
0.36 | 1685 | 0.1296 | 2839225 | 606.6 |
0.38 | 1420 | 0.1444 | 2016400 | 539.6 |
0.4 | 1797 | 0.16 | 3229209 | 718.8 |
0.43 | 1824 | 0.1849 | 3326976 | 784.32 |
0.45 | 2043 | 0.2025 | 4173849 | 919.35 |
0.48 | 2342 | 0.2304 | 5484964 | 1124.16 |
0.5 | 2122 | 0.25 | 4502884 | 1061 |
3.63 | 16342 | 1.5003 | 30421412 | 6732.3 |
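The column totals can be cross-checked with a few lines of Python. This is only a sketch: the variable names (`carats`, `prices`, `sum_xx`, and so on) are chosen here for illustration and are not part of the original solution.

```python
# Data from the question: carat (X) and purchase price (Y).
carats = [0.31, 0.32, 0.36, 0.38, 0.40, 0.43, 0.45, 0.48, 0.50]
prices = [1641, 1468, 1685, 1420, 1797, 1824, 2043, 2342, 2122]

n = len(carats)                                      # number of observations
sum_x = sum(carats)                                  # Summation(X)
sum_y = sum(prices)                                  # Summation(Y)
sum_xx = sum(x * x for x in carats)                  # Summation(X^2)
sum_yy = sum(y * y for y in prices)                  # Summation(Y^2)
sum_xy = sum(x * y for x, y in zip(carats, prices))  # Summation(XY)

# Round the floating-point sums for display only.
print(n, round(sum_x, 4), sum_y, round(sum_xx, 4), sum_yy, round(sum_xy, 2))
```

The printed values should agree with the totals 3.63, 16342, 1.5003, 30421412 and 6732.3 up to floating-point rounding.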
Slope = (9*6732.3 - 3.63*16342) / (9*1.5003 - 3.63*3.63) = 1269.24 / 0.3258 = 3895.76
Solution(a)
The intercept of the regression line is calculated as
Intercept = (Summation(Y) - Slope*Summation(X)) / n = (16342 - 3895.76*3.63) / 9 = 244.49
So the regression equation is Y = 244.49 + 3895.76*X
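As a check on the hand calculation, the same slope and intercept formulas can be written as a small Python function. This is a minimal sketch; the function name `fit_line` and the variable names are hypothetical and chosen only for this example.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line Y = a + bX."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Slope = (n*Sum(XY) - Sum(X)*Sum(Y)) / (n*Sum(X^2) - Sum(X)^2)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    # Intercept = (Sum(Y) - slope*Sum(X)) / n
    intercept = (sum_y - slope * sum_x) / n
    return intercept, slope


carats = [0.31, 0.32, 0.36, 0.38, 0.40, 0.43, 0.45, 0.48, 0.50]
prices = [1641, 1468, 1685, 1420, 1797, 1824, 2043, 2342, 2122]

intercept, slope = fit_line(carats, prices)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
```

Rounded to two decimals, the output should match the values 244.49 and 3895.76 obtained by hand.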
Solution(c)
If X = 0.37, then Y can be calculated as
Y = 244.49 + 3895.76*X = 244.49 + 3895.76*0.37 = 1685.92
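The prediction step is a single evaluation of the fitted line. The sketch below uses the rounded coefficients from the regression equation as defaults; the helper name `predict_price` is hypothetical.

```python
def predict_price(carat, intercept=244.49, slope=3895.76):
    """Predicted price from the fitted line Y = intercept + slope * X."""
    return intercept + slope * carat


predicted = predict_price(0.37)
print(f"{predicted:.2f}")
```

Because 0.37 lies inside the observed carat range (0.31 to 0.5), this is an interpolation rather than an extrapolation beyond the data.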
Solution(d)
To find the coefficient of determination, first calculate the correlation coefficient:
Correlation coefficient = (n*Summation(XY) - Summation(X)*Summation(Y)) / sqrt((n*Summation(X^2) - (Summation(X))^2) * (n*Summation(Y^2) - (Summation(Y))^2))
= (9*6732.3 - 3.63*16342) / sqrt((9*1.5003 - 3.63*3.63) * (9*30421412 - 16342*16342))
= 1269.24 / sqrt(0.3258*6731744) = 0.8570
The coefficient of determination is then
Coefficient of determination = (Correlation coefficient)^2 = (0.8570)^2 = 0.7345
So this linear regression model explains 0.7345, or about 73.45%, of the variation in the dependent variable (price).
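The correlation coefficient and the coefficient of determination can be verified the same way. The following is a sketch using only the Python standard library; the names `sxy`, `sxx`, `syy`, `r` and `r_squared` are chosen here for illustration.

```python
import math

carats = [0.31, 0.32, 0.36, 0.38, 0.40, 0.43, 0.45, 0.48, 0.50]
prices = [1641, 1468, 1685, 1420, 1797, 1824, 2043, 2342, 2122]

n = len(carats)
sum_x = sum(carats)
sum_y = sum(prices)

# The three building blocks of the correlation formula.
sxy = n * sum(x * y for x, y in zip(carats, prices)) - sum_x * sum_y
sxx = n * sum(x * x for x in carats) - sum_x ** 2
syy = n * sum(y * y for y in prices) - sum_y ** 2

r = sxy / math.sqrt(sxx * syy)   # correlation coefficient
r_squared = r ** 2               # coefficient of determination

print(f"r = {r:.4f}, r^2 = {r_squared:.4f}")
```

For simple linear regression with one predictor, squaring the correlation coefficient gives the same value as the fraction of variation explained by the fitted line, which is why the solution can take this shortcut.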