In: Statistics and Probability
5. You have been given the task of determining if the number of board feet of lumber (Volume) from a black cherry tree can be predicted from the diameter of the tree at 54 inches from the ground. You have data from 31 black cherry trees in New Zealand about their volume, height, and diameter. You need to analyze the data and write a couple of sentences discussing the questions below. Follow the steps we use in class to analyze the simple regression model. (13 points) Try to run the following charts, scatter plots, statistics, etc. to support the use of the linear regression model for this situation. (3 points) a. Scatterplot, IV vs. DV with regression line and r2 (Excel) (1 point) b. Residuals (Excel), Scatterplot, Residuals vs. IV (residual plot) (Excel) (1 point) c. Regression Statistics (b0 and b1, multiple r, r2, SST, SSR, and SSE) highlighted in your regression output (1 point) Analyze the model and answer the following questions in paragraph form. Give me a few sentences as justification using the values from your output. (10 points) a. Can you predict volume with diameter using a linear regression model? Were any violations of the assumptions of linear regression found? (2 points) b. How accurate is your model? (i.e., What is the coefficient of determination?) (2 points) c. What is the magnitude of the relationship between volume and diameter? (2 points) d. What would you predict volume to be if diameter is 17.4 inches? (2 points) e. Is there a statistically significant relationship between last volume and diameter? (2 points )
Diam | Height | Volume |
8.3 | 70 | 10.3 |
8.6 | 65 | 10.3 |
8.8 | 63 | 10.2 |
10.5 | 72 | 16.4 |
10.7 | 81 | 18.8 |
10.8 | 83 | 19.7 |
11.0 | 66 | 15.6 |
11.0 | 75 | 18.2 |
11.1 | 80 | 22.6 |
11.2 | 75 | 19.9 |
11.3 | 79 | 24.2 |
11.4 | 76 | 21.0 |
11.4 | 76 | 21.4 |
11.7 | 69 | 21.3 |
12.0 | 75 | 19.1 |
12.9 | 74 | 22.2 |
12.9 | 85 | 33.8 |
13.3 | 86 | 27.4 |
13.7 | 71 | 25.7 |
13.8 | 64 | 24.9 |
14.0 | 78 | 34.5 |
14.2 | 80 | 31.7 |
14.5 | 74 | 36.3 |
16.0 | 72 | 38.3 |
16.3 | 77 | 42.6 |
17.3 | 81 | 55.4 |
17.5 | 82 | 55.7 |
17.9 | 80 | 58.3 |
18.0 | 80 | 51.5 |
18.0 | 80 | 51.0 |
20.6 | 87 | 77.0 |
Solution:
a. Can you predict volume with diameter using a linear regression model? Were any violations of the assumptions of linear regression found?
Yes, we can predict the volume with diameter using a linear regression model, because the p-value for this regression model is given as 0.00 which is less than 5% level of significance or alpha value 0.05. There are no any violations of the assumptions of linear regression found. The regression coefficients for this regression models are statistically significant as the p-values for these coefficients are less than significance level 0.05.
b. How accurate is your model? (i.e., what is the coefficient of determination?
The value for the R square or coefficient of determination is given as 0.9353, which means about 93.53% of the variation in the dependent variable volume is explained by the independent variable diameter. So, we can say that given regression model is 93.53% accurate.
c. What is the magnitude of the relationship between volume and diameter?
The correlation coefficient between the dependent variable volume and independent variable diameter is given as 0.9671, this means there is a strong positive linear association or relationship or correlation exists between the given two variables volume and diameter.
d. What would you predict volume to be if diameter is 17.4 inches?
From given regression output, we are given a regression model for prediction of volume as below:
Volume = -36.9435 + 5.0659*Diameter
We are given Diameter = 17.4
Volume = -36.9435 + 5.0659*17.4
Volume = 51.20316
e. Is there a statistically significant relationship between last volume and diameter?
There is a statistically significant relationship exists between the two variables volume and diameter, because the p-value for the regression model is given as 0.00 approximately. This p-value is less than alpha value 0.05 and it is indicated that there is a statistically significant relationship exists between two variables.
Excel Outputs and Explanation:
Here, we have to analyze given data for three variables diameter, height, and volume. We have to use descriptive statistics, scatter plots, correlation coefficients, regression analysis, residual plots, etc. We have to construct the regression model for the prediction of dependent variable or response variable volume based on the independent variable diameter. Also, we have to see the regression model for the prediction of response variable volume based on the independent variables diameter and height. We have to use Excel data analysis for analyzing this data. Excel outputs for this analysis is given as below:
Descriptive statistics for the given three variables are summarised as below:
Diameter |
Height |
Volume |
|
Mean |
13.2483871 |
76 |
30.17096774 |
Standard Error |
0.563626334 |
1.144411386 |
2.952324375 |
Median |
12.9 |
76 |
24.2 |
Mode |
11 |
80 |
10.3 |
Standard Deviation |
3.138138617 |
6.371812929 |
16.43784644 |
Sample Variance |
9.847913978 |
40.6 |
270.2027957 |
Kurtosis |
-0.435421384 |
-0.45110847 |
0.772668312 |
Skewness |
0.553465175 |
-0.39420581 |
1.119260015 |
Range |
12.3 |
24 |
66.8 |
Minimum |
8.3 |
63 |
10.2 |
Maximum |
20.6 |
87 |
77 |
Sum |
410.7 |
2356 |
935.3 |
Count |
31 |
31 |
31 |
Scatter plots between different variables are given as below:
Correlation coefficients between the different variables are summarised as below:
Diameter |
Height |
Volume |
|
Diameter |
1 |
||
Height |
0.51928007 |
1 |
|
Volume |
0.96711937 |
0.59824965 |
1 |
Regression model for the prediction of dependent variable Volume based on the independent variable diameter is given as below:
SUMMARY OUTPUT |
||||||
Regression Statistics |
||||||
Multiple R |
0.967119368 |
|||||
R Square |
0.935319872 |
|||||
Adjusted R Square |
0.933089523 |
|||||
Standard Error |
4.251987522 |
|||||
Observations |
31 |
|||||
ANOVA |
||||||
df |
SS |
MS |
F |
Significance F |
||
Regression |
1 |
7581.781332 |
7581.781332 |
419.3602785 |
8.64433E-19 |
|
Residual |
29 |
524.3025387 |
18.07939788 |
|||
Total |
30 |
8106.083871 |
||||
Coefficients |
Standard Error |
t Stat |
P-value |
Lower 95% |
Upper 95% |
|
Intercept |
-36.94345912 |
3.365144948 |
-10.97826682 |
7.62145E-12 |
-43.82595322 |
-30.06096503 |
Diameter |
5.065856423 |
0.24737695 |
20.47828798 |
8.64433E-19 |
4.559913759 |
5.571799086 |
RESIDUAL OUTPUT |
||||||
Observation |
Predicted Volume |
Residuals |
||||
1 |
5.103149185 |
5.196850815 |
||||
2 |
6.622906112 |
3.677093888 |
||||
3 |
7.636077396 |
2.563922604 |
||||
4 |
16.24803332 |
0.151966685 |
||||
5 |
17.2612046 |
1.5387954 |
||||
6 |
17.76779024 |
1.932209758 |
||||
7 |
18.78096153 |
-3.180961527 |
||||
8 |
18.78096153 |
-0.580961527 |
||||
9 |
19.28754717 |
3.312452831 |
||||
10 |
19.79413281 |
0.105867189 |
||||
11 |
20.30071845 |
3.899281546 |
||||
12 |
20.8073041 |
0.192695904 |
||||
13 |
20.8073041 |
0.592695904 |
||||
14 |
22.32706102 |
-1.027061023 |
||||
15 |
23.84681795 |
-4.74681795 |
||||
16 |
28.40608873 |
-6.20608873 |
||||
17 |
28.40608873 |
5.39391127 |
||||
18 |
30.4324313 |
-3.032431299 |
||||
19 |
32.45877387 |
-6.758773868 |
||||
20 |
32.96535951 |
-8.065359511 |
||||
21 |
33.9785308 |
0.521469205 |
||||
22 |
34.99170208 |
-3.29170208 |
||||
23 |
36.51145901 |
-0.211459007 |
||||
24 |
44.11024364 |
-5.810243641 |
||||
25 |
45.63000057 |
-3.030000568 |
||||
26 |
50.69585699 |
4.704143009 |
||||
27 |
51.70902828 |
3.990971725 |
||||
28 |
53.73537084 |
4.564629156 |
||||
29 |
54.24195649 |
-2.741956487 |
||||
30 |
54.24195649 |
-3.241956487 |
||||
31 |
67.41318319 |
9.586816814 |
Residual plot for this regression model is given as below:
Regression model for the prediction of dependent variable Volume based on the independent variables diameter and height is given as below:
SUMMARY OUTPUT |
||||||
Regression Statistics |
||||||
Multiple R |
0.973627258 |
|||||
R Square |
0.947950038 |
|||||
Adjusted R Square |
0.944232183 |
|||||
Standard Error |
3.881832038 |
|||||
Observations |
31 |
|||||
ANOVA |
||||||
df |
SS |
MS |
F |
Significance F |
||
Regression |
2 |
7684.162512 |
3842.081256 |
254.9723374 |
1.07124E-18 |
|
Residual |
28 |
421.9213592 |
15.06861997 |
|||
Total |
30 |
8106.083871 |
||||
Coefficients |
Standard Error |
t Stat |
P-value |
Lower 95% |
Upper 95% |
|
Intercept |
-57.98765892 |
8.638225865 |
-6.712913024 |
2.74951E-07 |
-75.68226224 |
-40.2930556 |
Diameter |
4.708160503 |
0.264264609 |
17.81608409 |
8.2233E-17 |
4.166838997 |
5.249482009 |
Height |
0.339251234 |
0.130151181 |
2.606593597 |
0.014490974 |
0.07264863 |
0.605853839 |
RESIDUAL OUTPUT |
||||||
Observation |
Predicted Volume |
Residuals |
||||
1 |
4.837659654 |
5.462340346 |
||||
2 |
4.553851633 |
5.746148367 |
||||
3 |
4.816981266 |
5.383018734 |
||||
4 |
15.87411523 |
0.525884771 |
||||
5 |
19.86900844 |
-1.069008438 |
||||
6 |
21.01832696 |
-1.318326957 |
||||
7 |
16.19268807 |
-0.592688075 |
||||
8 |
19.24594918 |
-1.045949183 |
||||
9 |
21.4130214 |
1.186978595 |
||||
10 |
20.18758128 |
-0.287581284 |
||||
11 |
22.01540227 |
2.184597729 |
||||
12 |
21.46846462 |
-0.468464619 |
||||
13 |
21.46846462 |
-0.068464619 |
||||
14 |
20.50615413 |
0.79384587 |
||||
15 |
23.95410969 |
-4.854109686 |
||||
16 |
27.8522029 |
-5.652202905 |
||||
17 |
31.58396648 |
2.216033519 |
||||
18 |
33.80648192 |
-6.406481917 |
||||
19 |
30.6009776 |
-4.900977604 |
||||
20 |
28.69703501 |
-3.797035015 |
||||
21 |
34.38818439 |
0.111815605 |
||||
22 |
36.00831896 |
-4.308318964 |
||||
23 |
35.38525971 |
0.914740291 |
||||
24 |
41.768998 |
-3.468997996 |
||||
25 |
44.87770232 |
-2.277702318 |
||||
26 |
50.94286776 |
4.457132242 |
||||
27 |
52.22375109 |
3.476248908 |
||||
28 |
53.42851283 |
4.871487175 |
||||
29 |
53.89932888 |
-2.399328876 |
||||
30 |
53.89932888 |
-2.899328876 |
||||
31 |
68.51530482 |
8.484695177 |