3.2. A Trent Honour’s Thesis student is doing a project examining the health of a native brook trout population in Cold Creek. He has collected length, weight and girth measurements on several hundred fish but only determined the fat content in 40 of the fish. He wants to use the girth and fat content data from the 40 fish to be able to predict the fat content of the other fish. He has asked you to help him construct this predictive model.
Girth (mm) | Fat Content (%) |
90 | 7.2 |
90 | 8.5 |
87 | 7.8 |
85 | 7.5 |
84 | 8.9 |
81 | 8.7 |
76 | 7.6 |
75 | 6.2 |
75 | 8.2 |
70 | 6.9 |
70 | 7.7 |
69 | 7.6 |
67 | 8.6 |
62 | 5.6 |
61 | 5.3 |
60 | 6.7 |
59 | 7.1 |
56 | 8.0 |
53 | 6.1 |
52 | 4.4 |
51 | 5.2 |
50 | 6.5 |
50 | 7.1 |
48 | 3.8 |
46 | 5.5 |
43 | 7.8 |
41 | 6.0 |
37 | 6.8 |
36 | 5.1 |
36 | 3.9 |
33 | 2.6 |
31 | 5.4 |
29 | 4.1 |
29 | 4.8 |
27 | 2.2 |
22 | 4.7 |
22 | 6.4 |
21 | 2.7 |
20 | 2.9 |
20 | 3.0 |
A). Create a well labelled xy scatter plot on the spreadsheet page with the data; make note of the general pattern of the plotted data and whether there are any suspect data points.
B). Use the advanced regression analysis tool to conduct a complete simple linear regression analysis for the data. You will not require the options for "residuals" for this analysis. Remember to complete the five steps of hypothesis testing in the spaces provided on the worksheet.
C). Use the linear regression equation determined in part "B" to calculate a set of "predicted y values" for each observed x value.
D). Add this set of predicted y values (from part C) to the scatter plot created previously - be sure to show this new set of values as a "line" rather than as a set of "markers".
**please be sure to show all work and each of the steps for the hypothesis test so that I can learn and understand**
a)
....
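For anyone working outside the spreadsheet, the plot for part a) could be sketched roughly as below. This is only an illustration, not the worksheet's method: it assumes the third-party matplotlib package is installed, and the function name `scatter_plot` and the output file name are made up for the example.

```python
# Girth (mm) and fat content (%) for the 40 fish, copied from the question.
GIRTH = [90, 90, 87, 85, 84, 81, 76, 75, 75, 70, 70, 69, 67, 62, 61, 60,
         59, 56, 53, 52, 51, 50, 50, 48, 46, 43, 41, 37, 36, 36, 33, 31,
         29, 29, 27, 22, 22, 21, 20, 20]
FAT = [7.2, 8.5, 7.8, 7.5, 8.9, 8.7, 7.6, 6.2, 8.2, 6.9, 7.7, 7.6, 8.6,
       5.6, 5.3, 6.7, 7.1, 8.0, 6.1, 4.4, 5.2, 6.5, 7.1, 3.8, 5.5, 7.8,
       6.0, 6.8, 5.1, 3.9, 2.6, 5.4, 4.1, 4.8, 2.2, 4.7, 6.4, 2.7, 2.9, 3.0]


def scatter_plot(path="girth_vs_fat.png"):
    """Save a labelled xy scatter plot of the data (hypothetical helper)."""
    # Imported here so the data above can be used even without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(GIRTH, FAT)                 # markers only, no connecting line
    ax.set_xlabel("Girth (mm)")
    ax.set_ylabel("Fat content (%)")
    ax.set_title("Fat content vs girth, Cold Creek brook trout")
    fig.savefig(path)
    return path
```

Calling `scatter_plot()` writes the image file; the general pattern and any suspect points still have to be judged by eye from the plot.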
b)
Regression Statistics |
Multiple R | 0.7782 |
R Square | 0.6056 |
Adjusted R Square | 0.5953 |
Standard Error | 1.1936 |
Observations | 40 |
ANOVA |
Source | df | SS | MS | F | Significance F |
Regression | 1 | 83.14 | 83.14 | 58.36 | < 0.0001 |
Residual | 38 | 54.14 | 1.42 | | |
Total | 39 | 137.28 | | | |
Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
Intercept | 2.4787 | 0.501 | 4.943 | < 0.0001 | 1.4636 | 3.49 |
X (Girth) | 0.0671 | 0.009 | 7.639 | < 0.0001 | 0.0494 | 0.0849 |
(The spreadsheet displays these p-values as 0.0000 because they are smaller than the four decimal places shown.)
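If you want to check the spreadsheet output by hand, the same numbers can be reproduced with the usual least-squares formulas. The following is a minimal sketch in plain Python, not the regression tool itself; the function name and dictionary keys are chosen just for this example.

```python
from math import sqrt

# Girth (mm) and fat content (%) for the 40 fish, copied from the question.
GIRTH = [90, 90, 87, 85, 84, 81, 76, 75, 75, 70, 70, 69, 67, 62, 61, 60,
         59, 56, 53, 52, 51, 50, 50, 48, 46, 43, 41, 37, 36, 36, 33, 31,
         29, 29, 27, 22, 22, 21, 20, 20]
FAT = [7.2, 8.5, 7.8, 7.5, 8.9, 8.7, 7.6, 6.2, 8.2, 6.9, 7.7, 7.6, 8.6,
       5.6, 5.3, 6.7, 7.1, 8.0, 6.1, 4.4, 5.2, 6.5, 7.1, 3.8, 5.5, 7.8,
       6.0, 6.8, 5.1, 3.9, 2.6, 5.4, 4.1, 4.8, 2.2, 4.7, 6.4, 2.7, 2.9, 3.0]


def simple_linear_regression(x, y):
    """Least-squares fit of y = b0 + b1*x plus the ANOVA summary values."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)      # total SS
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_reg = slope * sxy                          # regression SS
    ss_res = syy - ss_reg                         # residual SS
    ms_res = ss_res / (n - 2)                     # residual MS, df = n - 2
    return {
        "slope": slope,
        "intercept": intercept,
        "sxx": sxx,
        "r_squared": ss_reg / syy,
        "std_error": sqrt(ms_res),                # "Standard Error" in the output
        "f_stat": ss_reg / ms_res,
    }


fit = simple_linear_regression(GIRTH, FAT)
for name, value in fit.items():
    print(f"{name}: {value:.4f}")
```

The printed slope, intercept, R Square, Standard Error and F should agree with the table above to the precision shown there.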
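Ho: β1 = 0 (no linear relationship between girth and fat content)
H1: β1 ≠ 0
n = 40
alpha = 0.05
estimated std error of slope = Se(b1) = Se/√Sxx = 1.1936/√18439.1 = 0.0088, where b1 is the estimated slope
t stat = (estimated slope - hypothesized slope)/std error = (b1 - 0)/Se(b1) = (0.0671 - 0)/0.0088 = 7.64 (7.639 with unrounded values)
Degrees of freedom, df = n - 2 = 38
p-value < 0.0001 (shown as 0.0000 in the output)
Decision: p-value < α, so reject Ho
Conclusion: Reject Ho and conclude that the slope is significantly different from zero, i.e. girth is a significant linear predictor of fat content.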
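The test-statistic arithmetic can be double-checked with a short sketch like the one below. It uses the rounded values quoted in this answer, and the critical value 2.024 (two-sided, α = 0.05, 38 df) is an assumption taken from a standard t table rather than something reported in the output. The function name is just for illustration.

```python
from math import sqrt


def slope_t_test(slope, std_error, sxx, n, t_critical):
    """t test of Ho: slope = 0 for simple linear regression."""
    se_slope = std_error / sqrt(sxx)       # Se(b1) = Se / sqrt(Sxx)
    t_stat = (slope - 0.0) / se_slope      # hypothesized slope is 0
    margin = t_critical * se_slope
    return {
        "se_slope": se_slope,
        "t_stat": t_stat,
        "df": n - 2,
        "reject_h0": abs(t_stat) > t_critical,
        "ci_95": (slope - margin, slope + margin),
    }


# Assumption: critical value read from a t table, not from the spreadsheet.
T_CRITICAL_38_DF = 2.024

result = slope_t_test(slope=0.0671, std_error=1.1936, sxx=18439.1,
                      n=40, t_critical=T_CRITICAL_38_DF)
print(result)
```

Because the inputs are rounded, the t statistic comes out near 7.63 rather than exactly 7.639, and the interval is close to the lower and upper 95% limits in the coefficient table. Either way |t| is far above the critical value, which matches the decision to reject Ho.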
..............
c)
Regression equation from part b): Ŷ = 2.4787 + 0.0671x (predicted values below use the unrounded coefficients and are shown to 2 decimals)
Girth x (mm) | Fat content y (%) | Predicted Ŷ (%) |
90 | 7.2 | 8.52 |
90 | 8.5 | 8.52 |
87 | 7.8 | 8.32 |
85 | 7.5 | 8.19 |
84 | 8.9 | 8.12 |
81 | 8.7 | 7.92 |
76 | 7.6 | 7.58 |
75 | 6.2 | 7.51 |
75 | 8.2 | 7.51 |
70 | 6.9 | 7.18 |
70 | 7.7 | 7.18 |
69 | 7.6 | 7.11 |
67 | 8.6 | 6.98 |
62 | 5.6 | 6.64 |
61 | 5.3 | 6.57 |
60 | 6.7 | 6.51 |
59 | 7.1 | 6.44 |
56 | 8.0 | 6.24 |
53 | 6.1 | 6.04 |
52 | 4.4 | 5.97 |
51 | 5.2 | 5.90 |
50 | 6.5 | 5.84 |
50 | 7.1 | 5.84 |
48 | 3.8 | 5.70 |
46 | 5.5 | 5.57 |
43 | 7.8 | 5.37 |
41 | 6.0 | 5.23 |
37 | 6.8 | 4.96 |
36 | 5.1 | 4.90 |
36 | 3.9 | 4.90 |
33 | 2.6 | 4.69 |
31 | 5.4 | 4.56 |
29 | 4.1 | 4.43 |
29 | 4.8 | 4.43 |
27 | 2.2 | 4.29 |
22 | 4.7 | 3.96 |
22 | 6.4 | 3.96 |
21 | 2.7 | 3.89 |
20 | 2.9 | 3.82 |
20 | 3.0 | 3.82 |
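The predicted values, and the fitted line asked for in part D, could be produced along these lines. This is a sketch under a few assumptions: it uses the coefficients rounded to four decimals, so some predictions may differ from the table by about 0.01, and `add_fitted_line` is a hypothetical helper that expects a matplotlib Axes object you have already created for the scatter plot.

```python
INTERCEPT = 2.4787   # b0 from the regression output
SLOPE = 0.0671       # b1 from the regression output (rounded)


def predict_fat(girth_mm):
    """Predicted fat content (%) for a given girth (mm)."""
    return INTERCEPT + SLOPE * girth_mm


def add_fitted_line(ax, girths):
    """Overlay predicted values on an existing Axes as a line, not markers."""
    xs = sorted(girths)
    ys = [predict_fat(x) for x in xs]
    ax.plot(xs, ys, linestyle="-", marker="None", label="Fitted line")
    ax.legend()
    return ax


# Predictions for a few of the observed girths.
for girth in (90, 50, 20):
    print(girth, round(predict_fat(girth), 2))
```

The same `predict_fat` function can then be applied to the girths of the remaining fish, as long as they fall within the observed range of about 20 to 90 mm.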