Assignment #8
Problem A: Linear Regression
1) Below is some data from the late 1800s and early 1900s indicating a correlation between the number of ministers in New England and the amount of Cuban rum imported to Boston during those years.
Year | Ministers | Cuban rum
1860 | 8376 | 63
1865 | 6406 | 48
1870 | 7005 | 53
1875 | 8486 | 64
1880 | 9595 | 72
1885 | 10643 | 80
1890 | 11265 | 85
1895 | 10071 | 76
1900 | 10547 | 80
1905 | 11008 | 83
1910 | 13885 | 105
1915 | 18559 | 140
1920 | 23024 | 175
1925 | 24185 | 183
1930 | 25434 | 192
1935 | 29238 | 221
1940 | 34705 | 262
the values of β0 and β1?
5. Find the residual for 23024 ministers.
Soln
| Ministers (X) | Cuban rum (Y) | X^2 | Y^2 | XY
| 8376 | 63 | 70157376 | 3969 | 527688
| 6406 | 48 | 41036836 | 2304 | 307488
| 7005 | 53 | 49070025 | 2809 | 371265
| 8486 | 64 | 72012196 | 4096 | 543104
| 9595 | 72 | 92064025 | 5184 | 690840
| 10643 | 80 | 1.13E+08 | 6400 | 851440
| 11265 | 85 | 1.27E+08 | 7225 | 957525
| 10071 | 76 | 1.01E+08 | 5776 | 765396
| 10547 | 80 | 1.11E+08 | 6400 | 843760
| 11008 | 83 | 1.21E+08 | 6889 | 913664
| 13885 | 105 | 1.93E+08 | 11025 | 1457925
| 18559 | 140 | 3.44E+08 | 19600 | 2598260
| 23024 | 175 | 5.3E+08 | 30625 | 4029200
| 24185 | 183 | 5.85E+08 | 33489 | 4425855
| 25434 | 192 | 6.47E+08 | 36864 | 4883328
| 29238 | 221 | 8.55E+08 | 48841 | 6461598
| 34705 | 262 | 1.2E+09 | 68644 | 9092710
Total | 262432 | 1982 | 5.26E+09 | 300140 | 39721046
Using the column totals above with n = 17 in the Pearson formula
$r = \dfrac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$
we get
Correlation Coefficient (r) = 0.99999
The positive sign of the correlation coefficient indicates a direct relationship between Ministers and Cuban rum, i.e., as one variable increases, the other also increases. The magnitude of r, very close to 1, indicates a very strong linear relationship between the two variables.
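As a cross-check on the hand computation, the following minimal sketch recomputes r directly from the raw data in the table. The function name `pearson_r` and the variable names are illustrative choices, not part of the assignment.

```python
import math

# Data copied from the table in Problem A.
ministers = [8376, 6406, 7005, 8486, 9595, 10643, 11265, 10071, 10547,
             11008, 13885, 18559, 23024, 24185, 25434, 29238, 34705]
rum = [63, 48, 53, 64, 72, 80, 85, 76, 80,
       83, 105, 140, 175, 183, 192, 221, 262]


def pearson_r(x, y):
    """Pearson correlation computed from the same sums as the worksheet table."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r(ministers, rum)
print(f"r = {r:.5f}")
```

Working from the exact integer sums avoids the rounding hidden in table entries such as 1.13E+08.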
2)
Let the regression equation be: Y = a + bX
Where
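Slope (b) = $\dfrac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}$ ≈ 0.00757
and a = $\dfrac{\sum Y}{n} - b\,\dfrac{\sum X}{n}$ ≈ -0.25
Hence,
Cuban Rum = 0.00757 * Ministers – 0.25
Where
β0 = -0.25
β1 = 0.00757 (0.01 when rounded to two decimal places)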
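The slope and intercept formulas can be checked with a short sketch that recomputes both coefficients from the raw data. The helper name `fit_line` is an assumption made for illustration.

```python
# Data copied from the table in Problem A.
ministers = [8376, 6406, 7005, 8486, 9595, 10643, 11265, 10071, 10547,
             11008, 13885, 18559, 23024, 24185, 25434, 29238, 34705]
rum = [63, 48, 53, 64, 72, 80, 85, 76, 80,
       83, 105, 140, 175, 183, 192, 221, 262]


def fit_line(x, y):
    """Return (intercept a, slope b) of the least-squares line Y = a + bX."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = sum_y / n - slope * sum_x / n
    return intercept, slope


a, b = fit_line(ministers, rum)
print(f"a = {a:.4f}, b = {b:.7f}")
```

Printing the slope with several decimal places matters here: rounded to two decimals it shows as 0.01, which is too coarse to reproduce the predictions.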
3)
Interpretation of Slope
For every 1 unit increase in Ministers, Cuban Rum increases by about 0.00757 units (0.01 when rounded to two decimal places).
4)
When Ministers = 8000
Using the regression equation, we get:
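Cuban Rum = 0.0075686 * 8000 – 0.2497 ≈ 60.30 (using the unrounded slope and intercept)
Since 8000 lies within the observed range of Ministers (6406 to 34705), this is an example of interpolation rather than extrapolation.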
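Whether a prediction counts as interpolation or extrapolation depends on whether the X value falls inside the observed range. A minimal sketch of the prediction and that range check, assuming the fitted coefficients a ≈ -0.2497 and b ≈ 0.0075686 (rounded values of the least-squares fit to this data):

```python
# Observed X values copied from the table in Problem A.
ministers = [8376, 6406, 7005, 8486, 9595, 10643, 11265, 10071, 10547,
             11008, 13885, 18559, 23024, 24185, 25434, 29238, 34705]

# Assumed: least-squares coefficients for this data, rounded.
a, b = -0.2497, 0.0075686

x_new = 8000
prediction = a + b * x_new

# A prediction is interpolation when x_new lies within the observed X range.
inside_range = min(ministers) <= x_new <= max(ministers)
kind = "interpolation" if inside_range else "extrapolation"

print(f"Predicted Cuban rum at {x_new} ministers: {prediction:.2f} ({kind})")
```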
5)
From the above scatterplot we can conclude that there is a linear relationship between the two variables
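Question 5 asks for the residual at 23024 ministers, defined as the observed value minus the value predicted by the fitted line. The worked solution does not show this calculation, so the following is only a sketch of how it could be computed from the raw data; the variable names are illustrative.

```python
# Data copied from the table in Problem A.
ministers = [8376, 6406, 7005, 8486, 9595, 10643, 11265, 10071, 10547,
             11008, 13885, 18559, 23024, 24185, 25434, 29238, 34705]
rum = [63, 48, 53, 64, 72, 80, 85, 76, 80,
       83, 105, 140, 175, 183, 192, 221, 262]

# Least-squares fit Y = a + bX from the raw sums.
n = len(ministers)
sum_x, sum_y = sum(ministers), sum(rum)
sum_xy = sum(x * y for x, y in zip(ministers, rum))
sum_x2 = sum(x * x for x in ministers)
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n

# Residual = observed Y minus predicted Y at the requested X.
x_obs = 23024
y_obs = rum[ministers.index(x_obs)]
y_hat = a + b * x_obs
residual = y_obs - y_hat

print(f"observed = {y_obs}, predicted = {y_hat:.2f}, residual = {residual:.2f}")
```

A positive residual means the observed point lies above the regression line.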
Correlation Hypothesis Test
Alpha = 0.05
df = n-2 = 17-2 = 15
Null and Alternate Hypothesis
H0: ρ = 0 (No Correlation)
Ha: ρ ≠ 0 (Correlation is present)
n = 17
We will be doing a two-tailed hypothesis test.
Test Statistic
$t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$, evaluated with n = 17 and the unrounded r (≈ 0.99999), gives t = 742.16
p-value = TDIST(742.16,17-2,2) = 1.16018E-35
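The test statistic can be recomputed from the raw data with the sketch below. Instead of a p-value, it compares |t| against the two-tailed critical value 2.131 for α = 0.05 and df = 15, taken from a standard t table; this is a swapped-in decision rule that leads to the same reject or fail-to-reject decision as the p-value approach. Because 1 - r² is tiny, the result is sensitive to rounding in r, so the code keeps r unrounded.

```python
import math

# Data copied from the table in Problem A.
ministers = [8376, 6406, 7005, 8486, 9595, 10643, 11265, 10071, 10547,
             11008, 13885, 18559, 23024, 24185, 25434, 29238, 34705]
rum = [63, 48, 53, 64, 72, 80, 85, 76, 80,
       83, 105, 140, 175, 183, 192, 221, 262]

n = len(ministers)
sum_x, sum_y = sum(ministers), sum(rum)
sum_xy = sum(x * y for x, y in zip(ministers, rum))
sum_x2 = sum(x * x for x in ministers)
sum_y2 = sum(y * y for y in rum)

# Pearson r, kept unrounded.
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

# t statistic for H0: rho = 0 with n - 2 degrees of freedom.
df = n - 2
t_stat = r * math.sqrt(df) / math.sqrt(1 - r ** 2)

# Assumed: two-tailed critical value for alpha = 0.05, df = 15 (standard t table).
T_CRITICAL = 2.131
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"df = {df}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```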
Result
Since the p-value is less than 0.05, we reject the null hypothesis.
Conclusion
We conclude that there is a statistically significant correlation between Ministers and Cuban Rum at the 5% significance level.