Assignment #8
Problem A: Linear Regression
1) Below is some data from the late 1800s and early 1900s indicating a correlation between the number of ministers in New England and the amount of Cuban rum imported to Boston during those years.
| Year | Ministers | Cuban rum |
|------|-----------|-----------|
| 1860 | 8376 | 63 |
| 1865 | 6406 | 48 |
| 1870 | 7005 | 53 |
| 1875 | 8486 | 64 |
| 1880 | 9595 | 72 |
| 1885 | 10643 | 80 |
| 1890 | 11265 | 85 |
| 1895 | 10071 | 76 |
| 1900 | 10547 | 80 |
| 1905 | 11008 | 83 |
| 1910 | 13885 | 105 |
| 1915 | 18559 | 140 |
| 1920 | 23024 | 175 |
| 1925 | 24185 | 183 |
| 1930 | 25434 | 192 |
| 1935 | 29238 | 221 |
| 1940 | 34705 | 262 |
the values of β0 and β1?
5. Find the residual for 23024 ministers.
Solution
| Ministers (X) | Cuban rum (Y) | X² | Y² | XY |
|---------------|---------------|----|----|----|
| 8376 | 63 | 70157376 | 3969 | 527688 |
| 6406 | 48 | 41036836 | 2304 | 307488 |
| 7005 | 53 | 49070025 | 2809 | 371265 |
| 8486 | 64 | 72012196 | 4096 | 543104 |
| 9595 | 72 | 92064025 | 5184 | 690840 |
| 10643 | 80 | 113273449 | 6400 | 851440 |
| 11265 | 85 | 126900225 | 7225 | 957525 |
| 10071 | 76 | 101425041 | 5776 | 765396 |
| 10547 | 80 | 111239209 | 6400 | 843760 |
| 11008 | 83 | 121176064 | 6889 | 913664 |
| 13885 | 105 | 192793225 | 11025 | 1457925 |
| 18559 | 140 | 344436481 | 19600 | 2598260 |
| 23024 | 175 | 530104576 | 30625 | 4029200 |
| 24185 | 183 | 584914225 | 33489 | 4425855 |
| 25434 | 192 | 646888356 | 36864 | 4883328 |
| 29238 | 221 | 854860644 | 48841 | 6461598 |
| 34705 | 262 | 1204437025 | 68644 | 9092710 |
| ΣX = 262432 | ΣY = 1982 | ΣX² = 5256788978 | ΣY² = 300140 | ΣXY = 39721046 |
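The column totals above can be double-checked with a short Python sketch. It re-keys the 17 (Ministers, Cuban rum) pairs from the data table; the variable names are illustrative only.

```python
# Data from the table: (ministers, Cuban rum) for 1860, 1865, ..., 1940
ministers = [8376, 6406, 7005, 8486, 9595, 10643, 11265, 10071, 10547,
             11008, 13885, 18559, 23024, 24185, 25434, 29238, 34705]
rum = [63, 48, 53, 64, 72, 80, 85, 76, 80,
       83, 105, 140, 175, 183, 192, 221, 262]

# Column totals used in the hand calculation (exact integer arithmetic)
sum_x = sum(ministers)
sum_y = sum(rum)
sum_x2 = sum(x * x for x in ministers)
sum_y2 = sum(y * y for y in rum)
sum_xy = sum(x * y for x, y in zip(ministers, rum))

print(len(ministers), sum_x, sum_y, sum_x2, sum_y2, sum_xy)
```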
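Using the formula
$r = \dfrac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}} = \dfrac{155117558}{\sqrt{20494858002 \times 1174056}}$
with n = 17 and the totals above, we get
Correlation Coefficient (r) ≈ 0.99999 (more precisely, r ≈ 0.999986)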
The positive sign of the correlation coefficient indicates a direct relationship between Ministers and Cuban rum: as one variable increases, the other also increases, and as one decreases, the other decreases. The magnitude of r, which is very close to 1, indicates a very strong linear relationship between the two variables.
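A minimal Python sketch of the correlation formula, assuming the column totals from the computation table (hard-coded so the snippet runs on its own):

```python
import math

# Column totals from the computation table (n = 17 years)
n = 17
sum_x, sum_y = 262432, 1982
sum_x2, sum_y2, sum_xy = 5256788978, 300140, 39721046

# Pearson r via the computational formula; integer parts stay exact in Python
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(f"r = {r:.6f}")
```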
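2)
Let the regression equation be: Y = a + bX
Where
Slope: $b = \dfrac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2} = \dfrac{155117558}{20494858002} \approx 0.0075686$
Intercept: $a = \dfrac{\sum Y}{n} - b\,\dfrac{\sum X}{n} = \dfrac{1982}{17} - b \cdot \dfrac{262432}{17} \approx -0.2497$
Hence,
Cuban Rum = 0.0075686 × Ministers − 0.2497
Where
β0 = a ≈ −0.25
β1 = b ≈ 0.0076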
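The same totals give the least-squares coefficients. This is a sketch; `b1` and `b0` are illustrative names for the slope β1 and the intercept β0.

```python
# Column totals from the computation table (n = 17)
n = 17
sum_x, sum_y = 262432, 1982
sum_x2, sum_xy = 5256788978, 39721046

# Least-squares slope and intercept
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
b0 = sum_y / n - b1 * sum_x / n                                 # intercept

print(f"beta1 = {b1:.7f}, beta0 = {b0:.4f}")
```

Keeping the unrounded slope matters: rounding β1 to 0.01 before predicting would change the fitted values substantially.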
3)
Interpretation of Slope
For every additional minister, the predicted amount of Cuban rum imported increases by about 0.0076 units (roughly 0.76 units per 100 additional ministers).
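4)
When Ministers = 8000
Using the regression equation, we get:
Cuban Rum = 0.0075686 × 8000 − 0.2497 ≈ 60.30
This is an example of interpolation, not extrapolation, because 8000 lies within the observed range of ministers (6406 to 34705).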
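A sketch of the prediction and the interpolation check, assuming the observed range of ministers is 6406 to 34705 (the smallest and largest values in the data table); `predict` is a hypothetical helper name.

```python
# Least-squares coefficients recomputed from the column totals (n = 17)
n = 17
sum_x, sum_y = 262432, 1982
sum_x2, sum_xy = 5256788978, 39721046
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * sum_x / n

# Smallest and largest number of ministers in the data
x_min, x_max = 6406, 34705


def predict(x):
    """Predicted Cuban rum imports for x ministers."""
    return b0 + b1 * x


x_new = 8000
y_hat = predict(x_new)
# A prediction inside the observed x-range is interpolation
kind = "interpolation" if x_min <= x_new <= x_max else "extrapolation"
print(f"predicted rum = {y_hat:.2f} ({kind})")
```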
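5)
Residual for 23024 ministers (observed Cuban rum = 175):
Predicted Cuban Rum = 0.0075686 × 23024 − 0.2497 ≈ 174.01
Residual = observed − predicted = 175 − 174.01 ≈ 0.99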
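The residual for 23024 ministers (the 1920 row, with observed Cuban rum = 175) can be checked with a similar sketch; the coefficients are recomputed from the column totals so the snippet is self-contained.

```python
# Least-squares coefficients from the column totals (n = 17)
n = 17
sum_x, sum_y = 262432, 1982
sum_x2, sum_xy = 5256788978, 39721046
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * sum_x / n

# Observed data point for 1920
x_obs, y_obs = 23024, 175
y_hat = b0 + b1 * x_obs
residual = y_obs - y_hat  # residual = observed - predicted
print(f"predicted = {y_hat:.2f}, residual = {residual:.2f}")
```

A positive residual means the observed value lies above the fitted line.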
From the scatterplot of the data, we can conclude that the relationship between the two variables is linear.
Correlation Hypothesis Test
Alpha = 0.05
df = n-2 = 17-2 = 15
Null and Alternative Hypotheses
H0: ρ = 0 (no correlation)
Ha: ρ ≠ 0 (correlation is present)
n = 17
Because Ha is two-sided, this is a two-tailed test.
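Test Statistic
$t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \dfrac{0.999986\,\sqrt{15}}{\sqrt{2.7188 \times 10^{-5}}} \approx 742.76$
Here 1 − r² is computed from the unrounded r; because it is so close to zero, rounding r to 0.99999 before squaring would noticeably change t.
p-value = TDIST(742.76, 15, 2) ≈ 1.16 × 10⁻³⁵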
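A sketch of the test statistic in Python, assuming the column totals from the computation table. Rather than computing a p-value (which would need a t-distribution library), it compares |t| with the two-tailed 5% critical value for 15 degrees of freedom, taken as 2.131 from a standard t table.

```python
import math

# Unrounded correlation from the column totals (n = 17)
n = 17
sum_x, sum_y = 262432, 1982
sum_x2, sum_y2, sum_xy = 5256788978, 300140, 39721046
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))

df = n - 2
# t statistic for H0: rho = 0
t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)

# Two-tailed critical value for alpha = 0.05, df = 15 (standard t table)
t_crit = 2.131
reject_h0 = abs(t) > t_crit
print(f"t = {t:.1f}, reject H0: {reject_h0}")
```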
Result
Since the p-value is less than 0.05, we reject the null hypothesis.
Conclusion
We conclude that there is a statistically significant correlation between Ministers and Cuban Rum (positive, since r > 0).