The following dataset contains a random sample of countries. Two variables are included: GDP per capita and infant mortality rate per 1,000 live births. Determine the equation of the best fit line and calculate the r-squared. Interpret all findings.
If you do not show your work for each part of the regression equation and r-squared, you will lose substantial points on this exercise.
Country | GDP per Capita (USD) | Infant Mortality Rate (per 1,000 live births)
Malaysia | 9766.166 | 6
Slovak Republic | 15962.57 | 5.8
Central African Republic | 306.7788 | 91.5
Cabo Verde | 3131.131 | 20.7
Denmark | 52002.15 | 2.9
Barbados | 15660.68 | 12
Uganda | 675.5735 | 37.7
Maldives | 7681.076 | 7.4
Kiribati | 1291.88 | 43.6
Palau | 13498.66 | 14.2
Solution:
Let independent variable X = GDP per Capita (USD)
Dependent variable Y = Infant Mortality Rate
The required formulas are given below:
Correlation coefficient = r = [n∑XY - ∑X∑Y]/sqrt[(n∑X^2 - (∑X)^2)*(n∑Y^2 - (∑Y)^2)]
Slope b = (∑XY - n*Xbar*Ybar)/(∑X^2 - n*Xbar^2)
Intercept a = Ybar - b*Xbar
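As an illustration, the three formulas above can be written as a small Python function. This is only a sketch: the function name `simple_linear_regression` and the toy data are assumptions made for the example and are not part of the exercise.

```python
from math import sqrt

def simple_linear_regression(xs, ys):
    """Return (r, slope, intercept) using the sum-based formulas above."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    x_bar, y_bar = sum_x / n, sum_y / n

    # Correlation coefficient r
    r = (n * sum_xy - sum_x * sum_y) / sqrt(
        (n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2)
    )
    # Slope b and intercept a of the least-squares line Y = a + b*X
    slope = (sum_xy - n * x_bar * y_bar) / (sum_xx - n * x_bar ** 2)
    intercept = y_bar - slope * x_bar
    return r, slope, intercept

# Toy check: these illustrative points lie exactly on Y = 1 + 2X
r, b, a = simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(r, b, a)
```

Because the toy points fall exactly on a line, the function should return a correlation of 1, a slope of 2, and an intercept of 1, which makes it easy to confirm that the formulas were typed correctly before applying them to real data.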
The calculation table is given below (X values are shown at full precision, as used for the X^2 and XY columns):
No. | X | Y | X^2 | Y^2 | XY
1 | 9766.166 | 6 | 95377998.34 | 36.00 | 58597.00
2 | 15962.57 | 5.8 | 254803641.00 | 33.64 | 92582.91
3 | 306.7788 | 91.5 | 94113.23 | 8372.25 | 28070.26
4 | 3131.131 | 20.7 | 9803981.34 | 428.49 | 64814.41
5 | 52002.15 | 2.9 | 2704223604.62 | 8.41 | 150806.24
6 | 15660.68 | 12 | 245256898.06 | 144.00 | 187928.16
7 | 675.5735 | 37.7 | 456399.55 | 1421.29 | 25469.12
8 | 7681.076 | 7.4 | 58998928.52 | 54.76 | 56839.96
9 | 1291.88 | 43.6 | 1668953.93 | 1900.96 | 56325.97
10 | 13498.66 | 14.2 | 182213821.80 | 201.64 | 191680.97
Total | 119976.7 | 241.8 | 3552898340.40 | 12601.44 | 913114.99
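The column totals in the calculation table can be reproduced from the raw data. The following is a minimal sketch; the variable names are chosen for illustration, and the values are copied from the data table in the question.

```python
# (GDP per capita in USD, infant mortality per 1,000 live births)
data = [
    (9766.166, 6.0),    # Malaysia
    (15962.57, 5.8),    # Slovak Republic
    (306.7788, 91.5),   # Central African Republic
    (3131.131, 20.7),   # Cabo Verde
    (52002.15, 2.9),    # Denmark
    (15660.68, 12.0),   # Barbados
    (675.5735, 37.7),   # Uganda
    (7681.076, 7.4),    # Maldives
    (1291.88, 43.6),    # Kiribati
    (13498.66, 14.2),   # Palau
]

n = len(data)
sum_x = sum(x for x, _ in data)
sum_y = sum(y for _, y in data)
sum_xx = sum(x * x for x, _ in data)
sum_yy = sum(y * y for _, y in data)
sum_xy = sum(x * y for x, y in data)

print(f"n = {n}")
print(f"sum X   = {sum_x:.1f}")
print(f"sum Y   = {sum_y:.1f}")
print(f"sum X^2 = {sum_xx:.2f}")
print(f"sum Y^2 = {sum_yy:.2f}")
print(f"sum XY  = {sum_xy:.2f}")
```

Small differences in the last digit relative to the table are possible, since the table rounds each row before totaling.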
n = 10
∑X = 119976.7
∑Y = 241.8
∑X^2 = 3552898340.40
∑Y^2 = 12601.44
∑XY = 913114.99
Xbar = ∑X / n = 119976.7 / 10 = 11997.67
Ybar = ∑Y / n = 241.8 / 10 = 24.18
r = [n∑XY - ∑X∑Y]/sqrt[(n∑X^2 - (∑X)^2)*(n∑Y^2 - (∑Y)^2)]
r = [10*913114.99 - 119976.7*241.8]/sqrt[(10*3552898340.40 - (119976.7)^2)*(10*12601.44 - (241.8)^2)]
r = [9131149.90 - 29010366.06]/sqrt[(35528983404.00 - 14394408542.89)*(126014.40 - 58467.24)]
r = -19879216.16/sqrt[21134574861.11*67547.16]
r = -19879216.16/37783336.43
r = -0.52614
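The arithmetic for r can be checked step by step from the totals listed above. This sketch uses the rounded totals exactly as they appear in the solution; the intermediate names `numerator`, `x_term`, and `y_term` are illustrative.

```python
from math import sqrt

# Totals taken from the calculation table
n = 10
sum_x, sum_y = 119976.7, 241.8
sum_xx, sum_yy, sum_xy = 3552898340.40, 12601.44, 913114.99

numerator = n * sum_xy - sum_x * sum_y   # n*sum(XY) - sum(X)*sum(Y)
x_term = n * sum_xx - sum_x ** 2         # n*sum(X^2) - (sum X)^2
y_term = n * sum_yy - sum_y ** 2         # n*sum(Y^2) - (sum Y)^2
r = numerator / sqrt(x_term * y_term)

print(f"numerator = {numerator:.2f}")
print(f"x_term    = {x_term:.2f}")
print(f"y_term    = {y_term:.2f}")
print(f"r         = {r:.5f}")
```

Printing the intermediate terms makes it easier to locate a slip if a hand calculation disagrees with the final value of r.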
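Slope b = (∑XY - n*Xbar*Ybar)/(∑X^2 - n*Xbar^2)
Slope b = (913114.99 - 10*11997.67*24.18)/(3552898340.40 - 10*11997.67^2)
Slope b = (913114.99 - 2901036.61)/(3552898340.40 - 1439440854.29)
Slope b = -1987921.62/2113457486.11
Slope b = -0.00094060, or -0.00094 rounded to five decimal places
Intercept a = Ybar - b*Xbar
Intercept a = 24.18 - (-0.00094060)*11997.67 (using the unrounded slope; the rounded value -0.00094 would give 35.4578)
Intercept a ≈ 35.46502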
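Because Xbar is large, the intercept is sensitive to how the slope is rounded. The sketch below, which uses the totals from the calculation table and illustrative variable names, computes the slope and intercept and compares the intercept obtained with the full-precision slope against the one obtained after rounding the slope to five decimal places.

```python
# Totals taken from the calculation table
n = 10
sum_x, sum_y = 119976.7, 241.8
sum_xx, sum_xy = 3552898340.40, 913114.99

x_bar, y_bar = sum_x / n, sum_y / n

# Slope and intercept of the least-squares line Y = a + b*X
b = (sum_xy - n * x_bar * y_bar) / (sum_xx - n * x_bar ** 2)
a = y_bar - b * x_bar

# Same intercept formula, but with the slope rounded first
a_rounded_slope = y_bar - round(b, 5) * x_bar

print(f"b = {b:.8f}")
print(f"a (full-precision slope) = {a:.5f}")
print(f"a (slope rounded to 5 d.p.) = {a_rounded_slope:.5f}")
```

The two intercepts differ in the second decimal place, which is why the slope should be carried at full precision until the final step.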
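The regression equation is given below:
Y = a + b*X
Y = 35.46502 - 0.00094*X
Interpretation: each additional 1,000 USD of GDP per capita is associated with a predicted infant mortality rate that is about 0.94 lower (per 1,000 live births). The intercept, about 35.47, is the predicted rate at a GDP per capita of zero, which lies outside the range of the observed data.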
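Coefficient of determination = r^2 = r*r = (-0.52614)^2 = 0.276823 = 27.68%
Interpretation: r = -0.52614 indicates a moderate negative linear relationship between GDP per capita and infant mortality rate in this sample. About 27.68% of the variation in infant mortality rate across these 10 countries is explained by the linear relationship with GDP per capita; the remaining 72.32% is not explained by this model.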
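As a cross-check, the coefficient of determination can also be computed as 1 - SSE/SST, where SSE is the sum of squared residuals around the fitted line and SST is the total sum of squares of Y around its mean. For a simple least-squares line this equals r^2. The sketch below assumes the raw data from the question and fits the line with centered sums, which is algebraically equivalent to the slope formula used in the solution; variable names are illustrative.

```python
# (GDP per capita in USD, infant mortality per 1,000 live births)
data = [
    (9766.166, 6.0), (15962.57, 5.8), (306.7788, 91.5), (3131.131, 20.7),
    (52002.15, 2.9), (15660.68, 12.0), (675.5735, 37.7), (7681.076, 7.4),
    (1291.88, 43.6), (13498.66, 14.2),
]

n = len(data)
x_bar = sum(x for x, _ in data) / n
y_bar = sum(y for _, y in data) / n

# Least-squares fit using centered sums
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in data)
s_xx = sum((x - x_bar) ** 2 for x, _ in data)
b = s_xy / s_xx
a = y_bar - b * x_bar

# Residual and total sums of squares
sse = sum((y - (a + b * x)) ** 2 for x, y in data)
sst = sum((y - y_bar) ** 2 for _, y in data)
r_squared = 1 - sse / sst

print(f"R^2 = {r_squared:.4f}")
```

Agreement between this value and the squared correlation coefficient is a useful sanity check on both the fitted line and r.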