3. The following data were collected by a social researcher to examine the relationship
between “HAPPINESS” and “INCOME PER YEAR” for workers 30 to 40 years old living on Long
Island, NY. Each worker completed a personality inventory scale that yields a HAPPINESS LEVEL
Scale Value (1 is LOW and 5 is HIGH) and reported their respective yearly INCOME.
HAPPINESS LEVEL: 3 1 3 4 3 2 5 1 3 4 2 4 5 3 4 5 4 4 5 2
YEARLY INCOME (1000$): 46 24 36 74 75 19 137 23 54 85 25 55 88 56 21 91 99 105 79 82
Using the data:
a. Plot a scatter diagram of YEARLY INCOME (x) against HAPPINESS LEVEL (y). (4pts)
b. Compute a Pearson correlation coefficient for part a. (2pts)
c. Would you consider the correlation coefficient weak, moderately weak, moderately strong
or strong? Why? (2pts)
d. Based on the data, do you believe HAPPINESS LEVEL is related to YEARLY INCOME?
Explain using your answer to part c. (4pts)
e. Calculate the coefficient of determination for the correlation coefficient computed in part b.
Explain the meaning of the coefficient of determination in the context of the variables. Be
sure to identify the independent variable and dependent variable. (4pts)
f. Determine the LINEAR regression Equation for the variables in this problem. Be sure to be
consistent in your use of the independent and dependent variables (4pts)
g. Using the regression formula from part f predict HAPPINESS LEVEL for a YEARLY INCOME
value of $150,000. (2pts)
h. Is the prediction value in part g an example of Interpolation or Extrapolation? Explain.
(2pts)
X (income, $1000) | Y (happiness level) | XY | X² | Y² |
46 | 3 | 138 | 2116 | 9 |
24 | 1 | 24 | 576 | 1 |
36 | 3 | 108 | 1296 | 9 |
74 | 4 | 296 | 5476 | 16 |
75 | 3 | 225 | 5625 | 9 |
19 | 2 | 38 | 361 | 4 |
137 | 5 | 685 | 18769 | 25 |
23 | 1 | 23 | 529 | 1 |
54 | 3 | 162 | 2916 | 9 |
85 | 4 | 340 | 7225 | 16 |
25 | 2 | 50 | 625 | 4 |
55 | 4 | 220 | 3025 | 16 |
88 | 5 | 440 | 7744 | 25 |
56 | 3 | 168 | 3136 | 9 |
21 | 4 | 84 | 441 | 16 |
91 | 5 | 455 | 8281 | 25 |
99 | 4 | 396 | 9801 | 16 |
105 | 4 | 420 | 11025 | 16 |
79 | 5 | 395 | 6241 | 25 |
82 | 2 | 164 | 6724 | 4 |
Ʃx = | 1274 |
Ʃy = | 67 |
Ʃxy = | 4831 |
Ʃx² = | 101932 |
Ʃy² = | 255 |
Sample size, n = | 20 |
x̅ = Ʃx/n = 1274/20 = | 63.7 |
y̅ = Ʃy/n = 67/20 = | 3.35 |
SSxx = Ʃx² - (Ʃx)²/n = 101932 - (1274)²/20 = | 20778.2 |
SSyy = Ʃy² - (Ʃy)²/n = 255 - (67)²/20 = | 30.55 |
SSxy = Ʃxy - (Ʃx)(Ʃy)/n = 4831 - (1274)(67)/20 = | 563.1 |
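The column totals and sums of squares above can be re-derived with a short script. This is a minimal sketch using only the Python standard library; the variable names (`x`, `y`, `ss_xx`, and so on) are chosen here for illustration and do not come from the original solution.

```python
# Data copied from the table above: x = income in $1000s, y = happiness (1-5).
x = [46, 24, 36, 74, 75, 19, 137, 23, 54, 85, 25, 55, 88, 56, 21, 91, 99, 105, 79, 82]
y = [3, 1, 3, 4, 3, 2, 5, 1, 3, 4, 2, 4, 5, 3, 4, 5, 4, 4, 5, 2]

n = len(x)
sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
sum_y2 = sum(yi ** 2 for yi in y)

# Computational (shortcut) formulas for the sums of squares.
ss_xx = sum_x2 - sum_x ** 2 / n
ss_yy = sum_y2 - sum_y ** 2 / n
ss_xy = sum_xy - sum_x * sum_y / n

print(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2)
print(round(ss_xx, 4), round(ss_yy, 4), round(ss_xy, 4))
```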
a) Scatter plot: YEARLY INCOME (x, in $1000) on the horizontal axis, HAPPINESS LEVEL (y) on the vertical axis.
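Since the plot itself is not reproduced here, the following sketch draws a rough text version of it with the Python standard library. The function name `scatter_rows` and the $10,000 column width are assumptions made for this illustration; a plotting library would normally be used instead.

```python
from collections import Counter

# Data from the problem: income in $1000s (x) and happiness level (y).
income = [46, 24, 36, 74, 75, 19, 137, 23, 54, 85, 25, 55, 88, 56, 21, 91, 99, 105, 79, 82]
happiness = [3, 1, 3, 4, 3, 2, 5, 1, 3, 4, 2, 4, 5, 3, 4, 5, 4, 4, 5, 2]

BIN_WIDTH = 10  # assumed column width: each column covers $10,000 of income


def scatter_rows(xs, ys, bin_width=BIN_WIDTH):
    """Return one text row per happiness level, highest level first.

    Each cell shows how many workers fall in that income column, or '.' if none.
    """
    counts = Counter((x // bin_width, y) for x, y in zip(xs, ys))
    n_cols = max(xs) // bin_width + 1
    rows = []
    for level in range(max(ys), min(ys) - 1, -1):
        cells = "".join(
            str(counts[(col, level)]) if counts[(col, level)] else "."
            for col in range(n_cols)
        )
        rows.append(f"{level} | {cells}")
    return rows


rows = scatter_rows(income, happiness)
print("\n".join(rows))
print("    " + "-" * len(rows[0][4:]))
print("    income, one column per $10,000, starting at 0")
```

The marks drift toward the upper right, which is the pattern a positive correlation produces.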
b) Correlation coefficient, r = SSxy/√(SSxx*SSyy) = 563.1/√(20778.2*30.55) = 0.7068
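As a cross-check, r can also be computed from the definitional form, which uses deviations from the means rather than the shortcut sums. The sketch below assumes the same data as the table; the variable names are illustrative.

```python
import math

x = [46, 24, 36, 74, 75, 19, 137, 23, 54, 85, 25, 55, 88, 56, 21, 91, 99, 105, 79, 82]
y = [3, 1, 3, 4, 3, 2, 5, 1, 3, 4, 2, 4, 5, 3, 4, 5, 4, 4, 5, 2]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Definitional sums of squares: products of deviations from the means.
dev_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
dev_xx = sum((xi - mean_x) ** 2 for xi in x)
dev_yy = sum((yi - mean_y) ** 2 for yi in y)

r = dev_xy / math.sqrt(dev_xx * dev_yy)
print(round(r, 4))
```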
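c) The correlation coefficient is moderately strong: r = 0.7068 is positive and well above 0.5, but still clearly short of a perfect linear relationship (r = 1).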
d)
Null and alternative hypothesis:
Ho: ρ = 0 ; Ha: ρ ≠ 0
Test statistic :
t = r*√(n-2)/√(1-r²) = 0.7068 *√(20 - 2)/√(1 - 0.7068²) = 4.2386
df = n-2 = 18
p-value = T.DIST.2T(ABS(4.2386), 18) = 0.0005
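Conclusion:
p-value = 0.0005 < α (for example, α = 0.05), so reject the null hypothesis. There is a statistically significant linear correlation between YEARLY INCOME (x) and HAPPINESS LEVEL (y).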
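The test statistic and p-value can be reproduced without a spreadsheet. The sketch below is one possible approach, not the method used in the solution: it integrates the Student t density numerically with Simpson's rule. The helper names `t_pdf` and `two_tailed_p` are hypothetical, and α = 0.05 is an assumption, since the problem does not state a significance level.

```python
import math

# Values taken from the solution above.
n = 20
ss_xx, ss_yy, ss_xy = 20778.2, 30.55, 563.1

r = ss_xy / math.sqrt(ss_xx * ss_yy)
df = n - 2
t_stat = r * math.sqrt(df) / math.sqrt(1 - r ** 2)


def t_pdf(t, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    coef = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return coef * (1 + t * t / nu) ** (-(nu + 1) / 2)


def two_tailed_p(t, nu, steps=2000):
    """Two-tailed p-value via Simpson's rule on [0, |t|]; steps must be even."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, nu) + t_pdf(upper, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    central_area = total * h / 3  # area between 0 and |t|
    return 1 - 2 * central_area


p_value = two_tailed_p(t_stat, df)
alpha = 0.05  # assumed significance level; not given in the problem
print(round(t_stat, 4), df, p_value < alpha)
```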
e)
Coefficient of determination, r² = (SSxy)²/(SSxx*SSyy) = (563.1)²/(20778.2*30.55) = 0.4995
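About 49.95% of the variation in HAPPINESS LEVEL (y, the dependent variable) is explained by its linear relationship with YEARLY INCOME (x, the independent variable).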
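The "explained variation" reading of r² can be checked directly: fit the least squares line, measure the leftover (residual) variation, and compare it with the total variation in y. This is an illustrative sketch with names chosen here (`sse`, `r_squared`), assuming the same data as the table.

```python
x = [46, 24, 36, 74, 75, 19, 137, 23, 54, 85, 25, 55, 88, 56, 21, 91, 99, 105, 79, 82]
y = [3, 1, 3, 4, 3, 2, 5, 1, 3, 4, 2, 4, 5, 3, 4, 5, 4, 4, 5, 2]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

ss_xx = sum((xi - mean_x) ** 2 for xi in x)
ss_yy = sum((yi - mean_y) ** 2 for yi in y)  # total variation in y
ss_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Least squares line, then the variation it fails to explain (SSE).
b = ss_xy / ss_xx
a = mean_y - b * mean_x
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

r_squared = 1 - sse / ss_yy
print(round(r_squared, 4), round(sse, 4))
```

Both routes, 1 - SSE/SSyy and (SSxy)²/(SSxx*SSyy), give the same value for a simple linear regression.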
f)
Slope, b = SSxy/SSxx = 563.1/20778.2 = 0.0271005
y-intercept, a = y̅ - b*x̅ = 3.35 - (563.1/20778.2)*63.7 = 1.623697
Regression equation :
ŷ = 1.6237 + (0.0271) x
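A quick way to sanity-check a fitted line is to confirm two standard properties of least squares: the residuals sum to zero and the line passes through (x̅, y̅). The sketch below recomputes the slope and intercept from the data and checks both; variable names are illustrative.

```python
x = [46, 24, 36, 74, 75, 19, 137, 23, 54, 85, 25, 55, 88, 56, 21, 91, 99, 105, 79, 82]
y = [3, 1, 3, 4, 3, 2, 5, 1, 3, 4, 2, 4, 5, 3, 4, 5, 4, 4, 5, 2]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

ss_xx = sum((xi - mean_x) ** 2 for xi in x)
ss_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

b = ss_xy / ss_xx        # slope
a = mean_y - b * mean_x  # y-intercept

residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

print(round(a, 4), round(b, 4))
print(abs(sum(residuals)) < 1e-9)               # residuals sum to (numerically) zero
print(abs((a + b * mean_x) - mean_y) < 1e-12)   # line passes through the means
```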
g)
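Predicted value of y at x = 150 (income is recorded in $1000s, so $150,000 is x = 150):
ŷ = 1.6237 + (0.0271) * 150 = 5.6888 (computed with the unrounded slope and intercept)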
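h) Extrapolation: x = 150 lies outside the range of observed incomes (19 to 137, in $1000s), so the prediction goes beyond the data. The predicted value 5.6888 also exceeds the maximum of the 1 to 5 HAPPINESS scale, which shows why extrapolated values should be treated with caution.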
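The prediction in part g and the interpolation/extrapolation check in part h can be sketched as follows. The function names `predict_happiness` and `is_extrapolation` are hypothetical helpers introduced for this example; the slope and intercept are recomputed from the sums of squares reported above.

```python
# Observed incomes in $1000s, used only to find the range of the data.
income = [46, 24, 36, 74, 75, 19, 137, 23, 54, 85, 25, 55, 88, 56, 21, 91, 99, 105, 79, 82]

# Slope and intercept from part f, kept unrounded.
b = 563.1 / 20778.2
a = 3.35 - b * 63.7

SCALE_MAX = 5  # top of the happiness scale stated in the problem


def predict_happiness(income_thousands):
    """Predicted happiness level from the fitted line."""
    return a + b * income_thousands


def is_extrapolation(value, observed):
    """True when value falls outside the range of the observed x values."""
    return value < min(observed) or value > max(observed)


x_new = 150  # $150,000 expressed in $1000s
y_hat = predict_happiness(x_new)

print(round(y_hat, 4))
print(is_extrapolation(x_new, income))  # 150 is above the largest observed income, 137
print(y_hat > SCALE_MAX)                # the prediction falls off the 1-5 scale
```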