Develop estimated regression equations, first using annual income as the independent variable and then using household size as the independent variable. Which variable is the better predictor of annual credit card charges? Discuss your findings.
Income ($1000s) | Household Size | Amount Charged ($) |
54 | 3 | 4,016 |
30 | 2 | 3,159 |
32 | 4 | 5,100 |
50 | 5 | 4,742 |
31 | 2 | 1,864 |
55 | 2 | 4,070 |
37 | 1 | 2,731 |
40 | 2 | 3,348 |
66 | 4 | 4,764 |
51 | 3 | 4,110 |
25 | 3 | 4,208 |
48 | 4 | 4,219 |
27 | 1 | 2,477 |
33 | 2 | 2,514 |
65 | 3 | 4,214 |
63 | 4 | 4,965 |
42 | 6 | 4,412 |
21 | 2 | 2,448 |
44 | 1 | 2,995 |
37 | 5 | 4,171 |
62 | 6 | 5,678 |
21 | 3 | 3,623 |
55 | 7 | 5,301 |
42 | 2 | 3,020 |
41 | 7 | 4,828 |
54 | 6 | 5,573 |
30 | 1 | 2,583 |
48 | 2 | 3,866 |
34 | 5 | 3,586 |
67 | 4 | 5,037 |
50 | 2 | 3,605 |
67 | 5 | 5,345 |
55 | 6 | 5,370 |
52 | 2 | 3,890 |
62 | 3 | 4,705 |
64 | 2 | 4,157 |
22 | 3 | 3,579 |
29 | 4 | 3,890 |
39 | 2 | 2,972 |
35 | 1 | 3,121 |
39 | 4 | 4,183 |
54 | 3 | 3,730 |
23 | 6 | 4,127 |
27 | 2 | 2,921 |
26 | 7 | 4,603 |
61 | 2 | 4,273 |
30 | 2 | 3,067 |
22 | 4 | 3,074 |
46 | 5 | 4,820 |
66 | 4 | 5,149 |
INCOME---------------------------------------------------
 | ΣX | ΣY | SSxx = Σ(x-x̅)² | SSyy = Σ(y-ȳ)² | SSxy = Σ(x-x̅)(y-ȳ) |
total sum | 2174 | 198203 | 10374.48 | 42699148.8 | 419956.56 |
mean | 43.48 | 3964.06 | | | |
sample size, n = 50
here, x̅ = Σx/n = 43.48, ȳ = Σy/n = 3964.06
SSxx = Σ(x-x̅)² = 10374.4800
SSxy = Σ(x-x̅)(y-ȳ) = 419956.56
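The sums of squares above can be reproduced from the raw table. Here is a minimal Python sketch, assuming the 50 rows are typed in as (income, household size, amount charged) tuples. The names `rows` and `sums_of_squares` are illustrative only and are not part of the original solution.

```python
# Raw data from the table: (income in $1000s, household size, amount charged in $).
rows = [
    (54, 3, 4016), (30, 2, 3159), (32, 4, 5100), (50, 5, 4742), (31, 2, 1864),
    (55, 2, 4070), (37, 1, 2731), (40, 2, 3348), (66, 4, 4764), (51, 3, 4110),
    (25, 3, 4208), (48, 4, 4219), (27, 1, 2477), (33, 2, 2514), (65, 3, 4214),
    (63, 4, 4965), (42, 6, 4412), (21, 2, 2448), (44, 1, 2995), (37, 5, 4171),
    (62, 6, 5678), (21, 3, 3623), (55, 7, 5301), (42, 2, 3020), (41, 7, 4828),
    (54, 6, 5573), (30, 1, 2583), (48, 2, 3866), (34, 5, 3586), (67, 4, 5037),
    (50, 2, 3605), (67, 5, 5345), (55, 6, 5370), (52, 2, 3890), (62, 3, 4705),
    (64, 2, 4157), (22, 3, 3579), (29, 4, 3890), (39, 2, 2972), (35, 1, 3121),
    (39, 4, 4183), (54, 3, 3730), (23, 6, 4127), (27, 2, 2921), (26, 7, 4603),
    (61, 2, 4273), (30, 2, 3067), (22, 4, 3074), (46, 5, 4820), (66, 4, 5149),
]


def sums_of_squares(x, y):
    """Return (x_bar, y_bar, SSxx, SSyy, SSxy) for paired samples x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return x_bar, y_bar, ss_xx, ss_yy, ss_xy


income = [r[0] for r in rows]
charged = [r[2] for r in rows]
x_bar, y_bar, ss_xx, ss_yy, ss_xy = sums_of_squares(income, charged)
print(f"x_bar={x_bar:.2f}  y_bar={y_bar:.2f}")
print(f"SSxx={ss_xx:.2f}  SSyy={ss_yy:.2f}  SSxy={ss_xy:.2f}")
```

Passing the household-size column (`r[1]`) in place of income gives the corresponding sums for the second model.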
estimated slope, β1 = SSxy/SSxx = 419956.56 / 10374.48 = 40.4798
intercept, β0 = ȳ - β1*x̅ = 2203.9996
so, regression line is Ŷ = 2203.9996 + 40.4798*x
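As a quick check on the arithmetic, the slope and intercept can be recomputed from the summary statistics. This is only a sketch: the values are copied from the solution above, and `predict_charge` is a hypothetical helper added for illustration.

```python
# Summary statistics for the income model, copied from the worked solution.
x_bar, y_bar = 43.48, 3964.06
ss_xx, ss_xy = 10374.48, 419956.56

# Least-squares estimates: slope = SSxy / SSxx, intercept = y_bar - slope * x_bar.
slope = ss_xy / ss_xx
intercept = y_bar - slope * x_bar


def predict_charge(income_thousands):
    """Predicted annual charge ($) for an income given in $1000s (hypothetical helper)."""
    return intercept + slope * income_thousands


print(f"slope={slope:.4f}  intercept={intercept:.4f}")
print(f"predicted charge at income 40: {predict_charge(40):.2f}")
```

The fitted line always passes through the point of means, so `predict_charge(43.48)` should return ȳ = 3964.06 up to rounding.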
SSE = (SSxx*SSyy - SSxy²)/SSxx = 25699404.034
std error, Se = √(SSE/(n-2)) = 731.713
correlation coefficient, r = SSxy/√(SSxx*SSyy) = 0.6310
R² = SSxy²/(SSxx*SSyy) = 0.3981
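The fit statistics for the income model can be verified in the same way. The sketch below uses only the summary values quoted above (the variable names are my own). It also checks that R² equals both r² and 1 - SSE/SSyy, which holds for simple linear regression.

```python
import math

# Summary statistics for the income model, copied from the worked solution.
n = 50
ss_xx, ss_yy, ss_xy = 10374.48, 42699148.8, 419956.56

sse = (ss_xx * ss_yy - ss_xy ** 2) / ss_xx   # residual (unexplained) sum of squares
se = math.sqrt(sse / (n - 2))                # standard error of the estimate
r = ss_xy / math.sqrt(ss_xx * ss_yy)         # sample correlation coefficient
r_squared = ss_xy ** 2 / (ss_xx * ss_yy)     # coefficient of determination

# Consistency check: R^2 should equal both r^2 and 1 - SSE/SSyy.
assert math.isclose(r_squared, r ** 2)
assert math.isclose(r_squared, 1 - sse / ss_yy)

print(f"SSE={sse:.3f}  Se={se:.3f}  r={r:.4f}  R^2={r_squared:.4f}")
```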
HOUSEHOLD-----------------------------------------------------------------------
 | ΣX | ΣY | SSxx = Σ(x-x̅)² | SSyy = Σ(y-ȳ)² | SSxy = Σ(x-x̅)(y-ȳ) |
total sum | 171 | 198203 | 148.18 | 42699148.8 | 59883.74 |
mean | 3.42 | 3964.06 | | | |
sample size, n = 50
here, x̅ = Σx/n = 3.42, ȳ = Σy/n = 3964.06
SSxx = Σ(x-x̅)² = 148.1800
SSxy = Σ(x-x̅)(y-ȳ) = 59883.74
estimated slope, β1 = SSxy/SSxx = 59883.74 / 148.18 = 404.1284
intercept, β0 = ȳ - β1*x̅ = 2581.9410
so, regression line is Ŷ = 2581.9410 + 404.1284*x
SSE = (SSxx*SSyy - SSxy²)/SSxx = 18498431.339
std error, Se = √(SSE/(n-2)) = 620.793
correlation coefficient, r = SSxy/√(SSxx*SSyy) = 0.7528
R² = SSxy²/(SSxx*SSyy) = 0.5668
----------------------------------------------------------------------------
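As you can see, household size has the higher R² (0.5668 versus 0.3981 for income) and the smaller standard error of the estimate (620.793 versus 731.713), so household size is the better single predictor of annual credit card charges. Household size explains about 57% of the variation in the amount charged, while income explains about 40%. By the fitted slopes, each additional household member is associated with roughly $404 more in annual charges, and each additional $1,000 of income with roughly $40 more.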
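The comparison can also be done side by side in a few lines. This is a sketch under the assumption that the summary statistics quoted in the two sections are correct. `fit_summary` is a hypothetical helper, not something from the original solution.

```python
import math


def fit_summary(ss_xx, ss_yy, ss_xy, n):
    """Return (R^2, standard error of estimate) for a simple linear regression."""
    sse = ss_yy - ss_xy ** 2 / ss_xx
    return 1 - sse / ss_yy, math.sqrt(sse / (n - 2))


n = 50
ss_yy = 42699148.8  # same response (amount charged) in both models
models = {
    "income": fit_summary(10374.48, ss_yy, 419956.56, n),
    "household size": fit_summary(148.18, ss_yy, 59883.74, n),
}

for name, (r2, se) in models.items():
    print(f"{name:15s} R^2={r2:.4f}  Se={se:.3f}")

# Both models share SSyy and n, so the higher R^2 also means the lower Se.
better = max(models, key=lambda name: models[name][0])
print("better single predictor:", better)
```

Comparing R² directly is fair here because both models have one predictor and the same response. With different numbers of predictors, adjusted R² would be the safer yardstick.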