Use methods of descriptive statistics to summarize the data and comment on your findings.
Income ($1000s) | Household Size | Amount Charged ($) |
54 | 3 | 4,016 |
30 | 2 | 3,159 |
32 | 4 | 5,100 |
50 | 5 | 4,742 |
31 | 2 | 1,864 |
55 | 2 | 4,070 |
37 | 1 | 2,731 |
40 | 2 | 3,348 |
66 | 4 | 4,764 |
51 | 3 | 4,110 |
25 | 3 | 4,208 |
48 | 4 | 4,219 |
27 | 1 | 2,477 |
33 | 2 | 2,514 |
65 | 3 | 4,214 |
63 | 4 | 4,965 |
42 | 6 | 4,412 |
21 | 2 | 2,448 |
44 | 1 | 2,995 |
37 | 5 | 4,171 |
62 | 6 | 5,678 |
21 | 3 | 3,623 |
55 | 7 | 5,301 |
42 | 2 | 3,020 |
41 | 7 | 4,828 |
54 | 6 | 5,573 |
30 | 1 | 2,583 |
48 | 2 | 3,866 |
34 | 5 | 3,586 |
67 | 4 | 5,037 |
50 | 2 | 3,605 |
67 | 5 | 5,345 |
55 | 6 | 5,370 |
52 | 2 | 3,890 |
62 | 3 | 4,705 |
64 | 2 | 4,157 |
22 | 3 | 3,579 |
29 | 4 | 3,890 |
39 | 2 | 2,972 |
35 | 1 | 3,121 |
39 | 4 | 4,183 |
54 | 3 | 3,730 |
23 | 6 | 4,127 |
27 | 2 | 2,921 |
26 | 7 | 4,603 |
61 | 2 | 4,273 |
30 | 2 | 3,067 |
22 | 4 | 3,074 |
46 | 5 | 4,820 |
66 | 4 | 5,149 |
> Income=scan()
1: 54 30 32 50
31 55 37 40
66 51 25 48
27 33 65 63
42 21 44 37
62 21 55 42
41 54 30 48
34 67 50 67
55 52 62 64
22 29 39 35
39 54 23 27
26 61 30 22
46 66
51:
Read 50 items
> Household=scan()
1: 3 2 4 5
2 2 1 2
4 3 3 4
1 2 3 4
6 2 1 5
6 3 7 2
7 6 1 2
5 4 2 5
6 2 3 2
3 4 2 1
4 3 6 2
7 2 2 4
5 4
51:
Read 50 items
> Amount=scan()
1: 4016 3159 5100
4742 1864 4070 2731
3348 4764 4110 4208
4219 2477 2514 4214
4965 4412 2448 2995
4171 5678 3623 5301
3020 4828 5573 2583
3866 3586 5037 3605
5345 5370 3890 4705
4157 3579 3890 2972
3121 4183 3730 4127
2921 4603 4273 3067
3074 4820 5149
51:
Read 50 items
> d=cbind(Income,Household,Amount)
> colMeans(d)
Income Household Amount
43.48 3.42 3964.06
> c(median(d[,1]),median(d[,2]),median(d[,3]))#medians
[1] 42 3 4090
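The same summaries can be obtained more compactly from a data frame. The following is a minimal sketch, assuming the 50 observations are re-entered as vectors rather than typed into `scan()`; the data frame name `charges` is a hypothetical choice, not part of the original session.

```r
# Re-enter the 50 observations from the table above.
Income <- c(54, 30, 32, 50, 31, 55, 37, 40, 66, 51,
            25, 48, 27, 33, 65, 63, 42, 21, 44, 37,
            62, 21, 55, 42, 41, 54, 30, 48, 34, 67,
            50, 67, 55, 52, 62, 64, 22, 29, 39, 35,
            39, 54, 23, 27, 26, 61, 30, 22, 46, 66)
Household <- c(3, 2, 4, 5, 2, 2, 1, 2, 4, 3,
               3, 4, 1, 2, 3, 4, 6, 2, 1, 5,
               6, 3, 7, 2, 7, 6, 1, 2, 5, 4,
               2, 5, 6, 2, 3, 2, 3, 4, 2, 1,
               4, 3, 6, 2, 7, 2, 2, 4, 5, 4)
Amount <- c(4016, 3159, 5100, 4742, 1864, 4070, 2731, 3348, 4764, 4110,
            4208, 4219, 2477, 2514, 4214, 4965, 4412, 2448, 2995, 4171,
            5678, 3623, 5301, 3020, 4828, 5573, 2583, 3866, 3586, 5037,
            3605, 5345, 5370, 3890, 4705, 4157, 3579, 3890, 2972, 3121,
            4183, 3730, 4127, 2921, 4603, 4273, 3067, 3074, 4820, 5149)

# `charges` is a hypothetical name for the combined data set.
charges <- data.frame(Income, Household, Amount)

# Minimum, quartiles, median, mean and maximum of every column in one call.
summary(charges)

# Standard deviation of every column.
sapply(charges, sd)
```

A data frame keeps the column names attached to every result, which avoids indexing by position as in `d[,1]`.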
> hist(d[,1])
> hist(d[,2])
> ## Household size is positively skewed: most households are small,
> ## and large households are relatively few.
> hist(d[,3])
> ##Amount is slightly negatively skewed
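The skewness comments above are read off the histograms by eye. They agree with the mean-versus-median comparison (Household: mean 3.42 above median 3; Amount: mean 3964.06 below median 4090), and a numeric check is also possible. Base R has no built-in skewness function, so the sketch below defines a hypothetical helper, `sample_skewness`, that computes the moment coefficient of skewness; the toy vectors are illustrative only and are not the survey data.

```r
# Moment coefficient of skewness: third central moment divided by the
# second central moment raised to the power 3/2.
# `sample_skewness` is a hypothetical helper name, not a base R function.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]          # drop missing values
  dev <- x - mean(x)         # deviations from the mean
  mean(dev^3) / mean(dev^2)^1.5
}

# Illustrative toy vectors (not the survey data):
sample_skewness(c(1, 2, 3))       # symmetric data, skewness of zero
sample_skewness(c(1, 1, 1, 10))   # long right tail, positive skewness
sample_skewness(c(-10, 1, 1, 1))  # long left tail, negative skewness
```

In the session above, `sample_skewness(d[, "Household"])` and `sample_skewness(d[, "Amount"])` would be expected to come out positive and negative respectively if the histogram readings are right.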
> cor(d)
Income Household Amount
Income 1.0000000 0.1725335 0.6309742
Household 0.1725335 1.0000000 0.7528432
Amount 0.6309742 0.7528432 1.0000000
> # Household size and amount charged are strongly correlated (r = 0.75).
> # Income and amount (r = 0.63) are more strongly correlated than
> # income and household size (r = 0.17).
> sqrt(round(c(var(d[,1]),var(d[,2]),var(d[,3])),5)) # standard deviations
[1] 14.550742 1.738988 933.494082
> sqrt(round(c(var(d[,1]),var(d[,2]),var(d[,3])),5))/colMeans(d) # coefficient of variation
Income Household Amount
0.3346537 0.5084761 0.2354894
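The coefficient of variation is the standard deviation divided by the mean, so it can be checked by hand from the standard deviations and means reported earlier in the session:

```latex
\mathrm{CV} = \frac{s}{\bar{x}}, \qquad
\mathrm{CV}_{\text{Income}} = \frac{14.550742}{43.48} \approx 0.3347, \quad
\mathrm{CV}_{\text{Household}} = \frac{1.738988}{3.42} \approx 0.5085, \quad
\mathrm{CV}_{\text{Amount}} = \frac{933.494082}{3964.06} \approx 0.2355
```

Relative to its mean, household size is the most variable of the three variables and amount charged the least.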
> # Regression of amount charged on income and household size
> m=lm(Amount~Income+Household)
> m
Call:
lm(formula = Amount ~ Income + Household)
Coefficients:
(Intercept) Income Household
1304.90 33.13 356.30
> summary(m)
Call:
lm(formula = Amount ~ Income + Household)
Residuals:
Min 1Q Median 3Q Max
-1180.62 -155.31 7.05 194.56 1309.66
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1304.905 197.655 6.602 3.29e-08 ***
Income 33.133 3.968 8.350 7.68e-11 ***
Household 356.296 33.201 10.732 3.12e-14 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 398.1 on 47 degrees of freedom
Multiple R-squared: 0.8256, Adjusted R-squared: 0.8181
F-statistic: 111.2 on 2 and 47 DF, p-value: < 2.2e-16
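The adjusted R-squared in the summary can be reproduced from the multiple R-squared. As a worked check, assuming the usual adjustment formula with n = 50 observations and p = 2 predictors (which matches the 47 residual degrees of freedom):

```latex
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
          = 1 - (1 - 0.8256)\,\frac{49}{47}
          \approx 0.818
```

The small difference in the last digit from the printed 0.8181 comes from using the rounded value 0.8256. Either way, the two predictors together account for roughly 82% of the variation in amount charged.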
> # Fitted model: Amount = 1304.905 + 33.133*Income + 356.296*Household
> # Holding income fixed, each additional household member is associated
> # with about $356.30 more charged; holding household size fixed, each
> # additional $1000 of income is associated with about $33.13 more.
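To make the fitted equation concrete, it can be evaluated for a hypothetical household. This is a minimal sketch using the rounded coefficients from the summary above; the function name `predict_amount` and the example household (income of $40,000, three members) are illustrative assumptions, not part of the original analysis.

```r
# Evaluate the fitted equation with the rounded coefficients from summary(m).
# `predict_amount` is a hypothetical helper; income_k is income in $1000s.
predict_amount <- function(income_k, household_size) {
  1304.905 + 33.133 * income_k + 356.296 * household_size
}

# Hypothetical household: income $40,000, three members.
predict_amount(40, 3)
# 1304.905 + 1325.320 + 1068.888 = 3699.113
```

With the fitted object `m` still in the session, `predict(m, newdata = data.frame(Income = 40, Household = 3))` should give the same value up to the rounding of the coefficients.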