You have been given data on n = 20 male customers. You are to use the data to determine the best regression model for predicting the amount spent in the current year using some or all of the other variables as predictors.
This question centers on the response variable, AmountSpent. Question: Using Minitab, provide the mean, standard deviation, and five-number summary of the response, together with at least two graphs. Then use those statistics and graphs to describe the shape of the distribution.
Data on n = 20 male customers.
Row | Customer | Age | Salary | Children | PrevSpent | Emails | AmountSpent |
1 | 718 | 29 | $20,100 | 1 | $2 | 12 | $271 |
2 | 452 | 40 | $52,400 | 0 | $0 | 18 | $857 |
3 | 673 | 44 | $53,300 | 0 | $1,148 | 18 | $1,720 |
4 | 399 | 26 | $12,000 | 0 | $313 | 24 | $344 |
5 | 197 | 68 | $20,800 | 0 | $0 | 6 | $260 |
6 | 367 | 53 | $55,800 | 1 | $0 | 6 | $821 |
7 | 339 | 36 | $74,600 | 2 | $846 | 18 | $705 |
8 | 1000 | 55 | $102,500 | 1 | $1,615 | 24 | $2,464 |
9 | 548 | 73 | $43,300 | 0 | $0 | 12 | $1,584 |
10 | 484 | 29 | $17,300 | 1 | $677 | 18 | $381 |
11 | 853 | 25 | $18,400 | 1 | $974 | 6 | $303 |
12 | 7 | 33 | $34,700 | 0 | $0 | 18 | $1,615 |
13 | 848 | 68 | $78,900 | 0 | $2,422 | 18 | $2,233 |
14 | 146 | 32 | $73,300 | 0 | $0 | 12 | $1,694 |
15 | 614 | 70 | $74,900 | 0 | $0 | 12 | $1,804 |
16 | 986 | 55 | $123,600 | 1 | $0 | 12 | $1,892 |
17 | 361 | 33 | $122,100 | 0 | $2,963 | 12 | $2,408 |
18 | 170 | 44 | $84,800 | 1 | $0 | 12 | $1,826 |
19 | 683 | 49 | $89,500 | 0 | $2,333 | 24 | $5,209 |
20 | 630 | 43 | $48,700 | 2 | $356 | 12 | $528 |
Summary statistics
Statistic | Customer | Age | Salary | Children | PrevSpent | Emails | AmountSpent |
mean | 523.75 | 45.25 | 60050 | 0.55 | 682.45 | 14.7 | 1445.95 |
standard deviation | 281.7741 | 15.539 | 34271.05 | 0.686333 | 947.2159 | 5.667079 | 1169.089186 |
CV (%) | 53.79934 | 34.34033 | 57.07085 | 124.7878 | 138.7964 | 38.55156 | 80.85267025 |
first quartile | 344.5 | 32.25 | 24275 | 0 | 0 | 12 | 417.75 |
median | 516 | 43.5 | 54550 | 0 | 157.5 | 12 | 1599.5 |
third quartile | 709.25 | 55 | 83325 | 1 | 1104.5 | 18 | 1875.5 |
interquartile range | 364.75 | 22.75 | 59050 | 1 | 1104.5 | 6 | 1457.75 |
min | 7 | 25 | 12000 | 0 | 0 | 6 | 260 |
max | 1000 | 73 | 123600 | 2 | 2963 | 24 | 5209 |
range | 993 | 48 | 111600 | 2 | 2963 | 18 | 4949 |
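The AmountSpent figures can be reproduced outside Minitab. The sketch below is a minimal check in Python, not Minitab output. It assumes the sample standard deviation (divisor n - 1) and quartiles placed at position p(n + 1), which Python's `statistics.quantiles` calls the "exclusive" method; the third quartile reported above (1875.5) is consistent with that rule. Under the same rule the first quartile of AmountSpent is 417.75, so Q3 - Q1 = 1457.75.

```python
import statistics

# AmountSpent for the 20 customers, in dollars, in row order.
amount_spent = [271, 857, 1720, 344, 260, 821, 705, 2464, 1584, 381,
                303, 1615, 2233, 1694, 1804, 1892, 2408, 1826, 5209, 528]

mean = statistics.mean(amount_spent)
sd = statistics.stdev(amount_spent)  # sample standard deviation (n - 1)

# "exclusive" places the p-th quantile at position p * (n + 1)
# in the sorted data and interpolates linearly between neighbors.
q1, median, q3 = statistics.quantiles(amount_spent, n=4, method="exclusive")

five_number = (min(amount_spent), q1, median, q3, max(amount_spent))
iqr = q3 - q1

print(f"mean = {mean:.2f}, sd = {sd:.3f}")
print("five-number summary (min, Q1, median, Q3, max):", five_number)
print("IQR:", iqr)
```

Other software may use a different quartile rule, so Q1 and Q3 can differ slightly from these values even though the mean, median, minimum, and maximum will agree.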
The regression equation is
AmountSpent = - 753 - 0.089 Customer + 10.1 Age + 0.0172 Salary - 437 Children + 0.279 PrevSpent + 54.6 Emails
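To see how the fitted equation is used, the sketch below evaluates it for the first customer in the data table and compares the result with the observed AmountSpent. The function name `predict_amount_spent` and the dictionary layout are illustrative choices, not part of the Minitab output. The coefficients are the rounded values printed in the equation, so the fitted value is only approximate.

```python
# Coefficients as printed in the regression equation (rounded in the output).
INTERCEPT = -753
COEFFS = {
    "Customer": -0.089,
    "Age": 10.1,
    "Salary": 0.0172,
    "Children": -437,
    "PrevSpent": 0.279,
    "Emails": 54.6,
}


def predict_amount_spent(customer):
    """Return the fitted AmountSpent for one customer (dict of predictor values)."""
    return INTERCEPT + sum(COEFFS[name] * customer[name] for name in COEFFS)


# Row 1 of the data table; observed AmountSpent is $271.
row1 = {"Customer": 718, "Age": 29, "Salary": 20100,
        "Children": 1, "PrevSpent": 2, "Emails": 12}

fitted = predict_amount_spent(row1)
residual = 271 - fitted  # observed minus fitted

print(f"fitted = {fitted:.1f}, residual = {residual:.1f}")
```

For this customer the rounded equation gives a fitted value of about $40, well below the observed $271, which leaves a residual of roughly $230.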
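From the summary statistics, the mean of AmountSpent (1445.95) is slightly below the median (1599.5), so comparing the mean with the median does not by itself indicate right skew. The maximum of $5,209, however, lies far above the third quartile (1875.5).
The histogram of AmountSpent shows a long right tail, so we conclude that the distribution is positively skewed.
The normal probability plot of the residuals indicates that the residuals are approximately normally distributed.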
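The shape of the AmountSpent distribution can also be checked numerically. The sketch below is a rough illustration that assumes two common conventions not stated in the original answer: the moment-based skewness coefficient (third central moment divided by the 1.5 power of the second) and the 1.5 x IQR rule for flagging outliers, with quartiles at position p(n + 1).

```python
import statistics

# AmountSpent for the 20 customers, in dollars, in row order.
amount_spent = [271, 857, 1720, 344, 260, 821, 705, 2464, 1584, 381,
                303, 1615, 2233, 1694, 1804, 1892, 2408, 1826, 5209, 528]

n = len(amount_spent)
mean = statistics.mean(amount_spent)
median = statistics.median(amount_spent)

# Moment-based skewness: m3 / m2**1.5, using population (divide-by-n) moments.
m2 = sum((x - mean) ** 2 for x in amount_spent) / n
m3 = sum((x - mean) ** 3 for x in amount_spent) / n
skewness = m3 / m2 ** 1.5

# 1.5 * IQR rule for high outliers.
q1, _, q3 = statistics.quantiles(amount_spent, n=4, method="exclusive")
upper_fence = q3 + 1.5 * (q3 - q1)
high_outliers = [x for x in amount_spent if x > upper_fence]

print(f"mean = {mean:.2f}, median = {median}")
print(f"skewness = {skewness:.2f}")
print(f"upper fence = {upper_fence}, high outliers = {high_outliers}")
```

The skewness coefficient is positive and the only value above the upper fence is $5,209, which agrees with a right-skewed shape driven mainly by that one large purchase total.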