Using the table below
NATIONALITY | TOTAL PERSONS | TOT_SPEND | DAYS | FIRST-TIME |
GERMAN | 4 | 1095 | 9 | yes |
ITALIAN | 5 | 867 | 8 | no |
FRENCH | 4 | 689 | 10 | no |
BRITISH | 5 | 876 | 16 | yes |
FRENCH | 5 | 1280 | 17 | yes |
FRENCH | 4 | 755 | 10 | no |
FRENCH | 3 | 375 | 9 | no |
BRITISH | 4 | 275 | 4 | no |
ITALIAN | 5 | 1166 | 11 | yes |
GERMAN | 4 | 1741 | 13 | yes |
BRITISH | 4 | 302 | 5 | yes |
BRITISH | 4 | 423 | 11 | no |
BRITISH | 4 | 659 | 7 | no |
BRITISH | 3 | 358 | 10 | no |
ITALIAN | 4 | 785 | 9 | yes |
BRITISH | 5 | 1046 | 15 | no |
BRITISH | 4 | 373 | 5 | no |
GERMAN | 5 | 517 | 6 | yes |
ITALIAN | 3 | 486 | 12 | yes |
BRITISH | 4 | 357 | 6 | no |
ITALIAN | 5 | 436 | 5 | yes |
FRENCH | 3 | 800 | 13 | yes |
BRITISH | 4 | 613 | 7 | no |
GERMAN | 3 | 688 | 11 | yes |
BRITISH | 3 | 433 | 7 | no |
BRITISH | 3 | 239 | 6 | yes |
BRITISH | 4 | 464 | 8 | no |
FRENCH | 4 | 923 | 14 | no |
BRITISH | 3 | 187 | 5 | no |
FRENCH | 3 | 357 | 12 | no |
Suppose you want to use the non-numeric variables “Nationality” and “First time visit” as explanatory variables in a regression model of “Total spending”.
i. How many “dummy variables” and which ones do you need to create?
ii. State a regression model of Total spending using as explanatory variables “PersonDays” (a new variable computed as Total Persons × Days), “Nationality”, and “First time visit”. You believe that “Nationality” affects the slope coefficient of “PersonDays”, while “First time visit” affects the intercept.
iii. Run the regression and state the estimated model for each Nationality, distinguishing between first-time and repeat visitors.
iv. Which coefficients are statistically significant?
v. Run the model after you remove the non-significant coefficients. Compare with the model in (ii) and comment.
i. How many “dummy variables” and which ones do you need to create?
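2 in the regressions below: NATIONALITY (entered as a single coded variable) and FIRST-TIME (a single indicator). Strictly, a categorical variable with four levels needs three 0/1 dummies, so a full dummy coding would use four variables: three for Nationality (with one nationality left out as the baseline) and one for First time visit.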
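A categorical variable with m levels is usually represented by m - 1 indicator columns, with the omitted level acting as the baseline. The following is a minimal sketch of that coding for the two non-numeric columns of the table. The choice of BRITISH as the baseline and the names `encode`, `D_FRENCH`, `D_GERMAN`, `D_ITALIAN` and `D_FIRST` are assumptions made for illustration; they do not come from the question.

```python
# Sketch: 0/1 dummy coding for NATIONALITY and FIRST-TIME.
# Assumption: BRITISH is the baseline (omitted) nationality.
NATIONALITIES = ["BRITISH", "FRENCH", "GERMAN", "ITALIAN"]
BASELINE = "BRITISH"


def encode(nationality, first_time):
    """Return the 0/1 dummies for one table row as a dict."""
    dummies = {
        f"D_{name}": int(nationality == name)
        for name in NATIONALITIES
        if name != BASELINE
    }
    dummies["D_FIRST"] = int(first_time == "yes")
    return dummies


# First four rows of the table: (NATIONALITY, FIRST-TIME)
rows = [("GERMAN", "yes"), ("ITALIAN", "no"), ("FRENCH", "no"), ("BRITISH", "yes")]
encoded = [encode(nat, first) for nat, first in rows]

for (nat, first), dummies in zip(rows, encoded):
    print(nat, first, dummies)
```

A BRITISH row gets zeros in all three nationality columns, which is what makes it the reference group for the other coefficients.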
ii. State a regression model of Total spending using as explanatory variables “PersonDays” (a new variable computed as Total Persons × Days), “Nationality”, and “First time visit”. You believe that “Nationality” affects the slope coefficient of “PersonDays”, while “First time visit” affects the intercept.
R² | 0.723 |
Adjusted R² | 0.691 |
R | 0.850 |
Std. Error | 198.825 |
n | 30 |
k | 3 |
Dep. Var. | TOT_SPEND |
ANOVA table
Source | SS | df | MS | F | p-value |
Regression | 2,687,115.0495 | 3 | 895,705.0165 | 22.66 | 2.01E-07 |
Residual | 1,027,811.1172 | 26 | 39,531.1968 | | |
Total | 3,714,926.1667 | 29 | | | |
Regression output (with 95% confidence interval)
variables | coefficients | std. error | t (df=26) | p-value | 95% lower | 95% upper |
Intercept | 604.6127 | | | | | |
PERSON*DAYS | 13.7111 | 2.1053 | 6.513 | 6.65E-07 | 9.3835 | 18.0387 |
NATIONALITY | -147.0028 | 41.5606 | -3.537 | .0015 | -232.4318 | -61.5738 |
FIRST-TIME | -39.3412 | 92.1592 | -0.427 | .6730 | -228.7772 | 150.0947 |
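The regression model is:
TOT_SPEND = 604.6127 + 13.7111 × (PERSON*DAYS) - 147.0028 × (NATIONALITY) - 39.3412 × (FIRST-TIME)
Note that this fitted equation enters NATIONALITY additively as a single coded variable, so it shifts the intercept; it does not let the slope on PERSON*DAYS differ by nationality as the question describes.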
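The summary statistics reported for this model can be cross-checked from the ANOVA sums of squares alone. The sketch below recomputes R², adjusted R², F and the standard error from the SS values, n and k in the output, using the usual textbook definitions; the variable names are illustrative.

```python
import math

# Values copied from the ANOVA table for the model in (ii).
ss_reg, ss_res = 2687115.0495, 1027811.1172
n, k = 30, 3

ss_total = ss_reg + ss_res            # total sum of squares
df_res = n - k - 1                    # residual degrees of freedom
ms_reg, ms_res = ss_reg / k, ss_res / df_res

r2 = ss_reg / ss_total
adj_r2 = 1 - (1 - r2) * (n - 1) / df_res
f_stat = ms_reg / ms_res
std_error = math.sqrt(ms_res)         # standard error of the estimate

print(f"R2={r2:.3f}  adjR2={adj_r2:.3f}  F={f_stat:.2f}  SE={std_error:.3f}")
```

The same arithmetic applies to the other two outputs by swapping in their SS values and k.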
iii. Run the regression and state the estimated model for each Nationality distinguishing between first time and repeated visitors.
R² | 0.272 |
Adjusted R² | 0.218 |
R | 0.522 |
Std. Error | 316.489 |
n | 30 |
k | 2 |
Dep. Var. | TOT_SPEND |
ANOVA table
Source | SS | df | MS | F | p-value |
Regression | 1,010,466.8942 | 2 | 505,233.4471 | 5.04 | .0138 |
Residual | 2,704,459.2724 | 27 | 100,165.1582 | | |
Total | 3,714,926.1667 | 29 | | | |
Regression output (with 95% confidence interval)
variables | coefficients | std. error | t (df=27) | p-value | 95% lower | 95% upper |
Intercept | 1,070.6006 | | | | | |
NATIONALITY | -148.2985 | 66.1554 | -2.242 | .0334 | -284.0381 | -12.5589 |
FIRST-TIME | 72.4729 | 144.1308 | 0.503 | .6192 | -223.2590 | 368.2048 |
The regression model is:
TOT_SPEND = 1,070.6006 - 148.2985 × (NATIONALITY) + 72.4729 × (FIRST-TIME)
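Part (iii) asks for one equation per nationality and visitor type. If Nationality is allowed to change the slope on PersonDays through slope dummies and First time visit shifts the intercept, those equations follow directly from the fitted coefficients. The sketch below fits that specification by ordinary least squares on the 30 table rows using only the Python standard library. The BRITISH baseline, the helper names `design_row` and `solve`, and the ordering of `beta` are assumptions made for illustration; no numerical output is quoted because the source does not report this regression.

```python
# Sketch: TOT_SPEND = b0 + b1*PD + b2*PD*FR + b3*PD*DE + b4*PD*IT + b5*FIRST
# Rows: (NATIONALITY, TOTAL PERSONS, TOT_SPEND, DAYS, FIRST-TIME), from the table.
DATA = [
    ("GERMAN", 4, 1095, 9, "yes"), ("ITALIAN", 5, 867, 8, "no"),
    ("FRENCH", 4, 689, 10, "no"), ("BRITISH", 5, 876, 16, "yes"),
    ("FRENCH", 5, 1280, 17, "yes"), ("FRENCH", 4, 755, 10, "no"),
    ("FRENCH", 3, 375, 9, "no"), ("BRITISH", 4, 275, 4, "no"),
    ("ITALIAN", 5, 1166, 11, "yes"), ("GERMAN", 4, 1741, 13, "yes"),
    ("BRITISH", 4, 302, 5, "yes"), ("BRITISH", 4, 423, 11, "no"),
    ("BRITISH", 4, 659, 7, "no"), ("BRITISH", 3, 358, 10, "no"),
    ("ITALIAN", 4, 785, 9, "yes"), ("BRITISH", 5, 1046, 15, "no"),
    ("BRITISH", 4, 373, 5, "no"), ("GERMAN", 5, 517, 6, "yes"),
    ("ITALIAN", 3, 486, 12, "yes"), ("BRITISH", 4, 357, 6, "no"),
    ("ITALIAN", 5, 436, 5, "yes"), ("FRENCH", 3, 800, 13, "yes"),
    ("BRITISH", 4, 613, 7, "no"), ("GERMAN", 3, 688, 11, "yes"),
    ("BRITISH", 3, 433, 7, "no"), ("BRITISH", 3, 239, 6, "yes"),
    ("BRITISH", 4, 464, 8, "no"), ("FRENCH", 4, 923, 14, "no"),
    ("BRITISH", 3, 187, 5, "no"), ("FRENCH", 3, 357, 12, "no"),
]
BASELINE = "BRITISH"                     # assumed baseline nationality
OTHERS = ["FRENCH", "GERMAN", "ITALIAN"]


def design_row(nat, persons, days, first):
    """Intercept, PersonDays, three slope dummies, first-time dummy."""
    pd_ = persons * days
    slope_dummies = [float(pd_ * (nat == other)) for other in OTHERS]
    return [1.0, float(pd_)] + slope_dummies + [float(first == "yes")]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


X = [design_row(nat, p, d, f) for nat, p, _, d, f in DATA]
y = [float(row[2]) for row in DATA]
k = len(X[0])

# Normal equations: (X'X) beta = X'y
xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
beta = solve(xtx, xty)

b0, b_pd, b_first = beta[0], beta[1], beta[5]
slopes = {BASELINE: b_pd}
slopes.update({other: b_pd + beta[2 + i] for i, other in enumerate(OTHERS)})

for nat, slope in slopes.items():
    for label, shift in (("repeat", 0.0), ("first-time", b_first)):
        print(f"{nat:8s} {label:10s}: "
              f"TOT_SPEND = {b0 + shift:.2f} + {slope:.2f} * PersonDays")
```

This prints eight equations: four nationalities, each for repeat and first-time visitors. Standard errors and p-values are not computed here; a statistics package would be needed to judge which of these coefficients are significant.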
iv. Which coefficients are statistically significant?
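In the output for (iii), only NATIONALITY is significant at the 5% level (p = .0334); FIRST-TIME is not (p = .6192). In the output for (ii), PERSON*DAYS (p = 6.65E-07) and NATIONALITY (p = .0015) are significant, while FIRST-TIME (p = .6730) is not.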
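Significance here is read off the regression output: each t statistic is the coefficient divided by its standard error, and a coefficient is called significant when its p-value is below the chosen level. A short sketch using the figures reported for the model in (ii) follows; the 5% level (`ALPHA`) is an assumption, since the question does not state one.

```python
# (coefficient, std. error, reported p-value) from the regression output in (ii).
output = {
    "PERSON*DAYS": (13.7111, 2.1053, 6.65e-07),
    "NATIONALITY": (-147.0028, 41.5606, 0.0015),
    "FIRST-TIME": (-39.3412, 92.1592, 0.6730),
}
ALPHA = 0.05  # assumed significance level

# t statistic = coefficient / standard error
t_stats = {name: coef / se for name, (coef, se, _) in output.items()}

# A coefficient is flagged when its reported p-value is below ALPHA.
significant = [name for name, (_, _, p) in output.items() if p < ALPHA]

for name, t in t_stats.items():
    flag = "significant" if name in significant else "not significant"
    print(f"{name:12s} t = {t:7.3f}  ({flag} at {ALPHA:.0%})")
```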
v. Run the model after you remove the non-significant coefficients. Compare with the model in (ii) and comment
r² | 0.265 |
r | -0.515 |
Std. Error | 312.238 |
n | 30 |
k | 1 |
Dep. Var. | TOT_SPEND |
ANOVA table
Source | SS | df | MS | F | p-value |
Regression | 985,141.5833 | 1 | 985,141.5833 | 10.10 | .0036 |
Residual | 2,729,784.5834 | 28 | 97,492.3066 | | |
Total | 3,714,926.1667 | 29 | | | |
Regression output (with 95% confidence interval)
variables | coefficients | std. error | t (df=28) | p-value | 95% lower | 95% upper |
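Intercept | 1,161.3127 | | | | | |
NATIONALITY | -167.8503 | 52.8029 | -3.179 | .0036 | -276.0122 | -59.6884 |
The regression model is:
TOT_SPEND = 1,161.3127 - 167.8503 × (NATIONALITY)
Compared with the model in (ii), R² falls from 0.723 to 0.265 and the standard error of the estimate rises from 198.825 to 312.238. Most of that loss comes from leaving out PERSON*DAYS, which was highly significant in (ii) (p = 6.65E-07). Only FIRST-TIME (p = .6730) was non-significant there, so this NATIONALITY-only model fits the data much worse than the model in (ii).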
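One common way to compare a reduced model with the fuller one it is nested in is a partial F-test on the residual sums of squares. The sketch below applies it to the NATIONALITY-only model against the three-regressor model in (ii), using the SS and df values from the two ANOVA tables. The critical value is an approximate 5% figure for F(2, 26) taken from standard tables and should be treated as an assumption.

```python
# Residual SS and residual df copied from the two ANOVA tables.
sse_full, df_full = 1027811.1172, 26        # model in (ii): 3 regressors
sse_reduced, df_reduced = 2729784.5834, 28  # model in (v): NATIONALITY only

q = df_reduced - df_full                    # number of regressors dropped

# Partial F: extra unexplained variation per dropped regressor,
# relative to the residual mean square of the full model.
f_partial = ((sse_reduced - sse_full) / q) / (sse_full / df_full)

F_CRIT_5PCT = 3.37  # approximate F(2, 26) critical value (assumption)

print(f"partial F = {f_partial:.2f} on ({q}, {df_full}) df")
print("dropped regressors matter jointly:", f_partial > F_CRIT_5PCT)
```

A partial F well above the critical value indicates that the dropped regressors, taken together, carry explanatory power that the reduced model loses.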