An assistant in the district sales office of a national cosmetics firm obtained data on advertising expenditures and sales last year in the district’s 44 territories. Data is consmetics.csv. Use R. I don't want answers in Excel or SAS :)
X1: expenditures for point-of-sale displays in beauty salons and department stores (X$1000).
X2: expenditures for local media advertising.
X3: expenditures for prorated share of national media advertising.
Y: Sales (X$1000).
6. (4) Are there any influential points?
7. Is there a serious multicollinearity problem?
(3) Include an appropriate scatterplot and correlation values between the explanatory variables.
(3) Judge by VIF, do you think there is a problem with multicollinearity? (Hint: VIF or tolerance)
(3) Compare your answers in parts i and ii. Are your conclusions the same or different? Please explain your answer.
Data:
y | x1 | x2 | x3 |
12.85 | 5.6 | 5.6 | 3.8 |
11.55 | 4.1 | 4.8 | 4.8 |
12.78 | 3.7 | 3.5 | 3.6 |
11.19 | 4.8 | 4.5 | 5.2 |
9 | 3.4 | 3.7 | 2.9 |
9.34 | 6.1 | 5.8 | 3.4 |
13.8 | 7.7 | 7.2 | 3.8 |
8.79 | 4 | 4 | 3.8 |
8.54 | 2.8 | 2.3 | 2.9 |
6.23 | 3.2 | 3 | 2.8 |
11.77 | 4.2 | 4.5 | 5.1 |
8.04 | 2.7 | 2.1 | 4.3 |
5.8 | 1.8 | 2.5 | 2.3 |
11.57 | 5 | 4.6 | 3.6 |
7.03 | 2.9 | 3.2 | 4 |
0.27 | 0 | 0.2 | 2.7 |
5.1 | 1.4 | 2.2 | 3.8 |
9.91 | 4.2 | 4.3 | 4.3 |
6.56 | 2.4 | 2.2 | 3.7 |
14.17 | 4.7 | 4.7 | 3.4 |
8.32 | 4.5 | 4.4 | 2.7 |
7.32 | 3.6 | 2.9 | 2.8 |
3.45 | 0.6 | 0.8 | 3.4 |
13.73 | 5.6 | 4.7 | 5.3 |
8.06 | 3.2 | 3.3 | 3.6 |
9.94 | 3.7 | 3.5 | 4.3 |
11.54 | 5.5 | 4.9 | 3.2 |
10.8 | 3 | 3.6 | 4.6 |
12.33 | 5.8 | 5 | 4.5 |
2.96 | 3.5 | 3.1 | 3 |
7.38 | 2.3 | 2 | 2.2 |
8.68 | 2 | 1.8 | 2.5 |
11.51 | 4.9 | 5.3 | 3.8 |
1.6 | 0.1 | 0.3 | 2.7 |
10.93 | 3.6 | 3.8 | 3.8 |
11.61 | 4.9 | 4.4 | 2.5 |
17.99 | 8.4 | 8.2 | 3.9 |
9.58 | 2.1 | 2.3 | 3.9 |
7.05 | 1.9 | 1.8 | 3.8 |
8.85 | 2.4 | 2 | 2.4 |
7.53 | 3.6 | 3.5 | 2.4 |
10.47 | 3.6 | 3.7 | 4.4 |
11.03 | 3.9 | 3.6 | 2.9 |
12.31 | 5.5 | 5 | 5.5 |
Enter the data in R:
y=c(12.85,11.55,12.78,11.19,9,9.34,13.8,8.79,8.54,6.23,11.77,8.04,5.8,11.57,7.03,0.27,5.1,9.91,6.56,14.17,8.32,7.32,3.45,13.73,8.06,9.94,11.54,10.8,12.33,2.96,7.38,8.68,11.51,1.6,10.93,11.61,17.99,9.58,7.05,8.85,7.53,10.47,11.03,12.31)
x1=c(5.6,4.1,3.7,4.8,3.4,6.1,7.7,4,2.8,3.2,4.2,2.7,1.8,5,2.9,0,1.4,4.2,2.4,4.7,4.5,3.6,0.6,5.6,3.2,3.7,5.5,3,5.8,3.5,2.3,2,4.9,0.1,3.6,4.9,8.4,2.1,1.9,2.4,3.6,3.6,3.9,5.5)
x2=c(5.6,4.8,3.5,4.5,3.7,5.8,7.2,4,2.3,3,4.5,2.1,2.5,4.6,3.2,0.2,2.2,4.3,2.2,4.7,4.4,2.9,0.8,4.7,3.3,3.5,4.9,3.6,5,3.1,2,1.8,5.3,0.3,3.8,4.4,8.2,2.3,1.8,2,3.5,3.7,3.6,5)
x3=c(3.8,4.8,3.6,5.2,2.9,3.4,3.8,3.8,2.9,2.8,5.1,4.3,2.3,3.6,4,2.7,3.8,4.3,3.7,3.4,2.7,2.8,3.4,5.3,3.6,4.3,3.2,4.6,4.5,3,2.2,2.5,3.8,2.7,3.8,2.5,3.9,3.9,3.8,2.4,2.4,4.4,2.9,5.5)
# fit the model using R
fit=lm(y~x1+x2+x3)
fit
Output:
Call:
lm(formula = y ~ x1 + x2 + x3)
Coefficients:
(Intercept) x1 x2 x3
1.0233 0.9657 0.6292 0.6760
6. (4) Are there any influential points?
To detect influential points, R has a direct command:
influence.measures(fit)
Output:
Influence measures of
lm(formula = y ~ x1 + x2 + x3) :
dfb.1_ dfb.x1 dfb.x2 dfb.x3 dffit cov.r cook.d hat inf
1 -0.004003 -0.01553 0.02367 -0.01018 0.04872 1.181 6.08e-04 0.0662
2 -0.031724 -0.06110 0.06057 0.02859 0.07981 1.322 1.63e-03 0.1655 *
3 0.070146 0.09176 -0.09474 0.02087 0.32625 0.755 2.46e-02 0.0248
4 0.118258 -0.04666 0.04956 -0.14036 -0.16740 1.219 7.15e-03 0.1128
5 0.035752 -0.04028 0.04277 -0.03719 0.06299 1.185 1.02e-03 0.0712
6 -0.079405 -0.04496 -0.08332 0.27876 -0.62822 0.786 9.09e-02 0.0823
7 0.054999 -0.08175 -0.01744 0.11740 -0.44643 1.170 4.97e-02 0.1536
8 0.000966 0.02744 -0.03046 -0.01174 -0.10705 1.088 2.91e-03 0.0263
9 0.129859 0.14546 -0.15695 -0.04914 0.22686 1.121 1.30e-02 0.0750
10 -0.160591 -0.03721 0.03594 0.11753 -0.19990 1.060 1.00e-02 0.0441
11 -0.053408 -0.03567 0.03294 0.05924 0.08495 1.242 1.85e-03 0.1140
12 -0.007740 0.02877 -0.03375 0.02603 0.04375 1.293 4.91e-04 0.1454
13 -0.015649 0.01800 -0.01732 0.01390 -0.02408 1.334 1.49e-04 0.1709 *
14 0.003829 0.02078 -0.01477 -0.00666 0.04516 1.149 5.22e-04 0.0419
15 0.016795 0.11599 -0.09709 -0.06883 -0.19640 1.084 9.71e-03 0.0510
16 -0.390214 0.08274 0.03610 0.06160 -0.61973 0.978 9.23e-02 0.1279
17 -0.025477 0.23978 -0.20246 -0.04896 -0.29653 1.226 2.22e-02 0.1440
18 0.037280 0.03407 -0.03469 -0.04316 -0.09325 1.136 2.22e-03 0.0442
19 -0.012876 -0.03027 0.04342 -0.03343 -0.08752 1.152 1.96e-03 0.0527
20 0.107908 -0.13563 0.19282 -0.18189 0.42142 0.801 4.15e-02 0.0449
21 -0.163736 0.03623 -0.07073 0.20341 -0.26079 1.092 1.71e-02 0.0725
22 -0.087741 -0.12738 0.12487 0.05224 -0.16931 1.195 7.30e-03 0.0983
23 -0.060632 0.02725 0.01142 -0.04266 -0.18278 1.193 8.50e-03 0.1006
24 -0.139466 0.17062 -0.16852 0.16797 0.24624 1.385 1.55e-02 0.2185 *
25 -0.012813 0.02110 -0.01806 -0.00178 -0.05279 1.128 7.13e-04 0.0283
26 -0.010481 0.00798 -0.00989 0.01950 0.02861 1.158 2.10e-04 0.0462
27 -0.001794 -0.00372 0.00266 0.00269 -0.00671 1.201 1.15e-05 0.0787
28 -0.108793 -0.23083 0.20589 0.15425 0.32599 1.165 2.67e-02 0.1214
29 0.041056 -0.07707 0.06971 -0.04308 -0.10309 1.248 2.72e-03 0.1206
30 -0.465021 -0.41726 0.40908 0.28135 -0.76987 0.405 1.17e-01 0.0480 *
31 0.231108 0.07521 -0.08395 -0.16011 0.25762 1.146 1.67e-02 0.0947
32 0.407048 0.11020 -0.14871 -0.22532 0.47779 0.901 5.45e-02 0.0740
33 0.001028 0.02057 -0.02336 0.00520 -0.02853 1.231 2.09e-04 0.1017
34 -0.213931 0.04803 0.01519 0.03823 -0.33437 1.163 2.81e-02 0.1225
35 0.003315 -0.09516 0.09307 0.01485 0.16039 1.074 6.48e-03 0.0371
36 0.165522 0.09625 -0.05904 -0.19597 0.27011 1.153 1.84e-02 0.1015
37 -0.054049 -0.05080 0.13001 -0.10605 0.35941 1.376 3.28e-02 0.2304 *
38 0.009109 -0.10633 0.04840 0.14990 0.34029 0.965 2.83e-02 0.0562
39 0.006866 0.01507 -0.02848 0.03436 0.07403 1.178 1.40e-03 0.0675
40 0.389962 0.21424 -0.23514 -0.23780 0.48302 0.962 5.64e-02 0.0901
41 -0.106345 0.00994 -0.01905 0.10628 -0.12748 1.172 4.15e-03 0.0749
42 -0.035322 -0.02262 0.01698 0.05469 0.08450 1.148 1.82e-03 0.0493
43 0.166767 0.07541 -0.05798 -0.14659 0.24637 1.018 1.51e-02 0.0451
44 0.169796 -0.09530 0.09362 -0.18822 -0.23263 1.285 1.38e-02 0.1630
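Yes. The asterisks (*) in the inf column flag observations 2, 13, 24, 30 and 37 as potentially influential. Influence can also be judged by Cook's distance: a common rule of thumb is that an observation with a Cook's D over 1.0 has too much influence. Here the largest Cook's D is 0.117 (observation 30), so no point exceeds that cutoff. Observation 30 also has the largest |dffit| (0.770) and the smallest cov.r (0.405), so it is the point that most deserves a closer look.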
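The flags in the table can be cross-checked against individual cutoffs. The sketch below refits the model and applies three common rules of thumb: Cook's D above 1, |DFFITS| above 2*sqrt(p/n), and leverage above 2p/n. These cutoffs are conventions assumed here, not part of the original question, and the names n, p, cd, dff and lev are chosen only for illustration.

```r
# Data re-entered so that the block runs on its own
y  <- c(12.85,11.55,12.78,11.19,9,9.34,13.8,8.79,8.54,6.23,11.77,8.04,5.8,11.57,7.03,0.27,5.1,9.91,6.56,14.17,8.32,7.32,3.45,13.73,8.06,9.94,11.54,10.8,12.33,2.96,7.38,8.68,11.51,1.6,10.93,11.61,17.99,9.58,7.05,8.85,7.53,10.47,11.03,12.31)
x1 <- c(5.6,4.1,3.7,4.8,3.4,6.1,7.7,4,2.8,3.2,4.2,2.7,1.8,5,2.9,0,1.4,4.2,2.4,4.7,4.5,3.6,0.6,5.6,3.2,3.7,5.5,3,5.8,3.5,2.3,2,4.9,0.1,3.6,4.9,8.4,2.1,1.9,2.4,3.6,3.6,3.9,5.5)
x2 <- c(5.6,4.8,3.5,4.5,3.7,5.8,7.2,4,2.3,3,4.5,2.1,2.5,4.6,3.2,0.2,2.2,4.3,2.2,4.7,4.4,2.9,0.8,4.7,3.3,3.5,4.9,3.6,5,3.1,2,1.8,5.3,0.3,3.8,4.4,8.2,2.3,1.8,2,3.5,3.7,3.6,5)
x3 <- c(3.8,4.8,3.6,5.2,2.9,3.4,3.8,3.8,2.9,2.8,5.1,4.3,2.3,3.6,4,2.7,3.8,4.3,3.7,3.4,2.7,2.8,3.4,5.3,3.6,4.3,3.2,4.6,4.5,3,2.2,2.5,3.8,2.7,3.8,2.5,3.9,3.9,3.8,2.4,2.4,4.4,2.9,5.5)
fit <- lm(y ~ x1 + x2 + x3)

n <- length(y)           # number of observations
p <- length(coef(fit))   # number of coefficients, including the intercept

cd  <- cooks.distance(fit)   # Cook's distance for each observation
dff <- dffits(fit)           # DFFITS for each observation
lev <- hatvalues(fit)        # leverage (hat values)

which(cd > 1)                        # Cook's D rule of thumb (assumed cutoff)
which(abs(dff) > 2 * sqrt(p / n))    # DFFITS rule of thumb (assumed cutoff)
which(lev > 2 * p / n)               # leverage rule of thumb (assumed cutoff)
which.max(cd)                        # observation with the largest Cook's distance
```

Different cutoffs flag different observations, so it is usually safer to report which rule was used alongside the list of points than to give a bare yes or no.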
7. Is there a serious multicollinearity problem?
We use the correlation matrix of the explanatory variables to check for multicollinearity.
D=data.frame(x1,x2,x3)
cor(D) # to obtain correlation matrix
Output:
x1 x2 x3
x1 1.0000000 0.9744313 0.3759509
x2 0.9744313 1.0000000 0.4099208
x3 0.3759509 0.4099208 1.0000000
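The correlation matrix shows that x1 and x2 are almost perfectly correlated (r = 0.974), which points to a serious multicollinearity problem between these two predictors. x3 is only moderately correlated with x1 (0.376) and x2 (0.410).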
Include an appropriate scatterplot and correlation values between the explanatory variables.
Command:
D=data.frame(x1,x2,x3)
plot(D) # to draw the scatterplot matrix
cor(D) # to compute the correlation values
Output:
Correlation matrix of all explanatory variables
x1 x2 x3 |
x1 1.0000000 0.9744313 0.3759509 |
x2 0.9744313 1.0000000 0.4099208 |
x3 0.3759509 0.4099208 1.0000000 |
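To make the reading of the correlation matrix explicit, the sketch below lists every pair of predictors whose absolute correlation exceeds a cutoff. The 0.8 cutoff is an assumed rule of thumb, not a value from the question, and the names R, cutoff, idx and high are illustrative.

```r
# Predictors re-entered so that the block runs on its own
x1 <- c(5.6,4.1,3.7,4.8,3.4,6.1,7.7,4,2.8,3.2,4.2,2.7,1.8,5,2.9,0,1.4,4.2,2.4,4.7,4.5,3.6,0.6,5.6,3.2,3.7,5.5,3,5.8,3.5,2.3,2,4.9,0.1,3.6,4.9,8.4,2.1,1.9,2.4,3.6,3.6,3.9,5.5)
x2 <- c(5.6,4.8,3.5,4.5,3.7,5.8,7.2,4,2.3,3,4.5,2.1,2.5,4.6,3.2,0.2,2.2,4.3,2.2,4.7,4.4,2.9,0.8,4.7,3.3,3.5,4.9,3.6,5,3.1,2,1.8,5.3,0.3,3.8,4.4,8.2,2.3,1.8,2,3.5,3.7,3.6,5)
x3 <- c(3.8,4.8,3.6,5.2,2.9,3.4,3.8,3.8,2.9,2.8,5.1,4.3,2.3,3.6,4,2.7,3.8,4.3,3.7,3.4,2.7,2.8,3.4,5.3,3.6,4.3,3.2,4.6,4.5,3,2.2,2.5,3.8,2.7,3.8,2.5,3.9,3.9,3.8,2.4,2.4,4.4,2.9,5.5)
D <- data.frame(x1, x2, x3)

R <- cor(D)     # correlation matrix of the predictors
cutoff <- 0.8   # assumed rule-of-thumb threshold for "high" correlation

# Look only above the diagonal so that each pair is reported once
idx <- which(abs(R) > cutoff & upper.tri(R), arr.ind = TRUE)

high <- data.frame(var1 = rownames(R)[idx[, "row"]],
                   var2 = colnames(R)[idx[, "col"]],
                   r    = R[idx])
high
```

Pairwise correlations only detect collinearity between two variables at a time, which is why the VIF check in the next part is still worth doing.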
Judge by VIF, do you think there is a problem with multicollinearity? (Hint: VIF or tolerance)
We obtain VIF using R
fit=lm(y~x1+x2+x3)
summary(fit) # to obtain summary of model
Output:
Call:
lm(formula = y ~ x1 + x2 + x3)
Residuals:
Min 1Q Median 3Q Max
-5.4217 -0.9115 0.0703 1.1420 3.5479
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.0233 1.2029 0.851 0.4000
x1 0.9657 0.7092 1.362 0.1809
x2 0.6292 0.7783 0.808 0.4237
x3 0.6760 0.3557 1.900 0.0646 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.825 on 40 degrees of freedom
Multiple R-squared: 0.7417, Adjusted R-squared: 0.7223
F-statistic: 38.28 on 3 and 40 DF, p-value: 7.821e-12
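The R-squared in this output (0.7417) measures how well x1, x2 and x3 together explain y, so it cannot be used for the VIF. The VIF of predictor j is 1/(1 - Rj^2), where Rj^2 is the R-squared from regressing that predictor on the other predictors, and the tolerance is 1 - Rj^2.
Command:
R2j=c(x1=summary(lm(x1~x2+x3))$r.squared,
x2=summary(lm(x2~x1+x3))$r.squared,
x3=summary(lm(x3~x1+x2))$r.squared) # R-squared of each predictor on the others
(1-R2j) # to get tolerance
VIF=(1/(1-R2j))
VIF
sqrt(VIF)
The same VIF values are the diagonal of the inverse of the correlation matrix above, which gives approximately 20.1 for x1, 20.7 for x2 and 1.2 for x3 (tolerances of about 0.05, 0.05 and 0.82).
Here the VIFs of x1 and x2 are far above the usual cutoffs of 5 or 10, and their square roots are well above 2, hence we say that there is a serious multicollinearity problem involving x1 and x2. x3 is not affected. The summary output agrees: the overall F-test is highly significant (p-value 7.821e-12), but no individual coefficient is significant at the 5% level.
Compare your answers in parts i and ii. Are your conclusions the same or different? Please explain your answer.
Ans: The conclusions are the same. The correlation of 0.974 between x1 and x2 and their VIFs of about 20 both indicate serious multicollinearity between these two predictors, because both measure how much of one predictor is explained by the others. x3 has only moderate correlations with x1 and x2 (0.376 and 0.410) and a VIF near 1.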
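As a cross-check on the VIF step, the variance inflation factors can be computed from the predictors alone, without the response y. The sketch below takes the diagonal of the inverse of the predictor correlation matrix and compares the x1 entry with 1/(1 - R^2) from the auxiliary regression of x1 on x2 and x3. The names vif_inv, r2_x1 and vif_x1 are illustrative.

```r
# Predictors re-entered so that the block runs on its own
x1 <- c(5.6,4.1,3.7,4.8,3.4,6.1,7.7,4,2.8,3.2,4.2,2.7,1.8,5,2.9,0,1.4,4.2,2.4,4.7,4.5,3.6,0.6,5.6,3.2,3.7,5.5,3,5.8,3.5,2.3,2,4.9,0.1,3.6,4.9,8.4,2.1,1.9,2.4,3.6,3.6,3.9,5.5)
x2 <- c(5.6,4.8,3.5,4.5,3.7,5.8,7.2,4,2.3,3,4.5,2.1,2.5,4.6,3.2,0.2,2.2,4.3,2.2,4.7,4.4,2.9,0.8,4.7,3.3,3.5,4.9,3.6,5,3.1,2,1.8,5.3,0.3,3.8,4.4,8.2,2.3,1.8,2,3.5,3.7,3.6,5)
x3 <- c(3.8,4.8,3.6,5.2,2.9,3.4,3.8,3.8,2.9,2.8,5.1,4.3,2.3,3.6,4,2.7,3.8,4.3,3.7,3.4,2.7,2.8,3.4,5.3,3.6,4.3,3.2,4.6,4.5,3,2.2,2.5,3.8,2.7,3.8,2.5,3.9,3.9,3.8,2.4,2.4,4.4,2.9,5.5)
D <- data.frame(x1, x2, x3)

# VIFs as the diagonal of the inverse correlation matrix of the predictors
vif_inv <- diag(solve(cor(D)))

# Same quantity for x1 from its auxiliary regression on the other predictors
r2_x1  <- summary(lm(x1 ~ x2 + x3, data = D))$r.squared
vif_x1 <- 1 / (1 - r2_x1)

vif_inv                                    # VIF for x1, x2, x3
1 / vif_inv                                # tolerance for x1, x2, x3
all.equal(unname(vif_inv["x1"]), vif_x1)   # the two routes should agree
```

If the car package is installed, car::vif(fit) applied to the fitted model should report the same three values.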