Question

An assistant in the district sales office of a national cosmetics firm obtained data on advertising expenditures and sales last year in the district’s 44 territories. Data is consmetics.csv. Use R. I don't want answers in Excel or SAS :)

X1: expenditures for point-of-sale displays in beauty salons and department stores (in $1000s).

X2: expenditures for local media advertising.

X3: expenditures for prorated share of national media advertising.

Y: Sales (in $1000s).

6. (4) Are there any influential points?

7. Is there a serious multicollinearity problem?

i. (3) Include an appropriate scatterplot and correlation values between the explanatory variables.

ii. (3) Judge by VIF, do you think there is a problem with multicollinearity? (Hint: VIF or tolerance)

iii. (3) Compare your answers in parts i and ii. Are your conclusions the same or different? Please explain your answer.

Data:

y x1 x2 x3
12.85 5.6 5.6 3.8
11.55 4.1 4.8 4.8
12.78 3.7 3.5 3.6
11.19 4.8 4.5 5.2
9 3.4 3.7 2.9
9.34 6.1 5.8 3.4
13.8 7.7 7.2 3.8
8.79 4 4 3.8
8.54 2.8 2.3 2.9
6.23 3.2 3 2.8
11.77 4.2 4.5 5.1
8.04 2.7 2.1 4.3
5.8 1.8 2.5 2.3
11.57 5 4.6 3.6
7.03 2.9 3.2 4
0.27 0 0.2 2.7
5.1 1.4 2.2 3.8
9.91 4.2 4.3 4.3
6.56 2.4 2.2 3.7
14.17 4.7 4.7 3.4
8.32 4.5 4.4 2.7
7.32 3.6 2.9 2.8
3.45 0.6 0.8 3.4
13.73 5.6 4.7 5.3
8.06 3.2 3.3 3.6
9.94 3.7 3.5 4.3
11.54 5.5 4.9 3.2
10.8 3 3.6 4.6
12.33 5.8 5 4.5
2.96 3.5 3.1 3
7.38 2.3 2 2.2
8.68 2 1.8 2.5
11.51 4.9 5.3 3.8
1.6 0.1 0.3 2.7
10.93 3.6 3.8 3.8
11.61 4.9 4.4 2.5
17.99 8.4 8.2 3.9
9.58 2.1 2.3 3.9
7.05 1.9 1.8 3.8
8.85 2.4 2 2.4
7.53 3.6 3.5 2.4
10.47 3.6 3.7 4.4
11.03 3.9 3.6 2.9
12.31 5.5 5 5.5

Solutions

Expert Solution

Enter the data:

y=c(12.85,11.55,12.78,11.19,9,9.34,13.8,8.79,8.54,6.23,11.77,8.04,5.8,11.57,7.03,0.27,5.1,9.91,6.56,14.17,8.32,7.32,3.45,13.73,8.06,9.94,11.54,10.8,12.33,2.96,7.38,8.68,11.51,1.6,10.93,11.61,17.99,9.58,7.05,8.85,7.53,10.47,11.03,12.31)

x1=c(5.6,4.1,3.7,4.8,3.4,6.1,7.7,4,2.8,3.2,4.2,2.7,1.8,5,2.9,0,1.4,4.2,2.4,4.7,4.5,3.6,0.6,5.6,3.2,3.7,5.5,3,5.8,3.5,2.3,2,4.9,0.1,3.6,4.9,8.4,2.1,1.9,2.4,3.6,3.6,3.9,5.5)

x2=c(5.6,4.8,3.5,4.5,3.7,5.8,7.2,4,2.3,3,4.5,2.1,2.5,4.6,3.2,0.2,2.2,4.3,2.2,4.7,4.4,2.9,0.8,4.7,3.3,3.5,4.9,3.6,5,3.1,2,1.8,5.3,0.3,3.8,4.4,8.2,2.3,1.8,2,3.5,3.7,3.6,5)

x3=c(3.8,4.8,3.6,5.2,2.9,3.4,3.8,3.8,2.9,2.8,5.1,4.3,2.3,3.6,4,2.7,3.8,4.3,3.7,3.4,2.7,2.8,3.4,5.3,3.6,4.3,3.2,4.6,4.5,3,2.2,2.5,3.8,2.7,3.8,2.5,3.9,3.9,3.8,2.4,2.4,4.4,2.9,5.5)

# fit the model using R

fit=lm(y~x1+x2+x3)

fit

Output:

Call:

lm(formula = y ~ x1 + x2 + x3)

Coefficients:

(Intercept)           x1           x2           x3

     1.0233       0.9657       0.6292       0.6760

6. (4) Are there any influential points?

R has a direct command for detecting influential points:

influence.measures(fit)

Output:

Influence measures of

         lm(formula = y ~ x1 + x2 + x3) :

      dfb.1_   dfb.x1   dfb.x2   dfb.x3    dffit cov.r   cook.d    hat inf
1  -0.004003 -0.01553  0.02367 -0.01018  0.04872 1.181 6.08e-04 0.0662
2  -0.031724 -0.06110  0.06057  0.02859  0.07981 1.322 1.63e-03 0.1655   *
3   0.070146  0.09176 -0.09474  0.02087  0.32625 0.755 2.46e-02 0.0248
4   0.118258 -0.04666  0.04956 -0.14036 -0.16740 1.219 7.15e-03 0.1128
5   0.035752 -0.04028  0.04277 -0.03719  0.06299 1.185 1.02e-03 0.0712
6  -0.079405 -0.04496 -0.08332  0.27876 -0.62822 0.786 9.09e-02 0.0823
7   0.054999 -0.08175 -0.01744  0.11740 -0.44643 1.170 4.97e-02 0.1536
8   0.000966  0.02744 -0.03046 -0.01174 -0.10705 1.088 2.91e-03 0.0263
9   0.129859  0.14546 -0.15695 -0.04914  0.22686 1.121 1.30e-02 0.0750
10 -0.160591 -0.03721  0.03594  0.11753 -0.19990 1.060 1.00e-02 0.0441
11 -0.053408 -0.03567  0.03294  0.05924  0.08495 1.242 1.85e-03 0.1140
12 -0.007740  0.02877 -0.03375  0.02603  0.04375 1.293 4.91e-04 0.1454
13 -0.015649  0.01800 -0.01732  0.01390 -0.02408 1.334 1.49e-04 0.1709   *
14  0.003829  0.02078 -0.01477 -0.00666  0.04516 1.149 5.22e-04 0.0419
15  0.016795  0.11599 -0.09709 -0.06883 -0.19640 1.084 9.71e-03 0.0510
16 -0.390214  0.08274  0.03610  0.06160 -0.61973 0.978 9.23e-02 0.1279
17 -0.025477  0.23978 -0.20246 -0.04896 -0.29653 1.226 2.22e-02 0.1440
18  0.037280  0.03407 -0.03469 -0.04316 -0.09325 1.136 2.22e-03 0.0442
19 -0.012876 -0.03027  0.04342 -0.03343 -0.08752 1.152 1.96e-03 0.0527
20  0.107908 -0.13563  0.19282 -0.18189  0.42142 0.801 4.15e-02 0.0449
21 -0.163736  0.03623 -0.07073  0.20341 -0.26079 1.092 1.71e-02 0.0725
22 -0.087741 -0.12738  0.12487  0.05224 -0.16931 1.195 7.30e-03 0.0983
23 -0.060632  0.02725  0.01142 -0.04266 -0.18278 1.193 8.50e-03 0.1006
24 -0.139466  0.17062 -0.16852  0.16797  0.24624 1.385 1.55e-02 0.2185   *
25 -0.012813  0.02110 -0.01806 -0.00178 -0.05279 1.128 7.13e-04 0.0283
26 -0.010481  0.00798 -0.00989  0.01950  0.02861 1.158 2.10e-04 0.0462
27 -0.001794 -0.00372  0.00266  0.00269 -0.00671 1.201 1.15e-05 0.0787
28 -0.108793 -0.23083  0.20589  0.15425  0.32599 1.165 2.67e-02 0.1214
29  0.041056 -0.07707  0.06971 -0.04308 -0.10309 1.248 2.72e-03 0.1206
30 -0.465021 -0.41726  0.40908  0.28135 -0.76987 0.405 1.17e-01 0.0480   *
31  0.231108  0.07521 -0.08395 -0.16011  0.25762 1.146 1.67e-02 0.0947
32  0.407048  0.11020 -0.14871 -0.22532  0.47779 0.901 5.45e-02 0.0740
33  0.001028  0.02057 -0.02336  0.00520 -0.02853 1.231 2.09e-04 0.1017
34 -0.213931  0.04803  0.01519  0.03823 -0.33437 1.163 2.81e-02 0.1225
35  0.003315 -0.09516  0.09307  0.01485  0.16039 1.074 6.48e-03 0.0371
36  0.165522  0.09625 -0.05904 -0.19597  0.27011 1.153 1.84e-02 0.1015
37 -0.054049 -0.05080  0.13001 -0.10605  0.35941 1.376 3.28e-02 0.2304   *
38  0.009109 -0.10633  0.04840  0.14990  0.34029 0.965 2.83e-02 0.0562
39  0.006866  0.01507 -0.02848  0.03436  0.07403 1.178 1.40e-03 0.0675
40  0.389962  0.21424 -0.23514 -0.23780  0.48302 0.962 5.64e-02 0.0901
41 -0.106345  0.00994 -0.01905  0.10628 -0.12748 1.172 4.15e-03 0.0749
42 -0.035322 -0.02262  0.01698  0.05469  0.08450 1.148 1.82e-03 0.0493
43  0.166767  0.07541 -0.05798 -0.14659  0.24637 1.018 1.51e-02 0.0451
44  0.169796 -0.09530  0.09362 -0.18822 -0.23263 1.285 1.38e-02 0.1630

Yes. The asterisks (*) in the inf column mark the observations that R flags as potentially influential: 2, 13, 24, 30 and 37. Cook's distance gives a second check. A common rule of thumb is that an observation with a Cook's D over 1.0 has too much influence. Here the largest Cook's D is 0.117 (observation 30), so no observation crosses that stricter threshold.
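The Cook's distance rule of thumb can be checked directly rather than read off the table. The sketch below refits the same model and pulls out Cook's D and leverage with the base R functions cooks.distance() and hatvalues(). The 2p/n leverage cutoff and the variable names are assumptions added for illustration, not part of the original answer.

```r
# Data from the question (44 territories)
y  <- c(12.85,11.55,12.78,11.19,9,9.34,13.8,8.79,8.54,6.23,11.77,8.04,5.8,11.57,7.03,0.27,5.1,9.91,6.56,14.17,8.32,7.32,3.45,13.73,8.06,9.94,11.54,10.8,12.33,2.96,7.38,8.68,11.51,1.6,10.93,11.61,17.99,9.58,7.05,8.85,7.53,10.47,11.03,12.31)
x1 <- c(5.6,4.1,3.7,4.8,3.4,6.1,7.7,4,2.8,3.2,4.2,2.7,1.8,5,2.9,0,1.4,4.2,2.4,4.7,4.5,3.6,0.6,5.6,3.2,3.7,5.5,3,5.8,3.5,2.3,2,4.9,0.1,3.6,4.9,8.4,2.1,1.9,2.4,3.6,3.6,3.9,5.5)
x2 <- c(5.6,4.8,3.5,4.5,3.7,5.8,7.2,4,2.3,3,4.5,2.1,2.5,4.6,3.2,0.2,2.2,4.3,2.2,4.7,4.4,2.9,0.8,4.7,3.3,3.5,4.9,3.6,5,3.1,2,1.8,5.3,0.3,3.8,4.4,8.2,2.3,1.8,2,3.5,3.7,3.6,5)
x3 <- c(3.8,4.8,3.6,5.2,2.9,3.4,3.8,3.8,2.9,2.8,5.1,4.3,2.3,3.6,4,2.7,3.8,4.3,3.7,3.4,2.7,2.8,3.4,5.3,3.6,4.3,3.2,4.6,4.5,3,2.2,2.5,3.8,2.7,3.8,2.5,3.9,3.9,3.8,2.4,2.4,4.4,2.9,5.5)

fit <- lm(y ~ x1 + x2 + x3)

n <- length(y)           # number of observations
p <- length(coef(fit))   # number of coefficients, intercept included

cd  <- cooks.distance(fit)   # Cook's D for each observation
lev <- hatvalues(fit)        # leverage (the "hat" column of the table)

max_cd_obs <- unname(which.max(cd))           # observation with the largest Cook's D
over_one   <- unname(which(cd > 1))           # rule of thumb: Cook's D over 1.0
high_lev   <- unname(which(lev > 2 * p / n))  # assumed cutoff: leverage above 2p/n

# Rows that influence.measures() marks with "*" on any of its criteria
flagged <- unname(which(apply(influence.measures(fit)$is.inf, 1, any)))

print(round(max(cd), 3))
print(over_one)
print(high_lev)
print(flagged)
```

On this data the largest Cook's D should be the 0.117 for observation 30 seen in the table, well under 1, so over_one comes back empty while flagged reproduces the starred rows.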

7. Is there a serious multicollinearity problem?

We check for multicollinearity using the correlation matrix of the explanatory variables.

D=data.frame(x1,x2,x3)

cor(D) # to obtain correlation matrix

Output:

         x1        x2        x3

x1 1.0000000 0.9744313 0.3759509

x2 0.9744313 1.0000000 0.4099208

x3 0.3759509 0.4099208 1.0000000

The correlation matrix shows that x1 and x2 are very highly correlated (r = 0.974), so these two predictors are collinear. x3 is only moderately correlated with x1 (r = 0.376) and x2 (r = 0.410).

Include an appropriate scatterplot and correlation values between the explanatory variables.

Command:

D=data.frame(x1,x2,x3)

plot(D)    # to plot scatter plot

cor(D)    # to obtain correlation values

Output:

Correlation matrix of all explanatory variables:

          x1        x2        x3

x1 1.0000000 0.9744313 0.3759509

x2 0.9744313 1.0000000 0.4099208

x3 0.3759509 0.4099208 1.0000000

Judge by VIF, do you think there is a problem with multicollinearity? (Hint: VIF or tolerance)

We obtain VIF using R

fit=lm(y~x1+x2+x3)

summary(fit)    # to obtain summary of model

Output:

Call:

lm(formula = y ~ x1 + x2 + x3)

Residuals:

    Min      1Q Median      3Q     Max

-5.4217 -0.9115 0.0703 1.1420 3.5479

Coefficients:

            Estimate Std. Error t value Pr(>|t|)

(Intercept)   1.0233     1.2029   0.851   0.4000

x1            0.9657     0.7092   1.362   0.1809

x2            0.6292     0.7783   0.808   0.4237

x3            0.6760     0.3557   1.900   0.0646 .

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.825 on 40 degrees of freedom

Multiple R-squared: 0.7417,    Adjusted R-squared: 0.7223

F-statistic: 38.28 on 3 and 40 DF, p-value: 7.821e-12

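The R-squared value in this output (0.7417) measures how well x1, x2 and x3 together explain y, so it is not the R-squared that VIF needs. The VIF of a predictor is 1/(1-R2), where R2 comes from regressing that predictor on the other predictors, and tolerance is 1-R2.

Command:

R2=c(x1=summary(lm(x1~x2+x3))$r.squared, x2=summary(lm(x2~x1+x3))$r.squared, x3=summary(lm(x3~x1+x2))$r.squared)

(1-R2)       # to get tolerance

VIF=(1/(1-R2))

VIF

sqrt(VIF)

These VIFs can also be worked out from the correlation matrix above, since they are the diagonal of its inverse: approximately 20.1 for x1, 20.7 for x2 and 1.2 for x3 (tolerances of about 0.05, 0.05 and 0.82).

Here the VIFs of x1 and x2 are far above the usual cutoffs of 5 or 10, and their square roots (about 4.5) are above 2, hence we say that there is a serious multicollinearity problem involving x1 and x2. x3 is not affected.

Compare your answers in parts i and ii. Are your conclusions the same or different? Please explain your answer.

Ans: The conclusions are the same. The correlation matrix shows that x1 and x2 are almost perfectly correlated (r = 0.974), and the VIFs of about 20 for x1 and x2 confirm a serious multicollinearity problem between these two predictors. This also explains why neither x1 nor x2 is individually significant in the summary output even though the overall F-test is highly significant.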
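As a cross-check, a VIF can be computed for each predictor separately. The usual definition is VIF_j = 1/(1 - R_j^2), where R_j^2 comes from regressing x_j on the other predictors (not from the model for y). Equivalently, the VIFs are the diagonal of the inverse of the predictors' correlation matrix. The sketch below uses that second form with the correlation values printed earlier; the object names are chosen here for illustration.

```r
# Correlation matrix of the predictors, copied from the cor(D) output
R <- matrix(c(1.0000000, 0.9744313, 0.3759509,
              0.9744313, 1.0000000, 0.4099208,
              0.3759509, 0.4099208, 1.0000000),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("x1", "x2", "x3"), c("x1", "x2", "x3")))

# VIF_j is the j-th diagonal entry of the inverse correlation matrix
vif <- setNames(diag(solve(R)), colnames(R))

# Tolerance is the reciprocal of VIF, i.e. 1 - R_j^2
tolerance <- 1 / vif

print(round(vif, 2))
print(round(tolerance, 3))
```

With the correlations above this gives VIFs of roughly 20 for x1 and x2 and about 1.2 for x3. If the car package is installed, car::vif(fit) should report the same quantities from the fitted model.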

