Question


An assistant in the district sales office of a national cosmetics firm obtained data on advertising...

An assistant in the district sales office of a national cosmetics firm obtained data on advertising expenditures and sales last year in the district’s 44 territories. Data is consmetics.csv. Use R. I don't want answers in Excel or SAS :)

X1: expenditures for point-of-sale displays in beauty salons and department stores (X$1000).

X2: expenditures for local media advertising.

X3: expenditures for prorated share of national media advertising.

Y: Sales (X$1000).

6. (4) Are there any influential points?

7. Is there a serious multicollinearity problem?

(3) Include an appropriate scatterplot and correlation values between the explanatory variables.

(3) Judge by VIF, do you think there is a problem with multicollinearity? (Hint: VIF or tolerance)

(3) Compare your answers in parts i and ii. Are your conclusions the same or different? Please explain your answer.

Data:

y	x1	x2	x3
12.85	5.6	5.6	3.8
11.55	4.1	4.8	4.8
12.78	3.7	3.5	3.6
11.19	4.8	4.5	5.2
9	3.4	3.7	2.9
9.34	6.1	5.8	3.4
13.8	7.7	7.2	3.8
8.79	4	4	3.8
8.54	2.8	2.3	2.9
6.23	3.2	3	2.8
11.77	4.2	4.5	5.1
8.04	2.7	2.1	4.3
5.8	1.8	2.5	2.3
11.57	5	4.6	3.6
7.03	2.9	3.2	4
0.27	0	0.2	2.7
5.1	1.4	2.2	3.8
9.91	4.2	4.3	4.3
6.56	2.4	2.2	3.7
14.17	4.7	4.7	3.4
8.32	4.5	4.4	2.7
7.32	3.6	2.9	2.8
3.45	0.6	0.8	3.4
13.73	5.6	4.7	5.3
8.06	3.2	3.3	3.6
9.94	3.7	3.5	4.3
11.54	5.5	4.9	3.2
10.8	3	3.6	4.6
12.33	5.8	5	4.5
2.96	3.5	3.1	3
7.38	2.3	2	2.2
8.68	2	1.8	2.5
11.51	4.9	5.3	3.8
1.6	0.1	0.3	2.7
10.93	3.6	3.8	3.8
11.61	4.9	4.4	2.5
17.99	8.4	8.2	3.9
9.58	2.1	2.3	3.9
7.05	1.9	1.8	3.8
8.85	2.4	2	2.4
7.53	3.6	3.5	2.4
10.47	3.6	3.7	4.4
11.03	3.9	3.6	2.9
12.31	5.5	5	5.5

Expert Solution

Enter the data:

y=c(12.85,11.55,12.78,11.19,9,9.34,13.8,8.79,8.54,6.23,11.77,8.04,5.8,11.57,7.03,0.27,5.1,9.91,6.56,14.17,8.32,7.32,3.45,13.73,8.06,9.94,11.54,10.8,12.33,2.96,7.38,8.68,11.51,1.6,10.93,11.61,17.99,9.58,7.05,8.85,7.53,10.47,11.03,12.31)

x1=c(5.6,4.1,3.7,4.8,3.4,6.1,7.7,4,2.8,3.2,4.2,2.7,1.8,5,2.9,0,1.4,4.2,2.4,4.7,4.5,3.6,0.6,5.6,3.2,3.7,5.5,3,5.8,3.5,2.3,2,4.9,0.1,3.6,4.9,8.4,2.1,1.9,2.4,3.6,3.6,3.9,5.5)

x2=c(5.6,4.8,3.5,4.5,3.7,5.8,7.2,4,2.3,3,4.5,2.1,2.5,4.6,3.2,0.2,2.2,4.3,2.2,4.7,4.4,2.9,0.8,4.7,3.3,3.5,4.9,3.6,5,3.1,2,1.8,5.3,0.3,3.8,4.4,8.2,2.3,1.8,2,3.5,3.7,3.6,5)

x3=c(3.8,4.8,3.6,5.2,2.9,3.4,3.8,3.8,2.9,2.8,5.1,4.3,2.3,3.6,4,2.7,3.8,4.3,3.7,3.4,2.7,2.8,3.4,5.3,3.6,4.3,3.2,4.6,4.5,3,2.2,2.5,3.8,2.7,3.8,2.5,3.9,3.9,3.8,2.4,2.4,4.4,2.9,5.5)

# fit the model using R

fit=lm(y~x1+x2+x3)

fit

Output:

Call:

lm(formula = y ~ x1 + x2 + x3)

Coefficients:

(Intercept) x1 x2 x3

1.0233 0.9657 0.6292 0.6760
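Keeping the four variables in one data frame makes the later steps shorter. The sketch below does that with a data frame named cosmetics (a name chosen only for this example). If consmetics.csv has columns named y, x1, x2 and x3, read.csv("consmetics.csv") should give an equivalent data frame, but that column naming is an assumption because the file itself is not shown.

```r
# Sales (y) and the three advertising expenditures for the 44 territories
cosmetics <- data.frame(
  y  = c(12.85,11.55,12.78,11.19,9,9.34,13.8,8.79,8.54,6.23,11.77,8.04,5.8,11.57,7.03,0.27,5.1,9.91,6.56,14.17,8.32,7.32,3.45,13.73,8.06,9.94,11.54,10.8,12.33,2.96,7.38,8.68,11.51,1.6,10.93,11.61,17.99,9.58,7.05,8.85,7.53,10.47,11.03,12.31),
  x1 = c(5.6,4.1,3.7,4.8,3.4,6.1,7.7,4,2.8,3.2,4.2,2.7,1.8,5,2.9,0,1.4,4.2,2.4,4.7,4.5,3.6,0.6,5.6,3.2,3.7,5.5,3,5.8,3.5,2.3,2,4.9,0.1,3.6,4.9,8.4,2.1,1.9,2.4,3.6,3.6,3.9,5.5),
  x2 = c(5.6,4.8,3.5,4.5,3.7,5.8,7.2,4,2.3,3,4.5,2.1,2.5,4.6,3.2,0.2,2.2,4.3,2.2,4.7,4.4,2.9,0.8,4.7,3.3,3.5,4.9,3.6,5,3.1,2,1.8,5.3,0.3,3.8,4.4,8.2,2.3,1.8,2,3.5,3.7,3.6,5),
  x3 = c(3.8,4.8,3.6,5.2,2.9,3.4,3.8,3.8,2.9,2.8,5.1,4.3,2.3,3.6,4,2.7,3.8,4.3,3.7,3.4,2.7,2.8,3.4,5.3,3.6,4.3,3.2,4.6,4.5,3,2.2,2.5,3.8,2.7,3.8,2.5,3.9,3.9,3.8,2.4,2.4,4.4,2.9,5.5)
)

# One row per territory
stopifnot(nrow(cosmetics) == 44)

# Same model as above, fitted from the data frame
fit <- lm(y ~ x1 + x2 + x3, data = cosmetics)
round(coef(fit), 4)
```

Passing data = cosmetics keeps the model tied to one object, which also makes it easy to refit after dropping a territory, e.g. lm(y ~ x1 + x2 + x3, data = cosmetics[-30, ]).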

6. (4) Are there any influential points?

R has a direct command for detecting influential points:

influence.measures(fit)

Output:

Influence measures of

lm(formula = y ~ x1 + x2 + x3) :

dfb.1_ dfb.x1 dfb.x2 dfb.x3 dffit cov.r cook.d hat inf

1 -0.004003 -0.01553 0.02367 -0.01018 0.04872 1.181 6.08e-04 0.0662

2 -0.031724 -0.06110 0.06057 0.02859 0.07981 1.322 1.63e-03 0.1655 *

3 0.070146 0.09176 -0.09474 0.02087 0.32625 0.755 2.46e-02 0.0248

4 0.118258 -0.04666 0.04956 -0.14036 -0.16740 1.219 7.15e-03 0.1128

5 0.035752 -0.04028 0.04277 -0.03719 0.06299 1.185 1.02e-03 0.0712

6 -0.079405 -0.04496 -0.08332 0.27876 -0.62822 0.786 9.09e-02 0.0823

7 0.054999 -0.08175 -0.01744 0.11740 -0.44643 1.170 4.97e-02 0.1536

8 0.000966 0.02744 -0.03046 -0.01174 -0.10705 1.088 2.91e-03 0.0263

9 0.129859 0.14546 -0.15695 -0.04914 0.22686 1.121 1.30e-02 0.0750

10 -0.160591 -0.03721 0.03594 0.11753 -0.19990 1.060 1.00e-02 0.0441

11 -0.053408 -0.03567 0.03294 0.05924 0.08495 1.242 1.85e-03 0.1140

12 -0.007740 0.02877 -0.03375 0.02603 0.04375 1.293 4.91e-04 0.1454

13 -0.015649 0.01800 -0.01732 0.01390 -0.02408 1.334 1.49e-04 0.1709 *

14 0.003829 0.02078 -0.01477 -0.00666 0.04516 1.149 5.22e-04 0.0419

15 0.016795 0.11599 -0.09709 -0.06883 -0.19640 1.084 9.71e-03 0.0510

16 -0.390214 0.08274 0.03610 0.06160 -0.61973 0.978 9.23e-02 0.1279

17 -0.025477 0.23978 -0.20246 -0.04896 -0.29653 1.226 2.22e-02 0.1440

18 0.037280 0.03407 -0.03469 -0.04316 -0.09325 1.136 2.22e-03 0.0442

19 -0.012876 -0.03027 0.04342 -0.03343 -0.08752 1.152 1.96e-03 0.0527

20 0.107908 -0.13563 0.19282 -0.18189 0.42142 0.801 4.15e-02 0.0449

21 -0.163736 0.03623 -0.07073 0.20341 -0.26079 1.092 1.71e-02 0.0725

22 -0.087741 -0.12738 0.12487 0.05224 -0.16931 1.195 7.30e-03 0.0983

23 -0.060632 0.02725 0.01142 -0.04266 -0.18278 1.193 8.50e-03 0.1006

24 -0.139466 0.17062 -0.16852 0.16797 0.24624 1.385 1.55e-02 0.2185 *

25 -0.012813 0.02110 -0.01806 -0.00178 -0.05279 1.128 7.13e-04 0.0283

26 -0.010481 0.00798 -0.00989 0.01950 0.02861 1.158 2.10e-04 0.0462

27 -0.001794 -0.00372 0.00266 0.00269 -0.00671 1.201 1.15e-05 0.0787

28 -0.108793 -0.23083 0.20589 0.15425 0.32599 1.165 2.67e-02 0.1214

29 0.041056 -0.07707 0.06971 -0.04308 -0.10309 1.248 2.72e-03 0.1206

30 -0.465021 -0.41726 0.40908 0.28135 -0.76987 0.405 1.17e-01 0.0480 *

31 0.231108 0.07521 -0.08395 -0.16011 0.25762 1.146 1.67e-02 0.0947

32 0.407048 0.11020 -0.14871 -0.22532 0.47779 0.901 5.45e-02 0.0740

33 0.001028 0.02057 -0.02336 0.00520 -0.02853 1.231 2.09e-04 0.1017

34 -0.213931 0.04803 0.01519 0.03823 -0.33437 1.163 2.81e-02 0.1225

35 0.003315 -0.09516 0.09307 0.01485 0.16039 1.074 6.48e-03 0.0371

36 0.165522 0.09625 -0.05904 -0.19597 0.27011 1.153 1.84e-02 0.1015

37 -0.054049 -0.05080 0.13001 -0.10605 0.35941 1.376 3.28e-02 0.2304 *

38 0.009109 -0.10633 0.04840 0.14990 0.34029 0.965 2.83e-02 0.0562

39 0.006866 0.01507 -0.02848 0.03436 0.07403 1.178 1.40e-03 0.0675

40 0.389962 0.21424 -0.23514 -0.23780 0.48302 0.962 5.64e-02 0.0901

41 -0.106345 0.00994 -0.01905 0.10628 -0.12748 1.172 4.15e-03 0.0749

42 -0.035322 -0.02262 0.01698 0.05469 0.08450 1.148 1.82e-03 0.0493

43 0.166767 0.07541 -0.05798 -0.14659 0.24637 1.018 1.51e-02 0.0451

44 0.169796 -0.09530 0.09362 -0.18822 -0.23263 1.285 1.38e-02 0.1630
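In R's stats package, influence.measures() marks a row with * when it crosses any one of several cut-offs: |DFBETAS| > 1, |DFFITS| > 3*sqrt(k/(n-k)), |1 - COVRATIO| > 3k/(n-k), a Cook's distance whose F(k, n-k) percentile exceeds 0.5, or hat > 3k/n, where k = 4 coefficients and n = 44 here. The object it returns contains a logical matrix is.inf with one column per measure, so a short sketch can report which cut-off produced each *. The variable names flagged and im are only for this example.

```r
y  <- c(12.85,11.55,12.78,11.19,9,9.34,13.8,8.79,8.54,6.23,11.77,8.04,5.8,11.57,7.03,0.27,5.1,9.91,6.56,14.17,8.32,7.32,3.45,13.73,8.06,9.94,11.54,10.8,12.33,2.96,7.38,8.68,11.51,1.6,10.93,11.61,17.99,9.58,7.05,8.85,7.53,10.47,11.03,12.31)
x1 <- c(5.6,4.1,3.7,4.8,3.4,6.1,7.7,4,2.8,3.2,4.2,2.7,1.8,5,2.9,0,1.4,4.2,2.4,4.7,4.5,3.6,0.6,5.6,3.2,3.7,5.5,3,5.8,3.5,2.3,2,4.9,0.1,3.6,4.9,8.4,2.1,1.9,2.4,3.6,3.6,3.9,5.5)
x2 <- c(5.6,4.8,3.5,4.5,3.7,5.8,7.2,4,2.3,3,4.5,2.1,2.5,4.6,3.2,0.2,2.2,4.3,2.2,4.7,4.4,2.9,0.8,4.7,3.3,3.5,4.9,3.6,5,3.1,2,1.8,5.3,0.3,3.8,4.4,8.2,2.3,1.8,2,3.5,3.7,3.6,5)
x3 <- c(3.8,4.8,3.6,5.2,2.9,3.4,3.8,3.8,2.9,2.8,5.1,4.3,2.3,3.6,4,2.7,3.8,4.3,3.7,3.4,2.7,2.8,3.4,5.3,3.6,4.3,3.2,4.6,4.5,3,2.2,2.5,3.8,2.7,3.8,2.5,3.9,3.9,3.8,2.4,2.4,4.4,2.9,5.5)

fit <- lm(y ~ x1 + x2 + x3)
im  <- influence.measures(fit)

# is.inf has the same columns as the printed table, with TRUE wherever
# a measure crosses the cut-off that produces the '*'
flagged <- which(apply(im$is.inf, 1, any))
flagged

# For each flagged observation, name the measure(s) that triggered the '*'
lapply(flagged, function(i) colnames(im$is.inf)[im$is.inf[i, ]])
```

With these data every * comes from the cov.r column (cut-off |1 - COVRATIO| > 12/40 = 0.3); none of the DFBETAS, DFFITS, Cook's D or hat values crosses its own cut-off. COVRATIO measures how much an observation changes the precision of the coefficient estimates, which is a weaker kind of influence than changing the estimates themselves.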

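Yes. R marks observations 2, 13, 24, 30 and 37 with an asterisk (*) in the inf column, so these are the potentially influential points. We can also judge influence with Cook's distance: a common rule of thumb is that an observation with Cook's D over 1.0 has too much influence. Here the largest Cook's D is 0.117 (observation 30), so by that rule none of the flagged points has an excessive influence on the fitted coefficients. Observation 30 is still the most unusual one (largest |DFFITS|, 0.770, and smallest covariance ratio, 0.405) and is worth checking.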
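The Cook's distance rule of thumb can be checked directly with base R's cooks.distance(). A minimal sketch that lists the three largest values and any observation with D above 1 (the name d is only for this example):

```r
y  <- c(12.85,11.55,12.78,11.19,9,9.34,13.8,8.79,8.54,6.23,11.77,8.04,5.8,11.57,7.03,0.27,5.1,9.91,6.56,14.17,8.32,7.32,3.45,13.73,8.06,9.94,11.54,10.8,12.33,2.96,7.38,8.68,11.51,1.6,10.93,11.61,17.99,9.58,7.05,8.85,7.53,10.47,11.03,12.31)
x1 <- c(5.6,4.1,3.7,4.8,3.4,6.1,7.7,4,2.8,3.2,4.2,2.7,1.8,5,2.9,0,1.4,4.2,2.4,4.7,4.5,3.6,0.6,5.6,3.2,3.7,5.5,3,5.8,3.5,2.3,2,4.9,0.1,3.6,4.9,8.4,2.1,1.9,2.4,3.6,3.6,3.9,5.5)
x2 <- c(5.6,4.8,3.5,4.5,3.7,5.8,7.2,4,2.3,3,4.5,2.1,2.5,4.6,3.2,0.2,2.2,4.3,2.2,4.7,4.4,2.9,0.8,4.7,3.3,3.5,4.9,3.6,5,3.1,2,1.8,5.3,0.3,3.8,4.4,8.2,2.3,1.8,2,3.5,3.7,3.6,5)
x3 <- c(3.8,4.8,3.6,5.2,2.9,3.4,3.8,3.8,2.9,2.8,5.1,4.3,2.3,3.6,4,2.7,3.8,4.3,3.7,3.4,2.7,2.8,3.4,5.3,3.6,4.3,3.2,4.6,4.5,3,2.2,2.5,3.8,2.7,3.8,2.5,3.9,3.9,3.8,2.4,2.4,4.4,2.9,5.5)

fit <- lm(y ~ x1 + x2 + x3)
d <- cooks.distance(fit)

# Three largest Cook's distances; the names are the observation numbers
round(sort(d, decreasing = TRUE)[1:3], 3)

# Observations breaking the D > 1 rule of thumb (integer(0) means none)
which(d > 1)
```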

7. Is there a serious multicollinearity problem?

We first check for multicollinearity with the correlation matrix of the explanatory variables:

D=data.frame(x1,x2,x3)

cor(D) # to obtain correlation matrix

Output:

x1 x2 x3

x1 1.0000000 0.9744313 0.3759509

x2 0.9744313 1.0000000 0.4099208

x3 0.3759509 0.4099208 1.0000000

The correlation between x1 and x2 is 0.974, so these two explanatory variables are highly collinear and there is a serious multicollinearity problem. x3 is only moderately correlated with x1 (0.376) and x2 (0.410).
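The correlation matrix already sets a lower limit on the VIF. Regressing x1 on both x2 and x3 fits at least as well as regressing it on x2 alone, so the R-squared of x1 on (x2, x3) is at least r(x1, x2)^2, which gives VIF(x1) >= 1/(1 - r(x1, x2)^2). The same bound holds for VIF(x2). A small sketch of that bound (r12 and vif_lower are names chosen for this example):

```r
x1 <- c(5.6,4.1,3.7,4.8,3.4,6.1,7.7,4,2.8,3.2,4.2,2.7,1.8,5,2.9,0,1.4,4.2,2.4,4.7,4.5,3.6,0.6,5.6,3.2,3.7,5.5,3,5.8,3.5,2.3,2,4.9,0.1,3.6,4.9,8.4,2.1,1.9,2.4,3.6,3.6,3.9,5.5)
x2 <- c(5.6,4.8,3.5,4.5,3.7,5.8,7.2,4,2.3,3,4.5,2.1,2.5,4.6,3.2,0.2,2.2,4.3,2.2,4.7,4.4,2.9,0.8,4.7,3.3,3.5,4.9,3.6,5,3.1,2,1.8,5.3,0.3,3.8,4.4,8.2,2.3,1.8,2,3.5,3.7,3.6,5)

r12 <- cor(x1, x2)

# R^2 of x1 on (x2, x3) >= r12^2, so this is a lower bound on VIF(x1) and VIF(x2)
vif_lower <- 1 / (1 - r12^2)
round(vif_lower, 1)
```

With r of about 0.974 the bound is about 19.8, already far above the usual cut-off of 10, so the pairwise correlation alone is enough to call the multicollinearity serious.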

Include an appropriate scatterplot and correlation values between the explanatory variables.

Command:

D=data.frame(x1,x2,x3)

plot(D) # scatterplot matrix of x1, x2, x3

cor(D) # correlation values between the explanatory variables

Output:

Correlation matrix of all explanatory variables

x1 x2 x3

x1 1.0000000 0.9744313 0.3759509

x2 0.9744313 1.0000000 0.4099208

x3 0.3759509 0.4099208 1.0000000
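plot(D) on a data frame already draws a scatterplot matrix. A variant using base R's pairs() can print each correlation in the upper panels, so the scatterplot and the correlation values asked for in part (i) appear in one figure. The helper panel_r is a hypothetical name for this sketch; panel.smooth is the smoother panel that ships with base graphics.

```r
D <- data.frame(
  x1 = c(5.6,4.1,3.7,4.8,3.4,6.1,7.7,4,2.8,3.2,4.2,2.7,1.8,5,2.9,0,1.4,4.2,2.4,4.7,4.5,3.6,0.6,5.6,3.2,3.7,5.5,3,5.8,3.5,2.3,2,4.9,0.1,3.6,4.9,8.4,2.1,1.9,2.4,3.6,3.6,3.9,5.5),
  x2 = c(5.6,4.8,3.5,4.5,3.7,5.8,7.2,4,2.3,3,4.5,2.1,2.5,4.6,3.2,0.2,2.2,4.3,2.2,4.7,4.4,2.9,0.8,4.7,3.3,3.5,4.9,3.6,5,3.1,2,1.8,5.3,0.3,3.8,4.4,8.2,2.3,1.8,2,3.5,3.7,3.6,5),
  x3 = c(3.8,4.8,3.6,5.2,2.9,3.4,3.8,3.8,2.9,2.8,5.1,4.3,2.3,3.6,4,2.7,3.8,4.3,3.7,3.4,2.7,2.8,3.4,5.3,3.6,4.3,3.2,4.6,4.5,3,2.2,2.5,3.8,2.7,3.8,2.5,3.9,3.9,3.8,2.4,2.4,4.4,2.9,5.5)
)

# Upper-panel helper: write the correlation of the two variables in the panel
panel_r <- function(x, y, ...) {
  usr <- par("usr")
  on.exit(par(usr = usr))      # restore the panel's coordinates afterwards
  par(usr = c(0, 1, 0, 1))     # unit square so the text sits in the centre
  text(0.5, 0.5, sprintf("r = %.3f", cor(x, y)), cex = 1.3)
}

# Scatterplots with a smoother below the diagonal, correlations above it
pairs(D, lower.panel = panel.smooth, upper.panel = panel_r,
      main = "Explanatory variables: advertising expenditures")
```

In the x1 versus x2 panel the points should lie close to a straight line (r = 0.974), while the panels involving x3 are much more scattered (r = 0.376 and 0.410).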

Judge by VIF, do you think there is a problem with multicollinearity? (Hint: VIF or tolerance)

We obtain VIF using R

fit=lm(y~x1+x2+x3)

summary(fit) # to obtain summary of model

Output:

Call:

lm(formula = y ~ x1 + x2 + x3)

Residuals:

Min 1Q Median 3Q Max

-5.4217 -0.9115 0.0703 1.1420 3.5479

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 1.0233 1.2029 0.851 0.4000

x1 0.9657 0.7092 1.362 0.1809

x2 0.6292 0.7783 0.808 0.4237

x3 0.6760 0.3557 1.900 0.0646 .

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.825 on 40 degrees of freedom

Multiple R-squared: 0.7417, Adjusted R-squared: 0.7223

F-statistic: 38.28 on 3 and 40 DF, p-value: 7.821e-12
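This summary shows a classic symptom of multicollinearity: the overall F-test is highly significant (p-value 7.821e-12) and R-squared is 0.74, yet neither x1 (p = 0.18) nor x2 (p = 0.42) is significant on its own. The correlation between the estimated coefficients, obtained with base R's vcov() and cov2cor(), shows why. A sketch (coef_cor is a name chosen for this example):

```r
y  <- c(12.85,11.55,12.78,11.19,9,9.34,13.8,8.79,8.54,6.23,11.77,8.04,5.8,11.57,7.03,0.27,5.1,9.91,6.56,14.17,8.32,7.32,3.45,13.73,8.06,9.94,11.54,10.8,12.33,2.96,7.38,8.68,11.51,1.6,10.93,11.61,17.99,9.58,7.05,8.85,7.53,10.47,11.03,12.31)
x1 <- c(5.6,4.1,3.7,4.8,3.4,6.1,7.7,4,2.8,3.2,4.2,2.7,1.8,5,2.9,0,1.4,4.2,2.4,4.7,4.5,3.6,0.6,5.6,3.2,3.7,5.5,3,5.8,3.5,2.3,2,4.9,0.1,3.6,4.9,8.4,2.1,1.9,2.4,3.6,3.6,3.9,5.5)
x2 <- c(5.6,4.8,3.5,4.5,3.7,5.8,7.2,4,2.3,3,4.5,2.1,2.5,4.6,3.2,0.2,2.2,4.3,2.2,4.7,4.4,2.9,0.8,4.7,3.3,3.5,4.9,3.6,5,3.1,2,1.8,5.3,0.3,3.8,4.4,8.2,2.3,1.8,2,3.5,3.7,3.6,5)
x3 <- c(3.8,4.8,3.6,5.2,2.9,3.4,3.8,3.8,2.9,2.8,5.1,4.3,2.3,3.6,4,2.7,3.8,4.3,3.7,3.4,2.7,2.8,3.4,5.3,3.6,4.3,3.2,4.6,4.5,3,2.2,2.5,3.8,2.7,3.8,2.5,3.9,3.9,3.8,2.4,2.4,4.4,2.9,5.5)

fit <- lm(y ~ x1 + x2 + x3)

# Convert the coefficient covariance matrix into a correlation matrix
coef_cor <- cov2cor(vcov(fit))
round(coef_cor, 2)
```

The x1 and x2 slope estimates come out strongly negatively correlated (about -0.97): the data can trade a larger x1 slope for a smaller x2 slope with almost no loss of fit, so neither slope is pinned down and both standard errors are inflated.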

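The Multiple R-squared of 0.7417 reported by summary(fit) measures how well x1, x2 and x3 together explain sales (y), so 1/(1 - 0.7417) = 3.87 is not a VIF. The VIF of an explanatory variable x_j uses R_j², the R-squared from regressing x_j on the other explanatory variables:

VIF_j = 1/(1 - R_j²), tolerance_j = 1 - R_j² = 1/VIF_j

Command:

R2.x1=summary(lm(x1~x2+x3))$r.squared # R-squared of x1 on the other predictors

R2.x2=summary(lm(x2~x1+x3))$r.squared # R-squared of x2 on the other predictors

R2.x3=summary(lm(x3~x1+x2))$r.squared # R-squared of x3 on the other predictors

tol=1-c(x1=R2.x1,x2=R2.x2,x3=R2.x3) # tolerance

tol

VIF=1/tol

VIF

This gives tolerances of about 0.050 (x1), 0.048 (x2) and 0.82 (x3), that is, VIFs of about 20.1, 20.7 and 1.2. A VIF above 10 (tolerance below 0.1) is the usual sign of serious multicollinearity, so x1 and x2 have a serious multicollinearity problem, while x3 does not.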
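As a cross-check, the VIFs are also the diagonal elements of the inverse of the explanatory variables' correlation matrix, so base R's solve() gives all three at once. The sketch below compares that with the auxiliary-regression definition. The car package's vif() function is another common route; it is left out here only to keep the sketch free of add-on packages. The names vif_all and vif_aux are for this example only.

```r
D <- data.frame(
  x1 = c(5.6,4.1,3.7,4.8,3.4,6.1,7.7,4,2.8,3.2,4.2,2.7,1.8,5,2.9,0,1.4,4.2,2.4,4.7,4.5,3.6,0.6,5.6,3.2,3.7,5.5,3,5.8,3.5,2.3,2,4.9,0.1,3.6,4.9,8.4,2.1,1.9,2.4,3.6,3.6,3.9,5.5),
  x2 = c(5.6,4.8,3.5,4.5,3.7,5.8,7.2,4,2.3,3,4.5,2.1,2.5,4.6,3.2,0.2,2.2,4.3,2.2,4.7,4.4,2.9,0.8,4.7,3.3,3.5,4.9,3.6,5,3.1,2,1.8,5.3,0.3,3.8,4.4,8.2,2.3,1.8,2,3.5,3.7,3.6,5),
  x3 = c(3.8,4.8,3.6,5.2,2.9,3.4,3.8,3.8,2.9,2.8,5.1,4.3,2.3,3.6,4,2.7,3.8,4.3,3.7,3.4,2.7,2.8,3.4,5.3,3.6,4.3,3.2,4.6,4.5,3,2.2,2.5,3.8,2.7,3.8,2.5,3.9,3.9,3.8,2.4,2.4,4.4,2.9,5.5)
)

# VIFs as the diagonal of the inverse correlation matrix
vif_all <- diag(solve(cor(D)))
round(vif_all, 2)

# Tolerance is the reciprocal of VIF
round(1 / vif_all, 3)

# Same VIFs from the definition: regress each predictor on the others
vif_aux <- sapply(names(D), function(v) {
  f  <- reformulate(setdiff(names(D), v), response = v)
  r2 <- summary(lm(f, data = D))$r.squared
  1 / (1 - r2)
})
round(vif_aux, 2)
```

Both routes should agree, giving VIFs of roughly 20 for x1 and x2 and about 1.2 for x3.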

Compare your answers in parts i and ii. Are your conclusions the same or different? Please explain your answer.

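Ans: The conclusions are the same: both point to a serious multicollinearity problem involving x1 and x2. The correlation of 0.974 between x1 and x2 is very high, and because the R-squared of x1 regressed on x2 and x3 can be no smaller than 0.974², the VIF of x1 (and likewise of x2) must be at least 1/(1 - 0.974²), about 19.8, far above the usual cut-off of 10. x3 is only moderately correlated with the other two (0.376 and 0.410) and is not part of the problem. The VIF is the more complete check because it measures how well each predictor is explained by all the others together; here it confirms what the pairwise correlation already showed.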

milcah answered 5 hours ago
