Problem 2: (Revised 6.3) Magazine Advertising: In a study of revenue from advertising, data were collected for 41 magazines, listed as follows. The variables observed are the number of pages of advertising and the advertising revenue. The names of the magazines are listed as:
Here is code to help you paste the data into R:
data6<-'Adv Revenue
25 50
15 49.7
20 34
17 30.7
23 27
17 26.3
14 24.6
22 16.9
12 16.7
15 14.6
8 13.8
7 13.2
9 13.1
12 10.6
1 8.8
6 8.7
12 8.5
9 8.3
7 8.2
9 8.2
7 7.3
1 7
77 6.6
13 6.2
5 5.8
7 5.1
13 4.1
4 3.9
6 3.9
3 3.5
6 3.3
4 3
3 2.5
3 2.3
5 2.3
4 1.8
4 1.5
3 1.3
3 1.3
4 1
2 0.3
'
data6n<-read.table(textConnection(object=data6),
header=TRUE,
sep="",
stringsAsFactors = FALSE)
a. You should not be surprised by the presence of a large number of outliers because the magazines are highly heterogeneous and it is unrealistic to expect a single relationship to connect all of them. Find outliers and high leverage points. Delete the outliers and obtain an acceptable regression equation that relates advertising revenue to advertising pages.
b. For the deleted data, check the homogeneity of the variance. Choose an appropriate transformation of the data and fit the model to the transformed data. Evaluate the fit.
A -
Descriptive Statistics for advertising revenue
Mean: | 11.363 |
SD: | 12.253 |
# of values: | 41 |
Outlier detected? | Yes |
Significance level: | 0.05 (two-sided) |
Critical value of Z: | 3.04657009375 |
Rev | Z | Significant Outlier? |
---|---|---|
50.0 | 3.153 | Significant outlier. P < 0.05 |
49.7 | 3.129 | Significant outlier. P < 0.05 |
34.0 | 1.847 | |
30.7 | 1.578 | |
27.0 | 1.276 | |
26.3 | 1.219 | |
24.6 | 1.080 | |
16.9 | 0.452 | |
16.7 | 0.436 | |
14.6 | 0.264 | |
13.8 | 0.199 | |
13.2 | 0.150 | |
13.1 | 0.142 | |
10.6 | 0.062 | |
8.8 | 0.209 | |
8.7 | 0.217 | |
8.5 | 0.234 | |
8.3 | 0.250 | |
8.2 | 0.258 | |
8.2 | 0.258 | |
7.3 | 0.332 | |
7.0 | 0.356 | |
6.6 | 0.389 | |
6.2 | 0.421 | |
5.8 | 0.454 | |
5.1 | 0.511 | |
4.1 | 0.593 | |
3.9 | 0.609 | |
3.9 | 0.609 | |
3.5 | 0.642 | |
3.3 | 0.658 | |
3.0 | 0.683 | |
2.5 | 0.723 | |
2.3 | 0.740 | |
2.3 | 0.740 | |
1.8 | 0.781 | |
1.5 | 0.805 | |
1.3 | 0.821 | |
1.3 | 0.821 | |
1.0 | 0.846 | |
0.3 | 0.903 |
Outliers can be found using the outlier function in the outliers package, or other similar functions.
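The Z column in the table can also be reproduced in base R without any extra package. The sketch below assumes the table comes from a Grubbs-style screen (absolute deviation from the mean divided by the sample SD, compared against the two-sided Grubbs critical value at 0.05). The names z, g_crit and flagged are made up for illustration, and flagging every value in one pass mirrors the table rather than the formal one-value-at-a-time Grubbs procedure.

```r
# Same 41 revenue values as in data6n, typed in as a vector so this runs on its own.
Revenue <- c(50, 49.7, 34, 30.7, 27, 26.3, 24.6, 16.9, 16.7, 14.6,
             13.8, 13.2, 13.1, 10.6, 8.8, 8.7, 8.5, 8.3, 8.2, 8.2,
             7.3, 7, 6.6, 6.2, 5.8, 5.1, 4.1, 3.9, 3.9, 3.5,
             3.3, 3, 2.5, 2.3, 2.3, 1.8, 1.5, 1.3, 1.3, 1, 0.3)

n <- length(Revenue)

# Absolute standardized deviation of each value (the Z column of the table).
z <- abs(Revenue - mean(Revenue)) / sd(Revenue)

# Two-sided Grubbs critical value at alpha = 0.05 (assumed to be what the table uses).
alpha  <- 0.05
t_crit <- qt(alpha / (2 * n), df = n - 2, lower.tail = FALSE)
g_crit <- (n - 1) / sqrt(n) * sqrt(t_crit^2 / (n - 2 + t_crit^2))

# Values whose Z exceeds the critical value.
flagged <- Revenue[z > g_crit]

print(round(c(mean = mean(Revenue), sd = sd(Revenue), critical = g_crit), 3))
print(flagged)
```

If the assumption about the critical value is right, only the two largest revenues should come out flagged, matching the table.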
We will remove the two observations with revenue 50 and 49.7 and fit the regression line to the remaining 39.
Using the code:
lm(data6n$Revenue ~ data6n$Adv) (after removing the outliers)
We get the following result -
Coefficients:
(Intercept) data6n$Adv
6.9718 0.2375
Which indicates an equation -
Revenue = 6.9718 + 0.2375*Pages
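For anyone who wants to reproduce these coefficients, here is a minimal self-contained sketch. It assumes the two outliers are dropped by filtering on revenue; the data frame name kept is just an illustrative choice, and the data are typed in as vectors instead of going through textConnection.

```r
# The 41 observations from data6n, as plain vectors.
Adv <- c(25, 15, 20, 17, 23, 17, 14, 22, 12, 15,
         8, 7, 9, 12, 1, 6, 12, 9, 7, 9,
         7, 1, 77, 13, 5, 7, 13, 4, 6, 3,
         6, 4, 3, 3, 5, 4, 4, 3, 3, 4, 2)
Revenue <- c(50, 49.7, 34, 30.7, 27, 26.3, 24.6, 16.9, 16.7, 14.6,
             13.8, 13.2, 13.1, 10.6, 8.8, 8.7, 8.5, 8.3, 8.2, 8.2,
             7.3, 7, 6.6, 6.2, 5.8, 5.1, 4.1, 3.9, 3.9, 3.5,
             3.3, 3, 2.5, 2.3, 2.3, 1.8, 1.5, 1.3, 1.3, 1, 0.3)
data6n <- data.frame(Adv = Adv, Revenue = Revenue)

# Drop the two flagged revenues (50 and 49.7); every other revenue is below 49.
kept <- data6n[data6n$Revenue < 49, ]

# Simple linear regression of revenue on advertising pages.
fit <- lm(Revenue ~ Adv, data = kept)
print(coef(fit))

# Scatterplot with the fitted line drawn on top.
plot(kept$Adv, kept$Revenue,
     xlab = "Advertising pages", ylab = "Advertising revenue")
abline(fit)
```

Passing data = kept instead of writing data6n$Revenue ~ data6n$Adv keeps the coefficient name short (Adv) and makes it harder to fit the unfiltered data by accident.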
Adding the fitted line to the scatterplot with the abline function on the lm object, the line looks like a good fit by eye, so we do not go into the R2 value here.
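Part a also asks for high leverage points, and a numeric check is a cheap complement to the visual one. The sketch below uses hatvalues on the fit to all 41 magazines with the common 2p/n rule of thumb as the cutoff; that cutoff and the names fit_all, high_lev and r2 are assumptions for illustration, not something the answer above specifies. It also prints R2 for the fit on the 39 kept rows.

```r
# The 41 observations from data6n, as plain vectors.
Adv <- c(25, 15, 20, 17, 23, 17, 14, 22, 12, 15,
         8, 7, 9, 12, 1, 6, 12, 9, 7, 9,
         7, 1, 77, 13, 5, 7, 13, 4, 6, 3,
         6, 4, 3, 3, 5, 4, 4, 3, 3, 4, 2)
Revenue <- c(50, 49.7, 34, 30.7, 27, 26.3, 24.6, 16.9, 16.7, 14.6,
             13.8, 13.2, 13.1, 10.6, 8.8, 8.7, 8.5, 8.3, 8.2, 8.2,
             7.3, 7, 6.6, 6.2, 5.8, 5.1, 4.1, 3.9, 3.9, 3.5,
             3.3, 3, 2.5, 2.3, 2.3, 1.8, 1.5, 1.3, 1.3, 1, 0.3)
data6n <- data.frame(Adv = Adv, Revenue = Revenue)

# Leverage (diagonal of the hat matrix) from the fit to all 41 rows.
fit_all <- lm(Revenue ~ Adv, data = data6n)
h <- hatvalues(fit_all)

# Rule-of-thumb cutoff 2p/n, with p = number of coefficients (assumed cutoff).
cutoff   <- 2 * length(coef(fit_all)) / nrow(data6n)
high_lev <- which(h > cutoff)
print(data6n[high_lev, ])

# R-squared of the fit after removing the two revenue outliers.
kept     <- data6n[data6n$Revenue < 49, ]
fit_kept <- lm(Revenue ~ Adv, data = kept)
r2       <- summary(fit_kept)$r.squared
print(round(r2, 3))
```

On this data the only row above the cutoff should be the magazine with 77 advertising pages, and R2 for the 39-row fit works out to roughly 0.11. That point is still in the data when the equation above is fitted, so the visual impression of the fit may be worth a second look with it in mind.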
B -
Since the two outliers (50 and 49.7) are close to each other, they can be considered homogeneous, and even if no transformation is applied, they can separately fit a regression line to an extent.
However, if we wish to include the outliers with the rest of the data, we should treat them first and then continue the analysis.
Since there are only two observations and both are close to each other, we expect the line to be a good fit, so there is no need for a test statistic or R2 value as a measure of fit.
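Part b asks about the variance of the data that remain after deletion and about a transformation. One possible way to sketch that check is below. It assumes a log transformation of both variables, which is only one candidate and not something the answer above settles on, and it uses a crude spread measure (the Spearman correlation between absolute residuals and fitted values) in place of a formal test. The names fit_raw, fit_log and spread are illustrative.

```r
# The 41 observations from data6n, as plain vectors.
Adv <- c(25, 15, 20, 17, 23, 17, 14, 22, 12, 15,
         8, 7, 9, 12, 1, 6, 12, 9, 7, 9,
         7, 1, 77, 13, 5, 7, 13, 4, 6, 3,
         6, 4, 3, 3, 5, 4, 4, 3, 3, 4, 2)
Revenue <- c(50, 49.7, 34, 30.7, 27, 26.3, 24.6, 16.9, 16.7, 14.6,
             13.8, 13.2, 13.1, 10.6, 8.8, 8.7, 8.5, 8.3, 8.2, 8.2,
             7.3, 7, 6.6, 6.2, 5.8, 5.1, 4.1, 3.9, 3.9, 3.5,
             3.3, 3, 2.5, 2.3, 2.3, 1.8, 1.5, 1.3, 1.3, 1, 0.3)
data6n <- data.frame(Adv = Adv, Revenue = Revenue)

# Data after deleting the two revenue outliers.
kept <- data6n[data6n$Revenue < 49, ]

# Fit on the original scale and on the log-log scale (assumed transformation).
fit_raw <- lm(Revenue ~ Adv, data = kept)
fit_log <- lm(log(Revenue) ~ log(Adv), data = kept)

# Crude heteroscedasticity indicator: does residual size grow with the fitted value?
# Values near 0 suggest roughly constant spread; large positive values suggest a fan shape.
spread <- c(
  raw = cor(abs(resid(fit_raw)), fitted(fit_raw), method = "spearman"),
  log = cor(abs(resid(fit_log)), fitted(fit_log), method = "spearman")
)
print(round(spread, 3))

# R-squared of each fit, as one way to evaluate the transformed model.
print(round(c(raw = summary(fit_raw)$r.squared,
              log = summary(fit_log)$r.squared), 3))
```

A residuals-versus-fitted plot for each model, for example plot(fitted(fit_log), resid(fit_log)), shows the same thing graphically and is usually what a grader expects to see.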