Question

Problem 2: (Revised 6.3) Magazine Advertising: In a study of revenue from advertising, data were collected for 41 magazines, listed as follows. The variables observed are the number of pages of advertising and the advertising revenue. The names of the magazines are listed as:

Here is code to help you paste the data into R:

data6<-'Adv Revenue
25 50
15 49.7
20 34
17 30.7
23 27
17 26.3
14 24.6
22 16.9
12 16.7
15 14.6
8 13.8
7 13.2
9 13.1
12 10.6
1 8.8
6 8.7
12 8.5
9 8.3
7 8.2
9 8.2
7 7.3
1 7
77 6.6
13 6.2
5 5.8
7 5.1
13 4.1
4 3.9
6 3.9
3 3.5
6 3.3
4 3
3 2.5
3 2.3
5 2.3
4 1.8
4 1.5
3 1.3
3 1.3
4 1
2 0.3
'
data6n<-read.table(textConnection(object=data6),
header=TRUE,
sep="",
stringsAsFactors = FALSE)

a. You should not be surprised by the presence of a large number of outliers because the magazines are highly heterogeneous and it is unrealistic to expect a single relationship to connect all of them. Find outliers and high leverage points. Delete the outliers and obtain an acceptable regression equation that relates advertising revenue to advertising pages.

b. For the deleted data, check the homogeneity of the variance. Choose an appropriate transformation of the data and fit the model to the transformed data. Evaluate the fit.

Solutions

Expert Solution

A -

Descriptive Statistics for advertising revenue

Mean: 11.363
SD: 12.253
# of values: 41
Outlier detected? Yes
Significance level: 0.05 (two-sided)
Critical value of Z: 3.04657009375

Revenue |Z| Significant outlier?
50.0 3.153 Significant outlier. P < 0.05
49.7 3.129 Significant outlier. P < 0.05
34.0 1.847
30.7 1.578
27.0 1.276
26.3 1.219
24.6 1.080
16.9 0.452
16.7 0.436
14.6 0.264
13.8 0.199
13.2 0.150
13.1 0.142
10.6 0.062
8.8 0.209
8.7 0.217
8.5 0.234
8.3 0.250
8.2 0.258
8.2 0.258
7.3 0.332
7.0 0.356
6.6 0.389
6.2 0.421
5.8 0.454
5.1 0.511
4.1 0.593
3.9 0.609
3.9 0.609
3.5 0.642
3.3 0.658
3.0 0.683
2.5 0.723
2.3 0.740
2.3 0.740
1.8 0.781
1.5 0.805
1.3 0.821
1.3 0.821
1.0 0.846
0.3 0.903
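As a cross-check on the table above, the |Z| scores and the critical value can be recomputed in base R. This is a minimal sketch that assumes the critical value comes from the two-sided Grubbs test formula (the solution does not name the test it used); `revenue`, `g_crit`, and `flagged` are illustrative names, not part of the original code.

```r
# Revenue column of the 41 magazines, in the order given in the problem.
revenue <- c(50, 49.7, 34, 30.7, 27, 26.3, 24.6, 16.9, 16.7, 14.6, 13.8,
             13.2, 13.1, 10.6, 8.8, 8.7, 8.5, 8.3, 8.2, 8.2, 7.3, 7, 6.6,
             6.2, 5.8, 5.1, 4.1, 3.9, 3.9, 3.5, 3.3, 3, 2.5, 2.3, 2.3,
             1.8, 1.5, 1.3, 1.3, 1, 0.3)
n <- length(revenue)

# Absolute distance from the mean in units of the sample SD.
z <- abs(revenue - mean(revenue)) / sd(revenue)

# Assumed: two-sided Grubbs critical value at alpha = 0.05.
alpha  <- 0.05
t_crit <- qt(alpha / (2 * n), df = n - 2, lower.tail = FALSE)
g_crit <- (n - 1) / sqrt(n) * sqrt(t_crit^2 / (n - 2 + t_crit^2))

# Revenues whose |Z| exceeds the critical value.
flagged <- revenue[z > g_crit]
print(round(c(mean = mean(revenue), sd = sd(revenue), g_crit = g_crit), 3))
print(flagged)
```

Note that this screens the revenue values on their own. It does not look at residuals from the regression or at leverage, which the question also asks for.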

Outliers can also be found with the outlier() function in the outliers package, or with similar functions.

We will remove the two observations with revenue 50 and 49.7 and fit the regression line to the remaining 39 magazines.

Using the code:

lm(data6n$Revenue ~ data6n$Adv)   (after removing the outliers)

We get the following result -

Coefficients:
(Intercept) data6n$Adv
6.9718 0.2375

This gives the fitted equation -

Revenue = 6.9718 + 0.2375*Pages
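The removal and refit can be sketched as follows. This assumes the two flagged magazines are the first two rows of the data, as in the listing above; `reduced` and `fit` are illustrative names. Because the model is written with `data = reduced`, the slope is labeled `Adv` rather than `data6n$Adv`.

```r
# Data as listed in the problem (Adv = pages of advertising, Revenue = revenue).
adv <- c(25, 15, 20, 17, 23, 17, 14, 22, 12, 15, 8, 7, 9, 12, 1, 6, 12, 9,
         7, 9, 7, 1, 77, 13, 5, 7, 13, 4, 6, 3, 6, 4, 3, 3, 5, 4, 4, 3, 3,
         4, 2)
revenue <- c(50, 49.7, 34, 30.7, 27, 26.3, 24.6, 16.9, 16.7, 14.6, 13.8,
             13.2, 13.1, 10.6, 8.8, 8.7, 8.5, 8.3, 8.2, 8.2, 7.3, 7, 6.6,
             6.2, 5.8, 5.1, 4.1, 3.9, 3.9, 3.5, 3.3, 3, 2.5, 2.3, 2.3,
             1.8, 1.5, 1.3, 1.3, 1, 0.3)
data6n <- data.frame(Adv = adv, Revenue = revenue)

# Drop the two magazines flagged as revenue outliers (50 and 49.7).
reduced <- data6n[data6n$Revenue < 49, ]

# Simple linear regression of revenue on advertising pages.
fit <- lm(Revenue ~ Adv, data = reduced)
print(coef(fit))               # should match the 6.9718 and 0.2375 reported above
print(summary(fit)$r.squared)  # numeric measure of fit to report with the line
```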

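Overlaying the fitted line on the scatterplot with abline() gives a quick visual check of the fit. A visual check alone is not conclusive, so R2 and the residual plots should also be examined before calling the equation acceptable.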
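The question also asks for high leverage points, which the Z-score screen on revenue cannot detect. A minimal sketch of a leverage and residual check on the full 41 magazines is given below. The cutoffs (leverage above 2p/n, standardized residual above 2 in absolute value) are common rules of thumb assumed here, not values taken from the solution, and `high_lev` and `big_resid` are illustrative names.

```r
adv <- c(25, 15, 20, 17, 23, 17, 14, 22, 12, 15, 8, 7, 9, 12, 1, 6, 12, 9,
         7, 9, 7, 1, 77, 13, 5, 7, 13, 4, 6, 3, 6, 4, 3, 3, 5, 4, 4, 3, 3,
         4, 2)
revenue <- c(50, 49.7, 34, 30.7, 27, 26.3, 24.6, 16.9, 16.7, 14.6, 13.8,
             13.2, 13.1, 10.6, 8.8, 8.7, 8.5, 8.3, 8.2, 8.2, 7.3, 7, 6.6,
             6.2, 5.8, 5.1, 4.1, 3.9, 3.9, 3.5, 3.3, 3, 2.5, 2.3, 2.3,
             1.8, 1.5, 1.3, 1.3, 1, 0.3)
data6n <- data.frame(Adv = adv, Revenue = revenue)

fit_all <- lm(Revenue ~ Adv, data = data6n)
n <- nrow(data6n)
p <- length(coef(fit_all))  # intercept and slope

# Leverage: diagonal of the hat matrix, compared with the assumed 2p/n cutoff.
h <- hatvalues(fit_all)
high_lev <- unname(which(h > 2 * p / n))

# Regression outliers: large standardized residuals (assumed cutoff of 2).
r <- rstandard(fit_all)
big_resid <- unname(which(abs(r) > 2))

print(data6n[high_lev, ])
print(data6n[big_resid, ])
```

The magazine with 77 pages of advertising stands far from the other Adv values, so it is worth checking how much the fitted slope changes with and without it.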

B -

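Since the two outliers (50 and 49.7) are close to each other, they can be considered homogeneous, and even if no transformation is applied they can be fitted separately by a regression line to an extent. Note, however, that "the deleted data" in part b most naturally means the data set after the outliers have been deleted, so the variance check and the transformation properly apply to the remaining 39 observations.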

However, if we wish to keep the outliers in the data, we should treat them first and then continue the analysis.

With only two observations, a straight line passes through both points exactly, so a test statistic or R2 value carries no information about the fit for these two points alone.
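For the 39 magazines left after deleting the outliers, a variance check and a transformed fit could be sketched as below. The log-log transformation is an assumption chosen for illustration (both variables are strictly positive); the solution does not name a transformation. `spread_trend` is a hypothetical helper that measures whether the absolute residuals grow with the fitted values.

```r
adv <- c(25, 15, 20, 17, 23, 17, 14, 22, 12, 15, 8, 7, 9, 12, 1, 6, 12, 9,
         7, 9, 7, 1, 77, 13, 5, 7, 13, 4, 6, 3, 6, 4, 3, 3, 5, 4, 4, 3, 3,
         4, 2)
revenue <- c(50, 49.7, 34, 30.7, 27, 26.3, 24.6, 16.9, 16.7, 14.6, 13.8,
             13.2, 13.1, 10.6, 8.8, 8.7, 8.5, 8.3, 8.2, 8.2, 7.3, 7, 6.6,
             6.2, 5.8, 5.1, 4.1, 3.9, 3.9, 3.5, 3.3, 3, 2.5, 2.3, 2.3,
             1.8, 1.5, 1.3, 1.3, 1, 0.3)

# Data with the two revenue outliers (first two rows) deleted.
reduced <- data.frame(Adv = adv, Revenue = revenue)[-(1:2), ]

fit_raw <- lm(Revenue ~ Adv, data = reduced)            # original scale
fit_log <- lm(log(Revenue) ~ log(Adv), data = reduced)  # assumed log-log model

# Hypothetical helper: rank correlation between fitted values and |residuals|.
# A value far from zero suggests the error variance changes with the mean.
spread_trend <- function(fit) {
  cor(fitted(fit), abs(resid(fit)), method = "spearman")
}

print(c(raw = spread_trend(fit_raw), log = spread_trend(fit_log)))
print(c(raw = summary(fit_raw)$r.squared, log = summary(fit_log)$r.squared))
print(coef(fit_log))
```

The same check can be done by eye with plot(fitted(fit_log), resid(fit_log)). The two R2 values are on different response scales, so they describe each model's own fit rather than a direct comparison.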

