In: Statistics and Probability
Data Analysis Projects
A national distributor of "Eagle" brand snacks is attempting to develop a model to explain sales of their product. To do so, data have been gathered on monthly sales (measured in hundreds of dollars) from its many marketing areas. From these, sixty observations have been selected at random.
The variables in the data set are in columns, one observation per line (row). The first column is variable number 1, the second is variable number 2, etc. You must download your data set from the link on our class web page.
Your downloaded data set will contain only the data records without header lines.
The data reported in the data set are as follows:
Variable #1 = Monthly sales in thousands of dollars per marketing area.
This is the variable in which the company is primarily interested. They would like to determine what factors have an effect on their sales. They would also like to be able to predict sales based upon these factors, once they are identified. This variable will be the dependent variable in your analyses.
The company has also identified several potential explanatory factors that may have an effect on Eagle Chip sales. These are identified below and available in your data set. Variables 2 through 6 are quantitative high-level measurements. Variable 7 is qualitative having four nominal levels identified below.
Variable #2 = Promotional budget for the sales
area, in thousands of dollars.
3 = Median family income in the sales area, in thousands of
dollars.
4 = Product recognition index, proportion of respondents to a
marketing survey in the market
district that recognized the Eagle brand name (reported as a
decimal value).
5 = Average retail price of product in dollars.
6 = Average retail price of leading competitor brand in
dollars.
7 = A coded variable representing the advertising method employed
in the sales area.
There are four levels of this categorical variable labeled A, B, C,
and D as follows:
A
= Sports Magazine only
B
= Radio Sport Show Ads
C
= TV Sports Show Ad
D = TV General Advertising
Download your data set, import your data set into Excel, label
your variables, and save your Excel file. You will use this Excel
file for the next three computer projects.
ANOVA Project
The company is primarily interested in developing a model to explain and predict sales. Company sales (variable number 1) will be your dependent variable in these analyses.
Part A.
Provide summary descriptive statistics for company sales.
Part B.
The company is interested in determining which, if any, of the advertising methods used might lead to the highest average sales level and has asked you to do the analysis to provide this information. It will be possible to make this determination since each observation in the sample was generated using only one of the four advertising methods.
Perform an analysis of variance on Excel as demonstrated in class using Sales (variable number 1) as the dependent variable and Advertising Method (variable 7) as the explanatory variable.
Present your ANOVA results from your Excel printout using a “cut and paste” or screen capture method and respond to the following:
Specify the null and alternative hypotheses for the F-test for this model.
Report the critical value of the test statistic -or- interpret the p-level for this test.
State your conclusion regarding the test result.
Interpret your finding; in particular, make certain that you respond to the company’s inquiry.
Data set:
95.2 44.6 41.3 0.78 2.50 3.05 D
86.2 51.6 42.1 0.78 3.09 1.67 A
112.8 52.9 38.9 0.68 2.57 3.38 C
95.4 43.5 36.7 0.73 2.24 3.01 D
95.0 48.7 38.8 0.78 3.04 2.57 C
96.1 48.7 40.6 0.75 2.45 3.38 D
82.4 49.2 37.8 0.76 2.70 2.94 B
91.9 47.7 38.5 0.75 2.67 2.33 C
91.8 49.7 38.6 0.75 2.54 1.62 D
115.5 54.5 40.6 0.74 2.55 1.98 C
111.4 43.4 42.6 0.74 2.64 3.45 C
79.6 51.2 39.7 0.73 2.80 2.60 A
90.9 42.7 44.7 0.77 2.73 2.13 A
105.4 49.9 43.1 0.80 2.68 2.30 D
74.2 50.3 35.3 0.73 2.88 3.02 B
115.6 55.1 44.7 0.77 2.57 3.50 B
115.2 49.2 43.4 0.70 2.35 2.42 B
92.9 56.9 37.2 0.71 2.86 2.38 D
113.0 53.4 38.8 0.71 2.45 1.75 B
78.2 43.4 43.8 0.82 2.69 2.46 A
92.2 46.2 41.4 0.78 2.98 1.60 B
90.6 53.5 40.9 0.78 2.77 2.46 D
120.0 45.6 37.3 0.68 2.38 2.42 C
95.5 43.4 44.2 0.78 2.54 3.02 B
96.0 53.5 40.7 0.72 2.51 2.41 B
74.2 47.2 41.2 0.82 2.78 2.59 B
105.5 42.7 35.9 0.80 2.30 2.00 C
125.0 52.9 38.5 0.79 2.32 2.16 B
86.3 50.7 42.2 0.75 2.76 3.16 B
84.5 44.5 43.3 0.74 2.76 2.89 B
80.3 51.8 39.8 0.81 2.86 3.37 A
80.8 40.9 42.9 0.69 2.83 2.07 B
99.9 59.4 37.2 0.79 2.64 2.84 B
116.4 62.7 35.5 0.76 2.65 3.50 C
112.6 47.7 40.8 0.87 2.33 2.14 D
79.1 48.1 37.3 0.79 2.54 3.40 B
111.0 45.3 42.4 0.73 2.73 3.10 C
81.0 44.5 35.3 0.65 3.07 3.17 C
79.1 45.6 37.2 0.66 2.46 2.99 D
104.9 48.3 43.9 0.68 3.17 2.65 C
89.2 49.7 43.4 0.73 2.98 2.29 B
114.2 54.5 37.1 0.77 2.65 2.75 C
96.2 51.9 37.7 0.82 2.74 3.40 B
109.2 58.7 42.0 0.72 2.74 1.51 D
37.8 48.3 37.0 0.74 3.41 2.16 A
73.7 50.8 43.1 0.65 2.97 3.22 B
64.7 46.8 41.2 0.72 2.97 2.28 A
94.7 43.5 40.0 0.77 2.36 1.86 B
48.8 44.9 42.3 0.75 3.32 3.42 A
93.1 46.2 42.2 0.79 2.29 1.75 A
78.3 51.1 39.9 0.72 2.95 2.22 B
83.1 44.9 37.5 0.79 2.45 3.43 B
69.8 46.4 35.7 0.78 2.93 1.80 D
121.3 43.9 43.6 0.76 2.84 2.43 C
84.3 45.0 39.8 0.77 2.65 2.01 D
89.6 40.5 44.7 0.79 2.84 2.54 A
89.2 45.1 35.3 0.66 3.16 1.74 C
75.3 50.1 35.5 0.80 3.04 1.99 B
120.0 46.8 42.7 0.84 2.87 3.07 C
89.5 50.8 44.8 0.76 3.11 2.07 D
Note: Per the problem...
Perform an analysis of variance on Excel as demonstrated in class
using Sales (variable number 1) as the dependent variable and
Advertising Method (variable 7) as the explanatory variable.
Note: I would totally disregard JoeJoepp's solutions for 2
reasons.
(1) He inappropriately performed a simple linear regression
analysis. To perform a SLR analysis, the dependent variable and
explanatory variable both must be quantitative. Here the DV = Sales
is quantitative, but the EV = Advertising Method is categorical.
So, the correct analysis method is the ANOVA for means.
(2) You wanted the analysis performed using Excel. He used some
other program.
-----------------------------
Part A. Provide summary descriptive statistics for company
sales.
-----------------------------
Part B.
1.Specify the null and alternative hypotheses for the F-test for
this model.
Ho: muA = muB = muC = muD
Ha: at least 1 population mean significantly differs from the
others.
-----------------------------
2.Report the critical value of the test statistic -or- interpret
the p-level for this test.
Answers:
Critical value: F = 2.769
p-value = 4.31E-09 = .00000000431
--> This p-value tells us that the probability is essentially 0
to observe sample data this extreme if Ho is true, i.e. if the
population mean sales are truly equal.
-----------------------------
3.State your conclusion regarding the test result.
Answer: Since the p-value is less than any reasonable alpha level
(.10, .05 and .01), we can reject Ho and conclude that at least one
of the advertising methods produce significantly different sales
than the others.