In: Statistics and Probability
A
new retail store is analyzing their monthly revenues per shopper to quantify the effect of the age of the shopper and the number of (monthly) shoppers on their monthly revenue. The owner feels that the revenue received per shopper increases with the age of the shopper and with the number of shoppers but wants a more quantitative explanation. The multiple regression output is shown below. answer with the help of excel
Summary output
Multiple R
0.8391
R-Square
0.7841
Adj R-Square
0.7683
StErr of Estimate
150.828
Regression output
Coefficient
Std Err
t-value
p-value
Constant
-54.986
331.204
…
0.0010
Age of shopper
79.017
10.647
Not provided
0.0000
Number of shoppers
14.973
10.443
…
0.1940
(1) Which input variable (i.e., explanatory variable) might you consider dropping based on a t-test? Why? Please explain convincingly.
(2) If you wanted to understand whether the shoppers aged 50 and
above impact the monthly revenue differently than those aged below
50, how would you proceed with the analysis? Would you define any
new variables and, if so, what variables? Please provide
details.
(3) What Excel formulas would you use to create the additional variables? Please provide details.
Let
Y= Monthly revenue per shopper
X1 = Age of the shopper
X2= Number of shoppers per month
The linear model that is being estimated is
where is the intercept
are the slope coefficients corresponding to X1 and X2 respectively
is a random disturbance
The following estimates of the coefficients are from the regression output
the estimate of intercept is
the estimates of slopes are
The estimated regression equation is
1) We will drop a variable if the slope coefficient is equal to zero.
We will test the following hypotheses for the 2 slope coefficients.
coefficient for X1
The test statistics to test this hypothesis is
The corresponding p-value is given in the table and it is 0.
Since the p-value is less than the significance level alpha=0.05, we reject the null hypothesis.
We conclude that there is sufficient evidence to support the cliam that variable age of the shoppers explains the monthly revenue.
Next we test the slope coefficient for X2
The p-value of the test statistics is already provided in the excel output. The p-value is 0.194
Since this p-value is greater than the significance level alpha=0.05, we can not reject the null hypothesis.
We need to conclude that there is no sufficient evidence to support the claim that number of shoppers explains the monthly revenue.
Based on the t-test we can consider dropping the variable number of shoppers.
2) We need to define a dummy variable which takes a value of 0 if the age of the shopper is less than 50 and 1 if the age of the shopper is 50 and above.
Let us say the new dummy variable is named D
The new regression model that we want to estimate is
We can add any other input variable (such as number of shoppers) as needed to the above model.
3) Suppose column B contains the variable age of shopper. In a new column enter the formula =IF(B2<50,0,1)
An example is given below
The values would look like below
Use the new variable created to estimate regression