Suppose you are the Chief Marketing Officer for a retailer that has data on the home addresses of its
1,000,000 most active customers. You hope to determine whether sending out “20% off your entire
purchase” coupons by mail will increase revenues.
You conjecture that customers who have access to this coupon will spend more in the store over the
next year. However, skeptics in your company argue that the coupons will just allow customers to
spend less on items they would have purchased anyway. This is a debate that an experiment can
resolve.
In thinking about how large of an experiment you need to have enough statistical power, you realize
that many of the customers you send the coupons to in the mail will not open the mail and so will not
realize they received the coupon.
1. The CEO argues that to estimate the effects of the coupons on revenue, you should compare
the difference in revenues from a) people you sent the coupons and who used them and b)
people you sent the coupons but who did not use them. Write a response to your CEO:
describe the flaw with this plan in language the CEO will understand, and advocate for your
proposed experiment.
2. To avoid the cost of sending out coupons you do not need to, you ask the data science team
to plan an experiment just large enough (with just enough statistical power) to reliably detect
a treatment effect if the true effect on those who open the mail and realize they have the
coupon is a $2 increase in revenue over the next year. The data science team tells you that
an experiment with 100,000 people in the treatment group (leaving the remaining 900,000 in
the control group) will be well-powered to detect an overall difference between the entire
treatment and control groups of $2 in revenue over the next year. To send out the minimum
number of coupons required while still having enough statistical power to detect a $2 effect of
opening the mail, can you send out fewer, the same number of, or more coupons than
100,000?
1) The CEO proposes comparing revenue from customers who were sent the coupon and used it against revenue from customers who were sent the coupon but did not use it.
This comparison does not test what the experiment is meant to answer. The objective of the coupon distribution is to determine whether it increases the retailer's overall revenue. Whether users and non-users of the coupon differ significantly is beside the point, because customers choose for themselves whether to use it, so the two groups may differ in ways that have nothing to do with the coupon (for example, frequent shoppers may be more likely to redeem it). It may even be that customers who use the coupon spend less than customers who do not. In either case one could find a significant difference between the two groups and still not have answered the main question: does overall revenue increase when the coupons are distributed?
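In other words, the appropriate comparison is between customers randomly assigned to be sent the coupon and customers randomly assigned not to be sent it, regardless of whether the coupon was opened or used. Because random assignment makes the two groups alike on average, a difference in revenue between them can be attributed to the mailing itself.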
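The point about comparing users with non-users can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the spending distribution, the usage probabilities, and the function name `simulate` are hypothetical and do not come from the problem. The coupon is given a true effect of zero, yet the CEO's comparison still shows a large gap because heavier shoppers are assumed to be more likely to use the coupon.

```python
import random
import statistics


def simulate(n=200_000, true_effect=0.0, seed=0):
    """Return (naive_difference, randomized_difference) in yearly revenue."""
    rng = random.Random(seed)
    treated_users, treated_nonusers, control = [], [], []
    for _ in range(n):
        # Hypothetical yearly spend the customer would have without a coupon.
        baseline = rng.gauss(100.0, 20.0)
        # Random assignment: half of customers are mailed a coupon.
        if rng.random() >= 0.5:
            control.append(baseline)
            continue
        # Assumption: heavier shoppers are more likely to use the coupon.
        p_use = 0.8 if baseline > 100.0 else 0.2
        if rng.random() < p_use:
            treated_users.append(baseline + true_effect)
        else:
            treated_nonusers.append(baseline)
    # CEO's plan: users vs. non-users, both within the mailed group.
    naive = statistics.fmean(treated_users) - statistics.fmean(treated_nonusers)
    # Experiment: everyone mailed a coupon vs. everyone not mailed one.
    mailed = treated_users + treated_nonusers
    randomized = statistics.fmean(mailed) - statistics.fmean(control)
    return naive, randomized


naive, randomized = simulate()
print(f"users vs. non-users: {naive:.2f}")
print(f"mailed vs. control:  {randomized:.2f}")
```

With a true effect of zero, the first number reflects only who chooses to use coupons, while the second stays close to zero.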
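2) More than 100,000. The data science team's design, with 100,000 people in the treatment group, is powered to detect a $2 difference between the entire treatment group and the control group. However, some customers who are sent the coupon will not open the mail and so cannot be affected by it. If the effect on those who open the mail is $2, the average effect across the whole treatment group is smaller than $2, because it is diluted by the customers who never see the coupon. Detecting this smaller overall difference with the same statistical power requires a larger sample, so more than 100,000 coupons must be sent.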
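A rough calculation shows how quickly the required number of coupons grows. This is a sketch, not the data science team's method, and it rests on assumptions the problem does not state: the 100,000 design is treated as exactly powered for a $2 overall difference, customers who do not open the mail have zero effect, revenue variance is the same in both groups, and the open rate (the hypothetical parameter `open_rate`) is known. Under those assumptions the overall effect is `open_rate` times $2, and the variance of the difference in means, which is proportional to 1/n_t + 1/n_c, must shrink by `open_rate` squared.

```python
import math


def required_treatment_size(open_rate, total=1_000_000, baseline_treated=100_000):
    """Smallest treatment group that keeps the baseline design's power when only
    a fraction `open_rate` of mailed customers actually sees the coupon.

    Returns None if no split of `total` customers is large enough.
    """
    if not 0.0 < open_rate <= 1.0:
        raise ValueError("open_rate must be in (0, 1]")
    # 1/n_t + 1/n_c equals total / (n_t * n_c), so shrinking the variance by
    # open_rate**2 means the product n_t * n_c must grow by 1 / open_rate**2.
    target_product = baseline_treated * (total - baseline_treated) / open_rate**2
    # Solve n_t * (total - n_t) = target_product for the smaller root.
    discriminant = total**2 - 4.0 * target_product
    if discriminant < 0:
        return None  # not achievable even with a 50/50 split
    return math.ceil((total - math.sqrt(discriminant)) / 2.0)


for rate in (1.0, 0.9, 0.8, 0.5):
    print(rate, required_treatment_size(rate))
```

An open rate of 1.0 reproduces the original 100,000, and any lower rate gives a larger number. A result of `None` means that, with only 1,000,000 customers, no allocation reaches the same power under these assumptions.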