Suppose you are part of the analytics team for the online retailer Macha Bucks, which sells two types of tea to its online visitors: Rouge Roma (RR) and Emerald Earl (EE). Every day approximately 10,000 people visit the site over a 24-hour period. For simplicity, consider the “buy one or don’t buy” (BODB) market segment: customers who, when they visit the site, take exactly one of the following actions: (a) buy one order of RR, (b) buy one order of EE, or (c) don’t buy (DB) anything. You have been tasked with determining customer behavior on the website for the BODB segment using a random sample of 35 visits.
In the dataset for the random sample, each row corresponds to a random visitor. For each visitor we provide both the visitor’s action as well as the profit earned on the transaction. In the action column:
if the visitor buys one order of RR, we see a RR,
if the visitor buys one order of EE, we see an EE,
if the visitor doesn’t buy anything, we see a DB.
Note that even if two customers buy the same product, the profit can differ because of the shipping costs, promotions, or coupons that are applied.
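Random Sample of Data
(For the Bought RR?, Bought EE?, and Didn't Buy? columns: 1 = yes, 0 = no)

| Transaction ID | Action | Profit ($) | Bought RR? | Bought EE? | Didn't Buy? | Profit RR ($) | Profit EE ($) |
|---|---|---|---|---|---|---|---|
| 1 | RR | 8.43 | 1 | 0 | 0 | . | 0.00 |
| 2 | DB | 0.00 | 0 | 0 | 1 | 0.00 | 0.00 |
| 3 | EE | 1.75 | 0 | 1 | 0 | 0.00 | 1.75 |
| 4 | DB | 0.00 | 0 | 0 | 1 | 0.00 | 0.00 |
| 5 | EE | 4.37 | 0 | 1 | 0 | 0.00 | 4.37 |
| 6 | EE | 5.79 | 0 | 1 | 0 | 0.00 | 5.79 |
| 7 | RR | 6.27 | 1 | 0 | 0 | 6.27 | 0.00 |
| 8 | RR | 6.22 | 1 | 0 | 0 | 6.22 | 0.00 |
| 9 | DB | 0.00 | 0 | 0 | 1 | 0.00 | 0.00 |
| 10 | EE | 4.49 | 0 | 1 | 0 | 0.00 | 4.49 |
| 11 | RR | 10.54 | 1 | 0 | 0 | 10.54 | 0.00 |
| 12 | EE | 3.79 | 0 | 1 | 0 | 0.00 | 3.79 |
| 13 | DB | 0.00 | 0 | 0 | 1 | 0.00 | 0.00 |
| 14 | DB | 0.00 | 0 | 0 | 1 | 0.00 | 0.00 |
| 15 | RR | 9.03 | 1 | 0 | 0 | 9.03 | 0.00 |
| 16 | EE | 3.54 | 0 | 1 | 0 | 0.00 | 3.54 |
| 17 | DB | 0.00 | 0 | 0 | 1 | 0.00 | 0.00 |
| 18 | DB | 0.00 | 0 | 0 | 1 | 0.00 | 0.00 |
| 19 | EE | 5.02 | 0 | 1 | 0 | 0.00 | 5.02 |
| 20 | DB | 0.00 | 0 | 0 | 1 | 0.00 | 0.00 |
| 21 | EE | 3.60 | 0 | 1 | 0 | 0.00 | 3.60 |
| 22 | DB | 0.00 | 0 | 0 | 1 | 0.00 | 0.00 |
| 23 | EE | 2.61 | 0 | 1 | 0 | 0.00 | 2.61 |
| 24 | RR | 11.75 | 1 | 0 | 0 | 11.75 | 0.00 |
| 25 | RR | 12.22 | 1 | 0 | 0 | 12.22 | 0.00 |
| 26 | DB | 0.00 | 0 | 0 | 1 | 0.00 | 0.00 |
| 27 | DB | 0.00 | 0 | 0 | 1 | 0.00 | 0.00 |
| 28 | EE | 6.17 | 0 | 1 | 0 | 0.00 | 6.17 |
| 29 | RR | 8.83 | 1 | 0 | 0 | 8.83 | 0.00 |
| 30 | DB | 0.00 | 0 | 0 | 1 | 0.00 | 0.00 |
| 31 | DB | 0.00 | 0 | 0 | 1 | 0.00 | 0.00 |
| 32 | DB | 0.00 | 0 | 0 | 1 | 0.00 | 0.00 |
| 33 | DB | 0.00 | 0 | 0 | 1 | 0.00 | 0.00 |
| 34 | RR | 14.16 | 1 | 0 | 0 | 14.16 | 0.00 |
| 35 | EE | 6.06 | 0 | 1 | 0 | 0.00 | 6.06 |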
PARTS
1. Using the sample data, obtain a point estimate for the proportion of customers in this BODB market segment that
a) purchase EE:
b) purchase RR:
c) don’t buy:
2. a) What is the name of the model/distribution that would be appropriate to use for the probability distribution of the sample proportion of the BODB market segment that purchases EE?
b) Please provide as much information as you can about the relevant parameters for the distribution (e.g., mean and standard deviation).
3. Please provide a 95% confidence interval for the population proportion of the BODB market segment that
a) purchase EE:
b) purchase RR:
c) don’t buy:
4. a) What does the 95% confidence interval mean intuitively? Please provide an interpretation.
b) What could you do to obtain a narrower 95% confidence interval?
c) What would you need to do to have a margin of error of 0.05? Please do the calculation.
5. a) Please provide a 99% confidence interval for the population proportion of the BODB market segment that purchases EE.
b) When would you prefer a 99% confidence interval rather than a 95% confidence interval?
6. What is the 95% confidence interval for the average profit from a
a) EE customer (i.e., a customer in the BODB market segment that buys EE):
b) RR customer (i.e., a customer in the BODB market segment that buys RR):
c) Clearly state any assumptions you make about the sampling distribution.
Part 1. From the given sample data we observe:
Number of customers who chose RR = 9
Number of customers who chose EE = 11
Number of customers who did not buy (DB) = 15
Proportion of customers who chose RR = 9/35 = 0.2571
Proportion of customers who chose EE = 11/35 = 0.3143
Proportion of customers who did not buy (DB) = 15/35 = 0.4286
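The point estimates above can be reproduced with a few lines of Python. This is only a sketch: the counts are the ones tallied from the Action column of the sample, and the variable names are illustrative.

```python
# Counts tallied from the Action column of the 35-row sample.
counts = {"RR": 9, "EE": 11, "DB": 15}
n = sum(counts.values())  # total number of sampled visits

# Point estimate of each proportion: category count divided by sample size.
p_hat = {action: k / n for action, k in counts.items()}

for action, p in p_hat.items():
    print(f"{action}: {p:.4f}")
```

Because every visit falls into exactly one of the three categories, the three estimates sum to 1.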
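Part 2. a. The number of sampled customers who buy EE follows a binomial distribution, since each visitor either buys EE or does not (EE is an attribute). The sample proportion is that count divided by n, and for a reasonably large sample its distribution is approximately normal, which is why the normal test for a proportion is the one to apply.
b. The parameters of the distribution are n = sample size and p = proportion of customers who buy EE (estimated from the sample as 0.3143).
Mean number of customers who buy EE = np = 35 x 0.3143 = 11
Standard deviation of that count = SQRT(npq) = SQRT(35 x 0.3143 x 0.6857) = 2.7464
For the sample proportion itself, the mean is p (estimated as 0.3143) and the standard error is SQRT(pq/n) = SQRT(0.3143 x 0.6857 / 35) = 0.0785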
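As a rough numerical check of these parameters, the sketch below computes the mean and standard deviation on the count scale and on the proportion scale. The threshold of 10 in the normal-approximation check is a common rule of thumb assumed here (some textbooks use 5); it is not taken from the original answer.

```python
from math import sqrt

n = 35       # sample size
x_ee = 11    # sampled visitors who bought EE
p_hat = x_ee / n
q_hat = 1 - p_hat

# Count scale: binomial mean and standard deviation, with p estimated by p_hat.
count_mean = n * p_hat
count_sd = sqrt(n * p_hat * q_hat)

# Proportion scale: the same quantities divided by n.
prop_mean = p_hat
prop_se = sqrt(p_hat * q_hat / n)

# Rule-of-thumb check for the normal approximation (threshold 10 is an assumption).
normal_ok = n * p_hat >= 10 and n * q_hat >= 10

print(f"count: mean={count_mean:.2f}, sd={count_sd:.4f}")
print(f"proportion: mean={prop_mean:.4f}, se={prop_se:.4f}")
print(f"normal approximation reasonable: {normal_ok}")
```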
Part 3. a. The 95% confidence limits for the proportion of customers who buy EE are given as
[p - 1.96 x S.E.(p), p + 1.96 x S.E.(p)], where S.E.(p) = SQRT(pq/n)
[0.3143 - 0.1538, 0.3143 + 0.1538]
[0.1605, 0.4681]
With 95% confidence, the population proportion of customers who buy EE lies between 0.1605 and 0.4681.
b. Similarly, the 95% confidence limits for the proportion of customers who buy RR are given as
[0.2571 - 0.1448, 0.2571 + 0.1448]
[0.1123, 0.4019]
The 95% confidence limits for the population proportion of customers who buy RR are 0.1123 and 0.4019.
c. The 95% confidence limits for the proportion of customers who do not buy (DB) are given as
[0.4286 - 0.1640, 0.4286 + 0.1640]
[0.2646, 0.5925]
The 95% confidence limits for the population proportion of DB are 0.2646 and 0.5925.
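The three intervals above all follow the same normal-approximation (Wald) recipe, which can be sketched as a small helper. The function name `wald_interval` is hypothetical, and z = 1.96 is the usual rounded 95% critical value.

```python
from math import sqrt

def wald_interval(successes, n, z=1.96):
    """Normal-approximation interval for a proportion: p_hat +/- z * SE."""
    p = successes / n
    se = sqrt(p * (1 - p) / n)   # standard error of the sample proportion
    margin = z * se              # margin of error
    return p - margin, p + margin, margin

n = 35
counts = {"EE": 11, "RR": 9, "DB": 15}
intervals = {action: wald_interval(k, n) for action, k in counts.items()}

for action, (low, high, margin) in intervals.items():
    print(f"{action}: [{low:.4f}, {high:.4f}]  (margin {margin:.4f})")
```

With only 35 observations the Wald interval is a fairly coarse approximation; other interval methods exist, but the one shown matches the formula used in the answer.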
Part 4. a. A 95% confidence interval gives a range of plausible values for the population proportion. Intuitively, if we drew many random samples of 35 visits and built an interval from each one in the same way, about 95% of those intervals would contain the true population proportion. The limits for EE, RR, and DB therefore tell us roughly what percentage of customers in the segment chooses each action.
b. To obtain a narrower 95% confidence interval, increase the sample size: the margin of error is 1.96 x SQRT(pq/n), so it shrinks as n grows. Lowering the confidence level also narrows the interval (a 95% confidence interval is narrower than a 99% confidence interval), but the result is then no longer a 95% interval.
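c. The margin of error at the 95% level is 1.96 x S.E. With the current sample of 35:
margin of error for EE = 1.96 x S.E.(EE) = 0.1538
margin of error for RR = 1.96 x S.E.(RR) = 0.1448
margin of error for DB = 1.96 x S.E.(DB) = 0.1640
To reach a margin of error of 0.05, solve 1.96 x SQRT(pq/n) = 0.05 for n, which gives n = (1.96)^2 x pq / (0.05)^2.
Using the EE estimate p = 0.3143: n = 3.8416 x 0.2155 / 0.0025 = 331.2, so about 332 visits are needed.
Using the most conservative value p = 0.5: n = 3.8416 x 0.25 / 0.0025 = 384.2, so about 385 visits are needed.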
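To turn a target margin of error into a required sample size, the margin formula can be solved for n. The sketch below assumes the same normal approximation with z = 1.96; `required_n` is a hypothetical helper name, and p = 0.5 is the conventional worst-case planning value rather than something stated in the original answer.

```python
from math import ceil

def required_n(margin, p, z=1.96):
    """Smallest n with z * sqrt(p * (1 - p) / n) <= margin."""
    return ceil(z**2 * p * (1 - p) / margin**2)

# Planning value taken from the sample estimate for EE.
n_from_sample = required_n(0.05, 11 / 35)

# Worst case: p * (1 - p) is largest at p = 0.5.
n_worst_case = required_n(0.05, 0.5)

print(n_from_sample, n_worst_case)
```

Rounding up with `ceil` matters here: rounding down would leave the margin slightly above the target.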
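Part 5. a. 99% confidence interval for the proportion of EE
[0.3143 - 2.58 x S.E.(EE), 0.3143 + 2.58 x S.E.(EE)]
[0.1118, 0.5167]
b. The 99% confidence interval is wider than the 95% confidence interval. It is not more accurate; it gives up some precision in exchange for a higher chance of containing the true proportion. A 99% interval is preferable when the cost of missing the true value is high, for example when an expensive decision depends on the estimate.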
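The trade-off between confidence level and interval width can be seen by computing the critical value for several levels. This sketch uses `statistics.NormalDist` from the Python standard library; the helper name `z_critical` is hypothetical. The exact 99% critical value is about 2.576, which the answer rounds to 2.58.

```python
from math import sqrt
from statistics import NormalDist

def z_critical(level):
    """Two-sided standard normal critical value for a given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

n, x_ee = 35, 11
p_hat = x_ee / n
se = sqrt(p_hat * (1 - p_hat) / n)

# Margin of error for EE at each confidence level.
margins = {level: z_critical(level) * se for level in (0.90, 0.95, 0.99)}

for level, margin in margins.items():
    print(f"{level:.0%}: z={z_critical(level):.3f}, "
          f"interval=[{p_hat - margin:.4f}, {p_hat + margin:.4f}]")
```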
Part 6. The 95% confidence intervals for the average profit from EE customers and RR customers were not worked out in this answer.
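For the remaining part on average profit, one possible approach is a t-interval on the profits of the customers who bought each product. This is a sketch under stated assumptions, not the original answerer's method: it assumes the purchases are independent draws and that profits are roughly normally distributed within each group, so that the sample mean follows a t distribution with n - 1 degrees of freedom. The profit values are copied from the Profit ($) column, and the critical values 2.228 (10 degrees of freedom) and 2.306 (8 degrees of freedom) are standard t-table entries.

```python
from math import sqrt
from statistics import mean, stdev

# Profits copied from the Profit ($) column of the sample, by product.
ee_profit = [1.75, 4.37, 5.79, 4.49, 3.79, 3.54, 5.02, 3.60, 2.61, 6.17, 6.06]
rr_profit = [8.43, 6.27, 6.22, 10.54, 9.03, 11.75, 12.22, 8.83, 14.16]

# Two-sided 95% t critical values from a standard t table, keyed by degrees of freedom.
T_CRIT_95 = {10: 2.228, 8: 2.306}

def t_interval(values):
    """95% t-interval for the mean (assumes roughly normal data)."""
    n = len(values)
    m = mean(values)
    se = stdev(values) / sqrt(n)   # sample standard deviation uses n - 1
    t = T_CRIT_95[n - 1]
    return m - t * se, m + t * se

ee_low, ee_high = t_interval(ee_profit)
rr_low, rr_high = t_interval(rr_profit)

print(f"EE mean profit: {mean(ee_profit):.2f}, 95% CI [{ee_low:.2f}, {ee_high:.2f}]")
print(f"RR mean profit: {mean(rr_profit):.2f}, 95% CI [{rr_low:.2f}, {rr_high:.2f}]")
```

The t distribution is used instead of the normal because each group is small (11 and 9 purchases) and the population standard deviation is unknown.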