QUESTION 1
Suppose you manage an algorithmic trading operation in which computers trade in the stock market automatically, using algorithms and without human intervention. Suppose that your algorithm, called “Shining Star” (SS), makes an average profit of $6000 per trading hour in the stock market. However, your gut instinct is that the algorithm’s performance has decreased recently. We have provided a random sample of 75 of the more recent hours of trading performance.
PARTS
Write down a hypothesis test for checking if the performance has
decreased. Suppose your significance level is 0.01.
a) Write down the test statistic.
b) What is the name of the model/distribution that would be
appropriate to use for the probability distribution of the test
statistic? Also, please state your assumptions for picking that
distribution.
c) Please provide as much information as you can about the relevant
parameters for the distribution (e.g., mean and standard deviation)
under the status quo or null.
a) What is your p-value for this test?
b) What is the critical threshold for your test statistic?
c) Has the performance of your SS algorithm decreased? Why or why
not?
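A minimal sketch of the calculations for the "decreased" test, assuming the 75 recent hourly profits are loaded into a Python list (the name `profits` and the function name are hypothetical; the dataset is not reproduced here). It tests H0: μ = 6000 against H1: μ < 6000 with the one-sample statistic t = (x̄ - 6000)/(s/√n). Under the usual assumptions (a random sample of independent hours, sample mean approximately normal by the CLT, σ unknown) the exact null distribution is the t distribution with n - 1 = 74 degrees of freedom. Because only the standard library is used, the p-value and critical value below come from the standard normal, a large-sample approximation; `scipy.stats.t.cdf` and `scipy.stats.t.ppf` with 74 degrees of freedom would give the exact t-based values.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def lower_tailed_test(profits, mu0=6000.0, alpha=0.01):
    """One-sample test of H0: mu = mu0 versus H1: mu < mu0.

    Uses the t statistic but approximates its null distribution by N(0, 1),
    which is reasonable for n = 75 (exact version: t with n - 1 df).
    """
    n = len(profits)
    x_bar = mean(profits)
    s = stdev(profits)               # sample standard deviation (n - 1 denominator)
    se = s / sqrt(n)                 # estimated standard error of the sample mean
    t_stat = (x_bar - mu0) / se      # test statistic
    z = NormalDist()                 # standard normal: mean 0, sd 1
    p_value = z.cdf(t_stat)          # lower tail: P(T <= t_stat) under H0
    critical = z.inv_cdf(alpha)      # reject H0 if t_stat < critical (about -2.326)
    reject = t_stat < critical       # equivalent to p_value < alpha
    return t_stat, p_value, critical, reject
```

Usage: `t_stat, p_value, critical, reject = lower_tailed_test(profits)`. If `reject` is `True`, the sample gives evidence at the 1% level that average hourly profit has fallen below $6000; otherwise there is not enough evidence to conclude it has decreased.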
a) Now write down a hypothesis test for checking if the performance
has increased. Suppose your significance level is 0.01.
b) What is your p-value for this test?
c) Has the performance of your SS algorithm increased? Why or why
not?
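For the "increased" test only the alternative changes: H0: μ = 6000 versus H1: μ > 6000, so the p-value is the upper tail of the same statistic. A minimal sketch under the same assumptions and normal approximation (the function name and the `profits` list are hypothetical):

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def upper_tailed_test(profits, mu0=6000.0, alpha=0.01):
    """One-sample test of H0: mu = mu0 versus H1: mu > mu0 (normal approximation)."""
    n = len(profits)
    t_stat = (mean(profits) - mu0) / (stdev(profits) / sqrt(n))
    z = NormalDist()
    p_value = 1.0 - z.cdf(t_stat)       # upper tail: P(T >= t_stat) under H0
    critical = z.inv_cdf(1.0 - alpha)   # reject H0 if t_stat > critical (about +2.326)
    return t_stat, p_value, t_stat > critical
```

Because both tests use the same statistic, the upper-tailed p-value equals 1 minus the lower-tailed one, so at α = 0.01 at most one of the two tests can reject H0.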
QUESTION 2
Internet retailers have so much data! Epods.com
finds that there is a 10% chance that a visitor buys a product.
Using sales data from the past two years, you’ve helped them
realize that whether a visitor spends a long amount of time on the site
(i.e., more than 5 minutes) affects the probability of the
visitor buying a product. In particular, among all the buys made by
visitors, 80% of the time the visitor spent a long amount of time on
the site. Among all the visits that resulted in no buy or purchase,
25% of the time the visitor spent a long amount of time on the
site.
For all the parts below, if there is a formula involved, please
show your work or the relevant formula using probabilistic
notation, e.g., P( something1 | something2 ) = … .
Hint:
Define the events
B = visitor buys a product
L = visitor spends a long amount of time on the site (i.e., more
than 5 minutes)
1) What is the prior probability of a visitor buying a product?
2) What is the conditional probability of a visitor spending a long amount of time on the site, given that the visitor makes a purchase?
3) What is the conditional probability that a visitor buys a product given that the visitor spends a long amount of time on the site?
4) What is the conditional probability that a visitor does not make a purchase given that the visitor has spent a long time on the site?
5) What is the conditional probability that a visitor makes a purchase given that the visitor does not spend a long time on the site?
QUESTION 3 and QUESTION 4: Context
Suppose you are part of the analytics team for the online retailer Macha Bucks, which sells two types of tea to its online visitors: Rouge Roma (RR) and Emerald Earl (EE). Every day, approximately 10,000 people visit the site over a 24-hour period. For simplicity, consider the “buy one or don’t buy” (BODB) market segment of customers, who take one of the following actions when they visit the site: (a) buy one order of RR, (b) buy one order of EE, or (c) don’t buy (DB) anything. You have been tasked with determining customer behavior on the website for the BODB segment using a random sample of 35 visits.
In the dataset for the random sample, each row corresponds to a random visitor. For each visitor we provide both the visitor’s action as well as the profit earned on the transaction. In the action column:
if the visitor buys one order of RR, we see a RR,
if the visitor buys one order of EE, we see an EE,
if the visitor doesn’t buy anything, we see a DB.
Note that even if two customers buy the same product, the profit can differ due to the shipping costs, promotions, or coupons that are applied.
QUESTION 3
PARTS
Using the sample data, obtain a point estimate for the proportion
of customers in this BODB market segment that
a) purchase EE:
b) purchase RR:
c) don’t buy:
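Each point estimate is a sample proportion: the number of visits with that action divided by the sample size n = 35. A minimal sketch, assuming the action column has been loaded as a list of the strings "RR", "EE", and "DB" (the function name is hypothetical):

```python
from collections import Counter


def point_estimates(actions):
    """Sample proportion of each action label in the BODB sample."""
    n = len(actions)
    counts = Counter(actions)  # missing labels count as 0
    # p-hat = (number of visitors with that action) / (sample size)
    return {label: counts[label] / n for label in ("EE", "RR", "DB")}
```

The three estimates always sum to 1, since every visit in the segment is exactly one of EE, RR, or DB.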
a) What is the name of the model/distribution that would be
appropriate to use for the probability distribution of the sample
proportion of the BODB market segment that purchases EE?
b) Please provide as much information as you can about the relevant
parameters for the distribution (e.g., mean and standard
deviation).
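The number of EE buyers in a random sample of n visits is Binomial(n, p), so the sample proportion p̂ has mean p and standard deviation √(p(1 - p)/n); by the Central Limit Theorem it is approximately normal when the expected counts of buyers and non-buyers are both large enough. A minimal sketch (function name hypothetical) that returns these parameters and a common rule-of-thumb check of at least 10 expected "successes" and "failures" (some texts use 5); in practice the unknown p is replaced by the sample estimate p̂:

```python
from math import sqrt


def phat_sampling_distribution(p, n=35):
    """Mean, standard deviation, and normal-approximation check for p-hat."""
    mean = p                       # p-hat is an unbiased estimator of p
    sd = sqrt(p * (1 - p) / n)     # standard error of p-hat
    # Rule of thumb: n*p and n*(1 - p) both at least 10.
    ok = n * p >= 10 and n * (1 - p) >= 10
    return mean, sd, ok
```

With n = 35, a proportion near 0.1 would give n·p = 3.5, so the normal approximation should be treated with caution for a rare action.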
Please provide a 95% confidence interval for the population proportion
of the BODB market segment that
a) purchase EE:
b) purchase RR:
c) don’t buy:
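One standard choice is the Wald (normal-approximation) interval p̂ ± z·√(p̂(1 - p̂)/n), where z = 1.96 for 95% confidence. A minimal sketch, assuming the counts of EE, RR, and DB among the 35 sampled visits have already been tallied (the function name is hypothetical):

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(count, n, confidence=0.95):
    """Wald (normal-approximation) confidence interval for a proportion."""
    p_hat = count / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # 1.96 for 95%, 2.576 for 99%
    margin = z * sqrt(p_hat * (1 - p_hat) / n)            # margin of error
    return p_hat - margin, p_hat + margin
```

Calling it with each of the three counts and n = 35 gives the three intervals; `confidence=0.99` gives the 99% interval for EE asked for later. With small counts the Wald interval can be inaccurate or even extend below 0; the Wilson score interval is a common alternative.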
a) What does the 95% confidence interval mean intuitively? Please
provide an interpretation.
b) What could you do to obtain a narrower 95% confidence
interval?
c) What would you need to do to have a margin of error of 0.05?
Please do the calculation.
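The margin of error z·√(p(1 - p)/n) shrinks in proportion to 1/√n, so a narrower interval needs a larger sample (or a lower confidence level). Solving z·√(p(1 - p)/n) ≤ 0.05 for n and rounding up gives the required sample size. A minimal sketch (function name hypothetical) using the conservative worst case p = 0.5 by default; plugging in a sample estimate p̂ instead gives a smaller n:

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(margin, p=0.5, confidence=0.95):
    """Smallest n with z * sqrt(p * (1 - p) / n) <= margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)
```

`required_sample_size(0.05)` returns 385 visits at 95% confidence (1.96² × 0.25 / 0.05² ≈ 384.1, rounded up), compared with the 35 visits in the current sample.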
a) Please provide a 99% confidence interval for the population
proportion of the BODB market segment that purchases EE.
b) When would you prefer a 99% confidence interval rather than a
95% confidence interval?
What is the 95% confidence interval for the average profit from a
a) EE customer (i.e., a customer in the BODB market segment that
buys EE):
b) RR customer (i.e., a customer in the BODB market segment that
buys RR):
c) Clearly state any assumptions you make about the sampling
distribution.
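A common approach is the t interval x̄ ± t·s/√n, computed separately from the profits of the EE rows and of the RR rows. It assumes the visits are a random sample of independent customers, that the sample mean of profit is approximately normal (profits roughly normal, or enough observations for the CLT), and that σ is unknown, so t has n - 1 degrees of freedom where n is the number of EE (or RR) buyers in the sample, not 35. The Python standard library has no t quantile function, so this minimal sketch (function name hypothetical) takes the critical value as an argument, to be looked up in a t table or with `scipy.stats.t.ppf(0.975, n - 1)`:

```python
from math import sqrt
from statistics import mean, stdev


def mean_ci(values, t_crit):
    """t-based confidence interval for a population mean.

    t_crit is the critical value with len(values) - 1 degrees of freedom,
    e.g. t_{0.975, n-1} for a 95% interval.
    """
    n = len(values)
    x_bar = mean(values)
    margin = t_crit * stdev(values) / sqrt(n)   # t * s / sqrt(n)
    return x_bar - margin, x_bar + margin
```

Call it once with the EE profits and once with the RR profits, each with the critical value for its own degrees of freedom.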
QUESTION 4
PARTS
a) What could be an appropriate probability distribution to use for
modeling the number of visitors that the website has in an
hour?
b) What parameters would you use for the probability
distribution?
c) Using that distribution, determine the probability that more
than 600 people visit the site in an hour.
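If visits arrive independently at a constant rate through the day (an assumption; real traffic usually varies by hour), the hourly count is Poisson with λ = 10,000/24 ≈ 416.7. Then P(X > 600) = Σ_{k ≥ 601} e^(-λ) λ^k / k!. A minimal sketch (function name hypothetical) that sums this tail directly in log space, since computing 1 - CDF would lose all precision for such a small probability:

```python
from math import exp, lgamma, log


def poisson_tail(threshold, lam):
    """P(X > threshold) for X ~ Poisson(lam), summing the upper tail directly."""
    total = 0.0
    k = threshold + 1
    while True:
        # Poisson pmf computed via logs: -lam + k*log(lam) - log(k!)
        term = exp(-lam + k * log(lam) - lgamma(k + 1))
        total += term
        # Beyond the mean the terms shrink geometrically; stop once negligible.
        if k > lam and (term == 0.0 or term < total * 1e-17):
            break
        k += 1
    return total


hourly_rate = 10_000 / 24                       # about 416.7 visitors per hour (assumed constant)
prob_over_600 = poisson_tail(600, hourly_rate)
```

The result is below 1e-15: 600 is about nine standard deviations (√λ ≈ 20.4) above the mean, so under this model more than 600 visitors in an hour is essentially impossible.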
a) What could be an appropriate probability distribution to use for
modeling the number of seconds between customer visits?
b) What parameters would you use for the probability
distribution?
c) Using that distribution, determine the probability that the time
between customer visits to the website is less than 10 seconds.
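If visits follow a Poisson process at a constant rate (an assumption), the time between consecutive visits is Exponential with rate λ = 10,000/86,400 ≈ 0.1157 visits per second, i.e. a mean gap of 8.64 seconds. The exponential CDF then gives P(T < 10) = 1 - e^(-λ·10) ≈ 0.686. A minimal sketch of that arithmetic:

```python
from math import exp

seconds_per_day = 24 * 60 * 60                 # 86,400 seconds
rate_per_second = 10_000 / seconds_per_day     # lambda: about 0.1157 visits per second
mean_gap = 1 / rate_per_second                 # mean time between visits: 8.64 seconds

# Exponential CDF: P(T < t) = 1 - exp(-lambda * t)
prob_gap_under_10s = 1 - exp(-rate_per_second * 10)
```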
a) What could be an appropriate probability distribution to use for
modeling the number of visitors, out of 100 visitors, who do
not buy anything?
b) What parameters would you use for the probability
distribution?
c) Using that distribution, determine the probability that 30 or more
of 100 customers do not buy anything.
d) What is the average number of visitors (from among 100
customers) that do not buy anything?
e) What is the standard deviation of the number of visitors (from among 100 customers) that do not buy anything?
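If the 100 visitors act independently and each has the same probability p_DB of not buying, the count of non-buyers is Binomial(n = 100, p = p_DB), with mean n·p and standard deviation √(n·p·(1 - p)). Here p_DB would be estimated by the sample proportion of DB visits from Question 3 (the sample itself is not reproduced here). A minimal sketch with hypothetical names:

```python
from math import comb, sqrt


def binomial_summary(n, p, k_min):
    """For X ~ Binomial(n, p): P(X >= k_min), mean, and standard deviation."""
    prob_at_least = sum(
        comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(k_min, n + 1)
    )
    mean = n * p                   # E[X] = n p
    sd = sqrt(n * p * (1 - p))     # SD[X] = sqrt(n p (1 - p))
    return prob_at_least, mean, sd


# p_db is hypothetical: substitute the sample proportion of DB visits.
# prob_30_or_more, avg_db, sd_db = binomial_summary(100, p_db, 30)
```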
What is the average profit from among 100 random customers that
visit the site?
Please explain your answer or show your calculations.
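By the law of total expectation, the expected profit per visitor is p_EE·μ_EE + p_RR·μ_RR + p_DB·(profit of a no-buy visit), and the expected total for 100 visitors is 100 times that. A minimal sketch, assuming the proportions and mean profits come from the Question 3 estimates and that a DB visit earns zero profit (both the function name and the zero-profit default are assumptions):

```python
def expected_profit(p_ee, mean_ee, p_rr, mean_rr, n_visitors=100, db_profit=0.0):
    """Expected profit per visitor and in total for n_visitors visitors.

    db_profit = 0.0 assumes a no-buy visit earns nothing.
    """
    p_db = 1.0 - p_ee - p_rr
    per_visitor = p_ee * mean_ee + p_rr * mean_rr + p_db * db_profit
    return per_visitor, n_visitors * per_visitor
```

The first value is the average profit per customer; the second is the expected total profit across the 100 customers.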
QUESTION 2: Solution
P(B) = probability that a visitor buys a product = 10% = 0.1
P(L) = probability that a visitor spends a long amount of time on the site (i.e., more than 5 minutes)
B' and L' denote the complementary events: the visitor does not buy, and the visitor does not spend a long time on the site.
1) Prior probability of a visitor buying a product: P(B) = 10% = 0.1
2) It is given that among all the buys made by visitors, 80% of the time the visitor spent a long amount of time on the site.
So the conditional probability of a visitor spending a long amount of time on the site, given that the visitor makes a purchase, is P(L|B) = 80% = 0.80.
3) Probability of not buying a product: P(B') = 1 - P(B) = 1 - 0.1 = 0.9
It is given that among all the visits that resulted in no buy or purchase, 25% of the time the visitor spent a long amount of time on the site.
So P(L|B') = 25% = 0.25.
P(L|B') = P(L ∩ B')/P(B') = 0.25
P(L ∩ B') = 0.25*P(B') = 0.25*0.9 = 0.225
Now, P(L|B) = P(L ∩ B)/P(B) = 0.80
P(L ∩ B) = 0.80*P(B) = 0.80*0.1 = 0.08
By the law of total probability, P(L) = P(L ∩ B') + P(L ∩ B) = 0.225 + 0.08 = 0.305
Conditional probability that a visitor buys a product given that the visitor spends a long amount of time on the site (Bayes' rule):
P(B|L) = P(L ∩ B)/P(L) = 0.08/0.305 ≈ 0.262
4) Conditional probability that a visitor does not make a purchase given that the visitor has spent a long time on the site:
P(B'|L) = P(L ∩ B')/P(L) = 0.225/0.305 ≈ 0.738 (equivalently, 1 - P(B|L))
5) Probability that a visitor does not spend a long time on the site: P(L') = 1 - P(L) = 1 - 0.305 = 0.695
Also, P(B) = P(B ∩ L) + P(B ∩ L')
0.1 = 0.08 + P(B ∩ L')
P(B ∩ L') = 0.02
Conditional probability that a visitor makes a purchase given that the visitor does not spend a long time on the site:
P(B|L') = P(B ∩ L')/P(L') = 0.02/0.695 ≈ 0.029
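The arithmetic for Question 2 can be double-checked with a few lines of Python. Variable names are illustrative; B' means "does not buy" and L' means "does not spend a long time on the site".

```python
p_b = 0.10               # P(B): prior probability of a purchase
p_l_given_b = 0.80       # P(L | B)
p_l_given_not_b = 0.25   # P(L | B')

p_not_b = 1 - p_b                          # P(B') = 0.9
p_l_and_b = p_l_given_b * p_b              # P(L ∩ B) = 0.08
p_l_and_not_b = p_l_given_not_b * p_not_b  # P(L ∩ B') = 0.225
p_l = p_l_and_b + p_l_and_not_b            # P(L) = 0.305 (law of total probability)

p_b_given_l = p_l_and_b / p_l              # P(B | L), about 0.262 (Bayes' rule)
p_not_b_given_l = p_l_and_not_b / p_l      # P(B' | L), about 0.738
p_not_l = 1 - p_l                          # P(L') = 0.695
p_b_and_not_l = p_b - p_l_and_b            # P(B ∩ L') = 0.02
p_b_given_not_l = p_b_and_not_l / p_not_l  # P(B | L'), about 0.029
```

Spending more than 5 minutes on the site raises the purchase probability from the 10% prior to about 26%, while a short visit lowers it to about 3%.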