Question

  1. In basket analysis, we want to find the dependence between two items X and Y. Given a database of customer transactions, how can you find these dependencies? How would you generalize this to more than two items?

Solutions

Expert Solution

Instead of just giving you the answer, I would like to walk you through the logic of what works and what does not based on my experience.

Let's say that you are faced with this exact problem. How about you start by digging into the data? Get those who bought product A. Easy enough. Now, see what other products they got. Easy, again. Now comes the hard part: how do you interpret these results? Let's suppose B was number one on that list. Great! Correlation found! Or was it? Was B the most popular product in your store outside of A? Would you expect any customer to have B up there in the ranks, not just those who bought A?

In other words, you need to create a baseline "expected" list of products so you can compare those who bought A to that list and see if they are more likely to buy B than just a random customer. But before you do that, I believe I know what answer you are going to get. The answer is yes, they are more likely to buy B than a random customer. How do I know that? Experience.
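As a rough sketch of that baseline comparison, the snippet below divides each product's share among A-buyers by its share among all customers. The function name `lift_vs_baseline` and the toy data are made up for illustration, and the data model (one set of purchased items per customer) is an assumption. A ratio above 1 means the product is overrepresented among A-buyers. This naive ratio does not account for how many transactions each customer made.

```python
from collections import Counter

def lift_vs_baseline(baskets, item_a):
    """For every other item, divide its share among customers who bought
    item_a by its share among all customers (the baseline)."""
    n_all = len(baskets)
    buyers_a = [b for b in baskets if item_a in b]
    n_a = len(buyers_a)
    if n_a == 0:
        return {}
    # Number of customers (not purchases) who bought each item.
    count_all = Counter(item for b in baskets for item in b)
    count_a = Counter(item for b in buyers_a for item in b)
    ratios = {}
    for item, c in count_a.items():
        if item == item_a:
            continue
        share_among_a = c / n_a
        share_among_all = count_all[item] / n_all
        ratios[item] = share_among_a / share_among_all
    return ratios

# Hypothetical toy data: one set of purchased items per customer.
baskets = [
    {"A", "B"}, {"A", "B"}, {"A", "C"},
    {"B"}, {"C"}, {"C"}, {"B", "C"}, {"C"},
]
ratios = lift_vs_baseline(baskets, "A")
for item, r in sorted(ratios.items()):
    print(item, round(r, 3))
```

In this toy data C is the most popular product store-wide, yet it is underrepresented among A-buyers, while B is overrepresented. That is the kind of difference a raw "top products among A-buyers" list would hide.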

We have to talk about things not directly related to your immediate question. Let's start with a general understanding of the transaction database. What you will have is a bunch of transactions by different customers: some customers will have many transactions, while the majority will have very few or just one. Wait, how does that relate to the problem at hand? Am I just wasting your time? No. Let's assume customers buy products in a completely random fashion, much like running a random number generator every time they make a transaction. Now, answer a simple question: who is more likely to have bought both products A and B, a customer with one transaction or a customer with 10 transactions? Of course, a customer with more transactions has bought more products and so is likely to have bought from more categories. Your high-transaction customers will therefore be overrepresented in every category you analyze.

Here is the crucial question: what kind of customer composition are you likely to get when you run the query "customers who have bought A"? Will you get more low-transaction customers or more high-transaction customers than in the total database? Of course, your sample is going to be biased toward the high-transaction customers. So, if you compare "customers who bought A" with "all customers", you will find that the first group is more likely to buy... well, everything. Essentially, take your pick of category, and these customers will be more likely to buy from it than an average customer. This is how I know that customers who bought A are, in all likelihood, going to buy more B than an average customer.
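The effect is easy to reproduce with a small simulation of purely random shoppers. Everything in this sketch is an assumption chosen for illustration: 20 products, one product per transaction, 80% of customers with one transaction and 20% with ten, and a fixed seed. No product depends on any other here, yet A-buyers still come out more likely to have bought B.

```python
import random

def simulate(n_customers=20000, n_products=20, seed=0):
    """Customers pick one product uniformly at random per transaction."""
    rng = random.Random(seed)
    customers = []
    for _ in range(n_customers):
        # Assumed mix: most customers have 1 transaction, a minority have 10.
        n_tx = 1 if rng.random() < 0.8 else 10
        items = {rng.randrange(n_products) for _ in range(n_tx)}
        customers.append((n_tx, items))
    return customers

customers = simulate()
A, B = 0, 1  # two arbitrary, unrelated products

buyers_a = [c for c in customers if A in c[1]]

# Share of customers who bought B: everyone vs. those who bought A.
p_b_all = sum(B in items for _, items in customers) / len(customers)
p_b_given_a = sum(B in items for _, items in buyers_a) / len(buyers_a)

# Average number of transactions: everyone vs. those who bought A.
mean_tx_all = sum(n for n, _ in customers) / len(customers)
mean_tx_a = sum(n for n, _ in buyers_a) / len(buyers_a)

print("P(B), all customers:", round(p_b_all, 3))
print("P(B), A-buyers:     ", round(p_b_given_a, 3))
print("mean transactions, all customers:", round(mean_tx_all, 2))
print("mean transactions, A-buyers:     ", round(mean_tx_a, 2))
```

The gap between the two B shares comes entirely from the composition of the A-buyer group, which is dominated by ten-transaction customers.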

The bottom line is that in every analysis that includes multiple transactions per customer, you have to adjust for the number of transactions the customer made and/or the number of items they bought. To accomplish that, you basically need to compare those who bought A and had one more transaction (or item bought) with those who had just one transaction (or item bought), then two with two, and so on. To get it done, create a weighting system (essentially, each group's percentage of the total sample) and apply it to the rankings you got for those who bought A.
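One possible reading of that weighting scheme is sketched below. It is an interpretation, not a definitive method: A-buyers are grouped by how many transactions they made besides A, each group is matched with all customers who made that many transactions in total, and the baseline rates are weighted by the A-buyers' group sizes. The function name `adjusted_comparison`, the one-item-per-transaction data model, and the toy data are all hypothetical.

```python
from collections import defaultdict

def adjusted_comparison(customers, item_a, item_b):
    """Return (observed, expected) share of item_b buyers among item_a buyers,
    where `expected` is the baseline reweighted to the A-buyers' mix."""
    # Baseline: k transactions -> [customers who bought item_b, customers].
    base = defaultdict(lambda: [0, 0])
    for items in customers.values():
        k = len(items)
        base[k][0] += item_b in items
        base[k][1] += 1

    # A-buyers, grouped by the number of transactions other than item_a.
    a_strata = defaultdict(lambda: [0, 0])
    for items in customers.values():
        if item_a not in items:
            continue
        others = [i for i in items if i != item_a]
        a_strata[len(others)][0] += item_b in others
        a_strata[len(others)][1] += 1

    # Assumption: groups with no matching baseline group are dropped.
    usable = {k: v for k, v in a_strata.items() if k in base}
    n_a = sum(total for _, total in usable.values())
    if n_a == 0:
        return None
    observed = sum(hits for hits, _ in usable.values()) / n_a
    expected = sum(
        (total / n_a) * (base[k][0] / base[k][1])
        for k, (_, total) in usable.items()
    )
    return observed, expected

# Hypothetical toy data: customer id -> items, one item per transaction.
customers = {
    "c1": ["A", "B"], "c2": ["A", "C"], "c3": ["A", "B", "C"],
    "c4": ["B"], "c5": ["C"], "c6": ["C", "D"], "c7": ["B", "D"],
}
observed, expected = adjusted_comparison(customers, "A", "B")
print("observed:", round(observed, 3), "expected:", round(expected, 3))
```

If the observed share is clearly above the expected one, the A-buyers favor B beyond what their transaction counts alone would explain. With real data the groups should be large enough for the rates to be stable.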

