The Python question is as follows:
You have been asked by a manager of a store to identify the items most commonly bought together (the manager would like to place these items in close proximity).
You are given a file, receipts.txt, that contains the receipts of the last 1000 transactions in the following format, where each line of the file is a single receipt:
eggs, bread, milk, lettuce
cheese, milk, apples
bread, milk
bread, cheese, milk
Write a function in Python that opens the file, processes it, and returns the most common combinations (pairs of items that occur together at least twice). For the example above, it should return a dictionary containing:
{ ('bread', 'milk'): 3, ('cheese', 'milk'): 2 }
NOTE: Code in Python.
def createDictionary():
    itemDict = dict()
    with open('receipts.txt', 'r') as file:
        #Read complete content
        content = file.readlines()
    #Now process line by line
    for line in content:
        #items will store each item in a single line
        items = list()
        #Store items in a temporary list first
        temp = line.split(',')
        for item in temp:
            items.append(item.strip())
        #In a dictionary we can have tuple as a key.
        #Since (a,b) is same as (b,a).
        #To avoid issues due to order of the pair generated,
        #we will first create a pair
        #Then sort it using sorted() function for lists
        #Then convert it to tuple. Thus every pair of a & b will be (a,b) only.
        #Create item pairs and store in itemDict
        for i in range(len(items)-1):
            for j in range(i+1, len(items)):
                pair = [items[i], items[j]]
                pair = sorted(pair)
                pair = tuple(pair)
                #If pair found in itemDict, increment frequency else set it to 1
                if pair in itemDict.keys():
                    itemDict[pair] = itemDict[pair] + 1
                else:
                    itemDict[pair] = 1
    #Generate Initial dictionary
    for key, value in itemDict.items():
        print(key, value)
    #Create dictionary with frequencies more than or equal to 2 only
    finalDict = dict()
    for key, value in itemDict.items():
        if value>=2:
            finalDict[key] = value
    print()
    #Print final dictionary
    for key, value in finalDict.items():
        print(key, ":", value)
    return finalDict
The first block of output is the raw dictionary, which includes pairs with a frequency of 1.
The final dictionary, containing only pairs with a frequency of 2 or more, is then printed and returned.
Remove any print statements that you do not need.
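For comparison, here is a more compact sketch of the same idea with the print statements removed, using collections.Counter and itertools.combinations from the standard library. The function name count_common_pairs and the parameters path and min_count are hypothetical and not part of the original question. The sketch also assumes that an item repeated on one receipt should count only once and that blank fields can be ignored, which the answer above does not do.

```python
from collections import Counter
from itertools import combinations


def count_common_pairs(path="receipts.txt", min_count=2):
    """Return {(item_a, item_b): count} for pairs seen at least min_count times."""
    pair_counts = Counter()
    with open(path, "r") as file:
        for line in file:
            # Strip whitespace and drop empty fields. The set removes repeated
            # items within one receipt (an assumption, see the note above).
            items = sorted({item.strip() for item in line.split(",") if item.strip()})
            # combinations() over a sorted list yields every pair exactly once,
            # already in sorted order, so (a, b) and (b, a) share one key.
            pair_counts.update(combinations(items, 2))
    # Keep only the pairs that meet the frequency threshold.
    return {pair: count for pair, count in pair_counts.items() if count >= min_count}
```

Calling count_common_pairs() on a receipts.txt that contains the four sample receipts from the question should give the same two pairs, (bread, milk) with 3 and (cheese, milk) with 2.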