Use Python to find the most popular single products and co-purchased products in the large transaction data file retail.csv. Each row in the file is one purchase transaction from a customer, including a set of product IDs separated by commas. The first column is the transaction ID, and columns 2-3 are the IDs of the products purchased in this transaction. If the value in the third column is zero, the customer purchased only one product (the one in the second column).
Note:
• A co-purchased product pair is defined as a pair of products purchased in the same transaction. For example, if a row is "2 24 35", then 24 and 35 form a pair of co-purchased product IDs.
• To find the co-purchased products in each transaction, you might use a nested loop.
• Write the top 10 single products and the top 10 co-purchased product pairs into a new file: output.txt
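The nested-loop hint can be sketched as follows for a single transaction. The helper name `pairs_in_transaction` is hypothetical. The sketch assumes that a product ID of `0` marks an empty column, and that the pair (24, 35) should count the same as (35, 24); the task does not state the second point explicitly.

```python
def pairs_in_transaction(products):
    """Return every unordered pair of distinct product IDs in one transaction.

    `products` is a list of product-ID strings; "0" is treated as "no product"
    (an assumption based on the task description).
    """
    items = [p.strip() for p in products if p.strip() not in ("", "0")]
    pairs = []
    for i in range(len(items)):             # outer loop: first product of the pair
        for j in range(i + 1, len(items)):  # inner loop: every later product
            if items[i] != items[j]:
                # sort so that (24, 35) and (35, 24) map to the same pair
                pairs.append(tuple(sorted((items[i], items[j]))))
    return pairs


print(pairs_in_transaction(["24", "35"]))  # [('24', '35')]
print(pairs_in_transaction(["24", "0"]))   # []
```

With only two product columns the inner loop runs at most once per row, but the same helper would also handle rows with more products.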
import csv
import operator
f = open('retail.csv', newline='')  # newline='' is what the csv module recommends for input files
csv_f = csv.reader(f)
co_purchased = {}  # dict for storing counts of co-purchased pairs
singly_purchased = {}  # dict for storing counts of singly purchased items
for row in csv_f:
    # csv.reader yields strings, so compare against "0" rather than the integer 0
    first, second = row[1].strip(), row[2].strip()
    if first != "0" and second != "0":  # two products in the same transaction
        key = first + " " + second
        if key not in co_purchased:
            co_purchased[key] = 1
        else:
            co_purchased[key] += 1
    elif first != "0":  # only the product in column 2 was purchased
        key = first
        if key not in singly_purchased:
            singly_purchased[key] = 1
        else:
            singly_purchased[key] += 1
    elif second != "0":  # only the product in column 3 was purchased
        key = second
        if key not in singly_purchased:
            singly_purchased[key] = 1
        else:
            singly_purchased[key] += 1
# Now sort the dictionaries by frequency, highest first
co_purchased_sorted = dict(sorted(co_purchased.items(), key=operator.itemgetter(1), reverse=True))
top_10_co_purchased = []
counter = 0
for key in co_purchased_sorted:
    top_10_co_purchased.append(key + "\n")  # keep only the key, i.e. the product pair
    counter += 1
    if counter == 10:
        break
file1 = open("output.txt", "w")  # creates (or truncates) output.txt and opens it for writing
file1.writelines(top_10_co_purchased)  # writes the top 10 co-purchased pairs
singly_purchased_sorted = dict(sorted(singly_purchased.items(), key=operator.itemgetter(1), reverse=True))
top_10_singly_purchased = []
counter = 0
for key in singly_purchased_sorted:
    top_10_singly_purchased.append(key + "\n")
    counter += 1
    if counter == 10:
        break
file1.writelines(top_10_singly_purchased)  # writes the top 10 singly purchased products
file1.close()  # flush the results to disk
f.close()
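The code above is written to count a product as "single" only when it was bought alone, and it treats "24 35" and "35 24" as different pairs. If the task instead means the most frequently purchased individual products overall, a variant along the following lines may be closer. This is only a sketch under that assumption: it uses `collections.Counter` and `itertools.combinations` from the standard library in place of the nested loop, and the function name `top_products_and_pairs` and the sample rows are hypothetical.

```python
from collections import Counter
from itertools import combinations


def top_products_and_pairs(rows, n=10):
    """Count every product occurrence and every unordered pair per transaction.

    `rows` is an iterable of rows shaped like csv.reader output:
    [transaction_id, product_1, product_2, ...], with "0" meaning "no product".
    """
    singles = Counter()
    pairs = Counter()
    for row in rows:
        # drop the transaction ID, blanks and the "0" placeholder; de-duplicate
        items = sorted({p.strip() for p in row[1:]} - {"", "0"})
        singles.update(items)
        pairs.update(combinations(items, 2))  # unordered pairs, in sorted order
    return singles.most_common(n), pairs.most_common(n)


# Hypothetical sample rows standing in for retail.csv
sample = [
    ["1", "24", "35"],
    ["2", "35", "24"],
    ["3", "24", "0"],
    ["4", "7", "35"],
]
top_singles, top_pairs = top_products_and_pairs(sample)
print(top_singles)
print(top_pairs)
```

To run this on the real data, the rows from `csv.reader` can be passed in directly, and the two returned lists written to output.txt one entry per line.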