In: Computer Science
Write a program to implement the Apriori algorithm on web log data.
Do a Google search for any keyword and store the results in a file, or take some web log data from the internet, and apply the Apriori algorithm to draw a meaningful conclusion from the data.
Implementation of the Apriori algorithm in Python:
Step 1: Importing libraries
import numpy as np
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
Step 2: Loading the data
# Changing the working directory to the folder containing the data file, which you can download from Google or keep in any other location you prefer
cd C:\Users\Dev\Desktop\Kaggle\Apriori Algorithm
# Loading the Data
data = pd.read_excel('Online_Retail.xlsx')
data.head()
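Note that the `cd` line above is an IPython/Jupyter magic command; in a plain Python script you would use `os.chdir` instead. A minimal sketch, shown here with a throwaway temporary directory rather than the real data folder:

```python
import os
import tempfile

# In a plain .py script, IPython's `cd` magic is unavailable;
# os.chdir does the same job. Demonstrated with a temp directory.
old_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as d:
    os.chdir(d)
    # realpath() normalizes symlinked temp paths (e.g. on macOS)
    assert os.path.realpath(os.getcwd()) == os.path.realpath(d)
    os.chdir(old_cwd)  # restore before the temp dir is removed
```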
Below is a preview of the data stored in the file.
Then we can list the columns of the file using the following attribute:
# Exploring columns of the data
data.columns
Exploring the different regions of transactions
# Exploring the different regions of transactions
data.Country.unique()
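Before running these calls on the full spreadsheet, they can be tried on a tiny hand-made frame. The rows below are invented purely for illustration; they only mimic the Online Retail column names:

```python
import pandas as pd

# Invented miniature of the Online Retail layout (illustration only).
mini = pd.DataFrame({
    'InvoiceNo': ['536365', '536366', 'C536367'],
    'Description': ['WHITE HANGING HEART', 'RED RETROSPOT BAG', 'JAM JAR'],
    'Quantity': [6, 2, -1],
    'Country': ['United Kingdom', 'France', 'France'],
})

print(list(mini.columns))     # the columns of the frame
print(mini.Country.unique())  # the distinct regions of transactions
```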
Step 3: Cleaning the Data
# Stripping extra spaces in the description
data['Description'] = data['Description'].str.strip()
# Dropping the rows without any invoice number
data.dropna(axis = 0, subset =['InvoiceNo'], inplace = True)
data['InvoiceNo'] = data['InvoiceNo'].astype('str')
# Dropping credit transactions (invoice numbers starting with 'C')
data = data[~data['InvoiceNo'].str.contains('C')]
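The effect of these three cleaning steps can be checked on a small invented frame. The rows below are made up: one padded description, one missing invoice number, and one 'C'-prefixed credit invoice:

```python
import pandas as pd

# Invented rows exercising each cleaning rule.
df = pd.DataFrame({
    'InvoiceNo': ['536365', None, 'C536367'],
    'Description': ['  WHITE HANGING HEART  ', 'RED BAG', 'JAM JAR'],
})

df['Description'] = df['Description'].str.strip()          # trim spaces
df.dropna(axis=0, subset=['InvoiceNo'], inplace=True)      # drop missing invoices
df['InvoiceNo'] = df['InvoiceNo'].astype('str')
df = df[~df['InvoiceNo'].str.contains('C')]                # drop credits

print(df['InvoiceNo'].tolist())  # only the ordinary invoice survives
```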
Step 4: Splitting data according to region of transaction
# Transactions done in France
basket_France = (data[data['Country'] =="France"]
.groupby(['InvoiceNo', 'Description'])['Quantity']
.sum().unstack().reset_index().fillna(0)
.set_index('InvoiceNo'))
# Transactions done in the United Kingdom
basket_UK = (data[data['Country'] =="United Kingdom"]
.groupby(['InvoiceNo', 'Description'])['Quantity']
.sum().unstack().reset_index().fillna(0)
.set_index('InvoiceNo'))
# Transactions done in Portugal
basket_Por = (data[data['Country'] =="Portugal"]
.groupby(['InvoiceNo', 'Description'])['Quantity']
.sum().unstack().reset_index().fillna(0)
.set_index('InvoiceNo'))
basket_Sweden = (data[data['Country'] =="Sweden"]
.groupby(['InvoiceNo', 'Description'])['Quantity']
.sum().unstack().reset_index().fillna(0)
.set_index('InvoiceNo'))
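The four blocks above differ only in the country name, so they can be folded into one helper function. The name `make_basket` is my own, and the demo rows are invented:

```python
import pandas as pd

def make_basket(data, country):
    """Invoice x item quantity matrix for one country (sketch)."""
    return (data[data['Country'] == country]
            .groupby(['InvoiceNo', 'Description'])['Quantity']
            .sum().unstack().reset_index().fillna(0)
            .set_index('InvoiceNo'))

# Demo on invented rows:
data = pd.DataFrame({
    'InvoiceNo': ['1', '1', '2'],
    'Description': ['TEA', 'MUG', 'TEA'],
    'Quantity': [2, 1, 3],
    'Country': ['France', 'France', 'Sweden'],
})
basket_France = make_basket(data, 'France')
print(basket_France)
```

With the helper, `basket_UK = make_basket(data, 'United Kingdom')` and so on, instead of repeating the pipeline per country.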
Step 5: One-hot encoding the data
# Defining hot encoding function
def hot_encode(x):
    if x <= 0:
        return 0
    if x >= 1:
        return 1
# Encoding the datasets
basket_encoded = basket_France.applymap(hot_encode)
basket_France = basket_encoded
basket_encoded = basket_UK.applymap(hot_encode)
basket_UK = basket_encoded
basket_encoded = basket_Por.applymap(hot_encode)
basket_Por = basket_encoded
basket_encoded = basket_Sweden.applymap(hot_encode)
basket_Sweden = basket_encoded
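Since `hot_encode` only tests whether a quantity is positive, the same encoding can be written as one vectorized expression, which is faster and avoids `applymap` (deprecated in recent pandas versions). A sketch on an invented quantity matrix:

```python
import pandas as pd

# Invented invoice x item quantity matrix (illustration only).
basket = pd.DataFrame({'TEA': [2, 0], 'MUG': [0, 5]}, index=['1', '2'])

# Equivalent to applymap(hot_encode): positive quantities -> 1, rest -> 0.
encoded = (basket > 0).astype(int)
print(encoded)
```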
Step 6: Analyzing the results by building the models
a) France:
# Building the model
frq_items = apriori(basket_France, min_support = 0.05, use_colnames = True)
# Collecting the inferred rules in a dataframe
rules = association_rules(frq_items, metric ="lift", min_threshold = 1)
rules = rules.sort_values(['confidence', 'lift'], ascending =[False, False])
print(rules.head())
b) United Kingdom:
frq_items = apriori(basket_UK, min_support = 0.01, use_colnames = True)
rules = association_rules(frq_items, metric ="lift", min_threshold = 1)
rules = rules.sort_values(['confidence', 'lift'], ascending =[False, False])
print(rules.head())
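To see what `apriori` is doing under the hood, here is a minimal pure-Python sketch of the frequent-itemset search on invented toy transactions. This is not the mlxtend implementation, just the same levelwise idea: keep itemsets whose support clears `min_support`, then extend them by one item and repeat:

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Frequent itemsets by levelwise search (illustrative sketch)."""
    n = len(transactions)
    sets = [frozenset(t) for t in transactions]
    # Level 1: frequent single items.
    items = {i for t in sets for i in t}
    current = {frozenset([i]) for i in items
               if sum(i in t for t in sets) / n >= min_support}
    frequent = {}
    k = 1
    while current:
        for c in current:
            frequent[c] = sum(c <= t for t in sets) / n  # support
        # Candidate (k+1)-itemsets: unions of frequent k-itemsets.
        k += 1
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k}
        current = {c for c in candidates
                   if sum(c <= t for t in sets) / n >= min_support}
    return frequent

# Invented toy transactions:
tx = [['tea', 'mug'], ['tea', 'jam'], ['tea', 'mug', 'jam'], ['mug']]
freq = apriori_sketch(tx, min_support=0.5)
print(sorted((sorted(s), sup) for s, sup in freq.items()))
```

The infrequent pair {mug, jam} never survives, so no candidate containing it is kept — the pruning that gives Apriori its name.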
These are some of the models and results from the file I used to analyse the algorithm; you can use any other file you like.