In: Computer Science
Write a program to implement the Apriori algorithm on web log data.
Do a Google search for any keyword and store the results in a file, or take some web log data from the internet, and apply the Apriori algorithm to draw a meaningful conclusion from the data.
Implementation of the Apriori algorithm in Python:
Step 1: Importing libraries
import numpy as np 
import pandas as pd 
from mlxtend.frequent_patterns import apriori, association_rules 
Step 2: Loading the data
# Changing the working directory to the folder containing the data file,
# which you can download from Kaggle or any other location.
# Note: `cd` is a Jupyter/IPython magic; in a plain script use os.chdir()
cd C:\Users\Dev\Desktop\Kaggle\Apriori Algorithm 
  
# Loading the Data 
data = pd.read_excel('Online_Retail.xlsx') 
data.head() 
Below is the data stored in the file:

Then we can find out the columns of the file using the following method:
# Exploring columns of the data 
data.columns 

Exploring the different regions of transactions
# Exploring the different regions of transactions 
data.Country.unique() 

Step 3: Cleaning the Data
# Stripping extra spaces in the description 
data['Description'] = data['Description'].str.strip() 
# Dropping the rows without any invoice number 
data.dropna(axis = 0, subset =['InvoiceNo'], inplace = True) 
data['InvoiceNo'] = data['InvoiceNo'].astype('str') 
# Dropping the credit (cancelled) transactions, whose invoice numbers contain 'C' 
data = data[~data['InvoiceNo'].str.contains('C')] 
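The cleaning steps above can be checked on a tiny synthetic frame. The values below are hypothetical and only mimic the Online Retail columns; they are not taken from the real file:

```python
import pandas as pd

# Hypothetical mini data set with the three problems Step 3 handles:
# padded descriptions, a missing invoice number, and a credit invoice.
data = pd.DataFrame({
    'InvoiceNo': ['536365', 'C536366', None, '536367'],
    'Description': ['  WHITE HANGING HEART ', 'RED MUG', 'BLUE VASE', 'GREEN PLANT '],
    'Quantity': [6, -1, 2, 4],
})

# Stripping extra spaces in the description
data['Description'] = data['Description'].str.strip()
# Dropping the rows without any invoice number
data = data.dropna(axis=0, subset=['InvoiceNo'])
data['InvoiceNo'] = data['InvoiceNo'].astype('str')
# Dropping the credit transactions (invoice numbers containing 'C')
data = data[~data['InvoiceNo'].str.contains('C')]

print(data['InvoiceNo'].tolist())  # only the two regular invoices remain
```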
Step 4: Splitting data according to region of transaction
# Transactions done in France 
basket_France = (data[data['Country'] =="France"] 
                .groupby(['InvoiceNo', 'Description'])['Quantity'] 
                .sum().unstack().reset_index().fillna(0) 
                .set_index('InvoiceNo')) 
# Transactions done in the United Kingdom 
basket_UK = (data[data['Country'] =="United Kingdom"] 
                .groupby(['InvoiceNo', 'Description'])['Quantity'] 
                .sum().unstack().reset_index().fillna(0) 
                .set_index('InvoiceNo')) 
# Transactions done in Portugal 
basket_Por = (data[data['Country'] =="Portugal"] 
                .groupby(['InvoiceNo', 'Description'])['Quantity'] 
                .sum().unstack().reset_index().fillna(0) 
                .set_index('InvoiceNo')) 
# Transactions done in Sweden 
basket_Sweden = (data[data['Country'] =="Sweden"] 
                .groupby(['InvoiceNo', 'Description'])['Quantity'] 
                .sum().unstack().reset_index().fillna(0) 
                .set_index('InvoiceNo')) 
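To see what this groupby/unstack pivot produces, here is a sketch on a hypothetical five-row transaction log (invented values, not the real data): each invoice becomes a row and each product a column, with summed quantities and zeros where a product was not bought.

```python
import pandas as pd

# Hypothetical mini transaction log for one country
data = pd.DataFrame({
    'InvoiceNo': ['1', '1', '2', '2', '3'],
    'Description': ['MUG', 'VASE', 'MUG', 'PLANT', 'VASE'],
    'Quantity': [2, 1, 3, 1, 5],
    'Country': ['France'] * 5,
})

# Same basket construction as above: one row per invoice,
# one column per product, missing combinations filled with 0
basket = (data[data['Country'] == "France"]
          .groupby(['InvoiceNo', 'Description'])['Quantity']
          .sum().unstack().reset_index().fillna(0)
          .set_index('InvoiceNo'))

print(basket)
```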
Step 5: One-hot encoding of the data
# Defining the one-hot encoding function: any positive quantity 
# counts as a purchase (1), everything else as no purchase (0) 
def hot_encode(x): 
        if x >= 1: 
                return 1
        return 0
# Encoding the datasets 
basket_encoded = basket_France.applymap(hot_encode) 
basket_France = basket_encoded 
basket_encoded = basket_UK.applymap(hot_encode) 
basket_UK = basket_encoded 
basket_encoded = basket_Por.applymap(hot_encode) 
basket_Por = basket_encoded 
basket_encoded = basket_Sweden.applymap(hot_encode) 
basket_Sweden = basket_encoded 
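A quick sketch of the encoding on a hypothetical two-invoice basket (toy values, not the real data). Note that `applymap` is deprecated in recent pandas versions; the element-wise `apply`/`map` form below behaves the same and works across versions:

```python
import pandas as pd

def hot_encode(x):
    # Any positive quantity counts as a purchase
    return 1 if x >= 1 else 0

# Hypothetical basket: invoice '1' bought mugs, invoice '2' bought vases
basket = pd.DataFrame({'MUG': [2.0, 0.0], 'VASE': [0.0, 5.0]},
                      index=['1', '2'])

# Element-wise encoding, equivalent to basket.applymap(hot_encode)
encoded = basket.apply(lambda col: col.map(hot_encode))
print(encoded)
```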
Step 6: Analyzing the results by building the models
a) France:
# Building the model 
frq_items = apriori(basket_France, min_support = 0.05, use_colnames = True) 
# Collecting the inferred rules in a dataframe 
rules = association_rules(frq_items, metric ="lift", min_threshold = 1) 
rules = rules.sort_values(['confidence', 'lift'], ascending =[False, False]) 
print(rules.head()) 

b) United Kingdom:
frq_items = apriori(basket_UK, min_support = 0.01, use_colnames = True) 
rules = association_rules(frq_items, metric ="lift", min_threshold = 1) 
rules = rules.sort_values(['confidence', 'lift'], ascending =[False, False]) 
print(rules.head()) 

These are some of the models and results from the file I used to analyse the algorithm; you can use any other file you want.
