Question

In: Computer Science

Write a program to implement Apriori Algorithm on web log data?   do a google search for...

Write a program to implement Apriori Algorithm on web log data?  

do a google search for any keyword and store the results in a file or take some web log data from internet and apply apriori algorithm to get a meaningful conclusion from the data

Solutions

Expert Solution

Implementation of Apriori algorithm in Python:

Step 1: Importing libraries

import numpy as np 
import pandas as pd 
from mlxtend.frequent_patterns import apriori, association_rules 

Step 2: Loading the data

# Changing the working location to the location of the file that u can download from google or any other file location you can use
cd C:\Users\Dev\Desktop\Kaggle\Apriori Algorithm 
  
# Loading the Data 
data = pd.read_excel('Online_Retail.xlsx') 
data.head() 

Below is the data stored in the file:

Then we can find out the coloumns of the file using following method:

# Exploring columns of the data 
data.columns 

Exploring the different regions of transactions

# Exploring the different regions of transactions 
data.Country.unique() 

Step 3: Cleaning the Data

# Stripping extra spaces in the description 
data['Description'] = data['Description'].str.strip() 

# Dropping the rows without any invoice number 
data.dropna(axis = 0, subset =['InvoiceNo'], inplace = True) 
data['InvoiceNo'] = data['InvoiceNo'].astype('str') 

# Dropping the transactions which were done at the credit 
data = data[~data['InvoiceNo'].str.contains('C')] 

Step 4: Splitting data according to region of transaction

# Transactions done in France 
basket_France = (data[data['Country'] =="France"] 
                .groupby(['InvoiceNo', 'Description'])['Quantity'] 
                .sum().unstack().reset_index().fillna(0) 
                .set_index('InvoiceNo')) 

# Transactions done in the United Kingdom 
basket_UK = (data[data['Country'] =="United Kingdom"] 
                .groupby(['InvoiceNo', 'Description'])['Quantity'] 
                .sum().unstack().reset_index().fillna(0) 
                .set_index('InvoiceNo')) 

# Transactions done in Portugal 
basket_Por = (data[data['Country'] =="Portugal"] 
                .groupby(['InvoiceNo', 'Description'])['Quantity'] 
                .sum().unstack().reset_index().fillna(0) 
                .set_index('InvoiceNo')) 

basket_Sweden = (data[data['Country'] =="Sweden"] 
                .groupby(['InvoiceNo', 'Description'])['Quantity'] 
                .sum().unstack().reset_index().fillna(0) 
                .set_index('InvoiceNo')) 

Step 5: Hot encoding of Data

# Defining hot encoding function
def hot_encode(x): 
        if(x<= 0): 
                return 0
        if(x>= 1): 
                return 1

# Encoding the datasets 
basket_encoded = basket_France.applymap(hot_encode) 
basket_France = basket_encoded 

basket_encoded = basket_UK.applymap(hot_encode) 
basket_UK = basket_encoded 

basket_encoded = basket_Por.applymap(hot_encode) 
basket_Por = basket_encoded 

basket_encoded = basket_Sweden.applymap(hot_encode) 
basket_Sweden = basket_encoded 

Step 6: Analyzing the results by building the models

a) France:

# Building the model 
frq_items = apriori(basket_France, min_support = 0.05, use_colnames = True) 

# Collecting the inferred rules in a dataframe 
rules = association_rules(frq_items, metric ="lift", min_threshold = 1) 
rules = rules.sort_values(['confidence', 'lift'], ascending =[False, False]) 
print(rules.head()) 

b) United Kingdom:

frq_items = apriori(basket_UK, min_support = 0.01, use_colnames = True) 
rules = association_rules(frq_items, metric ="lift", min_threshold = 1) 
rules = rules.sort_values(['confidence', 'lift'], ascending =[False, False]) 
print(rules.head()) 

This are some of the models and results from the given file which I have used to analyse the algorithm u can use another file which ever you want.


Related Solutions

Write a program in C++ to implement Binary Search Algorithm. Assume the data given in an...
Write a program in C++ to implement Binary Search Algorithm. Assume the data given in an array. Use number of data N at least more than 10. The function Binary Search should return the index of the search key V.
Use Google to do a web search for the website of the corporation of your choice...
Use Google to do a web search for the website of the corporation of your choice that has not been researched by another student. Find that corporation's annual report to stock-holders on the website. Note that you can use other sites to find the following information but in either case include a link to your reference(s). Answer the following questions: 1. What does the report say about the corporation's view of future business challenges and the market in which it...
Big Data is increasingly important to companies and accountants. Using a web search on Google or...
Big Data is increasingly important to companies and accountants. Using a web search on Google or other search sites, find an article within the last 12 months on “Big Data” and accounting. Summarize how the article describes the use of Big Data in an accounting context.
(Write a C# program DO NOT USE CLASS)Implement the merge sort algorithm using a linked list...
(Write a C# program DO NOT USE CLASS)Implement the merge sort algorithm using a linked list instead of arrays. You can use any kind of a linked structure, such as single, double, circular lists, stacks and/or queues. You can populate your list from an explicitly defined array in your program. HINT: You will not be using low, middle and high anymore. For finding the middle point, traverse through the linked list while keeping count of the number of nodes. Break...
Make a Binary search program for C# and write algorithm and explain it in easy words...
Make a Binary search program for C# and write algorithm and explain it in easy words also show output and input
Write a Java program that implements the Depth-First Search (DFS) algorithm. Input format: This is a...
Write a Java program that implements the Depth-First Search (DFS) algorithm. Input format: This is a sample input from a user. 3 2 0 1 1 2 The first line (= 3 in the example) indicates that there are three vertices in the graph. You can assume that the first vertex starts from the number 0. The second line (= 2 in the example) represents the number of edges, and following two lines are the edge information. This is the...
Many consumer web services let users use their Google account or Facebook account to log in...
Many consumer web services let users use their Google account or Facebook account to log in to the service without having to create a new account. This is a form of access control covered in the lecture. Explain how this mechanism works. What are the protocols that are used? Please provide a detailed explanation, and also provide the source (web link) of information as well. Thank you so much.
Do a web search to find out information on the abuse of dextromethorphan. Write a brief...
Do a web search to find out information on the abuse of dextromethorphan. Write a brief summary 300 words of what you have learned. please I need quick help these question. its due after one hour .thanks
I need to implement an algorithm based on local search that are in the following list...
I need to implement an algorithm based on local search that are in the following list in Python or Matlab( simulated annealing, variable neighborhood search, variable neighborhood descent) PLEASE HELP I am really stuck!
I need to implement an algorithm based on local search that are in the following list...
I need to implement an algorithm based on local search that are in the following list in Python or Matlab( tabu search, simulated annealing, iterated local search, evolutionary local search, variable neighborhood search, variable neighborhood descent) PLEASE HELP :)
ADVERTISEMENT
ADVERTISEMENT
ADVERTISEMENT