Marketing research
The answer should span between 1/2 and 1 document page.
Essay questions:
Briefly describe two methods of analysis of association. Discuss the relevance of analysis of association in general for marketing and provide examples.
Step 1: Data Preparation
The association analysis process expects transactions in a particular format: the input grid should contain binominal (true or false) data, with items in the columns and one transaction per row. If the data set contains transaction IDs or session IDs, they can either be ignored or tagged as a special attribute in RapidMiner. Data sets in any other format have to be converted to this transactional format using data transformation operators. In this example, we have used the data shown in Table 6.3, with a session ID on each row and the content accessed in the columns, indicated by 1 and 0. This integer format has to be converted to a binominal format by the Numerical to Binominal operator. The output of Numerical to Binominal is then connected to the FP-Growth operator to generate frequent item sets. The data set and RapidMiner process for association analysis can be accessed from the companion site of the book at www.LearnPredictiveAnalytics.com. Figure 6.10 shows the RapidMiner process of association analysis with the FP-Growth algorithm.
Figure 6.10. Data mining process for FP-Growth algorithm.
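To make the conversion concrete, the following Python sketch mimics what the Numerical to Binominal step does to a 0/1 session table. It is not RapidMiner code: the item names, session IDs, and values are hypothetical, and it assumes any nonzero value means the content was accessed.

```python
# Sketch of the integer-to-binominal conversion, in plain Python (not RapidMiner).
# Hypothetical toy data: one row per session ID, one column per content item,
# 1 = accessed, 0 = not accessed. These values are illustrative, not Table 6.3.
ITEMS = ["News", "Finance", "Entertainment", "Sports"]
ROWS = {
    "S1": [1, 1, 0, 0],
    "S2": [1, 1, 0, 0],
    "S3": [1, 1, 0, 1],
    "S4": [0, 0, 0, 1],
    "S5": [1, 1, 1, 0],
    "S6": [1, 0, 1, 1],
}


def numerical_to_binominal(rows, items):
    """Map each 0/1 cell to False/True; the session ID stays as the row key."""
    return {
        session_id: {item: value != 0 for item, value in zip(items, values)}
        for session_id, values in rows.items()
    }


def to_transactions(binominal):
    """Collapse each boolean row into the set of items flagged True."""
    return {
        session_id: frozenset(item for item, flag in row.items() if flag)
        for session_id, row in binominal.items()
    }


binominal = numerical_to_binominal(ROWS, ITEMS)
transactions = to_transactions(binominal)
print(binominal["S1"])  # {'News': True, 'Finance': True, 'Entertainment': False, 'Sports': False}
print(sorted(transactions["S3"]))  # ['Finance', 'News', 'Sports']
```

Keeping the session ID as a dictionary key rather than a column plays the same role as tagging it as a special attribute: it identifies the row without being treated as an item.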
Step 2: Modeling Operator and Parameters
The FP-Growth operator in RapidMiner generates all the frequent item sets from the input data set that meet a given parameter criterion. The modeling operator is available in the Modeling > Association and Item Set Mining folder. This operator can work in two modes: one that targets a specified number of high-support item sets (the default) and one that uses a minimum support criterion. The following parameters can be set in this operator, thereby affecting the behavior of the model.
▪
Min Support: Threshold for the support measure. All the frequent item sets passing this threshold will be provided in the output.
▪
Max Items: Maximum number of items in an item set. Setting this parameter keeps item sets from growing too large.
▪
Must Contain: Regular expression that item sets must match to be kept. Use this option to restrict the output to item sets containing the items of interest.
▪
Find Minimum Number of Item Sets: This option allows the FP-Growth operator to lower the support threshold if the given threshold yields fewer item sets than the specified minimum. The support threshold is decreased by 20% in each retry.
▪
Min Number of Item Sets: Minimum number of item sets to be generated.
▪
Max Number of Retries: Number of retries allowed in achieving the minimum number of item sets.
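The parameters above can be sketched in Python as follows. This is a hedged illustration, not RapidMiner's implementation: it swaps in a simple level-wise (Apriori-style) search, which finds the same frequent item sets as FP-Growth but without the FP-tree. The function names, parameter names (`max_items`, `must_contain`, `min_number`, `max_retries`), their default values, and the toy transactions are all hypothetical, and the regex rule for Must Contain is an assumption.

```python
import re
from itertools import combinations

# Hypothetical toy transactions (items accessed per session); illustrative only.
TRANSACTIONS = [
    {"News", "Finance"},
    {"News", "Finance"},
    {"News", "Finance", "Sports"},
    {"Sports"},
    {"News", "Finance", "Entertainment"},
    {"News", "Entertainment", "Sports"},
]


def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support, max_items=None):
    """Level-wise (Apriori-style) search for item sets with support >= min_support.

    A simple stand-in for FP-Growth: it finds the same frequent item sets,
    but without the FP-tree that lets FP-Growth avoid candidate generation.
    """
    level = [frozenset([item]) for item in sorted(set().union(*transactions))]
    result, size = {}, 1
    while level and (max_items is None or size <= max_items):
        scored = {s: support(s, transactions) for s in level}
        kept = {s: v for s, v in scored.items() if v >= min_support}
        result.update(kept)
        # Join pairs of frequent k-item sets into (k+1)-item candidates.
        level = list({a | b for a, b in combinations(kept, 2) if len(a | b) == size + 1})
        size += 1
    return result


def mine(transactions, min_support, max_items=None, must_contain=None,
         min_number=1, max_retries=15):
    """Apply Max Items and Must Contain; lower the threshold by 20% per retry."""
    threshold = min_support
    for attempt in range(max_retries + 1):
        sets = frequent_itemsets(transactions, threshold, max_items)
        if must_contain is not None:
            # Assumption: keep item sets in which some item fully matches the regex.
            pattern = re.compile(must_contain)
            sets = {s: v for s, v in sets.items() if any(pattern.fullmatch(i) for i in s)}
        if len(sets) >= min_number or attempt == max_retries:
            return sets, threshold
        threshold *= 0.8  # Decrease the support threshold by 20% and retry.


sets, used_threshold = mine(TRANSACTIONS, min_support=0.25)
for itemset, value in sorted(sets.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), round(value, 2))
```

On this toy data, asking for at least one item set with a starting threshold of 0.9 finds nothing on the first pass, and the first retry at 0.72 keeps the single most common item.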
In this example, we set Min Support to 0.25. The result of the FP-Growth operator is the set of generated item sets, which can be viewed in the results page. The reporting options include filtering based on the number of items and sorting by support. Figure 6.11 shows the frequent item set output of the FP-Growth operator, where all item sets with support higher than the threshold can be seen.
Figure 6.11. Frequent item set output.
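For reference, the support of an item set X over a set of transactions T is the fraction of transactions that contain X. Assuming the threshold is inclusive, a Min Support of 0.25 keeps only item sets that satisfy:

```latex
\operatorname{support}(X) = \frac{\lvert \{\, t \in T : X \subseteq t \,\} \rvert}{\lvert T \rvert} \geq 0.25
```

For example, with 20 transactions an item set must appear in at least 5 of them.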
Step 3: Create Association Rules
The next step in association analysis is generating the most interesting rules from the frequent item sets created by the FP-Growth operator. The Create Association Rules operator generates relevant rules from frequent item sets. The interest measure of the rules can be specified by choosing the interest criterion appropriate for the data set under investigation. The input of the Create Association Rules operator is the frequent item sets from the FP-Growth operator, and the output is all the association rules meeting the interest criterion. The following parameters govern the functionality of this operator:
▪
Criterion: Used to select the interest measure for filtering the association rules. All other parameters change based on the criterion selection. Confidence, lift, and conviction are commonly used interest criteria.
▪
Min Criterion Value: Specifies the threshold for the selected criterion. Rules not meeting the threshold are discarded.
▪
Gain Theta and Laplace: Parameter values used when gain or Laplace is selected as the interest measure.
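The three commonly used criteria can be written in terms of support. For a rule X → Y, the standard textbook definitions are shown below; this is a reference sketch, and RapidMiner's exact computation of each column is not restated here.

```latex
\begin{aligned}
\operatorname{confidence}(X \rightarrow Y) &= \frac{\operatorname{support}(X \cup Y)}{\operatorname{support}(X)} \\
\operatorname{lift}(X \rightarrow Y) &= \frac{\operatorname{support}(X \cup Y)}{\operatorname{support}(X)\,\operatorname{support}(Y)} \\
\operatorname{conviction}(X \rightarrow Y) &= \frac{1 - \operatorname{support}(Y)}{1 - \operatorname{confidence}(X \rightarrow Y)}
\end{aligned}
```

A lift above 1 means the premise and conclusion occur together more often than they would if they were independent, and conviction grows without bound as confidence approaches 1.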
In this example process, we use confidence as the criterion with a minimum confidence value of 0.5. Figure 6.10 shows the completed RapidMiner process for association analysis. The process can be saved and executed.
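The rule-generation step with a confidence criterion of 0.5 can be sketched in Python. This is a hedged illustration under stated assumptions, not the Create Association Rules operator itself: the toy transactions and function names are hypothetical, frequent item sets are found by brute force (fine for a handful of items, exponential in general), and every nonempty proper subset of a frequent item set is tried as a premise.

```python
from itertools import combinations

# Hypothetical toy transactions; illustrative only, not the book's data set.
TRANSACTIONS = [
    {"News", "Finance"},
    {"News", "Finance"},
    {"News", "Finance", "Sports"},
    {"Sports"},
    {"News", "Finance", "Entertainment"},
    {"News", "Entertainment", "Sports"},
]


def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration; fine for a handful of items, exponential in general."""
    items = sorted(set().union(*transactions))
    return [frozenset(c) for k in range(1, len(items) + 1)
            for c in combinations(items, k)
            if support(frozenset(c), transactions) >= min_support]


def create_rules(frequent, transactions, min_confidence=0.5):
    """Split each frequent item set into premise -> conclusion and keep confident rules."""
    rules = []
    for itemset in frequent:
        s_xy = support(itemset, transactions)
        for k in range(1, len(itemset)):  # single-item sets yield no rules
            for premise in map(frozenset, combinations(sorted(itemset), k)):
                conclusion = itemset - premise
                s_x = support(premise, transactions)
                s_y = support(conclusion, transactions)
                confidence = s_xy / s_x
                if confidence < min_confidence:
                    continue
                conviction = float("inf") if confidence == 1 else (1 - s_y) / (1 - confidence)
                rules.append({"premise": premise, "conclusion": conclusion,
                              "support": s_xy, "confidence": confidence,
                              "lift": confidence / s_y, "conviction": conviction})
    return sorted(rules, key=lambda r: r["confidence"], reverse=True)


rules = create_rules(frequent_itemsets(TRANSACTIONS, 0.25), TRANSACTIONS, min_confidence=0.5)
for r in rules:
    print(sorted(r["premise"]), "->", sorted(r["conclusion"]),
          f"conf={r['confidence']:.2f} lift={r['lift']:.2f}")
```

Note that a rule and its reverse can score differently: on this toy data, News → Finance passes the 0.5 threshold while News → Sports does not, even though Sports → News does.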
Step 4: Interpreting the Results
The filtered association rules extracted from the input transactions can be viewed in the results window (Figure 6.12). The rules are listed in a table whose columns include the premise and conclusion of each rule, as well as its support, confidence, gain, lift, and conviction. The interactive control window on the left-hand side of the screen lets users filter the rules to those containing a selected item, and a slide bar raises the confidence or criterion threshold, thereby showing fewer rules.
Figure 6.12. Association rules output.
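The two interactive controls can be mimicked with a small Python filter. This is a hypothetical sketch: the `Rule` record, the rule values, and `filter_rules` are illustrative names and numbers, not RapidMiner's API or the output in Figure 6.12, and it assumes "contains the selected item" means the item appears in either the premise or the conclusion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    premise: frozenset
    conclusion: frozenset
    confidence: float


# Hypothetical rules for illustration only; not the output shown in Figure 6.12.
RULES = [
    Rule(frozenset({"Finance"}), frozenset({"News"}), 1.00),
    Rule(frozenset({"Entertainment"}), frozenset({"News"}), 1.00),
    Rule(frozenset({"News"}), frozenset({"Finance"}), 0.80),
    Rule(frozenset({"Sports"}), frozenset({"News"}), 0.67),
]


def filter_rules(rules, item=None, min_confidence=0.0):
    """Keep rules that mention `item` (premise or conclusion) and meet the threshold.

    Assumption: "contains the selected item" means the item appears on either side.
    """
    return [r for r in rules
            if (item is None or item in r.premise | r.conclusion)
            and r.confidence >= min_confidence]


# Raising the threshold, like moving the slide bar, can only shrink the list.
for threshold in (0.5, 0.75, 0.9):
    print(threshold, len(filter_rules(RULES, min_confidence=threshold)))
print(len(filter_rules(RULES, item="Finance")))
```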
The main purpose of association analysis is to understand the relationships between items. Since items take the role of both premise and conclusion, a visual representation of the relationships among all the items, through the rules, can help in comprehending the analysis. Figure 6.13 shows the rules for selected items in both text format and interconnected graph format through the results window. With the default option, the selected items are connected to the rules by arrows: an item with an arrow into a rule is the premise of that rule, and an item with an arrow out of the rule is its conclusion.
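The same premise-to-rule-to-conclusion layout can be reproduced outside RapidMiner as a Graphviz DOT graph. In this hypothetical sketch, each rule becomes a box node, premise items get arrows into it, and conclusion items get arrows out of it; the rules and the `rules_to_dot` helper are illustrative, not taken from Figure 6.13.

```python
# Hypothetical rules (premise items, conclusion items, confidence); illustrative only.
RULES = [
    ({"Finance"}, {"News"}, 1.00),
    ({"News", "Sports"}, {"Finance"}, 0.50),
]


def rules_to_dot(rules):
    """Emit a Graphviz DOT graph: premise items point into a rule node,
    and the rule node points out to its conclusion items."""
    lines = ["digraph rules {"]
    for i, (premise, conclusion, confidence) in enumerate(rules, start=1):
        node = f"rule{i}"
        lines.append(f'  {node} [shape=box, label="Rule {i}\\nconf={confidence:.2f}"];')
        for item in sorted(premise):
            lines.append(f'  "{item}" -> {node};')  # incoming arrow: premise
        for item in sorted(conclusion):
            lines.append(f'  {node} -> "{item}";')  # outgoing arrow: conclusion
    lines.append("}")
    return "\n".join(lines)


print(rules_to_dot(RULES))
```

Saving the printed text to a file and running `dot -Tpng` on it renders the graph; keeping rules as separate nodes, rather than drawing item-to-item edges, is what lets a multi-item premise such as {News, Sports} feed a single rule.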