/    /  Machine Learning- FP-growth Algorithm

F-P Growth Algorithm

 

FP Growth Algorithm is abbreviated as Frequent pattern growth algorithm. It is an enhancement of Apriori algorithm in Association Rule Learning. FP growth algorithm is used for discovering frequent itemset in a transaction database without any generation of candidates. FP growth represents frequent items in frequent pattern trees which can also be called as FP-tree.

FP Growth Algorithm 1 (i2tutorials)

 

A frequent pattern is generated where the candidate generation is not needed. FP growth algorithm represents the database in a tree pattern called as frequent pattern tree or FP tree. FP Tree is a tree-like structure which is made with the initial item sets of the database. The purpose of the FP tree is to find out the most frequent pattern where each node of the FP tree represents an item of the itemset.

 

This tree structure will maintain the association or relation between the item sets. The database is separated by using one frequent item. This fragmented or separated part is called as pattern fragment. The item sets of these fragmented patterns are studied. Hence, the search for frequent item sets is decreased comparatively.

 

In FP Tree, the root node represents null while the lower nodes represent the item sets in Database. The relation of the nodes with the lower nodes that is the item sets with the other item sets are maintained while forming the tree.

FP Growth Algorithm 2 (i2tutorials)

 

Frequent Pattern Algorithm Steps

The frequent pattern growth algorithm helps us to discover the frequent pattern without the generation of candidates.

 

Let us see the steps followed in the frequent pattern by using frequent pattern growth algorithm:

 

Step 1)

The first step is to scan the entire database to find the possible occurrences of the item sets in the database. This step is the similar to the first step of Apriori algorithm. Number of 1-itemsets in the database is called support count or frequency of 1-itemset.

 

Step 2) 

The second step in the FP growth algorithm, is to construct the FP tree. Create the root of the tree where the root is represented by null.

 

Step 3) 

The next step is to scan the database once again and observe the transactions. Examine the first transaction and find the itemset in the database. The itemset with the maximum count is taken at the top and the itemset with lower count is taken at bottom and so on. Which means that the branch of the tree is constructed with transaction item sets in descending order of count.

 

Step 4) 

The next step is to examine the transaction in the database. The item sets are sorted in descending order of count. If any itemset of this transaction is already present in any other branch, then this transaction branch may share a common prefix to the root of the FP Growth algorithm. This means that the common itemset is connected to the new node of another itemset in this transaction.

 

Step 5) 

Next step is that the count of the itemset is increased as it occurs in the transactions. Both the common node and new node count is incremented by 1 as they are created and linked according to transactions.

 

Step 6) 

This step is done to mine the FP Tree which is created. In this step, the lowest node is examined first along with the connections of the lowest nodes. The lowest node represents the frequency pattern of length 1. Traverse the path in the FP Tree. These paths are called as a conditional pattern base.

 

Conditional pattern base is a sub-database consists of prefix paths in the FP tree with the lowest node as suffix.

 

Step 7) 

Next step is to construct a Conditional FP Tree, which is formed by a count of item sets in the path. The item sets which satisfies the threshold support are considered in the Conditional FP Tree.

 

Step 8) 

Final step is to generate Frequent Patterns from the conditional FP Tree.

FP Growth Algorithm 3 (i2tutorials)

 

Advantages

  1. FP algorithm is Faster than Apriori algorithm.
  2. Candidate generation is not needed.
  3. There will be only two passes over dataset.

 

Disadvantages

  1. FP tree may not be able to fit in memory.
  2. FP tree is expensive to build the algorithm.