FP-Growth and the FP-tree: Introduction to the FP Algorithm

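Intro

• A fairly recent algorithm (proposed around 2000).

• Advantages: it generates no candidate itemsets and does not rescan the database over and over. The database is read only twice, once to count the items and once to build the tree.

• Method:

1. build tree: compress the transactions into a frequent pattern tree (FP-tree).

2. mining tree: mine this tree recursively.

• Input: a transaction database D and a minimum support minSupport. Output: the complete set of frequent patterns.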

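Problem

• Let the set of all items be {apple, orange, banana, bread, milk, cola}, abbreviated as {a, b, c, d, e, f}. The past transactions are listed below.

TID  Transaction
1    {a,b}
2    {b,c,d}
3    {a,c,d,e}
4    {a,d,e}
5    {a,b,c}
6    {a,b,c,d}
7    {a,f}
8    {a,b,c}
9    {a,b,d}
10   {b,c,e}

Which items (itemsets) are strongly associated? (min support = 0.2)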

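step 1: build tree

• A. Create the root node null. Scan the database once, count the occurrences of each item, and sort the items by count (highest first). Remove every item whose count is below the min support count, where min support count = 0.2 * 10 = 2.

• B. Read one transaction at a time and sort its items in the order above (ex: {e,d,a} => {a,d,e}).

• Insert each transaction into the FP-tree item by item, recording the occurrence count at each node.

• If the current node has no child for the item, create a new child node and connect it by a linked list (the dashed lines) to the other nodes holding the same item.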

step 1: build tree

TID  Transaction (items in their original order)
1    {b,a}
2    {b,d,c}
3    {d,e,a,c}
4    {a,d,e}
5    {c,b,a}
6    {a,c,b,d}
7    {a,f}
8    {b,a,c}
9    {b,d,a}
10   {c,e,b}

source: Pearson Education Taiwan

Head table (support counts)

Item  Count
a     8
b     7
c     6
d     5
e     3
f     1

TID  Transaction (sorted)
1    {a,b}
2    {b,c,d}
3    {a,c,d,e}
4    {a,d,e}
5    {a,b,c}
6    {a,b,c,d}
7    {a}
8    {a,b,c}
9    {a,b,d}
10   {b,c,e}

Item f is removed (support count = 1 < 2).
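The first scan and the sorting step can be sketched in a few lines of Python. This is only an illustration under some assumptions: the variable names (`transactions`, `header_order`, `sorted_transactions`) are made up for this example, and ties between equally frequent items are broken alphabetically, which the slides do not specify.

```python
from collections import Counter

# The ten transactions from the problem statement.
transactions = [
    {"b", "a"}, {"b", "d", "c"}, {"d", "e", "a", "c"}, {"a", "d", "e"},
    {"c", "b", "a"}, {"a", "c", "b", "d"}, {"a", "f"}, {"b", "a", "c"},
    {"b", "d", "a"}, {"c", "e", "b"},
]
min_support = 0.2
min_count = min_support * len(transactions)  # 0.2 * 10 = 2

# First scan: support count of every item.
counts = Counter(item for t in transactions for item in t)

# Drop the items below the min support count (only f here).
frequent = {item: n for item, n in counts.items() if n >= min_count}

def sort_key(item):
    # Highest count first; alphabetical tie-break is an assumption.
    return (-frequent[item], item)

# Item order used by the head table.
header_order = sorted(frequent, key=sort_key)

# Each transaction with infrequent items removed and the rest reordered.
sorted_transactions = [
    sorted((i for i in t if i in frequent), key=sort_key) for t in transactions
]

print(header_order)            # ['a', 'b', 'c', 'd', 'e']
print(sorted_transactions[2])  # ['a', 'c', 'd', 'e']
```

Sorting every transaction by the same global order is what lets transactions with common frequent items share a prefix in the tree.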

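step 1: build tree

Read TID 1 {a,b}: insert a, then b. The tree is now null -> a:1 -> b:1.

Read TID 2 {b,c,d}: the root has no child b, so a new branch null -> b:1 -> c:1 -> d:1 is created (the dashed line is the item linked list, connecting the two b nodes).

Read TID 3 {a,c,d,e}: the count of a rises to 2 and a new branch c:1 -> d:1 -> e:1 is added under a (the dashed lines are item linked lists, connecting nodes with the same value).

After all ten transactions have been read and inserted, the FP-tree is:

null
  a:8
    b:5
      c:3
        d:1
      d:1
    c:1
      d:1
        e:1
    d:1
      e:1
  b:2
    c:2
      d:1
      e:1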


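build tree summary

1. Create the root node null and build the head table, which records the support of each item.

2. Remove the items whose support count is below the threshold, and sort the items of each transaction by support count, from highest to lowest.

3. Insert each transaction starting from the root, checking at each step whether the current node already has a child for the item to insert. If it does, increase that child's count by one and move on to the next item. If not, create a new child node and insert it.

4. Whenever a new node is created, link it by the linked list to the previous node holding the same item, so that nodes with the same value are easy to find later.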
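The four steps above could look roughly like the following Python sketch. The class and function names (`FPNode`, `build_fp_tree`, `show`) are hypothetical, and the input is assumed to be already filtered and sorted as described in step 2.

```python
class FPNode:
    def __init__(self, item, parent):
        self.item = item        # None for the root
        self.count = 0
        self.parent = parent
        self.children = {}      # item -> FPNode
        self.link = None        # next node holding the same item

def build_fp_tree(sorted_transactions):
    root = FPNode(None, None)
    header = {}   # item -> first node of its linked list
    tails = {}    # item -> last node of its linked list
    for transaction in sorted_transactions:
        node = root
        for item in transaction:
            child = node.children.get(item)
            if child is None:
                # No matching child: create it and append it to the
                # linked list of nodes holding the same item.
                child = FPNode(item, node)
                node.children[item] = child
                if item in tails:
                    tails[item].link = child
                else:
                    header[item] = child
                tails[item] = child
            child.count += 1
            node = child
    return root, header

def show(node, depth=0):
    # Indented text rendering of the tree, one "item:count" per line.
    lines = []
    for child in node.children.values():
        lines.append("  " * depth + f"{child.item}:{child.count}")
        lines.extend(show(child, depth + 1))
    return lines

sorted_transactions = [
    list("ab"), list("bcd"), list("acde"), list("ade"), list("abc"),
    list("abcd"), list("a"), list("abc"), list("abd"), list("bce"),
]
root, header = build_fp_tree(sorted_transactions)
print("\n".join(show(root)))
```

Following `header["b"]` through its `link` pointers visits both b nodes (counts 5 and 2), which together give the support count of b.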

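step 2: mining tree

(Figure: original FP-tree)

1. Explore bottom-up, examining each item in turn.

2. Find all paths that contain the item and record their counts (this is the "conditional pattern base").

3. From the conditional pattern base, build a conditional subtree using the same procedure as for the FP-tree (the FP-tree projected on that item).

4. Repeat these steps until the conditional tree is empty.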


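step 2: mining tree

(Figure: paths containing node e)

1. Start from the least frequent item, e, and walk upward to find the paths that contain e.

2. From these paths and the counts of e, list the conditional pattern base.

3. The conditional pattern base of {e} is {acde:1, ade:1, bce:1}. Each number is how many times the path was traversed given that e occurred (if the e node's count were 2, the number would be 2).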
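As a rough illustration, the conditional pattern base can be collected by visiting every node of an item and walking up through its parents. In this sketch the names `Node`, `build`, and `conditional_pattern_base` are made up, the item linked list is simplified to a plain Python list per item, and the paths are returned without the trailing e (acd, ad, bc rather than acde, ade, bce).

```python
class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count, self.children = 0, {}

def build(sorted_transactions):
    root, header = Node(None, None), {}
    for t in sorted_transactions:
        node = root
        for item in t:
            if item not in node.children:
                node.children[item] = Node(item, node)
                # Simplification: a list per item instead of node links.
                header.setdefault(item, []).append(node.children[item])
            node = node.children[item]
            node.count += 1
    return root, header

def conditional_pattern_base(item, header):
    base = []
    for node in header[item]:
        path = []
        parent = node.parent
        while parent.item is not None:   # stop at the root
            path.append(parent.item)
            parent = parent.parent
        # The prefix path inherits the count of the item's node.
        base.append((tuple(reversed(path)), node.count))
    return base

sorted_transactions = [
    list("ab"), list("bcd"), list("acde"), list("ade"), list("abc"),
    list("abcd"), list("a"), list("abc"), list("abd"), list("bce"),
]
root, header = build(sorted_transactions)
print(conditional_pattern_base("e", header))
# [(('a', 'c', 'd'), 1), (('a', 'd'), 1), (('b', 'c'), 1)]
```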


step 2: mining tree

ex: (Figures: the original FP-tree and the paths containing node d, node c, node b, and node a, each extracted in the same way.)


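step 2: mining tree

(Figures: paths containing node e; conditional FP-tree of {e})

1. The conditional pattern base of {e} is {acde:1, ade:1, bce:1}, i.e. the prefix paths acd:1, ad:1, bc:1 given {e}.

2. From the conditional pattern base, build the "conditional FP-tree of {e}" (original FP-tree | {e}) with the same procedure used to construct the FP-tree.

3. Mine the "conditional FP-tree of {e}".

4. Continue until all conditional trees are empty.

Notice: sup {b} < 2, so b is removed from the conditional FP-tree of {e}.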
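The construction of the conditional FP-tree of {e} can be checked with a short sketch. It starts from the prefix paths acd:1, ad:1, bc:1 and represents the tree as nested dictionaries of the form item -> [count, children], which is a simplification chosen for this example. Keeping the items in their original a, b, c, d order inside the conditional tree is also an assumption.

```python
from collections import Counter

# Prefix paths of e, taken from the conditional pattern base.
pattern_base = [(("a", "c", "d"), 1), (("a", "d"), 1), (("b", "c"), 1)]
min_count = 2

# Support of each item within the pattern base.
counts = Counter()
for path, n in pattern_base:
    for item in path:
        counts[item] += n

# b occurs only once here, so it is dropped.
kept = {item for item, n in counts.items() if n >= min_count}

def insert(tree, path, n):
    # Walk down the nested dicts, adding n to every node on the path.
    for item in path:
        entry = tree.setdefault(item, [0, {}])
        entry[0] += n
        tree = entry[1]

conditional_tree = {}
for path, n in pattern_base:
    insert(conditional_tree, [i for i in path if i in kept], n)

print(dict(counts))      # {'a': 2, 'c': 2, 'd': 2, 'b': 1}
print(conditional_tree)
# {'a': [2, {'c': [1, {'d': [1, {}]}], 'd': [1, {}]}], 'c': [1, {}]}
```

After b is removed, the path bc:1 shrinks to c:1, which is why a separate c node hangs directly under the root of the conditional tree.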


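step 2: mining tree

ex:

(Figure: conditional FP-tree of {e} (original FP-tree | {e}))

(Figures: paths ending in {d,e}; conditional FP-tree of {d,e} (original FP-tree | {d,e}))

Notice: sup {c} < 2 within the paths ending in {d,e}, so c is removed from the conditional FP-tree of {d,e}.

(Figures: paths ending in {c,e}; conditional FP-tree of {c,e} (original FP-tree | {c,e}))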
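Putting the pieces together, the whole recursion can be sketched as below. This is a minimal illustration rather than a reference implementation: the names `build_tree` and `fp_growth` are hypothetical, node links are again replaced by a list per item, and each conditional tree re-sorts its items by their counts inside that tree (alphabetical on ties), which is one of several valid orderings.

```python
from collections import Counter

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count, self.children = 0, {}

def build_tree(weighted_paths, min_count):
    """Build an FP-tree from (items, count) pairs; return item -> nodes."""
    counts = Counter()
    for items, n in weighted_paths:
        for item in items:
            counts[item] += n
    keep = {i for i, n in counts.items() if n >= min_count}
    root, header = Node(None, None), {}
    for items, n in weighted_paths:
        ordered = sorted((i for i in items if i in keep),
                         key=lambda i: (-counts[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = Node(item, node)
                header.setdefault(item, []).append(node.children[item])
            node = node.children[item]
            node.count += n
    return header

def fp_growth(weighted_paths, min_count, suffix=()):
    """Return {frozenset: support count} for frequent itemsets ending in suffix."""
    header = build_tree(weighted_paths, min_count)
    result = {}
    for item, nodes in header.items():
        itemset = (item,) + suffix
        result[frozenset(itemset)] = sum(node.count for node in nodes)
        # Conditional pattern base: prefix path of every node of this item.
        base = []
        for node in nodes:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        # Recurse on the conditional FP-tree; an empty base ends the recursion.
        result.update(fp_growth(base, min_count, itemset))
    return result

transactions = ["ab", "bcd", "acde", "ade", "abc", "abcd", "af", "abc", "abd", "bce"]
patterns = fp_growth([(list(t), 1) for t in transactions], min_count=2)

with_e = {"".join(sorted(s)): n for s, n in patterns.items() if "e" in s}
print(sorted(with_e.items()))
# [('ade', 2), ('ae', 2), ('ce', 2), ('de', 2), ('e', 3)]
```

The itemsets containing e agree with the walk-through: the conditional FP-tree of {d,e} keeps only a, giving {a,d,e}, while the conditional FP-tree of {c,e} is empty.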


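Algorithm idea

• Frequent itemsets are generated by repeatedly (recursively) constructing conditional FP-trees.

• No candidate itemsets are generated.

• Projecting conditional FP-trees avoids repeated database scans, saving I/O time and cost.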
