incremental mining of web sequential patterns using plwap tree on tolerance minsupport

29
Incremental Mining of Web Seq uential Patterns Using PLWAP Tree on Tolerance MinSupport C.I. Ezeife Min Chen IDEAS’04

Upload: flower

Post on 13-Jan-2016

29 views

Category:

Documents


1 download

DESCRIPTION

Incremental Mining of Web Sequential Patterns Using PLWAP Tree on Tolerance MinSupport. C.I. Ezeife Min Chen IDEAS ’ 04. Introduction. This paper proposes an algorithm, PL4UP , which uses the PLWAP tree structure to incrementally update web sequential patterns. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Incremental Mining of Web Sequential Patterns Using PLWAP Tree on Tolerance MinSupport

Incremental Mining of Web Sequential Patterns Using PLWAP Tree on Tolerance MinSupport

C.I. Ezeife Min Chen IDEAS’04

Page 2: Incremental Mining of Web Sequential Patterns Using PLWAP Tree on Tolerance MinSupport

Introduction This paper proposes an algorithm, PL4UP,

which uses the PLWAP tree structure to incrementally update web sequential patterns.

The process of generating new patterns in the updated database (old + new data) using only the updated part (new data) and previously generated patterns is called incremental mining of sequential patterns.

Page 3: Incremental Mining of Web Sequential Patterns Using PLWAP Tree on Tolerance MinSupport

The Proposed Incremental PL4UP Algorithm

F :frequent items S: small items F’: updated large (frequent) items S’: updated small items in the updated database U (old + new data)

six categories of items :

(1) large in old database, DB and still large in updated database U (F→F’)(2) large in old DB but small in U (F→S’)(3) small in old DB but large in U(S→F’) (4) small in old DB and still small in U(S→S’)(5) new items that are large in U (∅ →F’)(6) new items that are small in U (∅ →S’).

The main idea of PL4UP is to avoid scanning the whole updated database, U (DB+db) but to scan only the changes to the database (db).

Page 4: Incremental Mining of Web Sequential Patterns Using PLWAP Tree on Tolerance MinSupport

2.1 Steps in Mining Frequent PatternsIncrementally with PL4UP

1. Construct initial PLWAP tree using tolerance minimum support, t

t = factor * s 0≤factor≤1. frequent 1-items (meeting support s), F1 frequent 1-items (meeting support t), Ft Small 1-items (not meeting support s), S1. potentially frequent 1-items (in S1 list but meeting support t),

PF1. potentially still small 1-items, SS1 (in S1 list but not meeting

the support t) we obtain two subsequences, the frequent subsequence mee

ting s requirements (seq s) and the frequent subsequence meeting the t requirements (seq t).

Page 5: Incremental Mining of Web Sequential Patterns Using PLWAP Tree on Tolerance MinSupport

2.1 Steps in Mining Frequent PatternsIncrementally with PL4UP

>3 >2

C1 ={a:5, b:5, c:3, d:1, e:2, f:2:, g:1, h:1}. F1 ={a, b, c} >=3Ft ={a, b, c, e, f} >=2S1 ={e, f, d, g, h}, <3PF ={e, f} 2=<X<3SS ={d, g, h} <2

Page 6: Incremental Mining of Web Sequential Patterns Using PLWAP Tree on Tolerance MinSupport

2.1 Steps in Mining Frequent PatternsIncrementally with PL4UP

Page 7: Incremental Mining of Web Sequential Patterns Using PLWAP Tree on Tolerance MinSupport

2.1 Steps in Mining Frequent PatternsIncrementally with PL4UP

After checking all patterns, the list ofTFP mined is:{a:5, aa:4, aac: 3, ab:5, aba:4, abac:3,abc:3, abcc:2, abe:2, abf:2, ac:3, acc:2, ae:2, af:2, b:5.ba:4, bac:3, bab:1, bc:3, bcc:2, be:2, bf:2, c:3, cc:2,e:2, f:2}. SFP ={a:5, aa:4, aac: 3, ab:5, ac:3, aba:4,abac:3, abc:3, b:5. ba:4, bac:3, bc:3, c:3}.

Page 8: Incremental Mining of Web Sequential Patterns Using PLWAP Tree on Tolerance MinSupport
Page 9: Incremental Mining of Web Sequential Patterns Using PLWAP Tree on Tolerance MinSupport

2.1 Steps in Mining Frequent PatternsIncrementally with PL4UP

C’1 = C’1∪Cdb1 . C1 ={a:5, b:5, c:3, d:1, e:2, f:2:, g:1, h:1}.

Cdb1={a:2, b:1, e:2, f:2, g:2, h:2},

C’1 ={a :7,b : 6,c : 3,d : 1,e : 4,f : 4,g : 3,h : 3}. F’1 ={a : 7,b : 6,e : 4,f :4} >=4 F’t ={a : 7,b : 6,c : 3,e : 4,f : 4,g : 3,h : 3}.>=3 Fdb

1 = Cdb1∩F’1={a:2, b:1, e:2, f:2}.

Fdbt = Cdb

1∩F’t ={a : 2,b : 1,e : 2,f : 2,g : 2,h : 2}.Sdb

1 = Cdb1∩S’1={g:2}. PF’={c:3, g:3, h:3 }.

PFdb= Cdb1∩PF’={g:2, h:2}. SS’={d:1}. SSdb= Cdb

1∩SS’=∅

Page 10: Incremental Mining of Web Sequential Patterns Using PLWAP Tree on Tolerance MinSupport

2.1 Steps in Mining Frequent PatternsIncrementally with PL4UP

Fdbt = Cdb

1∩F’t ={a : 2,b : 1,e : 2,f : 2,g : 2,h : 2}.

Page 11: Incremental Mining of Web Sequential Patterns Using PLWAP Tree on Tolerance MinSupport

2.1 Steps in Mining Frequent PatternsIncrementally with PL4UP

The mined TFPdb={a:2, ae:2, aef:2, af:2, b:1, ba:1, bae:1, baef:1, be:1, bf:1, ..., e:2, ef:2, f:2} =SFPdb

SFP’=TFP∪ SFPdb ={a:7, aa:4, aac:3, ab:5,aba:4, abac:3, abc:3, ac:3, ae:4, af:4, b:6, ba:5, bac:3,bc:3, be:3, bf:3, c:3, e:4, f:4}

SFP’={a:7,aa:4, ab:5, aba:4, ae:4, af:4, b:6, ba:5, e:4, f:4} >4

TFP’=TFP∪ TFPdb ={a:7, aa:4, aac:3, ab:4, aba:4, abac:3, abc:3,ac:3, ae:4, af:4, b:6, ba:5, bac:3, bc:3, be:3, bf:3, c:3,e:4, f:4}.

Page 12: Incremental Mining of Web Sequential Patterns Using PLWAP Tree on Tolerance MinSupport

SSM : A Frequent Sequential Data

Stream Patterns Miner

C.I. Ezeife, Mostafa Monwar CIDM 2007

Page 13: Incremental Mining of Web Sequential Patterns Using PLWAP Tree on Tolerance MinSupport

IntroductionExisting work on mining frequent patterns on data streamsare mostly for non-sequential patterns. This paper proposesSSM-Algorithm (Sequential Stream Mining-algorithm), thatuses three types of data structures 1.D-List,2.PLWAP tree3.FSP-treeto handle the complexities of mining frequent sequentialpatterns in data streams.

Page 14: Incremental Mining of Web Sequential Patterns Using PLWAP Tree on Tolerance MinSupport

Example

Assume a continuous stream with rst stream batch B1 consisting of stream sequences as: [(abdac, abcac , babfae, afbacfcg)]. minimum support, s is 0.75tolerance error e, is 0.25

L1B1 = {a:4, b:4, c:3, f:2}. >=2

C1B1 = {a:4, b:4, c:3, d:1, e:1, f:2, g:1}

For the items (a, b, c, d, e, f, g) given as (2, 100, 0, 202, 10, 110, 99) are used in the hash function (mod 100)

Page 15: Incremental Mining of Web Sequential Patterns Using PLWAP Tree on Tolerance MinSupport

Example

Page 16: Incremental Mining of Web Sequential Patterns Using PLWAP Tree on Tolerance MinSupport

Example Step 3: Construct FSP-tree and insert all FPB1 into FSPtree without

pruning any items for the rst batch B1 to obtain Figure 5. FPB1 = {a:4, aa:4, aac:3, ab:4,aba:4, abac: 3, abc: 3, ac: 3, acc:2, af: 2,

afa: 2, b: 4, ba:4, bac: 3, bc: 3, c: 3, cc: 2, f: 2, fa: 2}.

Page 17: Incremental Mining of Web Sequential Patterns Using PLWAP Tree on Tolerance MinSupport

Example When the next batch, B2 of stream sequences, (abacg, a

bag, babag, abagh) arrives, 1.The SSM updating the D-List using the stream sequen

ces. 2.deletes all elements not meeting the minimum tolera

nce support of |D| * e (or 8 * 0.25 = 2) from the D-List, and keeps all items with|D| (s − e) = 8 0.5 = 4 frequenc∗ ∗y counts in the L1B2

3. construct PLW APB2 4.mined to generate the FSPB2 . This FSPB2 is used for

obtaining frequent sequential patterns on demand.

Page 18: Incremental Mining of Web Sequential Patterns Using PLWAP Tree on Tolerance MinSupport

Incremental Mining of Sequential Patterns over a Stream Sliding Window

Chin-Chuan Ho, Hua-Fu Li

*, Fang-Fei Kuo and Suh-Yin Lee

Page 19: Incremental Mining of Web Sequential Patterns Using PLWAP Tree on Tolerance MinSupport

introduction

For mining of sequential patterns over data streams, there are three models adopted by many researchers in ways of time spanning: landmark model

sliding window model, and damped window model IncSPAM= sliding window model+ SPAM

Page 20: Incremental Mining of Web Sequential Patterns Using PLWAP Tree on Tolerance MinSupport

3. The IncSPAM Algorithm

Page 21: Incremental Mining of Web Sequential Patterns Using PLWAP Tree on Tolerance MinSupport

3.1 A Brand-New Concept of Sliding Window

for Sequences

Page 22: Incremental Mining of Web Sequential Patterns Using PLWAP Tree on Tolerance MinSupport

3.2 Customer Bit-Vector Array with SlidingWindow (CBASW)

Page 23: Incremental Mining of Web Sequential Patterns Using PLWAP Tree on Tolerance MinSupport

3.2 Customer Bit-Vector Array with SlidingWindow (CBASW)

Page 24: Incremental Mining of Web Sequential Patterns Using PLWAP Tree on Tolerance MinSupport

3.2 Customer Bit-Vector Array with SlidingWindow (CBASW)

Page 25: Incremental Mining of Web Sequential Patterns Using PLWAP Tree on Tolerance MinSupport

3.2 Customer Bit-Vector Array with SlidingWindow (CBASW)

Page 26: Incremental Mining of Web Sequential Patterns Using PLWAP Tree on Tolerance MinSupport

3.6 Incremental Mining of Sequential Patternsusing SPAM (IncSPAM)

Page 27: Incremental Mining of Web Sequential Patterns Using PLWAP Tree on Tolerance MinSupport

3.6 Incremental Mining of Sequential Patternsusing SPAM (IncSPAM)

Page 28: Incremental Mining of Web Sequential Patterns Using PLWAP Tree on Tolerance MinSupport

3.7 Weight of Customer-Sequence

Page 29: Incremental Mining of Web Sequential Patterns Using PLWAP Tree on Tolerance MinSupport

COMPARE優點 缺點

Incremental Mining of Web Sequential Patterns Using PLWAP Tree on Tolerance MinSupport

有 incremental 觀念

1. 尚未應用在 stream 環境上面還是處理資料庫當中的資料2.Sequent pattern 事有順序性的不能將 pattern 加以排序在建樹當 pattern 不是這麼類似會導致樹會長的很大不能全部載入到記憶體

SSM : A Frequent Sequential Data Stream Patterns Miner

Trivial 1. 沒有 Incremental 觀念 , 每次讀取新的 block 資訊必須從新掃描計算其 support 並從新建樹2.Sequent pattern 事有順序性的不能將 pattern 加以排序在建樹 .當 pattern 不是這麼類似會導致樹會長的很大不能全部載入到記憶體

Incremental Mining of Sequential Patterns over a Stream Sliding Window

有 incremental 觀念

1. 儲存空間大 (item n transaction m window size w 就必須儲存 n*m*w 資訊 ) 又使用 SPAM 的樹狀架構 2. Incremental 觀念只用在計算 item support 上 , 當新資訊進來還是必須全部從新計算出所有 sequential pattern support3. 最後使用 decay 會造成少找