An Efficient Algorithm for Mining Time Interval-based Patterns in Large Databases

Yi-Cheng Chen, Ji-Chiang Jiang, Wen-Chih Peng and Suh-Yin Lee
Department of Computer Science, National Chiao Tung University, Hsinchu, Taiwan 300
{ejen.cs95g, perrys0620.cs96g}@nctu.edu.tw [email protected] [email protected]
CIKM, 2010



OUTLINE

1. INTRODUCTION
2. PROBLEM DEFINITION
3. INCISION STRATEGY
4. COINCIDENCE REPRESENTATION
5. CTMiner ALGORITHM
6. EXPERIMENTAL RESULTS
7. CONCLUSION AND FUTURE WORK

1. INTRODUCTION

Related research in this domain is based on Allen's temporal logic, which defines 13 temporal relations between any two event intervals.
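To make the 13 relations concrete, here is a minimal sketch that classifies one interval against another. The function name `allen_relation`, the `(start, finish)` tuple layout, and the relation labels are choices made for this illustration; they do not come from the slides.

```python
def allen_relation(a, b):
    """Return the Allen relation of interval a relative to interval b.

    Each interval is a (start, finish) tuple with start < finish.
    The labels are the 7 base relations plus their 6 inverses.
    """
    (s1, f1), (s2, f2) = a, b
    if f1 < s2:
        return "before"
    if f2 < s1:
        return "after"
    if f1 == s2:
        return "meets"
    if f2 == s1:
        return "met-by"
    # From here on the two intervals share more than a single point.
    if s1 == s2 and f1 == f2:
        return "equals"
    if s1 == s2:
        return "starts" if f1 < f2 else "started-by"
    if f1 == f2:
        return "finishes" if s1 > s2 else "finished-by"
    if s2 < s1 and f1 < f2:
        return "during"
    if s1 < s2 and f2 < f1:
        return "contains"
    return "overlaps" if s1 < s2 else "overlapped-by"


print(allen_relation((8, 14), (10, 13)))   # contains
print(allen_relation((10, 13), (10, 13)))  # equals
print(allen_relation((1, 5), (8, 14)))     # before
```

Handling every pair of intervals this way needs a case analysis per pair, which is the complexity the representation proposed in this talk tries to avoid.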

1. INTRODUCTION

Comparison with previous work:
Kam et al.: hierarchical representation.
Hoppner: scans the database with a sliding window.
Papapetrou: Hybrid-DFS algorithm.
Wu et al.: TPrefixSpan.
Patel et al.: augmented representation (with additional counting information) and IEMiner.

1. INTRODUCTION

Proposed:
Incision strategy
Coincidence representation
CTMiner (Coincidence Temporal Miner)

2. PROBLEM DEFINITION

Event interval and event sequence
Let E = {e1, e2, …, ek} be the set of event symbols.
An event interval is written (ei, si, fi), where ei ∈ E and si, fi are time points with si < fi.
Event start: ei.ts
Event finish: ei.tf
An event sequence is written {(e1, s1, f1), (e2, s2, f2), …, (en, sn, fn)}, where si ≤ si+1 and si < fi.
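The definitions above map naturally onto a small data structure. The following is only a sketch: the class name `EventInterval`, the helper `make_event_sequence`, and the tie-breaking rule for equal start times are assumptions of this example, since the slides only require si ≤ si+1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventInterval:
    """One event interval (e, s, f) with s < f."""
    symbol: str
    start: float
    finish: float

    def __post_init__(self):
        # Enforce the constraint s < f from the definition.
        if not self.start < self.finish:
            raise ValueError("start must be strictly less than finish")


def make_event_sequence(intervals):
    """Order intervals so that start times are non-decreasing.

    Ties are broken by finish time, then symbol (an assumption of this sketch).
    """
    return sorted(intervals, key=lambda iv: (iv.start, iv.finish, iv.symbol))


seq = make_event_sequence([
    EventInterval("E", 10, 13),
    EventInterval("B", 1, 5),
    EventInterval("D", 8, 14),
])
print([iv.symbol for iv in seq])  # ['B', 'D', 'E']
```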

2. PROBLEM DEFINITION

Temporal database
A database D = {r1, r2, …, rm} is a set of records ri, where 1 ≤ i ≤ m.
A record ri consists of a sequence-id and an event interval (with its start time and finish time).
Records in D with the same sequence-id are grouped together.
D can therefore be viewed as a collection of event sequences.
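Viewing the database as a collection of event sequences amounts to a group-by on the identifier. A minimal sketch follows, assuming each record is a flat tuple `(sequence_id, symbol, start, finish)`; that layout, the function name `group_records`, and the sample rows are all illustrative.

```python
from collections import defaultdict


def group_records(records):
    """Group flat records into one event sequence per sequence id.

    records: iterable of (sequence_id, symbol, start, finish) tuples.
    Returns a dict mapping sequence_id to a list of (symbol, start, finish),
    ordered by start time.
    """
    sequences = defaultdict(list)
    for seq_id, symbol, start, finish in records:
        sequences[seq_id].append((symbol, start, finish))
    for intervals in sequences.values():
        intervals.sort(key=lambda iv: iv[1])
    return dict(sequences)


# Illustrative rows only; they are not data from the paper.
rows = [
    (1, "A", 2, 7),
    (2, "D", 8, 14),
    (1, "B", 5, 10),
    (2, "B", 1, 5),
]
print(group_records(rows))
# {1: [('A', 2, 7), ('B', 5, 10)], 2: [('B', 1, 5), ('D', 8, 14)]}
```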

2. PROBLEM DEFINITION

Time set and time sequence
Let q = {(e1, s1, f1), (e2, s2, f2), …, (en, sn, fn)} be an event sequence.
The set T = {s1, f1, s2, f2, …, si, fi, …, sn, fn} is called the time set corresponding to sequence q.
Ordering the elements of T and eliminating duplicates gives the time sequence Ts = {t1, t2, t3, …, tk}, where ti ∈ T and ti < ti+1.
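The two definitions can be sketched in a few lines. The function names and the `(symbol, start, finish)` tuple layout are assumptions of this example, and the sample intervals are chosen for illustration.

```python
def time_set(sequence):
    """Collect every start and finish time, in sequence order, duplicates kept."""
    times = []
    for _symbol, start, finish in sequence:
        times.extend((start, finish))
    return times


def time_sequence(sequence):
    """Sort the time set and drop duplicates, so that t_i < t_{i+1}."""
    return sorted(set(time_set(sequence)))


q = [("B", 1, 5), ("D", 8, 14), ("E", 10, 13), ("F", 10, 13)]
print(time_set(q))       # [1, 5, 8, 14, 10, 13, 10, 13]
print(time_sequence(q))  # [1, 5, 8, 10, 13, 14]
```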

2. PROBLEM DEFINITION

Event slice
Sequence 2 contains 4 event intervals (en, sn, fn): (B,1,5), (D,8,14), (E,10,13), (F,10,13)
Corresponding time set T = {s1, f1, s2, f2, s3, f3, s4, f4} = {1, 5, 8, 14, 10, 13, 10, 13}
Time sequence Ts = {t1, t2, t3, …, tk} = {1, 5, 8, 10, 13, 14}

2. PROBLEM DEFINITION

Event slice
Let L = {+, -, *, Φ}, and let Q = {q1, q2, …, qi, …} be a set of event sequences, where qi = {(e1, s1, f1), …, (ej, sj, fj), …, (en, sn, fn)}.

2. PROBLEM DEFINITION

Event slice
Start slice: D+ = (D, 8, 10)
Intermediate slice: D* = (D, 10, 13)
Finish slice: D- = (D, 13, 14)
The event interval B has only one intact slice: B = (B, 1, 5)
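The slices above can be reproduced by cutting an interval at every point of the time sequence that falls inside it. The sketch below is inferred from this example rather than from a formal definition on the slides. The function name `slice_interval` is hypothetical, and using an empty label for an intact slice (presumably the Φ case) is an assumption.

```python
def slice_interval(interval, ts):
    """Cut one interval at the time points of ts that fall inside it.

    interval: (symbol, start, finish)
    ts: sorted time sequence containing start and finish.
    Returns (symbol, label, lo, hi) tuples, where label is "+" for the start
    slice, "*" for an intermediate slice, "-" for the finish slice, and ""
    for an intact (uncut) interval.
    """
    symbol, start, finish = interval
    cuts = [t for t in ts if start <= t <= finish]
    pieces = list(zip(cuts, cuts[1:]))
    if len(pieces) == 1:
        return [(symbol, "", start, finish)]
    slices = []
    for i, (lo, hi) in enumerate(pieces):
        if i == 0:
            label = "+"
        elif i == len(pieces) - 1:
            label = "-"
        else:
            label = "*"
        slices.append((symbol, label, lo, hi))
    return slices


ts = [1, 5, 8, 10, 13, 14]
print(slice_interval(("D", 8, 14), ts))
# [('D', '+', 8, 10), ('D', '*', 10, 13), ('D', '-', 13, 14)]
print(slice_interval(("B", 1, 5), ts))
# [('B', '', 1, 5)]
```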


3. INCISION STRATEGY

Incision example
The incision strategy completely avoids generating intermediate slices. Even with the intermediate slices trimmed, the relationship between any two intervals can still be expressed correctly.
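The effect described here can be sketched as follows. This is not the authors' incision algorithm, whose figure did not survive in this transcript. It only illustrates the stated outcome: each interval yields an intact slice, or a start slice plus a finish slice, and no intermediate slice is ever built. The function name `incise` and the tuple layout are assumptions.

```python
from bisect import bisect_left, bisect_right


def incise(interval, ts):
    """Return only the start and finish slices (or one intact slice).

    interval: (symbol, start, finish)
    ts: sorted time sequence of the whole event sequence.
    """
    symbol, start, finish = interval
    lo = bisect_right(ts, start)   # index of the first time point > start
    hi = bisect_left(ts, finish)   # index of the first time point >= finish
    if lo >= hi:
        # No time point lies strictly inside the interval: keep it intact.
        return [(symbol, "", start, finish)]
    return [
        (symbol, "+", start, ts[lo]),        # start slice
        (symbol, "-", ts[hi - 1], finish),   # finish slice
    ]


ts = [1, 5, 8, 10, 13, 14]
print(incise(("D", 8, 14), ts))  # [('D', '+', 8, 10), ('D', '-', 13, 14)]
print(incise(("E", 10, 13), ts))  # [('E', '', 10, 13)]
```

Because only the two boundary slices are located, by binary search, the work per interval does not grow with the number of time points that fall inside it.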

4. COINCIDENCE REPRESENTATION

Slices that occur simultaneously are grouped together to form coincidences.
Concatenating all coincidences describes an event sequence effectively.
This simplifies the processing of the complex pairwise relationships between all intervals.
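A minimal sketch of the grouping step, assuming slices are `(symbol, label, lo, hi)` tuples that each cover exactly one segment between consecutive time points. The function names, the tuple layout, and the parenthesised rendering are illustrative choices, not notation confirmed by the slides.

```python
from collections import defaultdict


def coincidences(slices):
    """Group slices that occupy the same time segment, ordered by time."""
    groups = defaultdict(list)
    for symbol, label, lo, hi in slices:
        groups[(lo, hi)].append(symbol + label)
    return [sorted(groups[segment]) for segment in sorted(groups)]


def render(coincidence_list):
    """Concatenate the coincidences into one string."""
    return "".join("(" + " ".join(c) + ")" for c in coincidence_list)


slices = [
    ("B", "", 1, 5),
    ("D", "+", 8, 10),
    ("E", "", 10, 13),
    ("F", "", 10, 13),
    ("D", "-", 13, 14),
]
print(render(coincidences(slices)))  # (B)(D+)(E F)(D-)
```

In the rendered string, E and F sharing a coincidence says they cover the same span, and both sitting between D+ and D- says D contains them, all without comparing intervals pair by pair.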

4. COINCIDENCE REPRESENTATION

Good scalability
Nonambiguity
Simple is good
Compact space usage

5. CTMiner ALGORITHM

min_sup = 2
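The slide gives only the threshold value. As a rough illustration of what a minimum support of 2 means, the sketch below assumes the usual sequential-pattern definition: the support of a pattern is the number of sequences that contain it. It also treats each sequence as a list of coincidences (sets of slice labels). The containment test, the function names, and the sample database are assumptions of this example and may differ from CTMiner's actual procedure.

```python
def contains(sequence, pattern):
    """True if each pattern coincidence is a subset of a distinct,
    order-preserving coincidence of the sequence (greedy left-to-right match)."""
    pos = 0
    for wanted in pattern:
        while pos < len(sequence) and not wanted <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


def support(database, pattern):
    """Number of sequences in the database that contain the pattern."""
    return sum(contains(seq, pattern) for seq in database)


def is_frequent(database, pattern, min_sup=2):
    return support(database, pattern) >= min_sup


# Illustrative database only; it is not the example from the slides.
db = [
    [{"B"}, {"D+"}, {"E", "F"}, {"D-"}],
    [{"D+"}, {"E"}, {"D-"}],
    [{"B"}, {"E"}],
]
print(support(db, [{"D+"}, {"E"}, {"D-"}]))  # 2
print(is_frequent(db, [{"B"}, {"D+"}]))      # False
```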


6. EXPERIMENTAL RESULTS

Runtime performance on synthetic data sets

6. EXPERIMENTAL RESULTS

Real-world data set analysis

7. CONCLUSION AND FUTURE WORK

The coincidence representation is nonambiguous and has several advantages over existing representations.

7. CONCLUSION AND FUTURE WORK

Future work: mining closed and maximal temporal patterns, incremental temporal pattern mining, and extending the method to data streams.