reporter: yu-ch lo

34
Huiping Cao, Nikos Mamoulis, and David W. Che ung Department of Computer Science The University of Hong Kong Pokfulam Road, Hong Kong {hpcao, nikos, dcheung}@cs.hku.hk Reporter: yu-ch Lo Mining Frequent Spatio-temporal Mining Frequent Spatio-temporal Sequential Patterns Sequential Patterns

Upload: analu

Post on 29-Jan-2016

36 views

Category:

Documents


0 download

DESCRIPTION

Mining Frequent Spatio-temporal Sequential Patterns. Huiping Cao, Nikos Mamoulis, and David W. Cheung Department of Computer Science The University of Hong Kong Pokfulam Road, Hong Kong { hpcao, nikos, dcheung } @cs.hku.hk. Reporter: yu-ch Lo. Outline. Introduction Related work - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Reporter: yu-ch Lo

Huiping Cao, Nikos Mamoulis, and David W. CheungDepartment of Computer Science

The University of Hong KongPokfulam Road, Hong Kong

{hpcao, nikos, dcheung}@cs.hku.hk

Reporter: yu-ch Lo

Mining Frequent Spatio-temporal SequeMining Frequent Spatio-temporal Sequential Patternsntial Patterns

Page 2: Reporter: yu-ch Lo

Outline

• Introduction• Related work• Spatio-temporal sequential patterns• Motivation• Problem definition• Growing• Experiments• Conclusion

Page 3: Reporter: yu-ch Lo

Introduction

• The movement of an object (i.e., trajectory) can be described by a sequence of spatial locations sampled at consecutive timestamps (e.g., with the use of Global Positioning System (GPS) devices).

Page 4: Reporter: yu-ch Lo

Introduction

• The main contributions of this paper are:The main contributions of this paper are:• (i) We propose a model for spatio-temporal sequ

ential patterns mining, based on appropriate definitions for pattern elements and pattern instances.

• (ii) We present an effective method for extracting pattern elements.

• (iii) We provide efficient pattern mining algorithms for discovering longer patterns.

Page 5: Reporter: yu-ch Lo

Related Work

• A fixed sliding window wA fixed sliding window w is used to extract segments (i.e., subsequences) in the event series, and the contribution of every segment to each candidate episode’s frequency is counted.

a a b b c d e f ga a b b c d e f g

W=5

(a_b_c, _ab_c, a_ _bc, _a_bc)

Page 6: Reporter: yu-ch Lo

Spatio-temporal Sequential Patterns

• A spatio-temporal sequence S is a list of locations,(x1, y1, t1), (x2, y2, t2), . . . , (xn, yn, tn), where ti represents the timestamp of location (xi, yi) (1 ≤ i ≤ n).

(1,1,0)

(2,3,1)

(4,2,2)

(6,3,3)

X

Y

Page 7: Reporter: yu-ch Lo
Page 8: Reporter: yu-ch Lo

Motivation

• A naive method is to use a regular grid (or some predefined spatial decomposition) to divide the space into regions by taking a user-defined parameter G

Grid I:c2c4c8c9c6c2 . . .

Grid II:c2c1c4c7c8c9c6 . . .

Page 9: Reporter: yu-ch Lo

Motivation

• Although intuitive, this method has two problems:

• i) We lose the information on how the object moves inside a cell, if the space decomposition is coarse.

• ii) For two instances of a pattern, the locations may not fall into the same cell.

Page 10: Reporter: yu-ch Lo

Problems

i) ii)

a

b

d

c

Page 11: Reporter: yu-ch Lo

Problem Definition

• A segmentsegment sij in a spatio-temporal sequence S (1 ≤ i < j ≤ n) is a contiguous subsequence of S, starting from (xi, yi, ti) and ending at (xj , yj , tj).

• Given sij , we define its representative line segment lij.

S

lij

Page 12: Reporter: yu-ch Lo

Problem Definition

• Let ε be a distance error threshold, sij complies with ~ lij with respect to ǫ and is denoted as sij ∝ ε lij , if dist((xk, yk), lij ) ≤ ε

• for all k(i ≤ k ≤ j), where dist((xk, yk), l ) is the distance between (xk, yk) and line segment l.

Page 13: Reporter: yu-ch Lo

Problem Definition

• When sij ∝ε lij , each point (xk, yk), i ≤ k ≤ j, in sij can be projected to a point (x′k,y′k) on ~ lij . (x′k, y′k) implicitly denotes the projection of (xk, yk) to lij .

Page 14: Reporter: yu-ch Lo

Problem Definition

L2L1 (ii)

<= f × max(lij .len, lgh.len)

(i)

<= θ

Page 15: Reporter: yu-ch Lo

Problem Definition

S1

L1

L2S2

∀(x′k, y′k) l∈ ij

Page 16: Reporter: yu-ch Lo

Problem Definition

Page 17: Reporter: yu-ch Lo

Problem Definition

Page 18: Reporter: yu-ch Lo

Problem Definition

• A contiguous subsequence of Ss, sisi+1 . . . si+q−1, is a pattern instance for P: r1r2 . . . rq if j(1 ≤ j ≤ q), if the representative lin∀e segment for segment si+j−1 is similarsimilar and closeclose to the central line segment of region rj .

Page 19: Reporter: yu-ch Lo

DP(Douglas-Peucker) Algorithm

• Our idea is to transform S to Ss using such a technique.

• The DP (Douglas-Peucker) algorithm [3] is a classical top down approach for this problem.

• We employ DP method because it has been proved to be the best algorithm in choosing splitting points [12].

Page 20: Reporter: yu-ch Lo

DP(Douglas-Peucker) Algorithm

B A

D

C

E

> threshold

B A

D

C

E

< threshold

< threshold

C

A

E

Threshold = ε

Page 21: Reporter: yu-ch Lo

Growing

• Discovering frequent singular patterns from Ss is a hard problem, since in the worst-case, case, all combinations of segments in Ss have to be considered as candidate. To expedite the process, we employ a heuristic, Growing.

Page 22: Reporter: yu-ch Lo

Growing

• Let Segs be a set initially containing all the segments in Ss.

• It selects the segment s with median length, i.e., the median of the lengths of the segments in Segs, as seed for the initial spatial region r.

• Then, r is grown by merging other gments in Segs through filtering and verification steps.

Page 23: Reporter: yu-ch Lo

Growing -filtering

Page 24: Reporter: yu-ch Lo
Page 25: Reporter: yu-ch Lo

ls

Page 26: Reporter: yu-ch Lo

Mining Using Substring Tree

• Create a substring tree structure

>> >> Let SR be r1 r2 r3 r4 r1r3 r4 r2 r3 r4 r1 r2 r3 r4

ROOT

1 r1

1 r2

1 r3

1 r4

1 r3

1 r4

1 r2

1 r3

1 r4

2 r1

1 r2

1 r2

3 r4

2 r1

1 r3 1 r2

1 r3

1 r2

1 r3

1 r2

1 r3

1 r4

1 r1

23

2

2

2

2

2

2

2

2

2

3

3

3

4

4

Page 27: Reporter: yu-ch Lo

Mining Using Substring Tree

• In the above example, there are initially four elements(root’s children.) in the stack.

• Let min sup = 2.

Page 28: Reporter: yu-ch Lo

r3(4)

r3r4(4)

Page 29: Reporter: yu-ch Lo

3

r3r4(4) r3r4(4)r3r4r2(3)

r3r4r1(2)

Page 30: Reporter: yu-ch Lo

Experiments

• Real datasets: The real data contain tracked bus movements in Patras, Greece. Each sequence is the movement of a bus in a single day.

• Locations were sampled every 30 seconds

• The series length varies in the range between 1000 to 7000..

Page 31: Reporter: yu-ch Lo

Experiments• we tune the parameters to ε = 20 (map size is 100 × 10

0), f = 0.2, θ = 0.3 radian, and min sup = 3.

Page 32: Reporter: yu-ch Lo

Experiments

Page 33: Reporter: yu-ch Lo

Experiments

Page 34: Reporter: yu-ch Lo

Conclusion

• In this paper, we modeled the problem of mining sequential patterns from spatio-temporal data by considering both spatial and temporal information.

• In addition, we employed special properties of the problem (spatial connectivity,closeness) and a newly proposed substring tree to accelerate search for longer patterns.