probabilistic skyline operator over sliding windows

Probabilistic Skyline Operator over Sliding Windows Wenjie Zhang University of New South Wales & NICTA, Australia Joint work: Xuemin Lin, Ying Zhang, Wei Wang (UNSW & NICTA) Jeffrey Xu Yu (CUHK)

Upload: maxime

Post on 09-Jan-2016



TRANSCRIPT

Page 1: Probabilistic Skyline Operator over Sliding Windows

Probabilistic Skyline Operator over Sliding Windows

Wenjie Zhang

University of New South Wales & NICTA, Australia

Joint work: Xuemin Lin, Ying Zhang, Wei Wang (UNSW & NICTA), Jeffrey Xu Yu (CUHK)

Page 2: Probabilistic Skyline Operator over Sliding Windows

Outline

Background
Framework
Algorithms
Experiment
Conclusion

Page 3: Probabilistic Skyline Operator over Sliding Windows

Background

Elements arrive continuously, each with an occurrence probability.

Problem: how to continuously compute skylines over a sliding window of size N (elements)?

[Figure: stream elements 1 to 6 annotated with occurrence probabilities (values shown: 0.1, 0.1, 0.4, 0.1, 0.8, 0.5); sliding window: N = 5]

Page 4: Probabilistic Skyline Operator over Sliding Windows

Background

Multi-criteria decision making over uncertain data:
Online auctions
Financial markets
…

Page 5: Probabilistic Skyline Operator over Sliding Windows

Related work

Probabilistic skyline computation:
Probabilistic skyline (VLDB07)
Probabilistic reverse skyline (SIGMOD08)

Uncertain stream processing:
Probabilistic aggregates and sketches over uncertain streams (SIGMOD07, SODA07, PODS07)
Frequent items on uncertain streams (SIGMOD08)
Top-k queries over uncertain sliding window (VLDB08)
…

Page 6: Probabilistic Skyline Operator over Sliding Windows

Models and Problem Definition

Model: DS is a stream of elements; each element a lies in a d-dimensional space and has an occurrence probability P(a) in (0, 1].

The skyline probability of an element a is:

$$P_{sky}(a) = P(a) \times \prod_{a' \in DS,\ a' \prec a} \bigl(1 - P(a')\bigr)$$

where $a' \prec a$ means that a' dominates a.

Problem Definition: retrieve, from the most recent N elements, those with skyline probability no less than a given threshold q.
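The definition above can be evaluated directly by brute force. The following is a minimal sketch, not the method of the talk: it assumes that smaller values are better in every dimension, and the function names and the sample coordinates are hypothetical.

```python
from math import prod

def dominates(a, b):
    """a dominates b: no worse in every dimension, strictly better in one.
    Assumption: smaller values are better."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline_probability(i, window):
    """P_sky of window[i]; every item is a (point, probability) pair."""
    point, p = window[i]
    return p * prod(1 - pj for j, (other, pj) in enumerate(window)
                    if j != i and dominates(other, point))

def probabilistic_skyline(stream, n, q):
    """Elements among the n most recent ones whose P_sky is at least q."""
    window = stream[-n:]
    return [window[i] for i in range(len(window))
            if skyline_probability(i, window) >= q]

# Hypothetical coordinates; probabilities chosen for illustration.
stream = [((1, 1), 0.3), ((2, 2), 0.4), ((3, 3), 0.9), ((0, 5), 0.8)]
result_n4 = probabilistic_skyline(stream, n=4, q=0.5)
result_n2 = probabilistic_skyline(stream, n=2, q=0.5)
```

This recomputation costs O(N^2) dominance tests per query, which is the cost the incremental techniques in the later slides aim to avoid.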

Page 7: Probabilistic Skyline Operator over Sliding Windows

Challenges and Contributions

Space efficiency
Contribution: space reduction from $O(N)$ to $O(\ln^{d-1} N)$

Time efficiency
Contribution: R-tree based efficient incremental algorithms

Page 8: Probabilistic Skyline Operator over Sliding Windows

Outline

Background and Preliminaries
Framework
Algorithms
Experiment
Conclusion

Page 9: Probabilistic Skyline Operator over Sliding Windows

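Framework: what to keep?

[Figure: elements 1 to 5 annotated with occurrence probabilities (values shown: 0.1, 0.1, 0.4, 0.1, 0.8); window size N: 5; probability threshold q: 0.5]

$$P_{sky}(a) = P(a) \times P_{old}(a) \times P_{new}(a)$$

$P_{old}(2) = 1 - P(1)$

$P_{new}(2) = (1 - P(3)) \times (1 - P(4))$

$P_{new}(2) < q$, so element 2 will never become a skyline point in the window: the newer elements that dominate it expire only after it does.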
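The split of the skyline probability into an "old" and a "new" factor can be sketched as follows. This is an illustration under assumptions: the window is a list ordered by arrival, smaller values are better, and the coordinates and the assignment of probabilities to elements are hypothetical.

```python
from math import prod

def dominates(a, b):
    """Assumption: smaller values are better in every dimension."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def p_old_new(i, window):
    """Return (P_old, P_new) of window[i]; window is ordered oldest first."""
    point, _ = window[i]
    p_old = prod(1 - p for pt, p in window[:i] if dominates(pt, point))
    p_new = prod(1 - p for pt, p in window[i + 1:] if dominates(pt, point))
    return p_old, p_new

def never_skyline(i, window, q):
    """P_new only shrinks while the element is alive, so P_new < q is final."""
    return p_old_new(i, window)[1] < q

# Index 1 plays the role of "element 2": one older and two newer dominators.
window = [((1, 1), 0.1), ((4, 4), 0.1), ((2, 3), 0.4), ((3, 2), 0.8), ((0, 9), 0.1)]
p_old_2, p_new_2 = p_old_new(1, window)
```

Because every newer dominator outlives the element it dominates, an element with P_new below q can be discarded immediately, whereas a low P_old may recover once older elements expire.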

Page 10: Probabilistic Skyline Operator over Sliding Windows

Framework: what to keep?

Candidate set $S_{N,q}$: the elements a with $P_{new}(a) \ge q$

Correctness:
(1) no missing skyline points
(2) no false hits to determine $S_{N,q}$
(3) no false positives to determine skyline results
(4) no false negatives to determine skyline results

Note: a probability computed from $S_{N,q}$ alone may not be accurate, but it satisfies the threshold requirement.
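The point that probabilities computed from the candidate set may be inaccurate yet still give the right threshold decision can be illustrated on a three-element window. This is a small sketch with hypothetical coordinates and helper names, assuming smaller values are better.

```python
from math import prod

def dominates(a, b):
    """Assumption: smaller values are better in every dimension."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline_prob(i, items):
    """P_sky of items[i], using only the elements present in items."""
    point, p = items[i]
    return p * prod(1 - pj for j, (pt, pj) in enumerate(items)
                    if j != i and dominates(pt, point))

def candidate_set(window, q):
    """Keep the elements whose P_new (product over newer dominators) is >= q."""
    keep = []
    for i, (point, p) in enumerate(window):
        p_new = prod(1 - pj for pt, pj in window[i + 1:] if dominates(pt, point))
        if p_new >= q:
            keep.append((point, p))
    return keep

q = 0.5
window = [((2, 2), 0.5), ((1, 1), 0.6), ((3, 3), 0.9)]   # oldest first
candidates = candidate_set(window, q)                     # drops (2, 2)

full = [window[i] for i in range(len(window))
        if skyline_prob(i, window) >= q]
reduced = [candidates[i] for i in range(len(candidates))
           if skyline_prob(i, candidates) >= q]
```

Here the element at (3, 3) has skyline probability 0.18 on the full window but 0.36 on the candidate set; both values are below q, so the reported skyline is the same.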

Page 11: Probabilistic Skyline Operator over Sliding Windows

Framework

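Space required for $S_{N,q}$: $S_{N,q}$ is the minimum information to be maintained to get a correct answer.

[Figure: elements 1 to 4 annotated with occurrence probabilities (values shown: 0.3, 0.8, 0.4, 0.9); window size N: 4; probability threshold q: 0.5]

With elements 1 and 2 in the window: $P_{sky}(3) = 0.9 \times (1 - 0.4) \times (1 - 0.3) < q$

Without elements 1 and 2: $P_{sky}(3) = 0.9 > q$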

Page 12: Probabilistic Skyline Operator over Sliding Windows

Space of Candidate Set

Theorem: For uniform distributions, the candidate set requires poly-logarithmic space in the average case: $O(f(q) \ln^{d-1} N)$.

Page 13: Probabilistic Skyline Operator over Sliding Windows

Outline

Background and Preliminaries
Framework
Algorithms
Experiment
Conclusion

Page 14: Probabilistic Skyline Operator over Sliding Windows

Algorithms

We maintain two R-trees:
R1: $SKY_{N,q}$ (skylines)
R2: $S_{N,q} - SKY_{N,q}$ (candidates minus skylines)

Page 15: Probabilistic Skyline Operator over Sliding Windows

Algorithms

[Figure: elements labelled id(probability): 1(.1), 2(.1), 3(.4), 4(.1), 5(.8), 6(.8), 7(.6), 8(.2), 9(.5), 10(.2), 11(.6), 12(.1), 13(.1); window size N: 13; probability threshold q: 0.2. Legend: not in $S_{N,q}$; R1: $SKY_{N,q}$; R2: $S_{N,q} - SKY_{N,q}$]

Page 16: Probabilistic Skyline Operator over Sliding Windows

Algorithms

New element arrives:
Check $P_{sky}$ and $P_{new}$ on R1
Check $P_{new}$ on R2
Handle elements with $P_{new} < q$

Old element expires:
Update $P_{old}$
Check $P_{sky}$ on R2
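The two events listed above can be sketched with a plain list standing in for the two R-trees. This is only an illustration of the bookkeeping, not the R-tree algorithm of the talk: the class name `SlidingSkyline` and its fields are hypothetical, smaller values are assumed to be better, and dividing by (1 - P) assumes P < 1.

```python
def dominates(a, b):
    """Assumption: smaller values are better in every dimension."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

class SlidingSkyline:
    """Keeps only the candidate set, with P_old and P_new per element."""

    def __init__(self, n, q):
        self.n, self.q = n, q
        self.count = 0      # number of elements seen so far
        self.items = []     # stored candidates, oldest first

    def _remove(self, victim):
        """Drop a candidate and undo its factor in P_old of newer elements."""
        self.items.remove(victim)
        for e in self.items:
            if e["seq"] > victim["seq"] and dominates(victim["pt"], e["pt"]):
                e["p_old"] /= 1 - victim["p"]   # assumes victim["p"] < 1

    def insert(self, pt, p):
        self.count += 1
        # Old element expires: it has left the window of the last n arrivals.
        for e in [e for e in self.items if e["seq"] <= self.count - self.n]:
            self._remove(e)
        # New element arrives: shrink P_new of every candidate it dominates.
        for e in self.items:
            if dominates(pt, e["pt"]):
                e["p_new"] *= 1 - p
        # Handle elements with P_new < q: they can never qualify again.
        for e in [e for e in self.items if e["p_new"] < self.q]:
            self._remove(e)
        # Store the new element with P_new = 1 and P_old from the candidates.
        p_old = 1.0
        for e in self.items:
            if dominates(e["pt"], pt):
                p_old *= 1 - e["p"]
        self.items.append({"seq": self.count, "pt": pt, "p": p,
                           "p_old": p_old, "p_new": 1.0})

    def skyline(self):
        return [(e["pt"], e["p"]) for e in self.items
                if e["p"] * e["p_old"] * e["p_new"] >= self.q]

s = SlidingSkyline(n=4, q=0.5)
for pt, p in [((1, 1), 0.3), ((2, 2), 0.4), ((3, 3), 0.9), ((0, 5), 0.8)]:
    s.insert(pt, p)
before = s.skyline()
s.insert((9, 9), 0.1)    # the element at (1, 1) leaves the window
after = s.skyline()
```

In this run the element at (3, 3) is not in the skyline at first, but it enters once its oldest dominator expires, which is why it has to be kept in the candidate set.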

Page 17: Probabilistic Skyline Operator over Sliding Windows

Algorithms: new element arrives

Delete an Entry:

[Figure: elements 2(.1), 3(.4), 4(.1), 5(.8), 6(.8), 7(.6), 8(.2), 9(.5), 10(.2), 11(.6), 12(.1), 13(.1) in R1: $SKY_{N,q}$ and R2: $S_{N,q} - SKY_{N,q}$; new element 14(0.8); window size N: 13; probability threshold q: 0.2]

Before update:
$P_{new}$: (1, 1)
$P_{sky}$: (0.8, 0.8)
global $P_{new}$ = 1 - 0.2

After update:
global $P_{new}$ *= 1 - 0.8
Delete from R1

Page 18: Probabilistic Skyline Operator over Sliding Windows

Algorithms: new element arrives

Move an Entry from R1 to R2:

[Figure: elements 2(.1), 3(.4), 4(.1), 7(.6), 8(.2), 9(.5), 10(.2), 11(.6), 12(.1), 13(.1) in R1: $SKY_{N,q}$ and R2: $S_{N,q} - SKY_{N,q}$; new element 14(0.8); window size N: 13; probability threshold q: 0.2]

Before update:
$P_{new}$: (1, 1)
$P_{sky}$: (0.24, 0.6)
global $P_{new}$ = 1

After update:
global $P_{new}$ *= 1 - 0.8
min $P_{new}$ = 0.2 ≥ q
max $P_{sky}$ = 0.12 < q
Move from R1 to R2

Page 19: Probabilistic Skyline Operator over Sliding Windows

Algorithms: new element arrives

[Figure: elements 2(.1), 3(.4), 4(.1), 7(.6), 8(.2), 9(.5), 10(.2), 11(.6), 12(.1), 13(.1) in R1: $SKY_{N,q}$ and R2: $S_{N,q} - SKY_{N,q}$; new element 14(0.8); window size N: 13; probability threshold q: 0.2]

Before update:
$P_{new}$: (0.9, 1)
global $P_{new}$ = 1

After update:
global $P_{new}$ *= 1 - 0.8
min $P_{new}$ < q; max $P_{new}$ ≥ q
Drill down and delete 2
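The three outcomes on the preceding slides (delete an entry, move it from R1 to R2, drill down) follow from comparing an entry's (min, max) bounds, scaled by its lazily recorded global factor, against q. The sketch below is one way to express that decision, assuming the new element dominates the whole entry; the `Entry` class, its field names and the returned action strings are hypothetical. `Decimal` is used so that values such as 1 - 0.8 compare exactly against q = 0.2.

```python
from dataclasses import dataclass
from decimal import Decimal

@dataclass
class Entry:
    """Aggregate bounds kept on an R-tree entry (hypothetical layout)."""
    p_new_min: Decimal
    p_new_max: Decimal
    p_sky_min: Decimal = Decimal(0)      # only meaningful for entries in R1
    p_sky_max: Decimal = Decimal(0)
    global_p_new: Decimal = Decimal(1)   # lazy factor not yet pushed down

def on_dominating_arrival(entry, p, q, in_r1=True):
    """A new element with probability p dominates the whole entry."""
    entry.global_p_new *= 1 - p
    g = entry.global_p_new
    if entry.p_new_max * g < q:
        return "delete"                  # no element below can stay a candidate
    if entry.p_new_min * g < q:
        return "drill down"              # some elements below must be deleted
    if in_r1:
        if entry.p_sky_max * g < q:
            return "move to R2"          # still candidates, no longer skylines
        if entry.p_sky_min * g < q:
            return "drill down"          # some elements below leave the skyline
    return "keep"

D = Decimal
q = D("0.2")
# Bounds copied from the three slides; the new element is 14(0.8).
a17 = on_dominating_arrival(
    Entry(D(1), D(1), D("0.8"), D("0.8"), global_p_new=D("0.8")), D("0.8"), q)
a18 = on_dominating_arrival(
    Entry(D(1), D(1), D("0.24"), D("0.6")), D("0.8"), q)
a19 = on_dominating_arrival(
    Entry(D("0.9"), D(1)), D("0.8"), q, in_r1=False)
```

Recording the factor once on the entry avoids touching every element below it; the bounds decide whether any descent is needed at all.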

Page 20: Probabilistic Skyline Operator over Sliding Windows

Algorithms: new element arrives

Update $P_{old}$:

[Figure: elements 2(.1), 3(.4), 4(.1), 7(.6), 8(.2), 9(.5), 10(.2), 11(.6), 12(.1), 13(.1) in R1: $SKY_{N,q}$ and R2: $S_{N,q} - SKY_{N,q}$; new element 14(0.8); window size N: 13; probability threshold q: 0.2]

Update $P_{old}$ of 12 & 13
global $P_{old}$ /= (1 - 0.1)

Page 21: Probabilistic Skyline Operator over Sliding Windows

Algorithms: new element arrives

Insert new element:

[Figure: elements 3(.4), 4(.1), 7(.6), 8(.2), 9(.5), 10(.2), 11(.6), 12(.1), 13(.1) in R1: $SKY_{N,q}$ and R2: $S_{N,q} - SKY_{N,q}$; new element 14(0.8); window size N: 13; probability threshold q: 0.2]

$P_{new}$ = 1
Compute $P_{sky}$

Page 22: Probabilistic Skyline Operator over Sliding Windows

Algorithm: old element expires

Delete it from R1 or R2.
Update $P_{old}$ of the remaining elements:
Record a global $P_{old}$ on intermediate entries fully dominated by it
Check $P_{sky}$ after the update.
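Recording a global factor on fully dominated intermediate entries, instead of visiting every element, can be sketched on a small hand-built tree of bounding boxes. This is an illustration under assumptions, not the R-tree code of the talk: the `Node` layout and function names are hypothetical, smaller values are assumed to be better, and the division assumes the expired element has probability below 1.

```python
from dataclasses import dataclass, field

def dominates(a, b):
    """Assumption: smaller values are better in every dimension."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

@dataclass
class Node:
    lo: tuple                       # lower corner of the bounding box
    hi: tuple                       # upper corner (equal to lo for an element)
    children: list = field(default_factory=list)
    p_old: float = 1.0              # stored value, meaningful for elements only
    global_p_old: float = 1.0       # lazy factor for everything below this node

def expire(node, pt, p):
    """Undo the factor (1 - p) of an expired element located at pt."""
    if dominates(pt, node.lo):
        node.global_p_old /= 1 - p  # whole box dominated: record once, stop
    elif dominates(pt, node.hi):
        for child in node.children: # box partly dominated: descend
            expire(child, pt, p)
    # otherwise nothing inside the box is dominated by pt

def effective_p_old(path):
    """P_old of the element at the end of a root-to-element path."""
    value = path[-1].p_old
    for node in path:
        value *= node.global_p_old
    return value

# Two elements whose P_old contains the factor (1 - 0.1) of the expiring element.
leaf1 = Node((3, 3), (3, 3), p_old=0.9)
leaf2 = Node((4, 5), (4, 5), p_old=0.9)
inner = Node((3, 3), (4, 5), [leaf1, leaf2])
leaf3 = Node((0, 6), (0, 6))
root = Node((0, 3), (4, 6), [inner, leaf3])

expire(root, (1, 1), 0.1)
```

After the call only `inner` carries the correction; the two elements below it are never visited, and their effective P_old is recovered by multiplying the lazy factors along the path.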

Page 23: Probabilistic Skyline Operator over Sliding Windows

Algorithms: old element expires

[Figure: elements 3(.4), 4(.1), 7(.6), 8(.2), 9(.5), 10(.2), 11(.6), 12(.1), 13(.1), 14(0.8) in R1: $SKY_{N,q}$ and R2: $S_{N,q} - SKY_{N,q}$; window size N: 13; probability threshold q: 0.2]

$P_{old}(7)$ /= 1 - P(3)
global $P_{old}$ /= 1 - P(4)

Page 24: Probabilistic Skyline Operator over Sliding Windows

Algorithms: handling multiple thresholds

Continuous queries
Users specify k probability thresholds $q_1, \ldots, q_k$ ($q_i < q_{i-1}$).
Solution: instead of maintaining only R1, we maintain $R_1, \ldots, R_k$, each corresponding to a confidence value.

Ad-hoc queries
Users issue a query: retrieve skylines with probability at least $q'$ ($q' \ge q_k$).
Solution: find an $R_i$ with $q_i \le q' < q_{i-1}$. Then all elements in $\{R_j : j < i-1\}$ are results. We search $R_{i-1}$ to output qualified skylines.
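The idea of answering an ad-hoc threshold from per-threshold partitions can be sketched with plain lists. The layout below is an assumption made for the example: bucket i holds the elements whose skyline probability lies in [q_i, q_{i-1}), with q_0 taken as infinity, so the index offsets may differ from the convention used on the slide. All function names are hypothetical.

```python
def bucket_index(value, thresholds):
    """thresholds are descending (q_1 > ... > q_k). Return the 1-based i with
    q_i <= value < q_{i-1}, or None if value is below q_k."""
    for i, qi in enumerate(thresholds, start=1):
        if value >= qi:
            return i
    return None

def build_buckets(elements, thresholds):
    """Partition (name, p_sky) pairs into one bucket per threshold."""
    buckets = {i: [] for i in range(1, len(thresholds) + 1)}
    for name, p_sky in elements:
        i = bucket_index(p_sky, thresholds)
        if i is not None:
            buckets[i].append((name, p_sky))
    return buckets

def adhoc_query(buckets, thresholds, q_prime):
    """Elements with skyline probability at least q_prime (q_prime >= q_k)."""
    i = bucket_index(q_prime, thresholds)
    if i is None:
        raise ValueError("q' must be at least the smallest threshold")
    result = [e for j in range(1, i) for e in buckets[j]]    # whole buckets
    result += [e for e in buckets[i] if e[1] >= q_prime]     # filtered bucket
    return result

thresholds = [0.8, 0.5, 0.2]
elements = [("a", 0.9), ("b", 0.6), ("c", 0.55), ("d", 0.3), ("e", 0.1)]
buckets = build_buckets(elements, thresholds)
answer = adhoc_query(buckets, thresholds, 0.58)
```

Only the single bucket that straddles q' has to be searched; every bucket above it is reported as a whole, which matches the trade-off of more maintenance for faster queries reported in the experiments.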

Page 25: Probabilistic Skyline Operator over Sliding Windows

Experiment

Data sets:
Real: stock transactions; 2-d; probabilities assigned randomly; size: 2 million
Synthetic: spatial location (independent or anti-correlated); probability (uniform or normal); 2-d to 5-d; size: 2 million

Default values: p: 0.3; d: 3; N: 1M; spatial distribution: anti-correlated; probability distribution: uniform

Page 26: Probabilistic Skyline Operator over Sliding Windows

Experiment: space

Space usage is about 0.1% of the sliding window size for 2-d data; around 89% of the space is saved even for 5-d data.

Page 27: Probabilistic Skyline Operator over Sliding Windows

Experiment: space

The size of $S_{N,q}$ decreases as Pu increases, while the size of $SKY_{N,q}$ increases with it.

Page 28: Probabilistic Skyline Operator over Sliding Windows

Experiment: space

Page 29: Probabilistic Skyline Operator over Sliding Windows

Experiment: time

Page 30: Probabilistic Skyline Operator over Sliding Windows

Experiment: time

Maintenance time increases with the number of probability thresholds; query time decreases with it.

Page 31: Probabilistic Skyline Operator over Sliding Windows

Conclusion

We characterize a candidate set with minimum size and propose time-efficient techniques.

We extend the framework to handle multiple thresholds.

Page 32: Probabilistic Skyline Operator over Sliding Windows

Thanks!