Probabilistic Skyline Operator over Sliding Windows
Wenjie Zhang
University of New South Wales & NICTA, Australia
Joint work: Xuemin Lin, Ying Zhang, Wei Wang (UNSW & NICTA), Jeffrey Xu Yu (CUHK)
Outline
Background, Framework, Algorithms, Experiment, Conclusion
Background
Elements continuously arrive with occurrence probabilities.
Problem: how to continuously compute skylines in a sliding window of size N (elements)?
[Figure: a stream of elements 1 to 6, each labelled with an occurrence probability (values shown: 0.1, 0.1, 0.4, 0.1, 0.8, and 0.5 for element 6); sliding window: N = 5]
Background
Multi-criteria decision making regarding uncertain data: online auctions, financial markets, … …
Related work
Probabilistic skyline computation:
  Probabilistic skyline (VLDB07)
  Probabilistic reverse skyline (SIGMOD08)
Uncertain stream processing:
  Probabilistic aggregates and sketches over uncertain streams (SIGMOD07, SODA07, PODS07)
  Frequent items on uncertain streams (SIGMOD08)
  Top-k queries over uncertain sliding window (VLDB08)
  … …
Models and Problem Definition
Model: DS is a stream of elements; each element a is a point in a d-dimensional space and has an occurrence probability P(a) in (0, 1].
The skyline probability of an element a is:
$$P_{sky}(a) = P(a) \times \prod_{a' \in DS,\ a' \prec a} (1 - P(a'))$$
where a' ≺ a means that a' dominates a.
Problem Definition: retrieve the elements, among the most recent N elements, whose skyline probability is no less than a given threshold q.
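The skyline probability defined above can be checked by brute force on a small window. The following is a minimal Python sketch, not the incremental algorithm of the talk. It assumes that smaller values are better in every dimension (the slides do not state the dominance convention), and the names `dominates`, `skyline_probability` and `probabilistic_skyline` are illustrative only.

```python
from math import prod


def dominates(a, b):
    """Assumed convention: a dominates b if a is <= b in every
    dimension and strictly < in at least one (smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_probability(i, window):
    """P_sky of window[i]: its own probability times the probability
    that none of its dominators in the window occurs."""
    point, p = window[i]
    return p * prod(
        1 - p_other
        for j, (other, p_other) in enumerate(window)
        if j != i and dominates(other, point)
    )


def probabilistic_skyline(stream, n, q):
    """Points among the most recent n elements with P_sky >= q.
    Recomputes everything from scratch: O(n^2) per call."""
    window = stream[-n:]
    return [
        point
        for i, (point, _) in enumerate(window)
        if skyline_probability(i, window) >= q
    ]


# Hypothetical 2-d stream of (point, occurrence probability) pairs.
stream = [((1, 4), 0.5), ((2, 5), 0.9), ((3, 1), 0.6), ((4, 2), 0.8)]
print(probabilistic_skyline(stream, n=4, q=0.4))
```

Recomputing from scratch costs quadratic time per query and keeps all N elements, which is the cost the candidate set and the R-tree based maintenance in the talk are meant to avoid.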
Challenges and Contributions
Space efficiency. Contribution: space reduction from $O(N)$ to $O(\ln^{d-1} N)$
Time efficiency. Contribution: R-tree based efficient incremental algorithms
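Framework: what to keep?
[Figure: elements 1 to 5 in the window, each labelled with an occurrence probability (values shown: 0.1, 0.1, 0.4, 0.1, 0.8); window size N: 5, probability threshold q: 0.5]
$$P_{sky}(a) = P(a) \times P_{old}(a) \times P_{new}(a)$$
Here Pold(a) collects the factors (1 – P(a')) of the dominating elements a' that arrived before a, and Pnew(a) collects those of the dominating elements that arrived after a.
Pold(2) = 1 – P(1)
Pnew(2) = (1 – P(3)) * (1 – P(4))
Pnew(2) < q, so element 2 will never become a skyline point in the window.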
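The split of the skyline probability into an "old" and a "new" part can be illustrated with a short Python sketch. The coordinates and the assignment of probabilities to elements below are made up for illustration, smaller-is-better dominance is an assumption, and `split_skyline_probability` is a hypothetical helper name.

```python
def dominates(a, b):
    """Assumed convention: smaller is better in every dimension."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def split_skyline_probability(i, window):
    """Return (P, Pold, Pnew) for window[i], where the window is a list
    of (point, probability) pairs in arrival order."""
    point, p = window[i]
    p_old = p_new = 1.0
    for j, (other, p_other) in enumerate(window):
        if j == i or not dominates(other, point):
            continue
        if j < i:
            p_old *= 1 - p_other   # dominator arrived earlier
        else:
            p_new *= 1 - p_other   # dominator arrived later
    return p, p_old, p_new


# Hypothetical window: the second element is dominated by one older
# element and by two newer ones.
window = [((1, 1), 0.1), ((4, 4), 0.1), ((2, 3), 0.4), ((3, 2), 0.8), ((5, 0), 0.1)]
p, p_old, p_new = split_skyline_probability(1, window)
print(p_old, p_new, p * p_old * p_new)
```

Pnew can only shrink while the element stays in the window, because every later dominator expires after the element itself. Once Pnew drops below q, the element can therefore be discarded for good, whereas Pold grows back as older dominators expire.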
Framework: what to keep?
Candidate set SN,q: the elements a among the most recent N with $P_{new}(a) \ge q$
Correctness:
(1) no missing skyline points
(2) no false hits when determining SN,q
(3) no false positives when determining skyline results
(4) no false negatives when determining skyline results
Note: a probability computed from SN,q alone may not be accurate, but it still gives the correct answer to the threshold test against q.
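The correctness claims can be checked on a toy window. The sketch below is a brute-force illustration under stated assumptions: the candidate set keeps exactly the elements with Pnew ≥ q, dominance is smaller-is-better, and all data and function names are hypothetical.

```python
def dominates(a, b):
    """Assumed convention: smaller is better in every dimension."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def p_new(i, window):
    """Product of (1 - P) over dominators that arrived after window[i]."""
    result = 1.0
    for other, p_other in window[i + 1:]:
        if dominates(other, window[i][0]):
            result *= 1 - p_other
    return result


def candidate_set(window, q):
    """Indices of the elements with Pnew >= q."""
    return [i for i in range(len(window)) if p_new(i, window) >= q]


def p_sky_using(i, kept, window):
    """P_sky of window[i], counting only dominators whose index is in kept."""
    point, p = window[i]
    for j in kept:
        if j != i and dominates(window[j][0], point):
            p *= 1 - window[j][1]
    return p


def skyline_using(kept, window, q):
    return [i for i in kept if p_sky_using(i, kept, window) >= q]


# Hypothetical window of (point, probability) pairs in arrival order.
window = [((1, 1), 0.3), ((4, 4), 0.9), ((2, 3), 0.4),
          ((3, 2), 0.8), ((5, 0), 0.6), ((6, 6), 0.9)]
q = 0.5
everything = list(range(len(window)))
kept = candidate_set(window, q)
print(kept, skyline_using(kept, window, q), skyline_using(everything, window, q))
```

In this example the element at index 1 is dropped, so the probability computed for the last element from the kept elements alone is too high. It is still below q, so the answer to the threshold test is unchanged.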
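Framework
Space required for SN,q: SN,q is the minimum information that must be maintained to get a correct answer.
[Figure: elements 1 to 4 with occurrence probabilities (element 3 has 0.9; the other values shown are 0.3, 0.8 and 0.4); window size N: 4, probability threshold q: 0.5]
Initially: Psky(3) = 0.9 * (1 – 0.4) * (1 – 0.3) < q
Once the elements contributing the factors (1 – 0.4) and (1 – 0.3) have expired: Psky(3) = 0.9 > q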
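Working the numbers of this example through with q = 0.5:

$$
\begin{aligned}
P_{sky}(3) &= 0.9 \times (1 - 0.4) \times (1 - 0.3) = 0.9 \times 0.6 \times 0.7 = 0.378 < 0.5 \\
P_{sky}(3) &= 0.9 > 0.5 \quad \text{(after both factors are gone)}
\end{aligned}
$$

Element 3 fails the threshold at first and passes it later, so it cannot be dropped early even though it is not a skyline point when it is first examined.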
Space of Candidate Set
Theorem: the candidate set requires poly-logarithmic space in the average case under uniform distributions: $O(f(q) \ln^{d-1} N)$.
Algorithms
We maintain two R-trees:
R1: SKYN,q (the skyline elements)
R2: SN,q – SKYN,q (the candidates that are not skyline elements)
Algorithms
[Figure: elements 1(.1), 2(.1), 3(.4), 4(.1), 5(.8), 6(.8), 7(.6), 8(.2), 9(.5), 10(.2), 11(.6), 12(.1), 13(.1); window size N: 13, probability threshold q: 0.2; legend: not in SN,q; R1: SKYN,q; R2: SN,q – SKYN,q]
Algorithms
New element arrives:
  Check Psky & Pnew on R1
  Check Pnew on R2
  Handle elements with Pnew < q
Old element expires:
  Update Pold
  Check Psky on R2
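The two events can be sketched end to end with a flat list in place of the two R-trees. This is a simplified illustration rather than the algorithm of the talk: it stores Pnew per element and recomputes Pold from the kept elements at query time instead of updating it incrementally. Smaller-is-better dominance and the name `CandidateWindow` are assumptions of the sketch.

```python
from collections import deque


def dominates(a, b):
    """Assumed convention: smaller is better in every dimension."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


class CandidateWindow:
    """Keeps only the elements of the last n arrivals whose Pnew is >= q."""

    def __init__(self, n, q):
        self.n, self.q = n, q
        self.clock = 0          # arrival counter, used as timestamp
        self.items = deque()    # kept elements, oldest first

    def insert(self, point, p):
        self.clock += 1
        # Old element expires: drop what falls outside the last n arrivals.
        while self.items and self.items[0]["t"] <= self.clock - self.n:
            self.items.popleft()
        # New element arrives: lower Pnew of every kept element it dominates.
        for item in self.items:
            if dominates(point, item["point"]):
                item["p_new"] *= 1 - p
        # Elements with Pnew < q can never qualify again.
        self.items = deque(i for i in self.items if i["p_new"] >= self.q)
        self.items.append({"t": self.clock, "point": point, "p": p, "p_new": 1.0})

    def skyline(self):
        """Timestamps of the kept elements that pass the threshold test."""
        result = []
        for item in self.items:
            p_old = 1.0
            for other in self.items:
                if other["t"] < item["t"] and dominates(other["point"], item["point"]):
                    p_old *= 1 - other["p"]
            if item["p"] * p_old * item["p_new"] >= self.q:
                result.append(item["t"])
        return result


# Hypothetical usage: window of 3 elements, threshold 0.5.
w = CandidateWindow(n=3, q=0.5)
for point, p in [((1, 1), 0.9), ((2, 2), 0.9), ((3, 3), 0.6)]:
    w.insert(point, p)
print(w.skyline())
```

Comparisons that land exactly on q are subject to floating-point rounding in this sketch; a production version would need a tolerance or exact arithmetic.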
Algorithms: new element arrives
Delete an entry:
[Figure: elements 2(.1), 3(.4), 4(.1), 5(.8), 6(.8), 7(.6), 8(.2), 9(.5), 10(.2), 11(.6), 12(.1), 13(.1) and new element 14(0.8); R1: SKYN,q; R2: SN,q – SKYN,q; window size N: 13, probability threshold q: 0.2]
Before update (values are (min, max) over the entry):
Pnew: (1, 1)
Psky: (0.8, 0.8)
global Pnew = 1 – 0.2
After update:
global Pnew *= 1 – 0.8
Delete from R1
Algorithms: new element arrives
Move an entry from R1 to R2:
[Figure: elements 2(.1), 3(.4), 4(.1), 7(.6), 8(.2), 9(.5), 10(.2), 11(.6), 12(.1), 13(.1) and new element 14(0.8); R1: SKYN,q; R2: SN,q – SKYN,q; window size N: 13, probability threshold q: 0.2]
Before update (values are (min, max) over the entry):
Pnew: (1, 1)
Psky: (0.24, 0.6)
global Pnew = 1
After update:
global Pnew *= 1 – 0.8
min Pnew = 0.2 ≥ q
max Psky = 0.12 < q
Move from R1 to R2
Algorithms: new element arrives
[Figure: elements 2(.1), 3(.4), 4(.1), 7(.6), 8(.2), 9(.5), 10(.2), 11(.6), 12(.1), 13(.1) and new element 14(0.8); R1: SKYN,q; R2: SN,q – SKYN,q; window size N: 13, probability threshold q: 0.2]
Before update (values are (min, max) over the entry):
Pnew: (0.9, 1)
global Pnew = 1
After update:
global Pnew *= 1 – 0.8
min Pnew < q; max Pnew ≥ q
Drill down and delete 2
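The three outcomes on the preceding slides (delete an entry, move it from R1 to R2, drill down) follow from comparing the entry's bounds with q after the lazy multiplier is updated. The sketch below is one possible reading of those slides, not the authors' code: `Entry`, `on_full_domination` and the tolerance `EPS` are hypothetical, and it assumes the new element dominates everything under the entry.

```python
from dataclasses import dataclass

EPS = 1e-12  # tolerance so that 1 - 0.8 is not treated as below 0.2


@dataclass
class Entry:
    min_p_new: float
    max_p_new: float
    max_p_sky: float
    global_p_new: float = 1.0  # lazy factor shared by all elements below


def on_full_domination(entry, p, q, in_r1=True):
    """Update an entry fully dominated by a new element with
    probability p and decide what to do with it."""
    entry.global_p_new *= 1 - p
    g = entry.global_p_new
    if entry.max_p_new * g < q - EPS:
        return "delete"            # no element below can reach q again
    if entry.min_p_new * g < q - EPS:
        return "drill down"        # some elements below must be deleted
    if in_r1 and entry.max_p_sky * g < q - EPS:
        return "move to R2"        # still candidates, no longer skyline
    return "keep"


# The three cases with the numbers shown on the slides (q = 0.2, p = 0.8).
case_delete = on_full_domination(Entry(1, 1, 0.8, global_p_new=1 - 0.2), 0.8, 0.2)
case_move = on_full_domination(Entry(1, 1, 0.6), 0.8, 0.2)
case_drill = on_full_domination(Entry(0.9, 1, 0.6), 0.8, 0.2)
print(case_delete, case_move, case_drill)
```

Keeping one multiplier per entry avoids touching every element below it when a new arrival dominates the whole entry; the factor only has to be pushed down when the entry is drilled into.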
Algorithms: new element arrives
Update Pold:
[Figure: elements 2(.1), 3(.4), 4(.1), 7(.6), 8(.2), 9(.5), 10(.2), 11(.6), 12(.1), 13(.1) and new element 14(0.8); R1: SKYN,q; R2: SN,q – SKYN,q; window size N: 13, probability threshold q: 0.2]
Update Pold of 12 & 13
global Pold /= (1 – 0.1)
Algorithms: new element arrives
Insert new element:
[Figure: elements 3(.4), 4(.1), 7(.6), 8(.2), 9(.5), 10(.2), 11(.6), 12(.1), 13(.1) and new element 14(0.8); R1: SKYN,q; R2: SN,q – SKYN,q; window size N: 13, probability threshold q: 0.2]
Pnew = 1.
Compute Psky.
Algorithm: old element expires
Delete it from R1 or R2.
Update Pold of remaining elements: record global Pold on intermediate entries fully dominated by it.
Check Psky after update.
Algorithms: old element expires
[Figure: elements 3(.4), 4(.1), 7(.6), 8(.2), 9(.5), 10(.2), 11(.6), 12(.1), 13(.1), 14(0.8); R1: SKYN,q; R2: SN,q – SKYN,q; window size N: 13, probability threshold q: 0.2]
Pold(7) /= (1 – P(3))
global Pold /= (1 – P(4))
Algorithms: handling multiple thresholds
Continuous queries:
  Users specify k probability thresholds q1, …, qk (qi < qi-1).
  Solution: instead of maintaining R1, we maintain R1, …, Rk, each corresponding to a confidence value.
Ad-hoc queries:
  Users issue a query: retrieve skylines with probability at least q' (q' ≥ qk).
  Solution: find an Ri with qi ≤ q' < qi-1. Then all elements in {Rj: j < i – 1} are results. We search Ri-1 to output qualified skylines.
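The ad-hoc query idea can be sketched with plain lists standing in for the R-trees. The banding here is an assumption of the sketch: with thresholds in decreasing order, band i (0-based) holds the elements whose skyline probability is at least thresholds[i] and below thresholds[i-1]. This indexing may be offset from the slides' R1, …, Rk numbering, and `adhoc_query` is a hypothetical name.

```python
def adhoc_query(bands, thresholds, q_prime):
    """bands[i] is a list of (element, p_sky) pairs with
    thresholds[i] <= p_sky < thresholds[i - 1]; thresholds decrease."""
    if q_prime < thresholds[-1]:
        raise ValueError("q' must be at least the smallest maintained threshold")
    results = []
    for band, q_i in zip(bands, thresholds):
        if q_i >= q_prime:
            # Every element in this band already satisfies q'.
            results.extend(element for element, _ in band)
        else:
            # Boundary band: only part of it qualifies, so search it.
            results.extend(element for element, p_sky in band if p_sky >= q_prime)
            break
    return results


# Hypothetical bands for thresholds 0.8 > 0.5 > 0.2.
thresholds = [0.8, 0.5, 0.2]
bands = [
    [("a", 0.95), ("b", 0.85)],
    [("c", 0.7), ("d", 0.55)],
    [("e", 0.3)],
]
print(adhoc_query(bands, thresholds, 0.6))
```

Only one band has to be searched per query; every band above it is returned wholesale and every band below it is skipped.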
Experiment
Data sets:
  Real: stock transactions; 2-d; probabilities assigned randomly; size: 2 million
  Synthetic: spatial location (independent or anti-correlated); probability (uniform or normal); 2-d to 5-d; size: 2 million
Default values: p: 0.3; d: 3; N: 1M; spatial distribution: anti-correlated; probability: uniform
Experiment: space
The space used is about 0.1% of the sliding window size for 2-d data; around 89% of the space is saved even for 5-d data.
Experiment: space
The size of SN,q decreases as Pu increases, while the size of SKYN,q increases with it.
Experiment: time
Maintenance time increases with the number of probability thresholds; query time decreases with it.
Conclusion
We characterize a candidate set with minimum size and propose time-efficient techniques.
We extend the framework to handle multiple thresholds.
Thanks!