Top-k Queries on Uncertain Data


Upload: omana

Post on 23-Feb-2016


Page 1: Top-k Queries on Uncertain Data

Top-k Queries on Uncertain Data

Advisor: Prof. 陳良弼
Presenter: 鄧雅文 (97753034)

Page 2: Top-k Queries on Uncertain Data

Outline

Introduction
Related Work
Problem Formulation
Future Work

Page 3: Top-k Queries on Uncertain Data

Introduction

Top-k query on certain data
◦ Rank results according to a user-defined score
◦ Important for exploring large databases
◦ E.g., top-2 = {T1, T2}

TID  PID  Score
T1   A    100
T2   B    90
T3   C    80
T4   D    70
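As a minimal sketch of the certain-data case, the top-2 answer on this slide can be reproduced with a heap-based selection. The function name `top_k` and the choice of the third column as the score are illustrative assumptions, not part of the slides.

```python
import heapq

# Rows from the slide's table: (TID, PID, Score).
rows = [("T1", "A", 100), ("T2", "B", 90), ("T3", "C", 80), ("T4", "D", 70)]

def top_k(rows, k, score=lambda row: row[2]):
    """Return the k rows with the highest score, best first.

    `score` stands in for the user-defined scoring function (hypothetical:
    here it simply reads the Score column).
    """
    return heapq.nlargest(k, rows, key=score)

top2 = [tid for tid, _, _ in top_k(rows, 2)]
print(top2)
```

With certain data the answer is unique up to score ties; the rest of the deck is about what replaces this definition once each row only exists with some probability.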

Page 4: Top-k Queries on Uncertain Data

Introduction (cont.)

Uncertain database
◦ How to define top-k on uncertain data?
◦ Mutually exclusive rules
  E.g., T1 ⊕ T4

TID  PID  Score  Pr.
T1   A    100    0.2
T2   B    90     0.9
T3   C    80     0.6
T4   A    70     0.8
…    …    …      …
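One way to read the mutually exclusive rule is that tuples sharing a PID cannot co-exist. The sketch below groups the visible rows by PID under that assumption (the slides give only the T1 ⊕ T4 example, and the elided rows are left out) and checks that each rule's probabilities sum to at most 1. The helper name `exclusion_rules` is hypothetical.

```python
from collections import defaultdict

# (TID, PID, Score, Pr.) rows visible on the slide; the elided rows are omitted.
rows = [
    ("T1", "A", 100, 0.2),
    ("T2", "B", 90, 0.9),
    ("T3", "C", 80, 0.6),
    ("T4", "A", 70, 0.8),
]

def exclusion_rules(rows):
    """Group tuples by PID; assumption: tuples with the same PID are exclusive."""
    groups = defaultdict(list)
    for tid, pid, _score, pr in rows:
        groups[pid].append((tid, pr))
    # Only groups with two or more members form a rule.
    return {pid: members for pid, members in groups.items() if len(members) > 1}

rules = exclusion_rules(rows)
for pid, members in rules.items():
    total = sum(pr for _, pr in members)
    # At most one member of a rule can exist, so the probabilities must sum to <= 1.
    assert total <= 1.0 + 1e-9
    print(pid, [tid for tid, _ in members], total)
```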

Page 5: Top-k Queries on Uncertain Data

Related Work

C. C. Aggarwal and P. S. Yu. A Survey of Uncertain Data Algorithms and Applications. In TKDE, 2009.
◦ Causes:
  Sensor networks, privacy, trajectory prediction…
◦ The main areas of research on uncertain data:
  Modeling of uncertain data
  Uncertain data management
    Top-k query, range query, NN query…
  Uncertain data mining
    Clustering, classification, frequent pattern, outliers…

Page 6: Top-k Queries on Uncertain Data

Related Work (cont.)

M. Soliman, I. Ilyas, and K. Chang. Top-k Query Processing in Uncertain Databases. In ICDE, 2007.
◦ Possible Worlds

Page 7: Top-k Queries on Uncertain Data

Related Work (cont.)

◦ U-Topk query
  Return k tuples that can co-exist in a possible world with the highest probability
  E.g., {T1, T2} as U-Top2
◦ U-kRanks query
  Return k tuples, each of which is a clear winner in its rank over all possible worlds
  E.g., {T2, T6} as U-2Ranks
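The two query types can be sketched by brute-force enumeration of possible worlds. The data below is hypothetical (it is not the example behind the slide's {T1, T2} and {T2, T6} answers, whose figure is not in the transcript), tuples are assumed independent, and the function names are illustrative. This is a naive illustration of the definitions, not the algorithm of Soliman et al.

```python
from collections import defaultdict
from itertools import product

# Hypothetical data: (id, existence probability) in descending score order,
# each tuple existing independently of the others.
tuples = [("a", 0.3), ("b", 0.9), ("c", 0.6), ("d", 0.5)]

def possible_worlds(tuples):
    """Yield (world, probability); a world lists the present ids in score order."""
    for mask in product([True, False], repeat=len(tuples)):
        prob = 1.0
        world = []
        for (tid, p), present in zip(tuples, mask):
            prob *= p if present else 1.0 - p
            if present:
                world.append(tid)
        yield world, prob

def u_topk(tuples, k):
    """Most probable top-k vector, summed over all possible worlds."""
    mass = defaultdict(float)
    for world, prob in possible_worlds(tuples):
        if len(world) >= k:
            mass[tuple(world[:k])] += prob
    return max(mass.items(), key=lambda kv: kv[1])

def u_kranks(tuples, k):
    """For each rank 1..k, the tuple most likely to occupy that rank."""
    mass = [defaultdict(float) for _ in range(k)]
    for world, prob in possible_worlds(tuples):
        for rank, tid in enumerate(world[:k]):
            mass[rank][tid] += prob
    return [max(m.items(), key=lambda kv: kv[1]) for m in mass]

print(u_topk(tuples, 2))
print(u_kranks(tuples, 2))
```

The enumeration is exponential in the number of tuples, so it only serves to make the two definitions concrete: U-Topk scores whole answer vectors, while U-kRanks picks a winner for each rank separately.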

Page 8: Top-k Queries on Uncertain Data

Related Work (cont.)

M. Hua, J. Pei, W. Zhang, X. Lin. Ranking Queries on Uncertain Data: A Probabilistic Threshold Approach. In SIGMOD, 2008.
◦ PT-k query
  Return the set of all tuples whose top-k probability values are at least p
  E.g., {T1, T2, T5} as PT-2 (with p=0.4)
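For independent tuples, the top-k probability of a tuple is its own existence probability times the probability that fewer than k higher-scored tuples exist. The sketch below computes this with a small dynamic program and then applies the threshold p. The data and the function names are hypothetical, independence is assumed (no exclusion rules), and this is not claimed to be the exact method of Hua et al.

```python
# Hypothetical data: (id, existence probability) in descending score order,
# each tuple existing independently of the others.
tuples = [("a", 0.3), ("b", 0.9), ("c", 0.6), ("d", 0.5)]

def topk_probabilities(tuples, k):
    """Pr(tuple is in the top-k) for every tuple, assuming independence."""
    # counts[j] = Pr(exactly j of the tuples seen so far exist), for j < k.
    counts = [1.0] + [0.0] * (k - 1)
    result = {}
    for tid, p in tuples:
        # The tuple is in the top-k iff it exists and fewer than k
        # higher-scored tuples exist.
        result[tid] = p * sum(counts)
        # Fold this tuple into the count distribution, truncated at k - 1.
        nxt = [0.0] * k
        for j, c in enumerate(counts):
            nxt[j] += c * (1.0 - p)
            if j + 1 < k:
                nxt[j + 1] += c * p
        counts = nxt
    return result

def pt_k(tuples, k, p):
    """All tuples whose top-k probability is at least p, in score order."""
    probs = topk_probabilities(tuples, k)
    return [tid for tid, _ in tuples if probs[tid] >= p]

probs = topk_probabilities(tuples, 2)
print(probs)
print(pt_k(tuples, 2, 0.4))
```

Unlike U-Topk, the size of a PT-k answer is not fixed at k: it depends on how many tuples clear the threshold.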

Page 9: Top-k Queries on Uncertain Data

Related Work (cont.)

T. Ge, S. Zdonik, and S. Madden. Top-k Queries on Uncertain Data: On Score Distribution and Typical Answers. In SIGMOD, 2009.
◦ The tradeoff between reporting high-scoring tuples and tuples with a high probability of being in the top-k
◦ Return a number of typical vectors that efficiently sample the distribution of all potential top-k tuple vectors

Page 10: Top-k Queries on Uncertain Data

Problem Formulation

Example:
◦ In an International Tenpin Bowling Championship, the events include singles, doubles, and trios. Due to the budget, the coach can choose only 3 players to attend. Therefore, we want these 3 players to have a relatively high probability of performing well across these 3 types of events.

Page 11: Top-k Queries on Uncertain Data

Problem Formulation (cont.)

TID  Player  Pr.
T1   A       0.4100
T2   D       0.6200
T3   B       0.1400
T4   C       0.3400
T5   C       0.6600
T6   B       0.8600
T7   D       0.3800
T8   A       0.5900

Possible World           Pr.       Possible World           Pr.
PW1   T1, T2, T3, T4     0.0121    PW9   T2, T3, T4, T8     0.0174
PW2   T1, T2, T3, T5     0.0235    PW10  T2, T3, T5, T8     0.0338
PW3   T1, T2, T4, T6     0.0743    PW11  T2, T4, T6, T8     0.1070
PW4   T1, T2, T5, T6     0.1443    PW12  T2, T5, T6, T8     0.2076
PW5   T1, T3, T4, T7     0.0074    PW13  T3, T4, T7, T8     0.0107
PW6   T1, T3, T5, T7     0.0144    PW14  T3, T5, T7, T8     0.0207
PW7   T1, T4, T6, T7     0.0456    PW15  T4, T6, T7, T8     0.0656
PW8   T1, T5, T6, T7     0.0884    PW16  T5, T6, T7, T8     0.1273

◦ U-Top3 = {T2, T5, T6}
◦ But U-Top2 = {T1, T2}, U-Top1 = {T1}
◦ How about also considering {T1, T2, T5} as top-3?
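The 16 possible worlds and the U-Topk answers on this slide can be checked with a short enumeration. Two assumptions are made explicit in the code: the two tuples of each player are mutually exclusive with exactly one of them present in every world (each pair's probabilities sum to 1), and the tuples are ranked in the order T1 (best) to T8, since the table carries no score column. The function names are illustrative.

```python
from collections import defaultdict
from itertools import product

# Existence probabilities from the slide's tuple table.
pr = {"T1": 0.41, "T2": 0.62, "T3": 0.14, "T4": 0.34,
      "T5": 0.66, "T6": 0.86, "T7": 0.38, "T8": 0.59}

# Assumption: one rule per player (A, D, B, C), exactly one tuple per rule exists.
rules = [("T1", "T8"), ("T2", "T7"), ("T3", "T6"), ("T4", "T5")]

def possible_worlds():
    """Yield (world, probability), the world sorted by the assumed rank T1..T8."""
    for choice in product(*rules):
        world = tuple(sorted(choice, key=lambda tid: int(tid[1:])))
        prob = 1.0
        for tid in world:
            prob *= pr[tid]
        yield world, prob

def u_topk(k):
    """Most probable top-k vector and its probability."""
    mass = defaultdict(float)
    for world, prob in possible_worlds():
        mass[world[:k]] += prob
    best = max(mass, key=mass.get)
    return best, mass[best]

worlds = dict(possible_worlds())
for k in (1, 2, 3):
    answer, prob = u_topk(k)
    print(k, answer, round(prob, 4))
```

Under these assumptions the enumeration agrees with the slide: the most probable top-1 and top-2 vectors both start with T1, while the most probable top-3 vector, {T2, T5, T6}, does not contain T1 at all. That inconsistency across k is what motivates the proposal on the following slides.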

Page 12: Top-k Queries on Uncertain Data

Problem Formulation (cont.)

We choose the answers of a top-k query depending not only on the probability (P) but also on the confidence (C).
◦ Confidence: expresses the top-(k-1) probabilities of the sets formed by k-1 tuples of this possible top-k answer
  E.g., k=3
  {T1, T2, T3} as a possible top-k with P=0.0356
  C is composed in some way of
    Pr({T1, T2}) to be top-2 = 0.2542 and its confidence,
    Pr({T1, T3}) to be top-2 = 0.0218 and its confidence,
    Pr({T2, T3}) to be top-2 = 0.0512 and its confidence
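The probabilities quoted in this example can be recomputed directly. The sketch assumes the rank order T1, T2, T3 and treats these three tuples as independent, which holds in the bowling example because they belong to different players. It only computes the inputs to the confidence; how they are combined into C is left open by the slides, so no confidence function is defined here. The name `prob_exact_top` is hypothetical.

```python
from itertools import combinations

# Existence probabilities of the three highest-ranked tuples in the example.
pr = {"T1": 0.41, "T2": 0.62, "T3": 0.14}
order = ["T1", "T2", "T3"]  # assumed rank order, best first

def prob_exact_top(subset):
    """Pr(subset is exactly the top-|subset| answer), assuming independence."""
    lowest = max(order.index(tid) for tid in subset)
    prob = 1.0
    # Every tuple ranked at or above the lowest member must be present if it
    # is in the subset and absent otherwise.
    for tid in order[: lowest + 1]:
        prob *= pr[tid] if tid in subset else 1.0 - pr[tid]
    return prob

answer = ("T1", "T2", "T3")
p = prob_exact_top(answer)
sub = {s: prob_exact_top(s) for s in combinations(answer, 2)}

print(round(p, 4))
for s, value in sub.items():
    print(s, round(value, 4))
```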

Page 13: Top-k Queries on Uncertain Data

Problem Formulation (cont.)

Since every possible top-k answer has two features, probability (P) and confidence (C), we return only the non-dominated ones as the result set.
◦ E.g.,
  {T1, T3, T5}: P=0.8, C=0.4
  {T1, T4, T7}: P=0.5, C=0.7
  {T2, T6, T7}: P=0.3, C=0.2 (this will not be returned)
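The non-dominated filter can be sketched as a small skyline computation over the (P, C) pairs from this example. The dominance test used here is the usual Pareto one (at least as good in both features and strictly better in one); the slides do not spell it out, so treat it as an assumption. The function names are illustrative.

```python
# Candidate answers with the (P, C) values from the slide's example.
candidates = {
    ("T1", "T3", "T5"): (0.8, 0.4),
    ("T1", "T4", "T7"): (0.5, 0.7),
    ("T2", "T6", "T7"): (0.3, 0.2),
}

def dominates(a, b):
    """True if a is at least as good as b in P and C and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def non_dominated(candidates):
    """Keep the answers that no other answer dominates."""
    return [
        name
        for name, pc in candidates.items()
        if not any(dominates(other, pc) for other in candidates.values())
    ]

result = non_dominated(candidates)
print(result)
```

The pairwise comparison is quadratic in the number of candidates, which is fine for a sketch; generating the candidates and their confidences efficiently is listed as future work.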

Page 14: Top-k Queries on Uncertain Data

Future Work

Formulate the confidence function
Find an algorithm to generate the result set
Try to calculate the confidence in an efficient way
Carry out an empirical study on datasets

Page 15: Top-k Queries on Uncertain Data

Thank you!