TOWARD PRACTICAL QUERY PRICING WITH QUERYMARKET
Paraschos Koutris, Prasang Upadhyaya, Magdalena Balazinska, Bill Howe, Dan Suciu
University of Washington
SIGMOD 2013



MOTIVATION

• Data is increasingly sold and bought on the web
• Websites that sell data:
  – Xignite (financial)
  – Gnip (social)
• Data marketplace services:
  – Windows Azure Marketplace
  – Infochimps
  – Factual
  – DataMarket



A PRICING SCENARIO (1)


English-German dictionary T

english   german
thanks    Danke
car       Auto
day       Tag
road      Strasse
road      Weg
…         …

PRICING SCHEMES

• Sell the whole table T for a fixed price
  – Q: translate only the word “thanks”
  – The user pays for redundant information
• Price per output tuple
  – Q: Does the word “thanks” translate to “Auto”?
  – The answer is empty, so it costs nothing, yet an empty result still carries information


A PRICING SCENARIO (2)


English-German dictionary T and word frequency stats WF

word       frequency   genre     rank
rock       0.025       music     20
pop        0.030       music     10
database   0.001       science   1453
…          …           …         …

english   german
thanks    Danke
car       Auto
day       Tag
road      Strasse
road      Weg
…         …

• Current systems do not sell queries that combine datasets
• Queries issued by a user may have overlapping content:
  – Q1: Return all German translations of the top 10 words in the genre “music”
  – Q2: Return all German translations of the top 20 words in the genre “music”


HOW TO PRICE DATA


English-German dictionary T

english   german
thanks    Danke
car       Auto
day       Tag
road      Strasse
road      Weg
…         …

Price points:
• p(σ[T.english=‘thanks’]) = $0.10
• p(σ[T.english=‘car’]) = $0.10
• p(σ[T.english=‘day’]) = $0.10
• p(σ[T.english=‘road’]) = $0.15
• p(σ[T.english=‘cat’]) = $0.05
• p(σ[T.german=‘Auto’]) = $0.50

Price points are:
• selection queries on a single table
• defined for every possible value (ColA) of some attribute A
• allowed to select on values that are not in the active domain
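A minimal sketch of price points as data, assuming a map from (attribute, value) to price. The dictionary rows and prices come from the slide; the column domain COL_ENGLISH and the helpers select and covers_column are hypothetical, since the slide does not list ColA. It shows that a price point is a selection view, that it must exist for every value of the column, and that a point on a value outside the rows still has a price even though its answer is empty.

```python
# Rows of the dictionary T shown on the slide (the remaining rows are elided there)
T = [("thanks", "Danke"), ("car", "Auto"), ("day", "Tag"),
     ("road", "Strasse"), ("road", "Weg")]

# Hypothetical column domain for T.english (the slide does not list it)
COL_ENGLISH = ["thanks", "car", "day", "road", "cat"]

# Price points from the slide: (attribute, value) -> price in dollars
PRICE_POINTS = {
    ("english", "thanks"): 0.10,
    ("english", "car"): 0.10,
    ("english", "day"): 0.10,
    ("english", "road"): 0.15,
    ("english", "cat"): 0.05,
    ("german", "Auto"): 0.50,
}

COLUMN_INDEX = {"english": 0, "german": 1}


def select(attr, value):
    """Answer of the selection view sigma[T.attr = value]."""
    i = COLUMN_INDEX[attr]
    return [row for row in T if row[i] == value]


def covers_column(attr, domain):
    """True if attr has a price point for every value of its column domain."""
    return all((attr, v) in PRICE_POINTS for v in domain)


print(select("english", "road"))  # [('road', 'Strasse'), ('road', 'Weg')]
print(select("english", "cat"), PRICE_POINTS[("english", "cat")])  # [] 0.05
print(covers_column("english", COL_ENGLISH))  # True
```

Buying σ[T.english=‘cat’] costs $0.05 and returns no rows here, which is exactly the kind of empty-but-informative answer that per-tuple pricing would give away for free.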


QUERYMARKET: CONTRIBUTIONS

• A formal pricing framework where:
  – sellers specify a set of price points as selection queries
  – buyers can purchase any query on the database
  – the system automatically computes the price of the query
• Efficient price computation for a large class of SQL queries
• The functionality a marketplace needs:
  – pricing queries with overlapping information content
  – database updates
  – revenue sharing among different sellers


OUTLINE

1. The Pricing Framework
2. Computing the Price
3. Query History
4. Revenue Sharing


THE PRICING FRAMEWORK

• The seller defines price points (view-price pairs): $S = \{(V_1,p_1), (V_2,p_2), \ldots\}$
• A buyer can buy any query Q
• The system computes the price $\mathrm{price}^S_D(Q)$

[Figure: the seller supplies price points to the pricing system, which runs over database D; the buyer asks for Q(D) and is quoted $\mathrm{price}^S_D(Q)$]

[Koutris et al., PODS 2012]


PROPERTIES OF PRICES

Arbitrage-free: given D, $\mathrm{price}_D(Q)$ is arbitrage-free if, for all views $V_1, \ldots, V_k$ that determine Q,

$$\mathrm{price}_D(Q) \le \mathrm{price}_D(V_1) + \cdots + \mathrm{price}_D(V_k)$$

Discount-free: $\mathrm{price}_D(Q)$ must not offer discounts beyond the explicit price points defined by the seller

We say that the views $V_1, \ldots, V_k$ determine Q if one can compute Q(D) from $V_1(D), \ldots, V_k(D)$ without access to D
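One way to make "determine" precise (a sketch, assuming the instance-based reading in which D′ ranges over every database built from the column domains) is:

$$\{V_1,\ldots,V_k\} \text{ determines } Q \text{ given } D \iff \forall D' :\ \big(V_i(D') = V_i(D) \text{ for } i = 1,\ldots,k\big) \Rightarrow Q(D') = Q(D)$$

In words: no database D′ that the purchased views cannot tell apart from D gives a different answer to Q.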


THE PRICING FORMULA


Arbitrage-price:
• The price of the cheapest set of views from the price points S that determines the query Q
• It is the unique price that is arbitrage-free, discount-free, and agrees with the price points
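Written as a formula, with S the set of view-price pairs defined by the seller:

$$\mathrm{price}^S_D(Q) = \min\Big\{ \sum_{(V,p) \in C} p \;:\; C \subseteq S \text{ and the views of } C \text{ determine } Q \text{ given } D \Big\}$$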

Example: Q(y) = R(x), S(x,y)

Table R:
A
a1

Table S:
A    B
a1   b
a2   b

ColA = {a1, a2, a3}, ColB = {b}
Prices: $1 per value of R.A, $2 per value of S.A, $3 per value of S.B

• {σ[R.A=a1], σ[S.B=b]} determines Q: cost = 1 + 3 = 4
• {σ[R.A=a1], σ[S.A=a1]} also determines Q: cost = 1 + 2 = 3 (the cheapest possible)
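On this toy instance the arbitrage price can be checked by brute force. The sketch below is an illustration, not QueryMarket's method (which builds an ILP): it assumes instance-based determinacy, enumerating every database over ColA and ColB that agrees with D on the purchased views, and it uses the per-value prices $1 (R.A), $2 (S.A), $3 (S.B). The function names are hypothetical, and the search is exponential, so it only suits tiny examples.

```python
from itertools import chain, combinations, product

COL_A = ["a1", "a2", "a3"]
COL_B = ["b"]

# The instance D from the slide: R = {a1}, S = {(a1,b), (a2,b)}
R = frozenset({"a1"})
S = frozenset({("a1", "b"), ("a2", "b")})

# Price points (attribute, value) -> price: $1 per R.A, $2 per S.A, $3 per S.B value
PRICE_POINTS = {("R.A", a): 1 for a in COL_A}
PRICE_POINTS.update({("S.A", a): 2 for a in COL_A})
PRICE_POINTS.update({("S.B", b): 3 for b in COL_B})


def query(r, s):
    """Q(y) = R(x), S(x, y)."""
    return frozenset(y for (x, y) in s if x in r)


def view(point, r, s):
    """Answer of the selection view sigma[attr = value] for one price point."""
    attr, v = point
    if attr == "R.A":
        return frozenset(x for x in r if x == v)
    if attr == "S.A":
        return frozenset(t for t in s if t[0] == v)
    return frozenset(t for t in s if t[1] == v)  # attr == "S.B"


def powerset(items):
    items = list(items)
    return chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))


# Every database over the column domains (8 choices of R times 8 choices of S)
ALL_DBS = [(frozenset(r), frozenset(s))
           for r in powerset(COL_A)
           for s in powerset(product(COL_A, COL_B))]


def determines(points):
    """True if every database that agrees with D on these views gives the same Q."""
    observed = [view(p, R, S) for p in points]
    target = query(R, S)
    return all(query(r2, s2) == target
               for r2, s2 in ALL_DBS
               if [view(p, r2, s2) for p in points] == observed)


def arbitrage_price():
    """Cheapest set of price points that determines Q (exhaustive search)."""
    best = None
    for subset in powerset(PRICE_POINTS):
        cost = sum(PRICE_POINTS[p] for p in subset)
        if (best is None or cost < best[0]) and determines(subset):
            best = (cost, subset)
    return best


print(determines([("R.A", "a1"), ("S.B", "b")]))  # True (cost 4)
print(arbitrage_price())  # (3, (('R.A', 'a1'), ('S.A', 'a1')))
```

Both view sets listed on the slide pass the check, and the exhaustive search confirms that no set cheaper than $3 determines Q.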


OUTLINE

1. The Pricing Framework
2. Computing the Price
3. Query History
4. Revenue Sharing


COMPUTING THE PRICE

• Computing the arbitrage price is coNP-complete, even for SELECT-PROJECT-JOIN queries
• For some queries the price can be computed efficiently:
  – selections
  – joins without projection
• We formulate pricing as an Integer Linear Program (ILP) and solve it with fast ILP solvers (e.g., GLPK, CPLEX)
• Classes of queries supported:
  – selections, projections, joins
  – unions
  – user-defined functions (UDFs)
  – bundles of queries


ILP CONSTRUCTION (1)

• Price the query Q(x,y) = R(x), S(x,y)
• Introduce a 0/1 variable x[attribute, value] for each price point: x[R.A, a2], x[S.A, a1], x[S.B, b], …

Table R:
A
a1

Table S:
A    B
a1   b
a2   b

ColA = {a1, a2, a3}, ColB = {b}
Prices: $1 per value of R.A, $2 per value of S.A, $3 per value of S.B


ILP CONSTRUCTION (2)

• Minimize (the objective is independent of the query):
  price = x[R.A,a1] + x[R.A,a2] + x[R.A,a3] + 2x[S.A,a1] + 2x[S.A,a2] + 2x[S.A,a3] + 3x[S.B,b]
• Constraints:
  – (a1,b) in Q: x[R.A,a1] ≥ 1 and x[S.A,a1] + x[S.B,b] ≥ 1
  – (a2,b) not in Q: x[R.A,a2] ≥ 1
  – (a3,b) not in Q: x[R.A,a3] + x[S.A,a3] + x[S.B,b] ≥ 1

Table R:
A
a1

Table S:
A    B
a1   b
a2   b

ColA = {a1, a2, a3}, ColB = {b}
Query: Q(x,y) = R(x), S(x,y)
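This ILP is small enough to solve by enumerating all 0/1 assignments. The sketch below does that in place of GLPK or CPLEX; the enumeration is only a stand-in for a real solver, and the names are hypothetical. Each constraint is written as a set of price-point variables whose sum must be at least 1, matching the constraints listed above.

```python
from itertools import product

COL_A = ["a1", "a2", "a3"]

# Objective coefficients: $1 per R.A value, $2 per S.A value, $3 for S.B = b
PRICE = {("R.A", a): 1 for a in COL_A}
PRICE.update({("S.A", a): 2 for a in COL_A})
PRICE[("S.B", "b")] = 3

# Constraints for Q(x,y) = R(x), S(x,y): each set needs at least one chosen variable
CONSTRAINTS = [
    {("R.A", "a1")},                               # (a1,b) in Q: a1 is in R
    {("S.A", "a1"), ("S.B", "b")},                 # (a1,b) in Q: (a1,b) is in S
    {("R.A", "a2")},                               # (a2,b) not in Q: a2 is missing from R
    {("R.A", "a3"), ("S.A", "a3"), ("S.B", "b")},  # (a3,b) not in Q
]


def solve_01_ilp(price, constraints):
    """Minimize sum(price[v] * x[v]) over x in {0,1}^n s.t. each constraint sums to >= 1."""
    variables = list(price)
    best = None
    for bits in product((0, 1), repeat=len(variables)):
        x = dict(zip(variables, bits))
        if all(sum(x[v] for v in c) >= 1 for c in constraints):
            cost = sum(price[v] * x[v] for v in variables)
            if best is None or cost < best[0]:
                best = (cost, {v for v in variables if x[v]})
    return best


cost, chosen = solve_01_ilp(PRICE, CONSTRAINTS)
print(cost)  # 5
```

The optimum is $5, and two purchases reach it: {σ[R.A=a1], σ[R.A=a2], σ[S.B=b]} and {σ[R.A=a1], σ[R.A=a2], σ[R.A=a3], σ[S.A=a1]}.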


ILP CONSTRUCTION (3)

• Projection: Q(y) = R(x), S(x,y)
• Introduce a new variable for each tuple in Qfull, the query without the projection: z1 for (a1,b), z2 for (a2,b)
• Constraints:
  – (a1,b) in Qfull: x[R.A,a1] ≥ z1 and x[S.A,a1] + x[S.B,b] ≥ z1
  – (a2,b) in Qfull: x[R.A,a2] ≥ z2 and x[S.A,a2] + x[S.B,b] ≥ z2
  – (b) in Q: z1 + z2 ≥ 1

Table R:
A
a1
a2

Table S:
A    B
a1   b
a2   b

ColA = {a1, a2, a3}, ColB = {b}
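The projection constraints follow the same pattern with auxiliary variables z1 and z2, which carry no cost in the objective. A minimal sketch, again using exhaustive 0/1 enumeration as a stand-in for a real ILP solver, with hypothetical names and the same per-value prices as the slide's example ($1 R.A, $2 S.A, $3 S.B). Each constraint is a coefficient map plus a right-hand side, read as sum(coefficient × variable) ≥ rhs.

```python
from itertools import product

COL_A = ["a1", "a2", "a3"]

# Objective coefficients: $1 per R.A value, $2 per S.A value, $3 for S.B = b
PRICE = {("R.A", a): 1 for a in COL_A}
PRICE.update({("S.A", a): 2 for a in COL_A})
PRICE[("S.B", "b")] = 3
AUX = ["z1", "z2"]  # one variable per tuple of Qfull: z1 for (a1,b), z2 for (a2,b)

# Each constraint: ({variable: coefficient}, rhs), meaning sum(coef * var) >= rhs
CONSTRAINTS = [
    ({("R.A", "a1"): 1, "z1": -1}, 0),                   # x[R.A,a1] >= z1
    ({("S.A", "a1"): 1, ("S.B", "b"): 1, "z1": -1}, 0),  # x[S.A,a1] + x[S.B,b] >= z1
    ({("R.A", "a2"): 1, "z2": -1}, 0),                   # x[R.A,a2] >= z2
    ({("S.A", "a2"): 1, ("S.B", "b"): 1, "z2": -1}, 0),  # x[S.A,a2] + x[S.B,b] >= z2
    ({"z1": 1, "z2": 1}, 1),                             # (b) in Q: z1 + z2 >= 1
]


def solve_01_ilp(price, aux, constraints):
    """Minimize the cost of the chosen price points; aux variables cost nothing."""
    variables = list(price) + aux
    best = None
    for bits in product((0, 1), repeat=len(variables)):
        val = dict(zip(variables, bits))
        if all(sum(c * val[v] for v, c in lhs.items()) >= rhs for lhs, rhs in constraints):
            cost = sum(p * val[v] for v, p in price.items())
            if best is None or cost < best[0]:
                best = (cost, {v for v in price if val[v]})
    return best


cost, chosen = solve_01_ilp(PRICE, AUX, CONSTRAINTS)
print(cost)  # 3
```

Setting either z variable to 1 forces a full witness for b, so the optimum is $3, reached by {σ[R.A=a1], σ[S.A=a1]} or by {σ[R.A=a2], σ[S.A=a2]}.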


QUERYMARKET SYSTEM

• Runs on top of any SQL database
• Information stored in the database:
  – price points are stored in price tables
  – an index table keeps track of the price tables
• The dataset:
  – English-German translation: $T_{en,gr}(w, w')$
  – English-French translation: $T_{en,fr}(w, w')$
  – UDF to find hashtags: IsHashtag(w)
  – Word frequency stats: WF(w, genre, frequency, rank)
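To illustrate the storage layout, the sketch below keeps one price table per priced attribute and an index table that maps each (relation, attribute) pair to its price table, using Python's built-in sqlite3 module. The schema is an assumption, since the slide does not give it: the table and column names (price_index, price_T_english, rel, attr, value, price) are hypothetical, and the prices are those listed for T.english in the dictionary example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical index table: which price table holds the price points of each attribute
conn.execute("CREATE TABLE price_index (rel TEXT, attr TEXT, price_table TEXT)")
# Hypothetical price table for attribute T.english: one row per value of its column
conn.execute("CREATE TABLE price_T_english (value TEXT PRIMARY KEY, price REAL)")

conn.execute("INSERT INTO price_index VALUES ('T', 'english', 'price_T_english')")
conn.executemany(
    "INSERT INTO price_T_english VALUES (?, ?)",
    [("thanks", 0.10), ("car", 0.10), ("day", 0.10), ("road", 0.15), ("cat", 0.05)],
)


def price_point(rel, attr, value):
    """Look up p(sigma[rel.attr = value]); returns None if no price point exists."""
    row = conn.execute(
        "SELECT price_table FROM price_index WHERE rel = ? AND attr = ?", (rel, attr)
    ).fetchone()
    if row is None:
        return None
    # Table names cannot be bound as parameters; this one comes from the trusted index table
    hit = conn.execute(f"SELECT price FROM {row[0]} WHERE value = ?", (value,)).fetchone()
    return hit[0] if hit else None


print(price_point("T", "english", "road"))  # 0.15
```

Keeping the prices in ordinary tables lets the pricing layer read them with plain SQL, which fits the point that the system runs on top of any SQL database.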


PRICE COMPUTATION (1)

• Small dataset: columns have size $\sim 10^2$

[Figure: ILP construction time and ILP solving time in seconds (axis 0–10) for queries 1–15, grouped as selections, 2-way joins without projections, 2-way joins with projections, and a 3-way join]


PRICE COMPUTATION (2)

• Larger dataset: columns have size $\sim 10^3$

[Figure: ILP construction time and ILP solving time in seconds (axis 0–80) for queries 1–15, grouped as selections, 2-way joins without projections, 2-way joins with projections, and a 3-way join]


OUTLINE

1. The Pricing Framework
2. Computing the Price
3. Query History
4. Revenue Sharing


QUERY HISTORY

• A user asks a sequence of queries $Q = Q_1, Q_2, \ldots, Q_k$ over time, with varying information overlap
• Experiment with 30 selection/join queries
• Pricing strategies:
  – Oblivious pricing: each query is priced independently
  – Bundle pricing: each query $Q_i$ is priced $p(Q_1,\ldots,Q_i) - p(Q_1,\ldots,Q_{i-1})$
  – View pricing: once a query is purchased, the views bought for it are free for later queries

[Figure: High overlap. Price in dollars (axis 0–18) of each of the 30 queries under oblivious, bundle, and view pricing]
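The three strategies can be compared on a toy sequence. This sketch is hypothetical throughout: it reuses the price points of the ILP example ($1 per R.A value, $2 per S.A value, $3 for S.B = b), writes each query as set-cover constraints over those price points, and solves each ILP by exhaustive enumeration rather than with a real solver. Q1 is Q(x,y) = R(x), S(x,y) on R = {a1}, S = {(a1,b), (a2,b)}; Q2, which asks for all of R, is an assumed follow-up query and not one from the experiment.

```python
from itertools import product

COL_A = ["a1", "a2", "a3"]
PRICE = {("R.A", a): 1 for a in COL_A}
PRICE.update({("S.A", a): 2 for a in COL_A})
PRICE[("S.B", "b")] = 3

# Each query is a list of constraints; each constraint needs at least one chosen price point
Q1 = [{("R.A", "a1")}, {("S.A", "a1"), ("S.B", "b")}, {("R.A", "a2")},
      {("R.A", "a3"), ("S.A", "a3"), ("S.B", "b")}]
Q2 = [{("R.A", a)} for a in COL_A]  # hypothetical follow-up: buy all of R


def cheapest_cover(price, constraints):
    """Exhaustive 0/1 search: cheapest set of price points meeting every constraint."""
    variables = list(price)
    best = None
    for bits in product((0, 1), repeat=len(variables)):
        chosen = {v for v, bit in zip(variables, bits) if bit}
        if all(c & chosen for c in constraints):
            cost = sum(price[v] for v in chosen)
            if best is None or cost < best[0]:
                best = (cost, chosen)
    return best


def oblivious(queries):
    """Each query is priced on its own."""
    return [cheapest_cover(PRICE, q)[0] for q in queries]


def bundle(queries):
    """Q_i costs p(Q_1..Q_i) - p(Q_1..Q_{i-1})."""
    prices, history, prev = [], [], 0
    for q in queries:
        history = history + q
        total = cheapest_cover(PRICE, history)[0]
        prices.append(total - prev)
        prev = total
    return prices


def view_pricing(queries):
    """Price points bought for earlier queries are free for later ones."""
    owned, prices = set(), []
    for q in queries:
        discounted = {v: 0 if v in owned else p for v, p in PRICE.items()}
        cost, chosen = cheapest_cover(discounted, q)
        prices.append(cost)
        owned |= chosen
    return prices


for strategy in (oblivious, bundle, view_pricing):
    print(strategy.__name__, strategy([Q1, Q2]))
# oblivious [5, 3]
# bundle [5, 0]
# view_pricing [5, 1]
```

The totals are $8 (oblivious), $5 (bundle), and $6 (view). Q1 has two optimal $5 purchases, so the view-pricing result depends on which one is bought; this enumeration returns {σ[R.A=a1], σ[R.A=a2], σ[S.B=b]}, which leaves only σ[R.A=a3] to pay for in Q2.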


QUERY HISTORY (2)

[Figure: Moderate overlap. Price in dollars (axis 0–25) of each of the 30 queries under oblivious, view, and bundle pricing]

[Figure: Weak overlap. Price in dollars (axis 0–16) of each of the 30 queries under oblivious, bundle, and view pricing]


VIEW PRICING

• View pricing is our proposed strategy:
  – computationally efficient
  – low storage overhead
  – close to the optimal (bundle) price
• View pricing can be used for dynamic databases: if a view V is purchased and later updated, the user pays only an update price


OUTLINE

1. The Pricing Framework
2. Computing the Price
3. Query History
4. Revenue Sharing


REVENUE SHARING

• How is the revenue shared among sellers when several datasets contribute to the answer?
• What if the cheapest set of views that determines a query is not unique?
• Example:
  – Q(‘sigmod13’) = isHashtag(‘sigmod13’), isNoun(‘sigmod13’)
  – Seller 1 prices isHashtag at $1 per entry; seller 2 prices isNoun at $1 per entry
  – If both isHashtag(‘sigmod13’) and isNoun(‘sigmod13’) are false, purchasing either entry answers Q


REVENUE SHARING: SOLUTION

• For a seller s, share(s, Q) is the maximum revenue of s over all minimum-cost sets of price points that determine Q
• share(s, Q) can be computed in our framework
• Solution: split price(Q) among the sellers in proportion to their shares
• Example:
  – both shares are $1
  – since the shares are equal, each seller receives $0.5
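A minimal sketch of the sharing rule on the isHashtag/isNoun example, assuming that each price point belongs to one seller and that, because Q is false, the single constraint "buy at least one of the two entries" captures determinacy. The minimum-cost sets are found by exhaustive enumeration rather than by the framework's own computation, and all names are hypothetical.

```python
from itertools import product

# Price points keyed by (seller, entry); both entries are false, so either one answers Q
PRICE = {
    ("seller1", "isHashtag('sigmod13')"): 1.0,
    ("seller2", "isNoun('sigmod13')"): 1.0,
}
CONSTRAINTS = [set(PRICE)]  # at least one of the two entries must be bought


def min_cost_covers(price, constraints):
    """Return the minimum cost and every price-point set that achieves it."""
    variables = list(price)
    best, covers = None, []
    for bits in product((0, 1), repeat=len(variables)):
        chosen = {v for v, bit in zip(variables, bits) if bit}
        if not all(c & chosen for c in constraints):
            continue
        cost = sum(price[v] for v in chosen)
        if best is None or cost < best:
            best, covers = cost, [chosen]
        elif cost == best:
            covers.append(chosen)
    return best, covers


def revenue_split(price, constraints):
    """Split price(Q) among sellers in proportion to share(s, Q)."""
    total, covers = min_cost_covers(price, constraints)
    sellers = sorted({s for s, _ in price})
    # share(s, Q): the most s can earn in any minimum-cost set
    share = {s: max(sum(price[v] for v in c if v[0] == s) for c in covers) for s in sellers}
    norm = sum(share.values())
    return {s: total * share[s] / norm if norm else 0.0 for s in sellers}


print(revenue_split(PRICE, CONSTRAINTS))  # {'seller1': 0.5, 'seller2': 0.5}
```

Each seller earns $1 in one of the two minimum-cost sets, so both shares are $1 and the $1 price is split evenly.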


CONCLUSIONS

• QueryMarket: the first system that supports pricing a large class of SQL queries within a formal framework
• We presented solutions to the requirements of a real-world marketplace
• Future work includes:
  – scaling the price computation (bucketization)
  – full SQL support (aggregates, negation)
  – query answering under a limited budget


Thank you !
