TOWARD PRACTICAL QUERY PRICING WITH QUERYMARKET
Paraschos Koutris, Prasang Upadhyaya, Magdalena Balazinska, Bill Howe, Dan Suciu
University of Washington
SIGMOD 2013
MOTIVATION
• Data is increasingly sold and bought on the web
• Websites that sell data:
– Xignite (financial)
– Gnip (social)
• Data marketplace services:
– Windows Azure Marketplace
– Infochimps
– Factual
– DataMarket
A PRICING SCENARIO (1)
English-German dictionary T

english   german
thanks    Danke
car       Auto
day       Tag
road      Strasse
road      Weg
…         …

PRICING SCHEMES
Sell the whole table T for a fixed price
• Q: translate only the word “thanks”
• The user pays for redundant information
Price per output tuple
• Q: Does the word “thanks” translate to “Auto”?
• An empty result still carries information
A PRICING SCENARIO (2)
English-German dictionary T

english   german
thanks    Danke
car       Auto
day       Tag
road      Strasse
road      Weg
…         …

Word Frequency Stats WF

word       frequency   genre     rank
rock       0.025       music     20
pop        0.030       music     10
database   0.001       science   1453
…          …           …         …

• Current systems do not sell queries that combine datasets
• Queries issued by a user may have overlapping content
Q1: Return all translations to German of top 10 words in the genre “music”
Q2: Return all translations to German of top 20 words in the genre “music”
HOW TO PRICE DATA
English-German dictionary T

english   german
thanks    Danke
car       Auto
day       Tag
road      Strasse
road      Weg
…         …

Price points
• selection queries on a single table
• exhaust the possible values (ColA) of some attribute A
• may select on values not in the active domain

p(σ[T.english=‘thanks’]) = $0.1
p(σ[T.english=‘day’]) = $0.1
p(σ[T.english=‘road’]) = $0.15
p(σ[T.english=‘cat’]) = $0.05
p(σ[T.english=‘car’]) = $0.1
p(σ[T.german=‘Auto’]) = $0.5
…
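One way to picture the price points is as a lookup table keyed by (table, attribute, value). The following is a minimal Python sketch under that assumption, not QueryMarket's actual storage scheme. The names `PRICE_POINTS` and `selection_price` are hypothetical, and only the price points listed on the slide are included.

```python
from decimal import Decimal

# Price points copied from the slide, keyed by (table, attribute, value).
# The slide elides the remaining entries, so they are left out here too.
PRICE_POINTS = {
    ("T", "english", "thanks"): Decimal("0.10"),
    ("T", "english", "day"): Decimal("0.10"),
    ("T", "english", "road"): Decimal("0.15"),
    ("T", "english", "cat"): Decimal("0.05"),  # 'cat' is not among the rows shown
    ("T", "english", "car"): Decimal("0.10"),
    ("T", "german", "Auto"): Decimal("0.50"),
}


def selection_price(table, attribute, value):
    """Return the listed price of the selection attribute = value, or None if unlisted."""
    return PRICE_POINTS.get((table, attribute, value))


# The tuple (car, Auto) is returned by a selection on either column,
# so a buyer who only wants that tuple can pick the cheaper selection.
by_english = selection_price("T", "english", "car")
by_german = selection_price("T", "german", "Auto")
cheapest = min(by_english, by_german)
print(cheapest)
```

Prices are stored as `Decimal` so that dollar amounts add up exactly.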
QUERYMARKET: CONTRIBUTIONS
• A formal pricing framework where:
– sellers specify a set of price points as selection queries
– buyers can purchase any query on the database
– the system automatically computes the price of the query
• Support efficient computation of prices for a large class of SQL queries
• Support the necessary functionality for a marketplace:
– Pricing queries with overlapping information content
– Database updates
– Revenue sharing among different sellers
OUTLINE
1. The Pricing Framework
2. Computing the Price
3. Query History
4. Revenue Sharing
THE PRICING FRAMEWORK
• The seller defines price points (view-price pairs): S = { (V1,p1), (V2,p2), … }
• A buyer can buy any query Q
• The system will compute price^D_S(Q)
[Figure: the seller supplies the price points; the buyer asks for Q(D); the pricing system, running over database D, returns price^D_S(Q)]
[Koutris et al., PODS 2012]
PROPERTIES OF PRICES
Arbitrage-free: Given D, price^D(Q) is arbitrage-free if for all views V1, …, Vk that determine Q:
    price^D(Q) ≤ price^D(V1) + … + price^D(Vk)
Discount-free: price^D(Q) must not offer additional discounts except for the explicit price points defined by the seller
We say that the views V1,…, Vk determine Q if one can compute Q(D) from V1(D),…, Vk(D) without access to D
THE PRICING FORMULA
Arbitrage-Price:
• The price of the cheapest set of views from price points S that determine the query Q
• unique + arbitrage-free + discount-free + agrees with price points

Example: Q(y) = R(x), S(x,y)

Table R     Table S
A           A     B
a1          a1    b
            a2    b

ColA = { a1, a2, a3 }
ColB = { b }
Price per selection: R.A = $1, S.A = $2, S.B = $3

• {σ[R.A=a1], σ[S.B=b]} determines Q
– cost = 1 + 3 = 4
• {σ[R.A=a1], σ[S.A=a1]} also determines Q
– cost = 1 + 2 = 3 (cheapest possible)
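The $3 price in this example can be reproduced by exhaustive search. The sketch below is a brute-force illustration and not the system's algorithm. It assumes the instance-based reading of "determine" (every database over the column domains that agrees with D on the purchased views must also agree on Q) and assumes that selections on R.A, S.A and S.B cost $1, $2 and $3 respectively, as the two cost sums on the slide imply. All function names are hypothetical.

```python
from itertools import chain, combinations, product

COL_A = ["a1", "a2", "a3"]
COL_B = ["b"]

# The instance D from the slide.
R = frozenset({"a1"})
S = frozenset({("a1", "b"), ("a2", "b")})

# Price points: (relation, attribute, value) -> price (assumed from the example).
PRICE_POINTS = {("R", "A", a): 1 for a in COL_A}
PRICE_POINTS.update({("S", "A", a): 2 for a in COL_A})
PRICE_POINTS.update({("S", "B", b): 3 for b in COL_B})


def powerset(items):
    items = list(items)
    return chain.from_iterable(combinations(items, n) for n in range(len(items) + 1))


def view(point, r, s):
    """Evaluate one selection view on the database (r, s)."""
    rel, attr, val = point
    if rel == "R":
        return frozenset(a for a in r if a == val)
    idx = 0 if attr == "A" else 1
    return frozenset(t for t in s if t[idx] == val)


def query(r, s):
    """Q(y) = R(x), S(x,y)."""
    return frozenset(y for (x, y) in s if x in r)


# Every database that can be built over the finite column domains.
ALL_DBS = [
    (frozenset(r), frozenset(s))
    for r in powerset(COL_A)
    for s in powerset(product(COL_A, COL_B))
]


def determines(points):
    """True if every database agreeing with D on the views also agrees on Q."""
    target = [view(p, R, S) for p in points]
    return all(
        query(r, s) == query(R, S)
        for r, s in ALL_DBS
        if [view(p, r, s) for p in points] == target
    )


def arbitrage_price():
    """Cheapest set of price points that determines Q, found by enumeration."""
    best_cost, best_set = None, None
    for points in powerset(PRICE_POINTS):
        cost = sum(PRICE_POINTS[p] for p in points)
        if (best_cost is None or cost < best_cost) and determines(points):
            best_cost, best_set = cost, set(points)
    return best_cost, best_set


price, views = arbitrage_price()
print(price, sorted(views))
```

The search visits all 128 subsets of the seven price points and all 64 candidate databases, which is only feasible for toy instances like this one.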
OUTLINE
1. The Pricing Framework
2. Computing the Price
3. Query History
4. Revenue Sharing
COMPUTING THE PRICE
• The problem of computing the arbitrage price even for SELECT-PROJECT-JOIN queries is coNP-complete
• For some queries, the price can be computed fast:
– Selections, joins w/o projection
• We describe pricing as an Integer Linear Program (ILP) and then use fast ILP solvers (e.g. GLPK, CPLEX)
• Classes of queries supported:
– Selections/Projections/Joins
– Unions
– User-Defined Functions (UDF)
– Bundles of queries
ILP CONSTRUCTION (1)
• Price the query Q(x,y) = R(x), S(x,y)
• Introduce a {0/1} variable x[attribute,value] for each price point: x[R.A, a2], x[S.A, a1], x[S.B, b], …

Table R     Table S
A           A     B
a1          a1    b
            a2    b

ColA = { a1, a2, a3 }
ColB = { b }
Price per selection: R.A = $1, S.A = $2, S.B = $3
ILP CONSTRUCTION (2)
Q(x,y) = R(x), S(x,y)

Table R     Table S
A           A     B
a1          a1    b
            a2    b

ColA = { a1, a2, a3 }
ColB = { b }

• Minimize (independent of the query):
price = x[R.A,a1] + x[R.A,a2] + x[R.A,a3] + 2x[S.A,a1] + 2x[S.A,a2] + 2x[S.A,a3] + 3x[S.B,b]
• Constraints:
– (a1,b) in Q: x[R.A,a1] ≥ 1 and x[S.A,a1] + x[S.B,b] ≥ 1
– (a2,b) not in Q: x[R.A,a2] ≥ 1
– (a3,b) not in Q: x[R.A,a3] + x[S.A,a3] + x[S.B,b] ≥ 1
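With only seven 0/1 variables, this program is small enough to solve by enumeration. The sketch below does that in plain Python as a stand-in for a real ILP solver such as GLPK or CPLEX. It encodes exactly the objective and the four constraints from the slide; the string labels for the variables are hypothetical.

```python
from itertools import product

# One 0/1 variable per price point; the value is its objective coefficient.
COST = {
    "R.A=a1": 1, "R.A=a2": 1, "R.A=a3": 1,
    "S.A=a1": 2, "S.A=a2": 2, "S.A=a3": 2,
    "S.B=b": 3,
}

# Each constraint lists the variables whose sum must be at least 1.
CONSTRAINTS = [
    ["R.A=a1"],                     # (a1,b) in Q, first constraint
    ["S.A=a1", "S.B=b"],            # (a1,b) in Q, second constraint
    ["R.A=a2"],                     # (a2,b) not in Q
    ["R.A=a3", "S.A=a3", "S.B=b"],  # (a3,b) not in Q
]


def solve():
    """Try all 0/1 assignments; return the minimum cost and every optimal assignment."""
    names = list(COST)
    best, optima = None, []
    for bits in product((0, 1), repeat=len(names)):
        x = dict(zip(names, bits))
        if not all(sum(x[v] for v in c) >= 1 for c in CONSTRAINTS):
            continue
        cost = sum(COST[v] * x[v] for v in names)
        if best is None or cost < best:
            best, optima = cost, [x]
        elif cost == best:
            optima.append(x)
    return best, optima


best, optima = solve()
chosen = sorted(sorted(v for v, bit in x.items() if bit) for x in optima)
print(best, chosen)
```

On this instance the minimum is reached by more than one assignment, which is the situation the revenue sharing part of the talk has to deal with.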
ILP CONSTRUCTION (3)
• Projection: Q(y) = R(x), S(x,y)

Table R     Table S
A           A     B
a1          a1    b
a2          a2    b

ColA = { a1, a2, a3 }
ColB = { b }

• Constraints (a new variable z for each tuple in Qfull):
– (a1,b) in Qfull: x[R.A,a1] ≥ z1 and x[S.A,a1] + x[S.B,b] ≥ z1
– (a2,b) in Qfull: x[R.A,a2] ≥ z2 and x[S.A,a2] + x[S.B,b] ≥ z2
– (b) in Q: z1 + z2 ≥ 1
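The effect of the auxiliary z variables can be checked on this instance by enumerating the x and z variables together. The sketch below encodes only the constraints shown on the slide, with the objective coefficients assumed to be the same $1, $2 and $3 prices as on the previous slides. It is a brute-force illustration, not a substitute for an ILP solver, and the variable labels are hypothetical.

```python
from itertools import product

# Objective coefficients of the price-point variables (assumed prices).
COST = {
    "R.A=a1": 1, "R.A=a2": 1, "R.A=a3": 1,
    "S.A=a1": 2, "S.A=a2": 2, "S.A=a3": 2,
    "S.B=b": 3,
}
Z_VARS = ["z1", "z2"]  # one per tuple of Qfull: (a1,b) and (a2,b)

# (left-hand variables, z variable): sum of the left-hand side must be >= z.
LINKS = [
    (["R.A=a1"], "z1"),
    (["S.A=a1", "S.B=b"], "z1"),
    (["R.A=a2"], "z2"),
    (["S.A=a2", "S.B=b"], "z2"),
]


def feasible(v):
    """Check the linking constraints and the output constraint z1 + z2 >= 1."""
    linked = all(sum(v[n] for n in lhs) >= v[z] for lhs, z in LINKS)
    return linked and v["z1"] + v["z2"] >= 1


def solve():
    """Enumerate all assignments of x and z; the z variables carry no cost."""
    names = list(COST) + Z_VARS
    best_cost, best = None, None
    for bits in product((0, 1), repeat=len(names)):
        v = dict(zip(names, bits))
        if not feasible(v):
            continue
        cost = sum(COST[n] * v[n] for n in COST)
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, v
    return best_cost, best


best_cost, best = solve()
print(best_cost, sorted(n for n in COST if best[n]))
```

Setting a single z to 1 is enough, so the buyer pays for only one witness of the output tuple (b) rather than for every tuple of Qfull.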
QUERYMARKET SYSTEM
• Runs on top of any SQL database
• Information stored in the database:
– Price points are stored in price tables
– An index table keeps track of the price tables
• The dataset:
– English-German translation: T_en,gr(w, w’)
– English-French translation: T_en,fr(w, w’)
– UDF to find hashtags: IsHashtag(w)
– Word frequency stats: WF(w, genre, frequency, rank)
PRICE COMPUTATION (1)
• Small dataset where columns have size ~ 10^2
[Figure: ILP construction time and ILP solving time, in seconds (axis 0 to 10), for queries 1 to 15, grouped as selections, 2-way joins w/o projections, 2-way joins with projections, and 3-way join]
PRICE COMPUTATION (2)
• Larger dataset where columns have size ~ 10^3
[Figure: ILP construction time and ILP solving time, in seconds (axis 0 to 80), for queries 1 to 15, grouped as selections, 2-way joins w/o projections, 2-way joins with projections, and 3-way join]
OUTLINE
1. The Pricing Framework
2. Computing the Price
3. Query History
4. Revenue Sharing
QUERY HISTORY
• A user asks a sequence of queries Q = Q1, Q2, …, Qk over time, with varying information overlap
• Experiment with 30 selection/join queries
[Figure: High Overlap. Query price in dollars (axis 0 to 18) for queries 1 to 30 under the three pricing strategies]
Oblivious pricing: each query priced independently
Bundle pricing: each query Q_i is priced at p(Q_1,…,Q_i) − p(Q_1,…,Q_{i−1})
View pricing: when a query is purchased, the purchased views are free for later queries
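The three strategies can be compared on a toy history. The sketch below is an illustration under strong simplifying assumptions: each query comes with an explicit list of alternative view sets that determine it, and the views `X` and `Y`, their prices, and the three-query history are all made up. In the real system each price would come from a price computation rather than from a hand-written list.

```python
from itertools import product

VIEW_PRICE = {"X": 2, "Y": 3}  # hypothetical views and prices

# For each query, the alternative view sets that determine it (toy data).
HISTORY = [
    [{"X"}, {"Y"}],  # Q1: either view suffices
    [{"Y"}],         # Q2
    [{"X"}],         # Q3
]


def cost(views):
    return sum(VIEW_PRICE[v] for v in views)


def oblivious(history):
    """Price every query independently of the others."""
    return [min(cost(alt) for alt in q) for q in history]


def bundle_price(queries):
    """Cheapest set of views that determines all the queries together."""
    return min(cost(set().union(*choice)) for choice in product(*queries))


def bundle(history):
    """Charge Q_i the increase p(Q_1..Q_i) - p(Q_1..Q_{i-1})."""
    return [
        bundle_price(history[: i + 1]) - bundle_price(history[:i])
        for i in range(len(history))
    ]


def view_pricing(history):
    """Views bought for earlier queries are free for later queries."""
    owned, prices = set(), []
    for q in history:
        best = min(q, key=lambda alt: cost(alt - owned))
        prices.append(cost(best - owned))
        owned |= best
    return prices


for name, strategy in [("oblivious", oblivious), ("bundle", bundle), ("view", view_pricing)]:
    prices = strategy(HISTORY)
    print(name, prices, sum(prices))
```

Bundle pricing has to re-optimize over the whole history at every step, while view pricing only needs to remember which views the user already owns.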
QUERY HISTORY (2)
[Figure: Moderate Overlap. Query price in dollars (axis 0 to 25) for queries 1 to 30 under oblivious pricing, view pricing, and bundle pricing]
[Figure: Weak Overlap. Query price in dollars (axis 0 to 16) for queries 1 to 30 under oblivious pricing, bundle pricing, and view pricing]
VIEW PRICING
• View Pricing is our proposed strategy:
– Computationally efficient
– Low storage overhead
– Close to optimal (bundle) price
• View Pricing can be used for dynamic databases: if view V is purchased at some point and then updated, the user pays only an update price
OUTLINE
1. The Pricing Framework
2. Computing the Price
3. Query History
4. Revenue Sharing
REVENUE SHARING
• How is the revenue shared between sellers if several datasets contribute to the answer?
• What if the cheapest set of views that determines a query is not unique?
• Example:
– Q(‘sigmod13’) = isHashtag(‘sigmod13’), isNoun(‘sigmod13’)
– Seller 1 charges $1 per entry for isHashtag; seller 2 charges $1 per entry for isNoun
– If both isHashtag and isNoun are false, purchasing either one of the two entries answers Q
REVENUE SHARING: SOLUTION
• For a seller s, share(s, Q) is the maximum revenue of s over all minimum-cost sets of price points that determine Q
• share(s, Q) can be computed in our framework
• Solution: split price(Q) among sellers proportionally to their shares
• Example:
– Both shares are $1
– The revenue of each seller will be $0.5, since their shares are equal
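The proportional split can be worked through for this example. The sketch below assumes that the minimum-cost sets are already known and passed in explicitly (in the framework they would come out of the price computation) and that seller 2 is the one selling the isNoun entries. The function names and seller labels are hypothetical.

```python
from fractions import Fraction

# Minimum-cost sets of price points that determine Q in the example.
# Each price point maps to (seller, price).
MIN_COST_SETS = [
    {"isHashtag('sigmod13')": ("seller1", 1)},
    {"isNoun('sigmod13')": ("seller2", 1)},
]


def shares(min_cost_sets):
    """share(s, Q): the most that seller s earns in any minimum-cost set."""
    result = {}
    for points in min_cost_sets:
        earned = {}
        for seller, price in points.values():
            earned[seller] = earned.get(seller, 0) + price
        for seller, amount in earned.items():
            result[seller] = max(result.get(seller, 0), amount)
    return result


def split_revenue(price_q, min_cost_sets):
    """Split price(Q) among the sellers in proportion to their shares."""
    s = shares(min_cost_sets)
    total = sum(s.values())
    return {seller: Fraction(price_q) * Fraction(share, total) for seller, share in s.items()}


price_q = 1  # every minimum-cost set in the example costs $1
revenue = split_revenue(price_q, MIN_COST_SETS)
print({seller: float(amount) for seller, amount in revenue.items()})
```

Exact fractions are used so that the sellers' revenues always add up to price(Q).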
CONCLUSIONS
• QueryMarket: the first system that supports pricing a large class of SQL queries within a formal framework
• We presented solutions to address the requirements of a real-world marketplace
• Future work includes:
– Scaling the price computation (bucketization)
– Full SQL support (aggregates, negation)
– Query answering under limited budget
Thank you!