QUERY-BASED DATA PRICING Paraschos Koutris Prasang Upadhyaya Magdalena Balazinska Bill Howe Dan Suciu University of Washington PODS 2012

Upload: gillian-may

Post on 12-Jan-2016


QUERY-BASED DATA PRICING

Paraschos Koutris, Prasang Upadhyaya, Magdalena Balazinska, Bill Howe, Dan Suciu

University of Washington
PODS 2012


MOTIVATION

• Data is increasingly sold and bought on the web
• Websites that sell data:
– AggData [www.aggdata.com]
– Xignite (financial data) [www.xignite.com]
– Gnip (social media) [www.gnip.com]
• Data marketplace services:
– Windows Azure Marketplace (100+ datasets) [datamarket.azure.com]
– Infochimps (15,000 datasets) [www.infochimps.com]

Query-based pricing customized for buyers


CURRENT PRICING (1)

• A fixed price for the whole dataset or for a specific set of views

• Example: CustomLists
– USA Business Database for $399
– Email addresses for $299
– Businesses in WA for $199
• Limitations:
– Restaurants in WA?
– Businesses in cities with population >100,000?


CURRENT PRICING (2)

• API Subscriptions (Azure Marketplace, Infochimps)
– Allow queries over the data
– Pay by number of transactions (page of results)


ISSUES WITH PRICING

• Buyers today need to buy a superset of the data they are interested in

• Sellers can’t easily anticipate all possible queries that buyers might ask

• Solution: we need a more flexible pricing scheme, parameterized by queries


OUTLINE

1. The Pricing Framework

2. The Pricing Formula

3. The Complexity of Pricing

4. Dichotomy and Algorithms for Selections


THE PRICING FRAMEWORK

• The seller defines price points (view-price pairs): S = { (V1,p1), (V2,p2), … }

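• A buyer can buy any query Q
• The system will compute price_D^S(Q)

[Figure: the seller supplies the price points (V1,p1), (V2,p2), … to the pricing system, which holds the database D; the buyer asks for Q(D) and is quoted price_D^S(Q).]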


INSTANCE-BASED DETERMINACY

Definition. V = V1,…,Vk determine Q given D, denoted D ⊢ V ↠ Q, if: for all D’, if V(D) = V(D’), then Q(D) = Q(D’)

Intuitively, “V1,…,Vk determine Q” means that Q(D) can be answered from V1(D),…,Vk(D) alone, without accessing the database instance D
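The definition can be checked directly on very small examples by enumerating every candidate instance D’. The sketch below is only an illustration of the definition, not the authors' procedure: it assumes a single binary relation over tiny, hypothetical column domains, and the names `all_instances`, `determines`, `sel_a`, `sel_b`, `full` and `has_y1` are made up for this example.

```python
from itertools import chain, combinations, product

def all_instances(domain_x, domain_y):
    """Every instance of one binary relation over the given finite column domains."""
    tuples = list(product(domain_x, domain_y))
    subsets = chain.from_iterable(
        combinations(tuples, r) for r in range(len(tuples) + 1))
    return [frozenset(s) for s in subsets]

def determines(views, query, db, instances):
    """D |- V ->> Q: every D' that agrees with D on all views agrees on the query."""
    target = [v(db) for v in views]
    return all(query(d) == query(db)
               for d in instances
               if [v(d) for v in views] == target)

# Hypothetical toy setting: columns X in {a, b} and Y in {1, 2}.
instances = all_instances(["a", "b"], [1, 2])
sel_a = lambda d: frozenset(t for t in d if t[0] == "a")   # selection X = a
sel_b = lambda d: frozenset(t for t in d if t[0] == "b")   # selection X = b
full = lambda d: d                                          # the whole relation
has_y1 = lambda d: any(t[1] == 1 for t in d)                # "is there a tuple with Y = 1?"

db = frozenset({("a", 1)})
print(determines([sel_a, sel_b], full, db, instances))     # both selections together
print(determines([sel_a], full, db, instances))            # one selection only
print(determines([sel_a], has_y1, db, instances))          # depends on the instance
print(determines([sel_a], has_y1, frozenset(), instances))
```

The last two calls use the same view and query but different instances, which is what makes the notion instance-based: on the first instance the view already contains a witness tuple, on the empty instance it does not. The enumeration is exponential in the size of the domains, so this only works for toy inputs.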


ARBITRAGE-FREE

Suppose V determines Q and price_D(Q) > price_D(V). Then, we can

1. buy V(D) for price_D(V)
2. compute Q(D) from V(D)
3. now we have answered Q at some price p < price_D(Q)

Axiom 1. Given D, the pricing function price_D(Q) is arbitrage-free if for all views V1, …, Vk and query Q where D ⊢ V1, …, Vk ↠ Q: price_D(Q) ≤ price_D(V1) + … + price_D(Vk)
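As a small illustration of Axiom 1, the sketch below scans a price table for arbitrage opportunities, given a list of known determinacy facts. The prices, the names "V" and "Q", and the helper `arbitrage_violations` are hypothetical; the determinacy facts are simply assumed to be given rather than computed.

```python
def arbitrage_violations(price, facts):
    """Return the (views, query) facts where the query costs more than the
    views that determine it, i.e. where Axiom 1 is violated."""
    return [(views, q) for views, q in facts
            if price[q] > sum(price[v] for v in views)]

# Assumed fact: the single view V determines the query Q on the current database.
facts = [(("V",), "Q")]

overpriced = {"V": 5, "Q": 8}   # a buyer would buy V and derive Q for 5 instead of 8
repaired = {"V": 5, "Q": 5}     # lowering price(Q) to price(V) removes the arbitrage

print(arbitrage_violations(overpriced, facts))
print(arbitrage_violations(repaired, facts))
```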


DISCOUNT-FREE

• The intuition is that the price points represent discounts that the seller offers relative to the price of the whole database

• A pricing function is discount-free if it is maximal

Axiom 2. The pricing function price_D(Q) should not offer any additional discounts beyond the explicit price points defined by the seller.


EXAMPLE: ORIGAMI DATABASE

Database S

Shape    Color    Picture
Swan     White    . . . . .
Swan     Yellow   . . . . .
Dragon   Yellow   . . . . .
Car      Yellow   . . . . .
Fish     White    . . . . .

Price points

View                                   Price
V1(x,y,z) :- S(x,y,z), x=‘Swan’        $2
V2(x,y,z) :- S(x,y,z), x=‘Dragon’      $2    (get all dragon origami for $2)
V3(x,y,z) :- S(x,y,z), x=‘Car’         $2
V4(x,y,z) :- S(x,y,z), x=‘Fish’        $2
W1(x,y,z) :- S(x,y,z), y=‘White’       $3
W2(x,y,z) :- S(x,y,z), y=‘Yellow’      $3
W3(x,y,z) :- S(x,y,z), y=‘Red’         $3    (get all red origami for $3)

What is the price of the entire database? Q(x,y,z) :- S(x,y,z)

Exhausts the active domain

V1, V2, V3, V4 determine Q: price(Q) ≤ $8
W1, W2, W3 determine Q: price(Q) ≤ $9
price(Q) = $8
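The $8 figure can be reproduced by brute force over all subsets of the seven price points. The sketch below rests on two assumptions that are not stated on the slide: the column domains are exactly the values named in the price points (four shapes, three colors), and a set of selections determines the whole relation exactly when every possible (shape, color) pair is returned by at least one purchased selection. The helper names are hypothetical.

```python
from itertools import combinations, product

shapes = ["Swan", "Dragon", "Car", "Fish"]   # assumed domain of Shape
colors = ["White", "Yellow", "Red"]          # assumed domain of Color

# Price points from the slide: $2 per shape selection, $3 per color selection.
price = {("Shape", s): 2 for s in shapes}
price.update({("Color", c): 3 for c in colors})

def covers_everything(views):
    """True if every possible (shape, color) pair is returned by some purchased view."""
    return all(("Shape", s) in views or ("Color", c) in views
               for s, c in product(shapes, colors))

def cheapest_cover():
    """Try every subset of price points and keep the cheapest one that covers all pairs."""
    names = list(price)
    best = None
    for r in range(len(names) + 1):
        for subset in combinations(names, r):
            if covers_everything(set(subset)):
                cost = sum(price[v] for v in subset)
                if best is None or cost < best[0]:
                    best = (cost, subset)
    return best

cost, chosen = cheapest_cover()
print(cost, chosen)
```

Under these assumptions the cheapest covering set is the four shape selections at a total of $8, matching the slide; mixing shape and color selections never helps here because leaving out one shape and one color leaves that pair uncovered.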


EXAMPLE: ORIGAMI DATABASE

What is the price of the full join? Q(x,y,z,u,v) :- R(x,u), S(x,y,z), T(y,v)

R

Shape    Instructions
Swan     fold, cut, fold…
Dragon   cut, fold, cut,…

S

Shape    Color    Picture
Swan     White    . . . . .
Swan     Yellow   . . . . .
Dragon   Yellow   . . . . .
Car      Yellow   . . . . .
Fish     White    . . . . .

T

Color    PaperSpecs
White    15g/100, $10
Black    20g/100, $15

Selection prices: p(σshape)=$99, p(σcolor)=$50; p(σcolor)=$5, p(σshape)=$2


OUTLINE

1. The Pricing Framework

2. The Pricing Formula

3. The Complexity of Pricing

4. Dichotomy and Algorithms for Selections


THE QUERY PRICING FORMULA

Given:
1. Price points S = {(V1,p1),…,(Vk,pk)}
2. Database instance D
3. Query Q

Compute: price_D^S(Q)

Properties: (a) arbitrage-free, (b) discount-free, (c) price_D^S(Vi) = pi

If such a pricing function exists, we say that the price points are consistent

Method:
• Consider all subsets of V = {V1,…,Vk} that determine Q
• Let C be the subset with the minimum price, Σi pi, for Vi in C
• Define p_D(Q) = Σi pi

Theorem.
(a) The price points are consistent iff p_D(Vi) = pi for every price point i = 1,…,k
(b) price_D^S(Q) = p_D(Q) is the unique arbitrage-free, discount-free pricing function that agrees with the price points
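The method (enumerate the subsets of views that determine Q and take the cheapest) can be written down almost literally. This is a minimal sketch under stated assumptions: determinacy is supplied as a black-box callable rather than computed, and the function name `query_price`, the view names and the prices are hypothetical.

```python
from itertools import combinations

def query_price(price_points, determines_query):
    """Minimum total price over all subsets of views that determine the query.

    price_points: dict mapping a view name to its price.
    determines_query: callable taking a frozenset of view names and returning
        True if those views determine the query on the current database.
    Returns (price, subset), or (None, None) if no subset determines the query.
    """
    names = list(price_points)
    best_price, best_subset = None, None
    for r in range(len(names) + 1):
        for subset in combinations(names, r):
            if determines_query(frozenset(subset)):
                total = sum(price_points[v] for v in subset)
                if best_price is None or total < best_price:
                    best_price, best_subset = total, frozenset(subset)
    return best_price, best_subset

# Hypothetical price points and determinacy oracle: the query is determined
# by view A alone, or by views B and C together.
price_points = {"A": 4, "B": 1, "C": 2}
oracle = lambda views: "A" in views or {"B", "C"} <= views

result = query_price(price_points, oracle)
print(result)
```

The loop visits all 2^k subsets of the k price points and calls the determinacy check on each, so it is only usable for a handful of views.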


DISCUSSION

• If the result of Q1 is always a subset of the result of Q2, should Q1 be priced less than Q2? No!
• Example:
– V(x,y) :- Fortune500(x,y)
– Q(x,y) :- Fortune500(x,y), StrongBuyRec(x)
– price(Q) >> price(V)
• We ignore computation costs in our framework
– Cost of computing query Q
– Q(D) = f(V(D)), but f can be hard to compute


OUTLINE

1. The Pricing Framework

2. The Pricing Formula

3. The Complexity of Pricing

4. Dichotomy and Algorithms for Selections


DETERMINACY

Definition. [Instance-independent] V determines Q, denoted as V ↠ Q, if: for all D, D’, if V(D) = V(D’), then Q(D) = Q(D’) [Nash, Segoufin, Vianu ‘07]

V ↠ Q iff there exists a function f such that Q(D) = f(V(D)) for all D
iff for every D, we have that D ⊢ V ↠ Q

Definition. [Instance-dependent] V determines Q given D, denoted as D ⊢ V ↠ Q, if: for all D’, if V(D’) = V(D), then Q(D) = Q(D’)


COMPLEXITY OF DETERMINACY

                                   V, Q are UCQ                  V, Q are CQ
Instance-independent: V ↠ Q        Undecidable [NSV ’07]         ?
Instance-dependent: D ⊢ V ↠ Q
  data complexity                  coNP-complete [this paper]    coNP-complete [this paper]
  combined complexity              Π^p_2 [this paper]            Π^p_2 [this paper]

Open Question: is the bound on the combined complexity tight?


COMPLEXITY OF PRICING

Corollary. Deciding whether price_D^S(Q) ≤ k is:
• Combined complexity [input S, D]: Σ^p_2
• Data complexity [input D]: coNP-hard

Proposition. Pricing is at least as hard as determinacy

How do we deal with the hardness of computation?


OUTLINE

1. The Pricing Framework

2. The Pricing Formula

3. The Complexity of Pricing

4. Dichotomy and Algorithms for Selections


RESTRICTING PRICE POINTS TO SELECTIONS

• A seller can specify only the prices of selection queries of the form σ_{R.X=a}: prices on columns

• The domain of each column is finite and known to buyers and sellers

• Price points on selections are how prices are set in most cases today


DICHOTOMY THEOREM

Theorem. Assuming selection views only, for any conjunctive query Q without self-joins, one of the following holds (data complexity):
(a) price_D^S(Q) is in PTIME
(b) checking whether price_D^S(Q) ≤ k is NP-complete

• PTIME:
– Q(x,y,z,u,v) :- R(x,u), S(x,y,z), T(y,v) [Chains]
– Q(x1,…,xk) :- R1(x1,x2), …, Rk(xk,x1) [Cycles]
• NP-complete:
– Q(x) :- R(x,y) [Projections]
– Q(x,y,z) :- R(x,y,z), S(x), T(y), U(z)


ALGORITHM FOR PTIME CASES

• The algorithm uses a reduction to maximum flow
• Edges of finite capacity represent price points
• A set of edges of finite cost is a cut iff they determine the query
• Example:
– Chain query Q(x,y) :- R(x), S(x,y), T(y)

Dom(X) = {a1,a2,a3,a4}
Dom(Y) = {b1,b2,b3}

R

X
a1
a2

S

X    Y
a1   b1
a2   b2
a2   b2
a3   b2
a4   b1

T

Y
b1
b3


FLOW GRAPH

[Figure: flow graph for the chain query, with node layers a1–a4 (R), a1–a4 and b1–b3 (S), and b1–b3 (T), shown next to the example tables R, S and T.]

A set of edges of finite cost is a cut iff they determine the query
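The slides do not spell out how the flow graph is built from the query and the database, so the sketch below does not attempt that construction. It only illustrates the generic step the reduction relies on: computing a maximum flow, which by the max-flow/min-cut theorem equals the cost of the cheapest cut, in a graph where finite-capacity edges stand for price points and infinite-capacity edges can never be cut. The toy graph, its node names and its capacities are hypothetical, and Edmonds-Karp is used here simply as one standard max-flow algorithm.

```python
from collections import defaultdict, deque

INF = float("inf")

def max_flow(capacity, source, sink):
    """Edmonds-Karp max flow. capacity maps (u, v) to the edge capacity.
    Assumes every source-to-sink path contains at least one finite edge."""
    residual = defaultdict(float)
    neighbours = defaultdict(set)
    for (u, v), cap in capacity.items():
        residual[(u, v)] += cap
        neighbours[u].add(v)
        neighbours[v].add(u)          # reverse direction for the residual graph
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in neighbours[u]:
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow               # no augmenting path left
        # Walk back from the sink to collect the path, then push the bottleneck.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[e] for e in path)
        for u, v in path:
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
        flow += bottleneck

# Hypothetical toy graph: three priced edges (2, 3, 4) and two uncuttable ones.
toy = {
    ("s", "a"): 2, ("s", "b"): 3,
    ("a", "m"): INF, ("b", "m"): INF,
    ("m", "t"): 4,
}
print(max_flow(toy, "s", "t"))
```

In this toy graph the two candidate cuts are the edges leaving the source (2 + 3 = 5) and the single edge entering the sink (4), so the cheaper cut costs 4; in the pricing setting that value would be read as the price of the cheapest set of price points that forms a cut.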


CONCLUSIONS

• Summary:
– The seller sets prices for some views, while the system computes the price of any query
– Interesting application of query determinacy
– Complexity: dichotomy for CQs w/o self-joins
• Future Work:
– Pricing in the presence of updates
– How do we overcome pricing for intractable queries?
– Connection of pricing and privacy


Thank you!