dependable cardinality forecast for xquery

19
Dependable Cardinality Forecasts for XQuery Jens Teubner, ETH (formerly IBM Research) Torsten Grust, U T¨ ubingen (formerly TUM) Sebastian Maneth, UNSW and NICTA Sherif Sakr, UNSW and NICTA c Systems Group — Department of Computer Science — ETH Z¨ urich August 26, 2008

Upload: university-of-new-south-wales

Post on 13-Jul-2015

332 views

Category:

Education


1 download

TRANSCRIPT

Page 1: Dependable Cardinality Forecast for XQuery

Dependable Cardinality Forecasts for XQuery

Jens Teubner, ETH (formerly IBM Research)Torsten Grust, U Tubingen (formerly TUM)

Sebastian Maneth, UNSW and NICTASherif Sakr, UNSW and NICTA

c© Systems Group — Department of Computer Science — ETH Zurich August 26, 2008

Page 2: Dependable Cardinality Forecast for XQuery

Cardinality Estimation for XQueryThe feature-richness and semantics of the language makecardinality estimation for XQuery notoriously hard.

for $d in doc ("forecast.xml")/descendant::day

let $day := $d/@t

let $ppcp := data ($d/descendant::ppcp)

return

if ($ppcp > 50)

then ("rain likely on", $day,

"chance of precipitation:", $ppcp)

else ("no rain on", $day)

I for iterationI conditionals (if-then-else)I existential quantification

I sequence constructionI . . .(XPath is not a focus of this work.)

→ Goal: Compute subexpression-level cardinalities cardi.

August 26, 2008 Systems Group — Department of Computer Science — ETH Zurich 2

Page 3: Dependable Cardinality Forecast for XQuery

Cardinality Estimation for XQueryThe feature-richness and semantics of the language makecardinality estimation for XQuery notoriously hard.

card1 = ?

card2 = ?card3 = ?

card4 = ?

for $d in doc ("forecast.xml")/descendant::day

let $day := $d/@t

let $ppcp := data ($d/descendant::ppcp)

return

if ($ppcp > 50)

then ("rain likely on", $day,

"chance of precipitation:", $ppcp)

else ("no rain on", $day)

I for iterationI conditionals (if-then-else)I existential quantification

I sequence constructionI . . .(XPath is not a focus of this work.)

→ Goal: Compute subexpression-level cardinalities cardi.

August 26, 2008 Systems Group — Department of Computer Science — ETH Zurich 2

Page 4: Dependable Cardinality Forecast for XQuery

Idea: Perform cardinality estimation on relational planequivalents for XQuery.

relational cardinality estimation (System R)+ existing work on XPath estimation+ histograms for value predicates

= cardinality estimation for XQuery

I Build on Pathfinder’s XQuery-to-relational algebra compiler.1

→ tuple count≡ XQuery item countI Cardinality information for each subexpression.

1http://www.pathfinder-xquery.org/

August 26, 2008 Systems Group — Department of Computer Science — ETH Zurich 3

Page 5: Dependable Cardinality Forecast for XQuery

iter1

πiter,item:"forecast.xml"

docitem

item:descendant::day(item)

%pos:〈item〉‖iter

%inner:〈iter,pos〉

πiter:inner,itemπinner,outer:iter,

ord:pos

item:descendant::ppcp(item)

πiter1:iter,pos1:1,item1:50

item:attribute::t(item)

%pos:〈item〉‖iter%pos:〈item〉‖iter

πiter1:iter,pos,item

item

πiter1:iter,pos,item

oniter1=iter

=res:(item,item1)

σres

πiter

δ\

πiter,pos:1,ord:1,item:"rain..."

oniter1=iter

πiter,pos,item,ord:2

·∪

·∪

·∪

oniter=iter1

πiter,pos:1,ord:3,item:"chance..."

πiter,pos,item,ord:4

%pos1:〈ord,pos〉‖iter

πiter,pos:pos1,item

oniter1=iter

πiter,pos:1,ord:1,item:"no rain..."

πiter,pos,item,ord:2

·∪

%pos1:〈ord,pos〉‖iter

πiter,pos:pos1,item

·∪

oniter=inner

%pos1:〈ord,pos〉‖outer

πiter:outer,pos:pos1,item

πiter

4©for $d in doc ("forecast.xml")/descendant::day

let $day := $d/@t

let $ppcp := data ($d/descendant::ppcp)

return

if ($ppcp > 50)

then ("rain likely on", $day,

"chance of precipitation:", $ppcp)

else ("no rain on", $day)

Example (Plan details not of interest today.)

I Pathfinder compiles XQuerywith arbitrary nesting.

I Maintain correspondence tooriginal query if back-end isnot relational.

I Derive estimates based on aninference rule set.

Page 6: Dependable Cardinality Forecast for XQuery

Relational XQuery Cardinality EstimationApply System R-style estimation to relational XQuery plans, e.g.,

Disjoint union:|q1 ·∪ q2| = |q1|+ |q2|

Cartesian product:|q1 × q2| = |q1| · |q2|

Equi-join:

|q1 ona=b

q2| =

|q1| · |q2|max {|a|idx , |b|idx}

if there are indexes onboth join columns,

?

|q1| · |q2||a|idx

if there is only an indexon column a,

?

|q1| · |q2| · 1/10 otherwise

|c|idx: Number of unique values in index on column c.

I Our joins typically operate over computed relations.

August 26, 2008 Systems Group — Department of Computer Science — ETH Zurich 5

Page 7: Dependable Cardinality Forecast for XQuery

Relational XQuery Cardinality EstimationApply System R-style estimation to relational XQuery plans, e.g.,

Disjoint union:|q1 ·∪ q2| = |q1|+ |q2|

Cartesian product:|q1 × q2| = |q1| · |q2|

Equi-join:

|q1 ona=b

q2| =

|q1| · |q2|max {|a|idx , |b|idx}

if there are indexes onboth join columns, ?

|q1| · |q2||a|idx

if there is only an indexon column a, ?

|q1| · |q2| · 1/10 otherwise

|c|idx: Number of unique values in index on column c.

I Our joins typically operate over computed relations.

August 26, 2008 Systems Group — Department of Computer Science — ETH Zurich 5

Page 8: Dependable Cardinality Forecast for XQuery

Abstract Domain Identifiers

A simple form of data flow analysis provides the informationneeded.

I Introduce abstract domain identifiers α, β, . . . as placeholdersfor the active runtime domain for each column c.(Read cα as “column c contains values from domain α.”)

I Estimate the size ‖α‖ of each domain α, e.g.,2

dom(%a:〈b1,...,bn〉(q)

)⊇ dom (q) ∪

{aα ∧ ‖α‖ =! |q|

}.

I Identify inclusion relationships α v β between domains, e.g.,

aα ∈ dom (q) ∧ aβ ∈ dom (σ···(q))⇒ β v α .

2Operator %a:〈b1,...,bn〉 introduces a new key column (holding row numbers).August 26, 2008 Systems Group — Department of Computer Science — ETH Zurich 6

Page 9: Dependable Cardinality Forecast for XQuery

Abstract Domain IdentifiersUse abstract domain information for cardinality estimation.

E.g., “foreign key” join:

aα ∈ dom (q1) bβ ∈ dom (q2) α v β

|q1 ona=b

q2| =|q1| · |q2|‖β‖

I Domain inclusion guarantees that each tuple in q1 finds(at least one) join partner in q2.

Other examples:I |q1 \ q2| = |q1| − |q2| if q2 is a subset of q1.I |q1 \ q2| = 0 if q1 is a subset of q2.I |q1 \ q2| = |q1| if q1 and q2 are disjoint.

August 26, 2008 Systems Group — Department of Computer Science — ETH Zurich 7

Page 10: Dependable Cardinality Forecast for XQuery

Interfacing with XPath—Projection Paths

Track XPath navigation by means of projection paths3

a

b c d

e

a b1 γa2 γb3 γd

a b c1 γa γb1 γa γc1 γa γd3 γd γe

c:child::*(b)

q1 q2

I b⇒p ∈ path (q1) ⇒ c⇒p/child::* ∈ path (q2)

I Step operator makes XPath navigation explicit in relationalplans (compiles to join on SQL back-ends).

3A. Marian and J. Simeon. Projecting XML Documents. VLDB 2003.August 26, 2008 Systems Group — Department of Computer Science — ETH Zurich 8

Page 11: Dependable Cardinality Forecast for XQuery

Interfacing with XPath—Cardinality Inference

a

b c d

e

a b1 γa2 γb3 γd

a b c1 γa γb1 γa γc1 γa γd3 γd γe

c:child::*(b)

q1 q2

bα bα′

α′ v α

Cardinality:|q2| = |q1| · fn:count (p/child::*)

fn:count (p) = |q1| · Prchild::* (p)︸ ︷︷ ︸fanout (here: 4/3)

Domain Sizes:‖α′‖ = ‖α‖ · fn:count (p [ child::* ])

fn:count (p) = ‖α‖ · Pr[child::*] (p)︸ ︷︷ ︸selectivity (here: 2/3)

Any XPath estimator that provides Prp2 (p1) and Pr[p2] (p1) will do.I Our prototype uses a simple Data Guide-based implementation.

August 26, 2008 Systems Group — Department of Computer Science — ETH Zurich 9

Page 12: Dependable Cardinality Forecast for XQuery

iter1

πiter,item:"forecast.xml"

docitem

item:descendant::day(item)

%pos:〈item〉‖iter

%inner:〈iter,pos〉

πiter:inner,itemπinner,outer:iter,

ord:pos

item:descendant::ppcp(item)

πiter1:iter,pos1:1,item1:50

item:attribute::t(item)

%pos:〈item〉‖iter%pos:〈item〉‖iter

πiter1:iter,pos,item

item

πiter1:iter,pos,item

oniter1=iter

=res:(item,item1)

σres

πiter

δ\

πiter,pos:1,ord:1,item:"rain..."

oniter1=iter

πiter,pos,item,ord:2

·∪

·∪

·∪

oniter=iter1

πiter,pos:1,ord:3,item:"chance..."

πiter,pos,item,ord:4

%pos1:〈ord,pos〉‖iter

πiter,pos:pos1,item

oniter1=iter

πiter,pos:1,ord:1,item:"no rain..."

πiter,pos,item,ord:2

·∪

%pos1:〈ord,pos〉‖iter

πiter,pos:pos1,item

·∪

oniter=inner

%pos1:〈ord,pos〉‖outer

πiter:outer,pos:pos1,item

πiter

1

2

34

Back to our Example Plan

πiter:inner,item

item:descendant::ppcp(item)

πiter1:iter,pos1:1,item1:50

%pos:〈item〉‖iter

item

oniter1=iter

=res:(item,item1)

πiter

a: Retrieve typed values for node identifiersin column a (atomization).

Page 13: Dependable Cardinality Forecast for XQuery

iter1

πiter,item:"forecast.xml"

docitem

item:descendant::day(item)

%pos:〈item〉‖iter

%inner:〈iter,pos〉

πiter:inner,itemπinner,outer:iter,

ord:pos

item:descendant::ppcp(item)

πiter1:iter,pos1:1,item1:50

item:attribute::t(item)

%pos:〈item〉‖iter%pos:〈item〉‖iter

πiter1:iter,pos,item

item

πiter1:iter,pos,item

oniter1=iter

=res:(item,item1)

σres

πiter

δ\

πiter,pos:1,ord:1,item:"rain..."

oniter1=iter

πiter,pos,item,ord:2

·∪

·∪

·∪

oniter=iter1

πiter,pos:1,ord:3,item:"chance..."

πiter,pos,item,ord:4

%pos1:〈ord,pos〉‖iter

πiter,pos:pos1,item

oniter1=iter

πiter,pos:1,ord:1,item:"no rain..."

πiter,pos,item,ord:2

·∪

%pos1:〈ord,pos〉‖iter

πiter,pos:pos1,item

·∪

oniter=inner

%pos1:〈ord,pos〉‖outer

πiter:outer,pos:pos1,item

πiter

1

2

34

Back to our Example Plan

Projection path:item⇒···/descendant::ppcp∈ path

(...(q)

)Cardinality (uses fanout):∣∣ item:ax::nt(item)(q)

∣∣ = |q|·Prax::nt (· · · )

Domain inclusion:iterα

′∈ dom

(...(q)

); α′ v α

Domain size (uses step selectivity):‖α′‖ = ‖α‖ · Pr[ax::nt] (· · · )

iterα

iterα′

iterα

iterα′iter1α

πiter:inner,item

item:descendant::ppcp(item)

πiter1:iter,pos1:1,item1:50

%pos:〈item〉‖iter

item

oniter1=iter

=res:(item,item1)

πiter

a: Retrieve typed values for node identifiersin column a (atomization).

Page 14: Dependable Cardinality Forecast for XQuery

iter1

πiter,item:"forecast.xml"

docitem

item:descendant::day(item)

%pos:〈item〉‖iter

%inner:〈iter,pos〉

πiter:inner,itemπinner,outer:iter,

ord:pos

item:descendant::ppcp(item)

πiter1:iter,pos1:1,item1:50

item:attribute::t(item)

%pos:〈item〉‖iter%pos:〈item〉‖iter

πiter1:iter,pos,item

item

πiter1:iter,pos,item

oniter1=iter

=res:(item,item1)

σres

πiter

δ\

πiter,pos:1,ord:1,item:"rain..."

oniter1=iter

πiter,pos,item,ord:2

·∪

·∪

·∪

oniter=iter1

πiter,pos:1,ord:3,item:"chance..."

πiter,pos,item,ord:4

%pos1:〈ord,pos〉‖iter

πiter,pos:pos1,item

oniter1=iter

πiter,pos:1,ord:1,item:"no rain..."

πiter,pos,item,ord:2

·∪

%pos1:〈ord,pos〉‖iter

πiter,pos:pos1,item

·∪

oniter=inner

%pos1:〈ord,pos〉‖outer

πiter:outer,pos:pos1,item

πiter

1

2

34

Back to our Example Plan

Projection path:item⇒···/descendant::ppcp∈ path

(...(q)

)Cardinality (uses fanout):∣∣ item:ax::nt(item)(q)

∣∣ = |q|·Prax::nt (· · · )

Domain inclusion:iterα

′∈ dom

(...(q)

); α′ v α

Domain size (uses step selectivity):‖α′‖ = ‖α‖ · Pr[ax::nt] (· · · )

iterα

iterα′

iterα

iterα′iter1α

πiter:inner,item

item:descendant::ppcp(item)

πiter1:iter,pos1:1,item1:50

%pos:〈item〉‖iter

item

oniter1=iter

=res:(item,item1)

πiter

a: Retrieve typed values for node identifiersin column a (atomization).

Page 15: Dependable Cardinality Forecast for XQuery

iter1

πiter,item:"forecast.xml"

docitem

item:descendant::day(item)

%pos:〈item〉‖iter

%inner:〈iter,pos〉

πiter:inner,itemπinner,outer:iter,

ord:pos

item:descendant::ppcp(item)

πiter1:iter,pos1:1,item1:50

item:attribute::t(item)

%pos:〈item〉‖iter%pos:〈item〉‖iter

πiter1:iter,pos,item

item

πiter1:iter,pos,item

oniter1=iter

=res:(item,item1)

σres

πiter

δ\

πiter,pos:1,ord:1,item:"rain..."

oniter1=iter

πiter,pos,item,ord:2

·∪

·∪

·∪

oniter=iter1

πiter,pos:1,ord:3,item:"chance..."

πiter,pos,item,ord:4

%pos1:〈ord,pos〉‖iter

πiter,pos:pos1,item

oniter1=iter

πiter,pos:1,ord:1,item:"no rain..."

πiter,pos,item,ord:2

·∪

%pos1:〈ord,pos〉‖iter

πiter,pos:pos1,item

·∪

oniter=inner

%pos1:〈ord,pos〉‖outer

πiter:outer,pos:pos1,item

πiter

1

2

34

Back to our Example Plan

“Foreign key” join:∣∣q1 oniter1=iter

q2∣∣ = |q1|·|q2|

‖α‖ (since α′ v α)

Projection path:item⇒···/descendant::ppcp∈ path

(...(q)

)Cardinality (uses fanout):∣∣ item:ax::nt(item)(q)

∣∣ = |q|·Prax::nt (· · · )

Domain inclusion:iterα

′∈ dom

(...(q)

); α′ v α

Domain size (uses step selectivity):‖α′‖ = ‖α‖ · Pr[ax::nt] (· · · )

iterα

iterα′

iterα

iterα′iter1α

πiter:inner,item

item:descendant::ppcp(item)

πiter1:iter,pos1:1,item1:50

%pos:〈item〉‖iter

item

oniter1=iter

=res:(item,item1)

πiter

a: Retrieve typed values for node identifiersin column a (atomization).

Page 16: Dependable Cardinality Forecast for XQuery

iter1

πiter,item:"forecast.xml"

docitem

item:descendant::day(item)

%pos:〈item〉‖iter

%inner:〈iter,pos〉

πiter:inner,itemπinner,outer:iter,

ord:pos

item:descendant::ppcp(item)

πiter1:iter,pos1:1,item1:50

item:attribute::t(item)

%pos:〈item〉‖iter%pos:〈item〉‖iter

πiter1:iter,pos,item

item

πiter1:iter,pos,item

oniter1=iter

=res:(item,item1)

σres

πiter

δ\

πiter,pos:1,ord:1,item:"rain..."

oniter1=iter

πiter,pos,item,ord:2

·∪

·∪

·∪

oniter=iter1

πiter,pos:1,ord:3,item:"chance..."

πiter,pos,item,ord:4

%pos1:〈ord,pos〉‖iter

πiter,pos:pos1,item

oniter1=iter

πiter,pos:1,ord:1,item:"no rain..."

πiter,pos,item,ord:2

·∪

%pos1:〈ord,pos〉‖iter

πiter,pos:pos1,item

·∪

oniter=inner

%pos1:〈ord,pos〉‖outer

πiter:outer,pos:pos1,item

πiter

1

2

34

card1: 990

card2: 4104

card3: 3540 card4: 564Back to our Example Plan

“Foreign key” join:∣∣q1 oniter1=iter

q2∣∣ = |q1|·|q2|

‖α‖ (since α′ v α)

Projection path:item⇒···/descendant::ppcp∈ path

(...(q)

)Cardinality (uses fanout):∣∣ item:ax::nt(item)(q)

∣∣ = |q|·Prax::nt (· · · )

Domain inclusion:iterα

′∈ dom

(...(q)

); α′ v α

Domain size (uses step selectivity):‖α′‖ = ‖α‖ · Pr[ax::nt] (· · · )

iterα

iterα′

iterα

iterα′iter1α

πiter:inner,item

item:descendant::ppcp(item)

πiter1:iter,pos1:1,item1:50

%pos:〈item〉‖iter

item

oniter1=iter

=res:(item,item1)

πiter

a: Retrieve typed values for node identifiersin column a (atomization).

Page 17: Dependable Cardinality Forecast for XQuery

Forecasting New Zealand’s WeatherThe obtained cardinalities can be mapped back to predict itemcounts for corresponding XQuery expressions:4

card1 = 990/990

card2 = 4104/3402card3 = 3540/2370

card4 = 564/1032

for $d in doc ("forecast.xml")/descendant::day

let $day := $d/@t

let $ppcp := data ($d/descendant::ppcp)

return

if ($ppcp > 50)

then ("rain likely on", $day,

"chance of precipitation:", $ppcp)

else ("no rain on", $day)

I Value statistics based on histograms (↗ paper)I Inaccuracy is mainly due to correlations in the data.

I Rain in the morning likely means rain in the afternoon, too.

4estimated/observed, based on data taken for New Zealand two weeks ago.August 26, 2008 Systems Group — Department of Computer Science — ETH Zurich 11

Page 18: Dependable Cardinality Forecast for XQuery

More Realistic Queries: W3C XQuery Use CasesI Prototype implementation

based on PathfinderI For each subexpression:

estimated cardinalityobserved cardinality

I Diameter indicates datapoint “stacking”

I Plan root: filled circle

I Applicable to “real” queriesI Recovery from intermediate

mis-estimationsI e.g., existential semantics

XMP Q4XMP Q5XMP Q6XMP Q8XMP Q9XMP Q10XMP Q11SGML Q3SGML Q4SGML Q8aSGML Q8bR Q2R Q3R Q10R Q11R Q13R Q14R Q15R Q17

0.01

0.01

0.1

0.1

1

1

10

10estimated cardinality/observed cardinality

August 26, 2008 Systems Group — Department of Computer Science — ETH Zurich 12

Page 19: Dependable Cardinality Forecast for XQuery

Wrap-UpI Cardinality estimation framework for XQueryI Subexpression-level estimates for arbitrary XQuery expressionsI Based on Pathfinder’s XQuery-to-relational algebra compiler

relational cardinality estimation (System R)+ existing work on XPath estimation+ histograms for value predicates

= cardinality estimation for XQuery

I High-quality estimates for realistic XQuery workloadsI Robust with respect to intermediate errors

I Pluggable and extensibleI e.g., XPath estimation subsystem, positional predicates

August 26, 2008 Systems Group — Department of Computer Science — ETH Zurich 13