Dependable Cardinality Forecasts for XQuery

Post on 13-Jul-2015


  • Dependable Cardinality Forecasts for XQuery

    Jens Teubner, ETH (formerly IBM Research)
    Torsten Grust, U Tübingen (formerly TUM)
    Sebastian Maneth, UNSW and NICTA
    Sherif Sakr, UNSW and NICTA

    © Systems Group, Department of Computer Science, ETH Zurich, August 26, 2008

  • Cardinality Estimation for XQuery

    The feature-richness and semantics of the language make cardinality estimation for XQuery notoriously hard.

    for $d in doc("forecast.xml")/descendant::day
    let $day  := $d/@t
    let $ppcp := data($d/descendant::ppcp)
    return
      if ($ppcp > 50)
      then ("rain likely on", $day,
            "chance of precipitation:", $ppcp)
      else ("no rain on", $day)

    - for iteration
    - conditionals (if-then-else)
    - existential quantification
    - sequence construction
    - ...

    (XPath is not a focus of this work.)

    Goal: Compute subexpression-level cardinalities card_i.
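To make "cardinality" concrete, the following minimal Python sketch evaluates the example query over a tiny document and counts the items in the result sequence. The document contents and the helper name `run_query` are invented for illustration; the sketch only mirrors the query's semantics (iteration, existential comparison, sequence construction) and is not how Pathfinder evaluates it.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for forecast.xml (contents are made up).
FORECAST = """
<forecast>
  <day t="Mon"><ppcp>80</ppcp></day>
  <day t="Tue"><ppcp>20</ppcp></day>
  <day t="Wed"><ppcp>60</ppcp></day>
</forecast>
"""

def run_query(xml_text):
    """Evaluate the example query and return its result as a flat item list."""
    root = ET.fromstring(xml_text)
    result = []
    for d in root.iter("day"):                           # descendant::day
        day = [d.get("t")] if "t" in d.attrib else []    # $d/@t (may be empty)
        ppcp = [float(e.text) for e in d.iter("ppcp")]   # data($d/descendant::ppcp)
        if any(v > 50 for v in ppcp):                    # general comparison is existential
            result += ["rain likely on", *day,
                       "chance of precipitation:", *ppcp]
        else:
            result += ["no rain on", *day]
    return result

items = run_query(FORECAST)
print(len(items))  # cardinality of the overall result
```

Each iteration contributes a different number of items depending on the branch taken and on how many `ppcp` values the day has, which is why the overall count is hard to predict without per-subexpression estimates.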


  • Cardinality Estimation for XQuery (continued)

    Cardinalities to determine for the subexpressions of the example query: card_1 = ?, card_2 = ?, card_3 = ?, card_4 = ?


  • Idea: Perform cardinality estimation on relational plan equivalents for XQuery.

      relational cardinality estimation (System R)
    + existing work on XPath estimation
    + histograms for value predicates
    = cardinality estimation for XQuery

    - Build on Pathfinder's XQuery-to-relational algebra compiler. [1]
      Tuple count corresponds to XQuery item count.
    - Cardinality information for each subexpression.

    [1] http://www.pathfinder-xquery.org/

  • Example (Plan details not of interest today.)

    [Figure: relational algebra plan compiled from the example query. Markers 1 to 4 relate regions of the plan to subexpressions of the query.]

    - Pathfinder compiles XQuery with arbitrary nesting.
    - Maintain correspondence to original query if back-end is not relational.
    - Derive estimates based on an inference rule set.

  • Relational XQuery Cardinality Estimation

    Apply System R-style estimation to relational XQuery plans, e.g.,

    Disjoint union:
    \[ |q_1 \mathbin{\dot\cup} q_2| = |q_1| + |q_2| \]

    Cartesian product:
    \[ |q_1 \times q_2| = |q_1| \cdot |q_2| \]

    Equi-join:
    \[ |q_1 \bowtie_{a=b} q_2| =
       \begin{cases}
         \dfrac{|q_1| \cdot |q_2|}{\max\{|a|_{idx}, |b|_{idx}\}} & \text{if there are indexes on both join columns (?),} \\[1ex]
         \dfrac{|q_1| \cdot |q_2|}{|a|_{idx}} & \text{if there is only an index on column } a \text{ (?),} \\[1ex]
         |q_1| \cdot |q_2| \cdot \tfrac{1}{10} & \text{otherwise.}
       \end{cases} \]

    |c|_{idx}: Number of unique values in index on column c.

    - Our joins typically operate over computed relations.
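The System R-style rules on this slide can be written down directly. The sketch below is a minimal Python rendering under stated assumptions: the function names are invented here, a missing index is modelled as `None`, and the symmetric case (index only on column b) is an addition that the slide does not spell out.

```python
def union_card(c1, c2):
    """Disjoint union: |q1 U q2| = |q1| + |q2|."""
    return c1 + c2

def product_card(c1, c2):
    """Cartesian product: |q1 x q2| = |q1| * |q2|."""
    return c1 * c2

def equijoin_card(c1, c2, distinct_a=None, distinct_b=None):
    """Equi-join on a = b.

    distinct_a / distinct_b: number of unique values in the index on the
    join column, or None if no index exists (hypothetical encoding).
    """
    if distinct_a is not None and distinct_b is not None:
        return c1 * c2 / max(distinct_a, distinct_b)
    if distinct_a is not None:
        return c1 * c2 / distinct_a
    if distinct_b is not None:           # assumption: symmetric to the case above
        return c1 * c2 / distinct_b
    return c1 * c2 / 10                  # fallback when no statistics are available

print(equijoin_card(100, 50, distinct_a=20, distinct_b=25))
print(equijoin_card(100, 50))
```

When both inputs are computed relations, neither index count is available and the estimate degrades to the fixed 1/10 fallback, which is the situation the last bullet on the slide points at.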


  • Abstract Domain Identifiers

    A simple form of data flow analysis provides the information needed.

    - Introduce abstract domain identifiers α, β, ... as placeholders for the active runtime domain of each column c.
      (Read c^α as "column c contains values from domain α".)
    - Estimate the size ‖α‖ of each domain α, e.g., [2]
      \[ \operatorname{dom}\bigl(\varrho_{a:\langle b_1,\dots,b_n\rangle}(q)\bigr) = \operatorname{dom}(q) \cup \{a^{\alpha}\} \quad\text{with}\quad \|\alpha\| \stackrel{!}{=} |q| \]
    - Identify inclusion relationships ⊑ between domains, e.g.,
      \[ a^{\alpha} \in \operatorname{dom}(q) \;\Rightarrow\; a^{\beta} \in \operatorname{dom}(\sigma(q)) \,\wedge\, \beta \sqsubseteq \alpha \]

    [2] Operator \varrho_{a:\langle b_1,\dots,b_n\rangle} introduces a new key column (holding row numbers).
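One way to picture this data flow analysis is a small registry that hands out fresh domain identifiers, records their estimated sizes, and remembers inclusion edges. The Python sketch below is only an illustration under assumptions: the class and function names are invented, each domain has at most one directly including domain, and the size assigned to a domain after selection (capped by the new cardinality) is a guess that the slide does not prescribe.

```python
import itertools

class Domains:
    """Registry of abstract domain identifiers with sizes and inclusions."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.size = {}      # domain id -> estimated number of distinct values
        self.parent = {}    # domain id -> domain that directly includes it

    def fresh(self, size, included_in=None):
        name = f"d{next(self._ids)}"
        self.size[name] = size
        if included_in is not None:
            self.parent[name] = included_in
        return name

    def included(self, sub, sup):
        """True if sub is (transitively) included in sup."""
        while sub is not None:
            if sub == sup:
                return True
            sub = self.parent.get(sub)
        return False

def rownum(doms, card, cols, new_col):
    """Row numbering: the new key column gets a fresh domain of size |q|."""
    out = dict(cols)
    out[new_col] = doms.fresh(card)
    return card, out

def select(doms, card, cols, selectivity):
    """Selection: every column gets a new domain included in its old one."""
    out_card = card * selectivity
    out = {c: doms.fresh(min(doms.size[d], out_card), included_in=d)  # size: assumption
           for c, d in cols.items()}
    return out_card, out
```

A plan is then annotated bottom-up: each operator takes the `(cardinality, column -> domain)` pair of its input and returns the pair for its output.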

  • Abstract Domain Identifiers

    Use abstract domain information for cardinality estimation. E.g., foreign key join:

    \[ a^{\alpha} \in \operatorname{dom}(q_1) \;\wedge\; b^{\beta} \in \operatorname{dom}(q_2) \;\wedge\; \alpha \sqsubseteq \beta \;\Rightarrow\; |q_1 \bowtie_{a=b} q_2| = \frac{|q_1| \cdot |q_2|}{\|\beta\|} \]

    - Domain inclusion guarantees that each tuple in q1 finds (at least one) join partner in q2.

    Other examples:
    - |q1 \ q2| = |q1| − |q2| if q2 is a subset of q1.
    - |q1 \ q2| = 0 if q1 is a subset of q2.
    - |q1 \ q2| = |q1| if q1 and q2 are disjoint.
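The domain-based rules on this slide reduce to a few arithmetic cases. The Python sketch below is a hedged illustration: the function names and the string encoding of the subset/disjoint relationship are invented, and the join rule assumes the estimate divides by the size of the including domain, with join values spread uniformly over it.

```python
def fk_join_card(c1, c2, including_domain_size):
    """Join q1 and q2 on a = b where a's domain is included in b's domain.

    Assumption: every q1 tuple meets, on average,
    |q2| / including_domain_size partners in q2.
    """
    return c1 * c2 / including_domain_size

def difference_card(c1, c2, relation):
    """|q1 \\ q2| for the three cases listed on the slide.

    relation is a hypothetical tag: 'q2_subset_q1', 'q1_subset_q2', 'disjoint'.
    """
    if relation == "q2_subset_q1":
        return c1 - c2
    if relation == "q1_subset_q2":
        return 0
    if relation == "disjoint":
        return c1
    raise ValueError(f"no rule for relation {relation!r}")

# If b is a key of q2, its domain size equals |q2| and the join keeps |q1| tuples.
print(fk_join_card(100, 20, including_domain_size=20))
```

The key case in the last line shows why inclusion information matters: the result size follows from the plan itself, with no index statistics involved.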

  • Interfacing with XPath: Projection Paths

    Track XPath navigation by means of projection paths. [3]

    Document tree:

        a
        ├─ b
        ├─ c
        └─ d
           └─ e

    q1:          q2 = step c:child::*(b) applied to q1:

    a  b         a  b  c
    1  a         1  a  b
    2  b         1  a  c
    3  d         1  a  d
                 3  d  e

    - \[ b^{p} \in \operatorname{path}(q_1) \;\Rightarrow\; c^{p/\texttt{child::*}} \in \operatorname{path}(q_2) \]
    - Step operator makes XPath navigation explicit in relational plans (compiles to join on SQL back-ends).

    [3] A. Marian and J. Siméon. Projecting XML Documents. VLDB 2003.

  • Interfacing with XPath: Cardinality Inference

    [Figure: document tree and tables q1 and q2 for the step c:child::*(b); column b carries domain α in q1 and domain β in q2, with β ⊑ α.]

    Cardinality:
    \[ |q_2| = |q_1| \cdot \frac{\texttt{fn:count}(p/\texttt{child::*})}{\texttt{fn:count}(p)} = |q_1| \cdot \Pr\nolimits_{\texttt{child::*}}(p) \]
    (fanout, here: 4/3)

    Domain sizes:
    \[ \|\beta\| = \|\alpha\| \cdot \frac{\texttt{fn:count}(p[\texttt{child::*}])}{\texttt{fn:count}(p)} = \|\alpha\| \cdot \Pr\nolimits_{[\texttt{child::*}]}(p) \]
    (selectivity, here: 2/3)

    Any XPath estimator that provides Pr_{p2}(p1) and Pr_{[p2]}(p1) will do.
    - Our prototype uses a simple Data Guide-based implementation.
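The two statistics on this slide, fanout and step selectivity, can be checked by hand on a small tree. The Python sketch below assumes a tree in which a has children b, c, d and d has child e, with context nodes a, b, d; the dictionary encoding and function names are made up for the illustration and stand in for whatever XPath estimator is plugged in.

```python
from fractions import Fraction

# Assumed toy tree: node -> list of child nodes.
CHILDREN = {"a": ["b", "c", "d"], "b": [], "c": [], "d": ["e"], "e": []}

# Context nodes reached by path p.
CONTEXT = ["a", "b", "d"]

def fanout(context):
    """fn:count(p/child::*) / fn:count(p): average number of children."""
    return Fraction(sum(len(CHILDREN[n]) for n in context), len(context))

def selectivity(context):
    """fn:count(p[child::*]) / fn:count(p): share of nodes with a child."""
    return Fraction(sum(1 for n in context if CHILDREN[n]), len(context))

q1_card = len(CONTEXT)
q2_card = q1_card * fanout(CONTEXT)           # cardinality after the step
beta_size = q1_card * selectivity(CONTEXT)    # context nodes that survive the step
print(fanout(CONTEXT), selectivity(CONTEXT), q2_card, beta_size)
```

Fanout scales the tuple count, while selectivity scales the domain of the context column; the two differ whenever some context nodes have several matches and others have none.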

  • Back to our Example Plan

    [Figure: relational plan for the example query with markers 1 to 4. The highlighted fragment contains the projection π iter:inner,item, the step item:descendant::ppcp(item), a row numbering on pos, atomization of column item, the projection π iter1:iter,pos1:1,item1:50, a join on iter1 = iter, the comparison res:(item,item1), and a final projection π iter.]

    Atomization on column a: Retrieve typed values for node identifiers in column a.

  • Back to our Example Plan (continued)

    [Figure: the same relational plan for the example query, with markers 1 to 4.]

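    Rules applied at the step operator (with item^{p} ∈ path(q) and iter^{α} ∈ dom(q)):

    Projection path:
    \[ \mathit{item}^{\,p/\texttt{descendant::ppcp}} \in \operatorname{path}\bigl(\mathrm{step}_{\ldots}(q)\bigr) \]

    Cardinality (uses fanout):
    \[ \bigl|\mathrm{step}_{\mathit{item}:\texttt{ax::nt}(\mathit{item})}(q)\bigr| = |q| \cdot \Pr\nolimits_{\texttt{ax::nt}}(p) \]

    Domain inclusion:
    \[ \mathit{iter}^{\beta} \in \operatorname{dom}\bigl(\mathrm{step}_{\ldots}(q)\bigr); \quad \beta \sqsubseteq \alpha \]

    Domain size (uses step selectivity):
    \[ \|\beta\| = \|\alpha\| \cdot \Pr\nolimits_{[\texttt{ax::nt}]}(p) \]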
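A step operator therefore updates two numbers at once: the tuple count (via fanout) and the size of the iter column's domain (via step selectivity). The Python sketch below propagates both through one step; the class name, the function name, and all statistics (7 days, fanout 2, selectivity 6/7) are invented for illustration and do not come from the talk.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass(frozen=True)
class PlanEstimate:
    card: Fraction          # estimated tuple count |q|
    iter_domain: Fraction   # estimated size of the iter column's domain

def apply_step(q, fanout, selectivity):
    """Estimate for a step operator applied to q.

    fanout      ~ Pr_{ax::nt}(p):   average matches per context node
    selectivity ~ Pr_{[ax::nt]}(p): share of context nodes with a match
    """
    return PlanEstimate(card=q.card * fanout,
                        iter_domain=q.iter_domain * selectivity)

# Hypothetical statistics: 7 day elements, each iteration holding one of them.
days = PlanEstimate(card=Fraction(7), iter_domain=Fraction(7))
ppcp = apply_step(days, fanout=Fraction(2), selectivity=Fraction(6, 7))
print(ppcp)
```

With these made-up numbers the step doubles the tuple count while one of the seven iterations drops out of the iter domain, which is exactly the information a later join on iter needs.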

    [Figure: the highlighted plan fragment for data($d/descendant::ppcp) > 50, with domain annotations on its iter and iter1 columns.]