Containment of Nested XML Queries

Xin (Luna) Dong, Alon Halevy, Igor Tatarinov

University of Washington

Query Containment

The most fundamental relationship between a pair of queries

Query Q is contained in Q’ if, for any database D, Q(D) is a subset of Q’(D).

Applications of Query Containment

Semantic caching
Reasoning about contents of data sources in data integration
Verification of integrity constraints
Verification of knowledge bases
Determining queries independent of updates
Query answering using views

Query Processing in PDMS

XML query containment in a Peer Data Management System (PDMS):

Answering queries using views to extract remote data
Removing redundant queries to enhance performance [Tatarinov and Halevy, SIGMOD 2004]

[Figure: a PDMS with peers UW, Stanford, Berkeley and UPenn, connected by mappings labelled MWS, MPW, MSB and MBW, with queries labelled QW, QP, QS, QB1 and QB2.]

Query Containment: Relational vs. XML

                     | Relational                               | XML
Input D              | Sets of tuples                           | An XML instance tree
Output Q(D)          | A set of tuples                          | An XML instance tree
Instance containment | Q(D) ⊆ Q’(D): subset                     | Q(D) ⊆ Q’(D): tree homomorphism
Query containment    | Q ⊆ Q’: for every input D, Q(D) ⊆ Q’(D)  | Q ⊆ Q’: for every input D, Q(D) ⊆ Q’(D)

Example – An XML Instance

D:
<project>
  <member>Alice</member>
</project>
<project>
  <member>Bob</member>
</project>

[Figure: D drawn as a tree: two project nodes, each with one member child; the member leaves are Alice and Bob.]

Example – An XML Query

Q:
for $x in /project return
<group>{
  for $y in $x/member return
  <name>{
    where $y = "Alice" return <Alice/>
    where $y = "Bob" return <Bob/>
  }</name>
}</group>

[Figure: D and Q(D) drawn as trees. Q(D) has two group nodes, each with one name child; one name contains Alice and the other contains Bob.]

Example – Another XML Query

Q’:
for $x in /project return
<group>{
  for $y in /project/member return
  <name>{
    where $y = "Alice" return <Alice/>
    where $y = "Bob" return <Bob/>
  }</name>
}</group>

[Figure: D and Q’(D) drawn as trees. In Q’(D), a group node has two name children, one containing Alice and the other containing Bob.]
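The difference between Q and Q’ can be made concrete by evaluating both on D. The following is a minimal Python sketch, not the authors' implementation: trees are modelled as (label, children) pairs, and the helper names node, members, name_block, eval_q and eval_q_prime are assumptions introduced only for illustration.

```python
# An XML tree is modelled as (label, [children]); text values are leaf nodes.
def node(label, *children):
    return (label, list(children))

# D: two <project> elements, each with one <member>.
D = [
    node("project", node("member", node("Alice"))),
    node("project", node("member", node("Bob"))),
]

def members(project):
    """Text values of the <member> children of one project."""
    return [c[1][0][0] for c in project[1] if c[0] == "member"]

def name_block(value):
    """Inner block: a <name> holding <Alice/> or <Bob/> when the value matches."""
    return node("name", *[node(v) for v in ("Alice", "Bob") if v == value])

def eval_q(db):
    # Q: the inner loop ranges over $x/member (members of the current project).
    return [node("group", *[name_block(m) for m in members(x)]) for x in db]

def eval_q_prime(db):
    # Q': the inner loop ranges over /project/member (members of all projects).
    all_members = [m for p in db for m in members(p)]
    return [node("group", *[name_block(m) for m in all_members]) for _ in db]

q_d = eval_q(D)              # two groups, one name child each
q_prime_d = eval_q_prime(D)  # every group has both name children
```

Duplicates are not removed in this sketch, so eval_q_prime returns one group per project even though the groups are identical.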

Example – Tree Homomorphism and Query Containment

Q(D) ⊆ Q’(D): every group of Q(D) maps into the group of Q’(D)
Q’(D) ⊄ Q(D): no group of Q(D) has both the Alice and the Bob name children

[Figure: Q(D) and Q’(D) side by side; the mapping from Q’(D) into Q(D) is crossed out.]
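The two containment claims for this example can be checked mechanically. The sketch below assumes that tree homomorphism means: labels match at every node, and each child of the source node maps into some child of the target node. The function names contained and forest_contained are illustrative, and answers are treated as forests under an implicit root.

```python
def contained(a, b):
    """True if tree a maps homomorphically into tree b: the labels match and
    every child of a is contained in some child of b."""
    (label_a, kids_a), (label_b, kids_b) = a, b
    return label_a == label_b and all(
        any(contained(x, y) for y in kids_b) for x in kids_a
    )

def forest_contained(fa, fb):
    """Compare two answers given as lists of top-level trees."""
    return all(any(contained(x, y) for y in fb) for x in fa)

alice = ("name", [("Alice", [])])
bob = ("name", [("Bob", [])])

q_d = [("group", [alice]), ("group", [bob])]   # Q(D)
q_prime_d = [("group", [alice, bob])]          # Q'(D)

print(forest_contained(q_d, q_prime_d))   # True
print(forest_contained(q_prime_d, q_d))   # False
```

The second check fails because the group in Q’(D) would have to map to a single group in Q(D) that has both name children.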

Query Containment Problem

From answer containment to query containment:

Q’(D) ⊄ Q(D) for this D, so Q’ ⊄ Q
Q(D) ⊆ Q’(D) for this D; does Q ⊆ Q’ hold?

Our problems:

Given queries Q and Q’, decide whether Q ⊆ Q’
The complexity of query containment

Previous Work (I)

Relational query containment

Conjunctive queries [Chandra and Merlin, STOC 1977]
Acyclic queries [Yannakakis, VLDB 1981]
Queries with union [Sagiv and Yannakakis, JACM 1980]
Queries with negation [Levy and Sagiv, VLDB 1993]
Queries with arithmetic comparisons [Klug, JACM 1988]
Recursive queries [Shmueli, 1993], [Chaudhuri and Vardi, 1992]
Queries over bags [Ioannidis and Ramakrishnan, 1995]

Previous Work (II)

XML query containment: two new challenges

XPath containment
  With *, // and […] [Miklau and Suciu, PODS 2002]
  With equality testing on tag variables [Deutsch and Tannen, KRDB 2001]
  Conjunctive queries over path expressions [Florescu, Levy and Suciu, PODS 1998]
Nested query containment

Containment Cannot be Determined Solely by Comparing XPath Components

Q:
for $g in /group where $g/gname/text() = "database" return
<area>{
  for $p in $g/person return
  <person>
    <name>{$p/text()}</name>
    {for $q in $g/paper where $q/author/text() = $p/text() return
      <paper>{$q/title/text()}</paper>}
  </person>
}</area>

Q’:
for $g in /group return
<area>{
  for $p in $g/person return
  <person>
    <name>{$p/text()}</name>
    <group>{$g/gname/text()}</group>
    {for $q in $g/paper where $q/author/text() = $p/text() return
      <paper>{$q/title/text()}</paper>}
  </person>
}</area>

Previous Work (II)

Nested query containment
  Complex object query containment [Levy and Suciu, PODS 1997]

Containment of nested XML queries has not been fully studied

Our Focus: Nested XML Queries

Returned tag constants
Conjunctive: no two sibling query blocks return the same tag
XPath:
  HAVE: child axis (/), wildcards (*), branches ([…])
  NOT HAVE: descendant axis (//), arithmetic comparison, union

Here, XPath containment is in PTIME

Complexity Result (I)

Fanout \ Depth | Fixed         | Arbitrary
= 1            | PTIME         | PTIME
Arbitrary      | coNP-complete | in coNEXPTIME

Complexity Result (II)

Query type  | No tag variables | With tag variables | With unions   | With neg      | With //       | With equi-join on tags | With arith comp
Un-nested   | PTIME            | PTIME              | coNP-complete | coNP-complete | coNP-complete | NP-complete            | Π2P-complete
Fan-out = 1 | PTIME            | PTIME              | coNP-complete | coNP-complete | coNP-complete | NP-complete            | Π2P-complete
Fixed-depth | coNP-complete    | coNP-complete      | coNP-complete | coNP-complete | coNP-complete | Π2P-complete           | Π2P-complete
General     | in coNEXPTIME

Roadmap

Introduction and problem definition
Containment of a subset of XML queries
  Query containment is decidable

  Fanout \ Depth | Fixed         | Arbitrary
  = 1            | PTIME         | PTIME
  Arbitrary      | coNP-complete | in coNEXPTIME

Query containment in practice
Relaxing the assumptions
Conclusions

Deciding Q ⊆ Q’?

How to establish a property that must hold for an infinite number of input XML instances

Standard technique: find a finite set of input representatives (canonical databases)
Relational query: each canonical database is a minimal input to generate the answer template
XML query answers have an infinite number of shapes
Find a finite set of answer templates (canonical answers)

Answer Shapes Determined by the Head Tree

Q’:
for $x in /project return
<group>{
  for $y in /project/member return
  <name>{
    where $y = "Alice" return <Alice/>
    where $y = "Bob" return <Bob/>
  }</name>
}</group>

[Figure: the head tree of Q’ (group, with child name, with leaves Alice and Bob) and answer shapes that are prefix subtrees of it: group alone; group with an empty name; group with a name containing Alice; group with a name containing Bob.]

An Additional Candidate Answer

[Figure: next to the head tree and its prefix subtrees, an additional candidate answer: a group with two name children, one containing Alice and the other containing Bob.]

Why Consider the Additional Case

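[Figure: for the instance D (two project nodes with members Alice and Bob), Q(D) has two group nodes with one name child each, while Q’(D) is a group with two name children, one containing Alice and the other containing Bob.]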

What can Serve as Canonical Answers?

Prefix subtrees of the head tree? Necessary but not sufficient
Trees contained in the head tree? Necessary and sufficient, but too many and too complex

A Head Tree can Have Many Trees Contained in it

[Figure: the head tree (group, name, leaves Alice and Bob) and several trees contained in it, with repeated group and name nodes and repeated Alice and Bob leaves.]

What can Serve as Canonical Answers?

Our solution: consider only minimal trees that are contained in the head tree

Canonical Answer

A minimal XML instance: no two sibling subtrees where one is contained in the other
Canonical answer: a minimal XML instance contained in the head tree

Every answer A of query Q corresponds to a unique canonical answer CA such that A ⊆ CA and CA ⊆ A

[Figure: an answer with a repeated name subtree, the head tree, and the corresponding canonical answer: a group with two name children, one containing Alice and the other containing Bob.]
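Minimality can be illustrated by pruning every sibling subtree that is contained in another sibling. The sketch below is an assumption-laden illustration, not the authors' procedure: trees are (label, children) pairs, containment is the label-preserving child-to-child mapping, and the names contained and minimize are hypothetical.

```python
def contained(a, b):
    """True if tree a maps into tree b: labels match and every child of a
    is contained in some child of b."""
    (label_a, kids_a), (label_b, kids_b) = a, b
    return label_a == label_b and all(
        any(contained(x, y) for y in kids_b) for x in kids_a
    )

def minimize(tree):
    """Drop sibling subtrees that are contained in another sibling.
    Among mutually contained siblings, the first one is kept."""
    label, children = tree
    kids = [minimize(c) for c in children]
    kept = []
    for i, c in enumerate(kids):
        dominated = any(
            contained(c, o) and (not contained(o, c) or j < i)
            for j, o in enumerate(kids)
            if j != i
        )
        if not dominated:
            kept.append(c)
    return (label, kept)

alice = ("name", [("Alice", [])])
bob = ("name", [("Bob", [])])

answer = ("group", [alice, bob, alice])   # repeated name subtree
ca = minimize(answer)
print(ca == ("group", [alice, bob]))                    # True
print(contained(answer, ca) and contained(ca, answer))  # True
```

The last line checks the correspondence stated on the slide: the answer and its minimal form are contained in each other.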

Canonical Database

Canonical database DBCA: the minimal XML instance to generate CA

for $x in /project return
<group>{
  for $y in /project/member return
  <name>{
    where $y = "Alice" return <Alice/>
    where $y = "Bob" return <Bob/>
  }</name>
}</group>

[Figure: the canonical answer CA, a group with two name children (Alice and Bob), and its canonical database DB, built from project and member nodes with leaves Alice and Bob.]

Sound and Complete Conditions for Nested Query Containment

Theorem 1. Q ⊆ Q’ if and only if, for every canonical database DB of Q, Q(DB) ⊆ Q’(DB)

Theorem 2. Q ⊆ Q’ if and only if, for every canonical answer CA of Q,
  CA is a canonical answer of Q’
  DB’CA ⊆ DBCA

Query Containment Algorithm

Algorithm:
for every canonical answer CA of Q do
  1. check whether CA is a canonical answer of Q’
  2. generate DBCA and DB’CA
  3. check DB’CA ⊆ DBCA
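The control flow of the three steps can be sketched as a small Python function. Everything passed in is a hypothetical stand-in: the slides do not specify how canonical answers are enumerated or how canonical databases are built, so those pieces are supplied by the caller as callables.

```python
def query_contained(canonical_answers, is_answer_of_q_prime,
                    db_of, db_prime_of, contained):
    """Skeleton of the loop on the slide; all five arguments are assumptions.

    canonical_answers    -- iterable of the canonical answers CA of Q
    is_answer_of_q_prime -- step 1: is CA a canonical answer of Q'?
    db_of, db_prime_of   -- step 2: build DB_CA for Q and DB'_CA for Q'
    contained            -- step 3: containment test between two databases
    """
    for ca in canonical_answers:
        if not is_answer_of_q_prime(ca):
            return False
        if not contained(db_prime_of(ca), db_of(ca)):
            return False
    return True
```

The loop stops at the first canonical answer that fails either check, so the running time is driven by the number and size of the canonical answers.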

Roadmap

Fanout \ Depth | Fixed | Arbitrary
= 1            | ?     | ?
Arbitrary      | ?     | ?

Query Containment Algorithm

Polynomial in the size and number of canonical answers
What are the sizes of canonical answers?
What is the number of canonical answers?

Containment of XML Queries with Fanout 1

E.g. d = 3 (the depth); m = 1 (the maximum fanout)

for $x in /project return
<group>{
  for $y in /project/member return
  <name>{
    where $y = "Alice" return <Alice/>
  }</name>
}</group>

[Figure: the canonical answers: group alone; group with a name child; group with a name child containing Alice.]

Canonical answers and complexity
  Number: the depth of the query
  Size: bounded by the depth of the query
  Complexity: O(d·|Q|·|Q’|)

Theorem: Testing containment of XML queries with fanout 1 is in PTIME

Nesting with fanout 1 does not increase complexity
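With fanout 1 the head tree is a single chain, which is why there are only d canonical answers of size at most d. A minimal sketch, assuming the canonical answers are exactly the prefixes of that chain (consistent with the counts on the slide); the function name chain_prefixes is illustrative.

```python
def chain_prefixes(labels):
    """labels lists the head-tree tags from root to leaf.
    Returns one (label, children) tree per prefix of the chain."""
    answers = []
    for k in range(1, len(labels) + 1):
        tree = None
        # Build the prefix of length k from the bottom up.
        for label in reversed(labels[:k]):
            tree = (label, [] if tree is None else [tree])
        answers.append(tree)
    return answers

answers = chain_prefixes(["group", "name", "Alice"])
print(len(answers))   # 3, the depth d of the query
for a in answers:
    print(a)
```

For the example query with d = 3 this yields group, group with a name child, and group with a name child containing Alice.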

Roadmap

Fanout \ Depth | Fixed | Arbitrary
= 1            | PTIME | PTIME
Arbitrary      | ?     | ?

Containment of XML Queries with Arbitrary Fanout

E.g. d = 4 (the depth); m = 3 (the maximum fanout)

Canonical answers and complexity
  Number:
  Size:

[Figure: a tree whose nodes are labelled 1, 2 and 3, with annotations d-1 and d-2.]

Theorem: Testing containment of XML queries with depth 2 and arbitrary fanout is coNP-hard

Roadmap

Fanout \ Depth | Fixed     | Arbitrary
= 1            | PTIME     | PTIME
Arbitrary      | coNP-hard | coNP-hard

The coNP-hard bounds are not tight

Effect of the Depth on Containment of XML Queries

Insight: kernel canonical answer
  The root node has a single child
  In any subtree, a path pattern is repeated no more than cd times
    d: query depth
    c: maximum number of path steps in a query block

The size of kernel canonical answers
  Polynomial in the query size
  Exponential in the query depth

Theorem:
  Testing containment of XML queries with fixed depth is coNP-complete
  Testing containment of XML queries with arbitrary depth is in coNEXPTIME

Containment Checking in Practice

Q:
for $g in /group where $g/gname/text() = "database" return
<area>{
  for $p in $g/person return
  <person>
    <name>{$p/text()}</name>
    {for $q in $g/paper where $q/author/text() = $p/text() return
      <paper>{$q/title/text()}</paper>}
  </person>
}</area>

Q’:
for $g in /group return
<area>{
  for $p in $g/person return
  <person>
    <name>{$p/text()}</name>
    <group>{$g/gname/text()}</group>
    {for $q in $g/paper where $q/author/text() = $p/text() return
      <paper>{$q/title/text()}</paper>}
  </person>
}</area>

Analyze element cardinality to reduce the number of canonical answers for containment checking

Number of canonical answers
  Originally: 71
  After analysis: 2

An Example Query that Returns Tag Variables

for $x in dbGrp return
<result>{
  for $y in $x/proj return
  <group>{
    for $u in $y/member return <name> $u/text() </name>
    for $v in $y/paper return <pub> $v/text() </pub>
  }</group>
}</result>

Deciding Query Containment

Leverage previous results: simulation mapping [Levy and Suciu, PODS’97]
Check query simulation mapping for every canonical answer

Complexity

Simulation mapping can be checked in polynomial time in terms of query size
Complexity of checking containment does not increase

Other Extensions

Query type  | No tag variables | With tag variables | With unions   | With neg      | With //       | With equi-join on tags | With arith comp
Un-nested   | PTIME            | PTIME              | coNP-complete | coNP-complete | coNP-complete | NP-complete            | Π2P-complete
Fan-out = 1 | PTIME            | PTIME              | coNP-complete | coNP-complete | coNP-complete | NP-complete            | Π2P-complete
Fixed-depth | coNP-complete    | coNP-complete      | coNP-complete | coNP-complete | coNP-complete | Π2P-complete           | Π2P-complete
General     | in coNEXPTIME

Conclusions

Contributions
  A sound and complete condition for containment of nested XML queries
  Detailed complexity analysis

Future work
  Fill in the open gap of complexity in the case of queries with arbitrary fanout and arbitrary nesting depth
  Evaluate and optimize the containment algorithm with element cardinality analysis
  Answering nested XML queries using views

Containment of Nested XML Queries

@ VLDB 2004

Xin (Luna) Dong, Alon Halevy, Igor Tatarinov

University of Washington

www.cs.washington.edu/homes/lunadong