order-sensitive xml query processing over relational sources: an algebraic approach authors: ling...

30
Order-sensitive XML Query Processing over Relational Sources: An Algebraic Approach Authors: Ling Wang, Song Wang, Brian Murphy and Elke A. Rundensteiner Institute: Database Systems Research Group, Worcester Polytechnic Institute (WPI) IDEAS’2005 IDEAS’2005

Post on 19-Dec-2015

214 views

Category:

Documents


0 download

TRANSCRIPT

Order-sensitive XML Query Processing over Relational Sources: An Algebraic Approach

Authors: Ling Wang, Song Wang, Brian Murphy and Elke A. Rundensteiner

Institute: Database Systems Research Group, Worcester Polytechnic Institute (WPI)

IDEAS’2005IDEAS’2005

IDEAS’052

Order in XMLOrder in XML

Order is important to XML Document order XML view can be ordered (OrderBy) … User query can be order-sensitive (OrderBy, position(), range()…)

<RESULT> <SONG>

SXE Revenge Shutdown

</SONG></RESULT>

<RESULT>FOR $play in document(“record.xml")/PLAYOrderBy $play/bandRETURN <SONG>

$play[3]/SONG[rang 1 to 2]/text() </SONG></RESULT>

1. Sort PLAY by its band’s name2. Find third PLAY3. Extract its first and second SONG

<RECORDLIST> <PLAY> <BAND> Misfits </BAND>

<SONG> She </SONG> </PLAY> <PLAY> <BAND> Back Street Boy </BAND> <SONG> Bullet </SONG>

<SONG> We Are 138 </SONG> </PLAY> <PLAY> <BAND> Project X </BAND> <SONG> SXE Revenge </SONG> <SONG> Shutdown </SONG> </PLAY></RECORDLIST>

IDEAS’053

Why XML-to-SQL?Why XML-to-SQL?

XML is stored in relational database to … provide reliable persistent storage exploit mature technologies

XML-to-SQL Systems SilkRoute (AT&T), XPERANTO (IBM), RAINBOW (WPI), Rolex

(BellLab), Agora, MARS Oracle XML DB, Microsoft SQL Server 2000 SQLXML, IBM DB2 XML

Extender

IDEAS’054

XML Views support XML view mechanism for XML data publishing support queries (updates) over XML views

XML publishing scenario Relational model is not order-sensitive Order in XML views over RDB has no meaning

XML storage scenario Order is essential !!! Order-preserving loading

– XML document Relational database– Implicit order in XML document explicit order code in RDB

Order-restoring in extraction views– Explicit order code in RDB implicit order in XML view

through view query

XML ViewsXML Views

XML

RDB

XML View

User query

Order encoding

IDEAS’055

Order-specific loadingOrder-specific loading

Order-specific loading: Loading strategies: Inline, edge, … Order encoding methods: Global, local, dewey …

IDEAS’056

ExampleExample

<xs:schema xmlns:xsd="http://www.w3.org/XMLSchema"> <xs:element name="RECORDLIST"> <xs:complexType> <xs:element name="PLAY"

minoccurs="1" maxOccurs="unbounded"> <xs:complexType> <xs:sequence> <xs:element name="BAND" type="xs:string"/> <xs:element name="SONG" type="xs:string"

minoccurs="1"/> </xs:sequence> </xs:complexType> </xs:element> </xs:complexType> </xs:element></xs:schema>

<RECORDLIST> <PLAY> <BAND> Misfits </BAND>

<SONG> She </SONG> </PLAY> <PLAY> <BAND> Back Street Boy </BAND> <SONG> Bullet </SONG>

<SONG> We Are 138 </SONG> </PLAY> <PLAY> <BAND> Project X </BAND> <SONG> SXE Revenge </SONG> <SONG> Shutdown </SONG> </PLAY></RECORDLIST>

XML schema XML document

IDEAS’057

IID PID POSITION

1 0 1

RECORDLIST

IID PID POSITION BAND_PCDATA

2 1 1 Misfits

3 1 2 Back Street Boy

4 1 3 Project X

PLAY

IID PID POSITION SONG_PCDATA

5 2 1 She

6 3 1 Bullet

7 3 2 We Are 138

8 4 1 SXE Revenge

9 4 2 Shutdown

SONG

Relational Database Inline loading + local order encoding

ExampleExample

<RECORDLIST>FOR $play IN document("dxv.xml")/PLAY/ROWORDER BY $play/POSITION/text()RETURN <PLAY>

<BAND>$play/BAND_PCDATA</BAND> FOR $song IN document("dxv.xml")/SONG/ROW [PID/text() = $play/IID/text()] ORDER BY $song/POSITION/text() RETURN <SONG> $song/SONG_PCDATA/text() </SONG> </PLAY></RECORDLIST>

View query

IDEAS’058

MotivationMotivation

Many loading + Encoding combinations are possible … {inline, edge, …} * {local, global, dewey…}

Hybrid of multiple loading and encoding may occur: Loading:

– Schema is available --- inline– Schema is not available --- edge

Order-encoding– Heavy update workload --- dewey– Query workload --- global

Multiple XML documents are loaded into RDB

Other loading and encoding methods may emerge in future

Conclude: Need general approach for XQuery-to-SQL translation

IDEAS’059

XSOTXSOT

XML-to-SQL Order-sensitive Translation (XSOT):

Step1: Encode XML document with explicit order code(order-exposing)

Step 2: Load XML to relational database (order-preserving)

Step 2: Extract XML view from relational database (order-restoring)

Step 3: Query via XML view with order predicates(order-sensitive)

IDEAS’0510

XQuery Parser

Default XML Schema

Default XML View

Web/Intranet Application User

Sub- System

Process

Data

Query flow

Data flow

Legend

XAT Generator

User XAT

View Composer

XAT Optimizer

View XAT

SQL

XML Result

Ordered Tuple Streams

XML Schema

XML Data

View Query

XML Generator

XAT

RDBMS

SQL GeneratorMapping Manager

XQuery Engine

DB2Oracle SQL Server

Loading XQuery

Schema generation

Data Loading

Order EncodingXQuery

Data Extracting

XML Source Wrapper

Default XML View

Order-Sensitive User Query

Composed XAT

Optimized XAT

Order CodeComparison

Function

Sybase

XSOT FrameworkXSOT Framework

IDEAS’0511

IID PID POSITION

1 0 1

RECORDLIST

IID PID POSITION BAND_PCDATA

2 1 1 Misfits

3 1 2 Back Street Boy

4 1 3 Project X

PLAY

IID PID POSITION SONG_PCDATA

5 2 1 She

6 3 1 Bullet

7 3 2 We Are 138

8 4 1 SXE Revenge

9 4 2 Shutdown

SONG

<RESULT> <SONG>

We are 138Shutdown

</SONG></RESULT>

<RESULT>FOR $record in document(“record.xml")RETURN <SONG>

$record/PLAY/SONG[2]/text() </SONG></RESULT>

Find second song of each play

<RECORDLIST>FOR $play IN document("dxv.xml")/PLAY/ROWORDER BY $play/POSITION/text()RETURN <PLAY>

<BAND>$play/BAND_PCDATA</BAND> FOR $song IN document("dxv.xml")/SONG/ROW [PID/text() = $play/IID/text()] ORDER BY $song/POSITION/text() RETURN <SONG> $song/SONG_PCDATA/text() </SONG> </PLAY></RECORDLIST>

View queryRelational Database Inline loading + local order encoding

Running ExampleRunning Example

IDEAS’0512

Order-sensitive XML Algebra TreeOrder-sensitive XML Algebra Tree

XSOT methodology: An algebraic approach

XML Algebra Tree (XAT) XAT operators

– Select, CartesianProduct, ThetaJoin, LeftOuterJoin, Distinct, GroupBy, OrderBy

– Source, Navigate, Combine, Tagger

XAT Order Extension– Position()

– Range()

Composition of the view and user XAT

IDEAS’0513

View Query XATView Query XAT

<RECORDLIST>FOR $play IN document("dxv.xml")/PLAY/ROWORDER BY $play/POSITION/text()RETURN <PLAY>

<BAND>$play/BAND_PCDATA</BAND> FOR $song IN document("dxv.xml")/SONG/ROW [PID/text() = $play/IID/text()] ORDER BY $song/POSITION/text() RETURN <SONG> $song/SONG_PCDATA/text() </SONG> </PLAY></RECORDLIST>

Combine $dataPlayTag

Tagger <PLAY> $dataSongTag</PLAY>$dataPlayTag

GroupBy $play

Combine $dataSongTag

Navigate $song, SONG_PCDATA/text()$sData

Tagger <SONG>$sData</SONG>$dataSongTag

Navigate $song, POSITION/text()$sPos OrderBy $sPos

GroupBy $play

Source “dxv.xml” $S

Navigate $S, SONG/ROW$song

Navigate $song, PID/text()$sPID

ThetaJoin $pIID=$sPID

Source “dxv.xml” $P

Navigate $P, PLAY/ROW$play

Navigate $play, POSITION/text()$pPos

OrderBy $pPos

Tagger <RECORDLIST> $dataPlayTag</ RECORDLIST >$record

Navigate $play, IID/text()$pIID

1

2

3

4

5

6

7

8

9

1110

12

13

14 15

16

17

18

19

IDEAS’0514

User Query XATUser Query XAT

<RESULT>FOR $record in document(“record.xml")RETURN <SONG>

$record/PLAY/SONG[2]/text() </SONG></RESULT>

GroupBy $record, $uPlay

Combine $uDataSongTag

Tagger <RESULT> $uDataSongTag</RESULT>$result

Tagger <SONG> $uSongData</SONG>$uDataSongTag

Navigate $uRecord, PLAY$uPlay

Navigate $uPlay, SONG$uSong

Navigate $uSong, text()$uSongData

Select $uNumPos=2

Source “record.xml” $P

20

21

22

23

24

25

26

27

28

POS $uSong $uNumPos

IDEAS’0515

GroupBy $record, $uPlay

Combine $uDataSongTag

Tagger <RESULT> $uDataSongTag</RESULT>$result

Tagger <SONG> $uSongData</SONG>$uDataSongTag

Navigate $uRecord, PLAY$uPlay

Navigate $uPlay, SONG$uSong

Navigate $uSong, text()$uSongData

Select $uNumPos=2

Source “record.xml” $P

20

21

22

23

24

25

26

27

28

POS $uSong $uNumPos

User XAT $P=$record

Composed XATComposed XAT

Combine $dataPlayTag

Tagger <PLAY> $dataSongTag</PLAY>$dataPlayTag

GroupBy $play

Combine $dataSongTag

Navigate $song, SONG_PCDATA/text()$sData

Tagger <SONG>$sData</SONG>$dataSongTag

Navigate $song, POSITION/text()$sPos OrderBy $sPos

GroupBy $play

Source “dxv.xml” $S

Navigate $S, SONG/ROW$song

Navigate $song, PID/text()$sPID

ThetaJoin $pIID=$sPID

Source “dxv.xml” $P

Navigate $P, PLAY/ROW$play

Navigate $play, POSITION/text()$pPos

OrderBy $pPos

Tagger <RECORDLIST> $dataPlayTag</ RECORDLIST >$record

Navigate $play, IID/text()$pIID

1

2

3

4

5

6

7

8

9

1110

12

13

14 15

16

17

18

19

View XAT

IDEAS’0516

XAT Optimization – Order ExplicitXAT Optimization – Order Explicit

Why? Order in user XAT depends on the implicit order in the view It blocks further optimization: Computation push down

IDEAS’0517

XAT Optimization – Order ExplicitXAT Optimization – Order Explicit

Tagger <PLAY> $dataSongTag</PLAY>$dataPlayTag

GroupBy $play

Combine $dataSongTagTagger <SONG>$sData</SONG>

$dataSongTag

View XAT

construct SONG

construct PLAY

GroupBy $record, $uPlay

Select $uNumPos=2

POS $uSong $uNumPos

For each PLAY

Sort SONGs

Pick second song

User XAT

Depend onCannot push down! Cannot translated into SQL!

IDEAS’0518

XAT Optimization – Order ExplicitXAT Optimization – Order Explicit

Goal: Convert user query order FROM: implicit order in the XML view TO: Explicit order-code column in relational encoding

POS$uSong = POS$sPos

POS $uSong $uNumPos

GroupBy $record, $uPlay22

23

View Portion XAT

Navigate $song, POSITION/text()$sPos OrderBy $sPos

GroupBy $play

10 11

12

POS $sPos $uNumPos

GroupBy $play22

23

User Portion XAT

View Portion XAT

Navigate $song, POSITION/text()$sPos OrderBy $sPos

GroupBy $play

10 11

12

User Portion XAT

IDEAS’0519

SQL-oriented XAT optimizationSQL-oriented XAT optimization

Goal: Optimize XAT for efficient order-sensitive SQL generation

Rules: Computation push-down

– Push as much as possible to RDB

Order pull-up – Sort as late as possible

– Avoid re-sorting !!!

Order-step rewrite – Match RDB order template

IDEAS’0520

Optimized XATOptimized XAT

Navigate $song, POSITION/text()$sPos

OrderBy $sPos

GroupBy $play

Source “dxv.xml” $S

Navigate $S, SONG/ROW$song

Navigate $song, PID/text()$sPID

ThetaJoin $pIID=$sPID

Source “dxv.xml” $P

Navigate $P, PLAY/ROW$play

Navigate $play, POSITION/text()$pPos

Navigate $play, IID/text()$pIID

1

2

3

5

6

7

8

9 11

10

12

OrderBy $sPos, $pPos4

GroupBy $pPos

Combine $uDataSongTag

Tagger <RESULT> $uDataSongTag</RESULT>$result

Tagger <SONG> $sData</SONG>$uDataSongTag

Select $uNumPos=2

22

23

24

26

27

28

POS $sPos $uNumPos

Navigate $song, SONG_PCDATA/text()$sData

13Computation push down

Order pull up

OrderStep rewriteOrderStep [$pPos], [$pPos, $sPos]

$uNumPos

IDEAS’0521

Navigate $song, POSITION/text()$sPos

OrderStep [$pPos], [$pPos, $sPos] $uNumPos

Source “dxv.xml” $S

Navigate $S, SONG/ROW$song

Navigate $song, PID/text()$sPID

ThetaJoin $pIID=$sPID

Source “dxv.xml” $P

Navigate $P, PLAY/ROW$play

Navigate $play, POSITION/text()$pPos

Navigate $play, IID/text()$pIID

1

2

3

5

6

7

8

9

10

29

OrderBy $sPos, $pPos4

Combine $uDataSongTag

Tagger <RESULT> $uDataSongTag</RESULT>$result

Tagger <SONG> $sData</SONG>$uDataSongTag

Select $uNumPos=224

26

27

28

Navigate $song, SONG_PCDATA/text()$sData

13

Optimized XATOptimized XAT

IDEAS’0522

TEMPLATE:SELECT row_number() over(<PARTITION>?<ORDERBY>) $pos_func_bindingFROM <TABLE>+

PARTITION: partition by <ELEMENT>

ORDERBY:order by <TONUMBER>|<ELEMENT>

TONUMBER: to_number(<ELEMENT>)

ELEMENT:element name

TABLE:table name | TEMPLATE

Order TemplateOrder Template

SQL-99 standard Oracle, DB2 …

Order Template

IDEAS’0523

Order-sensitive SQL generationOrder-sensitive SQL generation

About push-down strategies In general ----

push as much computation as possible into relational engine.

In order scenario --- tradeoff Deep push:

– Push OrderStep into Relational Engine

– Relational engine has to support order template (SQL99)

Q5 = SELECT Q2.sData FROM

(SELECT Q1.pPos, Q1.sPos, Q1.sData,row_number() OVER (PARTITION BY Q1.pPos ORDER BY Q1.sPos) uNumPos

FROM(SELECT P.POSITION AS pPos, S.POSITION AS sPos,

S.SONG_PCDATA AS sData FROM PLAY P, SONG SWHERE P.IID = S.PID) Q1

) Q2WHERE Q2.uNumPos = 2ORDER BY Q2.pPos, Q2.sPos

SQL Q5$sData

32

Combine $uDataSongTag

Tagger <RESULT> $uDataSongTag</RESULT>$result

Tagger <SONG> $sData</SONG>$uDataSongTag

26

27

28

IDEAS’0524

Order-sensitive SQL generationOrder-sensitive SQL generation

Shallow push (otherwise)– leave OrderStep outside RDB

– No requirement for Relational engine

for supporting order template (SQL99)

SELECT P.POSITION AS pPos, S.POSITION AS sPos, S.SONG_PCDATA AS sData

FROM PLAY P, SONG SWHERE P.IID = S.PID

OrderStep [$pPos], [$pPos, $sPos] $uNumPos

29

OrderBy $sPos, $pPos4

Combine $uDataSongTag

Tagger <RESULT> $uDataSongTag</RESULT>$result

Tagger <SONG> $sData</SONG>$uDataSongTag

Select $uNumPos=224

26

27

28

SQL Q1$sData

IDEAS’0525

Deep Push vs. Shallow PushDeep Push vs. Shallow Push

Low selectivitysimilar

High selectivity Shallow push is better Repeated sorting in deep

push is expensive!

IDEAS’0526

Experimental StudyExperimental Study

SQL Execution time --- Global vs. Local order encoding

IDEAS’0527

Discussion: Further SQL optimizationDiscussion: Further SQL optimization

General SQL optimization can be applied… Cost-based SQL translation (SilkRoute) Any other SQL optimization…

When order encoding is assumed… SQL statements can be simplified by avoiding re-ordering

When relational database schema is aware … Schema specific SQL optimization [KKN2002]

IDEAS’0528

Related WorkRelated Work

XQuery-to-SQL translation systems: XPERANTO, SilkRoute, …

[TVB2002] I. Tatarinov, S. D. Viglas, K. Beyer, J. Shanmugasundaram, E. Shekita, and C. Zhang. Storing and Querying Ordered XML Using a Relational Database System. In SIGMOD, 2002. Three order encoding methods are utilized Algorithms of translating ordered XPath expressions into SQL

But …

[KKN2002] R. Krishnamurthy, R. Kaushik, and J. F. Naughton. Optimizing Fixed-Schema XML to SQL Query Translation. In VLDB, 2002.

IDEAS’0529

ConclusionConclusion

Propose a general framework for order-sensitive XQuery-to-SQL translation (XSOT)

Propose order-sensitive XML algebra Tree (XAT)

SQL-oriented order-sensitive XAT optimization

Efficient order SQL statements generation and optimization techniques

Implementation using Rainbow query engine

Experiments to verify the generality and SQL performance

IDEAS’0530

Rainbow XML Management SystemRainbow XML Management System

Rainbow website:

http://davis.wpi.edu/dsrg/rainbow/index.html

Software downloadhttp://davis.wpi.edu/dsrg/rainbow/RainbowCore/release.htm

Thank you!