XQuery Implementation in a Relational Database System
Shankar Pal, Istvan Cseri, Oliver Seeliger, Michael Rys, Gideon Schaller, Wei Yu, Dragan Tomic, Adrian Baras, Brandon Berg, Denis Churin, Eugene Kogan
SQL Server, Microsoft Corp
VLDB 2005 - Sep 1 S. Pal et al. 2
Overview
  Background
    XML Support in SQL Server 2005
    OrdPath labeling of XML nodes
    XML indexes – PATH, VALUE, PROPERTY
  Main topic – XQuery compilation
    Architecture
    XML operators
    Mapping XML operators to relational+ operators
  Conclusions
Background: XML Support in SQL Server 2005
  CREATE TABLE DOCS (ID int PRIMARY KEY, XDOC xml)
  XML stored in an internal, binary form (‘blob’)
  Optionally typed by a collection of XML schemas
    Used for storage and query optimizations
  3 of 5 methods on XML data type:
    query(): returns XML type
    value(): returns scalar value
    exist(): checks conditions on XML nodes
  XML indexing
  More information at http://msdn.microsoft.com/xml
Background: XQuery embedded in SQL
  Retrieve section titles from <book>, wrapped in new <topic> elements:

  SELECT ID, XDOC.query('
      for $s in /BOOK/SECTION
      return <topic>{data($s/TITLE)}</topic>
  ') FROM DOCS
Background: XQuery – supported features
  XQuery clauses “for”, “where”, “return” and “order by”
  XPath axes – child, descendant, parent, attribute, self and descendant-or-self
  Functions – numeric, string, Boolean, nodes, context, sequences, aggregate, constructor, data accessor
  SQL Server extension functions to access SQL variable and column data within XQuery
  Numeric operators (+, -, *, div, mod)
  Value comparison operators (eq, ne, lt, gt, le, ge)
  General comparison operators (=, !=, <, >, <=, >=)
Background [SIGMOD04]: ORDPATH Label of Nodes
  BOOK 1
    Section 1.3
      Title 1.3.1
      Figure 1.3.3
    Section 1.5
      Title 1.5.1
      Figure 1.5.3
  node1 precedes node2 in document order ⇔ ORDPATH(node1) < ORDPATH(node2)
  node1 is ancestor of node2 ⇔ ORDPATH(node1) is a prefix of ORDPATH(node2)
    Nodes in the subtree of 1.3: ORDPATH(1.3) ≤ id < Descendant_Limit(1.3) = 1.4
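The two ORDPATH properties above can be sketched in a few lines of Python. This is only an illustration: it models a label as a tuple of integers and ignores the compressed binary encoding and the insertion ("caret") components of real ORDPATH labels. The function names are made up for this sketch.

```python
from typing import Tuple

OrdPath = Tuple[int, ...]

def parse_ordpath(label: str) -> OrdPath:
    """Turn a dotted label such as '1.3.1' into a tuple of integers."""
    return tuple(int(part) for part in label.split("."))

def precedes(a: OrdPath, b: OrdPath) -> bool:
    """Document order: tuples compare component by component, a prefix sorts first."""
    return a < b

def is_ancestor(a: OrdPath, b: OrdPath) -> bool:
    """a is an ancestor of b when a is a proper prefix of b."""
    return len(a) < len(b) and b[:len(a)] == a

def descendant_limit(a: OrdPath) -> OrdPath:
    """Smallest label that sorts after every descendant of a (1.3 -> 1.4)."""
    return a[:-1] + (a[-1] + 1,)

def in_subtree(a: OrdPath, node: OrdPath) -> bool:
    """Range form of the subtree test: a <= node < descendant_limit(a)."""
    return a <= node < descendant_limit(a)

# Labels from the slide, shuffled; sorting restores document order.
labels = ["1.5.1", "1.3.3", "1", "1.3", "1.5", "1.3.1", "1.5.3"]
in_document_order = sorted(labels, key=parse_ordpath)
```

The range form matters because it turns "all descendants of a node" into a single contiguous range over a sorted index.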
Background [VLDB 2004]: Indexing XML column
  Primary XML index on an XML column
    Creates B+tree on data model content of the XML nodes
    Adds column Path_ID for the reversed, encoded path from each XML node to root of XML tree
  OrdPath labeling scheme is used for XML nodes
    Relative order of nodes
    Document hierarchy
Background: XML example
  INSERT INTO myTable VALUES (7,
  '<Book xmlns="myns" ISBN="1-55860-3612">
     <Section>
       <Title>Bad Bugs</Title>
     </Section>
     <Section>
       <Title> Tree frogs </Title>
       <Figure>…</Figure>
     </Section>
   </Book>')
Background: Primary XML Index Entries
  ID  ORDPATH  TAG          NODETYPE       VALUE         PATH_ID
  7   1        1 (Book)     10 (ns:bT)     NULL          #1
  7   1.1      2 (ISBN)     2 (xs:string)  '1-55860-…'   #2#1
  7   1.3      3 (Section)  11 (ns:sT)     NULL          #3#1
  7   1.3.1    4 (Title)    2 (xs:string)  'Bad Bugs'    #4#3#1
  7   1.3.3    5 (Figure)   12 (ns:fT)     NULL          #5#3#1
  7   1.5      3 (Section)  11 (ns:sT)     NULL          #3#1
  7   1.5.1    4 (Title)    2 (xs:string)  'Tree frogs'  #4#3#1
  7   1.5.3    5 (Figure)   12 (ns:fT)     NULL          #5#3#1
  Clustering key: (ID, ORDPATH)
  Encoding of tags & types stored in system meta-data
  Additional details not shown
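To make the ORDPATH and PATH_ID columns concrete, here is a rough sketch of how rows like these could be produced from a document. It is an assumption-laden illustration, not the SQL Server shredder: the tag dictionary, the rule that attributes and children take odd ordinals 1, 3, 5, ..., and the sample document (with a placeholder ISBN value) are all hypothetical, and type columns are left out.

```python
import xml.etree.ElementTree as ET

def shred(xml_text):
    """Return (ordpath, name, value, path_id) rows in document order."""
    tag_ids = {}  # hypothetical tag dictionary: name -> small integer

    def encode(names):
        # names run from the node up to the root, so the path is already reversed
        return "".join(f"#{tag_ids.setdefault(n, len(tag_ids) + 1)}" for n in names)

    rows = []

    def visit(elem, ordpath, ancestors):
        names = [elem.tag] + ancestors
        text = (elem.text or "").strip() or None
        rows.append((ordpath, elem.tag, text, encode(names)))
        ordinal = 1  # attributes and children get odd ordinals 1, 3, 5, ...
        for attr, value in elem.attrib.items():
            rows.append((ordpath + (ordinal,), attr, value, encode([attr] + names)))
            ordinal += 2
        for child in elem:
            visit(child, ordpath + (ordinal,), names)
            ordinal += 2

    visit(ET.fromstring(xml_text), (1,), [])
    return rows

# Illustrative document shaped like the index entries; the ISBN is a placeholder.
sample = (
    '<Book ISBN="placeholder">'
    "<Section><Title>Bad Bugs</Title><Figure/></Section>"
    "<Section><Title>Tree frogs</Title><Figure/></Section>"
    "</Book>"
)
index_rows = shred(sample)
```

Because the path is stored reversed, every node with the same tag shares a common PATH_ID prefix, which is what later makes suffix-wildcard path matching possible.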
Background: Secondary XML indexes
  To speed up different classes of commonly occurring queries
    PATH: path-based queries; key columns (PATH_ID, VALUE, ID, ORDPATH)
    VALUE: value-based queries; key columns (VALUE, PATH_ID, ID, ORDPATH)
    PROPERTY: object properties; key columns (ID, PATH_ID, VALUE, ORDPATH)
  Statistics created on key columns of the primary and secondary XML indexes
    Used for cost-based selection of secondary XML indexes
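Why the order of the key columns matters can be shown with sorted lists standing in for B+trees. In this sketch (hypothetical helper names, NULL modeled as an empty string so that keys stay comparable), a seek is only possible on a leading prefix of the key, so each secondary index serves a different kind of lookup.

```python
from bisect import bisect_left

rows = [
    {"ID": 7, "ORDPATH": (1,), "VALUE": None, "PATH_ID": "#1"},
    {"ID": 7, "ORDPATH": (1, 3), "VALUE": None, "PATH_ID": "#3#1"},
    {"ID": 7, "ORDPATH": (1, 3, 1), "VALUE": "Bad Bugs", "PATH_ID": "#4#3#1"},
    {"ID": 7, "ORDPATH": (1, 5), "VALUE": None, "PATH_ID": "#3#1"},
    {"ID": 7, "ORDPATH": (1, 5, 1), "VALUE": "Tree frogs", "PATH_ID": "#4#3#1"},
]

def build_index(table, key_columns):
    """Sorted list of key tuples; NULL values are stored as '' to stay comparable."""
    def key(row):
        return tuple("" if row[c] is None else row[c] for c in key_columns)
    return sorted(key(row) for row in table)

def seek(index, prefix):
    """Binary-search to the first key >= prefix, then scan while the prefix matches."""
    found = []
    for key in index[bisect_left(index, prefix):]:
        if key[:len(prefix)] != prefix:
            break
        found.append(key)
    return found

path_index = build_index(rows, ("PATH_ID", "VALUE", "ID", "ORDPATH"))
value_index = build_index(rows, ("VALUE", "PATH_ID", "ID", "ORDPATH"))
property_index = build_index(rows, ("ID", "PATH_ID", "VALUE", "ORDPATH"))

sections = seek(path_index, ("#3#1",))                # path known
tree_frogs = seek(value_index, ("Tree frogs",))       # value known, path unknown
titles_of_doc_7 = seek(property_index, (7, "#4#3#1")) # one document, one property
```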
Background: Handling Types
  If XML column is typed
    Values are stored in XML blob and XML indexes with appropriate typing
  Untyped XML
    Values are stored as strings
    Convert to appropriate types for operations
  SQL typed values stored in primary XML index
    Most SQL types are compatible with XQuery types (e.g. integer)
    Value comparisons on XML index columns suffice
    Some types (e.g. xs:dateTime) are stored in internal format and processed specially
XQuery Processing Architecture
  XQuery Compiler:
    Parses XQuery expression
    Checks static type correctness
      Type annotations
    Applies static optimizations
      Path collapsing
      Rewrites using XML schemas
  XML Operator Mapper
    Recursively traverses XML algebra tree
    Converts each XmlOp to relational+ operator sub-tree
    Mapping depends upon existence of primary XML index
  Pipeline (diagram):
    XQuery expression
    → XQuery Compiler
    → XML algebra tree (XmlOp ops)
    → XML Operator Mapper
    → Relational Operator Tree (relational+ operators)
    → Relational Query Processor
Examples of XML Operators
  XmlOp_Select
    In: list of items, condition
    Out: items satisfying condition
  XmlOp_Path
    In: simple paths, no predicates
    Opt: path context to collapse paths
    Out: eligible XML nodes
  XmlOp_Apply
    In: two item lists
    Out: one item list
    Variable binding in “for” expression
  XmlOp_Construct
    In: sub-nodes for element construction, otherwise value
    Out: constructed node
XML Operator Mapping – Overview
  [Diagram labels: PK and XML columns of the base table; XQUERY; REL+ tree; Primary XML Index (PK, OrdPath); PATH Index; VALUE Index; PROPERTY Index]
  Special handling for SELECT * | XDOC
New operators
  Some produce N rows from M (≠ N) rows
    XML_Reader – streaming, pull-model XML parser
    XML_Serializer – to serialize query result as XML
  Some are for efficiency
    Contains – to evaluate XQuery contains()
    TextAdd – to evaluate the XQuery function string()
    Data – to evaluate XQuery data() function
  Some are for specific needs
    Check – validate XML during insertion or modification
XML Operator Mapping
  Following categories:
    Mapping of XPath expressions
    Mapping of XQuery expressions
    Mapping of XQuery built-in functions
Non-indexed XML, Full Path
  XML_Reader produces subtrees of <SECTION>
    Node table rows
    Contains OrdPath
    No PK or PATH_ID
  XML_Serialize reassembles those rows into XML data type
    To output result
  XML operator tree:
    XmlOp_Path PATH = “/BOOK/SECTION”
  Rel+ operator tree:
    XML_Serialize
      XML_Reader (XDOC, “/BOOK/SECTION”)
Sample query execution using Primary XML Index
  ID  ORDPATH  TAG          NODETYPE       VALUE         PATH_ID
  7   1        1 (Book)     10 (ns:bT)     NULL          #1
  7   1.1      2 (ISBN)     2 (xs:string)  '1-55860-…'   #2#1
  7   1.3      3 (Section)  11 (ns:sT)     NULL          #3#1
  7   1.3.1    4 (Title)    2 (xs:string)  'Bad Bugs'    #4#3#1
  7   1.3.3    5 (Figure)   12 (ns:fT)     NULL          #5#3#1
  7   1.5      3 (Section)  11 (ns:sT)     NULL          #3#1
  7   1.5.1    4 (Title)    2 (xs:string)  'Tree frogs'  #4#3#1
  7   1.5.3    5 (Figure)   12 (ns:fT)     NULL          #5#3#1
  Clustering key: (ID, ORDPATH)
  /Book/Section is translated to PATH_ID #3#1 (by XML Op Mapper)
Indexed XML, Full Path
  XmlOp_Path mapped to SELECT
    GET(PXI) – rows from primary XML index
    Match PATH_ID
  Not shown: JOIN with base table on PK
  Rel+ operator tree:
    XML_Serialize
      Apply
        Select ($b): Path_ID = #SECTION#BOOK
          GET(PXI)
        Select: $b.OrdP ≤ OrdP < DL($b)   (assemble subtree)
          GET(PXI)
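The operator tree above can be read as a nested loop. The following sketch mimics it over an in-memory list standing in for the primary XML index (PXI); the `Row` type and helper names are hypothetical, and ORDPATH labels are simplified to integer tuples.

```python
from typing import NamedTuple, Optional, Tuple

class Row(NamedTuple):
    id: int
    ordpath: Tuple[int, ...]
    tag: str
    value: Optional[str]
    path_id: str

pxi = [
    Row(7, (1,), "Book", None, "#1"),
    Row(7, (1, 3), "Section", None, "#3#1"),
    Row(7, (1, 3, 1), "Title", "Bad Bugs", "#4#3#1"),
    Row(7, (1, 3, 3), "Figure", None, "#5#3#1"),
    Row(7, (1, 5), "Section", None, "#3#1"),
    Row(7, (1, 5, 1), "Title", "Tree frogs", "#4#3#1"),
    Row(7, (1, 5, 3), "Figure", None, "#5#3#1"),
]

def descendant_limit(ordpath):
    """DL($b): smallest label sorting after every descendant of $b."""
    return ordpath[:-1] + (ordpath[-1] + 1,)

def select_path(index, path_id):
    """Select ($b): rows whose Path_ID equals the encoded reversed path."""
    return [row for row in index if row.path_id == path_id]

def assemble_subtree(index, b):
    """Select: rows of the same document with $b.OrdP <= OrdP < DL($b)."""
    limit = descendant_limit(b.ordpath)
    return [r for r in index if r.id == b.id and b.ordpath <= r.ordpath < limit]

def apply(outer, inner):
    """Apply: evaluate the inner side once per outer binding."""
    return [inner(b) for b in outer]

# /Book/Section, using the slide's encoded PATH_ID #3#1
subtrees = apply(select_path(pxi, "#3#1"), lambda b: assemble_subtree(pxi, b))
```

Each inner list holds the rows that XML_Serialize would turn back into one `<Section>` element.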
XML index – PATH
  PATH_ID  VALUE         ID  ORDPATH
  #1       NULL          7   1
  #2#1     '1-55860-…'   7   1.1
  #3#1     NULL          7   1.3
  #3#1     NULL          7   1.5
  #4#3#1   'Bad Bugs'    7   1.3.1
  #4#3#1   'Tree frogs'  7   1.5.1
  #5#3#1   NULL          7   1.3.3
  #5#3#1   NULL          7   1.5.3
  Speeds up path evaluations
  Example – /Book/Section is looked up as #3#1
Indexed XML, Imprecise Paths
  /BOOK/SECTION//TITLE
    Matched using LIKE operator on Path_ID
  Rel+ operator tree:
    XML_Serialize
      Apply
        Select ($s): Path_ID LIKE #TITLE%#SECTION#BOOK
          GET(PXI)
        Assemble subtree of <TITLE>
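A small sketch of how a path with `//` could become a LIKE pattern over the reversed path, and how that pattern matches. It uses the symbolic `#NAME` notation from the slides rather than the real encoded tag ids, and both helper functions are hypothetical. One limitation of the symbolic form: `%` placed right after a name could also match a longer name with the same prefix, which the actual encoding would need to rule out.

```python
import re

def reversed_path_pattern(xpath):
    """'/BOOK/SECTION//TITLE' -> '#TITLE%#SECTION#BOOK' (reversed, '%' for '//')."""
    pattern = ""
    gap = False
    for step in xpath.split("/")[1:]:
        if step == "":          # empty step marks a '//' in the path
            gap = True
            continue
        pattern = "#" + step + ("%" if gap else "") + pattern
        gap = False
    return pattern

def like_to_regex(pattern):
    """Translate a LIKE pattern that only uses '%' into an anchored regex."""
    parts = (re.escape(part) for part in pattern.split("%"))
    return re.compile("^" + ".*".join(parts) + "$")

pattern = reversed_path_pattern("/BOOK/SECTION//TITLE")
matcher = like_to_regex(pattern)

path_ids = [
    "#TITLE#SECTION#BOOK",        # /BOOK/SECTION/TITLE
    "#TITLE#PARA#SECTION#BOOK",   # /BOOK/SECTION/PARA/TITLE
    "#TITLE#BOOK",                # /BOOK/TITLE
    "#SECTION#BOOK",              # /BOOK/SECTION
]
matching = [p for p in path_ids if matcher.match(p)]
```

Storing the path reversed keeps the known part of a `//name` path at the front of the key, so such patterns can still use an index seek on the fixed prefix.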
Predicate Evaluation
  /BOOK[@ISBN = “12”]
    Search value compared with VALUE column in PXI
  Collapsed path /BOOK/@ISBN
    Induce index seeks
    Reduce intermediate result size
  Parent check – Par($b)
    Using OrdPath
  Value conversion might be needed
  Rel+ operator tree:
    XML_Serialize
      Apply
        Apply
          Select ($b): Path_ID = #BOOK
            GET(PXI)
          Select: Path_ID = #@ISBN#BOOK & VALUE = “12” & Par($b)
            GET(PXI)
        Assemble subtree of <BOOK>
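The predicate plan can be sketched as follows: find attribute rows by the collapsed path and value, then keep the `<BOOK>` rows that are their parents. The rows and helper names are hypothetical, and deriving the parent by dropping the last component is a simplification of what real ORDPATH labels require.

```python
from typing import NamedTuple, Optional, Tuple

class Row(NamedTuple):
    id: int
    ordpath: Tuple[int, ...]
    path_id: str
    value: Optional[str]

# Two hypothetical documents, each a <BOOK> with an ISBN attribute.
pxi = [
    Row(7, (1,), "#BOOK", None),
    Row(7, (1, 1), "#@ISBN#BOOK", "12"),
    Row(8, (1,), "#BOOK", None),
    Row(8, (1, 1), "#@ISBN#BOOK", "34"),
]

def parent(ordpath):
    """Par(...): in this simplified model the parent label drops the last component."""
    return ordpath[:-1]

def books_with_isbn(index, isbn):
    # Collapsed path /BOOK/@ISBN plus the value: one selective lookup.
    matches = {
        (r.id, parent(r.ordpath))
        for r in index
        if r.path_id == "#@ISBN#BOOK" and r.value == isbn
    }
    # Parent check: keep <BOOK> rows that own a matching attribute.
    return [b for b in index if b.path_id == "#BOOK" and (b.id, b.ordpath) in matches]

result = books_with_isbn(pxi, "12")
```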
Ordinal Predicate
  /BOOK[n]
    Adds ranking column to the rows for <BOOK> elements
    Retrieves the nth <BOOK> node
  Special optimizations
    [1]: TOP 1 ascending
    [last()]: TOP 1 descending
    Avoids sorting when input is sorted
      Example – in XML_Serializer
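As a rough illustration of the ranking idea and its two special cases, the sketch below works on ORDPATH labels modeled as integer tuples. The function names are made up, and the general case sorts explicitly, which the real plan avoids when its input already arrives in OrdPath order.

```python
def nth_node(ordpaths, n):
    """General case: rank the rows in document order, keep rank n."""
    ranked = enumerate(sorted(ordpaths), start=1)
    return [label for rank, label in ranked if rank == n]

def first_node(ordpaths):
    """[1]: TOP 1 in ascending OrdPath order, no ranking column needed."""
    return min(ordpaths, default=None)

def last_node(ordpaths):
    """[last()]: TOP 1 in descending OrdPath order."""
    return max(ordpaths, default=None)

# Three hypothetical top-level <BOOK> elements, deliberately unsorted.
books = [(5,), (1,), (3,)]
second = nth_node(books, 2)
```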
Error handling
  Static type errors at compilation time
    Raises static type errors if an expression could fail at runtime due to type safety violation
      Addition of string to integer
      Querying non-existent node name in typed XML
      Non-singleton in “eq”
    Some can be fixed using explicit cast or ordinal specification
  Dynamic error converted to empty sequence
    Yields correct result in predicates without negations
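One way to read the last point is sketched below, under the assumption that a failed cast is the dynamic error in question. A value that cannot be converted simply disappears from the sequence; a positive predicate then filters the row out, but a negated predicate would let it through. The helper names are hypothetical.

```python
def cast_to_int(values):
    """Cast each item; an item that fails to cast contributes nothing (empty sequence)."""
    result = []
    for value in values:
        try:
            result.append(int(value))
        except ValueError:
            pass  # dynamic error swallowed instead of raised
    return result

def general_ge(sequence, threshold):
    """XQuery-style general comparison: true if any item satisfies it."""
    return any(item >= threshold for item in sequence)

good = general_ge(cast_to_int(["5"]), 3)          # value qualifies
bad_cast = general_ge(cast_to_int(["abc"]), 3)    # empty sequence, row filtered out
negated = not general_ge(cast_to_int(["abc"]), 3) # negation lets the bad value through
```

Without negation the row with the bad value is dropped, so the result contains nothing it should not; under negation the same row would be returned, which is why the slide restricts the claim.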
“for” Iterator
  for $s in /BOOK//SECTION
  where $s/@num >= 3
  return $s/TITLE
  XML op for “for” is XmlOp_Apply
    Maps to APPLY
    Binds $s and iterates over <SECTION>
    Determines its <TITLE> children
  Nested “for” and “for” with multiple bindings turn into nested APPLY
    Each APPLY binds to a different variable
  Rel+ operator tree (diagram labels):
    Operators: XML_Serialize, Apply, Apply ($s), Select ($s), Select (twice), Exists, GET(PXI) (twice), Assemble <SECTION>
    Binding of $s: Path_ID LIKE #SECTION%#BOOK
    “where” condition: Path_ID LIKE #@num#SEC%#BK & VALUE >= 3 & Par($s)
    “return” expression: Path_ID LIKE #TITLE#SECTION%#BOOK & Par($s)
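The query on this slide can be walked through with a toy APPLY. In this sketch the index rows, the `like` helper (a stand-in for LIKE with a single `%`), and the parent test are all hypothetical simplifications; path ids use the symbolic notation of the slides.

```python
from typing import NamedTuple, Tuple, Union

class Row(NamedTuple):
    ordpath: Tuple[int, ...]
    path_id: str
    value: Union[int, str, None]

pxi = [
    Row((1,), "#BOOK", None),
    Row((1, 1), "#SECTION#BOOK", None),
    Row((1, 1, 1), "#@num#SECTION#BOOK", 2),
    Row((1, 1, 3), "#TITLE#SECTION#BOOK", "Intro"),
    Row((1, 3), "#SECTION#BOOK", None),
    Row((1, 3, 1), "#@num#SECTION#BOOK", 3),
    Row((1, 3, 3), "#TITLE#SECTION#BOOK", "Bad Bugs"),
]

def like(path_id, head, tail):
    """Stand-in for Path_ID LIKE head%tail."""
    return (len(path_id) >= len(head) + len(tail)
            and path_id.startswith(head) and path_id.endswith(tail))

def is_child(s, row):
    """Par($s): row's parent label equals $s (simplified ORDPATH)."""
    return row.ordpath[:-1] == s.ordpath

def apply(outer, inner):
    """APPLY: run the inner side once per binding and concatenate the results."""
    result = []
    for binding in outer:
        result.extend(inner(binding))
    return result

def sections():
    return [r for r in pxi if like(r.path_id, "#SECTION", "#BOOK")]

def titles_if_num_at_least_3(s):
    # Exists: some @num child of $s has VALUE >= 3
    qualifies = any(
        like(r.path_id, "#@num#SECTION", "#BOOK") and r.value >= 3 and is_child(s, r)
        for r in pxi
    )
    if not qualifies:
        return []
    return [r.value for r in pxi
            if like(r.path_id, "#TITLE#SECTION", "#BOOK") and is_child(s, r)]

titles = apply(sections(), titles_if_num_at_least_3)
```

A second `for` variable would add another `apply` nested inside the inner function, one per binding.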
XQuery “order by” and “where”
  Order by:
    Sorts rows based on order-by expression
    Adds a ranking column to these rows
    Ranking column converted into OrdPath values
      Yield the new order of the rows
      Fits rest of query processing framework
  Where
    Becomes SELECT on input sequence
    Filters rows satisfying specified condition
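A minimal sketch of both clauses, assuming rows are (ordpath, value) pairs and that a rank is turned into a one-component OrdPath. That conversion is an assumption made for illustration; the slides do not say how the new labels are formed.

```python
def where(rows, condition):
    """'where' becomes a SELECT: keep the rows that satisfy the condition."""
    return [row for row in rows if condition(row)]

def order_by(rows, key):
    """Sort by the order-by key, rank the rows, and relabel them with the rank."""
    ranked = enumerate(sorted(rows, key=key), start=1)
    return [((rank,), value) for rank, (_old_ordpath, value) in ranked]

# Hypothetical title rows in document order: (ordpath, value)
titles = [((1, 3, 1), "Tree frogs"), ((1, 5, 1), "Bad Bugs"), ((1, 7, 1), "Ants")]

by_value = order_by(titles, key=lambda row: row[1])
not_ants = where(titles, lambda row: row[1] != "Ants")
```

Once the rank is expressed as an OrdPath, later operators that order by OrdPath reproduce the requested order without knowing that an `order by` was involved.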
XQuery “return”
  Returns node sequence in document order
    Use OrdPath values and XML_Serialize operator
  New element and sequence constructions
    Merge constructed and existing nodes into a single sequence (SWITCH_UNION)
XQuery Functions & Operators
  Built-in functions and operators are mapped to relational functions and operators if possible
    fn:count() maps to count()
  Additional support for XQuery types, functions and operators that cannot be mapped directly
    Intrinsics
Optimizations
  Exploiting Ordered Sets
    Sorting information (OrdPath) made available to further relational operators
    XML_Serialize is an example
  Using static type information
    Eliminates CONVERT() in operations
    Allows range scan on VALUE index
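Why static types enable a range scan can be illustrated with sorted lists standing in for the VALUE index. With untyped data the values are strings, whose sort order does not match numeric order, so every value has to be converted and tested; with typed data a binary search finds the range directly. The data and helper name are hypothetical.

```python
from bisect import bisect_left, bisect_right

untyped_index = sorted(["10", "9", "3", "25"])  # string order: '10', '25', '3', '9'
typed_index = sorted([10, 9, 3, 25])            # numeric order: 3, 9, 10, 25

def range_scan(sorted_values, low, high):
    """Contiguous slice low <= v <= high, located by binary search."""
    return sorted_values[bisect_left(sorted_values, low):bisect_right(sorted_values, high)]

# Typed: the index order is the comparison order, so a range scan works.
typed_hits = range_scan(typed_index, 5, 20)

# Untyped: convert and test every entry (the CONVERT() the slide refers to).
untyped_hits = [v for v in untyped_index if 5 <= int(v) <= 20]
```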
Conclusions
  Built-up infrastructure for query processing framework
    Other XQuery features (such as “let” and typeswitch) can be implemented
    Data modification language
  Fits into relational query processing framework
    XQuery features can be implemented using rel+ operators
  Optimizations pose the biggest challenges
    More cost-based optimizations can be done
      Enhanced costing model (e.g. choice of PXI)
      Matching materialized views
Thank you!