MonetDB/XQuery: Using a Relational DBMS for XML (Peter Boncz, CWI, The Netherlands)
![Page 1: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/1.jpg)
MonetDB/XQuery:
Using a Relational DBMS for XML
Peter Boncz, CWI
The Netherlands
![Page 2: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/2.jpg)
Outline
• Basic XML / XQuery
• Introduction of Pathfinder and MonetDB projects
• Relational XQuery
 – XPath steps in the pre/post plane
 – Translating for-loops, and beyond
• Optimizations
 – Order prevention
 – Loop-Lifted Staircase join
 – Join recognition
• Outlook
 – Conclusions
Peter Boncz, TU Delft, 10-5-2005. Pathfinder - MonetDB/XQuery
![Page 4: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/4.jpg)
XML
• Standard, flexible syntax for data exchange
– Regular, structured data
Database content of all kinds: Inventory, billing, orders, …
“Small” typed values
– Irregular, unstructured text
Documents of all kinds: Transcripts, books, legal briefs, …
“Large” untyped values
• Lingua franca of B2B Applications…
– Increase access to products & services
– Integrate disparate data sources
– Automate business processes
• … and numerous other application domains
– Bio-informatics, library science, …
![Page 5: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/5.jpg)
XML : A First Look
• XML document describing catalog of books
<?xml version="1.0" encoding="ISO-8859-1" ?>
<catalog>
  <book isbn="ISBN 1565114302">
    <title>No Such Thing as a Bad Day</title>
    <author>Hamilton Jordan</author>
    <publisher>Longstreet Press, Inc.</publisher>
    <price currency="USD">17.60</price>
    <review>
      <reviewer>Publisher</reviewer>: This book is the moving account of one man's successful battles against three cancers ... <title>No Such Thing as a Bad Day</title> is warmly recommended.
    </review>
  </book>
  <!-- more books and specifications -->
</catalog>
![Page 6: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/6.jpg)
XQuery 1.0
• Functional, strongly-typed query language
• XQuery 1.0 =
  XPath 2.0 for navigation, selection, extraction
  + A few more expressions
     – For-Let-Where-Order By-Return (FLWOR)
     – XML construction
     – Operators on types
  + User-defined functions & modules
  + Strong typing
![Page 7: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/7.jpg)
XSLT vs. XQuery
• XSLT 1.0: XML → XML, HTML, Text
 – Loosely-typed scripting language
 – Format XML in HTML for display in browser
 – Must be highly tolerant of variability/errors in data
• XQuery 1.0: XML → XML
 – Strongly-typed query language
 – Large-scale database access
 – Must guarantee safety/correctness of operations on data
• Over time, XSLT & XQuery may both serve needs of many application domains
• XQuery will become a hidden, commodity language
![Page 8: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/8.jpg)
Navigation, Selection, Extraction
• Titles of all books published by Longstreet Press
  $cat/catalog/book[publisher="Longstreet Press"]/title
  => <title>No Such Thing As A Bad Day</title>
• Publications with Jerome Simeon as author or editor
  $cat//*[(author|editor) = "Jerome Simeon"]
  => <book><title>XQuery from the Experts</title>…</book>
     <spec><title>XQuery Formal Semantics</title>…</spec>
![Page 9: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/9.jpg)
Transformation & Construction
• First author & title of books published by A/W
  for $b in $cat//book[publisher = "Addison Wesley"]
  return <awbook> { $b/author[1], $b/title } </awbook>
  =>
  <awbook>
    <author>Don Chamberlin</author>
    <title>XQuery from the Experts</title>
  </awbook>
![Page 10: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/10.jpg)
Sequences & Iteration
• Sequence constructor
  Return all books followed by all W3C specifications
  ($cat/catalog/book, $cat/catalog/W3Cspec)
• XPath expression
  Return all books & W3C specifications in doc order
  $cat/catalog/(book|W3Cspec)
• For expression
 – Similar to map: apply function to each item in sequence
  Return number of authors in each book
  for $b in $cat/catalog/book return fn:count($b/author)
  => (3,1,2,…)
![Page 11: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/11.jpg)
Conditional & Quantified
• Conditional
  if (//show[year >= 2000]) then "A-OK!" else "Error!"
• Existential quantification
 – Implicit meaning of predicate expressions
   //show[year >= 2000]
 – Explicit expression:
   //show[some $y in ./year satisfies $y >= 2000]
• Universal quantification
  //show[every $y in year satisfies $y >= 2000]
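The difference between the implicit existential predicate and the explicit universal one can be mimicked outside XQuery. The following is a minimal Python sketch, not part of the talk; the `shows` data and the helper names are hypothetical, and each show is assumed to carry zero or more year values.

```python
# Hypothetical data: each "show" has zero or more <year> children.
shows = [
    {"title": "A", "years": [1999, 2001]},
    {"title": "B", "years": [1995]},
    {"title": "C", "years": []},
]

def some_year_ge(show, bound):
    # Mirrors //show[year >= 2000]: true if at least one year qualifies.
    return any(y >= bound for y in show["years"])

def every_year_ge(show, bound):
    # Mirrors //show[every $y in year satisfies $y >= 2000]:
    # vacuously true when the show has no year at all.
    return all(y >= bound for y in show["years"])

existential = [s["title"] for s in shows if some_year_ge(s, 2000)]
universal = [s["title"] for s in shows if every_year_ge(s, 2000)]
```

Note that show "C" fails the existential test but passes the universal one, because a universal condition over an empty sequence holds.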
![Page 12: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/12.jpg)
Putting It Together
• For each author, return the number of books and the receipts of books published in the past 2 years, ordered by name
let $cat := fn:doc("www.bn.com/catalog.xml"), (: Join :)
    $sales := fn:doc("www.publishersweekly.com/sales.xml")
for $author in distinct-values($cat//author) (: Grouping :)
let $books := $cat//book[@year >= 2000 and author = $author], (: S.J. :)
    $receipts := $sales/book[@isbn = $books/@isbn]/receipts
order by $author (: Ordering :)
return (: XML Construction, with Aggregation in <count> and <total> :)
  <sales>
    { $author }
    <count> { fn:count($books) } </count>
    <total> { fn:sum($receipts) } </total>
  </sales>
![Page 13: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/13.jpg)
Recursive Processing
• Recursive functions support recursive data
<part id="001">
  <part id="002">
    <part id="003"/>
  </part>
  <part id="004"/>
</part>
=>
<partCt count="2" id="001">
  <partCt count="1" id="002">
    <partCt count="0" id="003"/>
  </partCt>
  <partCt count="0" id="004"/>
</partCt>
declare function local:partCount($p as element(part))
  as element(partCt) {
  <partCt count="{ count($p/part) }">
    { $p/@id, for $p2 in $p/part return local:partCount($p2) }
  </partCt>
};
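For readers who want to trace the recursion outside an XQuery processor, here is a rough Python equivalent using the standard library's `xml.etree.ElementTree`. It is only an illustration of the same recursive shape; the function name `part_count` is hypothetical and not from the talk.

```python
import xml.etree.ElementTree as ET

def part_count(p):
    """Build a <partCt> element mirroring <part> element p, recursively."""
    kids = p.findall("part")  # direct <part> children only, like $p/part
    out = ET.Element("partCt", {"count": str(len(kids)), "id": p.get("id", "")})
    for kid in kids:
        out.append(part_count(kid))  # one nested <partCt> per child part
    return out

doc = ET.fromstring(
    '<part id="001"><part id="002"><part id="003"/></part><part id="004"/></part>'
)
result = part_count(doc)
```

Calling `ET.tostring(result)` serializes the nested `partCt` tree for inspection.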
![Page 14: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/14.jpg)
XML Schema Languages
• Many variants…
 – DTDs, XML Schema, RELAX NG, XDuce
• … with similar goals to define
 – Types of literal (terminal) data
 – Names of elements & attributes
• XQuery designed to support (all of) XML Schema
 – Structural & name constraints over types
 – Regular tree expressions over elements, attributes, atomic types
![Page 15: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/15.jpg)
TeXQuery : Full-text extensions
• Text search & querying of structured content
• Limited support in XQuery 1.0
– String operators with collation sequences
$cat//book[contains(review/text(), "two thumbs up")]
• Stop words, proximity searching, ranking
Ex: “Tony Blair” within two words of “George Bush”
• Phrases that span tags and annotations
Ex: Match "Mr. English sponsored the bill" in
<sponsor> Mr. English </sponsor> <footnote> for himself and <co-sponsor> Mr. Coyne </co-sponsor> </footnote> sponsored the bill in the <committee-name> Committee for Financial Services </committee-name>
![Page 18: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/18.jpg)
XQuery Systems: 2 Approaches
• Tree-based
 – Tree is basic data structure
   • Also on disk (if an XQuery DBMS)
 – Navigational approach
   • Galax [Simeon..], Flux [Koch..], X-Hive
 – Tree algebra approach
   • TIMBER [Jagadish..]
• Relational
 – Data shredded in relational tables
 – XQuery translated into database query (e.g. SQL)
![Page 20: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/20.jpg)
The Pathfinder Project
• Challenge / Goal:
 – Turn RDBMSs into efficient XQuery engines
• People:
 – Maurice van Keulen (University of Twente)
 – Torsten Grust, Jens Teubner (University of Konstanz)
 – Jan Rittinger (University of Konstanz & CWI)
• Task: generate code for MonetDB
![Page 21: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/21.jpg)
MonetDB: Applied CS Research at CWI
• a decade of “query-intensive” application experience
• image retrieval: Peter Bosch (ImageSpotter)
• audio/video retrieval: Alex van Ballegooij (RAM)
• XML text retrieval: de Vries / Hiemstra (TIJAH)
• biological sequences: Arno Siebes (BRICKS)
• XML databases: Albrecht Schmidt (XMark), Grust / van Keulen (Pathfinder)
• GIS: Wilco Quak (MAGNUM)
• data warehousing / OLAP / data mining: SPSS (DataDistilleries), Univ. Massachusetts (PROXIMITY)
CWI research group successfully spun off DataDistilleries (now SPSS)
![Page 22: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/22.jpg)
Pathfinder - MonetDB
[Architecture diagram. Labels: Pathfinder compiler stages (Parser, Sem. Analysis, Core Translation, Typechecking); back-end paths via Relational Algebra and SQL to a Database, and via Core to MIL Translation and MIL (Query Algebra) to MonetDB]
![Page 23: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/23.jpg)
Open Source
• MonetDB + Pathfinder on Sourceforge
 – Mozilla License
• Project homepage
 – http://monetdb.cwi.nl
• Developers website
 – http://sf.net/projects/monetdb
Roadmap
• 14-apr-04: initial beta release MonetDB/SQL
• 30-sep-04: first official release MonetDB/SQL
• 30-may-05: beta release of MonetDB/XQuery (i.e. Pathfinder)
![Page 24: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/24.jpg)
MonetDB
![Page 26: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/26.jpg)
Binary Association Tables (BATs)
![Page 27: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/27.jpg)
BAT storage as thin arrays
![Page 28: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/28.jpg)
MonetDB Particulars
• Column-wise fragmentation
 – BAT: Binary Association Tables [oid,X]
 – Don't touch what you don't need
• Void (virtual-oid) columns
 – Contain dense sequence 0,1,2,3,4,…
 – Require no space
 – Positional access (nice for XPath skipping)
   • pre = void
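To make the void-column idea concrete, here is a small Python sketch. It is an illustration only and not MonetDB's actual data structure or API: the class `VoidBAT` and its methods are hypothetical names. The head column is never stored; an oid is just an array position.

```python
class VoidBAT:
    """Toy [void, X] BAT: the head is the dense sequence base, base+1, ...
    Only the tail values are stored; the oid is implied by the position."""

    def __init__(self, tail, base=0):
        self.base = base
        self.tail = list(tail)

    def fetch(self, oid):
        # Positional access: no search needed, the oid is the array index.
        return self.tail[oid - self.base]

    def select(self, predicate):
        # Scan one column only and return the qualifying oids.
        return [self.base + i for i, v in enumerate(self.tail) if predicate(v)]

# Column-wise fragmentation: one BAT per attribute of a (hypothetical) table.
table = {
    "name": VoidBAT(["ann", "bob", "cy"]),
    "age": VoidBAT([31, 25, 40]),
}

# The selection touches only the "age" column; "name" is read afterwards,
# and only at the qualifying positions.
oids = table["age"].select(lambda a: a > 30)
names = [table["name"].fetch(o) for o in oids]
```

With document nodes stored in pre order, the pre rank can play the role of such a void head, which is what makes positional skipping in XPath steps cheap.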
![Page 29: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/29.jpg)
DBMS Architecture
![Page 30: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/30.jpg)
Monet: DBMS Microkernel
![Page 32: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/32.jpg)
MonetDB: extensible architecture
Front-end/back-end:
• support multiple data models
• support multiple end-user languages
• support diverse application domains
Pathfinder: XQuery front-end
![Page 33: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/33.jpg)
Architecture
![Page 35: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/35.jpg)
Outline
• Basic XML / XQuery
• Introduction of Pathfinder and MonetDB projects
• Relational XQuery
 – XPath steps in the pre/post plane
 – Translating for-loops, and beyond
• MonetDB Implementation
 – Data structures
• Optimizations
 – Order prevention
 – Loop-Lifted Staircase join
 – Join recognition
• Outlook
 – Conclusions
![Page 36: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/36.jpg)
XPath on an RDBMS
Node-based relational encoding of XQuery's data model
![Page 37: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/37.jpg)
Tree Knowledge 1: pruning
![Page 38: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/38.jpg)
Tree Knowledge 2: Partitioning
![Page 39: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/39.jpg)
Staircase Join Algorithm
![Page 40: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/40.jpg)
Tree Knowledge 3: Skipping
![Page 41: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/41.jpg)
Pre/Post → Pre/Level/Size
done for better skipping and updates
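The encoding can be illustrated with a short Python sketch. This is a simplification for illustration, not Pathfinder's loader: it encodes element nodes only (no text or attribute nodes), and the function names are hypothetical. It relies on the identity post = pre + size - level to show that the pre/post view can still be derived.

```python
import xml.etree.ElementTree as ET

def encode(root):
    """Return rows [pre, size, level, tag]; a row's pre equals its index."""
    table = []

    def visit(node, level):
        pre = len(table)
        table.append([pre, 0, level, node.tag])
        for child in node:
            visit(child, level + 1)
        table[pre][1] = len(table) - pre - 1  # number of descendants

    visit(root, 0)
    return table

def descendants(table, pre):
    # Descendants occupy the contiguous pre range (pre, pre + size].
    size = table[pre][1]
    return [row[3] for row in table[pre + 1 : pre + size + 1]]

def children(table, pre):
    # Hop from child to child by skipping each child's whole subtree.
    out, p, end = [], pre + 1, pre + table[pre][1]
    while p <= end:
        out.append(table[p][3])
        p += table[p][1] + 1
    return out

table = encode(ET.fromstring("<a><b><c/><d/></b><e><f/></e></a>"))
post = [pre + size - level for pre, size, level, _ in table]
```

Because the descendants of a node form one contiguous range, a descendant step becomes a range scan, and the size column tells the scan exactly how far it may skip.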
![Page 43: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/43.jpg)
Updates
• Dense pre-numbers are nice for XPath
 – Positional skipping in Staircase join!
• But how to handle updates?
[Figure labels: Dense, Not Dense]
![Page 44: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/44.jpg)
Planned Update Solution
![Page 47: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/47.jpg)
XPath → XQuery
![Page 48: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/48.jpg)
Sequence Representation
• sequence = table of items
• add pos column for maintaining order
• ignore polymorphism for the moment
(10, "x", <a/>, 10) →
  pos | item
  1   | 10
  2   | "x"
  3   | pre(a)
  4   | 10
![Page 49: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/49.jpg)
For-loops: the iter column
![Page 51: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/51.jpg)
Loop-lifting
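The slide figures did not survive extraction, so the following Python sketch is only one plausible reading of loop-lifting: every iteration of a for-loop gets an `iter` number, the loop body is evaluated once over a whole `iter | pos | item` table instead of once per iteration, and the result is mapped back to a single sequence. All function names are hypothetical.

```python
def lift_for(seq_table, body):
    """Evaluate `for $x in <seq> return <body>` in bulk.

    seq_table: rows (pos, item) of the input sequence.
    body: function from the loop-lifted variable table to result rows
          (iter, pos, item).
    """
    ordered = sorted(seq_table, key=lambda r: r[0])
    # 1. Each item is bound in its own iteration: $x = (iter, pos=1, item).
    var = [(it, 1, item) for it, (_, item) in enumerate(ordered, start=1)]
    # 2. The body runs once over all iterations.
    out = body(var)
    # 3. Back-mapping: order by (iter, pos) and renumber pos densely.
    out = sorted(out, key=lambda r: (r[0], r[1]))
    return [(new_pos, item) for new_pos, (_, _, item) in enumerate(out, start=1)]

def body(var):
    # Loop body ($x, $x * 2): sequence construction as a union of two
    # tables, with pos 1 for the first operand and pos 2 for the second.
    first = [(it, 1, item) for it, _, item in var]
    second = [(it, 2, item * 2) for it, _, item in var]
    return first + second

# for $x in (10, 20) return ($x, $x * 2)
result = lift_for([(1, 10), (2, 20)], body)
```

The point of the sketch is that no per-iteration control flow remains: the body consists of table operations that a relational engine can execute in bulk.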
![Page 53: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/53.jpg)
Full Example
[Plan diagram; operator labels: join, calc, project]
![Page 54: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/54.jpg)
Mapping Rules
XQuery construct → relational algebra
See VLDB'04 / TDM'04 [Grust, Teubner]
 – Sequence construction → union
 – If-Then-[Else] → select, [union]
 – For loop → map with cartesian product (all combinations)
 – Calculations → projection expressions
 – List functions (e.g. fn:first) → select(pos=1)
 – Element construction → updates using descendant
 – Path steps → selections on the pre/post plane
   • Staircase join [VLDB03]:
     – Single pass for a *set* of context nodes
     – elaborate skipping!
![Page 55: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/55.jpg)
XMark Query 2
![Page 56: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/56.jpg)
XMark Query 2 (common subexpr)
![Page 60: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/60.jpg)
Order Prevention
To encode order, we use the pos column
New pos columns are created using DENSE RANK (SQL) primitive
• Needs [pos] | [iter] order
• More commonly [iter,pos]
This requires a lot of sorting, which is often not necessary
![Page 61: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/61.jpg)
Order Prevention
[VLDB03 Wang & Cherniack] define:
• Order properties of relations
• Order propagation rules for relational operators
Decoration of physical plans with order properties → eliminate sort
New ideas:
• RefineSort: pipelined algorithm that extends sort order
• Order property [C1] | [C2]
  "for each equal value of [C2] in order of appearance, the values in [C1] are monotonically increasing"
Hash-based DENSE RANK only requires [pos] | [iter]
→ sorts on [iter,pos] avoided
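A minimal sketch of why the weaker order property suffices, assuming the input rows already arrive in pos order within each iter while different iters may be interleaved (the [pos] | [iter] property). A hash table keyed on iter then hands out new positions in one pass, with no sort on [iter,pos]. The function name is hypothetical, and with unique pos values per iter the dense rank coincides with the row number computed here.

```python
def number_per_iter(rows):
    """rows: (iter, item) pairs, ordered by pos within each iter but with
    iters possibly interleaved. Returns (iter, new_pos, item) rows."""
    next_pos = {}  # hash table: iter -> last position handed out
    out = []
    for it, item in rows:
        pos = next_pos.get(it, 0) + 1
        next_pos[it] = pos
        out.append((it, pos, item))
    return out

# Iterations 1 and 2 are interleaved, yet each one is internally ordered.
numbered = number_per_iter([(2, "a"), (1, "x"), (2, "b"), (1, "y")])
```

The output keeps the input row order; only the new pos column is added, so no additional ordering has to be established for this step.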
![Page 65: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/65.jpg)
Join Recognition
 – For loop → map with all combinations: O(N*N)
 – If a `simple' condition exists on two loop variables → join
 – Only make a map with the matching combinations
 – E.g. with a hash table: O(N)
for $p in $auction/site/people/person
for $t in $auction/site/closed_auctions/closed_auction
where $t/buyer/@person = $p/@id
return $t
Performed on the XCore tree
Recognize if-then expressions
Open question: where to optimize best?
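The gain from recognizing the join can be sketched in Python. This is an illustration under simplifying assumptions, not Pathfinder's generated code: persons and closed auctions are modelled as hypothetical dicts, and the equality condition from the query is the only predicate.

```python
from collections import defaultdict

def nested_loop(persons, auctions):
    # Literal reading of the two for-loops: all N*M combinations are tested.
    return [t for p in persons for t in auctions if t["buyer"] == p["id"]]

def hash_join(persons, auctions):
    # Build a hash table on the inner loop's join key, then probe it once
    # per person: only matching combinations are ever materialized.
    by_buyer = defaultdict(list)
    for t in auctions:
        by_buyer[t["buyer"]].append(t)
    return [t for p in persons for t in by_buyer.get(p["id"], [])]

persons = [{"id": "p1"}, {"id": "p2"}, {"id": "p3"}]
auctions = [
    {"buyer": "p2", "price": 10},
    {"buyer": "p1", "price": 20},
    {"buyer": "p2", "price": 30},
]
```

Both functions return the auctions in the same order (outer loop over persons, inner matches in document order), so the rewrite preserves the sequence order that XQuery requires.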
![Page 67: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/67.jpg)
Join Optimization
for $x in $foo
for $y in $bar
where $x/p1/@a < $y/p2/@a
return $x
[Plan diagram: path steps /p1 and /p2 feed a theta-join followed by project; a second plan with the same inputs and theta-join adds Aggr(min) and Aggr(max)]
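Since the plan diagram is lost, the following is one reading of the Aggr(min) and Aggr(max) labels, stated as an assumption: because `<` on sequences is existentially quantified, some value on the left is smaller than some value on the right exactly when the minimum of the left is smaller than the maximum of the right. Each side can therefore be reduced to a single value per iteration before the theta-join. The Python names below are hypothetical.

```python
def theta_join_naive(xs, ys):
    # xs, ys: lists of (id, [attribute values]); compares all value pairs.
    return [
        xid
        for xid, xa in xs
        for _, ya in ys
        if any(a < b for a in xa for b in ya)
    ]

def theta_join_aggregated(xs, ys):
    # Reduce each side to one value first: min on the left, max on the right.
    x_min = [(xid, min(xa)) for xid, xa in xs if xa]
    y_max = [(yid, max(ya)) for yid, ya in ys if ya]
    return [xid for xid, lo in x_min for _, hi in y_max if lo < hi]

xs = [("x1", [5, 9]), ("x2", [20]), ("x3", [])]
ys = [("y1", [1, 7]), ("y2", []), ("y3", [30])]
```

Items with no attribute value at all can never satisfy the comparison, which is why the aggregated variant simply drops them before joining.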
![Page 69: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/69.jpg)
Loop-lifted staircase join
• Staircase join [VLDB03]:
 – Single pass for a *set* of context nodes
 – elaborate skipping!
• Loop-lifting → multiple iters → multiple sets of context nodes
• Loop-Lifted Staircase Join
 – In a single pass: process multiple input context node lists
 – Use a stack
 – Exploit axis properties for pruning
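The bullets above can be sketched for the descendant axis as follows. This is a simplified illustration and not the MonetDB implementation: it assumes a pre/size encoding where `size[pre]` is the number of descendants of node `pre`, and the function name is hypothetical. It shows the three ingredients named on the slide: one pass for all iterations, a stack of active context nodes, and pruning plus skipping.

```python
from collections import defaultdict

def ll_descendant(size, ctx):
    """size[pre] = number of descendants of node pre.
    ctx = list of (iter, pre) context nodes.
    Returns (iter, pre) pairs of the descendant step, in document order."""
    by_pre = defaultdict(list)
    for it, pre in ctx:
        by_pre[pre].append(it)
    ctx_pres = sorted(by_pre)

    result, stack, active = [], [], set()
    i, pre = 0, 0  # i: next unprocessed context pre; pre: scan position
    while True:
        # Close context nodes whose subtree ended before the scan position.
        while stack and stack[-1][0] < pre:
            active.discard(stack.pop()[1])
        if not stack:
            if i == len(ctx_pres):
                break
            pre = ctx_pres[i]  # skipping: jump straight to the next context
        # The current node is a descendant for every active iteration.
        for it in sorted(active):
            result.append((it, pre))
        if i < len(ctx_pres) and ctx_pres[i] == pre:
            for it in by_pre[pre]:
                if it not in active:  # pruning: already covered by an ancestor
                    active.add(it)
                    stack.append((pre + size[pre], it))
            i += 1
        pre += 1
    return result

# Tree a(b(c, d), e(f)) with pre ranks a=0, b=1, c=2, d=3, e=4, f=5.
size = [5, 2, 0, 0, 1, 0]
steps = ll_descendant(size, [(1, 1), (1, 2), (2, 4), (2, 0)])
```

Each document node is visited at most once regardless of the number of iterations, and regions without any active context node are not scanned at all.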
![Page 70: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/70.jpg)
Staircase join
[Figure labels: document; list of context nodes]
![Page 71: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/71.jpg)
Loop-lifted staircase join
[Figure labels: document; list of context nodes; active stack; multiple lists of context nodes]
![Page 75: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/75.jpg)
Scalability
Test platform
• Opteron 1.6GHz, 8GB RAM, Red Hat Linux 64-bit
• Can process 11GB document!
Mostly linear scaling with document size
• Some swapping in the join queries
• Q11 + Q12 generate quadratic result
![Page 76: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/76.jpg)
XMark 10MB: Pathfinder vs X-Hive & Galax
![Page 77: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/77.jpg)
XMark 1GB: Pathfinder vs X-Hive
[Chart annotation: did not finish]
![Page 79: MonetDB/XQuery: Using a Relational DBMS for XML Peter Boncz CWI The Netherlands](https://reader036.vdocuments.site/reader036/viewer/2022070412/56649edc5503460f94bed71d/html5/thumbnails/79.jpg)
Conclusions
• Relational approach can be scalable & fast
• Crucial optimizations
 – Join recognition
 – Loop-lifted XPath steps
 – Order awareness
Future Roadmap (beta: May 30, Holland Open)
• Algebraic query optimization
• Updates (not in release)