Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams
DESCRIPTION
Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams. Hong Su, Elke Rundensteiner, Murali Mani, Ming Li. Worcester Polytechnic Institute, Worcester, MA. VLDB 2004.
TRANSCRIPT
![Page 1: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams](https://reader036.vdocuments.site/reader036/viewer/2022062410/56815884550346895dc5e537/html5/thumbnails/1.jpg)
Raindrop:
An Algebra-Automata Combined XQuery Engine over XML Streams
Hong Su, Elke Rundensteiner, Murali Mani, Ming Li
Worcester Polytechnic Institute
Worcester, MA
VLDB 2004
![Page 2: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams](https://reader036.vdocuments.site/reader036/viewer/2022062410/56815884550346895dc5e537/html5/thumbnails/2.jpg)
Stream Processing
data sources
Networks
data requesters
![Page 3: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams](https://reader036.vdocuments.site/reader036/viewer/2022062410/56815884550346895dc5e537/html5/thumbnails/3.jpg)
What’s Special for XML Stream Processing
<auctions>
Token-by-Token access manner
timeline
Pattern retrieval + Filtering + Restructuring
FOR $a in stream(bids)//auction,
    $b in $a/seller[homepage],
    $c in $a/bidder[sameAddr]
WHERE $b/*/phone = "508"
RETURN <auction> $b, $c </auction>
Token: not a counterpart of a self-contained tuple
Pattern Retrieval on Token Streams
<auction>
<seller>
<primary>
<phone>
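The token-by-token access manner can be illustrated with a minimal Python sketch. It is not Raindrop code: the `tokenize` function, the `(kind, value)` token shape, and the regular expression are assumptions made for illustration, and the tokenizer only handles attribute-free, well-formed XML.

```python
import re
from typing import Iterator, Tuple

# Hypothetical token shape: (kind, value), kind is "start", "end" or "text".
Token = Tuple[str, str]

# Matches either a tag without attributes or a run of character data.
_TOKEN_RE = re.compile(r"<(/?)([A-Za-z_][\w.-]*)>|([^<]+)")


def tokenize(xml: str) -> Iterator[Token]:
    """Yield one token at a time, in document order."""
    for m in _TOKEN_RE.finditer(xml):
        closing, name, text = m.group(1), m.group(2), m.group(3)
        if name is not None:
            yield ("end" if closing else "start", name)
        elif text.strip():
            yield ("text", text.strip())


tokens = list(tokenize(
    "<auction><seller><primary><phone>508</phone></primary></seller></auction>"
))
```

No single token carries a complete `seller` element, so a tuple-oriented operator cannot consume this stream directly; several tokens must first be composed into an element.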
![Page 4: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams](https://reader036.vdocuments.site/reader036/viewer/2022062410/56815884550346895dc5e537/html5/thumbnails/4.jpg)
Two Computation Paradigms
Automata-based [yfilter, xscan, xsm, xsq, xpush…]
Algebraic [niagara00, …]
FOR $a in stream(bids)//auction,
    $b in $a/seller[homepage],
    $c in $a/bidder[sameAddr]
WHERE $b/*/phone = "508"
RETURN <auction> $b, $c </auction>
[Figure: the example query under each paradigm]
Automata: states 1 to 9, with transitions labeled auction, *, seller, bidder, bid, homepage, sameAddr, phone, …
Algebra:
Tagger
Navigate $a, /seller->$b
Navigate $a, /bidder->$c
Navigate stream(bids), //auction->$a
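Pattern retrieval in the automata paradigm can be sketched as a small stack-based automaton. This is a generic illustration, not the automaton of Raindrop or of any cited system; the `match_path` function, the `(axis, name)` step encoding, and the token tuples are all assumptions.

```python
from typing import Iterable, Iterator, List, Tuple

Token = Tuple[str, str]   # hypothetical token: ("start" | "end" | "text", value)
Step = Tuple[str, str]    # hypothetical path step: ("/" or "//", element name or "*")


def match_path(tokens: Iterable[Token], steps: List[Step]) -> Iterator[int]:
    """Yield the index of every start token whose element matches the path."""
    final = len(steps)
    stack = [{0}]  # one set of active states per open element
    for i, (kind, value) in enumerate(tokens):
        if kind == "start":
            nxt = set()
            for s in stack[-1]:
                if s == final:
                    continue
                axis, name = steps[s]
                if name in (value, "*"):
                    nxt.add(s + 1)   # this step is satisfied by the new element
                if axis == "//":
                    nxt.add(s)       # descendant axis: keep waiting at deeper levels
            stack.append(nxt)
            if final in nxt:
                yield i
        elif kind == "end":
            stack.pop()              # leaving the element restores the parent's states


stream = [
    ("start", "auctions"), ("start", "auction"),
    ("start", "seller"), ("end", "seller"),
    ("start", "bid"), ("start", "seller"), ("end", "seller"), ("end", "bid"),
    ("end", "auction"), ("end", "auctions"),
]
hits = list(match_path(stream, [("//", "auction"), ("/", "seller")]))
```

The stack makes the child axis strict: the second `seller`, nested under `bid`, is not reported for `//auction/seller`.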
![Page 5: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams](https://reader036.vdocuments.site/reader036/viewer/2022062410/56815884550346895dc5e537/html5/thumbnails/5.jpg)
Comparison of Two Paradigms
Each paradigm has deficiencies
Both paradigms complement each other

| Automata Paradigm | Algebra Paradigm |
| --- | --- |
| Good for pattern retrieval on tokens | Does not support token inputs |
| Need patches for filtering and restructuring | Good for filtering and restructuring |
| Present all details on same low level | Support multiple descriptive levels (e.g., logical plan, physical plan) |
| Little studied as query processing paradigm | Well studied as query processing paradigm |
![Page 6: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams](https://reader036.vdocuments.site/reader036/viewer/2022062410/56815884550346895dc5e537/html5/thumbnails/6.jpg)
Four-Level Algebraic Framework
Abstraction Level: from High (Declarative) down to Low (Procedural)
Semantics-Focused Plan: Express the semantics of query regardless of input sources
Stream Logical Plan: Accommodate tokenized streams/automata computation
Stream Physical Plan: Describe implementation details of operators
Stream Execution Plan: Decide how an operator is invoked (scheduling)
This Raindrop framework intends to integrate both paradigms into one
![Page 7: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams](https://reader036.vdocuments.site/reader036/viewer/2022062410/56815884550346895dc5e537/html5/thumbnails/7.jpg)
Level I: Semantics-Focused Plan
Express query semantics regardless of stored or stream input sources [Rainbow-ZPR02]
Reuse existing general optimization techniques
• Decorrelation
• Cancel duplicate navigation operators
• …
![Page 8: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams](https://reader036.vdocuments.site/reader036/viewer/2022062410/56815884550346895dc5e537/html5/thumbnails/8.jpg)
Example Semantics-Focused Plan
Query:
FOR $a in stream(bids)//auction,
    $b in $a/seller[homepage],
    $c in $a/bidder[sameAddr]
WHERE $b/*/phone = "508"
RETURN <auction> $b, $c </auction>
Stream Data:
<auctions>
  <auction>
    <seller>
      <primary><phone>508</phone></primary>
      <secondary><phone>613</phone></secondary>
    </seller>
    <bid><bidder>…</bidder><bidder>…</bidder></bid>
  </auction>
  …
Plan and Input/output Data:
Input: source (<auctions> … </auctions>)
NavUnnest stream(bids), //auction->$a
  Output columns: source (<auctions> … </auctions>), $a (<auction> … </auction>)
NavUnnest $a, /seller->$b
  Output columns: source, $a, $b (<seller> … </seller>)
NavUnnest $a, /bid/bidder->$c
  Output columns: source, $a, $b, $c (<bidder> … </bidder>)
…
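The tuple-at-a-time behaviour of the NavUnnest chain can be sketched over a stored (non-streaming) document with Python's standard `xml.etree.ElementTree`. The `nav_unnest` function and the dictionary-based tuple layout are illustrative assumptions, not the Rainbow or Raindrop implementation, and the bidder contents `b1` and `b2` are made-up sample values.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, Iterator


def nav_unnest(tuples: Iterable[Dict], src: str, path: str, dst: str) -> Iterator[Dict]:
    """For each input tuple, emit one output tuple per node reached from column src."""
    for t in tuples:
        for node in t[src].findall(path):
            yield {**t, dst: node}   # extend the tuple with the new binding


doc = ET.fromstring(
    "<auctions><auction>"
    "<seller><primary><phone>508</phone></primary>"
    "<secondary><phone>613</phone></secondary></seller>"
    "<bid><bidder>b1</bidder><bidder>b2</bidder></bid>"  # sample values, not from the slides
    "</auction></auctions>"
)

plan = nav_unnest([{"source": doc}], "source", ".//auction", "$a")
plan = nav_unnest(plan, "$a", "seller", "$b")
plan = nav_unnest(plan, "$a", "bid/bidder", "$c")
rows = list(plan)
```

Each operator adds one column, so the one auction with one seller and two bidders yields two tuples with columns source, $a, $b and $c.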
![Page 9: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams](https://reader036.vdocuments.site/reader036/viewer/2022062410/56815884550346895dc5e537/html5/thumbnails/9.jpg)
Level II: Stream Logical Plan
Extend semantics-focused plan to accommodate tokenized stream inputs
New input data format: Tokens
New operators: StreamSource, TokenNavigate, ExtractUnnest, ExtractNest, StructuralJoin
New rewrite rules: Push-into/Pull-out-of Automata
![Page 10: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams](https://reader036.vdocuments.site/reader036/viewer/2022062410/56815884550346895dc5e537/html5/thumbnails/10.jpg)
One Uniform Algebraic View
Algebraic Stream Logical Plan:
XML data stream -> Token-based plan (automata plan) -> Tuple stream -> Tuple-based plan -> Query answer
![Page 11: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams](https://reader036.vdocuments.site/reader036/viewer/2022062410/56815884550346895dc5e537/html5/thumbnails/11.jpg)
Modeling Automata in Algebraic Plan: Black Box [XScan01] vs. White Box
$a := stream(bids)//auction
$b := $a/seller
$c := $a/bid/bidder
Black Box:
XScan
White Box:
StructuralJoin $a
ExtractUnnest $a, $b
ExtractUnnest $a, $c
TokenNavigate $a, /seller->$b
TokenNavigate $a, /bid/bidder->$c
TokenNavigate stream(bids), //auction->$a
FOR $a in stream(bids)//auction,
    $b in $a/seller[homepage],
    $c in $a/bid/bidder[sameAddr]
WHERE $b/*/phone = "508"
RETURN <auction> $b, $c </auction>
![Page 12: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams](https://reader036.vdocuments.site/reader036/viewer/2022062410/56815884550346895dc5e537/html5/thumbnails/12.jpg)
Data Model in Algebraic Plan Modeling Automata
StructuralJoin $a
ExtractUnnest $a, $b
ExtractUnnest $a, $c
TokenNavigate $a, /seller->$b
TokenNavigate $a, /bid/bidder->$c
TokenNavigate stream(bids), //auction->$a
StreamSource
[Figure: data flowing through the plan. Individual tokens (<auctions>, <auction>, <seller>, <primary>, <phone>, 508, </phone>, </primary>, <bidder>, <bidderid>, 0314, …) enter at StreamSource; composed elements <seller>…</seller> and <bidder>...</bidder> appear further up the plan]
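How tokens become tuples in such a plan can be sketched as follows. The sketch folds the roles of TokenNavigate, ExtractUnnest and StructuralJoin into one loop for brevity, so it is a simplification under stated assumptions rather than Raindrop's operator implementation; `evaluate`, `render`, the token tuples, and the sample text values are hypothetical, and nested auctions are not handled.

```python
from typing import Iterable, List, Tuple

Token = Tuple[str, str]  # hypothetical token: ("start" | "end" | "text", value)


def render(kind: str, value: str) -> str:
    """Turn one token back into its textual form."""
    return {"start": f"<{value}>", "end": f"</{value}>"}.get(kind, value)


def evaluate(tokens: Iterable[Token]) -> List[Tuple[str, str]]:
    path: List[str] = []              # names of the currently open elements
    buffers = {"$b": [], "$c": []}    # elements extracted for the current auction
    active = None                     # (variable, depth, parts) while extracting
    results = []
    for kind, value in tokens:
        if kind == "start":
            path.append(value)
            if active is None:        # navigation: detect where an extraction begins
                if path[-2:] == ["auction", "seller"]:
                    active = ("$b", len(path), [])
                elif path[-3:] == ["auction", "bid", "bidder"]:
                    active = ("$c", len(path), [])
        if active is not None:        # extraction: compose tokens into an element
            active[2].append(render(kind, value))
        if kind == "end":
            if active is not None and len(path) == active[1]:
                buffers[active[0]].append("".join(active[2]))
                active = None
            if path.pop() == "auction":   # structural join on $a at </auction>
                results.extend((b, c) for b in buffers["$b"] for c in buffers["$c"])
                buffers = {"$b": [], "$c": []}
    return results


stream = [
    ("start", "auctions"), ("start", "auction"),
    ("start", "seller"), ("text", "s1"), ("end", "seller"),
    ("start", "bid"),
    ("start", "bidder"), ("text", "b1"), ("end", "bidder"),
    ("start", "bidder"), ("text", "b2"), ("end", "bidder"),
    ("end", "bid"), ("end", "auction"), ("end", "auctions"),
]
pairs = evaluate(stream)
```

Joining at the closing `auction` tag means no value comparison is needed: every seller and bidder buffered since the opening tag belongs to the same `$a`.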
![Page 13: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams](https://reader036.vdocuments.site/reader036/viewer/2022062410/56815884550346895dc5e537/html5/thumbnails/13.jpg)
For details of Levels III and IV, please refer to:
“Automaton Meets Query Algebra: Towards a Unified Model for XQuery Evaluation over XML Data Streams”, ER 2003
“Raindrop: A Uniform and Layered Algebraic Framework for XQueries on XML Streams”, CIKM 2003
“Raindrop: A Uniform and Layered Algebraic Framework for XQueries on XML Streams”, Journal Submission 2004
![Page 14: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams](https://reader036.vdocuments.site/reader036/viewer/2022062410/56815884550346895dc5e537/html5/thumbnails/14.jpg)
Optimization I: Computation Into or Out of Automata?
Rewrite directions: Out of Automata / Into Automata
Plan with all three navigations in the automata plan:
StructuralJoin $a
ExtractUnnest $a, $b
ExtractUnnest $a, $c
TokenNavigate $a, /seller->$b
TokenNavigate $a, /bid/bidder->$c
TokenNavigate stream(bids), //auction->$a
Plan with only //auction in the automata plan:
NavigateUnnest $a, /seller->$b
NavigateUnnest $a, /bid/bidder->$c
ExtractUnnest stream(bids), $a
TokenNavigate stream(bids), //auction->$a
Plan with no automata:
NavUnnest stream(bids), //auction->$a
NavigateUnnest $a, /seller->$b
NavigateUnnest $a, /bid/bidder->$c
…
![Page 15: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams](https://reader036.vdocuments.site/reader036/viewer/2022062410/56815884550346895dc5e537/html5/thumbnails/15.jpg)
Experimentation Results
[Figure: Execution Time on 85M XML Stream Under Various Selectivity. x-axis: Selectivity of Selection (0.1 to 0.9); y-axis: Execution Time (ms), 25000 to 50000; series: 1 Nav, 2 Navs, 3 Navs, 4 Navs, 5 Navs]
![Page 16: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams](https://reader036.vdocuments.site/reader036/viewer/2022062410/56815884550346895dc5e537/html5/thumbnails/16.jpg)
Optimization II: Semantic Query Optimization
General schema-based optimizations
• Eliminate predicate/join, …
• Focus on operators manipulating flat values
XML specific schema-based optimizations
• Focus on pattern retrieval
• Fall into two categories
  • General XML SQO: Minimize query tree [YCL+-AT&T 01]
  • Stream XML SQO (our focus)
![Page 17: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams](https://reader036.vdocuments.site/reader036/viewer/2022062410/56815884550346895dc5e537/html5/thumbnails/17.jpg)
Stream-Specific XML SQO
Observations
• Pattern retrieval over tokens solely relies on document-order traversal
• Schema constraints help expedite document-order traversal
State-of-the-Art
• [XPush03] covers limited query (boolean XPath match) and one type of constraints
Our goals
• Support more powerful query (XQuery)
• Support more types of constraints (XSchema)
![Page 18: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams](https://reader036.vdocuments.site/reader036/viewer/2022062410/56815884550346895dc5e537/html5/thumbnails/18.jpg)
Step I: Construct Query Graph
(a) Example Query (b) Query Tree
FOR $a in stream(bids)//auction,
    $b in $a/seller[homepage],
    $c in $a/bid/bidder[sameAddr]
WHERE $b/*/phone = "508"
RETURN <auction> $b, $c </auction>
![Page 19: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams](https://reader036.vdocuments.site/reader036/viewer/2022062410/56815884550346895dc5e537/html5/thumbnails/19.jpg)
Example XML Schema
![Page 20: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams](https://reader036.vdocuments.site/reader036/viewer/2022062410/56815884550346895dc5e537/html5/thumbnails/20.jpg)
Step II: Apply Optimization Rules
Offer optimization rules utilizing
• occurrence constraints
• exclusive constraints
• order constraints
Apply rules in an order ensuring
• no beneficial rule missed
• no redundant rule introduced
![Page 21: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams](https://reader036.vdocuments.site/reader036/viewer/2022062410/56815884550346895dc5e537/html5/thumbnails/21.jpg)
Step III: Translate Rewritten Query Graph Back to Plan (I)
When </phone> is encountered twice, check /*/phone: if the predicate fails, suspend states s2 and s3
Utilize Occurrence Constraints
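The idea of using an occurrence constraint to stop early can be sketched as below. The constraint itself (at most two `phone` elements per seller) is an assumption suggested by the primary/secondary example data, since the schema slide carries no text; `seller_matches`, `MAX_PHONES`, and the token tuples are hypothetical names, and the sketch ignores the depth of `phone` and does not model the states s2 and s3.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # hypothetical token: ("start" | "end" | "text", value)

MAX_PHONES = 2  # assumed occurrence constraint: at most two phone elements per seller


def seller_matches(tokens: List[Token], wanted: str = "508") -> Tuple[bool, int]:
    """Return (predicate satisfied, number of tokens examined) for one seller's tokens."""
    seen = 0
    in_phone = False
    for i, (kind, value) in enumerate(tokens, start=1):
        if kind == "start" and value == "phone":
            in_phone = True
        elif kind == "text" and in_phone and value == wanted:
            return True, i
        elif kind == "end" and value == "phone":
            in_phone = False
            seen += 1
            if seen == MAX_PHONES:
                return False, i   # no further phone can occur: stop examining tokens
    return False, len(tokens)


seller = [
    ("start", "primary"), ("start", "phone"), ("text", "111"),
    ("end", "phone"), ("end", "primary"),
    ("start", "secondary"), ("start", "phone"), ("text", "613"),
    ("end", "phone"), ("end", "secondary"),
    ("start", "homepage"), ("text", "example"), ("end", "homepage"),
]
failed = seller_matches(seller)
```

Without the constraint, the scan would have to reach the end of the seller element before concluding that the predicate fails; with it, the decision is made at the second closing `phone` tag.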
![Page 22: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams](https://reader036.vdocuments.site/reader036/viewer/2022062410/56815884550346895dc5e537/html5/thumbnails/22.jpg)
Step III: Translate Rewritten Query Graph Back to Plan (II)
When <billTo> or <shipTo> is encountered once, suspend states s2 and s9
Utilize Exclusive Constraints
![Page 23: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams](https://reader036.vdocuments.site/reader036/viewer/2022062410/56815884550346895dc5e537/html5/thumbnails/23.jpg)
Step III: Translate Rewritten Query Graph Back to Plan (III)
When <primary> is encountered once, check /homepage: if it is absent, suspend states s10, s3 and s2
Utilize Order Constraints
![Page 24: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams](https://reader036.vdocuments.site/reader036/viewer/2022062410/56815884550346895dc5e537/html5/thumbnails/24.jpg)
http://davis.wpi.edu/dsrg/raindrop/
Thanks to the WPI DSRG Rainbow Team for XAT algebra support