Min Lu TIMBER: A Native XML DB 1
TIMBER: A Native XML Database
Authors: H.V. Jagadish et al.
Presenter: Min Lu
Date: Apr 5, 2005
Introduction
• Growing XML – XML repository
• New Approach – Native XML DB
• TIMBER: Tree-structured native XML database Implemented at the University of Michigan by Bright Energetic Researchers
Topics of Discussion
→ Motivation
• TIMBER Architecture
• Tree Algebra (TAX)
• Query Optimization
• Conclusion
Motivation
• XML Characteristics
  * Tree-structured: elements can be structurally related, and these relationships are meaningful
  * Flexibility
• Map XML to Relational DB
  * Unnormalized relational representation
  * Or a large number of tables
Motivation
• Native XML DB
  * Tamino: a commercial native XML DB
  * Natix: a native XML data management system, designed for storing and processing XML data
  * TIMBER: built on the “Shore” storage manager
Topics of Discussion
• Motivation
→ TIMBER Architecture
• Tree Algebra (TAX)
• Query Optimization
• Conclusion
TIMBER Architecture
[Figure: TIMBER architecture diagram; the storage manager is labeled (Shore)]
Shore:
• Disk memory management
• Buffering
• Concurrency control
TIMBER Architecture – Data Flow
[Figure: data flow diagram. Recoverable labels: parse tree, internal representation, interface, one node at a time, (Shore)]
TIMBER Architecture – Query Flow
[Figure: query flow diagram. Recoverable labels: operator tree, call, (Shore)]
Nodes in TIMBER
• One node for each element
• All attributes clubbed into one node
• Content of element pulled into a child node
• Processing instructions and comments are simply ignored
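The node layout above can be sketched in a few lines of Python. This is only an illustration built on the standard `xml.etree.ElementTree` parser, not TIMBER's actual loader; the `Node` class, its field names, and the `build` function are hypothetical, and mixed content (text after a child element) is left out for brevity.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    kind: str                      # "element", "attributes", or "content"
    name: Optional[str] = None     # tag name, set for element nodes only
    value: object = None           # attribute dict or text content
    children: List["Node"] = field(default_factory=list)


def build(elem: ET.Element) -> Node:
    """Convert one parsed XML element into the node layout described above."""
    node = Node(kind="element", name=elem.tag)
    if elem.attrib:
        # All attributes of the element are grouped into a single child node.
        node.children.append(Node(kind="attributes", value=dict(elem.attrib)))
    text = (elem.text or "").strip()
    if text:
        # The element's text content is pulled into its own child node.
        node.children.append(Node(kind="content", value=text))
    for child in elem:
        node.children.append(build(child))
    return node


doc = '<book year="2002" lang="en"><!-- note --><title>TIMBER</title></book>'
# ElementTree's default parser drops comments and processing instructions,
# which happens to match the "simply ignored" behavior on the slide.
root = build(ET.fromstring(doc))
```

Here `root` is the `book` element node with two children: one attributes node holding both attributes, and the `title` element node, which in turn has a single content child.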
Node Labels
• Determining parent-child (PC) and ancestor-descendant (AD) relationships is a frequent operation
• Label each node with a triple
• Start, end, level: (S, E, L)
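One common way to produce such triples is a depth-first traversal that takes a counter value on entering a node (start) and another on leaving it (end). The sketch below assumes that scheme; the function name `assign_labels`, the `gap` parameter, and the sample tree are made up for illustration and are not taken from TIMBER.

```python
from typing import Dict, List, Tuple

Label = Tuple[float, float, int]  # (start, end, level)


def assign_labels(tree: Dict[str, List[str]], root: str, gap: float = 1.0) -> Dict[str, Label]:
    """Label each node with (start, end, level) by a depth-first traversal."""
    labels: Dict[str, Label] = {}
    counter = 0.0

    def visit(name: str, level: int) -> None:
        nonlocal counter
        counter += gap              # entering the node: take the start value
        start = counter
        for child in tree.get(name, []):
            visit(child, level + 1)
        counter += gap              # leaving the node: take the end value
        labels[name] = (start, counter, level)

    visit(root, 1)
    return labels


# Hypothetical document shape: a department with two faculty members.
tree = {"dept": ["faculty1", "faculty2"], "faculty1": ["name1"], "faculty2": []}
labels = assign_labels(tree, "dept")
```

A larger `gap` leaves more room between neighboring labels for later insertions.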
Triple Labels for AD & PC
• AD: (S1, E1, L1) - (S2, E2, L2) <=> S1 < S2 & E1 > E2
  ex. (1.0, 9.0, 1) – (3.0, 6.0, 5)
• PC: (S1, E1, L1) - (S2, E2, L2) <=> S1 < S2 & E1 > E2 & L1 = L2 - 1
  ex. (1.0, 9.0, 1) – (2.0, 8.0, 2)
[Figure: number line with points 1.0, 3.0, 6.0, 9.0; the descendant interval (3.0, 6.0) lies inside the ancestor interval (1.0, 9.0)]
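The two conditions translate directly into comparisons on the triples. The sketch below uses the slide's own example labels; the `Label` type and the function names are illustrative choices.

```python
from typing import NamedTuple


class Label(NamedTuple):
    start: float
    end: float
    level: int


def is_ancestor(a: Label, d: Label) -> bool:
    """AD test: the descendant's interval lies strictly inside the ancestor's."""
    return a.start < d.start and a.end > d.end


def is_parent(p: Label, c: Label) -> bool:
    """PC test: interval containment plus exactly one level of difference."""
    return is_ancestor(p, c) and p.level == c.level - 1


root = Label(1.0, 9.0, 1)
deep = Label(3.0, 6.0, 5)    # descendant of root, but four levels down
child = Label(2.0, 8.0, 2)   # direct child of root
```

Both checks need only the two labels, so no tree navigation is required to decide the relationship.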
Triple Label Benefits
• Updates: no re-labeling
• Use double-precision values to leave gaps for new nodes
• Serves as a node identifier
• Store nodes in start-label order, so that sub-elements are clustered together with their parent
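To see why real-valued labels avoid re-labeling, consider adding a new last child under an existing node. The sketch below places the new interval in the unused gap between the last existing child and the parent's end; the function name and the split-into-thirds policy are assumptions for illustration, not TIMBER's documented strategy.

```python
from typing import Tuple

Label = Tuple[float, float, int]  # (start, end, level)


def label_for_new_last_child(parent: Label, last_child_end: float) -> Label:
    """Place a new last child in the gap between its left sibling and the parent's end.

    Pass the parent's own start value as last_child_end if it has no children yet.
    """
    p_start, p_end, p_level = parent
    lo = max(p_start, last_child_end)   # left boundary of the free gap
    width = (p_end - lo) / 3.0          # split the gap into thirds
    return (lo + width, lo + 2 * width, p_level + 1)


parent = (1.0, 9.0, 1)
existing_child = (2.0, 8.0, 2)
new_child = label_for_new_last_child(parent, existing_child[1])
```

No existing label changes: the new interval sits strictly after the existing child and strictly inside the parent, so the AD and PC tests keep working.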
Topics of Discussion
• Motivation
• TIMBER Architecture
→ Tree Algebra (TAX)
• Query Optimization
• Conclusion
Tree Algebra (TAX)
• Set-at-a-time for efficiency
• Bulk algebra: input one or more sets of trees and output a set of trees
• Pattern tree: the portion of interest
• Witness tree: bears witness to the success of the pattern match on the input tree
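A small sketch may help make the pattern tree and witness tree ideas concrete. It assumes a pattern node carries a tag predicate and a parent-child ("pc") or ancestor-descendant ("ad") edge to its pattern parent, and it represents each witness as a binding from pattern labels to data nodes rather than as a materialized tree. All class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class DataNode:
    tag: str
    children: List["DataNode"] = field(default_factory=list)


@dataclass
class PatternNode:
    label: str                 # pattern-node name, e.g. "$1"
    tag: str                   # predicate: required tag of the data node
    edge: str = "pc"           # edge from the pattern parent: "pc" or "ad"
    children: List["PatternNode"] = field(default_factory=list)


def descendants(node: DataNode) -> Iterator[DataNode]:
    """Yield every proper descendant of node in document order."""
    for c in node.children:
        yield c
        yield from descendants(c)


def embed(p: PatternNode, d: DataNode) -> List[Dict[str, DataNode]]:
    """All witness bindings that map pattern node p onto data node d."""
    if p.tag != d.tag:
        return []
    witnesses: List[Dict[str, DataNode]] = [{p.label: d}]
    for pc in p.children:
        candidates = d.children if pc.edge == "pc" else list(descendants(d))
        # Extend every partial witness with every way of matching this pattern child.
        witnesses = [
            {**w, **sub}
            for w in witnesses
            for cand in candidates
            for sub in embed(pc, cand)
        ]
    return witnesses


def match(pattern: PatternNode, tree: DataNode) -> List[Dict[str, DataNode]]:
    """Try the pattern root against every node of one input tree."""
    out: List[Dict[str, DataNode]] = []
    for d in [tree, *descendants(tree)]:
        out.extend(embed(pattern, d))
    return out


f1 = DataNode("faculty", [DataNode("name"), DataNode("ra")])
f2 = DataNode("faculty", [DataNode("name"), DataNode("secretary")])
dept = DataNode("dept", [f1, f2])

faculty_with_ra = PatternNode("$1", "faculty", children=[PatternNode("$2", "ra", "pc")])
dept_names = PatternNode("$1", "dept", children=[PatternNode("$2", "name", "ad")])

ra_witnesses = match(faculty_with_ra, dept)
name_witnesses = match(dept_names, dept)
```

Each successful embedding yields one witness, so a single input tree can contribute several witnesses to the output set.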
Operators in TAX
• Algebra operations developed: Selection, Projection, Product, Set union, Set difference, Renaming, Reordering, Grouping
• The core of XQuery can be translated into TAX operators
Projection Operator in TAX
Input C: collection of trees
Parameter P: pattern tree
Parameter PL: projection list (the info to keep in the output)
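The sketch below illustrates the shape of projection under a strong simplification: the pattern P is reduced to a mapping from pattern labels to tag predicates, with no structural edges, so a node is retained whenever its tag belongs to a label listed in PL. Retained nodes are reattached to their nearest retained ancestor. The names `Node`, `project`, and the sample data are hypothetical; full TAX projection matches a real pattern tree first.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    tag: str
    children: List["Node"] = field(default_factory=list)


def project(collection: List[Node], pattern: Dict[str, str], pl: List[str]) -> List[Node]:
    """Keep only the nodes bound to pattern labels listed in pl."""
    keep_tags = {pattern[label] for label in pl}

    def prune(node: Node) -> List[Node]:
        # Prune the children first; survivors are reattached to the
        # nearest retained ancestor, preserving their relative order.
        kept_below = [k for c in node.children for k in prune(c)]
        if node.tag in keep_tags:
            return [Node(node.tag, kept_below)]
        return kept_below

    output: List[Node] = []
    for tree in collection:
        output.extend(prune(tree))
    return output


dept = Node("dept", [
    Node("faculty", [Node("name"), Node("ra")]),
    Node("faculty", [Node("name"), Node("secretary")]),
])
pattern = {"$1": "dept", "$2": "faculty", "$3": "name"}
result = project([dept], pattern, ["$2", "$3"])
```

Because `dept` is not in the projection list, the single input tree yields two output trees, one per faculty node, each keeping only its `name` child.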
Topics of Discussion
• Motivation
• TIMBER Architecture
• Tree Algebra (TAX)
→ Query Optimization
• Conclusion
Query Optimization
• Option 1: join the faculty node with the secretary node first, then join the result with the RA node.
• Option 2: join the faculty node with the RA node first, then join the result with the secretary node.
Query Optimizer
• The query optimizer enumerates all evaluation plans, estimates their costs, then chooses the one with the lowest estimated cost.
• An algorithm, FP_Optimization, is used for finding the best evaluation plan.
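The enumerate, estimate, choose loop can be sketched on the faculty / secretary / RA example. This is not FP_Optimization, whose details are not given here; it is a brute-force enumeration of join orders under an invented cost model. The cardinalities, selectivities, and the cost formula are all made-up assumptions for illustration.

```python
from itertools import permutations
from typing import Dict, Tuple

# Hypothetical statistics, invented for illustration only.
CARD: Dict[str, int] = {"faculty": 100, "secretary": 20, "RA": 500}
# Assumed fraction of (left input, other node) pairs that structurally join.
SEL: Dict[str, float] = {"secretary": 0.01, "RA": 0.01}


def plan_cost(order: Tuple[str, ...]) -> float:
    """Toy cost of joining faculty with the other node sets in the given order."""
    left_size = float(CARD["faculty"])
    cost = 0.0
    for other in order:
        cost += left_size + CARD[other]                     # read both join inputs
        left_size = left_size * CARD[other] * SEL[other]    # estimated result size
    return cost


def best_plan() -> Tuple[Tuple[str, ...], float]:
    """Enumerate every join order, cost each one, and return the cheapest."""
    plans = {order: plan_cost(order) for order in permutations(SEL)}
    order = min(plans, key=lambda o: plans[o])
    return order, plans[order]


order, cost = best_plan()
```

With these numbers, joining the small secretary set first keeps the intermediate result small, so that order wins; different statistics could reverse the choice, which is why the optimizer needs cost estimates rather than a fixed rule.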
Case Study for Query Optimization
• Consider the query against the DB “mBench 0.1x data set” with about 130,000 nodes
[Figure: query pattern over nodes A, B, C, D, E, F, G, drawn in two layouts]
Query Optimization
Five alternative query plans with different orders and combinations of operators.
Topics of Discussion
• Motivation
• TIMBER Architecture
• Tree Algebra (TAX)
• Query Optimization
→ Conclusion
Conclusion
• A comprehensive set-at-a-time query processing ability in a native XML store, with all the standard components of relational query processing
• New access methods have been developed to evaluate queries over XML data
• New cost estimation and query optimization techniques have been developed.
Work to be Done
• Currently all processing instructions, comments, and such are simply ignored.
  - An extra child node of the element node with all such data needs to be created.
• TIMBER was developed when XQuery didn’t support updates.
  - 11th Feb 2005: First Public Working Draft of the XQuery Update Facility Requirements;
  - A parser has to be implemented to support updates.
• During an extremely localized sequence of inserts, the Start and End labels become an issue.
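One way to see the label problem with localized inserts: if every new label is placed at the midpoint of the remaining gap next to the same spot, the gap halves each time and double precision eventually cannot represent a value strictly between the two neighbors. The sketch below counts how many such inserts fit; the midpoint policy and the function name are assumptions for illustration, not TIMBER's documented behavior.

```python
def inserts_until_exhausted(lo: float, hi: float) -> int:
    """Count how many midpoint labels fit strictly between lo and the previous insert."""
    count = 0
    while True:
        mid = lo + (hi - lo) / 2.0
        if not (lo < mid < hi):
            # No representable double remains strictly inside the gap.
            return count
        hi = mid    # the next insert lands immediately after lo again
        count += 1


n = inserts_until_exhausted(1.0, 2.0)
```

For the gap between 1.0 and 2.0, `n` comes out as 52 with IEEE 754 doubles, after which some form of re-labeling would be unavoidable under this policy.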