Master Informatique: Structural Indexes of XML Databases, Dr. Vu Le Anh


Post on 18-Jan-2018


Outline
1. Motivation
2. Regular query processing over XML datasets
3. Indexes over XML datasets
4. Structural indexes
5. Structural indexes for distributed XML datasets
6. Summary

NCBI GEO dataset
GEO is a public functional genomics data repository supporting MIAME-compliant data submissions. It holds about 600 gigabytes (Feb ). The data are stored as XML datasets.
[Figure: a gene map written as an XML file, and its XML graph]

Virtual observatory
A collection of interoperating data archives and software tools which utilize the internet to form a scientific research environment in which astronomical research programs can be conducted.
IVOA (International Virtual Observatory Alliance):
- Building an international community
- Using very large XML datasets for storing and exchanging data

Problem
Efficient query processing over big (distributed) XML databases. Two interesting ideas:
1. Storing the XML database in a relational database. Rewriting the XML queries into SQL and Datalog. Rewriting and combining the results.
2. Indexing the XML database. Using the indexes for query processing.

Data Graph: Data Model for XML
The data graph is a directed, rooted, labelled graph. It consists of:
- a set of nodes (V)
- a set of label values
- a set of edges, made up of a set of basic edges and a set of reference edges
- the root
- a labeling function

Publication XML document
[XML listing; the markup was lost in extraction. Remaining text values: John, ABC, Dr. Ben, Tom, Dr. Kiss, DEF, Dr. Baker, XYZ]

XML - Datagraph
[Figure: data graph of the XML document]
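The data graph model listed above (nodes, label values, basic and reference edges, a root, and a labeling function) can be sketched as a small Python class. This is only an illustration: the class name, field names, node numbers, and labels are assumptions made for the example and do not come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class DataGraph:
    """Directed, rooted, node-labeled graph (all names here are illustrative)."""
    root: int
    labels: dict = field(default_factory=dict)      # labeling function: node -> label
    basic_edges: set = field(default_factory=set)   # parent/child edges
    ref_edges: set = field(default_factory=set)     # reference edges

    def add_node(self, node, label):
        self.labels[node] = label

    def add_edge(self, source, target, reference=False):
        # Store the edge in the set that matches its kind.
        (self.ref_edges if reference else self.basic_edges).add((source, target))

    @property
    def nodes(self):
        return set(self.labels)

    @property
    def edges(self):
        # The full edge set is the union of basic and reference edges.
        return self.basic_edges | self.ref_edges

    def successors(self, node):
        return sorted(t for (s, t) in self.edges if s == node)


def build_example():
    # Hypothetical publication-style graph, not the one shown on the slide.
    g = DataGraph(root=0)
    for node, label in [(0, "Publications"), (1, "Paper"), (2, "Title"),
                        (3, "Paper"), (4, "Title")]:
        g.add_node(node, label)
    for s, t in [(0, 1), (1, 2), (0, 3), (3, 4)]:
        g.add_edge(s, t)
    g.add_edge(3, 1, reference=True)  # assumed reference from paper 3 to paper 1
    return g
```

Keeping basic and reference edges in separate sets makes it easy to treat the document as a tree (basic edges only) or as a general graph (both sets), depending on the index being built.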
Regular queries
Query languages for XML: XQuery, XPath, UnQL, Lorel, XQL, XML-QL, etc. They are built around regular expressions, with 3 basic operations:
- Concatenation: . or /
- Union: |
- Iteration: *
Shorthand:
- _ : any label value
- // : (_)*, any sequence of label values
Example: //(Student | Professor)//Paper/Title

Regular queries
A pair of nodes (u, v) matches the regular query R if there is a path from u to v whose label sequence matches R.
The result of R, for an input set I and an output set O: {(u, v) | u ∈ I, v ∈ O, (u, v) matches R}
General case: I = {root} and O = V.
Every regular expression R can be represented by a nondeterministic finite automaton (NFA) that recognizes the language L(R). The query graph is the graph representing this automaton.

Query processing based on the automaton
The query graph of //B/D. Input: I = {0}; Output: O = {0, 1, ..., 15}
[Figure: query graph with states q0, q1, q2, and a data graph with nodes 0 to 15 annotated with the automaton states reached at each node]
The result = {(0,3), (0,11), (0,13)}

Transform to edge-labeled graph
The query graph is an edge-labeled graph, so the node-labeled data graph is transformed into an edge-labeled graph as well.
[Figure: a node-labeled graph and the corresponding edge-labeled graph]

State-Data (SD) graph
- SD graph = query graph joined with the data graph.
- The SD graph may not be connected.
- SD-nodes: (data-node, state-node) pairs.
- SD-labeled edges: constructed by matching the labels of data edges against the labels of query-graph edges.

Joining R := a/(b|c)*/a and the data graph
[Figure: query graph with states s0, s1, s2 and edges labeled a, b, c, a; data graph with nodes 1 to 5 and edges labeled a, b, c; the resulting SD-graph over (data-node, state) pairs such as (1,s0), (2,s1), (3,s1), (4,s2), (5,s2)]
Result: (1,4), (1,5)
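The join of a query graph with a data graph can be sketched in Python as a breadth-first search over (data-node, state) pairs. This is a minimal sketch, not the slides' implementation: the function name `evaluate`, the edge-list encoding, and the data graph `DATA_EDGES` are assumptions for illustration (the slide's own data graph could not be recovered from the transcript). The automaton `NFA_EDGES` encodes R := a/(b|c)*/a with states s0, s1, s2.

```python
from collections import deque

# Hypothetical edge-labeled data graph: (source, label, target).
DATA_EDGES = [(1, "a", 2), (2, "b", 3), (3, "a", 4), (2, "a", 5)]

# Query graph (NFA) for a/(b|c)*/a: (state, label, next_state).
NFA_EDGES = [("s0", "a", "s1"), ("s1", "b", "s1"),
             ("s1", "c", "s1"), ("s1", "a", "s2")]


def evaluate(data_edges, nfa_edges, start_state, accept_states, inputs):
    """Return all (u, v) with u in inputs such that a path u -> v matches the NFA."""
    out = {}    # data node -> list of (label, target)
    for u, label, v in data_edges:
        out.setdefault(u, []).append((label, v))
    delta = {}  # (state, label) -> list of next states
    for s, label, t in nfa_edges:
        delta.setdefault((s, label), []).append(t)

    result = set()
    for u in inputs:
        start = (u, start_state)
        seen = {start}
        queue = deque([start])
        while queue:
            node, state = queue.popleft()
            if state in accept_states:
                result.add((u, node))
            # An SD-edge exists when a data edge and a query edge carry the same label.
            for label, target in out.get(node, []):
                for next_state in delta.get((state, label), []):
                    sd_node = (target, next_state)
                    if sd_node not in seen:
                        seen.add(sd_node)
                        queue.append(sd_node)
    return result


print(evaluate(DATA_EDGES, NFA_EDGES, "s0", {"s2"}, [1]))
```

Unlike a full join, this sketch only builds the part of the SD-graph reachable from the input nodes, which is enough to compute the result for a given input set.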
SD-graph representation on a relational database [KissVu05]
Main results:
- The data graph and the query graph can be represented by tables.
- SD graph (table) = join of the data table and the query table.
- The result is computed from the SD table.
- Regular query processing is done with DATALOG + SQL.
- An index is built to support the SQL computation.

Step 1: Transform the data graph to an edge-labeled graph

Step 2: Query graph representation

Step 3: Using DATALOG and SQL for the computation

Step 4: Computation in relational databases
Results: {4,5,6}

Classes of XML indexes
1. Indexing the basic values. The basic values are indexed (e.g. data(//emp/salary)), using a B+-tree.
2. Indexing the text values. Keywords should be indexed.
3. Indexes for XML trees. Quickly checking and computing the label sequence of the path between a pair of nodes. Applicable to near-tree XML datasets.
4. Structural indexes. Simulating the data graph by a smaller one to reduce the cost of computation.

XML-tree pre/post computing [Dietz82]
A preorder and a postorder traversal of the tree compute the pair (pre(x), post(x)) for every node x.
[Figure: tree whose nodes are labeled (1,7), (2,4), (3,1), (4,2), (5,3), (6,6), (7,5)]
x is an ancestor of y if pre(x) < pre(y) and post(x) > post(y).

Tree-structure improvement [Li&Moon VLDB 2001]
Every node x gets the pair (order(x), size(x)).
[Figure: tree whose nodes are labeled (1,100), (10,30), (11,5), (17,5), (25,5), (41,10), (45,5)]
x is an ancestor of y: order(x) < order(y) and order(y)