D(k)-Index: An Adaptive Structural Summary for Graph-Structured Data
D(k)-Index: An Adaptive Structural Summary for Graph-Structured Data. Qun Chen, Andrew Lim and Kian Win Ong. SIGMOD 2003. Outline: Introduction (XML Query and Path Expression); Previous Structural Summaries for XML (1-Index, A(k)-Index); D(k)-Index (Construction, Update); Experimental Results.
![Page 1: D(k)-Index: An Adaptive Structural Summary for Graph-Structured Data](https://reader030.vdocuments.site/reader030/viewer/2022020423/5681383e550346895d9fe874/html5/thumbnails/1.jpg)
D(k)-Index: An Adaptive Structural Summary for Graph-Structured Data
Qun Chen, Andrew Lim and Kian Win Ong
SIGMOD 2003
![Page 2: D(k)-Index: An Adaptive Structural Summary for Graph-Structured Data](https://reader030.vdocuments.site/reader030/viewer/2022020423/5681383e550346895d9fe874/html5/thumbnails/2.jpg)
Outline
• Introduction: XML Query and Path Expression
• Previous Structural Summaries for XML
– 1-Index
– A(k)-Index
• D(k)-Index
– Construction
– Update
• Experimental Results
• Conclusion and Future Work
![Page 3: D(k)-Index: An Adaptive Structural Summary for Graph-Structured Data](https://reader030.vdocuments.site/reader030/viewer/2022020423/5681383e550346895d9fe874/html5/thumbnails/3.jpg)
An XML Document

<?xml version="1.0"?>
<!DOCTYPE MovieDB SYSTEM "moviedb.dtd">
<MovieDB>
  <director name="Steven Pat">
    <movie>
      <title> Titanic </title>
      …
    </movie>
    …
  </director>
</MovieDB>
![Page 4: D(k)-Index: An Adaptive Structural Summary for Graph-Structured Data](https://reader030.vdocuments.site/reader030/viewer/2022020423/5681383e550346895d9fe874/html5/thumbnails/4.jpg)
XML Data Model
![Page 5: D(k)-Index: An Adaptive Structural Summary for Graph-Structured Data](https://reader030.vdocuments.site/reader030/viewer/2022020423/5681383e550346895d9fe874/html5/thumbnails/5.jpg)
Regular Path Expression
• Example:
– director.movie.title
– movieDB.(_)?.movie.actor.name
• Definition:
– A sequence of labels joined by ".", where "_" stands for any label
– Alternation (|), repetition (*), and optional expressions (?) are allowed
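As a rough illustration of the definition above, a path expression can be translated into an ordinary regular expression and tested against a label path. This is only a sketch: the function names, the "/" terminator encoding, and the reading of "_" as "any single label" are assumptions made for the example, not part of the slides.

```python
import re

def path_expr_to_regex(expr):
    """Translate a path expression into a regex over '/'-terminated labels.

    Assumption: labels are alphanumeric and '_' matches any single label.
    """
    parts = []
    for tok in re.findall(r"[A-Za-z0-9]+|_|[.|*?()]", expr):
        if tok == ".":
            continue  # concatenation is implicit in a regex
        elif tok == "_":
            parts.append("(?:[^/]+/)")  # wildcard: exactly one label
        elif tok in {"|", "*", "?", "(", ")"}:
            parts.append(tok)  # same meaning in both notations
        else:
            # Group each label so that * and ? apply to the whole label.
            parts.append("(?:" + re.escape(tok) + "/)")
    return re.compile("".join(parts))

def matches(expr, label_path):
    """True if the sequence of labels is accepted by the path expression."""
    encoded = "".join(label + "/" for label in label_path)
    return path_expr_to_regex(expr).fullmatch(encoded) is not None
```

For instance, `matches("movieDB.(_)?.movie.actor.name", ["movieDB", "director", "movie", "actor", "name"])` accepts the path because the optional wildcard absorbs the `director` step.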
![Page 6: D(k)-Index: An Adaptive Structural Summary for Graph-Structured Data](https://reader030.vdocuments.site/reader030/viewer/2022020423/5681383e550346895d9fe874/html5/thumbnails/6.jpg)
Path Matching
P: director.movie.title
{15,16,18}
![Page 7: D(k)-Index: An Adaptive Structural Summary for Graph-Structured Data](https://reader030.vdocuments.site/reader030/viewer/2022020423/5681383e550346895d9fe874/html5/thumbnails/7.jpg)
Purpose of Structural Summary
P: A.C.D
To improve evaluation performance by pruning the search space!
![Page 8: D(k)-Index: An Adaptive Structural Summary for Graph-Structured Data](https://reader030.vdocuments.site/reader030/viewer/2022020423/5681383e550346895d9fe874/html5/thumbnails/8.jpg)
Bisimilarity
• Existing summary structures, the 1-index and the A(k)-index, are based on bisimilarity.
• Definition:
– Two data nodes u and v are bisimilar (u ≈ v) if
• u and v have the same label;
• if u' is a parent of u, then there is a parent v' of v such that u' ≈ v', and vice versa.
• Intuitively, if two nodes are bisimilar, the set of label paths coming into them is the same.
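The definition can be made concrete with a naive partition refinement: start from one block per label and keep splitting blocks by the blocks of their parents until nothing changes. The sketch below is not the algorithm from the paper, and the small graph (`LABELS`, `PARENTS`) is a hypothetical example chosen for illustration, not the graph drawn on the slides.

```python
from collections import defaultdict

# Hypothetical data graph: node id -> label, and node id -> list of parents.
LABELS = {1: "A", 2: "B", 3: "B", 4: "C", 5: "D", 6: "E", 7: "E", 8: "F", 9: "F"}
PARENTS = {2: [1], 3: [1], 4: [1], 5: [2], 6: [3], 7: [4], 8: [6], 9: [7]}

def bisimulation_classes(labels, parents):
    """Group nodes into bisimilarity classes by refining until a fixpoint."""
    block = dict(labels)  # initial partition: one block per label
    while True:
        # A node's signature is its own block plus the set of its parents' blocks.
        sig = {n: (block[n], frozenset(block[p] for p in parents.get(n, ())))
               for n in labels}
        ids = {}
        new_block = {n: ids.setdefault(sig[n], len(ids)) for n in labels}
        # Refinement only splits blocks, so an equal count means no change.
        if len(set(new_block.values())) == len(set(block.values())):
            break
        block = new_block
    classes = defaultdict(set)
    for n, b in block.items():
        classes[b].add(n)
    return sorted(sorted(c) for c in classes.values())
```

On this example only nodes 2 and 3 end up in the same class: both are labelled B and both have the single parent 1.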
![Page 9: D(k)-Index: An Adaptive Structural Summary for Graph-Structured Data](https://reader030.vdocuments.site/reader030/viewer/2022020423/5681383e550346895d9fe874/html5/thumbnails/9.jpg)
1-Index
• Each index node represents an equivalence class, in which data nodes are mutually bisimilar.
• Evaluation on the 1-index is
– safe: its result always contains the result of evaluating on the data graph;
– sound: its result contains no false data node.
![Page 10: D(k)-Index: An Adaptive Structural Summary for Graph-Structured Data](https://reader030.vdocuments.site/reader030/viewer/2022020423/5681383e550346895d9fe874/html5/thumbnails/10.jpg)
1-Index (cont’d)
[Figure: the source data graph (nodes 1 to 9, labels A to F) and its 1-index graph, in which data nodes 2 and 3 (label B) share one index node.]
![Page 11: D(k)-Index: An Adaptive Structural Summary for Graph-Structured Data](https://reader030.vdocuments.site/reader030/viewer/2022020423/5681383e550346895d9fe874/html5/thumbnails/11.jpg)
1-Index (cont’d)
![Page 12: D(k)-Index: An Adaptive Structural Summary for Graph-Structured Data](https://reader030.vdocuments.site/reader030/viewer/2022020423/5681383e550346895d9fe874/html5/thumbnails/12.jpg)
Local Bisimilarity
• k-bisimilarity (≈k) is defined inductively:
– For any two nodes u and v, u ≈0 v iff u and v have the same label;
– u ≈k v iff u ≈(k-1) v and, for every parent u' of u, there is a parent v' of v such that u' ≈(k-1) v', and vice versa.
• Intuitively, if two data nodes are k-bisimilar, the set of label paths of length ≤ k coming into them is the same.
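The intuition about incoming paths can be checked directly by enumerating the label paths that end at a node. The sketch below assumes that path length counts edges; the graph and the function name are hypothetical and serve only as an illustration.

```python
# Hypothetical data graph: node id -> label, and node id -> list of parents.
LABELS = {1: "A", 2: "B", 3: "B", 4: "C", 5: "D", 6: "E", 7: "E", 8: "F", 9: "F"}
PARENTS = {2: [1], 3: [1], 4: [1], 5: [2], 6: [3], 7: [4], 8: [6], 9: [7]}

def incoming_label_paths(labels, parents, node, k):
    """Return all label paths with at most k edges that end at `node`."""
    paths = {(labels[node],)}
    # Each frontier entry pairs a label path with the node where it starts.
    frontier = {((labels[node],), node)}
    for _ in range(k):
        next_frontier = set()
        for path, first in frontier:
            for p in parents.get(first, ()):
                next_frontier.add(((labels[p],) + path, p))
        paths |= {path for path, _start in next_frontier}
        frontier = next_frontier
    return paths
```

In this graph nodes 8 and 9 are 1-bisimilar and share the incoming paths F and E.F, but with k = 2 they differ: B.E.F reaches only node 8 and C.E.F reaches only node 9.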
![Page 13: D(k)-Index: An Adaptive Structural Summary for Graph-Structured Data](https://reader030.vdocuments.site/reader030/viewer/2022020423/5681383e550346895d9fe874/html5/thumbnails/13.jpg)
A(k)-Index
• In the A(k)-Index, the data nodes in each index node are mutually k-bisimilar.
• Properties of the A(k)-index:
– 1. If nodes u and v are k-bisimilar, then the set of label paths of length ≤ k into them is the same.
– 2. The set of label paths of length m (m ≤ k) into an A(k)-index node is the set of label paths of length m into any data node in its extent.
– Safe: its result on a path expression always contains the data graph result for that query.
– Sound if the length of the query path is ≤ k; otherwise the result on the index graph should be validated on the data graph.
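A small sketch may help show why validation is needed for long paths. It builds an A(k)-style index by running k refinement rounds, evaluates a plain label path (no wildcards or alternation) on the index, and validates against the data graph when the path has more than k edges. The graph, the function names, and the brute-force validation step are assumptions for illustration, not the paper's implementation.

```python
from collections import defaultdict

# Hypothetical data graph: node id -> label, and node id -> list of parents.
LABELS = {1: "A", 2: "B", 3: "B", 4: "C", 5: "D", 6: "E", 7: "E", 8: "F", 9: "F"}
PARENTS = {2: [1], 3: [1], 4: [1], 5: [2], 6: [3], 7: [4], 8: [6], 9: [7]}

def k_bisim_blocks(labels, parents, k):
    """Map each node to its k-bisimilarity block using k refinement rounds."""
    block = dict(labels)  # round 0: same label
    for _ in range(k):
        sig = {n: (block[n], frozenset(block[p] for p in parents.get(n, ())))
               for n in labels}
        ids = {}
        block = {n: ids.setdefault(sig[n], len(ids)) for n in labels}
    return block

def build_index(labels, parents, k):
    """Return index-node labels, index-node parents, and extents."""
    block = k_bisim_blocks(labels, parents, k)
    extent = defaultdict(set)
    for n, b in block.items():
        extent[b].add(n)
    index_parents = defaultdict(set)
    for n, ps in parents.items():
        for p in ps:
            index_parents[block[n]].add(block[p])
    index_labels = {b: labels[min(ns)] for b, ns in extent.items()}
    return index_labels, index_parents, extent

def match(labels, parents, path):
    """Nodes that end some node path whose labels spell out `path`."""
    current = {n for n in labels if labels[n] == path[0]}
    for lab in path[1:]:
        current = {n for n in labels
                   if labels[n] == lab and current & set(parents.get(n, ()))}
    return current

def evaluate(labels, parents, k, path, validate=True):
    """Evaluate a label path on the index, validating when it may be unsound."""
    index_labels, index_parents, extent = build_index(labels, parents, k)
    hits = set()
    for b in match(index_labels, index_parents, path):
        hits |= extent[b]
    if validate and len(path) - 1 > k:  # more edges than k: index may over-report
        hits &= match(labels, parents, path)
    return hits
```

With k = 1 the path C.E.F (two edges) returns nodes 8 and 9 from the index alone, which is safe but not sound; validation, or an index built with k = 2, narrows the answer to node 9.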
![Page 14: D(k)-Index: An Adaptive Structural Summary for Graph-Structured Data](https://reader030.vdocuments.site/reader030/viewer/2022020423/5681383e550346895d9fe874/html5/thumbnails/14.jpg)
A(k)-Index
[Figure: the source data graph with its A(0)-index graph and its A(1)-index graph. Recoverable extents: {6,7} and {8,9} in the A(0)-index graph; {8,9} in the A(1)-index graph.]
![Page 15: D(k)-Index: An Adaptive Structural Summary for Graph-Structured Data](https://reader030.vdocuments.site/reader030/viewer/2022020423/5681383e550346895d9fe874/html5/thumbnails/15.jpg)
D(k)-Index
• Each index node in the D(k)-Index has its own local bisimilarity.
• A clean generalization of the 1-Index and A(k)-Index.
• Advantages over the 1-Index and A(k)-Index:
– workload-sensitive;
– can be updated more efficiently.
![Page 16: D(k)-Index: An Adaptive Structural Summary for Graph-Structured Data](https://reader030.vdocuments.site/reader030/viewer/2022020423/5681383e550346895d9fe874/html5/thumbnails/16.jpg)
D(k)-Index(Cont.)
• The D(k)-index is the index graph based on local bisimilarity. It satisfies the condition that, for any two index nodes ni and nj, k(ni) ≥ k(nj) - 1 if there is an edge from ni to nj, where k(ni) and k(nj) are the local bisimilarities of ni and nj, respectively.
k(A) ≥ k(B) - 1
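One way to obtain values that satisfy the edge condition k(ni) ≥ k(nj) - 1 is to push each node's requirement up to its parents until nothing changes. The sketch below works on a hypothetical label graph with made-up requirements; the function names are illustrative and this is not claimed to be the paper's exact procedure.

```python
def broadcast_requirements(edges, k_req):
    """Raise parents' k values until k(parent) >= k(child) - 1 on every edge."""
    k = dict(k_req)
    changed = True
    while changed:  # values only increase and are bounded, so this terminates
        changed = False
        for parent, child in edges:
            if k[parent] < k[child] - 1:
                k[parent] = k[child] - 1
                changed = True
    return k

def satisfies_constraint(edges, k):
    """Check the D(k) edge condition for every (parent, child) edge."""
    return all(k[parent] >= k[child] - 1 for parent, child in edges)

# Hypothetical label graph given as (parent, child) edges.
EDGES = [("A", "B"), ("A", "C"), ("B", "D"), ("B", "E"), ("C", "E"), ("E", "F")]
# Hypothetical workload: only label F needs local bisimilarity 3.
REQUIRED = {"A": 0, "B": 0, "C": 0, "D": 0, "E": 0, "F": 3}
```

Here the requirement on F forces E up to 2 and B and C up to 1, while A and D stay at 0, so only the part of the graph that the workload touches is refined.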
![Page 17: D(k)-Index: An Adaptive Structural Summary for Graph-Structured Data](https://reader030.vdocuments.site/reader030/viewer/2022020423/5681383e550346895d9fe874/html5/thumbnails/17.jpg)
Properties of D(k)-Index
• The set of label paths of length s(≤ k(ni)) into a node ni in the D(k)-index is the set of label paths of length s into any data node in its extent;
• The D(k)-index is safe, i.e., its result on a path expression always contains the data graph result for that query;
• The D(k)-index is sound for a path expression P of length m, l1.l2. ... .l(m+1), if, for each matching index node ni of P, k(ni) ≥ m.
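The soundness condition reduces to a simple check once the matching index nodes are known. In this sketch the index node names and their k values are hypothetical, and the path is given as its list of m + 1 labels.

```python
def is_sound_for(path, matching_index_nodes, k_of):
    """True if every matching index node has local bisimilarity >= path length."""
    m = len(path) - 1  # a path with m + 1 labels has length m
    return all(k_of[n] >= m for n in matching_index_nodes)

# Hypothetical local bisimilarities of three index nodes.
K_OF = {"n1": 2, "n2": 1, "n3": 3}
```

If the check fails for a query, the index result is still safe but has to be validated against the data graph.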
![Page 18: D(k)-Index: An Adaptive Structural Summary for Graph-Structured Data](https://reader030.vdocuments.site/reader030/viewer/2022020423/5681383e550346895d9fe874/html5/thumbnails/18.jpg)
Construction of D(k)-Index
![Page 19: D(k)-Index: An Adaptive Structural Summary for Graph-Structured Data](https://reader030.vdocuments.site/reader030/viewer/2022020423/5681383e550346895d9fe874/html5/thumbnails/19.jpg)
A Construction Example
Label E has a local bisimilarity requirement of 2; all other labels have a requirement of 1.
![Page 20: D(k)-Index: An Adaptive Structural Summary for Graph-Structured Data](https://reader030.vdocuments.site/reader030/viewer/2022020423/5681383e550346895d9fe874/html5/thumbnails/20.jpg)
Update on D(k)-Index
• Two types of updates:
– The addition of a subgraph;
– The addition of a new edge; this represents a small incremental change to the source data.
• For the addition of a subgraph, there is no major difference between the D(k)-Index and previous static summary structures.
• For the addition of a new edge, the D(k)-Index is significantly more efficient!
![Page 21: D(k)-Index: An Adaptive Structural Summary for Graph-Structured Data](https://reader030.vdocuments.site/reader030/viewer/2022020423/5681383e550346895d9fe874/html5/thumbnails/21.jpg)
Subgraph Addition
![Page 22: D(k)-Index: An Adaptive Structural Summary for Graph-Structured Data](https://reader030.vdocuments.site/reader030/viewer/2022020423/5681383e550346895d9fe874/html5/thumbnails/22.jpg)
Edge Addition
![Page 23: D(k)-Index: An Adaptive Structural Summary for Graph-Structured Data](https://reader030.vdocuments.site/reader030/viewer/2022020423/5681383e550346895d9fe874/html5/thumbnails/23.jpg)
Update Comparison
Splitting up index nodes is computationally expensive!
![Page 24: D(k)-Index: An Adaptive Structural Summary for Graph-Structured Data](https://reader030.vdocuments.site/reader030/viewer/2022020423/5681383e550346895d9fe874/html5/thumbnails/24.jpg)
Experiments (Data Sets)
• The Xmark benchmark data. It simulates information about the activities of an auction site.
• The Nasa data. This data set is generated by the IBM data generator using a real DTD file, which is a markup language for the data and metadata at NASA/GSFC.
![Page 25: D(k)-Index: An Adaptive Structural Summary for Graph-Structured Data](https://reader030.vdocuments.site/reader030/viewer/2022020423/5681383e550346895d9fe874/html5/thumbnails/25.jpg)
D(k) vs. A(k)
• We compare our D(k)-index with the previous structural index, the A(k)-index, since the A(k)-index has been shown to outperform the 1-index.
• We randomly generate 100 test paths with lengths between 2 and 5 for the Xmark and Nasa data. We therefore compare the D(k)-index's performance with A(0), A(1), up to A(4), because evaluating the test paths on the A(4)-index is already sound.
![Page 26: D(k)-Index: An Adaptive Structural Summary for Graph-Structured Data](https://reader030.vdocuments.site/reader030/viewer/2022020423/5681383e550346895d9fe874/html5/thumbnails/26.jpg)
Evaluation before Updating(Xmark)
![Page 27: D(k)-Index: An Adaptive Structural Summary for Graph-Structured Data](https://reader030.vdocuments.site/reader030/viewer/2022020423/5681383e550346895d9fe874/html5/thumbnails/27.jpg)
Evaluation before Updating(Nasa)
![Page 28: D(k)-Index: An Adaptive Structural Summary for Graph-Structured Data](https://reader030.vdocuments.site/reader030/viewer/2022020423/5681383e550346895d9fe874/html5/thumbnails/28.jpg)
Updating Performance
Running Time (msec)

| Index | Xmark | Nasa |
|---|---|---|
| A(1) | 1,022 | 3,863 |
| A(2) | 3,322 | 11,126 |
| A(3) | 5,196 | 31,992 |
| A(4) | 23,262 | 53,090 |
| D(k) | 2 | 1,377 |

Notes:
1: 100 new references are added to the XML documents randomly.
2: Our machine features Linux OS, a Pentium 4 1.8 GHz processor and 512 RAM.
![Page 29: D(k)-Index: An Adaptive Structural Summary for Graph-Structured Data](https://reader030.vdocuments.site/reader030/viewer/2022020423/5681383e550346895d9fe874/html5/thumbnails/29.jpg)
Evaluation after Updating(Xmark)
![Page 30: D(k)-Index: An Adaptive Structural Summary for Graph-Structured Data](https://reader030.vdocuments.site/reader030/viewer/2022020423/5681383e550346895d9fe874/html5/thumbnails/30.jpg)
Evaluation after Updating(Nasa)
![Page 31: D(k)-Index: An Adaptive Structural Summary for Graph-Structured Data](https://reader030.vdocuments.site/reader030/viewer/2022020423/5681383e550346895d9fe874/html5/thumbnails/31.jpg)
Conclusion and Future Work
• The D(k)-Index, as a clean generalization of the 1-index and A(k)-Index, has clear advantages over them:
– Adaptive to workload
– More efficient update operations
• Future work:
– Query pattern mining
– Extending the D(k)-Index to handle more complicated, branching path queries