![Page 1: GStore: Answering SPARQL Queries via Subgraph Matching Lei Zou, Jinghui Mo, Lei Chen, M. Tamer Ozsu ¨, Dongyan Zhao { zoulei,mojinghui,zdy}@icst.pku.edu.cn,](https://reader036.vdocuments.site/reader036/viewer/2022062423/56649ebc5503460f94bc548a/html5/thumbnails/1.jpg)
gStore: Answering SPARQL Queries via Subgraph Matching
Lei Zou, Jinghui Mo, Lei Chen, M. Tamer Özsu, Dongyan Zhao
{zoulei,mojinghui,zdy}@icst.pku.edu.cn, [email protected],
Agenda
• Introduction
• Preliminaries
• Overview of gStore
• Storage Scheme and Encoding Technique
• Indexing Structure and Query Algorithm
• Optimized methods
• Experiments and their results
• Conclusions
Introduction - 1/4
• What is RDF?
– Building block of the semantic web
– Represented as a collection of triples: (Subject, Property, Object)
Prefix: y=http://en.wikipedia.org/wiki/
Subject Property Object
y:Abraham_Lincoln hasName “Abraham Lincoln”
y:Abraham_Lincoln BornOnDate “1809-02-12”
y:Abraham_Lincoln DiedOnDate “1865-04-15”
y:Abraham_Lincoln DiedIn y:Washington_D.C
y:Washington_D.C hasName “Washington D.C”
y:Washington_D.C FoundYear “1790”
y:Washington_D.C rdf:type y:city
y:United_States hasName “United States”
y:United_States hasCapital y:Washington_D.C
y:United_States rdf:type Country
y:Reese_Witherspoon rdf:type y:Actor
y:Reese_Witherspoon BornOnDate “1976-03-22”
y:Reese_Witherspoon BornIn y:New_Orleans_Louisiana
y:Reese_Witherspoon hasName “Reese Witherspoon”
y:New_Orleans_Louisiana FoundYear “1718”
y:New_Orleans_Louisiana rdf:type y:city
y:New_Orleans_Louisiana locatedIn y:United_States
Introduction - 2/4: RDF Graph
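As a rough illustration of how the triples from the previous slide form an RDF graph, the sketch below turns a few of them into a labeled, directed adjacency list. The function name `build_rdf_graph` and the dictionary layout are assumptions made for this example, not gStore's storage format.

```python
from collections import defaultdict

# A few triples from the example slide: (subject, property, object).
TRIPLES = [
    ("y:Abraham_Lincoln", "hasName", "Abraham Lincoln"),
    ("y:Abraham_Lincoln", "DiedIn", "y:Washington_D.C"),
    ("y:Washington_D.C", "FoundYear", "1790"),
    ("y:United_States", "hasCapital", "y:Washington_D.C"),
]


def build_rdf_graph(triples):
    """Return {vertex: [(property, neighbour), ...]} with one entry per vertex.

    Subjects and objects become vertices; each triple becomes a directed
    edge from subject to object, labeled with the property.
    """
    graph = defaultdict(list)
    for subject, prop, obj in triples:
        graph[subject].append((prop, obj))
        graph[obj]  # touch the object so that leaf vertices also appear
    return dict(graph)


graph = build_rdf_graph(TRIPLES)
for vertex, edges in graph.items():
    print(vertex, "->", edges)
```

Literal values such as “1790” end up as vertices with no outgoing edges, while entities such as y:Washington_D.C can have both incoming and outgoing edges.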
Introduction - 3/4
• What is SPARQL?
• Sample query:
Select ?name Where { ?m <hasName> ?name. ?m <BornOnDate> "1809-02-12". ?m <DiedOnDate> "1865-04-15". }
• Query with wildcards:
Select ?name Where { ?m <hasName> ?name. ?m <BornOnDate> ?bd. ?m <DiedOnDate> ?dd. FILTER (regex(str(?bd), "02-12") && regex(str(?dd), "04-15")) }
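To make the two queries concrete, here is a minimal sketch of how their triple patterns could be evaluated naively over a small triple list. This is plain nested-loop pattern matching written for illustration (the helper names `match_pattern` and `evaluate` are hypothetical); it is not how gStore or any SPARQL engine evaluates queries.

```python
TRIPLES = [
    ("y:Abraham_Lincoln", "hasName", "Abraham Lincoln"),
    ("y:Abraham_Lincoln", "BornOnDate", "1809-02-12"),
    ("y:Abraham_Lincoln", "DiedOnDate", "1865-04-15"),
    ("y:Reese_Witherspoon", "hasName", "Reese Witherspoon"),
    ("y:Reese_Witherspoon", "BornOnDate", "1976-03-22"),
]


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):            # variable: bind or check consistency
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:                 # constant: must be identical
            return None
    return extended


def evaluate(patterns, triples):
    """Join all triple patterns and return the list of variable bindings."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return bindings


# Sample query: exact dates.
exact = evaluate(
    [("?m", "hasName", "?name"),
     ("?m", "BornOnDate", "1809-02-12"),
     ("?m", "DiedOnDate", "1865-04-15")],
    TRIPLES,
)
exact_names = [b["?name"] for b in exact]

# Wildcard query: dates are variables, filtered by substring afterwards.
wild = evaluate(
    [("?m", "hasName", "?name"),
     ("?m", "BornOnDate", "?bd"),
     ("?m", "DiedOnDate", "?dd")],
    TRIPLES,
)
wild_names = [b["?name"] for b in wild
              if "02-12" in b["?bd"] and "04-15" in b["?dd"]]

print(exact_names, wild_names)
```

In the wildcard version the filter can only be applied after the variables are bound, which hints at why such queries are harder to answer at scale.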
Introduction - 4/4
• Problems with existing solutions:
– They cannot answer SPARQL queries with wildcards in a scalable manner
– They cannot handle frequent updates in RDF repositories
• Answering with subgraph matching
– Modeling RDF data and Query as two graphs
– Cannot use regular graph pattern matching
– Answering SPARQL query ≈ subgraph matching
Preliminaries
• RDF graph, G, is denoted as G = (V, L_V, E, L_E)
• Query graph, Q, is denoted as Q = (V, L_V, E, L_E)
Preliminaries Cont’d
• G(u1, u2, …, un) is a match of Q(v1, v2, …, vn) if all of the following hold:
– If vi is a literal vertex, vi and ui have the same literal value
– If vi is a class/entity vertex, vi and ui have the same URI
– If vi is a parameter vertex, there is no constraint over ui
– If vi is a wildcard vertex, vi is a substring of ui and ui is a literal value
– If there is an edge from vi to vj in Q with the property p, there is also an edge from ui to uj in G with the same property p
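The match conditions can be read as a simple checker. The sketch below is one possible rendering, under the assumption (made only for this example) that entity and class vertices are strings starting with `y:` and everything else is a literal; the names `vertex_matches` and `is_match` are hypothetical.

```python
def is_literal(vertex):
    # Assumption for this sketch: URIs carry the "y:" prefix, literals do not.
    return not vertex.startswith("y:")


def vertex_matches(kind, v, u):
    """Check one query vertex (kind, v) against one data vertex u."""
    if kind == "literal":
        return is_literal(u) and u == v
    if kind == "entity":                 # class or entity vertex: same URI
        return not is_literal(u) and u == v
    if kind == "parameter":              # e.g. ?m: no constraint
        return True
    if kind == "wildcard":               # v must be a substring of a literal u
        return is_literal(u) and v in u
    raise ValueError(f"unknown vertex kind: {kind}")


def is_match(query_vertices, query_edges, data_edges, assignment):
    """query_vertices: [(kind, value)], query_edges: [(i, property, j)],
    data_edges: set of (subject, property, object), assignment: [u1, ..., un]."""
    if len(assignment) != len(query_vertices):
        return False
    for (kind, v), u in zip(query_vertices, assignment):
        if not vertex_matches(kind, v, u):
            return False
    return all((assignment[i], p, assignment[j]) in data_edges
               for i, p, j in query_edges)


DATA_EDGES = {
    ("y:Abraham_Lincoln", "hasName", "Abraham Lincoln"),
    ("y:Abraham_Lincoln", "BornOnDate", "1809-02-12"),
}
QUERY_VERTICES = [("parameter", "?m"), ("parameter", "?name"), ("wildcard", "02-12")]
QUERY_EDGES = [(0, "hasName", 1), (0, "BornOnDate", 2)]

good = is_match(QUERY_VERTICES, QUERY_EDGES, DATA_EDGES,
                ["y:Abraham_Lincoln", "Abraham Lincoln", "1809-02-12"])
bad = is_match(QUERY_VERTICES, QUERY_EDGES, DATA_EDGES,
               ["y:Abraham_Lincoln", "1809-02-12", "Abraham Lincoln"])
print(good, bad)
```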
Overview of gStore

• Work directly on the RDF graph and the SPARQL query graph

• Use a signature-based encoding of each entity and class vertex to speed up matching

• Filter and evaluate
– Use a pruning algorithm that admits false positives to obtain a set of candidates; then verify each candidate

• Use an index (VS*-tree) over the data signature graph (has light maintenance load) for efficient pruning
Storage Scheme & Encoding Technique
• Storage Scheme
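Storage Scheme & Encoding Technique
• Encoding technique, for the edge (hasName, “Abraham Lincoln”):
– The property hasName is mapped to the bit string 0100 0000 0000
– The literal is split into 3-grams, and each 3-gram is mapped to a bit string:
“Abr” → 0000 0100 0000 0000
“bra” → 1000 0000 0000 0000
“rah” → 0000 0000 0100 0000
– OR of the 3-gram bit strings: 1000 0100 0100 0000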
Storage Scheme & Encoding Technique
• Encoding technique: the vertex signature is the OR of the signatures of its edges
(hasName, “Abraham Lincoln”): 0010 0000 0000 1000 0100 0100 0000
(BornOnDate, “1809-02-12”): 0100 0000 0000 0100 0010 0100 1000
(DiedOnDate, “1865-04-15”): 0000 1000 0000 0000 0010 0100 0000
(DiedIn, y:Washington_D.C): 0000 0010 0000 1000 0010 0100 0001
OR: 0110 1010 0000 1100 0110 0100 1001
Indexing Structure and Query Algorithm
Data Signature Graph G*
Converting Q to Q*
Filter and Evaluate
• Find matches of Q* over G* to get the candidate list (CL)
• Verify each candidate in CL against the RDF graph G to get the result set (RS)
Generating Candidate List (CL)
• Two-step process:
– For each vertex vi ∈ V(Q*), find a list Ri = {ui1, ui2, ..., uin} such that vi & uij = vi and uij ∈ V(G*) for every uij ∈ Ri
– Do a multi-way join over these lists to get the candidate list
• Use S-trees
– Height-balanced tree over signatures
– Does not support the second step: expensive
• VS-tree and VS*-tree
– Multi-resolution summary graph based on the S-tree
– Supports both steps efficiently
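The two-step process can be sketched without any index, as a linear scan followed by a naive join. The vertex signatures below are the ones from the S-tree example slides; the edge signatures in `DATA_EDGES` are made up for this example, and the function names are hypothetical.

```python
from itertools import product

# Vertex signatures of G* (taken from the S-tree example slides).
DATA_SIGS = {
    "001": 0b00101000, "002": 0b10000100, "003": 0b10000001, "004": 0b00011000,
    "005": 0b00000001, "006": 0b10001000, "007": 0b01000100, "008": 0b00010100,
}
# Edge signatures of G*: hypothetical values, for illustration only.
DATA_EDGES = {("001", "002"): 0b10010, ("004", "003"): 0b00110}


def covers(data_sig, query_sig):
    """True if every bit set in the query signature is set in the data signature."""
    return query_sig & data_sig == query_sig


def candidate_list(query_sig, data_sigs):
    """Step 1: R_i = all data vertices whose signature covers the query vertex."""
    return [u for u, sig in data_sigs.items() if covers(sig, query_sig)]


def multiway_join(query_sigs, query_edges, data_sigs, data_edges):
    """Step 2: combine the R_i lists, keeping combinations whose edges exist
    in G* with a covering edge signature."""
    lists = [candidate_list(q, data_sigs) for q in query_sigs]
    results = []
    for combo in product(*lists):
        ok = True
        for i, j, label in query_edges:
            data_label = data_edges.get((combo[i], combo[j]))
            if data_label is None or not covers(data_label, label):
                ok = False
                break
        if ok:
            results.append(combo)
    return results


QUERY_SIGS = [0b00001000, 0b10000000]     # Q* vertices from the slides
QUERY_EDGES = [(0, 1, 0b10000)]           # edge v1 -> v2 with signature 10000

r1 = candidate_list(QUERY_SIGS[0], DATA_SIGS)
r2 = candidate_list(QUERY_SIGS[1], DATA_SIGS)
cl = multiway_join(QUERY_SIGS, QUERY_EDGES, DATA_SIGS, DATA_EDGES)
print(r1, r2, cl)
```

The join here enumerates every combination of candidates, which is the expensive part that the VS-tree is meant to avoid.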
S-tree Solution
• Leaves (vertex: signature): 001: 0010 1000; 002: 1000 0100; 003: 1000 0001; 004: 0001 1000; 005: 0000 0001; 006: 1000 1000; 007: 0100 0100; 008: 0001 0100
• Level 3 (each node is the OR of its children): d_1^3 = 0010 1001 (001, 005); d_2^3 = 1100 0100 (002, 007); d_3^3 = 1001 0101 (003, 008); d_4^3 = 1001 1000 (004, 006)
• Level 2: d_1^2 = 1110 1101; d_2^2 = 1001 1101
• Root: d_1^1 = 1111 1101
• Query Q*: vertex signatures 0000 1000 and 1000 0000, connected by an edge with signature 10000
• Searching the tree for each query vertex: 0000 1000 → {001, 004, 006}; 1000 0000 → {002, 003, 006}
• The two result lists are then joined (&)
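The S-tree lookup from these slides can be sketched as a small recursive search. The leaf signatures and the grouping of leaves into nodes follow the example; the `Node` class and `search` function are hypothetical names for this illustration, and the tree is built by hand instead of by an insertion algorithm.

```python
class Node:
    """S-tree node: a leaf carries a vertex signature, an internal node
    carries the OR of its children's signatures."""

    def __init__(self, name, sig=0, children=()):
        self.name = name
        self.children = list(children)
        self.sig = sig
        for child in self.children:
            self.sig |= child.sig


def search(node, query_sig):
    """Return the names of all leaves whose signature covers `query_sig`.

    A subtree is skipped as soon as its summary signature misses a query bit.
    """
    if query_sig & node.sig != query_sig:
        return []
    if not node.children:
        return [node.name]
    hits = []
    for child in node.children:
        hits.extend(search(child, query_sig))
    return hits


leaf = {
    "001": 0b00101000, "002": 0b10000100, "003": 0b10000001, "004": 0b00011000,
    "005": 0b00000001, "006": 0b10001000, "007": 0b01000100, "008": 0b00010100,
}
L = {name: Node(name, sig) for name, sig in leaf.items()}

d3 = [
    Node("d_1^3", children=[L["001"], L["005"]]),
    Node("d_2^3", children=[L["002"], L["007"]]),
    Node("d_3^3", children=[L["003"], L["008"]]),
    Node("d_4^3", children=[L["004"], L["006"]]),
]
d2 = [Node("d_1^2", children=d3[:2]), Node("d_2^2", children=d3[2:])]
root = Node("d_1^1", children=d2)

r1 = search(root, 0b00001000)
r2 = search(root, 0b10000000)
print(format(root.sig, "08b"), r1, r2)
```

The tree speeds up finding each list, but the lists still have to be joined afterwards, which is the step the S-tree does not help with.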
VS-tree Solution
[Figure: VS-tree with the same nodes and signatures as the S-tree: root d_1^1; level 2 nodes d_1^2 (1110 1101) and d_2^2 (1001 1101); level 3 nodes d_1^3 (0010 1001), d_2^3 (1100 0100), d_3^3 (1001 0101), d_4^3 (1001 1000); leaves 001–008. Nodes on the same level are additionally connected by super edges labeled with 5-bit edge signatures such as 11111, 10010 and 00110.]
VS-tree Solution
• Query Q*: vertex signatures 0000 1000 and 1000 0000, connected by an edge with signature 10000
• Summary matches are found level by level, from the root down:
– Level 1: d_1^1 × d_1^1
– Level 2: d_1^2 × d_1^2
– Level 3: d_1^3 × d_2^3
– Leaves: 001 × 002
VS-tree Solution: Limitations
• Query Q*: vertex signatures 0000 1000 and 1000 0000, edge signature 10000
• The search processes each level step by step
• If a level is dense, it produces many summary matches, which enlarges the search space
Possible Optimization Methods
• “magically” know which level to begin with to minimize the number of summary matches
• Use DFS (depth-first search) to find the valid child nodes

• While inserting vertices, consider not only the Hamming distance but also the number of super edges introduced
Optimization example
Experimental results: Exact queries
• Dataset: Yago network (20 million triples, 3.1 GB)
[Chart: results per query for gStore, RDF-3x, SW-Store, x-RDF-3x, BigOWLIM and GRIN]
Experimental results: Wildcard queries
[Chart: results per query for gStore, RDF-3x, SW-Store, x-RDF-3x, BigOWLIM and GRIN]
Conclusion
• This approach:
– Uses two novel indexes, VS-tree and VS*-tree, to speed up query processing
– Solves the two problems with existing solutions:
• Answers SPARQL queries with wildcards in a scalable manner
• Handles frequent and online updates in RDF repositories
Questions?