zero-knowledge query planning for an iterator implementation of link traversal based query execution
DESCRIPTION
WiTRANSCRIPT
Zero-KnowledgeQuery Planning
for an Iterator Implementation ofLink Traversal Based Query Execution
Olaf Hartighttp://olafhartig.de/foaf.rdf#olaf
@olafhartig
Database and Information Systems Research GroupHumboldt-Universität zu Berlin
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 2
?p ex:affiliated_with <http://.../orgaX>
?p ex:interested_in ?b
?b rdf:type <http://.../Book>
Query
Iterator Based Execution Plan
tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I
1
tp2 = ( ?p , ex:interested_in , ?b ) I
2
tp3 = ( ?b , rdf:type , <http://.../Book> ) I
3
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 3
Iterator Based Execution Plan
query-localdataset
Next?
Next?
Next?
tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I
1
tp2 = ( ?p , ex:interested_in , ?b ) I
2
tp3 = ( ?b , rdf:type , <http://.../Book> ) I
3
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 4
Next?
Next?
Next?
tp2 = ( ?p , ex:interested_in , ?b ) I
2
tp3 = ( ?b , rdf:type , <http://.../Book> ) I
3
Iterator Based Execution Plan
tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I
1
query-localdataset
:
<http://.../alice> ex:affiliated_with <http://.../orgaX>
:
Descriptor object
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 5
Next?
Next?
Next?
tp2 = ( ?p , ex:interested_in , ?b ) I
2
tp3 = ( ?b , rdf:type , <http://.../Book> ) I
3
Iterator Based Execution Plan
tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I
1
query-localdataset
:
<http://.../alice> ex:affiliated_with <http://.../orgaX>
:
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 6
Next?
Iterator Based Execution Plan
tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I
1
:
<http://.../alice> ex:affiliated_with <http://.../orgaX>
:
query-localdataset
{ ?p = <http://.../alice> }
Next?
tp2 = ( ?p , ex:interested_in , ?b ) I
2
tp3 = ( ?b , rdf:type , <http://.../Book> ) I
3
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 7
Next?
Next?
tp3 = ( ?b , rdf:type , <http://.../Book> ) I
3
Iterator Based Execution Plan
tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I
1
tp2 = ( ?p , ex:interested_in , ?b )
tp2' = ( <http://.../alice> , ex:interested_in , ?b )
I2
query-localdataset
{ ?p = <http://.../alice> }
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 8
Next?
Next?
tp3 = ( ?b , rdf:type , <http://.../Book> ) I
3
Iterator Based Execution Plan
tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I
1
tp2 = ( ?p , ex:interested_in , ?b )
tp2' = ( <http://.../alice> , ex:interested_in , ?b )
I2
query-localdataset
{ ?p = <http://.../alice> }
:
<http://.../alice> ex:interested_in <http://.../b1>
:
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 9
Next?
tp3 = ( ?b , rdf:type , <http://.../Book> ) I
3
Iterator Based Execution Plan
tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I
1
tp2 = ( ?p , ex:interested_in , ?b )
tp2' = ( <http://.../alice> , ex:interested_in , ?b )
I2
{ ?p = <http://.../alice> }
:
<http://.../alice> ex:interested_in <http://.../b1>
:
query-localdataset
{ ?p = <http://.../alice> , ?b = <http://.../b1> }
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 10
Iterator Based Execution Plan
tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I
1
Next?
{ ?p = <http://.../alice> }
query-localdataset
tp3 = ( ?b , rdf:type , <http://.../Book> )
tp3' = ( <http://.../b1> , rdf:type , <http://.../Book> )
I3
tp2 = ( ?p , ex:interested_in , ?b )
tp2' = ( <http://.../alice> , ex:interested_in , ?b )
I2
{ ?p = <http://.../alice> , ?b = <http://.../b1> }
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 11
tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I
1
Next?
{ ?p = <http://.../alice> }
tp3 = ( ?b , rdf:type , <http://.../Book> )
tp3' = ( <http://.../b1> , rdf:type , <http://.../Book> )
I3
tp2 = ( ?p , ex:interested_in , ?b )
tp2' = ( <http://.../alice> , ex:interested_in , ?b )
I2
{ ?p = <http://.../alice> , ?b = <http://.../b1> }
Iterator Based Execution Plan
query-localdataset
:
<http://.../Book> rdfs:subClassOf <http://.../CreativeWork>
:
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 12
tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I
1
Next?
{ ?p = <http://.../alice> }
tp3 = ( ?b , rdf:type , <http://.../Book> )
tp3' = ( <http://.../b1> , rdf:type , <http://.../Book> )
I3
tp2 = ( ?p , ex:interested_in , ?b )
tp2' = ( <http://.../alice> , ex:interested_in , ?b )
I2
{ ?p = <http://.../alice> , ?b = <http://.../b1> }
Iterator Based Execution Plan
query-localdataset
:
<http://.../b1> rdf:type <http://.../Book>
:
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 13
Iterator Based Execution Plan
{ ?p = <http://.../alice> , ?b = <http://.../b1> }
tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I
1
{ ?p = <http://.../alice> }
tp3 = ( ?b , rdf:type , <http://.../Book> )
tp3' = ( <http://.../b1> , rdf:type , <http://.../Book> )
I3
tp2 = ( ?p , ex:interested_in , ?b )
tp2' = ( <http://.../alice> , ex:interested_in , ?b )
I2
{ ?p = <http://.../alice> , ?b = <http://.../b1> }
query-localdataset
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 14
?p ex:affiliated_with <http://.../orgaX>
?p ex:interested_in ?b
?b rdf:type <http://.../Book>
Query
tp2 = ( ?p , ex:interested_in , ?b ) I
2
Example Query Execution Plan
tp3 = ( ?b , rdf:type , <http://.../Book> ) I
3
tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX>) I
1
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 15
?p ex:affiliated_with <http://.../orgaX>
?p ex:interested_in ?b
?b rdf:type <http://.../Book>
Query
tp2 = ( ?p , ex:interested_in , ?b ) I
2
An Alternative Execution Plan
tp3 = ( ?p , ex:affiliated_with , <http://.../orgaX>) I
3
tp1 = ( ?b , rdf:type , <http://.../Book> ) I
1
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 16
An Alternative Execution Plan
query-localdataset
Next?
Next?
Next?
tp1 = ( ?b , rdf:type , <http://.../Book> ) I
1
tp2 = ( ?p , ex:interested_in , ?b ) I
2
tp3 = ( ?p , ex:affiliated_with , <http://.../orgaX>) I
3
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 17
Next?
Next?
Next?
tp2 = ( ?p , ex:interested_in , ?b ) I
2
tp3 = ( ?p , ex:affiliated_with , <http://.../orgaX>) I
3
tp1 = ( ?b , rdf:type , <http://.../Book> ) I
1
An Alternative Execution Plan
query-localdataset
:
<http://.../Book> rdfs:subClassOf <http://.../CreativeWork>
:
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 18
Next?
Next?
END!
tp2 = ( ?p , ex:interested_in , ?b ) I
2
An Alternative Execution Plan
tp3 = ( ?p , ex:affiliated_with , <http://.../orgaX>) I
3
:
<http://.../Book> rdfs:subClassOf <http://.../CreativeWork>
:
query-localdataset
tp1 = ( ?b , rdf:type , <http://.../Book> ) I
1
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 19
An Alternative Execution Plan
tp3 = ( ?p , ex:affiliated_with , <http://.../orgaX>) I
3
query-localdataset
END!
END!
tp2 = ( ?p , ex:interested_in , ?b ) I
2
END!
tp1 = ( ?b , rdf:type , <http://.../Book> ) I
1
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 20
An Alternative Execution Plan
query-localdataset
END!
END!
tp2 = ( ?p , ex:interested_in , ?b ) I
2
END!
tp1 = ( ?b , rdf:type , <http://.../Book> ) I
1
Number of results may depend on the order of triple patterns
= logical query execution plan
tp3 = ( ?p , ex:affiliated_with , <http://.../orgaX>) I
3
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 21
Query Plan Selection
● Assessment criteria:● Cost (query execution time)● Benefit (number of results)
● Cost and benefit must be estimated without plan execution
● Estimation impossible due to “zero knowledge”
● Heuristic Based Plan Selection● DEPENDENCY RESPECT RULE
● SEED TP RULE
● NO VOCAB SEED RULE
● FILTERING TP RULE
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 22
SEED TP RULE
● Potential seed triple pattern
… is a triple pattern that contains at least one HTTP URI
● Seed triple pattern of a plan
… is the first triple pattern in the plan and
… is a potential seed triple pattern
● Rationale: goodstarting point
Use a plan with a seed triple pattern
?p ex:affiliated_with <http://.../orgaX>
?p ex:interested_in ?b
?b rdf:type <http://.../Book>
Query
√√
√
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 23
NO VOCAB SEED RULE
● Not only vocabulary term URIs in the seed triple pattern
● Patterns to avoid: ?s ex:any_property ?o
?s rdf:type ex:any_class
● Rationale: URIs for vocabulary term usually resolve tovocabulary definitions with little instance data
Avoid a seed triple pattern with vocabulary terms
?p ex:affiliated_with <http://.../orgaX>
?p ex:interested_in ?b
?b rdf:type <http://.../Book>
Query
√
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 24
FILTERING TP RULE
● Filtering triple pattern: each variable already occurs in oneof the preceding triple patterns
● For each resultconsumed as inputa filtering TP canonly report 1 or 0results as output
● Rationale: Reduce cost
tp2 = ( ?p , ex:interested_in , ?b )
tp2' = ( <http://.../alice> , ex:interested_in , ?b )
I2
tp3 = ( ?b , rdf:type , <http://.../Book> )
tp3' = ( <http://.../b1> , rdf:type , <http://.../Book> )
I3
tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX>) I
1
{ ?p = <http://.../alice> }
{ ?p = <http://.../alice> , ?b = <http://.../b1> }
Use a plan where all filtering triple patterns areas close to the seed triple pattern as possible
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 25
Evaluation Procedure
● Generate all possible plans
● Execute each plan:● 5 runs (+ 1 initial warm-up run) ● Use an initially empty query-local dataset for each run
● Measure for each plan:● Avg. execution time● Avg. number of descriptor objects retrieved during execution● Avg. number of query results
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 26
Evaluation Query (Example)
SELECT ?spec ?genus WHERE {
geospecies:4qyn7 gs:inFamily ?fam .
?fam skos:narrowerTransitive ?spec .
?spec skos:closeMatch ?sp2 .
?sp2 rdfs:subClassOf ?genus .
?spec gs:isExpectedIn ?loc .
geospecies:4qyn7 gs:isExpectedIn ?loc
?loc rdf:type gs:State . }
● 2 potential seed triple patterns thatsatisfy our NO SEED VOCAB RULE
● 56 different plans, each contains2 filtering triple patterns
Of what genus are the species that are● classified in the
same family as the American Badger,
● and expected in the same states as the American Badger ?
Picture source: Wikipedia
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 27
Measurements
1st Filtering TP
Percentage of plans in each group with a filtering TP in specific positions
2nd Filtering TPre
trie
v ed
desc
r. o
b jec
ts
0 30 60 90 120 150 1800
100
200
300
400
query exec. times (in seconds)
quer
y re
sult s
0 30 60 90 120 150 1800
10
20
30
query exec. times (in seconds)
1 2 3 4 5 6 70
50
100
TP position in the ordered BGP1 2 3 4 5 6 7
0
50
100
TP position in the ordered BGP
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 28
Conclusions
● Approach that uses iterators to implementLink Traversal Based Query Execution
… is sound
… guarantees termination
… cannot guarantee (reachability-) completeness
● Degree of completeness depends onthe query plans (i.e. orders of the BGP)
● Heuristic based plan selection
● Next steps:● Algorithm to generate most suitable plans only● Integrate adaptive query processing techniques
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 29
Backup Slides
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 30
Outline
1. Link Traversal BasedQuery Execution
2. Characteristics of theIterator BasedImplementation Approach
3. Query Plan Selection
4. Evaluation
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 31
?p ex:affiliated_with <http://.../orgaX>
?p ex:interested_in ?b
?b rdf:type <http://.../Book>
Query
Link Traversal Based Querying
● Semantics definedin two phases:
1. Reachability
2. Query ResultsDescriptor
object
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 32
?p ex:affiliated_with <http://.../orgaX>
?p ex:interested_in ?b
?b rdf:type <http://.../Book>
Query
Link Traversal Based Querying
● Semantics definedin two phases:
1. Reachability
2. Query Results
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 33
?p ex:affiliated_with <http://.../orgaX>
?p ex:interested_in ?b
?b rdf:type <http://.../Book>
Query
Link Traversal Based Querying
● Semantics definedin two phases:
1. Reachability
2. Query Results
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 34
Link Traversal Based Querying
● Semantics definedin two phases:
1. Reachability
2. Query Results
● 2 phases do not reflect idea of actual execution strategy● Intertwine query evaluation with the traversal of data links
Reachabledescriptor
object
The mapping μ : V → U B Lis a result to BGP bgp iff:
Condition 1: dom(μ) = vars(bgp)Condition 2: μ[bgp] D
reachable
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 35
Characteristics
● No need to know all data sources in advance● Never complete w.r.t. all data on the Web
● Reason: reachability based on links that match query patterns● New concept: reachability-completeness
● No guarantee for termination● Reason: Web of Data is infinite (at any point in time)
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 36
Outline
1. Characteristics of theImplementation Approach
2. Query Plan Selection
3. Evaluation
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 37
Characteristics
● Sound
● Number of results may depend on order of triple patterns
● Reachability-completeness not guaranteed● Main reason: inflexibility due to fixed order
● Termination is guaranteed● No such guarantee for link traversal based
query execution in general because the Webof Data is infinite (at any point in time)
● Efficient
● Easy to apply in existing query engines
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 38
Plan Selection Problem
● Number of results may depend on order of triple patterns
➔ Problem: select a suitable plan
= logical query execution plan
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 39
Outline
1. Link Traversal BasedQuery Execution
2. Characteristics of theIterator BasedImplementation Approach
3. Query Plan Selection
4. Evaluation
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 40
?p ex:affiliated_with <http://.../orgaX>
?p ex:interested_in ?b
?b rdf:type <http://.../Book>
Query
DEPENDENCY RESPECT RULE
● Dependency respect: a variable from each triple pattern already occurs in one of the preceding triple patterns
tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX>) I
1
tp2 = ( ?p , ex:interested_in , ?b ) I
2
tp3 = ( ?b , rdf:type , <http://.../Book> ) I
3
√
Use a dependency respecting query plan
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 41
?p ex:affiliated_with <http://.../orgaX>
?p ex:interested_in ?b
?b rdf:type <http://.../Book>
Query
DEPENDENCY RESPECT RULE
● Dependency respect: a variable from each triple pattern already occurs in one of the preceding triple patterns
tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX>) I
1
tp2 = ( ?p , ex:interested_in , ?b ) I
2
tp3 = ( ?b , rdf:type , <http://.../Book> ) I
3
Use a dependency respecting query plan
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 42
?p ex:affiliated_with <http://.../orgaX>
?p ex:interested_in ?b
?b rdf:type <http://.../Book>
Query
DEPENDENCY RESPECT RULE
Use a dependency respecting query plan
● Dependency respect: a variable from each triple pattern already occurs in one of the preceding triple patterns
● Rationale:Avoidcartesianproducts
tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX>) I
1
tp2 = ( ?b , rdf:type , <http://.../Book> ) I
2
tp3 = ( ?p , ex:interested_in , ?b ) I
3
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 43
These slides have been created byOlaf Hartig
http://olafhartig.de
This work is licensed under aCreative Commons Attribution-Share Alike 3.0 License
(http://creativecommons.org/licenses/by-sa/3.0/)