Executing SPARQL Queries over the Web of Linked Data
With these slides I presented my paper at the International Semantic Web Conference (ISWC'09), Washington DC, USA, Oct. 2009.
Executing SPARQL Queries over the Web of Linked Data
Olaf Hartig*, Christian Bizer˚, Johann-Christoph Freytag*
*Humboldt-Universität zu Berlin, ˚Freie Universität Berlin
Olaf Hartig - Executing SPARQL Queries over the Web of Linked Data
● Use URIs as names for things
● Use HTTP URIs so that people can look up those names.
● When someone looks up a URI, provide useful information.
● Include links to other URIs so that they can discover more things.
Tim Berners-Lee, July 2006
[Figure: the Linked Data principles applied to an example. The data source "My Movie DB" publishes http://mymovie.db/movie0362, http://mymovie.db/movie2449, http://mymovie.db/movie5112 and http://mymovie.db/movie1342; http://mymovie.db/movie2449 is looked up ("?"). Later builds add a second data source with http://geo.db/cityCJ, http://geo.db/country7, http://geo.db/country21 and http://geo.db/cityXA.]
● The Web: a huge, globally distributed dataspace
● Querying this dataspace opens new possibilities:
  ● Aggregating data from different sources
  ● Integrating fragmentary information
  ● Achieving a more complete view
Traditional approach 1: data centralization
● Querying a collection of copies from all relevant datasets
● Misses unknown or new sources
● Collection probably out of date
● Will it scale?
Traditional approach 2: federated query processing
● Querying a mediator which distributes subqueries to relevant sources and integrates the results
● Requires sources to provide a query service
● Requires information about the sources
● Misses unknown or new sources
Main drawback:
You have to know the relevant data sources in advance.
You restrict yourself to the selected sources.
You do not tap the full potential of the Web!
A novel approach:
Link Traversal Based Query Execution
Allows data sources to be discovered at runtime
Outline
Part I
Overview of Link Traversal based Query Execution
Part II
An Iterator based Implementation Approach
Main Idea
● Intertwine query evaluation with traversal of RDF links
● Alternately:
  ● Evaluate parts of the query on a continuously augmented set of data
  ● Look up URIs in intermediate solutions and add retrieved data to the queried data set
[Figure: step-by-step example. The query consists of the triple patterns (http://.../movie2449 filmingLocation ?loc), (?loc statistics ?stat) and (?stat unemp_rate ?ur); the queried data set starts empty. Looking up http://.../movie2449 adds the triple (http://.../movie2449 filmingLocation http://geo.../Italy), so the first pattern yields ?loc → http://geo.../Italy. Looking up http://geo.../Italy adds (http://geo.../Italy statistics http://stat.db/.../it), so the second pattern yields ?stat → http://stat.db/.../it.]
In a Nutshell
● Link traversal based query execution:
  ● Evaluation on a continuously augmented dataset
  ● Discovery of potentially relevant data during execution
  ● Discovery driven by intermediate solutions
● Main advantage:
  ● No need to know all data sources in advance
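The alternation described above can be illustrated with a small Python sketch. It is not the SWClLib implementation: the "Web" is a hypothetical in-memory dict (`WEB`) mapping each URI to the triples its look-up would return, the `*.example` URIs and the literal `"placeholder-rate"` are made up, and only terms starting with `http://` are treated as dereferenceable. Each round looks up the URIs found so far, then re-evaluates the triple patterns on the augmented data set to find new URIs in intermediate solutions.

```python
"""Toy sketch of link traversal based query execution over a simulated Web."""

# Hypothetical data: looking up a URI returns the single triple listed for it.
TRIPLES = [
    ("http://mymovie.example/movie2449", "filmingLocation", "http://geo.example/Italy"),
    ("http://geo.example/Italy", "statistics", "http://stats.example/it"),
    ("http://stats.example/it", "unemp_rate", "placeholder-rate"),
]
WEB = {s: [(s, p, o)] for s, p, o in TRIPLES}

# The example query as triple patterns; variables start with "?".
QUERY = [
    ("http://mymovie.example/movie2449", "filmingLocation", "?loc"),
    ("?loc", "statistics", "?stat"),
    ("?stat", "unemp_rate", "?ur"),
]


def is_var(term):
    return term.startswith("?")


def is_uri(term):
    # Simplification: only http:// terms are dereferenced; predicates are plain names here.
    return term.startswith("http://")


def match(tp, data):
    """Yield one solution (variable -> term) per triple in data that matches tp."""
    for triple in data:
        mu = {}
        for pattern_term, term in zip(tp, triple):
            if is_var(pattern_term):
                if mu.setdefault(pattern_term, term) != term:
                    break  # the same variable would need two different values
            elif pattern_term != term:
                break
        else:
            yield mu


def evaluate_prefixes(bgp, data):
    """Evaluate tp_1 ... tp_n left to right; return the solutions after each pattern."""
    stages, solutions = [], [{}]
    for tp in bgp:
        solutions = [{**mu, **mu2}
                     for mu in solutions
                     for mu2 in match(tuple(mu.get(t, t) for t in tp), data)]
        stages.append(solutions)
    return stages


def link_traversal_query(bgp, web):
    """Alternate look-ups and evaluation until no new URIs show up in any solution."""
    data, looked_up = set(), set()
    pending = {t for tp in bgp for t in tp if is_uri(t)}  # seed: URIs in the query
    while pending:
        for uri in pending:  # look up URIs and augment the queried data set
            data.update(web.get(uri, []))
        looked_up |= pending
        stages = evaluate_prefixes(bgp, data)  # evaluate on the augmented data
        pending = {v for stage in stages for mu in stage
                   for v in mu.values() if is_uri(v)} - looked_up
    return evaluate_prefixes(bgp, data)[-1], len(looked_up)


if __name__ == "__main__":
    solutions, lookups = link_traversal_query(QUERY, WEB)
    print(solutions, lookups)
```

Re-evaluating every pattern in each round keeps the sketch short but repeats work; the iterator based approach in Part II interleaves look-ups with a single pipelined evaluation instead.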
Real-World Examples
PREFIX swc:  <http://data.semanticweb.org/ns/swc/ontology#>
PREFIX swrc: <http://swrc.ontoware.org/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?author ?phone WHERE {
  ?pub swc:isPartOf <http://data.semanticweb.org/conference/eswc/2009/proceedings> .
  ?pub swc:hasTopic ?topic . ?topic rdfs:label ?topicLabel .
  FILTER regex( str(?topicLabel), "ontology engineering", "i" ) .
  ?pub swrc:author ?author .
  { ?author owl:sameAs ?authorAlt }
  UNION
  { ?authorAlt owl:sameAs ?author }
  ?authorAlt foaf:phone ?phone
}
Return phone numbers of authors of ontology engineering papers at ESWC'09.
# of query results        2
# of retrieved graphs     297
# of accessed servers     16
avg. execution time       1min 30sec
Outline
Part I
Overview of Link Traversal based Query Execution
Part II
An Iterator based Implementation Approach
➢ Introduction to the Iterator Paradigm
➢ Application to Link Traversal based Query Execution
➢ URI Prefetching
➢ Extension to the Iterator Paradigm
➢ Evaluation
Iterator based Query Execution
● Iterator:
  ● implements an operation
  ● is a group of functions: OPEN, GETNEXT, CLOSE
● Query execution uses a chain of iterators
● Each iterator is responsible for a single triple pattern
[Figure: chain of iterators $I_1$, $I_2$, $I_3$; $I_1$ for (http://.../movie2449 filmingLocation ?loc), $I_2$ for (?loc statistics ?stat), $I_3$ for (?stat unemp_rate ?ur)]
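A minimal Python sketch of this iterator paradigm over a fixed, hypothetical in-memory data set (no link traversal yet). The method names `open`, `get_next` and `close` mirror OPEN, GETNEXT and CLOSE; returning `None` to signal exhaustion, the class names and the data are our own assumptions. Each `TriplePatternIterator` is responsible for one triple pattern and pulls input solutions from its predecessor, so calling `get_next` on the last iterator drives the whole chain.

```python
from abc import ABC, abstractmethod
from collections import deque

# Hypothetical queried data set (fixed here; no link traversal yet).
DATA = [
    ("http://mymovie.example/movie2449", "filmingLocation", "http://geo.example/Italy"),
    ("http://geo.example/Italy", "statistics", "http://stats.example/it-a"),
    ("http://geo.example/Italy", "statistics", "http://stats.example/it-b"),
    ("http://stats.example/it-a", "unemp_rate", "rate-a"),
    ("http://stats.example/it-b", "unemp_rate", "rate-b"),
]
QUERY = [("http://mymovie.example/movie2449", "filmingLocation", "?loc"),
         ("?loc", "statistics", "?stat"),
         ("?stat", "unemp_rate", "?ur")]


def bind(tp, triple):
    """Return the solution mapping tp's variables to triple's terms, or None."""
    mu = {}
    for p, t in zip(tp, triple):
        if p.startswith("?"):
            if mu.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return mu


class Iterator(ABC):
    """One operation, exposed through OPEN, GETNEXT and CLOSE."""

    @abstractmethod
    def open(self): ...

    @abstractmethod
    def get_next(self): ...  # returns a solution, or None when exhausted

    @abstractmethod
    def close(self): ...


class RootIterator(Iterator):
    """Start of the chain: provides exactly one empty solution."""

    def open(self):
        self._done = False

    def get_next(self):
        if self._done:
            return None
        self._done = True
        return {}

    def close(self):
        pass


class TriplePatternIterator(Iterator):
    """I_i: extends each solution from its predecessor with the matches of tp_i."""

    def __init__(self, pred, tp, data):
        self.pred, self.tp, self.data = pred, tp, data

    def open(self):
        self.pred.open()
        self._buffer = deque()

    def get_next(self):
        while not self._buffer:  # pull inputs until one produces at least one result
            mu_cur = self.pred.get_next()
            if mu_cur is None:
                return None
            tp_cur = tuple(mu_cur.get(t, t) for t in self.tp)
            for triple in self.data:
                mu_prime = bind(tp_cur, triple)
                if mu_prime is not None:
                    self._buffer.append({**mu_cur, **mu_prime})
        return self._buffer.popleft()

    def close(self):
        self.pred.close()


def run(bgp, data):
    top = RootIterator()
    for tp in bgp:  # chain I_1 ... I_n; the last one drives evaluation
        top = TriplePatternIterator(top, tp, data)
    top.open()
    results = []
    while (mu := top.get_next()) is not None:
        results.append(mu)
    top.close()
    return results
```

Because evaluation is pull-based, `run(QUERY, DATA)` produces solutions one at a time; no iterator materializes the full result of its predecessor.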
Iterator based Query Execution
$I_i$ for $tp_i$:
1. Substitute $tp_{cur} = \mu_{cur}[tp_i]$
2. Find matching triples $match(tp_{cur})$ in the queried data set
3. Create a solution $\mu'$ for each $t \in match(tp_{cur})$
4. Return each $\mu_{cur} \cup \mu'$ as a result

Results from $I_{i-1}$ (each row is one input solution; the current one is $\mu_{cur}$):
?c                          ?cStats
http://geo.db/country/US    http://stats.example.org/USstatistics
http://geo.db/country/IT    http://stats.example.org/ITstatistics
http://geo.db/country/IT    http://stats.db/example/It
http://example.db/ctry/DE   http://stats.example.org/Germany

Example:
$tp_i$ = ( ?loc ex:stats ?s )
$\mu_{cur}$ = { ?p → http://ex... , ?loc → http://geo... }
Step 1: $tp_{cur}$ = ( http://geo... ex:stats ?s )
Step 2: $match(tp_{cur})$ = { (http://geo... ex:stats http://db...), (http://geo... ex:stats http://ex...) }
Step 3: $\mu'$ = { ?s → http://db... }
Step 4: $\mu_{cur} \cup \mu'$ = { ?p → http://ex... , ?loc → http://geo.db/... , ?s → http://db... }
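The four steps can be traced on the slide's example with the following Python sketch. Because the slide elides the URIs (`http://ex...`, `http://geo...`, `http://db...`), the sketch uses hypothetical stand-ins (`GEO`, `EX`, `DB1`, `EX2`) and adds one non-matching triple for contrast; `ex:stats` is kept as a plain string. The function names are our own.

```python
def substitute(tp, mu):
    """Step 1: tp_cur = mu_cur[tp_i], i.e. replace bound variables by their values."""
    return tuple(mu.get(term, term) for term in tp)


def bind(tp, triple):
    """Steps 2 and 3: if triple matches tp, return the solution mu' for it, else None."""
    mu = {}
    for p, t in zip(tp, triple):
        if p.startswith("?"):
            if mu.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return mu


def evaluate_for_input(tp_i, mu_cur, data):
    """Step 4: all results mu_cur U mu' that I_i returns for one input mu_cur."""
    tp_cur = substitute(tp_i, mu_cur)
    results = []
    for triple in data:
        mu_prime = bind(tp_cur, triple)
        if mu_prime is not None:
            results.append({**mu_cur, **mu_prime})
    return results


# Hypothetical stand-ins for the elided URIs on the slide.
GEO, EX = "http://geo.example/Italy", "http://ex.example/p"
DB1, EX2 = "http://db.example/stats", "http://ex.example/stats"
DATA = [(GEO, "ex:stats", DB1), (GEO, "ex:stats", EX2),
        ("http://geo.example/France", "ex:stats", "http://db.example/fr")]

tp_i = ("?loc", "ex:stats", "?s")
mu_cur = {"?p": EX, "?loc": GEO}

if __name__ == "__main__":
    print(substitute(tp_i, mu_cur))  # ('http://geo.example/Italy', 'ex:stats', '?s')
    for result in evaluate_for_input(tp_i, mu_cur, DATA):
        print(result)
```

The first result corresponds to the combined solution shown on the slide; the second comes from the other matching triple, for which the slide does not spell out $\mu'$.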
Iterator based Query Execution
● Results of $I_i$ are solutions for $tp_1, \dots, tp_i$
[Figure: iterator chain $I_{i-1}$ for $tp_{i-1}$ → $I_i$ for $tp_i$ → $I_{i+1}$ for $tp_{i+1}$]
Application to Link Traversal
● The queried data set grows
● Look-up Requirement: Do not evaluate $tp_{cur}$ until the queried data set contains all data that can be retrieved from all URIs in $tp_{cur}$
[Figure: iterator chain $I_{i-1}$ for $tp_{i-1}$, $I_i$ for $tp_i$, $I_{i+1}$ for $tp_{i+1}$]
Application to Link Traversal
$I_i$ for $tp_i$:
1. Substitute $tp_{cur} = \mu_{cur}[tp_i]$
2. Ensure the look-up requirement for $tp_{cur}$ (initiate look-ups and wait)
3. Find matching triples $match(tp_{cur})$ in the queried data set
4. Create a solution $\mu'$ for each $t \in match(tp_{cur})$
5. Return each $\mu_{cur} \cup \mu'$ as a result
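To see why waiting for look-ups blocks execution, here is a hedged Python sketch of these five steps in which step 2 performs the look-ups synchronously. It uses Python generators as iterators (creating a generator roughly plays the role of OPEN, `next()` of GETNEXT, `close()` of CLOSE). The in-memory `WEB` dict, the `QueriedDataSet` class, its simulated `latency` and all URIs are hypothetical.

```python
import time

# Hypothetical Web: looking up a URI returns the triple listed for it.
TRIPLES = [
    ("http://mymovie.example/movie2449", "filmingLocation", "http://geo.example/Italy"),
    ("http://geo.example/Italy", "statistics", "http://stats.example/it"),
    ("http://stats.example/it", "unemp_rate", "placeholder-rate"),
]
WEB = {s: [(s, p, o)] for s, p, o in TRIPLES}
QUERY = [("http://mymovie.example/movie2449", "filmingLocation", "?loc"),
         ("?loc", "statistics", "?stat"),
         ("?stat", "unemp_rate", "?ur")]


class QueriedDataSet:
    """The continuously augmented data set and the URIs looked up so far."""

    def __init__(self, web, latency=0.0):
        self.web, self.latency = web, latency
        self.triples, self.looked_up = [], set()

    def ensure_look_up_requirement(self, tp_cur):
        """Look up every URI in tp_cur that was not looked up yet, and wait for it."""
        for term in tp_cur:
            if term.startswith("http://") and term not in self.looked_up:
                self.looked_up.add(term)
                time.sleep(self.latency)  # simulated HTTP request; the whole chain waits
                self.triples.extend(self.web.get(term, []))


def bind(tp, triple):
    """Return mu' mapping tp's variables to the terms of triple, or None."""
    mu = {}
    for p, t in zip(tp, triple):
        if p.startswith("?"):
            if mu.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return mu


def tp_iterator(inputs, tp, ds):
    """I_i for tp_i as a generator over the solutions coming from I_(i-1)."""
    for mu_cur in inputs:
        tp_cur = tuple(mu_cur.get(t, t) for t in tp)  # 1. substitute
        ds.ensure_look_up_requirement(tp_cur)         # 2. initiate look-ups and wait
        results = []
        for triple in ds.triples:                     # 3. find matching triples
            mu_prime = bind(tp_cur, triple)           # 4. create mu'
            if mu_prime is not None:
                results.append({**mu_cur, **mu_prime})
        yield from results                            # 5. return mu_cur U mu'


def execute(bgp, ds):
    stream = iter([{}])  # a single empty solution starts the chain
    for tp in bgp:
        stream = tp_iterator(stream, tp, ds)
    return list(stream)
```

With a non-zero `latency`, every `time.sleep` stalls all iterators in the chain, which is exactly the blocking behaviour the next slide points out.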
Blocked Query Execution
● Waiting for URI look-ups blocks query execution
[Figure: iterator chain $I_{i-1}$ for $tp_{i-1}$, $I_i$ for $tp_i$, $I_{i+1}$ for $tp_{i+1}$; an iterator that has to initiate look-ups and wait stalls the whole chain]
URI Prefetching
● Waiting for URI look-ups blocks query execution
● URI prefetching: when a URI is bound to a variable, initiate its look-up in the background
[Figure: iterator chain $I_{i-1}$ for $tp_{i-1}$, $I_i$ for $tp_i$, $I_{i+1}$ for $tp_{i+1}$; instead of "initiate look-ups and wait", one iterator initiates a look-up early and a later iterator only has to ensure that the look-up is finished]
URI Prefetching
$I_i$ for $tp_i$:
1. Substitute $tp_{cur} = \mu_{cur}[tp_i]$
2. Ensure the look-up requirement for $tp_{cur}$
3. Find matching triples $match(tp_{cur})$ in the queried data set
4. Create a solution $\mu'$ for each $t \in match(tp_{cur})$
5. Initiate a parallel look-up for each new URI in $\mu'$
6. Return each $\mu_{cur} \cup \mu'$ as a result
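A hedged Python sketch of these six steps, using `concurrent.futures.ThreadPoolExecutor` for the background look-ups. The `LookupManager` class, its simulated `latency`, the in-memory `WEB` dict and the URIs are hypothetical; iterators are again Python generators. Step 5 starts a look-up for every URI in a new solution, so by the time the next iterator needs it (step 2), the data may already be there and `ensure` returns immediately.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

TRIPLES = [("http://mymovie.example/movie2449", "filmingLocation", "http://geo.example/Italy"),
           ("http://geo.example/Italy", "statistics", "http://stats.example/it"),
           ("http://stats.example/it", "unemp_rate", "placeholder-rate")]
WEB = {s: [(s, p, o)] for s, p, o in TRIPLES}  # hypothetical: URI -> triples its look-up returns
QUERY = [("http://mymovie.example/movie2449", "filmingLocation", "?loc"),
         ("?loc", "statistics", "?stat"), ("?stat", "unemp_rate", "?ur")]


def is_uri(term):
    return term.startswith("http://")


def bind(tp, triple):
    """Return mu' mapping tp's variables to the terms of triple, or None."""
    mu = {}
    for p, t in zip(tp, triple):
        if p.startswith("?"):
            if mu.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return mu


class LookupManager:
    """Runs URI look-ups in background threads and collects the retrieved triples."""

    def __init__(self, web, latency=0.01, workers=4):
        self._web, self._latency = web, latency
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._lock = threading.Lock()
        self._futures = {}  # URI -> Future of its look-up
        self._triples = []  # the queried data set

    def initiate(self, uri):
        """Start a background look-up unless one was started already."""
        with self._lock:
            if uri not in self._futures:
                self._futures[uri] = self._pool.submit(self._fetch, uri)

    def _fetch(self, uri):
        time.sleep(self._latency)  # simulated HTTP request
        with self._lock:
            self._triples.extend(self._web.get(uri, []))

    def ensure(self, uri):
        """Block until the look-up of uri is finished; returns at once if it already is."""
        self.initiate(uri)
        with self._lock:
            future = self._futures[uri]
        future.result()

    def snapshot(self):
        with self._lock:
            return list(self._triples)

    def shutdown(self):
        self._pool.shutdown(wait=True)


def tp_iterator(inputs, tp, mgr):
    """I_i for tp_i with URI prefetching (steps 1 to 6 of the slide)."""
    for mu_cur in inputs:
        tp_cur = tuple(mu_cur.get(t, t) for t in tp)  # 1. substitute
        for uri in filter(is_uri, tp_cur):            # 2. look-up requirement
            mgr.ensure(uri)
        results = []
        for triple in mgr.snapshot():                 # 3. find matching triples
            mu_prime = bind(tp_cur, triple)           # 4. create mu'
            if mu_prime is not None:
                for value in filter(is_uri, mu_prime.values()):
                    mgr.initiate(value)               # 5. initiate look-up in background
                results.append({**mu_cur, **mu_prime})
        yield from results                            # 6. return mu_cur U mu'


def execute(bgp, mgr):
    stream = iter([{}])  # a single empty solution starts the chain
    for tp in bgp:
        stream = tp_iterator(stream, tp, mgr)
    return list(stream)
```

In this tiny example each pattern yields one solution, so prefetching saves little; it pays off when an iterator produces many solutions whose URIs can be fetched in parallel while its successor works on the first one. `ensure` still blocks when a look-up is slow, which is the limitation the following slide addresses.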
URI Prefetching
● Even with URI prefetching, query execution may block
● Possible solutions:
  ● Program parallelism
  ● Asynchronous pipeline
● Drawback: requires a major rewrite of existing query engines
[Figure: iterator chain $I_{i-1}$ for $tp_{i-1}$, $I_i$ for $tp_i$, $I_{i+1}$ for $tp_{i+1}$; although the look-up was initiated early, an iterator still has to wait until it is finished]
Postponing Iterator
● Enabled by an extension of the iterator paradigm:
  ● New function POSTPONE: take the most recently provided result back
  ● Adjusted GETNEXT: either return the next result or return a formerly postponed result
● POSTPONE allows an iterator to temporarily reject its input solution $\mu_{cur}$
Postponing Iterator
$I_i$ for $tp_i$:
1. Substitute $tp_{cur} = \mu_{cur}[tp_i]$
2. POSTPONE $\mu_{cur}$ if the look-up requirement does not hold for $tp_{cur}$
3. Find matching triples $match(tp_{cur})$ in the queried data set
4. Create a solution $\mu'$ for each $t \in match(tp_{cur})$
5. Initiate a parallel look-up for each new URI in $\mu'$
6. Return each $\mu_{cur} \cup \mu'$ as a result
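A hedged Python sketch of the extended paradigm. `postpone` hands the most recently returned solution back to the predecessor, and `get_next` returns a new result if there is one, else a postponed one. Instead of waiting in step 2, an iterator whose look-up requirement does not hold calls `postpone` on its predecessor and moves on. The class names, the `Lookups` helper (background look-ups with a non-blocking `requirement_holds` check), the simulated latency and the data are hypothetical; OPEN and CLOSE are omitted because they would be trivial here.

```python
import threading
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor

TRIPLES = [("http://mymovie.example/movie2449", "filmingLocation", "http://geo.example/Italy"),
           ("http://geo.example/Italy", "statistics", "http://stats.example/it"),
           ("http://stats.example/it", "unemp_rate", "placeholder-rate")]
WEB = {s: [(s, p, o)] for s, p, o in TRIPLES}  # hypothetical: URI -> triples its look-up returns
QUERY = [("http://mymovie.example/movie2449", "filmingLocation", "?loc"),
         ("?loc", "statistics", "?stat"), ("?stat", "unemp_rate", "?ur")]


def bind(tp, triple):
    """Return mu' mapping tp's variables to the terms of triple, or None."""
    mu = {}
    for p, t in zip(tp, triple):
        bad_var = p.startswith("?") and mu.setdefault(p, t) != t
        if bad_var or (not p.startswith("?") and p != t):
            return None
    return mu


class Lookups:
    def __init__(self, web, latency=0.01):
        self._web, self._latency, self._lock = web, latency, threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=4)
        self._futures, self._triples = {}, []
    def initiate(self, uri):
        with self._lock:
            if uri not in self._futures:
                self._futures[uri] = self._pool.submit(self._fetch, uri)
    def _fetch(self, uri):
        time.sleep(self._latency)  # simulated HTTP request
        with self._lock:
            self._triples.extend(self._web.get(uri, []))
    def requirement_holds(self, tp_cur):
        """Start missing look-ups; True only if all URIs in tp_cur are retrieved (never waits)."""
        uris = [t for t in tp_cur if t.startswith("http://")]
        for uri in uris:
            self.initiate(uri)
        with self._lock:
            return all(self._futures[u].done() for u in uris)
    def snapshot(self):
        with self._lock:
            return list(self._triples)
    def shutdown(self):
        self._pool.shutdown(wait=True)


class PostponingIterator:
    """Extended iterator paradigm: adjusted GETNEXT plus POSTPONE."""
    def __init__(self):
        self._last, self._postponed = None, deque()
    def postpone(self):
        """POSTPONE: take the most recently provided result back."""
        self._postponed.append(self._last)
        self._last = None
    def get_next(self):
        """Adjusted GETNEXT: the next new result, else a formerly postponed one."""
        result = self._next_new()
        if result is None and self._postponed:
            result = self._postponed.popleft()
        self._last = result
        return result


class Root(PostponingIterator):
    def __init__(self):
        super().__init__()
        self._source = iter([{}])  # start of the chain: one empty solution
    def _next_new(self):
        return next(self._source, None)


class TriplePatternIterator(PostponingIterator):
    def __init__(self, pred, tp, lookups):
        super().__init__()
        self.pred, self.tp, self.lookups, self._buffer = pred, tp, lookups, deque()
    def _next_new(self):
        while not self._buffer:
            mu_cur = self.pred.get_next()
            if mu_cur is None:
                return None
            tp_cur = tuple(mu_cur.get(t, t) for t in self.tp)  # 1. substitute
            if not self.lookups.requirement_holds(tp_cur):
                self.pred.postpone()  # 2. reject mu_cur for now instead of waiting
                continue  # note: busy-waits if every pending input is postponed
            for triple in self.lookups.snapshot():             # 3. find matching triples
                mu_prime = bind(tp_cur, triple)                # 4. create mu'
                if mu_prime is not None:
                    for value in mu_prime.values():
                        if value.startswith("http://"):
                            self.lookups.initiate(value)       # 5. prefetch new URIs
                    self._buffer.append({**mu_cur, **mu_prime})
        return self._buffer.popleft()                          # 6. mu_cur U mu'


def execute(bgp, lookups):
    top = Root()
    for tp in bgp:
        top = TriplePatternIterator(top, tp, lookups)
    results = []
    while (mu := top.get_next()) is not None:
        results.append(mu)
    return results
```

Preferring new results over postponed ones in `get_next`, and busy-waiting when only postponed inputs remain, are simplifications of this sketch rather than details taken from the slides.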
Evaluation
● Implementation: Semantic Web Client Library (SWClLib) http://www4.wiwiss.fu-berlin.de/bizer/ng4j/semwebclient/
● Berlin SPARQL Benchmark (BSBM):
  ● Simulates an e-commerce scenario
  ● Mix of 12 SPARQL queries
  ● Generates datasets of different sizes (scaling factor)
● Simulation of the Web of Linked Data:
  ● Linked Data server publishes BSBM datasets
● Experiment:
  ● Adjusted BSBM queries link to the simulation server
  ● Execute query mix with SWClLib
Evaluation
[Figure: average execution time per query mix in seconds (axis 0 to 250) over the BSBM scaling factor (10 to 60) for four configurations: w/o prefetching, w/ prefetching, non-blocking + prefetching, all data retrieved in advance]
scaling factor   # of triples   # of entities
10               4,971          613
20               8,485          928
30               11,999         1,245
40               16,918         1,845
50               22,616         2,599
60               26,108         2,914
Take-away Summary
● Novel query execution approach for the Web of Data:
  ● Utilizes the characteristics of the Web
  ● Traverses RDF links during query execution
  ● Discovery of new data sources
  ● No need to know all data sources in advance
● Implementation approach:
  ● Iterator based execution with URI prefetching
  ● Extension of the iterator paradigm (POSTPONE)
● New research challenges:
  ● Improving result completeness
  ● Investigating suitable caching strategies
Try it!
● SQUIN http://squin.org
  ● Provides SWClLib functionality as a Web service
  ● Accessible like a SPARQL endpoint
● Public SQUIN service at http://squin.informatik.hu-berlin.de/SQUIN/
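Since SQUIN is described as accessible like a SPARQL endpoint, a client would presumably send queries the way the SPARQL Protocol prescribes, with the query text in a `query` parameter of an HTTP GET request. The sketch below builds such a request using only the Python standard library. The exact request path (the slide gives only the service base URL), the `Accept` header and whether the public service is still online are all assumptions, and the helper names `build_query_url` and `run_query` are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Base URL from the slide; the exact endpoint path is an assumption.
SQUIN_SERVICE = "http://squin.informatik.hu-berlin.de/SQUIN/"


def build_query_url(endpoint, query):
    """SPARQL Protocol style GET request: the query text goes in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


def run_query(endpoint, query, timeout=60):
    """Send the request and return the raw response body as text."""
    request = Request(build_query_url(endpoint, query),
                      headers={"Accept": "application/sparql-results+xml"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Start from a URI so that link traversal has something to dereference.
    q = "SELECT ?p ?o WHERE { <http://data.semanticweb.org/conference/eswc/2009/proceedings> ?p ?o }"
    print(build_query_url(SQUIN_SERVICE, q))
```

Only `build_query_url` runs in the example; calling `run_query` requires a reachable service.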
These slides have been created by Olaf Hartig
http://olafhartig.de
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 License
(http://creativecommons.org/licenses/by-sa/3.0/)