putting the spirit of the web back into semantic web … · 2010-11-22 · motivation vision:...

PUTTING THE SPIRIT OF THE WEB BACK INTO SEMANTIC WEB QUERYING

Cosmin Basca, Abraham Bernstein

Motivation

- Vision: towards a globally query-able and truly open Semantic Web

- We want to:
    - Query the Web of Data (WoD) on demand
    - Provide up-to-date results (within the query execution interval, typically seconds)
    - Impose no or limited restrictions on data publishers
    - Be flexible regarding participating triple stores
    - Preserve the “openness” of the WoD

Openness

- By “openness” we mean:
    - Assume that servers are:
        - Independent (unaware of other servers)
        - Heterogeneous
    - Assume no control over, and limited knowledge of, their distribution and availability
    - Data publishing:
        - Not having to adhere to fixed guidelines

Motivating example

- Consider:
    - Sites holding LOD Linked Movie and DBpedia data
    - Find the movies, and related information, that were produced by “Producers Circle” studios

    SELECT ?title ?photoCollection ?name WHERE {
      ?film dc:title ?title ;
            movie:actor ?actor ;
            owl:sameAs ?sameFilm .      # link to other datasets
      ?actor a foaf:Person ;
             movie:actor_name ?name .
      ?sameFilm dbpedia:hasPhotoCollection ?photoCollection .
      ?sameFilm dbpedia:studio "Producers Circle" .
    }

Problem

- Key space
    - A given in the Semantic Web, via URIs
    - Tradeoff between globalism and performance (address space vs. size in bytes)

- Joining datasets

- Currently no system or algorithm achieves the goal entirely

Problem

[Figure, built up over five slides: existing systems plotted by Restrictiveness (Low to High) against Intended Addressing Space (Local to Global). Further labels on the plot: Clustered, Cloud, Fixed id partitioning, Triple level Federation, URI Instance level Federation. Systems shown: Sesame, AllegroGraph, 4Store, YARS2, SemWiq, DARQ, RDF Peers, Hartig et al. A "Goal" region is marked; the final build adds a "?" and a "Closer to Goal" annotation.]

Avalanche

[Figure, built up over four slides: a query (text not recoverable from this transcript), the Avalanche SPARQL endpoint, and an Endpoints Directory or Search Engine. Numbered steps 1, 2 and 3 are added one per build.]

Challenges and Implications

- Web of Data is growing: LOD ~25B triples (Sept 2010)
- Lack of (high) quality statistics (join estimations)
- Physical constraints
    - Bandwidth, latency, unavailability, many sites
- Completeness not considered
    - First K results
- Exponential search space due to flexibility
    - Efficient heuristics to search

Architecture

[Figure: the AVALANCHE Mediator Execution Pipeline, shown alongside the AVALANCHE endpoints and a Web Directory or Search Engine. The pipeline is divided into a query preprocessing phase and a query execution phase. Components shown: Query Parser, Statistics Requester, Plan Generator, Plans Queue, Executors, Finished Plans Queue, Materializers, Results Queue, Query Stopper. The diagram is also labelled with "Query" and "Results".]

Planning

- Greedy multipath search inspired by Best First Search

- Total space is O(n^3)!, but size increases by M * H with each exploratory step (H = number of sites, M = number of paths)

- In practice the space is tractable: most queries are not fully connected graphs!

- Can be further reduced
    - Windowed approach
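
As a rough illustration of the greedy, best-first flavour of this search (not Avalanche's actual planner), the Python sketch below expands partial plans in order of cumulative utility and can prune the frontier to a fixed window. The function name `best_first_plans`, the additive `utility(pattern, host)` callback and the toy scores are all assumptions made for this example.

```python
import heapq


def best_first_plans(patterns, hosts, utility, window=None):
    """Yield (plan, score) pairs, always expanding the best partial plan first.

    A plan is a tuple of (pattern, host) assignments. `utility` is a
    hypothetical scoring callback; higher is better. If `window` is given,
    only the `window` best frontier entries survive each expansion.
    """
    counter = 0  # tie-breaker so the heap never has to compare plans
    frontier = [(0.0, counter, ())]  # (negated score, tie-breaker, partial plan)
    while frontier:
        neg_score, _, plan = heapq.heappop(frontier)
        if len(plan) == len(patterns):
            yield plan, -neg_score
            continue
        pattern = patterns[len(plan)]
        for host in hosts:  # every host is a candidate for the next pattern
            counter += 1
            entry = (neg_score - utility(pattern, host), counter,
                     plan + ((pattern, host),))
            heapq.heappush(frontier, entry)
        if window is not None and len(frontier) > window:
            frontier = heapq.nsmallest(window, frontier)  # sorted, so a valid heap


# Toy scores, invented for the example.
SCORES = {("p1", "h1"): 3, ("p1", "h2"): 1, ("p2", "h1"): 1, ("p2", "h2"): 2}


def toy_utility(pattern, host):
    return SCORES[(pattern, host)]


all_plans = list(best_first_plans(["p1", "p2"], ["h1", "h2"], toy_utility))
windowed = list(best_first_plans(["p1", "p2"], ["h1", "h2"], toy_utility, window=1))
```

Without a window the generator enumerates every host assignment, best first; with `window=1` it degenerates to a single greedy plan, which is the kind of reduction a windowed approach trades completeness for.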

Planning

- 7 triple patterns and 6 unbound variables
    - If the graph is undirected and fully connected: $n(n-1)\,2^{n-3} = 240$ possible paths for $n = 6$
    - In practice we have a sparse directed graph: 11 paths

- Search step: each path is assigned to all servers involved
    - i.e. for 100 hosts: 1100 states

- Join (average) paths to form the full query graph
    - 4 average joins to full graph: 4400 plans (ordered)

    SELECT ?title ?photoCollection ?name WHERE {
      ?film dc:title ?title ;
            movie:actor ?actor ;
            owl:sameAs ?sameFilm .      # link to other datasets
      ?actor a foaf:Person ;
             movie:actor_name ?name .
      ?sameFilm dbpedia:hasPhotoCollection ?photoCollection .
      ?sameFilm dbpedia:studio "Producers Circle" .
    }
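
The figures on this slide can be re-derived with a few lines of Python. This is only a check of the arithmetic; the variable names are chosen for the example, and reading n as the number of unbound variables is inferred from the fact that n = 6 reproduces the quoted 240.

```python
def fully_connected_paths(n: int) -> int:
    """Path count n(n-1)2^(n-3) for an undirected, fully connected query graph."""
    return n * (n - 1) * 2 ** (n - 3)


worst_case_paths = fully_connected_paths(6)  # 6 unbound variables
sparse_paths = 11                            # paths in the sparse, directed query graph
hosts = 100
states = sparse_paths * hosts                # each path is assigned to every host
average_joins = 4
plans = states * average_joins               # ordered plans

print(worst_case_paths, states, plans)       # prints: 240 1100 4400
```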

Planning Heuristics

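- Two utility functions: Default (U) and Extended (EU)

[Equations for U and C: the fraction layout did not survive in this transcript. Terms recovered for U (cases "first node" and "otherwise"): Edges(N1), CNT_N1, min(CNT_N1, CNT_N2). Terms recovered for C (case "first node"): 1, (L + CNT_N2/B + (CNT_N1 + CNT_N2)/CNT_N1), Edges(Query)/Edges(N2).]

\[
EU =
\begin{cases}
w_1 \cdot \mathrm{JOIN}_{N_1,N_2} + w_2 \cdot U_{N_1,N_2}, & N_1, N_2 \text{ selective} \\
w_2 \cdot U_{N_1,N_2}, & \text{otherwise}
\end{cases}
\]

\[
\mathrm{JOIN}_{N_1,N_2} \approx -\frac{1}{k} \cdot \frac{\ln\left( m \cdot \dfrac{Z_1 + Z_2 - Z_{12}}{Z_1 \cdot Z_2} \right)}{\ln\left( 1 - \dfrac{1}{m} \right)}
\]

- L = latency
- B = bandwidth
- Annotated terms of C: cost to execute remote subquery; cost to execute local subquery; scaling factor (aids convergence)

- Bloom filters (expensive; only for selective queries)
    - Z_i = number of 0 bits in Bloom filter i
    - k = number of Bloom hash functions
    - m = size in bits of the Bloom filter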
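
The Bloom-filter join estimate can be tried out with a small Python sketch. Several things here are assumptions rather than details from the slides: Z12 is taken to be the number of 0 bits in the bitwise AND of the two filters (the slide does not define it), the filters are plain Python lists, and the k hash functions are simulated with salted SHA-256. The helper names `bloom` and `estimate_join` are hypothetical.

```python
import hashlib
import math


def bloom(items, m, k):
    """Build an m-bit Bloom filter (list of 0/1) using k salted SHA-256 hashes."""
    bits = [0] * m
    for item in items:
        for salt in range(k):
            digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
            bits[int.from_bytes(digest[:8], "big") % m] = 1
    return bits


def estimate_join(f1, f2, k):
    """Estimate the intersection size of the two sets behind filters f1 and f2."""
    m = len(f1)
    z1 = f1.count(0)
    z2 = f2.count(0)
    # Assumption: Z12 counts the 0 bits of the bitwise AND of both filters.
    z12 = sum(1 for a, b in zip(f1, f2) if not (a and b))
    ratio = m * (z1 + z2 - z12) / (z1 * z2)  # needs some 0 bits in each filter
    return -math.log(ratio) / (k * math.log(1 - 1 / m))


M_BITS, K_HASHES = 1 << 16, 4
left = [f"s{i}" for i in range(0, 1000)]
right = [f"s{i}" for i in range(700, 1700)]   # shares 300 items with `left`
estimate = estimate_join(bloom(left, M_BITS, K_HASHES),
                         bloom(right, M_BITS, K_HASHES), K_HASHES)
print(round(estimate))
```

With sparse filters such as these the estimate should land near the true overlap of 300; it degrades as the filters fill up.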

Execution

The query is split into three sub-queries (q1, q2, q3), linked by the join variables ?sameFilm and ?actor:

    ?sameFilm dbpedia:hasPhotoCollection ?photoCollection .
    ?sameFilm dbpedia:studio "Producers Circle" .

    ?actor a foaf:Person .
    ?actor movie:actor_name ?name .

    ?film dc:title ?title .
    ?film movie:actor ?actor .
    ?film owl:sameAs ?sameFilm .

Execution steps:

 1) Join(q1,q2)
 2) R1=Execute(q1)
 3) Send(R1)
 4) FR2=ExecuteFilter(R1)
 5) Join(q2,q3)
 6) Send(FR2)
 7) FR3=ExecuteFilter(FR2)
 8) Update(q3,q2)
 9) R3=FR3
10) Send(R3)
11) Update(q2,q1)
12) R2=Filter(FR2, FR3)
13) Send(R2)
14) R1=Filter(R1, R2)
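
The forward and backward filtering passes in these steps can be mimicked in memory with the Python sketch below. It is a simplification under stated assumptions: q1 is assumed to join q2 on `sameFilm` and q2 to join q3 on `actor`, the bindings are invented toy data, `Send` is modelled by simply passing lists between functions, and the `Join` and `Update` bookkeeping steps are left out. The helper name `keep_matching` is hypothetical.

```python
def keep_matching(rows, incoming, var):
    """Keep the rows whose binding for `var` also occurs in `incoming`."""
    keys = {row[var] for row in incoming}
    return [row for row in rows if row[var] in keys]


def run_plan(q1_rows, q2_rows, q3_rows):
    # Forward pass: each site filters its bindings by what it receives.
    r1 = list(q1_rows)                            # 2) R1 = Execute(q1)
    fr2 = keep_matching(q2_rows, r1, "sameFilm")  # 3-4) FR2 = ExecuteFilter(R1)
    fr3 = keep_matching(q3_rows, fr2, "actor")    # 6-7) FR3 = ExecuteFilter(FR2)
    # Backward pass: propagate the final bindings back along the chain.
    r3 = fr3                                      # 9) R3 = FR3
    r2 = keep_matching(fr2, r3, "actor")          # 12) R2 = Filter(FR2, FR3)
    r1 = keep_matching(r1, r2, "sameFilm")        # 14) R1 = Filter(R1, R2)
    return r1, r2, r3


# Invented toy bindings held by three hypothetical endpoints.
q1_rows = [{"sameFilm": "f1"}, {"sameFilm": "f2"}]
q2_rows = [{"sameFilm": "f1", "actor": "a1"},
           {"sameFilm": "f2", "actor": "a9"},
           {"sameFilm": "f3", "actor": "a2"}]
q3_rows = [{"actor": "a1"}, {"actor": "a2"}]

r1, r2, r3 = run_plan(q1_rows, q2_rows, q3_rows)
print(r1, r2, r3)
```

After both passes every remaining binding at each site takes part in at least one complete result, so only those bindings need to be materialized.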

Materializing and Stopping

- Materialization:
    - Same as execution, but request the string representation from endpoints that completed the plan

- Stopping:
    - Timeout
    - Relative saturation
        - New results received over a sliding window
    - First K results
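
One way the three stopping conditions could be combined is sketched below in Python. The class name `QueryStopper` echoes the component in the architecture diagram, but its interface is hypothetical, and "relative saturation" is interpreted here, as an assumption, as the share of all results that arrived within the last `window` batches falling below a threshold.

```python
import time
from collections import deque


class QueryStopper:
    """Decide when to stop a query: timeout, first K results, or saturation."""

    def __init__(self, timeout_s, first_k, window, min_new_ratio,
                 clock=time.monotonic):
        self.timeout_s = timeout_s
        self.first_k = first_k
        self.min_new_ratio = min_new_ratio
        self.clock = clock
        self.started = clock()
        self.total = 0
        self.recent = deque(maxlen=window)  # new-result counts of recent batches

    def record(self, new_unique_results):
        """Register how many new unique results the latest batch produced."""
        self.total += new_unique_results
        self.recent.append(new_unique_results)

    def reason_to_stop(self):
        """Return the name of the first condition that fires, or None."""
        if self.clock() - self.started >= self.timeout_s:
            return "timeout"
        if self.total >= self.first_k:
            return "first K results"
        window_full = len(self.recent) == self.recent.maxlen
        if (window_full and self.total > 0
                and sum(self.recent) / self.total < self.min_new_ratio):
            return "relative saturation"
        return None


stopper = QueryStopper(timeout_s=10, first_k=1000, window=3, min_new_ratio=0.05)
for batch in (100, 2, 1, 0):
    stopper.record(batch)
print(stopper.reason_to_stop())
```

Passing the clock in as a parameter keeps the timeout testable without waiting in real time.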

Preliminary Results: 5 sites, 35 million triples

[Figure: "execution time". Time (seconds, axis from 0 to 13.5) per query Q1, Q2, Q3. Series: First Results (default), Total Results (default), First Results (extended), Total Results (extended).]

[Figure: "# results". Number of unique results (axis from 0 to 280) per query Q1, Q2, Q3. Series: First Results (default), Total Results (default), First Results (extended), Total Results (extended).]

[Figure: "Planner Convergence". # New Results (axis from 0 to 180) against # Total Results (axis from 1 to 10000). Series: Q1, Q2, Q3, Saturation Q1, Saturation Q2, Saturation Q3.]

Conclusions

- Avalanche:
    - Makes no or limited assumptions about data distribution, partitioning and availability
    - Provides up-to-date results, as exposed by the endpoints
    - Is flexible, since it needs no knowledge of the triple store structure

Demo

- See Avalanche live
    - Visit us at the ISWC demo and poster session

Thank you