The invited talk I gave at the EMPIRICAL 2014 workshop at ESWC 2014.
Dipartimento di Elettronica, Informazione e Bioingegneria
An Experience on Empirical Research about RDF Stream Processing
Daniele Dell’Aglio
Joint work with: Jean-Paul Calbimonte, Marco Balduini, Oscar Corcho and Emanuele Della Valle
RDF Stream Processing in a nutshell
Continuous queries over RDF streams, i.e., infinite sequences of time-stamped RDF statements
Brings together the DSMS/CEP and Semantic Web research fields
Several prototypes, based on similar models, are available today
Growing trend towards the evaluation and comparison of the existing systems
26 May 2014 - EMPIRICAL@ESWC2014
Daniele Dell'Aglio - Experimental research about RSPs
The CQL model for RSPs
Input stream → S2R operator → R2R operator → R2S operator → Output stream
S2R operator: converts the infinite stream of RDF elements into a finite set of mappings
– The window operators: time-based, tuple-based, …
R2R operator: transforms a set of mappings into another set of mappings
– SPARQL 1.0/1.1 queries
R2S operator: each set of mappings produced by the R2R operator is transformed and appended to the output stream
– Operators: RStream, DStream, IStream
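The R2S step can be sketched in a few lines. This minimal version assumes the usual CQL reading of the three operators: RStream streams out the whole current result, IStream the mappings added since the previous evaluation, and DStream the mappings removed since then. Solution mappings are modelled as frozensets of (variable, value) pairs; all names are hypothetical and not taken from any RSP engine.

```python
from typing import FrozenSet, Set, Tuple

# A solution mapping, modelled as an immutable set of (variable, value) pairs
# so that mappings can be stored in Python sets (hypothetical representation).
Solution = FrozenSet[Tuple[str, str]]

def solution(**bindings: str) -> Solution:
    """Build a mapping from keyword arguments, e.g. solution(room=":hall")."""
    return frozenset(bindings.items())

def rstream(current: Set[Solution]) -> Set[Solution]:
    """RStream: stream out the whole result of the current evaluation."""
    return set(current)

def istream(previous: Set[Solution], current: Set[Solution]) -> Set[Solution]:
    """IStream: stream out only the mappings that are new w.r.t. the previous evaluation."""
    return current - previous

def dstream(previous: Set[Solution], current: Set[Solution]) -> Set[Solution]:
    """DStream: stream out only the mappings that disappeared since the previous evaluation."""
    return previous - current

# Usage: two consecutive R2R results.
before = {solution(room=":hall")}
after = {solution(room=":hall"), solution(room=":kitchen")}
assert rstream(after) == after
assert istream(before, after) == {solution(room=":kitchen")}
assert dstream(before, after) == set()
```

With an append-only result, DStream stays empty; it only emits something when a mapping leaves the result, e.g. when the window slides past the data that produced it.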
S2R - Time-based sliding window W(ω,β)
ω: width, β: slide
[Figure: a stream S with elements S1, …, S12 along the time axis t, covered by windows of width ω that move forward by β and feed the R2R operator]
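To make ω and β concrete, here is a minimal sketch that lists the windows of W(ω,β) containing a timestamp t. It assumes left-closed, right-open windows [t0 + iβ, t0 + iβ + ω) for i ≥ 0, where t0 is the instant at which the first window opens; the interval convention and the function name windows_containing are assumptions made for illustration.

```python
import math
from typing import List, Tuple

def windows_containing(t: float, width: float, slide: float,
                       t0: float = 0.0) -> List[Tuple[float, float]]:
    """Return the windows [start, end) of W(width, slide) that contain timestamp t.

    Window i spans [t0 + i*slide, t0 + i*slide + width); only i >= 0 is
    considered, i.e. no window opens before t0.
    """
    if width <= 0 or slide <= 0:
        raise ValueError("width and slide must be positive")
    # Smallest i whose window has not closed yet: t0 + i*slide + width > t
    first = max(0, math.floor((t - t0 - width) / slide) + 1)
    # Largest i whose window has already opened: t0 + i*slide <= t
    last = math.floor((t - t0) / slide)
    return [(t0 + i * slide, t0 + i * slide + width) for i in range(first, last + 1)]
```

With ω = β the windows are tumbling and each element at or after t0 falls into exactly one of them (windows_containing(3, 5, 5) gives [(0.0, 5.0)]); with β < ω they overlap, so windows_containing(7, 10, 5) gives [(0.0, 10.0), (5.0, 15.0)].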
Implementations (oversimplified!)
C-SPARQL – RDF Store + Stream processor
[Diagram: continuous query → translator → RDF Store and Stream processor → continuous results]
CQELS – implemented from scratch, focus on performance
[Diagram: continuous query → Native RSP → continuous results]
SPARQLstream – ontology-based stream query answering
[Diagram: continuous query → rewriter (using R2RML mappings) → DSMS/CEP → continuous results]
Same inputs, different outputs…
Let’s consider the following stream S:
S1 (t=1): :alice :isIn :hall
S2 (t=3): :bob :isIn :hall
S3 (t=6): :alice :isIn :kitchen
S4 (t=9): :bob :isIn :kitchen
And the continuous query:
– Where are Alice and Bob, when they are together?
– With a tumbling window W(ω=β=5)
After 4 executions (in brackets, the time instant at which each answer is reported; "-" denotes an empty answer):
Execution   1st answer   2nd answer
1           :hall [6]    :kitchen [11]
2           :hall [5]    :kitchen [10]
3           :hall [6]    :kitchen [11]
4           - [7]        - [12]
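The example can be replayed with a small batch evaluator, as a minimal sketch under explicit assumptions: the four elements get the timestamps 1, 3, 6 and 9 (values consistent with the answers in the table), the tumbling window W(ω=β=5) opens its first window at a parameter t0, and each answer is reported when its window closes. The names STREAM, together and tumbling_answers are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Element = Tuple[int, str, str, str]  # (timestamp, subject, predicate, object)

# The example stream; the timestamps are an assumption consistent with the slide's answers.
STREAM: List[Element] = [
    (1, ":alice", ":isIn", ":hall"),
    (3, ":bob", ":isIn", ":hall"),
    (6, ":alice", ":isIn", ":kitchen"),
    (9, ":bob", ":isIn", ":kitchen"),
]

def together(content: List[Element]) -> Optional[str]:
    """R2R step: a room where both :alice and :bob are (None if there is none)."""
    rooms: Dict[str, Set[str]] = {}
    for _, subject, predicate, obj in content:
        if predicate == ":isIn":
            rooms.setdefault(subject, set()).add(obj)
    common = rooms.get(":alice", set()) & rooms.get(":bob", set())
    return min(common) if common else None  # min() only makes the choice deterministic

def tumbling_answers(stream: List[Element], t0: int, width: int,
                     horizon: int) -> List[Tuple[Optional[str], int]]:
    """Evaluate the query on every tumbling window [start, start + width) that
    opens at or after t0 and closes by `horizon`, reporting at window close."""
    answers = []
    start = t0
    while start + width <= horizon:
        end = start + width
        content = [e for e in stream if start <= e[0] < end]
        answers.append((together(content), end))
        start = end
    return answers

for t0 in (0, 1, 2):
    print(t0, tumbling_answers(STREAM, t0, width=5, horizon=12))
```

Running it gives :hall at 5 and :kitchen at 10 for t0=0, :hall at 6 and :kitchen at 11 for t0=1, and two empty answers (None) at 7 and 12 for t0=2: the three distinct rows of the table, obtained from the same stream and query by changing only t0.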
The first hypothesis
All three systems show similar behaviours
Intuition: there are one or more parameters that the model does not take into account
As a consequence, the implementations can output different correct answers
The first hypothesis
HP1: it is possible to obtain a unique correct answer if we can control the time instant at which the sliding window operator starts working (t0)
[Figure: the example stream (:alice :isIn :hall, :bob :isIn :hall, :alice :isIn :kitchen, :bob :isIn :kitchen) on the time axis t, with the tumbling windows obtained for t0=0, t0=1 and t0=2]
The experiment
We vary the difference between the time instant at which the stream starts (ts) and the query registration time (tq)
– At each execution, we check the result
– We estimate the delay between tq and t0
Black box approach – we work on inputs/outputs
– the source code of all the systems
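In the black-box setting, one way to relate observed outputs to t0 is through the instants at which answers are reported. A minimal sketch, assuming window-close reporting: window i closes at t0 + iβ + ω, so every report time r gives t0 ≡ r - ω (mod β). This recovers t0 only modulo the slide and says nothing about the delay from tq; the helper t0_phase is hypothetical and is not the estimation procedure used in the experiment.

```python
from typing import Iterable, Optional

def t0_phase(report_times: Iterable[int], width: int, slide: int) -> Optional[int]:
    """Estimate t0 modulo `slide` from the instants at which answers were reported.

    Assumes window-close reporting: window i closes at t0 + i*slide + width,
    so every report time r satisfies t0 = r - width (mod slide).
    Returns None if there are no reports or if they disagree.
    """
    phases = {(r - width) % slide for r in report_times}
    return phases.pop() if len(phases) == 1 else None

# Report times of executions 1, 2 and 4 in the execution table.
print(t0_phase([6, 11], width=5, slide=5))  # 1
print(t0_phase([5, 10], width=5, slide=5))  # 0
print(t0_phase([7, 12], width=5, slide=5))  # 2
```

Applied to the report times of executions 1, 2 and 4, it gives 1, 0 and 2, the same association between executions and t0 values that appears in the observation tables.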
Observation and explanation
As a result, for each system:
– We identified the value of the t0 parameter
– We are able to reproduce the different results for each t0 value
Is this enough to claim that HP1 holds?
Execution   1st answer   2nd answer
1           :hall [6]    :kitchen [11]
2           :hall [5]    :kitchen [10]
3           :hall [6]    :kitchen [11]
4           - [7]        - [12]

Window      1st answer   2nd answer
t0=0        :hall [5]    :kitchen [10]
t0=1        :hall [6]    :kitchen [11]
t0=2        - [7]        - [12]
Some considerations on the experiment
Comparison:
– We ran the experiment multiple times to collect instances and check them
Reproducibility: can other researchers reproduce the
experiment?
– We released both the code and the data used for the experiment (see http://streamreasoning.org/Benchmarks/)
Repeatability: is the result universally valid?
– We changed the inputs (streams and queries) and the OS/JVM to verify whether the hypothesis holds
– We repeated the experiment with different implementations (C-SPARQL, CQELS, etc.)
Something more on repeatability…
We made some assumptions on the setting
[Figure: extensions of the single-window setting (S2R → R2R → R2S): from single to multi window, from single to multi stream, reasoning, static knowledge, multiple queries]
A more complex problem…
As a “side effect” of the first experiment, we discovered that the results of different systems are not the same:
C-SPARQL
Execution   1st answer   2nd answer
1           :hall [6]    :kitchen [11]
2           :hall [5]    :kitchen [10]
3           :hall [6]    :kitchen [11]
4           - [7]        - [12]

CQELS
Execution   1st answer   2nd answer
1           :hall [3]    :kitchen [9]
2           No answers
3           :hall [3]    :kitchen [9]
4           No answers
Intuition: t0 is not the only parameter our model lacks
The SECRET framework
[Figure: the time-based sliding window W(ω,β), with width ω and slide β, over the stream S (elements S1, …, S12) along the time axis t, feeding the R2R operator]
t0: When does the window start? (internal window parameter)
TICK: When are data stream elements added to the window?
– Triple-based vs graph-based
REPORT: When is the window content made available to the R2R operator?
– Non-empty content, Content-change, Window-close, Periodic
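The REPORT dimension can be illustrated by replaying the example stream under two of the four policies. This is a minimal sketch, not the behaviour of any specific engine: it assumes the timestamps 1, 3, 6 and 9 for the four stream elements, a tumbling window W(ω=β=5) whose windows are aligned on t0, evaluation at each window close for window-close, and evaluation at each element arrival for content-change (in a tumbling window every arrival changes the content). The simplified triple layout and all function names are hypothetical; Non-empty content and Periodic are not modelled.

```python
from typing import List, Optional, Tuple

Element = Tuple[int, str, str]  # hypothetical simplified ":isIn" triple: (timestamp, person, room)

# The example stream; the timestamps are an assumption consistent with the slides' answers.
STREAM: List[Element] = [(1, ":alice", ":hall"), (3, ":bob", ":hall"),
                         (6, ":alice", ":kitchen"), (9, ":bob", ":kitchen")]

def together(content: List[Element]) -> Optional[str]:
    """R2R step of the example query: a room where both :alice and :bob are."""
    alice = {room for _, who, room in content if who == ":alice"}
    bob = {room for _, who, room in content if who == ":bob"}
    common = alice & bob
    return min(common) if common else None

def window_start(t: int, t0: int, width: int) -> int:
    """Opening instant of the tumbling window [start, start + width) containing t."""
    return t0 + ((t - t0) // width) * width

def report_window_close(stream: List[Element], t0: int, width: int,
                        horizon: int) -> List[Tuple[Optional[str], int]]:
    """REPORT = window-close: evaluate once per window, when it closes.
    Empty answers (None) are kept."""
    answers = []
    for end in range(t0 + width, horizon + 1, width):
        content = [e for e in stream if end - width <= e[0] < end]
        answers.append((together(content), end))
    return answers

def report_content_change(stream: List[Element], t0: int,
                          width: int) -> List[Tuple[str, int]]:
    """REPORT = content-change: evaluate each time an element enters the window.
    Windows stay aligned on t0, so an element before t0 falls into an earlier
    window; only non-empty answers are collected."""
    answers = []
    for t, _, _ in sorted(stream):
        start = window_start(t, t0, width)
        content = [e for e in stream if start <= e[0] <= t]
        answer = together(content)
        if answer is not None:
            answers.append((answer, t))
    return answers
```

Under these assumptions, window-close with t0=1 yields :hall at 6 and :kitchen at 11, as in C-SPARQL executions 1 and 3, while content-change yields :hall at 3 and :kitchen at 9 for t0=0 or t0=1 and no answers for t0=2, in line with the CQELS table. Empty answers are kept for window-close (the "-" entries) but dropped for content-change, a choice made here only to mirror the two tables.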
SECRET and RSPs
HP2: given an input stream, a query, the value of t0 and the description of the RSP w.r.t. SECRET, we can determine the answer that the system will provide
To investigate it, we built a tool that computes the expected answer in batch and matches it against the answer provided by the RSP
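The matching step behind HP2 can be sketched as a comparison between the observed RSP output and the batch-computed answers for each candidate t0. The expected answers below are copied from the t0 table of the first experiment (with "-" written as None); the function matching_t0 and its data layout are hypothetical and do not describe the interface of the CSR-bench oracle.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Answer = Tuple[Optional[str], int]  # (answer, reporting instant); None = empty answer

def matching_t0(observed: Sequence[Answer],
                expected_by_t0: Dict[int, Sequence[Answer]]) -> List[int]:
    """Return the t0 values whose batch-computed answers equal the observed RSP output.

    An empty result means that no t0 explains the observation: either the SECRET
    description of the system is wrong or the implementation has a bug.
    """
    return [t0 for t0, expected in sorted(expected_by_t0.items())
            if list(expected) == list(observed)]

# Expected answers per t0, taken from the t0 table of the first experiment.
EXPECTED: Dict[int, List[Answer]] = {
    0: [(":hall", 5), (":kitchen", 10)],
    1: [(":hall", 6), (":kitchen", 11)],
    2: [(None, 7), (None, 12)],
}

print(matching_t0([(":hall", 6), (":kitchen", 11)], EXPECTED))  # [1]
print(matching_t0([(":hall", 4), (":kitchen", 10)], EXPECTED))  # []
```

An empty list flags an observation that no t0 explains; under HP2, such cases point to the kind of implementation errors listed in the analysis.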
Observation and analysis
We prepared a set of seven queries (to stress different parts of the sliding window)
We ran each query multiple times
Most of the time, we can foresee the answer that the system will provide
[Figure: outcomes of queries Q1–Q7 on CQELS, C-SPARQL and SPARQLstream]
Observation and analysis
We investigated the observations without a match and discovered that they were due to errors in the implementations, such as:
– Initialization
– Slide parameter
– Window contents
– Internal timestamp management
Conclusion: HP2 seems to hold in the considered setting
CSR-bench
The main outcome of our experience is CSR-bench, an extension of the SRBench benchmark
– More info at http://www.w3.org/wiki/CSRBench
Three main components:
– A common model for the operational semantics of RDF stream processors
– An oracle (an automatic correctness validator), available at https://github.com/dellaglio/csrbench-oracle
– A test suite
References
Dell'Aglio, D., Balduini, M., Della Valle, E.: On the need to include functional testing in RDF stream engine benchmarks. In: 1st International Workshop on Benchmarking RDF Systems (BeRSys 2013)
Dell'Aglio, D., Calbimonte, J.P., Balduini, M., Corcho, Ó., Della Valle, E.: On Correctness in RDF Stream Processor Benchmarking. In: ISWC (2) (2013) 326–342
Barbieri, D.F., Braga, D., Ceri, S., Della Valle, E., Grossniklaus, M.: C-SPARQL: A continuous query language for RDF data streams. IJSC 4(1) (2010) 3–25
Calbimonte, J.P., Jeung, H., Corcho, O., Aberer, K.: Enabling Query Technologies for the Semantic Sensor Web. IJSWIS 8(1) (2012) 43–63
Le-Phuoc, D., Dao-Tran, M., Xavier Parreira, J., Hauswirth, M.: A native and adaptive approach for unified processing of linked streams and linked data. In: ISWC (2011) 370–388
Thank you! Questions?
An Experience on Empirical Research about RDF Stream Processing
Daniele Dell’Aglio
(DEIB, Politecnico di Milano)