WP3: Provenance and Access Control
Irini Fundulaki, Giorgos Flouris
Institute of Computer Science, FORTH
1st year review
Luxembourg, December 2011
WP3: Work Plan View
[Gantt chart: timeline in project months, 0 to 48; partners shown: FORTH, EPFL]
Task 3.1 Provenance Management
Task 3.2 Privacy, DRM and Access Control
Task 3.3 Trust management
D 3.1 Access control specification language, reasoning and enforcement mechanisms
D 3.2 Provenance management and propagation through SPARQL query and update languages
D 3.3 Access control system and privacy-aware language
D 3.4 Trust management and inference system
Research Topics, Tasks and Partners
Objective: manage annotations of different forms and semantics over data, related to data access
Research Topics: Provenance, Access Control, Privacy, Digital Rights Management (DRM), Trust Management
Partners: FORTH, EPFL, KIT
Provenance
• Wikipedia: “… the origin or source of something or the history of the ownership or location of an object”
• W3C Incubator Group: “… is a record that describes entities and processes involved in producing and delivering or otherwise influencing that resource. […] Provenance assertions are a form of contextual metadata and can themselves become important records with their own provenance.”
Provenance
• W3C Incubator Group: “With the arrival of massive amounts of Semantic Web Data […], provenance becomes an important factor in developing new Semantic Web applications.”
• Applications
– Data Trustworthiness, Reputation and Reliability
– Information Quality
– Data Integration and Exchange
– Reproducibility
– Argumentation (Decision Justification)
– Access Control
– Accountability
– Reasoning
Types of Provenance
• Coarse-grained provenance is used to reproduce a digital object or repeat an experiment (complex programs)
• Fine-grained provenance refers to the transport of annotations between input and output data (query languages)
[Diagram: a program P takes input I to output O; inside it, P′ takes I′ to O′]
I, P, O: coarse grained (workflow or dataflow provenance)
I′, P′, O′: fine grained (data provenance)
Workflow Provenance: Sensor Scenario
[Diagram: sensors S1 and S2 each supply sea temperature and wind readings to a complex computation that predicts the height of waves]
Provenance: the complex program executed on the input data
Data Provenance: Sensor Scenario
Provenance: annotations of the input tuples that contributed to the query results

R1 (sensor readings):
Sensor  Time   Readgs  Annot.
S1      00:19  8B      t1
S2      01:50  2B      t2

R2 (sensor database):
Sensor  Latitude      Annot.
S1      23° 26’ 21”N  t3
S2      23° 26’ 21”N  t4

R1 ⋈ R2 (computed at the DB server):
Sensor  Time   Readgs  Latitude      Annot.
S1      00:19  8B      23° 26’ 21”N  {t1,t3}
S2      01:50  2B      23° 26’ 21”N  {t2,t4}
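The join in the sensor scenario can be sketched in a few lines of Python. This is only an illustration of the idea, not the project's implementation: the tuple layout and the function name `join_with_provenance` are assumptions, and the annotation of a result tuple is taken to be the union of the annotations of the input tuples that produced it.

```python
# Each relation is a list of (row, annotation set) pairs (assumed layout).
R1 = [({"Sensor": "S1", "Time": "00:19", "Readgs": "8B"}, {"t1"}),
      ({"Sensor": "S2", "Time": "01:50", "Readgs": "2B"}, {"t2"})]
R2 = [({"Sensor": "S1", "Latitude": "23° 26' 21\"N"}, {"t3"}),
      ({"Sensor": "S2", "Latitude": "23° 26' 21\"N"}, {"t4"})]

def join_with_provenance(left, right, key):
    """Join on `key`; a result tuple is annotated with the union
    of the annotations of the two tuples that contributed to it."""
    out = []
    for lrow, lann in left:
        for rrow, rann in right:
            if lrow[key] == rrow[key]:
                out.append(({**lrow, **rrow}, lann | rann))
    return out

result = join_with_provenance(R1, R2, "Sensor")
for row, ann in result:
    print(row["Sensor"], sorted(ann))
```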
Data Provenance Models
Annotation Models: provenance computation is coupled with a particular application and a particular assignment of the provenance of source data

R1:
Sensor  Time   Readgs  Annot.
S1      00:19  8B      1
S2      01:50  2B      0

R2:
Sensor  Latitude      Annot.
S1      23° 26’ 21”N  1
S2      23° 26’ 21”N  1

R1 ⋈ R2:
Sensor  Time   Readgs  Latitude      Annot.
S1      00:19  8B      23° 26’ 21”N  1
S2      01:50  2B      23° 26’ 21”N  0

The annotation of a join tuple is computed using the operator ×: 0 × 0 = 0, 1 × 0 = 0, 1 × 1 = 1
When the annotation of an input tuple changes, we must re-execute the query to obtain the annotations of the result tuples
Data Provenance Models
Abstract Models: provenance annotations (referred to as tokens) and operators are abstract.

R1:
Sensor  Time   Readgs  Annot.
S1      00:19  8B      T1
S2      01:50  2B      T2

R2:
Sensor  Latitude      Annot.
S1      23° 26’ 21”N  T3
S2      23° 26’ 21”N  T4

R1 ⋈ R2:
Sensor  Time   Readgs  Latitude      Annot.
S1      00:19  8B      23° 26’ 21”N  T1 x T3
S2      01:50  2B      23° 26’ 21”N  T2 x T4

The annotation of a join tuple is modeled by the “x” operator
When the annotation of an input tuple changes, the annotation of the result tuple is re-computed by evaluating the annotation expression only
Data Provenance Models
Abstract Models: abstract tokens and operators are assigned concrete values only when the concrete value of an annotation must be computed

R1:
Sensor  Time   Readgs  Annot.
S1      00:19  8B      T1
S2      01:50  2B      T2

R2:
Sensor  Latitude      Annot.
S1      23° 26’ 21”N  T3
S2      23° 26’ 21”N  T4

R1 ⋈ R2:
Sensor  Time   Readgs  Latitude      Annot.
S1      00:19  8B      23° 26’ 21”N  T1 x T3
S2      01:50  2B      23° 26’ 21”N  T2 x T4

Data Quality Application:
• abstract tokens T1, T2, T3, T4 take values 1 and 0
• abstract operator “x” is replaced by logical AND
[Figure overlay: the tokens and the join annotations are shown with their concrete 0/1 values]
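A minimal sketch of the abstract model follows, assuming a tiny expression tree with the hypothetical classes `Token` and `Times`. The 0/1 values are illustrative (they mirror the annotation-model example, where the S2 reading is annotated 0), and logical AND plays the role of “x” as in the data quality application. When a token changes, only the stored expression is re-evaluated; the join itself is not re-run.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Token:
    name: str

@dataclass(frozen=True)
class Times:          # the abstract "x" operator
    left: "Expr"
    right: "Expr"

Expr = Union[Token, Times]

def evaluate(expr, values, times):
    """Evaluate an abstract expression under a concrete assignment."""
    if isinstance(expr, Token):
        return values[expr.name]
    return times(evaluate(expr.left, values, times),
                 evaluate(expr.right, values, times))

# Annotations of the join result, kept in abstract form.
join_annotations = {"S1": Times(Token("T1"), Token("T3")),
                    "S2": Times(Token("T2"), Token("T4"))}

def logical_and(a, b):
    return a & b

values = {"T1": 1, "T2": 0, "T3": 1, "T4": 1}   # illustrative assignment
quality_before = {s: evaluate(e, values, logical_and)
                  for s, e in join_annotations.items()}

values["T2"] = 1   # provenance update: re-evaluate the expressions only
quality_after = {s: evaluate(e, values, logical_and)
                 for s, e in join_annotations.items()}
print(quality_before, quality_after)
```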
Abstract Data Provenance Models
• Benefits:
– in the presence of provenance updates in the input, we need to evaluate the value of the provenance of the affected tuples only
– different applications can assign different concrete values to abstract tokens and operators, for the same data
• Challenges: trade-off between provenance storage and computation efficiency
– storage of large provenance expressions
– efficient computation of provenance for dynamic data
Data Provenance
• RDFS reasoning
– Given a set of RDF triples whose explicit provenance is known, and RDFS reasoning rules, what is the provenance of the implicit RDF triples?
• SPARQL
– Given a set of RDF triples whose explicit provenance is known, and a SPARQL query, what is the provenance of the query result?
RDFS Reasoning
Given a set of RDF triples (RDF Graph) whose explicit provenance is known, and RDFS entailment rules, what is the provenance of the implicit RDF triples?

RDFS entailment rules:
(A1, sc, A2), (A2, sc, A3) ⇒ (A1, sc, A3)
(&r, type, A1), (A1, sc, A2) ⇒ (&r, type, A2)

[Figure: fragment of the SSN Ontology with the classes System, Device, Sensing Device and Sensor and the instance &s1. Explicit triples carry the colors C1 to C4; the implicit triples are marked “?”. Legend: type, sc (subclassOf).]
RDFS Reasoning
• colors to capture the provenance of explicit and implicit data and schema RDF triples
• quadruples to represent provenance information
• Provenance model: commutative semi-group structure (C, +)
–C: set of colors,
–binary operation “+” to compose colors of the input triples
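As a rough illustration of the color model, the sketch below closes a small set of quadruples under the subclass and type rules and composes the colors of the input triples with “+”. Two things are assumptions made for the example: “+” is instantiated as union of color sets (one commutative and associative choice among many), and the graph uses the generic names A1, A2, A3 and &r rather than the ontology from the figure.

```python
def plus(c1, c2):
    """One possible '+': union of color sets (commutative, associative)."""
    return c1 | c2

def rdfs_closure(quads):
    """Quads are (subject, predicate, object, color), color a frozenset.
    Applies (X, sc|type, Y), (Y, sc, Z) => (X, sc|type, Z) to a fixpoint."""
    closure = set(quads)
    changed = True
    while changed:
        changed = False
        for (s1, p1, o1, c1) in list(closure):
            for (s2, p2, o2, c2) in list(closure):
                if p1 in ("sc", "type") and p2 == "sc" and o1 == s2:
                    new = (s1, p1, o2, plus(c1, c2))
                    if new not in closure:
                        closure.add(new)
                        changed = True
    return closure

explicit = {
    ("A1", "sc", "A2", frozenset({"C1"})),
    ("A2", "sc", "A3", frozenset({"C2"})),
    ("&r", "type", "A1", frozenset({"C3"})),
}
closure = rdfs_closure(explicit)
for quad in sorted(closure, key=repr):
    print(quad[:3], sorted(quad[3]))
```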
RDFS Reasoning
• Pediaditis P., Flouris G., Fundulaki I., Christophides V. On Explicit Provenance Management in RDF/S Graphs. In Theory and Practice of Provenance (TaPP-2009)
• Flouris G., Fundulaki I., Pediaditis P., Theoharis Y., Christophides V. Coloring RDF Triples to capture Provenance. In ISWC 2009.
Provenance for SPARQL
• We showed that existing provenance models for positive relational algebra can capture the provenance of SPARQL (without OPTIONAL)
• We follow the approach by Karvounarakis et al. in Provenance Semirings, PODS 2007 to develop a model for full SPARQL
– records the input tuples and the operators used to compute the query results
Given a set of RDF triples (RDF Graph) whose explicit provenance is known, and a SPARQL query, what is the provenance of the result?
Provenance Model for SPARQL+
• K: set of provenance tokens
• a binary operator for SPARQL join
• a binary operator for SPARQL union

subject  predicate  object        prov
S1       type       Sensor        t1
S1       Readgs     &r1           t5
S1       Latitude   23° 26’ 21”N  t2
S2       type       Sensor        t3
S2       Readgs     &r2           t6
S2       Latitude   23° 26’ 21”N  t4
&r1      value      8B            t7
&r1      time       00:19         t8
&r2      value      2B            t9
&r2      time       01:50         t10

SPARQL Query: return the sensor and its latitude
select ?s ?l where { ?s type Sensor . ?s Latitude ?l }
Provenance Model for SPARQL+
Q = ?s type Sensor . ?s Latitude ?l
The evaluation of a triple pattern over T is a set of mappings (?variable, ?value)

?s type Sensor:
mapping  ?s  prov
1        S1  t1
2        S2  t3

?s Latitude ?l:
mapping  ?s  ?l            prov
3        S1  23° 26’ 21”N  t2
4        S2  23° 26’ 21”N  t4
Provenance Model for SPARQL+
Q = ?s type Sensor . ?s Latitude ?l
The result of a join between two triple patterns contains all mappings that have the same value for their common variable(s)

Join of the two triple patterns:
?s  ?l            prov
S1  23° 26’ 21”N  join of t1 and t2
S2  23° 26’ 21”N  join of t3 and t4
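The evaluation above can be sketched as follows. This is a simplified illustration, not a SPARQL engine: the helper names `match` and `join` are hypothetical, only the four triples relevant to the query are loaded, and the provenance of a joined mapping is recorded as the tuple `("join", p1, p2)` in place of the model's join operator.

```python
# Annotated triples (subject, predicate, object, provenance token).
TRIPLES = [
    ("S1", "type", "Sensor", "t1"),
    ("S1", "Latitude", "23° 26' 21\"N", "t2"),
    ("S2", "type", "Sensor", "t3"),
    ("S2", "Latitude", "23° 26' 21\"N", "t4"),
]

def match(pattern, triples):
    """Evaluate one triple pattern; returns (mapping, provenance) pairs."""
    results = []
    for *triple, prov in triples:
        mapping = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # bind the variable, or check an earlier binding
                if mapping.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append((mapping, prov))
    return results

def join(left, right):
    """Keep pairs of mappings that agree on their common variables."""
    out = []
    for m1, p1 in left:
        for m2, p2 in right:
            if all(m1[v] == m2[v] for v in m1.keys() & m2.keys()):
                out.append(({**m1, **m2}, ("join", p1, p2)))
    return out

answers = join(match(("?s", "type", "Sensor"), TRIPLES),
               match(("?s", "Latitude", "?l"), TRIPLES))
for mapping, prov in answers:
    print(mapping["?s"], mapping["?l"], prov)
```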
Provenance for SPARQL
• Theoharis Y., Fundulaki I., Karvounarakis G., Christophides V. On Provenance of Queries on Linked Web Data. In IEEE Internet Computing: Provenance in Web Applications, 2011.
Access Control
• Refers to the ability to permit or deny the use of a particular resource by a particular entity
• Crucial for sensitive content since it ensures the selective exposure of information to different classes of users
RDF Access Control
• In general, an access control model specifies
– the access annotations
– conflict resolution policy to resolve ambiguous access annotations
– default semantics used to annotate data that are not in the scope of any authorization
• Access Authorizations specify (by a query) the access annotations for data
Access Control
• Access Annotations can be
– boolean values: true/false (grant/deny access permission)
– confidentiality levels: low, medium, high
• Conflict Resolution Policy depends on the type of access annotations
– boolean values: deny overrides grant access annotation
– confidentiality levels: high confidentiality overrides medium, medium overrides low
• Default Semantics depend on the type of access annotations
Fine-grained Access Control Framework for RDF Data
• We encode access annotations of RDF triples using quadruples
• We propose an abstract access control model defined by a set of abstract tokens and abstract operators to model
– the computation of access annotations of RDF triples considering RDFS inference
– the propagation of access annotations
– conflicting and missing access annotations
Abstract Tokens
• L: set of abstract access control tokens
• a default access token in L
– assigned to triples that have no explicitly assigned access token
Abstract Operators
• Entailment Operator ⊙ to compute the access annotations of implied quadruples
• Propagation Operator to model the propagation of access annotations
• Conflict Resolution Operator to resolve ambiguous access annotations
Entailment Operator ⊙
• binary operator to model the computation of the annotation of an implicit RDF quadruple for the subclass, subproperty and type hierarchies in an RDF graph
(A1, sc, A2, l1), (A2, sc, A3, l2) ⇒ (A1, sc, A3, l1 ⊙ l2)
– Properties:
• Associativity: (l4 ⊙ l1) ⊙ l2 = l4 ⊙ (l1 ⊙ l2)
• Commutativity: l4 ⊙ l1 = l1 ⊙ l4
The order of the application of inference rules is not important
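One way to see why the order of rule application does not matter: because ⊙ is associative and commutative, an abstract ⊙-expression is fully described by the multiset of tokens it contains. The sketch below uses that representation. It is an illustrative encoding chosen for this example, and the helper names `token` and `entail` are hypothetical.

```python
from collections import Counter

def token(name):
    """An abstract token, represented as a one-element multiset."""
    return Counter([name])

def entail(a, b):
    """Abstract ⊙: multiset sum. Duplicates are kept, since the
    operator is not stated to be idempotent."""
    return a + b

l1, l2, l4 = token("l1"), token("l2"), token("l4")

left_first = entail(entail(l4, l1), l2)    # (l4 ⊙ l1) ⊙ l2
right_first = entail(l4, entail(l1, l2))   # l4 ⊙ (l1 ⊙ l2)
print(left_first == right_first)
print(entail(l4, l1) == entail(l1, l4))
```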
Entailment Operator ⊙
(A1, sc, A2, l1), (A2, sc, A3, l2) ⇒ (A1, sc, A3, l1 ⊙ l2)
(&r, type, A2, l1), (A2, sc, A3, l2) ⇒ (&r, type, A3, l1 ⊙ l2)
[Figure: the ontology fragment (System, Device, Sensing Device, Sensor, the instance &s1 and rdfs:Class) before and after inference. Explicit quadruples carry the tokens l0 to l4; the implied quadruples carry l4 ⊙ l1, l1 ⊙ l2 and (l4 ⊙ l1) ⊙ l2. Legend: type, sc (subclassOf).]
Propagation Operator
• unary operator, written prop(l) here, to model propagation of access annotations along the subclass/subproperty and type hierarchies in an RDF Graph
– a class inherits the annotation of its superclass, an instance of a class inherits the annotation of its class, etc.
(A1, type, class, l1), (&r1, type, A1, l2) ⇒ (&r1, type, A1, prop(l1))
– Properties:
• Idempotence: prop(prop(l0)) = prop(l0)
We do not care how many times an annotation is propagated
Propagation Operator
[Figure: the ontology fragment (System, Device, Sensing Device, Sensor, the instance &s1 and rdfs:Class) with the tokens l0 to l4, before and after propagation. Legend: type, sc (subclassOf).]
(&r, type, A1, l1), (A1, type, rdfs:Class, l2) ⇒ (&r, type, A1, prop(l2)), where prop is the propagation operator
Conflict Resolution Operator
• binary operator, written res(l1, l2) here, to resolve ambiguous access labels
(A1, sc, A2, l1), (A1, sc, A2, l2) ⇒ (A1, sc, A2, res(l1, l2))
– Properties:
• Associativity: res(res(l0, l1), l2) = res(l0, res(l1, l2))
• Commutativity: res(l0, l1) = res(l1, l0)
• Idempotence: res(l1, l1) = l1
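As a quick check that a concrete policy can satisfy these three properties, the sketch below instantiates conflict resolution with the confidentiality levels mentioned earlier (high overrides medium, medium overrides low) and verifies each property by brute force. The function name `resolve` and the numeric ranking are assumptions made for the example.

```python
from itertools import product

# Illustrative ranking: a higher number overrides a lower one.
LEVELS = {"low": 0, "medium": 1, "high": 2}

def resolve(a, b):
    """Concrete conflict resolution: the more confidential level wins."""
    return a if LEVELS[a] >= LEVELS[b] else b

names = list(LEVELS)
associative = all(resolve(resolve(a, b), c) == resolve(a, resolve(b, c))
                  for a, b, c in product(names, repeat=3))
commutative = all(resolve(a, b) == resolve(b, a)
                  for a, b in product(names, repeat=2))
idempotent = all(resolve(a, a) == a for a in names)
print(associative, commutative, idempotent)
```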
Computing Abstract Access Control Annotations
• assign access annotations to triples of the RDF graph to obtain quadruples
• apply RDFS inference rules on quadruples to obtain the implicit annotated quadruples
• apply propagation rules on quadruples to compute their propagated annotations
• apply the conflict resolution operator to resolve ambiguities
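The four steps can be sketched as a small pipeline over quadruples whose fourth component is an abstract expression. Everything here is a simplified assumption made for illustration: expressions are nested tuples tagged `"ent"`, `"prop"` and `"res"`, the default token is named `l_default`, only the instance-of-class propagation rule is implemented, and the subclass hierarchy is assumed to be acyclic.

```python
from functools import reduce

DEFAULT = "l_default"  # hypothetical name for the default token

def annotate(triples, labels):
    """Step 1: triples become quadruples; unlabeled ones get DEFAULT."""
    return {(s, p, o, labels.get((s, p, o), DEFAULT)) for (s, p, o) in triples}

def entail_closure(quads):
    """Step 2: apply the sc/type inference rules until nothing new appears.
    Assumes an acyclic sc hierarchy."""
    closure = set(quads)
    while True:
        new = {
            (s1, p1, o2, ("ent", e1, e2))
            for (s1, p1, o1, e1) in closure
            for (s2, p2, o2, e2) in closure
            if p1 in ("sc", "type") and p2 == "sc" and o1 == s2
        } - closure
        if not new:
            return closure
        closure |= new

def propagate(quads):
    """Step 3: an instance triple also receives the propagated label
    of its class's (class, type, rdfs:Class) quadruple."""
    extra = {
        (s, p, o, ("prop", e2))
        for (s, p, o, _) in quads if p == "type"
        for (s2, p2, o2, e2) in quads
        if s2 == o and p2 == "type" and o2 == "rdfs:Class"
    }
    return set(quads) | extra

def resolve_conflicts(quads):
    """Step 4: fold all annotations of the same triple with 'res'."""
    by_triple = {}
    for s, p, o, e in quads:
        by_triple.setdefault((s, p, o), []).append(e)
    return {t: reduce(lambda a, b: ("res", a, b), sorted(es, key=repr))
            for t, es in by_triple.items()}

triples = [("&r", "type", "A1"), ("A1", "type", "rdfs:Class"), ("A1", "sc", "A2")]
labels = {("&r", "type", "A1"): "l1", ("A1", "type", "rdfs:Class"): "l2"}
annotated = resolve_conflicts(propagate(entail_closure(annotate(triples, labels))))
for triple, expr in sorted(annotated.items()):
    print(triple, expr)
```

The result stays abstract: each triple maps to an expression over tokens, which a concrete policy can evaluate later.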
Computing Abstract Access Control Annotations (example)
[Figure: the ontology fragment (System, Device, Sensing Device, Sensor and rdfs:Class) with the explicit tokens l0, l1, l2, l3 and l5, before and after the computation.]
The annotation computed for (SensingDevice, type, rdfs:Class) combines the entailed annotations (l1 ⊙ l2) ⊙ l0 and (l5 ⊙ l3) ⊙ l0 with l0.
Concrete Policies
• A concrete policy assigns concrete values to the abstract tokens and operators
• Example
– Boolean values assigned to abstract tokens
• false: deny access
• true: grant access
– Conjunction assigned to entailment operator
• an implied triple is accessible iff all its implying triples have been granted access
– Disjunction assigned to Conflict Resolution operator
• grant overrides deny annotation
– Identity assigned to propagation operator
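A concrete policy of this kind can be sketched as an evaluator over abstract expressions. The nested-tuple encoding (`"ent"`, `"res"`, `"prop"`), the token names and their boolean values are assumptions made for the example; the operator assignment follows the policy listed above (conjunction, disjunction, identity).

```python
def evaluate(expr, tokens, ops):
    """Evaluate an abstract expression under a concrete policy."""
    if isinstance(expr, str):          # an abstract token
        return tokens[expr]
    op, *args = expr                   # an operator applied to sub-expressions
    return ops[op](*(evaluate(a, tokens, ops) for a in args))

# Concrete operators of the example policy.
POLICY = {
    "ent": lambda a, b: a and b,   # entailment: conjunction
    "res": lambda a, b: a or b,    # conflict resolution: grant overrides deny
    "prop": lambda a: a,           # propagation: identity
}

# Illustrative assignment: False denies access, True grants it.
tokens = {"l1": False, "l2": True, "l3": True}

implied = ("ent", "l1", "l3")
ambiguous = ("res", implied, ("prop", "l2"))
print(evaluate(implied, tokens, POLICY))
print(evaluate(ambiguous, tokens, POLICY))
```

Swapping `POLICY` or `tokens` changes the outcome without recomputing the abstract expressions, which is the benefit claimed for abstract models.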
Concrete Policy (example)
Abstract annotation of (SensingDevice, type, rdfs:Class): built from (l1 ⊙ l2) ⊙ l0, (l5 ⊙ l3) ⊙ l0 and l0

Assignment of abstract tokens to values: each of l0, l1, l2, l3 and l5 is assigned false (F) or true (T)

Assignment of abstract operators to concrete ones:
abstract operator    concrete operator
propagation          negation (¬)
entailment (⊙)       conjunction
conflict resolution  disjunction

Result: (SensingDevice, type, rdfs:Class, T)
References
Flouris G., Fundulaki I., Michou M., Papakonstantinou V., Antoniou G. Access Control for RDFS Graphs Using Abstract Models. Ongoing work.
Computing Abstract Access Control Expressions (example)
[Figure: the ontology fragment (System, Device, Sensing Device, Sensor and rdfs:Class) with the explicit tokens l0, l1, l2, l3 and l5, before and after the computation. The implied quadruples carry l1 ⊙ l2, l5 ⊙ l3, (l1 ⊙ l2) ⊙ l0 and (l5 ⊙ l3) ⊙ l0.]

Inference rules:
(A1, sc, A2, l1), (A2, sc, A3, l2) ⇒ (A1, sc, A3, l1 ⊙ l2)
(&r, type, A2, l1), (A2, sc, A3, l2) ⇒ (&r, type, A3, l1 ⊙ l2)

Propagation rule:
(&r, type, A1, l1), (A1, type, rdfs:Class, l2) ⇒ (&r, type, A1, prop(l2)), where prop is the propagation operator

The annotation computed for (SensingDevice, type, rdfs:Class) combines (l1 ⊙ l2) ⊙ l0 and (l5 ⊙ l3) ⊙ l0 with l0.