forth research activities planetdata wp1-3 meeting (frankfurt, nov10) giorgos flouris, irini...
Post on 29-Dec-2015
FORTH Research Activities
PlanetData WP1-3 Meeting (Frankfurt, Nov10)
Giorgos Flouris, Irini Fundulaki – FORTH
Slide 2 of 40 (November 25, 2010, Frankfurt, Germany)
FORTH in PD Research
WP2: Quality Assessment and Context
◦ T2.1: Data Quality Assessment and Repair (FUB, M1-M42)
WP3: Provenance and Access Policies
◦ T3.1: Provenance Management (FORTH, M1-M36)
◦ T3.2: Privacy, DRM, and Access Control (FORTH, M1-M42)
Slide 3 of 40
Presentation Outline
Three main topics/tasks
◦ Repair (T2.1)
◦ Provenance (T3.1)
◦ Access control, privacy, DRM (T3.2)
Outline
◦ Summary and objectives
◦ Introduction and motivation
◦ Existing work and research plan
◦ Innovation
◦ Interactions within the project
Slide 4 of 40
PART I: Repairs
WP2: Quality Assessment and Context
◦ T2.1: Data Quality Assessment and Repair (FUB, M1-M42)
Objective of our work
◦ Study methodologies for repairing invalidities in a way that will cause minimal effects upon the data
Extra:
◦ Apply the same methodologies for updates
Slide 5 of 40
Repairs: Introduction
Validity rules to guarantee:
◦ Special semantics (e.g., acyclic subsumptions)
◦ Application-specific rules or requirements (e.g., functional properties)
Validity: important dimension of quality
◦ Violated at design time
◦ Violated during updates or other changes
◦ Violated when validity rules change
Solution: Repair
◦ Given an invalid graph, produce a valid one that is as close as possible to the original
Slide 6 of 40
Repairing Process
[Figure: Invalid graph → Repair → Valid graph]
Main Challenges:
1) Several potential repairs
2) Must find the “closest” ones
Major Questions:
1) How to determine potential repairs?
2) How is “distance” measured?
Assumptions:
1) RDF/S graphs
2) Rules expressed in DED form
Slide 7 of 40
Example
Validity Rules:
• properties should have a unique domain and range
• subject/object of a property instance should be correctly classified per the property’s domain/range
A rdf:type rdfs:Class
B rdf:type rdfs:Class
C rdf:type rdfs:Class
P rdf:type rdf:Property
P rdfs:range B
P rdfs:domain A
P rdfs:domain C
x P y
x rdf:type A
y rdf:type B
[Figure: the graph of the triples above, with classes A, B, C, property P and the instance x P y]
Slide 8 of 40
Example (Resolution #1)
Validity Rules:
• properties should have a unique domain and range
• subject/object of a property instance should be correctly classified per the property’s domain/range
A rdf:type rdfs:Class
B rdf:type rdfs:Class
C rdf:type rdfs:Class
P rdf:type rdf:Property
P rdfs:range B
P rdfs:domain A
P rdfs:domain C
x P y
x rdf:type A
y rdf:type B
Problem: two domains for the same property
Solution: delete one of the domains
[Figure: the graph of the triples above, with classes A, B, C, property P and the instance x P y]
Slide 9 of 40
Example (Resolution #2)
Validity Rules:
• properties should have a unique domain and range
• subject/object of a property instance should be correctly classified per the property’s domain/range
Solution #1
A rdf:type rdfs:Class
B rdf:type rdfs:Class
C rdf:type rdfs:Class
P rdf:type rdf:Property
P rdfs:range B
P rdfs:domain A
x P y
x rdf:type A
y rdf:type B
Solution #2
A rdf:type rdfs:Class
B rdf:type rdfs:Class
C rdf:type rdfs:Class
P rdf:type rdf:Property
P rdfs:range B
P rdfs:domain C
x P y
x rdf:type A
y rdf:type B
Problem: incorrect classification
Solution: change domain OR make x instance of C OR delete property instance [by the rule syntax]
[Figure: the graphs of Solution #1 and Solution #2, with classes A, B, C, property P and the instance x P y]
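The two validity rules of the example can be checked mechanically. The following is a minimal sketch, assuming triples are plain Python tuples of strings and ignoring RDFS subsumption; the function name `violations` and the labels such as `unique-domain` are illustrative and do not come from the slides.

```python
from collections import defaultdict

TYPE, DOMAIN, RANGE = "rdf:type", "rdfs:domain", "rdfs:range"

def violations(graph):
    """List violations of the two example rules in a set of (s, p, o) triples."""
    domains, ranges, types = defaultdict(set), defaultdict(set), defaultdict(set)
    for s, p, o in graph:
        if p == DOMAIN:
            domains[s].add(o)
        elif p == RANGE:
            ranges[s].add(o)
        elif p == TYPE:
            types[s].add(o)
    found = []
    # Rule 1: a property has at most one domain and at most one range.
    for name, declared in (("unique-domain", domains), ("unique-range", ranges)):
        for prop in sorted(declared):
            if len(declared[prop]) > 1:
                found.append((name, prop, tuple(sorted(declared[prop]))))
    # Rule 2: subject/object of a property instance is typed per domain/range.
    for s, p, o in sorted(graph):
        if p in (TYPE, DOMAIN, RANGE):
            continue
        for d in sorted(domains.get(p, ())):
            if d not in types.get(s, ()):
                found.append(("domain-typing", (s, p, o), d))
        for r in sorted(ranges.get(p, ())):
            if r not in types.get(o, ()):
                found.append(("range-typing", (s, p, o), r))
    return found

# Triples shared by the original graph and both candidate solutions.
COMMON = {
    ("A", TYPE, "rdfs:Class"), ("B", TYPE, "rdfs:Class"), ("C", TYPE, "rdfs:Class"),
    ("P", TYPE, "rdf:Property"), ("P", RANGE, "B"),
    ("x", "P", "y"), ("x", TYPE, "A"), ("y", TYPE, "B"),
}
ORIGINAL = COMMON | {("P", DOMAIN, "A"), ("P", DOMAIN, "C")}
SOLUTION_1 = COMMON | {("P", DOMAIN, "A")}
SOLUTION_2 = COMMON | {("P", DOMAIN, "C")}

for label, graph in (("original", ORIGINAL),
                     ("solution #1", SOLUTION_1),
                     ("solution #2", SOLUTION_2)):
    print(label, violations(graph))
```

Under these assumptions the original graph violates both rules, Solution #1 is valid, and Solution #2 still misclassifies x, which matches the follow-up problem stated on the slide.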
Slide 10 of 40
Potential Repairs: Challenges
Easy to determine how to resolve a single violated rule, but …
◦ Several violations
◦ Several repairing options per violation
Resolution interdependencies:
◦ Repairing one violation in a certain way may cause another violation
◦ Repairing one violation in a certain way may repair multiple violations
Need for an exhaustive, rule-based search to determine all potential repairs
◦ Tree-based search (recursive)
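A recursive, tree-based search over resolution options could look roughly like the sketch below. It is an illustration under several assumptions, not the authors' algorithm: only the domain half of the two example rules is hard-coded, "change domain" is modelled as removing the offending domain triple, and the names `resolution_options` and `potential_repairs` are hypothetical.

```python
TYPE, DOMAIN, RANGE = "rdf:type", "rdfs:domain", "rdfs:range"

def resolution_options(graph):
    """Return the options for resolving one violated rule, or None if valid.

    An option is ("-", triple) for a removal or ("+", triple) for an addition.
    Only the domain half of the two example rules is covered (assumption).
    """
    domains, typed = {}, set()
    for s, p, o in sorted(graph):
        if p == DOMAIN:
            domains.setdefault(s, []).append(o)
        elif p == TYPE:
            typed.add((s, o))
    for prop, declared in domains.items():
        if len(declared) > 1:  # unique-domain rule violated
            return [("-", (prop, DOMAIN, d)) for d in declared]
    for s, p, o in sorted(graph):
        if p in (TYPE, DOMAIN, RANGE):
            continue
        for d in domains.get(p, []):
            if (s, d) not in typed:  # subject not classified under the domain
                return [("-", (p, DOMAIN, d)), ("+", (s, TYPE, d)), ("-", (s, p, o))]
    return None

def potential_repairs(graph, applied=()):
    """Recursively explore the resolution tree; each leaf is one repair delta."""
    options = resolution_options(graph)
    if options is None:
        return {frozenset(applied)}
    found = set()
    for sign, triple in options:
        undo = ("-" if sign == "+" else "+", triple)
        if undo in applied:  # never revert a change made higher up the branch
            continue
        child = graph | {triple} if sign == "+" else graph - {triple}
        found |= potential_repairs(child, applied + ((sign, triple),))
    return found

GRAPH = frozenset({
    ("A", TYPE, "rdfs:Class"), ("B", TYPE, "rdfs:Class"), ("C", TYPE, "rdfs:Class"),
    ("P", TYPE, "rdf:Property"), ("P", RANGE, "B"),
    ("P", DOMAIN, "A"), ("P", DOMAIN, "C"),
    ("x", "P", "y"), ("x", TYPE, "A"), ("y", TYPE, "B"),
})
REPAIRS = potential_repairs(GRAPH)
for delta in sorted(REPAIRS, key=lambda d: (len(d), sorted(d))):
    print(sorted(delta))
```

The branch that deletes domain A exposes the classification violation and fans out into three further options, which illustrates how resolving one violation can cause another. The search returns every leaf, including non-minimal ones; choosing among them is left to the preferences.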
Slide 11 of 40
Selecting a Potential Repair
Which repair should be returned?
◦ We want the repaired KB to be as close as possible to the original
User-defined notion of “distance”
◦ Specifications for selecting “preferred repairs”
◦ Based on user-defined preferences
Preferred repair depends on the context and application, for example:
◦ Under complete knowledge, prefer removals
◦ In an open setting, prefer additions
Slide 12 of 40
Determining Preferences
Provide specifications to determine the preferred repair
◦ Important features of a potential repair
  E.g.: additions, schema changes etc.
◦ Comparing the values of important features
  E.g.: minimize, “around” etc.
◦ Combine features (preferences)
  E.g.: prioritize, pareto-preference etc.
Flight analogy
◦ Minimize number of stops
◦ Minimize cost
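The difference between prioritizing features and combining them by Pareto preference can be shown with the flight analogy. This is a small sketch with invented flight data; the helper names `pareto` and `prioritized` are assumptions made for illustration and are not taken from any preference library.

```python
from operator import itemgetter

def pareto(items, features):
    """Keep the items that no other item dominates on the 'minimize' features."""
    def dominates(a, b):
        no_worse = all(f(a) <= f(b) for f in features)
        better = any(f(a) < f(b) for f in features)
        return no_worse and better
    return [x for x in items if not any(dominates(y, x) for y in items)]

def prioritized(items, features):
    """Minimize the first feature, break ties with the next one, and so on."""
    def key(x):
        return tuple(f(x) for f in features)
    best = min(key(x) for x in items)
    return [x for x in items if key(x) == best]

# Hypothetical flights, invented for this example.
FLIGHTS = [
    {"id": "F1", "stops": 0, "cost": 500},
    {"id": "F2", "stops": 1, "cost": 300},
    {"id": "F3", "stops": 1, "cost": 350},
    {"id": "F4", "stops": 2, "cost": 300},
]
stops, cost = itemgetter("stops"), itemgetter("cost")

print([f["id"] for f in pareto(FLIGHTS, [stops, cost])])
print([f["id"] for f in prioritized(FLIGHTS, [stops, cost])])
print([f["id"] for f in prioritized(FLIGHTS, [cost, stops])])
```

The same two functions apply to potential repairs once each repair is described by numeric features such as the number of additions or of schema changes.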
Slide 13 of 40
Repairs: Summary
Framework for repairing invalidities in RDF/S graphs
◦ Potential repairs determined using syntactical manipulations over the validity rules
◦ Preferred repairs determined using formal preferences
Research plan:
◦ Formal description of a repair framework
◦ Develop, optimize, experiment with, study repair algorithm
Slide 14 of 40
Innovation
Existing approaches:
◦ Built-in preferences
◦ Built-in validity rules
Our proposal is:
◦ Flexible: preferences can be set at run-time
◦ Adaptable: different rules and preferences
◦ Intuitive: easy-to-define preferences
◦ Very general: different repair policies from the literature can be expressed in our framework
◦ Easy to implement: we can use off-the-shelf implementations for preference evaluation
Slide 15 of 40
Interactions within PD
Only within WP2 and Task 2.1
Related deliverable: D2.2 (M18)
Slide 16 of 40
An Extra: Updates
Apply an update on an RDF/S graph, in the presence of validity rules
◦ Originally a part of WP1 – not any more
Similar ideas as with repair
◦ Apply the update (in a naïve manner)
◦ Repair the result
  Taking into account what the update was
Principles
◦ Success (update must be applied)
◦ Validity (result must be valid)
◦ Minimal change (minimal “distance” - preferences)
Slide 17 of 40
PART II: Provenance
WP3: Provenance and Access Policies
◦ T3.1: Provenance Management (FORTH, M1-M36)
Objectives of our work
◦ Provenance for RDF and RDFS (inference)
◦ Provenance for SPARQL query and update
◦ Efficient storage schemes for provenance
Slide 18 of 40
Provenance: Introduction
Provenance: information on the origin of data
◦ From where and how the piece of data was obtained
Allows/supports:
◦ Assessment of data trustworthiness and quality
◦ Reproducibility of experiments
◦ Justification of decisions (e.g., argumentation)
◦ Access control, privacy, DRM, trust
Focus on RDF/S
◦ Inspired by DB provenance and annotation models
Slide 19 of 40
Main Challenge
[Figure: RDF triples (a, b, c, d, e, g, h) carrying provenance tags A, B, C; RDFS inference rules yield f. Which provenance should f get?]
Provenance tag = colour
◦ A subset I of URIs distinguished from the set of class and property names or types
Slide 20 of 40
Annotation Models
Annotation models:
◦ Annotation computation coupled with a particular application and a particular assignment of source data annotations
R1:
X Y Annot
a b t
c d t
R2:
Y Z Annot
b e
R1 ⋈ R2:
X Y Z Annot
a b e
[Figure: source annotations switch between t and f; each change requires re-evaluating the query]
t: trusted, f: untrusted
Slide 21 of 40
Abstract Annotation Models
Abstract annotation models:
◦ Abstract provenance tokens and operators are substituted by appropriate concrete tokens for a particular application and assignment
R1:
X Y Annot
a b c1
c d c2
R2:
Y Z Annot
b e c3
R1 ⋈ R2:
X Y Z Annot
a b e c1 x c3
[Figure: substituting concrete trust values into c1 x c3: t ∧ t yields t, t ∧ f yields f]
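The benefit of abstract tokens is that the join result stores an expression such as c1 x c3 once, and only the substitution changes when trust assignments change. A minimal sketch, assuming expressions are nested tuples and the abstract operator x is interpreted as Boolean conjunction; `join` and `evaluate` are hypothetical helper names.

```python
def join(r1, r2):
    """Join R1(X, Y, annot) with R2(Y, Z, annot) on Y, keeping an abstract annotation."""
    return [(x, y, z, ("x", a1, a2))
            for (x, y, a1) in r1
            for (y2, z, a2) in r2
            if y == y2]

def evaluate(expr, assignment, times):
    """Substitute concrete values for the tokens and apply the concrete operator."""
    if isinstance(expr, tuple):
        _, left, right = expr
        return times(evaluate(left, assignment, times),
                     evaluate(right, assignment, times))
    return assignment[expr]

R1 = [("a", "b", "c1"), ("c", "d", "c2")]
R2 = [("b", "e", "c3")]
RESULT = join(R1, R2)  # computed once, independently of any trust assignment

def conj(p, q):
    """Concrete reading of 'x' for the trusted/untrusted application."""
    return p and q

ALL_TRUSTED = {"c1": True, "c2": True, "c3": True}
C3_UNTRUSTED = {"c1": True, "c2": True, "c3": False}

for row in RESULT:
    print(row[:3], evaluate(row[3], ALL_TRUSTED, conj), evaluate(row[3], C3_UNTRUSTED, conj))
```

Changing the trust of a source only changes the assignment dictionary; the query itself is not re-evaluated.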
Slide 22 of 40
Inference and Provenance
Colours: a subset I of URIs distinguished from the set of class and property names or types
To model colour propagation through inference rules we define an operation ‘+’ to compose colours
(I, ‘+’) is a commutative semigroup
◦ c1 + c2 = c2 + c1 (commutativity)
◦ c1 + (c2 + c3) = (c1 + c2) + c3 (associativity)
◦ c + c = c (idempotence)
Slide 23 of 40
Inference and Provenance
Why provenance
◦ Which explicit triples contributed to get an implicit one?
◦ Ignore how (i.e., which rules were used)
◦ A single operator ‘+’ for all inference rules
Ignore how many times a triple was used
◦ ‘+’: idempotent [c + c = c]
Ignore the order of application
◦ ‘+’: commutative [c1 + c2 = c2 + c1]
◦ ‘+’: associative [c1 + (c2 + c3) = (c1 + (c2)) + c3, i.e. (c1 + c2) + c3]
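One concrete operator that satisfies all three laws is set union over sets of source URIs. The sketch below makes that assumption explicit (the slides do not fix a concrete ‘+’) and propagates colours through a single inference rule, transitivity of rdfs:subClassOf; the source names `s1` and `s2` are invented.

```python
def plus(c1, c2):
    """Compose two colours; a colour is modelled as a frozenset of source URIs."""
    return c1 | c2

def subclass_closure(quads):
    """Apply subClassOf transitivity until no new coloured triple appears.

    quads is a set of (sub, sup, colour) entries standing for
    'sub rdfs:subClassOf sup' with the given colour.
    """
    closure = set(quads)
    while True:
        derived = {(a, d, plus(c1, c2))
                   for (a, b, c1) in closure
                   for (c, d, c2) in closure
                   if b == c}
        new = derived - closure
        if not new:
            return closure
        closure |= new

S1, S2 = frozenset({"s1"}), frozenset({"s2"})
EXPLICIT = {("A", "B", S1), ("B", "C", S2), ("C", "D", S1)}
CLOSURE = subclass_closure(EXPLICIT)

# The three laws hold for set union.
assert plus(S1, S2) == plus(S2, S1)
assert plus(S1, plus(S2, S1)) == plus(plus(S1, S2), S1)
assert plus(S1, S1) == S1

for sub, sup, colour in sorted(CLOSURE, key=lambda q: (q[0], q[1], sorted(q[2]))):
    print(sub, "rdfs:subClassOf", sup, sorted(colour))
```

The implicit triple A rdfs:subClassOf D can be derived in two different orders, and source s1 is used twice along the way, yet it ends up with a single colour {s1, s2}. That is the practical effect of ignoring order and multiplicity.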
Slide 24 of 40
SPARQL Provenance Model
SPARQL CONSTRUCT queries generate triples in a manner similar to inference
◦ Except that it is query-dependent
Similar problems
Abstract annotation models can capture the provenance of SPARQL
◦ Queries that do not use the OPTIONAL operator
◦ Monotonicity no longer holds in the case of OPTIONAL
Slide 25 of 40
Work So Far (Provenance)
Provenance models for RDF/S
◦ Pediaditis, Flouris, Fundulaki, Christophides. On Explicit Provenance Management in RDF/S Graphs. TAPP-09.
◦ Flouris, Fundulaki, Pediaditis, Theoharis, Christophides. Coloring RDF Triples to Capture Provenance. ISWC-09.
Provenance models for SPARQL
◦ Theoharis, Fundulaki, Karvounarakis, Christophides. On Provenance of Queries on Linked Web Data. To appear in IEEE Internet Computing: Jan/Feb 2011 - Provenance in Web Applications.
Slide 26 of 40
Research Plans (Provenance)
How provenance (more expressive)
Provenance for dynamically evolving data
Support OPTIONAL (in SPARQL)
Efficient storage schemes for provenance
Apply this work on privacy, DRM and access control
Slide 27 of 40
Innovation
Use of abstract annotation models to model provenance propagation in the Semantic Web context (RDF/S)
Current state of the art is either concrete and designed for a given application, or designed for the DB context
Advantages
◦ Easy to update the KB
◦ Easy to change or experiment with different provenance propagation models
◦ Flexibility
Slide 28 of 40
Interactions within PD
Within WP3 the work is very relevant to:
◦ T3.2 (Privacy, DRM and Access Control) – FORTH
◦ T3.3 (Trust Management) – EPFL
◦ Provenance essential in the above
◦ General approach related to annotation models, tagging etc. (T3.1, T3.2, T3.3)
WP2 deals with provenance and annotation models as well (KIT)
◦ Unsure about exact interaction and/or overlaps
Related deliverable: D3.2 (M36)
Slide 29 of 40
PART III: Access Control
WP3: Provenance and Access Policies
◦ T3.2: Privacy, DRM and Access Control (FORTH, M1-M42)
Objectives of our work
◦ Access control specification language
◦ Access control enforcement mechanism
◦ Data model agnostic access control framework
◦ Privacy-aware framework (purpose)
◦ Effects of provenance and access control on DRM
Slide 30 of 40
Access Control: Introduction
Crucial for sensitive content
◦ Refers to the ability to permit or deny the use of a particular resource by a particular entity
◦ Ensures the selective exposure of information to different classes of users
Focus
◦ For RDF graphs
◦ Fine-grained (triple-level)
Slide 31 of 40
Permissions
Used to tag triples (+/- tags)
◦ Allow access for the user in question (+)
◦ Deny access for the user in question (-)
SPARQL query to identify which triples to tag
R = include/exclude (x, p, y) where TP, C
where
◦ (x, p, y) is a SPARQL triple pattern
◦ TP is a conjunction of triple patterns and
◦ C is a conjunction of constraints
Slide 32 of 40
Access Control Policies
Some triples are untagged (missing permissions)
Default Semantics
◦ Will access be granted by default?
◦ Access granted: +, access denied: -
Some triples are multiply tagged with different tags (ambiguous permissions)
Conflict Resolution
◦ Will access be granted to multiply tagged triples?
◦ Access granted: +, access denied: -
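How default semantics and conflict resolution interact with include/exclude permissions can be sketched as follows. This is a simplification under stated assumptions: a permission is reduced to a single triple pattern with `None` as a wildcard instead of a full SPARQL query with TP and C, and the graph, the permissions and the function names are invented for illustration.

```python
def matches(pattern, triple):
    """A pattern component of None acts as a wildcard (stand-in for a variable)."""
    return all(p is None or p == t for p, t in zip(pattern, triple))

def accessible(graph, permissions, default="-", conflict="-"):
    """Return the triples a user may read under the given policy.

    permissions: list of (sign, pattern) with sign "+" (include) or "-" (exclude)
    default:     decision for untagged triples (default semantics)
    conflict:    decision for triples tagged with both + and - (conflict resolution)
    """
    granted = set()
    for triple in graph:
        tags = {sign for sign, pattern in permissions if matches(pattern, triple)}
        if not tags:
            decision = default
        elif len(tags) > 1:
            decision = conflict
        else:
            decision = next(iter(tags))
        if decision == "+":
            granted.add(triple)
    return granted

# Hypothetical data and permissions.
GRAPH = {
    ("alice", "name", "Alice"), ("alice", "salary", "5000"),
    ("bob", "name", "Bob"), ("bob", "salary", "4000"),
    ("bob", "email", "bob@example.org"),
}
PERMISSIONS = [
    ("+", (None, "name", None)),    # include every name triple
    ("+", ("alice", None, None)),   # include everything about alice
    ("-", (None, "salary", None)),  # exclude every salary triple
]

STRICT = accessible(GRAPH, PERMISSIONS, default="-", conflict="-")
OPEN = accessible(GRAPH, PERMISSIONS, default="+", conflict="+")
print(sorted(STRICT))
print(sorted(OPEN))
```

In this example alice's salary triple is tagged both + and -, and bob's email triple is untagged, so the two policy parameters alone decide whether those triples are exposed.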
Slide 33 of 40
Accessible Triples
[Figure: all triples (in the graph), with the sets selected by “include” permissions and by “exclude” permissions]
Slide 34 of 40
Our Work (Access Control)
Access control framework for RDF graphs
◦ Flouris, Fundulaki, Michou, Antoniou. Controlling Access to RDF Graphs. FIS-10.
At the moment
◦ RDF only (RDFS inference not supported)
◦ Focus on read-only operations (no update or write permissions can be set)
◦ Implementation exists (repository-independent and portable across platforms)
◦ Specific access permissions allowed (+/-)
Slide 35 of 40
Research Plans
Abstract access control models
◦ More expressive tags (e.g., permission levels)
Access control for RDFS
◦ Requires more expressive policies
◦ Support inference
◦ Support propagation in access control
◦ “Safe” access control policies
Access control for dynamic data
Access control for edits (not only read)
Data model agnostic access control
◦ Extension/generalization of existing work
Slide 36 of 40
Privacy
Privacy: controlling access to private data
◦ Access control, enhanced with the notion of purpose
◦ Ensure the selective exposure of sensitive data to different requesters and requester purposes
Apply our access control model for privacy
◦ Privacy-aware framework
◦ Enhance our model with the notion of purpose
Slide 37 of 40
Digital Rights Management
DRM
◦ Specification of digital rights
◦ Controlling access/usage based on digital rights
◦ Prevent/detect abuse of data (violation of digital rights)
Importance
◦ Users must know what they can (legally) do with the data
Effects of provenance and access control on DRM
◦ DRM is closely related to provenance and access control models
◦ Identify peculiarities of DRM, extend the approach
Slide 38 of 40
Innovation
Current state of the art is concrete and designed for a given application
General approaches apply to general annotation models in the DB context
Generality
◦ Data model agnostic
◦ Abstract access control policies
◦ Policies support propagation and inference
Advantages
◦ Easy to update the KB
◦ Easy to change or experiment with different access control policies
◦ Flexible
Slide 39 of 40
Interactions within PD
Within WP3 the work is very relevant to:
◦ T3.1 (Provenance Management) – FORTH
◦ T3.3 (Trust Management) – EPFL
◦ Provenance essential for access control
◦ Trust management related to access control
◦ General approach related to annotation models (applicable in T3.1, T3.2, T3.3)
Related deliverables: D3.1 (M24), D3.3 (M42)
Slide 40 of 40
Conclusion
Research activities of FORTH within PlanetData
◦ T2.1: Repair (plus update)
◦ T3.1: Provenance
◦ T3.2: Privacy, DRM, and Access Control
Innovative work, focusing on generality, flexibility, adaptability
Work already started
◦ Basic ideas and preliminary results established
◦ Some publications also
◦ Research plans established (subject to change)
Interactions mainly within the respective WPs
◦ WP3: KIT, EPFL – interactions to be defined/discussed