FORTH Research Activities
PlanetData WP1-3 Meeting (Frankfurt, Nov10)
Giorgos Flouris, Irini Fundulaki (FORTH)



November 25, 2010, Frankfurt, Germany

FORTH in PD Research

WP2: Quality Assessment and Context
◦ T2.1: Data Quality Assessment and Repair (FUB, M1-M42)

WP3: Provenance and Access Policies
◦ T3.1: Provenance Management (FORTH, M1-M36)
◦ T3.2: Privacy, DRM, and Access Control (FORTH, M1-M42)


Presentation Outline

Three main topics/tasks
◦ Repair (T2.1)
◦ Provenance (T3.1)
◦ Access control, privacy, DRM (T3.2)

Outline
◦ Summary and objectives
◦ Introduction and motivation
◦ Existing work and research plan
◦ Innovation
◦ Interactions within the project


PART I: Repairs

WP2: Quality Assessment and Context
◦ T2.1: Data Quality Assessment and Repair (FUB, M1-M42)

Objective of our work
◦ Study methodologies for repairing invalidities in a way that will cause minimal effects upon the data

Extra:
◦ Apply the same methodologies for updates


Repairs: Introduction

Validity rules to guarantee:
◦ Special semantics (e.g., acyclic subsumptions)
◦ Application-specific rules or requirements (e.g., functional properties)

Validity: important dimension of quality
◦ Violated at design time
◦ Violated during updates or other changes
◦ Violated when validity rules change

Solution: Repair
◦ Given an invalid graph, produce a valid one that is as close as possible to the original


Repairing Process

[Figure: Invalid graph → Repair → Valid graph]

Main Challenges:
1) Several potential repairs
2) Must find the “closest” ones

Major Questions:
1) How to determine potential repairs?
2) How is “distance” measured?

Assumptions:
1) RDF/S graphs
2) Rules expressed in DED (disjunctive embedded dependency) form


Example

Validity Rules:
• properties should have a unique domain and range
• subject/object of a property instance should be correctly classified per the property’s domain/range

Triples:
A rdf:type rdfs:Class
B rdf:type rdfs:Class
C rdf:type rdfs:Class
P rdf:type rdf:Property
P rdfs:range B
P rdfs:domain A
P rdfs:domain C
x P y
x rdf:type A
y rdf:type B

[Figure: graph of the triples above; property P has domains A and C and range B, and x is connected to y via P]


Example (Resolution #1)

Validity Rules:
• properties should have a unique domain and range
• subject/object of a property instance should be correctly classified per the property’s domain/range

Triples:
A rdf:type rdfs:Class
B rdf:type rdfs:Class
C rdf:type rdfs:Class
P rdf:type rdf:Property
P rdfs:range B
P rdfs:domain A
P rdfs:domain C
x P y
x rdf:type A
y rdf:type B

Problem: two domains for the same property
Solution: delete one of the domains

[Figure: graph of the triples above, with both domains of P highlighted]


Example (Resolution #2)

Validity Rules:
• properties should have a unique domain and range
• subject/object of a property instance should be correctly classified per the property’s domain/range

Solution #1 (domain C deleted):
A rdf:type rdfs:Class
B rdf:type rdfs:Class
C rdf:type rdfs:Class
P rdf:type rdf:Property
P rdfs:range B
P rdfs:domain A
x P y
x rdf:type A
y rdf:type B

Solution #2 (domain A deleted):
A rdf:type rdfs:Class
B rdf:type rdfs:Class
C rdf:type rdfs:Class
P rdf:type rdf:Property
P rdfs:range B
P rdfs:domain C
x P y
x rdf:type A
y rdf:type B

Problem: incorrect classification (in Solution #2, x is an instance of A, but the domain of P is C)
Solution: change domain OR make x instance of C OR delete property instance [by the rule syntax]

[Figure: the two resulting graphs, one per solution]


Potential Repairs: Challenges

Easy to determine how to resolve a single violated rule, but …
◦ Several violations
◦ Several repairing options per violation

Resolution interdependencies:
◦ Repairing one violation in a certain way may cause another violation
◦ Repairing one violation in a certain way may repair multiple violations

Need for an exhaustive, rule-based search to determine all potential repairs
◦ Tree-based search (recursive)
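
The recursive, tree-based search can be sketched as follows. This is only an illustration under simplifying assumptions, not the algorithm from the project: it handles domains only (ranges would be symmetric), it models "change domain" as deleting the domain statement, and every name (`first_violation`, `options`, `potential_repairs`, `delta`) is hypothetical. The graph is the one from the earlier example slides.

```python
# Example graph from the slides, as (subject, predicate, object) tuples.
GRAPH = frozenset({
    ("A", "rdf:type", "rdfs:Class"), ("B", "rdf:type", "rdfs:Class"),
    ("C", "rdf:type", "rdfs:Class"), ("P", "rdf:type", "rdf:Property"),
    ("P", "rdfs:range", "B"), ("P", "rdfs:domain", "A"),
    ("P", "rdfs:domain", "C"), ("x", "P", "y"),
    ("x", "rdf:type", "A"), ("y", "rdf:type", "B"),
})


def first_violation(graph):
    """Return one violated rule instance, or None if the graph is valid."""
    for (prop, pred, cls) in sorted(graph):
        if pred != "rdfs:domain":
            continue
        domains = sorted(o for (s, p, o) in graph
                         if s == prop and p == "rdfs:domain")
        if len(domains) > 1:
            return ("multiple-domains", prop, domains)
        for (s, p, o) in sorted(graph):
            if p == prop and (s, "rdf:type", cls) not in graph:
                return ("bad-subject-type", (s, p, o), cls)
    return None


def options(violation):
    """Resolution options for one violation, as (to_remove, to_add) pairs."""
    kind, target, detail = violation
    if kind == "multiple-domains":
        # Delete one of the competing domains.
        return [({(target, "rdfs:domain", d)}, set()) for d in detail]
    s, p, o = target
    return [
        ({(p, "rdfs:domain", detail)}, set()),  # drop the domain statement
        (set(), {(s, "rdf:type", detail)}),     # classify s under the domain
        ({(s, p, o)}, set()),                   # delete the property instance
    ]


def potential_repairs(graph, max_depth=10):
    """Explore every resolution option recursively; the leaves of the
    search tree are the valid graphs (the potential repairs)."""
    violation = first_violation(graph)
    if violation is None:
        return {graph}
    if max_depth == 0:  # safety bound, an assumption of this sketch
        return set()
    found = set()
    for to_remove, to_add in options(violation):
        child = (graph - to_remove) | to_add
        found |= potential_repairs(child, max_depth - 1)
    return found


def delta(original, repaired):
    """The change a repair makes: (removed triples, added triples)."""
    return (original - repaired, repaired - original)
```

For the example graph the search yields four potential repairs. One of them deletes only `P rdfs:domain C`, which resolves both violations at once; the other three start by deleting `P rdfs:domain A` and then need a second change.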


Selecting a Potential Repair

Which repair should be returned?
◦ We want the repaired KB to be as close as possible to the original

User-defined notion of “distance”
◦ Specifications for selecting “preferred repairs”
◦ Based on user-defined preferences

Preferred repair depends on the context and application, for example:
◦ Under complete knowledge, prefer removals
◦ In an open setting, prefer additions


Determining Preferences

Provide specifications to determine the preferred repair
◦ Important features of a potential repair
    E.g.: additions, schema changes etc
◦ Comparing the values of important features
    E.g.: minimize, “around” etc
◦ Combine features (preferences)
    E.g.: prioritize, pareto-preference etc

Flight analogy
◦ Minimize number of stops
◦ Minimize cost
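
A rough Python sketch of how such preference specifications might be composed is shown below. It is an assumption-laden illustration rather than the framework's actual preference language: a preference is modelled as a strict "better than" test, only the `minimize` comparison is shown, and the candidate repairs and their feature values are invented for the example.

```python
# Invented candidate repairs, described only by two features.
CANDIDATES = [
    {"name": "a", "additions": 0, "deletions": 3},
    {"name": "b", "additions": 1, "deletions": 1},
    {"name": "c", "additions": 2, "deletions": 0},
    {"name": "d", "additions": 2, "deletions": 2},
]


def minimize(feature):
    """Atomic preference: a repair is better if its feature value is smaller."""
    return lambda a, b: a[feature] < b[feature]


def prioritize(first, second):
    """Prioritized composition: `second` only breaks ties left by `first`."""
    def better(a, b):
        if first(a, b):
            return True
        tie = not first(b, a)
        return tie and second(a, b)
    return better


def pareto(p1, p2):
    """Pareto composition: better on one preference, not worse on the other."""
    def better(a, b):
        return (p1(a, b) and not p2(b, a)) or (p2(a, b) and not p1(b, a))
    return better


def preferred(candidates, better):
    """Candidates that no other candidate is better than."""
    return [c for c in candidates
            if not any(better(other, c) for other in candidates)]


min_add = minimize("additions")
min_del = minimize("deletions")

# "Prefer removals": avoid additions first, then deletions.
prefer_removals = preferred(CANDIDATES, prioritize(min_add, min_del))
# "Prefer additions": avoid deletions first, then additions.
prefer_additions = preferred(CANDIDATES, prioritize(min_del, min_add))
# Treat both features as equally important.
pareto_optimal = preferred(CANDIDATES, pareto(min_add, min_del))
```

With these invented candidates, prioritizing few additions selects `a`, prioritizing few deletions selects `c`, and the Pareto composition keeps `a`, `b` and `c` while discarding `d`, which `b` beats on both features. This mirrors the flight analogy: a flight is discarded only if another one is at least as good on stops and cost and strictly better on one of them.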


Repairs: Summary

Framework for repairing invalidities in RDF/S graphs
◦ Potential repairs determined using syntactical manipulations over the validity rules
◦ Preferred repairs determined using formal preferences

Research plan:
◦ Formal description of a repair framework
◦ Develop, optimize, experiment with, study repair algorithm


Innovation

Existing approaches:
◦ In-built preferences
◦ In-built validity rules

Our proposal is:
◦ Flexible: preferences can be set at run-time
◦ Adaptable: different rules and preferences
◦ Intuitive: easy-to-define preferences
◦ Very general: different repair policies from the literature can be expressed in our framework
◦ Easy to implement: we can use off-the-shelf implementations for preference evaluation


Interactions within PD

Only within WP2 and Task 2.1

Related deliverable: D2.2 (M18)


An Extra: Updates

Apply an update on an RDF/S graph, in the presence of validity rules
◦ Originally a part of WP1, not any more

Similar ideas as with repair
◦ Apply the update (in a naïve manner)
◦ Repair the result
    Taking into account what the update was

Principles
◦ Success (update must be applied)
◦ Validity (result must be valid)
◦ Minimal change (minimal “distance” - preferences)


PART II: Provenance

WP3: Provenance and Access Policies
◦ T3.1: Provenance Management (FORTH, M1-M36)

Objectives of our work
◦ Provenance for RDF and RDFS (inference)
◦ Provenance for SPARQL query and update
◦ Efficient storage schemes for provenance


Provenance: Introduction

Provenance: information on the origin of data
◦ From where and how the piece of data was obtained

Allows/supports:
◦ Assessment of data trustworthiness and quality
◦ Reproducibility of experiments
◦ Justification of decisions (e.g., argumentation)
◦ Access control, privacy, DRM, trust

Focus on RDF/S
◦ Inspired by DB provenance and annotation models


Main Challenge

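[Figure: explicit RDF triples carrying provenance tags A, B and C; RDFS inference rules derive further triples from them. Which provenance should the derived triples get?]

Provenance tag = colour
◦ A subset I of URIs distinguished from the set of class and property names or types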


Annotation Models

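Annotation models:
◦ Annotation computation coupled with a particular application and a particular assignment of source data annotations

R1:
X  Y  Annot
a  b  t
c  d  t

R2:
Y  Z  Annot
b  e

R1 ⋈ R2:
X  Y  Z  Annot
a  b  e

[Figure: the annotations of R2 and of the join result are shown switching between t and f; whenever a source annotation changes, re-evaluate the query]

t: trusted
f: untrusted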


Abstract Annotation Models

Abstract annotation models:
◦ Abstract provenance tokens and operators are substituted by appropriate concrete tokens for a particular application and assignment

R1:
X  Y  Annot
a  b  c1
c  d  c2

R2:
Y  Z  Annot
b  e  c3

R1 ⋈ R2:
X  Y  Z  Annot
a  b  e  c1 x c3

[Figure: substituting concrete tokens into c1 x c3; with c1 = t and c3 = t the result is t ∧ t, with c3 = f it is t ∧ f]
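
The difference from a concrete annotation model can be illustrated with a small Python sketch. It assumes a natural join on Y, represents the abstract product as a tuple `("x", left, right)`, and uses boolean conjunction as the concrete operator for trust; the function names `join` and `evaluate` are hypothetical and not taken from the cited work.

```python
# Annotated relations: each row is (tuple of values, abstract token).
R1 = [(("a", "b"), "c1"), (("c", "d"), "c2")]
R2 = [(("b", "e"), "c3")]


def join(r1, r2):
    """Natural join of R1(X, Y) and R2(Y, Z) on Y. The result row is
    annotated with the abstract expression ("x", token1, token2)."""
    out = []
    for (x, y), t1 in r1:
        for (y2, z), t2 in r2:
            if y == y2:
                out.append(((x, y, z), ("x", t1, t2)))
    return out


def evaluate(expr, assignment, times):
    """Substitute concrete values for the abstract tokens and the concrete
    operator `times` for the abstract operator 'x'."""
    if isinstance(expr, str):  # a leaf: an abstract token such as "c1"
        return assignment[expr]
    _, left, right = expr
    return times(evaluate(left, assignment, times),
                 evaluate(right, assignment, times))


result = join(R1, R2)  # computed once, independently of any trust assignment
row, annotation = result[0]


def trust_and(a, b):
    """Concrete operator for the trust application: both inputs trusted."""
    return a and b


all_trusted = evaluate(annotation, {"c1": True, "c2": True, "c3": True}, trust_and)
r2_untrusted = evaluate(annotation, {"c1": True, "c2": True, "c3": False}, trust_and)
```

The join is evaluated once; changing the trust assignment only requires re-evaluating the stored expression `c1 x c3`, not the query itself.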


Inference and Provenance

Colours: a subset I of URIs distinguished from the set of class and property names or types

To model colour propagation through inference rules we define an operation ‘+’ to compose colours

(I, ‘+’) is a commutative semigroup
◦ c1 + c2 = c2 + c1 (commutativity)
◦ c1 + (c2 + c3) = (c1 + c2) + c3 (associativity)
◦ c + c = c (idempotence)


Inference and Provenance

Why provenance
◦ Which explicit triples contributed to get an implicit one?
◦ Ignore how (i.e., which rules were used)
◦ A single operator ‘+’ for all inference rules

Ignore how many times a triple was used
◦ ‘+’: idempotent [c + c = c]

Ignore the order of application
◦ ‘+’: commutative c1 + c2 = c2 + c1
◦ ‘+’: associative c1 + (c2+c3) = (c1+c2) + c3
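
One concrete structure that satisfies these three laws is set union. The sketch below is an assumption of this note, not necessarily the representation used in the cited papers: a colour is modelled as a frozenset of source colours, ‘+’ as union, and the only inference rule applied is the transitivity of `rdfs:subClassOf`. The names `colour`, `plus` and `infer_subclass_transitivity` are hypothetical.

```python
def colour(name):
    """An atomic colour, modelled as a one-element frozenset (assumption)."""
    return frozenset([name])


def plus(c1, c2):
    """Colour composition '+', modelled here as set union."""
    return c1 | c2


def infer_subclass_transitivity(quads):
    """Apply rdfs:subClassOf transitivity to a fixpoint. Each quad is
    (subject, predicate, object, colour); an inferred triple gets the
    composition of the colours of the triples it was derived from."""
    quads = set(quads)
    changed = True
    while changed:
        changed = False
        for (a, p1, b, col1) in list(quads):
            for (b2, p2, c, col2) in list(quads):
                if p1 == p2 == "rdfs:subClassOf" and b == b2:
                    new = (a, "rdfs:subClassOf", c, plus(col1, col2))
                    if new not in quads:
                        quads.add(new)
                        changed = True
    return quads


c1, c2, c3 = colour("c1"), colour("c2"), colour("c3")

explicit = {
    ("A", "rdfs:subClassOf", "B", c1),
    ("B", "rdfs:subClassOf", "C", c2),
}
closure = infer_subclass_transitivity(explicit)
```

Because union is commutative, associative and idempotent, the colour of the inferred triple `A rdfs:subClassOf C` is the same regardless of the order in which the rule is applied or how often a triple is reused, which is exactly the information the slide chooses to ignore.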


SPARQL Provenance Model

SPARQL CONSTRUCT queries generate triples in a manner similar to inference
◦ Except that it is query-dependent

Similar problems

Abstract annotation models can capture the provenance of SPARQL
◦ For queries that do not use the OPTIONAL operator
◦ Monotonicity no longer holds in the case of OPTIONAL


Work So Far (Provenance)

Provenance models for RDF/S
◦ Pediaditis, Flouris, Fundulaki, Christophides. On Explicit Provenance Management in RDF/S Graphs. TAPP-09.
◦ Flouris, Fundulaki, Pediaditis, Theoharis, Christophides. Coloring RDF Triples to Capture Provenance. ISWC-09.

Provenance models for SPARQL
◦ Theoharis, Fundulaki, Karvounarakis, Christophides. On Provenance of Queries on Linked Web Data. To appear in IEEE Internet Computing: Jan/Feb 2011 - Provenance in Web Applications.


Research Plans (Provenance)

◦ How provenance (more expressive)
◦ Provenance for dynamically evolving data
◦ Support OPTIONAL (in SPARQL)
◦ Efficient storage schemes for provenance
◦ Apply this work on privacy, DRM and access control


Innovation

Use of abstract annotation models to model provenance propagation in the Semantic Web context (RDF/S)

Current state of the art is either concrete and designed for a given application, or designed for the DB context

Advantages
◦ Easy to update the KB
◦ Easy to change or experiment with different provenance propagation models
◦ Flexibility


Interactions within PD

Within WP3 the work is very relevant to:
◦ T3.2 (Privacy, DRM and Access Control): FORTH
◦ T3.3 (Trust Management): EPFL
◦ Provenance essential in the above
◦ General approach related to annotation models, tagging etc (T3.1, T3.2, T3.3)

WP2 deals with provenance and annotation models as well (KIT)
◦ Unsure about exact interaction and/or overlaps

Related deliverable: D3.2 (M36)


PART III: Access Control

WP3: Provenance and Access Policies
◦ T3.2: Privacy, DRM and Access Control (FORTH, M1-M42)

Objectives of our work
◦ Access control specification language
◦ Access control enforcement mechanism
◦ Data model agnostic access control framework
◦ Privacy-aware framework (purpose)
◦ Effects of provenance and access control on DRM


Access Control: Introduction

Crucial for sensitive content
◦ Refers to the ability to permit or deny the use of a particular resource by a particular entity
◦ Ensures the selective exposure of information to different classes of users

Focus
◦ For RDF graphs
◦ Fine-grained (triple-level)


Permissions

Used to tag triples (+/- tags)
◦ Allow access for the user in question (+)
◦ Deny access for the user in question (-)

SPARQL query to identify which triples to tag

R = include/exclude (x, p, y) where TP, C
where
◦ (x, p, y) is a SPARQL triple pattern
◦ TP is a conjunction of triple patterns and
◦ C is a conjunction of constraints
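
To make the shape of a permission concrete, the sketch below evaluates a rule of the form `include/exclude (x, p, y) where TP, C` over a tiny in-memory graph. It is a toy pattern matcher written for this note, not SPARQL and not the authors' implementation: variables are strings starting with `?`, constraints are Python predicates over a variable binding, and the data and the `ex:` names are invented.

```python
# Invented example data.
GRAPH = {
    ("alice", "ex:salary", "5000"),
    ("bob", "ex:salary", "3000"),
    ("alice", "rdf:type", "ex:Manager"),
    ("bob", "rdf:type", "ex:Employee"),
}


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    binding = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if binding.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None      # constant does not match
    return binding


def solutions(patterns, graph, binding=None):
    """All bindings satisfying the conjunction of triple patterns."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    for triple in graph:
        extended = match(patterns[0], triple, binding)
        if extended is not None:
            yield from solutions(patterns[1:], graph, extended)


def tagged_triples(head, where, constraints, graph):
    """Triples of the graph matching the head pattern (x, p, y) for which
    the patterns TP (`where`) and the constraints C hold."""
    tagged = set()
    for b in solutions([head] + list(where), graph):
        if all(check(b) for check in constraints):
            tagged.add(tuple(b.get(term, term) for term in head))
    return tagged


# exclude (?s, ex:salary, ?v) where (?s, rdf:type, ex:Manager)
excluded = tagged_triples(("?s", "ex:salary", "?v"),
                          [("?s", "rdf:type", "ex:Manager")], [], GRAPH)

# include (?s, ex:salary, ?v) where ?v < 4000
included = tagged_triples(("?s", "ex:salary", "?v"), [],
                          [lambda b: int(b["?v"]) < 4000], GRAPH)
```

Here the exclude rule tags Alice's salary triple with "-" and the include rule tags Bob's salary triple with "+"; the remaining triples stay untagged.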


Access Control Policies

Some triples are untagged (missing permissions)

Default Semantics
◦ Will access be granted by default?
◦ Access granted: +, access denied: -

Some triples are multiply tagged with different tags (ambiguous permissions)

Conflict Resolution
◦ Will access be granted to multiply tagged triples?
◦ Access granted: +, access denied: -


Accessible Triples

[Figure: the set of all triples (in the graph), with the subsets tagged by “include” permissions and by “exclude” permissions]
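
How the default semantics and the conflict resolution policy determine the accessible triples can be sketched in a few lines of Python. The function name `accessible` and its parameters are hypothetical, and the triples are stand-in labels; the sketch only shows the case analysis implied by the policies.

```python
def accessible(all_triples, included, excluded, default="-", conflict="-"):
    """Triples the user may read, given the '+' tagged set (`included`),
    the '-' tagged set (`excluded`), the default semantics for untagged
    triples and the conflict resolution policy for doubly tagged ones."""
    result = set()
    for t in all_triples:
        inc, exc = t in included, t in excluded
        if inc and exc:
            decision = conflict  # ambiguous permissions
        elif inc:
            decision = "+"
        elif exc:
            decision = "-"
        else:
            decision = default   # missing permissions
        if decision == "+":
            result.add(t)
    return result


# Stand-in triples: t1 is only included, t2 only excluded,
# t3 carries both tags, t4 is untagged.
ALL = {"t1", "t2", "t3", "t4"}
INCLUDED = {"t1", "t3"}
EXCLUDED = {"t2", "t3"}

deny_deny = accessible(ALL, INCLUDED, EXCLUDED, default="-", conflict="-")
grant_deny = accessible(ALL, INCLUDED, EXCLUDED, default="+", conflict="-")
deny_grant = accessible(ALL, INCLUDED, EXCLUDED, default="-", conflict="+")
grant_grant = accessible(ALL, INCLUDED, EXCLUDED, default="+", conflict="+")
```

The four policy combinations give four different accessible sets for the same tags, which is why both choices have to be stated explicitly in a policy.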


Our Work (Access Control)

Access control framework for RDF graphs
◦ Flouris, Fundulaki, Michou, Antoniou. Controlling Access to RDF Graphs. FIS-10.

At the moment
◦ RDF only (RDFS inference not supported)
◦ Focus on read-only operations (no update or write permissions can be set)
◦ Implementation exists (repository-independent and portable across platforms)
◦ Specific access permissions allowed (+/-)


Research Plans

Abstract access control models
◦ More expressive tags (e.g., permission levels)

Access control for RDFS
◦ Requires more expressive policies
◦ Support inference
◦ Support propagation in access control
◦ “Safe” access control policies

Access control for dynamic data

Access control for edits (not only read)

Data model agnostic access control
◦ Extension/generalization of existing work


Privacy

Privacy: controlling access to private data
◦ Access control, enhanced with the notion of purpose
◦ Ensure the selective exposure of sensitive data to different requesters and requester purposes

Apply our access control model for privacy
◦ Privacy-aware framework
◦ Enhance our model with the notion of purpose


Digital Rights Management

DRM
◦ Specification of digital rights
◦ Controlling access/usage based on digital rights
◦ Prevent/detect abuse of data (violation of digital rights)

Importance
◦ Users must know what they can (legally) do with the data

Effects of provenance and access control on DRM
◦ DRM is closely related to provenance and access control models
◦ Identify peculiarities of DRM, extend the approach


Innovation

Current state of the art is concrete and designed for a given application

General approaches apply for general annotation models in the DB context

Generality
◦ Data model agnostic
◦ Abstract access control policies
◦ Policies support propagation and inference

Advantages
◦ Easy to update the KB
◦ Easy to change or experiment with different access control policies
◦ Flexible


Interactions within PD

Within WP3 the work is very relevant to:
◦ T3.1 (Provenance Management): FORTH
◦ T3.3 (Trust Management): EPFL
◦ Provenance essential for access control
◦ Trust management related to access control
◦ General approach related to annotation models (applicable in T3.1, T3.2, T3.3)

Related deliverables: D3.1 (M24), D3.3 (M42)


Conclusion

Research activities of FORTH within PlanetData
◦ T2.1: Repair (plus update)
◦ T3.1: Provenance
◦ T3.2: Privacy, DRM, and Access Control

Innovative work, focusing on generality, flexibility, adaptability

Work already started
◦ Basic ideas and preliminary results established
◦ Some publications also
◦ Research plans established (subject to change)

Interactions mainly within the respective WPs
◦ WP3: KIT, EPFL (interactions to be defined/discussed)