WebdamExchange and WebdamLog: some models for web data management

Alban Galland, INRIA Saclay & ENS Cachan

Grenoble, 10/12/2010



Organization

• Introduction

• Representing all Web information as logical sentences

• Representing all Web data management as logical rules

• Some clues about implementation

• Conclusion


Introduction


Context of the work presented here

• ERC Grant Webdam on Web Data Management of Serge Abiteboul with two INRIA teams, Leo-Iasi (ex Gemo, INRIA Saclay) and Dahu (LSV, ENS Cachan)

• Joint work with many people: Émilien Antoine, Serge Abiteboul, Meghyn Bienvenu, David Gross-Amblard, Amélie Marian, Bruno Marnette, Neoklis Polyzotis, Philippe Rigaux, Marie-Christine Rousset…


Context: Web data management

• Scale: lots of users, servers, large volume of data…

• Distribution heterogeneity: Cloud (social networks), P2P (DHT, gossiping)…

• Security heterogeneity: login, https, crypto, hidden URL…

• Terminology heterogeneity: annotation, semantic Web, ontologies…

• Incomplete information: inconsistencies, belief, trust…

• The heterogeneity keeps increasing with new systems and new applications arriving

• Consequence 1: data integration and management are difficult

• Consequence 2: users cannot keep control over their own data


Thesis: Web data = distributed knowledge

• Work plan:

1. Represent all Web information as logical sentences

2. Represent all Web data management as logical rules

3. Develop a system to validate these ideas

• Motivation for the approach:

• Facilitate the design/implementation of complex systems

• Facilitate the control/surveillance of complex systems

• Use reasoning to optimize query evaluation

• Use reasoning for semantics/ontologies

• Use reasoning to manage access control and protect data

• Use reasoning to analyze properties of systems


Motivating example

• Alice: get me the pictures of my friends where I am with Bob?

• What is going on:

• Find the friends of Alice (Alice’s iPhone may remember them)

• For each answer, say Sue, find where Sue keeps her pictures (she may keep her pictures on Picasa)

• Find the means to access Sue’s pictures (Alice may ask a common friend for the private URL)

• Find the photos with Bob and Alice (e.g. by querying the meta-data)


Motivating example

• Alice: get me the pictures of my friends where I am with Bob?

• Issues: heterogeneity of friends

• Heterogeneity of hosting: some keep their pictures on trusted servers such as Picasa, some put them on an untrusted DHT, some have them on their smartphones…

• Heterogeneity of access control: some are public, some use login/password, some use a private URL, some use cryptography…

• Heterogeneity of data description: they may use different models of meta-data (taxonomies, ontologies…)


Representing all Web information as logical sentences


The information belongs to someone

• Each piece of information belongs to a principal

• A principal has an identity (URI) which can be authenticated

• Two kinds of principal: peer and virtual principal

• A peer: alice-laptop, alice-iPhone, picasa, facebook, dht-peer-124, …

• Storage and processing capabilities

• A peer typically has a URL and can be sent query/update requests

• A virtual principal: alice, alice-friends, roc14

• A virtual principal relies on peers for storage and processing


The kind of information we are talking about

• Data: pictures, movies, music, emails, ebooks, reports

• Localization: bookmarks, knowledge such as Alice has an account in Facebook, Sue puts her pictures in Picasa

• Access: login/password, access rights on servers

• Annotations/Ontologies: semantic tags in Picasa, RDFS, OWL

• Services: search engines, yellow pages, dictionaries…

• Incomplete information: beliefs, probabilistic information…

• And more…


Logical statements to represent information

• Data:

• Document: picture34@alice-iPhone(picture34.jpg,09/12/2009,…)

• Collection: pictures@alice(picture34@alice-iPhone)

• Localization: where@alice(picture37, picasa/alice)

• Access right: isOwner@picasa/alice(alice)

• Access secret: ownSecret@picasa/alice(“alice”, “HG-FT23”)

• Ontologies: [email protected](“alice”, human-being)

• Services: [email protected]($Person, $City, $Y)

• Belief: picture34@alice-iPhone(picture34.jpg,09/12/2009,…,75%)

• Etc.
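The statements above all share the shape relation@principal(arguments). A minimal Python sketch of that shape follows; the `Fact` type and the `parse_fact` helper are illustrative names, not part of WebdamExchange, and the parser assumes arguments contain no commas or parentheses.

```python
from typing import NamedTuple, Tuple


class Fact(NamedTuple):
    """One logical statement of the form relation@principal(args)."""
    relation: str
    principal: str
    args: Tuple[str, ...]

    def __str__(self) -> str:
        return f"{self.relation}@{self.principal}({', '.join(self.args)})"


def parse_fact(text: str) -> Fact:
    """Parse 'relation@principal(a, b)'.

    Assumption: arguments contain no commas or parentheses.
    """
    head, _, rest = text.partition("(")
    relation, _, principal = head.partition("@")
    args = tuple(a.strip() for a in rest.rstrip(")").split(",") if a.strip())
    return Fact(relation.strip(), principal.strip(), args)


# A tiny knowledge base built from statements on the slide.
knowledge_base = {
    parse_fact("pictures@alice(picture34@alice-iPhone)"),
    parse_fact("where@alice(picture37, picasa/alice)"),
    parse_fact("isOwner@picasa/alice(alice)"),
}

# Everything the principal "alice" states about localization.
locations = [f for f in knowledge_base if f.principal == "alice" and f.relation == "where"]
```

Keeping the principal as a separate field makes it easy to select the portion of the knowledge base that belongs to one principal.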


WebdamExchange focus: authenticated knowledge

• Base statement:

• someone states picture37@alice (….)

• It is annotated with a proof that “someone” can write data of alice

• In the cryptographic setting, it is a signature of the whole statement using the write secret key of alice

• Keeping trace of provenance:

• alice-laptop states picture37@alice (….) requester bob at 12:30, 10/08/2009

• alice-laptop is the performer (the peer who did the update of the data of Alice)

• bob is the requester (the peer or the user who requested the update)

• The content is possibly encrypted:

• alice-laptop states picture37@alice (….) protected for reader@alice requester bob at 12:30, 10/08/2009
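The idea of annotating a statement with a proof over the whole statement can be sketched in Python. This is only an illustration under strong assumptions: a keyed hash (HMAC) from the standard library stands in for the signature made with the write secret key of alice, whereas a real deployment would more likely use a public-key signature so that verifiers do not need the secret. The field names and the key value are hypothetical.

```python
import hashlib
import hmac
import json


def sign_statement(write_secret: bytes, performer: str, fact: str,
                   requester: str, time: str) -> dict:
    """Build a statement with provenance and attach a proof over all of it."""
    statement = {"performer": performer, "fact": fact,
                 "requester": requester, "time": time}
    # Canonical serialization, so that signer and verifier hash the same bytes.
    payload = json.dumps(statement, sort_keys=True).encode("utf-8")
    proof = hmac.new(write_secret, payload, hashlib.sha256).hexdigest()
    return dict(statement, proof=proof)


def verify_statement(write_secret: bytes, signed: dict) -> bool:
    """Check that the proof matches the whole statement (HMAC stand-in)."""
    statement = {k: v for k, v in signed.items() if k != "proof"}
    payload = json.dumps(statement, sort_keys=True).encode("utf-8")
    expected = hmac.new(write_secret, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed.get("proof", ""))


# Hypothetical key; the fact content is kept opaque, as on the slide.
alice_write_secret = b"hypothetical-write-secret-of-alice"
signed = sign_statement(alice_write_secret, "alice-laptop",
                        "picture37@alice(...)", "bob", "12:30, 10/08/2009")
tampered = dict(signed, requester="mallory")
```

Because the proof covers the performer, the requester and the time as well as the fact, changing any provenance field invalidates it.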


WebdamExchange focus: authenticated knowledge

• Communication: external knowledge is knowledge about other principals:

• alice-laptop says (alice-laptop states picture37@Alice (….) requester bob at 12:30, 10/08/2009) to sue-iphone at 13:15, 15/10/2009

• alice-laptop is the performer of the communication

• sue-iphone is the receiver of the communication

• External knowledge is authenticated by the performer and is stored by the receiver.

• External knowledge keeps a trusted trace of provenance, and communications are piled up:

• sue-iphone says (alice-laptop says (alice-laptop states picture37@Alice (….) requester bob at 12:30, 10/08/2009) to sue-iphone at 13:15, 15/10/2009) to bob-iphone at 13:10, 15/10/2009

• The time is the time of the performer, there is no global clock
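The piling-up of communications can be pictured as a nested data structure. The following Python sketch uses hypothetical class names (`States`, `Says`) and omits authentication; it only shows how the chain of communications can be read back from a nested statement.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class States:
    """Base statement: performer states fact, on behalf of requester, at time."""
    performer: str
    fact: str
    requester: str
    time: str  # local time of the performer; there is no global clock


@dataclass(frozen=True)
class Says:
    """Communication: performer says content to receiver at (performer's) time."""
    performer: str
    content: Union["Says", States]
    receiver: str
    time: str


def communication_path(knowledge: Union[Says, States]) -> List[str]:
    """Return the hops of the communication chain, oldest first."""
    hops = []
    while isinstance(knowledge, Says):
        hops.append(f"{knowledge.performer} -> {knowledge.receiver}")
        knowledge = knowledge.content
    return list(reversed(hops))


base = States("alice-laptop", "picture37@alice(...)", "bob", "12:30, 10/08/2009")
first = Says("alice-laptop", base, "sue-iphone", "13:15, 15/10/2009")
second = Says("sue-iphone", first, "bob-iphone", "13:10, 15/10/2009")
path = communication_path(second)
```

The timestamps are kept as opaque strings because each one is local to its performer and the sketch never compares them across peers.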


The model covers a wide range of data

• The model does not prescribe any particular architecture for distribution:

• Gossiping, DHT, centralized server

• Combination of these

• Based on an abstract notion of localization

• The model does not prescribe how access control is enforced, e.g.:

• Documents in Web servers with access protected by login/password

• Documents protected by cryptographic keys in public sites

• Based on an abstract notion of secret and hint


Summary of WebdamExchange

• All the information forms a trusted knowledge base

• Each peer manages some portion of the knowledge base

• Now, we have to use this distributed knowledge base … for the management of the distributed knowledge base!


Representing all Web data management as logical rules


From WebdamExchange to WebdamLog

• The logical part of the WebdamExchange statements can easily be translated into datalog facts.

• Most of the reasoning of the system can be done using the logical form and datalog-like rules

• This motivates WebdamLog, a rule-based language for Web data management


Why datalog?

• Datalog: very popular in the 90’s, prehistory by Web time

+ Nicer/more compact syntax; easy to extend

- Recursion not really essential

• Datalog extensions:

• Negation and aggregate functions: tons of work on that

• Updates, time, trees, distribution: less work on that

• We use a datalog-like language influenced by:

• Active XML for distribution and intensional data

• Hellerstein’s Dedalus for time and performance


Webdamlog

• Facts are of the form: m@p(a1,...,an) (sorted)

• Rules are of the form:

• R@P(U) :- (not) R1@P1(U1), …, (not) Rn@Pn(Un)

• R,Ri are message terms

• P,Pi are peer terms

• U,Ui are tuples of terms

• Safety condition

• Intuition: if the body holds for some valuation v, the message vR@vP(vU) is sent to the peer vP

• Issue: what happens if the body of a rule mentions different peers?
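The intuition that a valuation of the body produces a message for the peer named in the head can be sketched in Python. This is a simplified illustration, not the WebdamLog engine: it handles positive rules only, assumes all body atoms are evaluated against one peer's local facts, and uses hypothetical relation names (`hosting`, `hello`). Terms starting with `$` are variables, and they may appear in relation and peer positions as well as in arguments.

```python
def is_var(term):
    """Variables are strings starting with '$' (convention of this sketch)."""
    return isinstance(term, str) and term.startswith("$")


def match(atom, fact, valuation):
    """Extend valuation so that atom equals fact; return None if impossible."""
    rel, peer, args = atom
    frel, fpeer, fargs = fact
    if len(args) != len(fargs):
        return None
    extended = dict(valuation)
    for term, value in zip((rel, peer) + tuple(args), (frel, fpeer) + tuple(fargs)):
        if is_var(term):
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def fire(rule, facts):
    """Return the messages (relation, target peer, tuple) produced by one rule."""
    head, body = rule
    valuations = [{}]
    for atom in body:
        valuations = [v for old in valuations for fact in facts
                      for v in [match(atom, fact, old)] if v is not None]
    messages = set()
    for v in valuations:
        rel, peer, args = head
        # Safety condition assumed: every head variable is bound by the body.
        messages.add((v.get(rel, rel), v.get(peer, peer),
                      tuple(v.get(a, a) for a in args)))
    return messages


# Local facts of alice-iphone (hypothetical relations for illustration).
facts = [("friends", "alice-iphone", ("Sue",)),
         ("hosting", "alice-iphone", ("Sue", "picasa/sue"))]
# hello@$P($X) :- friends@alice-iphone($X), hosting@alice-iphone($X,$P)
rule = (("hello", "$P", ("$X",)),
        [("friends", "alice-iphone", ("$X",)),
         ("hosting", "alice-iphone", ("$X", "$P"))])
messages = fire(rule, facts)
```

Here the peer term `$P` of the head is bound by the body, so the target of the message is itself computed from data.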


Webdamlog

System:

• A finite set of peers

• Each peer p has a local program P(p) and a delegated program D(p), each consisting of a finite set of rules

• Each peer p has a database I(p), consisting of a finite set of facts of the form m@p(u)

Semantics:

• In a state (P,D,I), choose randomly some peer p

• Evaluate (P(p) ∪ D(p))(I(p))

• This defines the new database I’(p)

• This adds facts to, and updates the delegated rules of, the other peers, defining (D’(q),I’(q)) for each q

• The changes to each q are installed synchronously; we will see how to avoid this if desired

• Choose another peer and keep going (in a fair way)
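The step-by-step semantics can be illustrated with a small Python simulation. It is a sketch under simplifying assumptions that are not taken from the slides: programs are plain Python functions from a local database to derived facts, facts accumulate and are never retracted, delegation of rules is left out, and a uniformly random choice of peer stands in for fairness. The relation names `edge`, `copy` and `seen` are hypothetical.

```python
import random


def step(peer, programs, databases):
    """One move: evaluate the peer's program on its database, install results."""
    derived = programs[peer](databases[peer])
    changed = False
    for relation, target, args in derived:
        fact = (relation, target, args)
        if fact not in databases[target]:
            databases[target].add(fact)  # installed synchronously at the target
            changed = True
    return changed


def run(programs, databases, seed=0, max_steps=1000):
    """Pick peers at random until every peer has moved without any change."""
    rng = random.Random(seed)
    peers = sorted(programs)
    idle = set()
    for _ in range(max_steps):
        if len(idle) == len(peers):
            break
        peer = rng.choice(peers)
        if step(peer, programs, databases):
            idle.clear()
        else:
            idle.add(peer)
    return databases


def example(seed):
    """Peer p forwards its edges to q; q records what it has seen."""
    programs = {
        "p": lambda db: {("copy", "q", args) for rel, _, args in db if rel == "edge"},
        "q": lambda db: {("seen", "q", args) for rel, _, args in db if rel == "copy"},
    }
    databases = {"p": {("edge", "p", (1, 2))}, "q": set()}
    return run(programs, databases, seed=seed)
```

Running `example` with different seeds changes the order in which peers move; for this positive program the final databases are the same, which is the kind of convergence property the talk refers to.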

[Figure: four peers, Peer1 to Peer4]


Features of WebdamLog illustrated

• Alice: get me the pictures of my friends where I am with Bob?

• result@alice-iphone($Photo,$X) :- friends@alice-iphone($X), findPhotos@alice-iphone($X,$R,$P), $R@$P($Photo,$Meta), contains@$P($Meta, “Alice”), contains@$P($Meta, “Bob”)

• Peers and messages as data: they are reified

• friends@alice-iphone is extensional, in I(alice-iphone)

• findPhotos@alice-iphone is intensional, in P(alice-iphone) ∪ D(alice-iphone)

• $R@$P is bound to a relation of (possibly) another peer

• contains@$P is a service of that peer


Features of WebdamLog illustrated

• Delegation of rules

• Alice: get me the pictures of my friends where I am with Bob?

• result@alice-iphone($Photo,$X) :- friends@alice-iphone($X), findPhotos@alice-iphone($X,$R,$P), $R@$P($Photo,$Meta), contains@$P($Meta, “Alice”), contains@$P($Meta, “Bob”)

• friends@alice-iphone(Sue);

• findPhotos@alice-iphone(Sue,photos,picasa/sue) :-

• Then alice-iphone installs the following rule at picasa/sue:

• result@alice-iphone($Photo,Sue) :- photos@picasa/sue($Photo,$Meta), contains@picasa/sue($Meta, “Alice”), contains@picasa/sue($Meta, “Bob”)

• picasa/sue will send the photos as extensional facts to alice-iphone. When Alice terminates her query, all the delegations are cancelled.
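The delegation step in this example can be reproduced with a short Python sketch. It is an illustration under assumptions, not the actual system: the body is read left to right, atoms located at the local peer are matched against local facts (findPhotos is treated here as an already derived fact), and at the first atom located elsewhere the instantiated remainder of the rule is handed to that peer. Function names are hypothetical, and cancelling delegations is not modelled.

```python
def is_var(term):
    return isinstance(term, str) and term.startswith("$")


def substitute(atom, valuation):
    """Replace bound variables in the relation, peer and argument positions."""
    rel, peer, args = atom
    sub = lambda t: valuation.get(t, t)
    return (sub(rel), sub(peer), tuple(sub(a) for a in args))


def match(atom, fact, valuation):
    """Extend valuation so that atom equals fact; return None if impossible."""
    if len(atom[2]) != len(fact[2]):
        return None
    extended = dict(valuation)
    for term, value in zip(atom[:2] + atom[2], fact[:2] + fact[2]):
        if is_var(term):
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def delegate(local_peer, head, body, facts, valuation=None):
    """Return (target peer, residual rule) pairs to install at other peers."""
    valuation = valuation or {}
    if not body:
        return []  # nothing left to delegate; the rule is fully local
    atom = substitute(body[0], valuation)
    if atom[1] != local_peer:
        # First non-local atom: ship the instantiated rest of the rule.
        residual = (substitute(head, valuation),
                    [substitute(a, valuation) for a in body])
        return [(atom[1], residual)]
    out = []
    for fact in facts:
        extended = match(atom, fact, valuation)
        if extended is not None:
            out.extend(delegate(local_peer, head, body[1:], facts, extended))
    return out


head = ("result", "alice-iphone", ("$Photo", "$X"))
body = [("friends", "alice-iphone", ("$X",)),
        ("findPhotos", "alice-iphone", ("$X", "$R", "$P")),
        ("$R", "$P", ("$Photo", "$Meta")),
        ("contains", "$P", ("$Meta", "Alice")),
        ("contains", "$P", ("$Meta", "Bob"))]
facts = [("friends", "alice-iphone", ("Sue",)),
         ("findPhotos", "alice-iphone", ("Sue", "photos", "picasa/sue"))]
delegations = delegate("alice-iphone", head, body, facts)
```

With these inputs the only delegation targets picasa/sue and carries the rule shown on the slide, with $X, $R and $P already replaced by Sue, photos and picasa/sue.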


Managing rules at other peers

• This is complex:

• Regarding implementation, one manages instantiations of rules, i.e., rules and valuations

• The content of valuations may be constantly changing

• There could be some negations in the rules

• This is a security risk:

• Someone else is installing data (facts) or code (rules) in a peer

• Need to control that carefully


Does it mean something?

• Some not-so-trivial theorems about the positive case or the stratified-negation case, ensuring:

• Church-Rosser properties (convergence)

• Natural simulation by centralized systems

• Some even-less-trivial theorems comparing the expressivity of different variations of WebdamLog: without exchanging rules, without exchanging intensional data, with time-stamps…


More refined asynchronicity

• To model a message from peer p to peer q, we may use a “peer” netpq that captures the network

• Replace a call m@q(u) at p by m@netpq(u)

• netpq should just relay messages: $M@q($U) :- $M@netpq($U)

• Problem: all messages from p to q in the net arrive at the same time

• Better with time:

• m@netpq(u,t) where t is the time of the send at p

• $M@q($U) :- $M@netpq($U,$T), min($T, $M@netpq($U,$T)), using the min aggregate function
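One way to read the rule with the min aggregate is that, at each step, the network peer relays only the messages carrying the smallest send time. The Python sketch below illustrates that reading. It assumes that relayed messages are then removed from the network buffer, which the slide does not state, and the function and message names are hypothetical.

```python
def relay_step(net_buffer):
    """Relay the messages with the smallest send time.

    net_buffer holds (message name, tuple, send time at p) triples.
    Returns (delivered to q, remaining in the network).
    """
    if not net_buffer:
        return [], []
    earliest = min(t for _, _, t in net_buffer)
    delivered = [(m, u) for m, u, t in net_buffer if t == earliest]
    remaining = [(m, u, t) for m, u, t in net_buffer if t != earliest]
    return delivered, remaining


# Three messages in flight from p to q, two of them sent at time 1.
buffer = [("photo", ("p1",), 3),
          ("photo", ("p2",), 1),
          ("tag", ("p2", "Bob"), 1)]
first_batch, buffer_after = relay_step(buffer)
second_batch, buffer_empty = relay_step(buffer_after)
```

Messages sent at different times now arrive in separate steps, so the messages from p to q no longer all arrive at once.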


Summary of WebdamLog

• Peers run their datalog programs asynchronously

• They exchange facts and delegations of rules


Some clues about implementation


Implementation

• We are implementing two kinds of peers:

• WEP (Webdam Exchange Peer) – all functionalities

• IWEP (iPad Webdam Exchange Peer) – limited functionalities; rely on proxies

• We are implementing a social network on top of the system


Conclusion


Some cool results and still a lot of work

• WebdamExchange and WebdamLog models capture some nice problems of web data management: distribution, access control…

• Their good semantics allow us to prove theorems!

• We are implementing the corresponding system!

• Many issues are still open:

• Concurrency, optimization, implementation

• Defining and verifying protocols (access control is not violated, one gets all the information one has access to)

• Looking for a killer application
