Sovereign Information Integration


Post on 21-Jan-2016


TRANSCRIPT

Page 1: Sovereign Information Integration

Sovereign Information Integration

Rakesh Agrawal

Joint work with Srikant & Evfimievski

Page 2: Sovereign Information Integration

Outline

– Motivation
– Problem Statement
– Protocols
– Challenges

Page 3: Sovereign Information Integration

Information Integration Today

Assumption: Information in each database can be freely shared.

[Figure: query Q and result R under two architectures, Federated (via a Mediator) and Centralized.]

Page 4: Sovereign Information Integration

Need for a new style of information sharing

Compute queries across databases so that no more information than necessary is revealed (without using a trusted third party).

Need is driven by several trends:
– End-to-end integration of information systems across companies.
– Simultaneously compete and cooperate.
– Security: need-to-know information sharing

Page 5: Sovereign Information Integration

Selective Document Sharing

R is shopping for technology.

S has intellectual property it may want to license.

First find the specific technologies where there is a match, and then reveal further information about those.

[Figure: R holds a Shopping List; S holds a Technology List.]

Example 2: Govt. agencies sharing information on a need-to-know basis.

Page 6: Sovereign Information Integration

Medical Research

Validate hypothesis between adverse reaction to a drug and a specific DNA sequence.

Researchers should not learn anything beyond 4 counts:

                    Adverse Reaction    No Adv. Reaction
Sequence Present           ?                   ?
Sequence Absent            ?                   ?

[Figure labels: Mayo Clinic, DNA Sequences, Drug Reactions.]

Page 7: Sovereign Information Integration

Caveats

– Schema Discovery & Heterogeneity
– Multiple Queries

Page 8: Sovereign Information Integration

And of course…

Hybrids of Centralized, Federated, and Sovereign Architectures

[Figure: mediators exchanging query Q and result R, with the sharing between them labeled "Minimal Necessary".]

Page 9: Sovereign Information Integration

Outline

– Motivation
– Problem Statement
– Protocols
– Challenges

Page 10: Sovereign Information Integration

Minimal Necessary Sharing

R = {a, u, v, x}

S = {b, u, v, y}

R ∩ S = {u, v}
– R must not know that S has b & y
– S must not know that R has a & x

Count(R ∩ S)
– R & S do not learn anything except that the result is 2.

Page 11: Sovereign Information Integration

Problem Statement: Minimal Sharing

Given:
– Two parties (honest-but-curious): R (receiver) and S (sender)
– Query Q spanning the tables R and S
– Additional (pre-specified) categories of information I

Compute the answer to Q and return it to R without revealing any additional information to either party, except for the information contained in I
– For intersection, intersection size & equijoin, I = { |R| , |S| }
– For equijoin size, I also includes the distribution of duplicates & some subset of information in R ∩ S

Page 12: Sovereign Information Integration

A Possible Approach

Secure Multi-Party Computation
– Given two parties with inputs x and y, compute f(x,y) such that the parties learn only f(x,y) and nothing else.
– Can be solved by building a combinatorial circuit, and simulating that circuit [Yao86].

Prohibitive cost for database-size problems.
– Intersection of two relations of a million records each would require 144 days (Yao’s protocol)

Page 13: Sovereign Information Integration

Outline

– Motivation
– Problem Statement
– Protocols
– Challenges

Page 14: Sovereign Information Integration

Intersection Protocol: Intuition

Want to encrypt the values in R and S and compare the encrypted values.

However, want an encryption function such that it can only be jointly computed by R and S, not separately.

Page 15: Sovereign Information Integration

Commutative Encryption

Commutative encryption F is a computable function $f : \mathrm{Key}\,F \times \mathrm{Dom}\,F \to \mathrm{Dom}\,F$, satisfying:

– For all $e, e' \in \mathrm{Key}\,F$: $f_e \circ f_{e'} = f_{e'} \circ f_e$
(The result of encryption with two different keys is the same, irrespective of the order of encryption)

– Each $f_e$ is a bijection.
(Two different values will have different encrypted values)

– The distribution of $\langle x, f_e(x), y, f_e(y) \rangle$ is indistinguishable from the distribution of $\langle x, f_e(x), y, z \rangle$; $x, y, z \in_r \mathrm{Dom}\,F$ and $e \in_r \mathrm{Key}\,F$.
(Given a value x and its encryption $f_e(x)$, for a new value y, we cannot distinguish between $f_e(y)$ and a random value z. Thus we cannot encrypt y nor decrypt $f_e(y)$.)

Page 16: Sovereign Information Integration

Example Commutative Encryption

$f_e(x) = x^e \bmod p$

where
– p: safe prime number, i.e., both p and q = (p-1)/2 are primes
– encryption key $e \in \{1, 2, \ldots, q-1\}$
– Dom F: all quadratic residues modulo p

Commutativity: powers commute
$(x^d \bmod p)^e \bmod p = x^{de} \bmod p = (x^e \bmod p)^d \bmod p$

Indistinguishability follows from Decisional Diffie-Hellman Hypothesis (DDH)
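The power function can be tried out directly. The following is a minimal sketch, not the authors' implementation: the modulus 2579 is a toy safe prime picked here for readability (far too small for real use), and the names `keygen`, `encrypt`, and `quadratic_residues` are illustrative. It checks commutativity on one value and the bijection property over all of Dom F.

```python
import secrets

def is_prime(n: int) -> bool:
    """Trial division; adequate only for the toy modulus used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

# Toy safe prime (assumption: chosen for illustration, not for security).
P = 2579
Q = (P - 1) // 2
assert is_prime(P) and is_prime(Q)

def keygen() -> int:
    """Pick an encryption key e from {1, ..., q-1}."""
    return secrets.randbelow(Q - 1) + 1

def encrypt(key: int, x: int) -> int:
    """f_e(x) = x^e mod p."""
    return pow(x, key, P)

def quadratic_residues() -> set[int]:
    """Dom F: the q nonzero squares modulo p."""
    return {pow(a, 2, P) for a in range(1, P)}

d, e = keygen(), keygen()
domain = quadratic_residues()
x = pow(5, 2, P)  # an arbitrary element of Dom F

# Encrypting with d then e gives the same result as e then d.
commutes = encrypt(e, encrypt(d, x)) == encrypt(d, encrypt(e, x))
# f_e permutes Dom F, so distinct values keep distinct encryptions.
bijective = {encrypt(e, v) for v in domain} == domain
print(commutes, bijective)
```

The bijection check works because the quadratic residues form a group of prime order q, so every key in {1, ..., q-1} is coprime to the group order.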

Page 17: Sovereign Information Integration

Intersection Protocol

[Figure: R holds set R and secret key r; S holds set S and secret key s. S computes $f_s(S)$.]

$f_s(S)$ is shorthand for $\{ f_s(x) \mid x \in S \}$.

We apply $f_s$ on h(S), where h is a hash function, not directly on S.

Page 18: Sovereign Information Integration

Intersection Protocol

[Figure: S sends $f_s(S)$ to R. R computes $f_r(f_s(S))$, which equals $f_s(f_r(S))$ by the commutative property.]

Page 19: Sovereign Information Integration

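Intersection Protocol

[Figure: R sends $f_r(R)$ to S. S returns the pairs $\langle y, f_s(y) \rangle$ for $y \in f_r(R)$. Since R knows $\langle x, y = f_r(x) \rangle$, R obtains $\langle x, f_s(f_r(x)) \rangle$ for $x \in R$ and compares these values against $f_s(f_r(S))$.]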
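The three protocol slides can be sketched in a single process as follows. This is a hedged illustration, not the published protocol code: both parties are simulated in one function, the modulus is a toy safe prime, and `h` is a stand-in for the hash function that simply squares a short string's integer encoding (an assumption made so that the example stays collision-free with such a small modulus).

```python
import secrets

P = 2579            # toy safe prime: P and Q = (P - 1) // 2 are both prime
Q = (P - 1) // 2

def h(value: str) -> int:
    """Stand-in for the hash h: map a short string to a quadratic residue mod P.
    Squaring an integer in 1..Q is injective, so distinct inputs stay distinct."""
    n = int.from_bytes(value.encode(), "big")
    if not 1 <= n <= Q:
        raise ValueError("value too long for the toy modulus")
    return pow(n, 2, P)

def keygen() -> int:
    """Secret key from {1, ..., Q-1}."""
    return secrets.randbelow(Q - 1) + 1

def f(key: int, x: int) -> int:
    """Commutative encryption f_key(x) = x^key mod P."""
    return pow(x, key, P)

def intersection(R: set[str], S: set[str]) -> set[str]:
    r, s = keygen(), keygen()   # r is known only to R, s only to S

    # S -> R: f_s(h(S)), sorted so the order carries no information
    fs_S = sorted(f(s, h(v)) for v in S)
    # R -> S: f_r(h(R)); R remembers which x produced which y
    y_of = {x: f(r, h(x)) for x in R}
    fr_R = sorted(y_of.values())
    # S -> R: pairs <y, f_s(y)> for y in f_r(R)
    pairs = {y: f(s, y) for y in fr_R}

    # R: f_r(f_s(S)), which equals f_s(f_r(S)) by commutativity
    fr_fs_S = {f(r, z) for z in fs_S}
    # R: x is in the intersection iff f_s(f_r(x)) appears in that set
    return {x for x in R if pairs[y_of[x]] in fr_fs_S}

print(sorted(intersection({"a", "u", "v", "x"}, {"b", "u", "v", "y"})))
# prints ['u', 'v']
```

In a real deployment the two key holders would be separate processes and only the values marked "S -> R" and "R -> S" would cross the wire.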

Page 20: Sovereign Information Integration

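Intersection Size Protocol

[Figure: R holds set R and secret key r; S holds set S and secret key s. R sends $f_r(R)$ to S, and S sends $f_s(S)$ to R. R computes $f_r(f_s(S))$. S returns $f_s(f_r(R))$ as a set, not as pairs $\langle y, f_s(y) \rangle$ for $y \in f_r(R)$.]

R cannot map $z \in f_r(f_s(R))$ back to $x \in R$.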
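The only change from the intersection protocol is that S returns the doubly encrypted values without pairing them with their inputs. A minimal single-process sketch, under the same assumptions as before (toy safe prime, `h` as a squaring stand-in for the hash, illustrative function names):

```python
import secrets

P = 2579            # toy safe prime: P and Q = (P - 1) // 2 are both prime
Q = (P - 1) // 2

def h(value: str) -> int:
    """Stand-in for the hash h (assumption): injective map into quadratic residues."""
    n = int.from_bytes(value.encode(), "big")
    if not 1 <= n <= Q:
        raise ValueError("value too long for the toy modulus")
    return pow(n, 2, P)

def keygen() -> int:
    return secrets.randbelow(Q - 1) + 1

def f(key: int, x: int) -> int:
    return pow(x, key, P)

def intersection_size(R: set[str], S: set[str]) -> int:
    r, s = keygen(), keygen()

    fs_S = sorted(f(s, h(v)) for v in S)      # S -> R
    fr_R = sorted(f(r, h(v)) for v in R)      # R -> S
    # S -> R: f_s(f_r(R)), re-sorted so that no <y, f_s(y)> pairing survives
    fs_fr_R = sorted(f(s, y) for y in fr_R)

    fr_fs_S = {f(r, z) for z in fs_S}         # computed by R
    # R can count the matches but cannot tell which of its values matched.
    return len(fr_fs_S & set(fs_fr_R))

print(intersection_size({"a", "u", "v", "x"}, {"b", "u", "v", "y"}))
# prints 2
```

Re-sorting after the second encryption is what breaks the link between each input y and its output, which is the point the slide makes with "Not <y, fs(y)>".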

Page 21: Sovereign Information Integration

Equijoin Protocol: Intuition

R needs some extra information ext(v) for values $v \in R \cap S$.
– ext(v): information about the other attributes in S for those records where S.A = v

S has second secret key s'

For each value $v \in S$,
– S generates an encryption key $\kappa = f_{s'}(v)$, and
– encrypts ext(v) using encryption function K with key $\kappa$.

R learns $f_{s'}(v)$ only for $v \in R$.
– $f_r^{-1}(f_{s'}(f_r(v))) = f_r^{-1}(f_r(f_{s'}(v))) = f_{s'}(v)$
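The key-recovery identity on this slide can be exercised with a small single-process sketch. Several pieces are assumptions made for illustration and are not taken from the slides: the toy safe prime, the squaring stand-in for the hash `h`, the XOR keystream used for the encryption function `K`, and the exact message layout (triples from S plus a table of tagged ciphertexts). The inverse $f_r^{-1}$ is computed by raising to the inverse of r modulo q.

```python
import hashlib
import secrets

P = 2579            # toy safe prime: P and Q = (P - 1) // 2 are both prime
Q = (P - 1) // 2

def h(value: str) -> int:
    """Stand-in for the hash h (assumption): injective map into quadratic residues."""
    n = int.from_bytes(value.encode(), "big")
    if not 1 <= n <= Q:
        raise ValueError("value too long for the toy modulus")
    return pow(n, 2, P)

def keygen() -> int:
    return secrets.randbelow(Q - 1) + 1

def f(key: int, x: int) -> int:
    return pow(x, key, P)

def f_inv(key: int, y: int) -> int:
    """Undo f_key on the quadratic residues: raise to key^-1 mod Q."""
    return pow(y, pow(key, -1, Q), P)

def K(kappa: int, data: bytes) -> bytes:
    """Toy symmetric cipher (assumption): XOR with a SHAKE-256 keystream.
    Applying it twice with the same key recovers the data."""
    stream = hashlib.shake_256(str(kappa).encode()).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))

def equijoin(R: set[str], S: dict[str, str]) -> dict[str, str]:
    """S maps each join value v to ext(v); R is the receiver's value set."""
    r, s, s2 = keygen(), keygen(), keygen()   # s2 plays the role of s'

    # R -> S: f_r(h(R))
    y_of = {v: f(r, h(v)) for v in R}
    # S -> R: <y, f_s(y), f_s'(y)> for each y received
    triples = {y: (f(s, y), f(s2, y)) for y in sorted(y_of.values())}
    # S -> R: <f_s(h(v)), K(kappa, ext(v))> with kappa = f_s'(h(v))
    table = {f(s, h(v)): K(f(s2, h(v)), ext.encode()) for v, ext in S.items()}

    result = {}
    for v, y in y_of.items():
        fs_y, fs2_y = triples[y]
        tag = f_inv(r, fs_y)       # equals f_s(h(v))
        kappa = f_inv(r, fs2_y)    # equals f_s'(h(v)), the key for ext(v)
        if tag in table:           # v is in both R and S
            result[v] = K(kappa, table[tag]).decode()
    return result

print(equijoin({"a", "u", "v", "x"},
               {"b": "ext-b", "u": "ext-u", "v": "ext-v", "y": "ext-y"}))
```

R can derive the key only for values it already holds, so the ciphertexts for b and y stay unreadable to it (up to the weakness of the toy parameters).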

Page 22: Sovereign Information Integration

Equi Join and Join Size

See SIGMOD03 paper
– Also gives the correctness proofs as well as the cost analysis of protocols

Page 23: Sovereign Information Integration

Related Work

[Naor & Pinkas 99]: Two protocols for list intersection problem
– Oblivious evaluation of n polynomials of degree n each.
– Oblivious evaluation of $n^2$ linear polynomials.

[Huberman et al 99]: find people with common preferences, without revealing the preferences.
– Intersection protocols are similar

[Clifton et al, 2003]: Secure set union and set intersection
– Similar protocols

Page 24: Sovereign Information Integration

Summary and Challenges

New applications require us to go beyond traditional centralized and federated information integration: sovereign information integration

Need models of minimal disclosure and corresponding protocols for
– other database operations
– combination of operations

Need faster protocols

Need further study of tradeoff between efficiency and
– additional information disclosed
– approximation

Page 25: Sovereign Information Integration

Privacy Preserving Data Mining

[Figure: individual records (e.g., 50 | 40K | ..., 30 | 70K | ...) pass through a Randomizer and become randomized records (e.g., 65 | 20K | ..., 25 | 60K | ...; an age of 30 becomes 30+35). The distributions of Age and Salary are reconstructed and fed to Data Mining Algorithms, which produce a Data Mining Model.]

[Charts: two plots comparing Original, Randomized, and Reconstructed series; one is plotted against Randomization Level.]

Insight: Preserve privacy at the individual level, while still building accurate data mining models at the aggregate level.

Add random noise to individual values to protect privacy.

EM algorithm to estimate original distribution of values given randomized values + randomization function.

Algorithms for building classification models and discovering association rules on top of privacy-preserved data with only small loss of accuracy.
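The randomize-then-reconstruct idea can be illustrated with a small sketch. It is an assumption-laden toy, not the algorithm from the cited work: noise is uniform over integers in [-width, width], values are integers on a known support, and the update is a simple iterative Bayesian re-estimation in the EM style. The function names and the sample data are hypothetical.

```python
import random

def randomize(values, width, rng):
    """Add integer noise drawn uniformly from [-width, width] to each value."""
    return [v + rng.randint(-width, width) for v in values]

def reconstruct(observed, support, width, iterations=50):
    """Iteratively re-estimate P(X = a) for a in support from the noisy values."""
    noise_p = 1.0 / (2 * width + 1)            # uniform noise probability
    est = {a: 1.0 / len(support) for a in support}
    for _ in range(iterations):
        new = {a: 0.0 for a in support}
        for w in observed:
            # Posterior over the original value a, given the observation w.
            post = {a: noise_p * est[a] for a in support if abs(w - a) <= width}
            total = sum(post.values())
            for a, p in post.items():
                new[a] += p / total
        est = {a: new[a] / len(observed) for a in support}
    return est

rng = random.Random(0)
ages = [rng.choice([25, 30, 30, 35, 50]) for _ in range(500)]   # made-up data
noisy = randomize(ages, width=10, rng=rng)
estimate = reconstruct(noisy, support=range(20, 61), width=10)

# The three most probable ages according to the reconstruction.
print(sorted(estimate, key=estimate.get, reverse=True)[:3])
```

Only the noisy values and the noise distribution are used by `reconstruct`; the individual ages are never consulted, which mirrors the insight stated on the slide.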

Page 26: Sovereign Information Integration

Hippocratic Database

[Figure: architecture diagram. Labels: Privacy Policy, Data Collection, Queries, Other, Store, Privacy Metadata, Privacy Metadata Creator, Privacy Constraint Validator, Data Accuracy Analyzer, Data Collection Analyzer, Data Retention Manager, Query Intrusion Detector, Attribute Access Control, Record Access Control, Encryption Support, Audit Info, Audit Trail.]

#   Name      Age   Phone
1   Adams     10    111-1111
3   -         -     333-3333
4   Daniels   40    -

[Chart: Query Execution Time (seconds) versus Application Selectivity (0.01, 0.1, 0.2, 0.5, 1) for Original Queries and Rewritten Queries. Table Size: 10 million, no index.]

Vision: Database systems that take responsibility for the privacy of data they manage, while not impeding the flow of information.

Architectural principles derived from current privacy legislation.