confidential computing - analysing data without seeing data

www.csiro.au Data Analytics WITHOUT Seeing the Data Max Ott … with input from the entire N1 Team max.ott@data61.csiro.au

Upload: maximilian-ott

Post on 15-Apr-2017


Category:

Data & Analytics



TRANSCRIPT

Page 1: Confidential Computing - Analysing Data Without Seeing Data

www.csiro.au

Data Analytics WITHOUT Seeing the Data

Max Ott … with input from the entire N1 Team

max.ott@data61.csiro.au

Page 2: Confidential Computing - Analysing Data Without Seeing Data

Future Value of Data

Data Analytics Without Seeing the Data 2|

[Figure: plot of value against time, with the release point marked]

Data decays with time!

Page 3: Confidential Computing - Analysing Data Without Seeing Data

Future Value of Data

[Figure: plot of value against time, with the release point marked]

Joined with another data set – more value!!

Page 4: Confidential Computing - Analysing Data Without Seeing Data

Future Value of Data

[Figure: plot of value against time, with the release point marked]

New analytics techniques – more value!!

Page 5: Confidential Computing - Analysing Data Without Seeing Data

Future Value of Data

[Figure: plot of value against time, with the release point marked]

Data decay

+ Joining new data

+ New analytics techniques

Uncertain future value

Unknown future risk

Page 6: Confidential Computing - Analysing Data Without Seeing Data

Challenge

Computation

Result

Confidential

Learn this!

Learn NOTHING

Page 7: Confidential Computing - Analysing Data Without Seeing Data

The Problem

How can we learn valuable insights from sensitive data from multiple organisations?

Insights

Sensitive data

Sensitive data

Joint Analysis

Confidential

Confidential

Page 8: Confidential Computing - Analysing Data Without Seeing Data

Three Basic Building Blocks

• Private computation
  • Arithmetic on encrypted numbers

• Distributed, confidential analytics
  • Distributed algorithms, computation & protocols

• Private Record Linkage
  • Privacy preserving record level matching

Page 9: Confidential Computing - Analysing Data Without Seeing Data

Solution (1): Private computation

[Figure: addition on encrypted numbers. E = encrypt, D = decrypt; long ciphertexts abbreviated.]

3 → E → 7117593598749643033862322306020184392520…

2 → E → 6553537132809459595364742532851158563479…

3 + 2 = 5 on plaintexts; "+" on the two ciphertexts gives

9536474253285115856347911583777971856270… → D → 5


Page 11: Confidential Computing - Analysing Data Without Seeing Data

Solution (2): Distributed analytics

[Diagram labels: Dept 1 (Data, Compute); Dept 2 (Data, Compute); N1 Coordinator (Compute). Legend: Secure compute, Confidentiality boundary]

Data always remains confidential to the source institution

Messages containing encrypted data

Page 12: Confidential Computing - Analysing Data Without Seeing Data

Solution (3): Private Record Linkage

Dataset A, Dataset B

Tori Mckone 7/06/1921 F

Tori Mackon 6/07/1921 F

Victoria Mckon 7/06/1921 F ?

?

Page 13: Confidential Computing - Analysing Data Without Seeing Data

Use Cases

Page 14: Confidential Computing - Analysing Data Without Seeing Data

Scoring

Model

Own Data

Other Data

Quality ??

Page 15: Confidential Computing - Analysing Data Without Seeing Data

Suspicious Activities

Need to report?

Model Builder

Page 16: Confidential Computing - Analysing Data Without Seeing Data

Industry using Gov Data

Model Builder

Own Data

Gov Data

Page 17: Confidential Computing - Analysing Data Without Seeing Data

Benchmarking

Own Data

Model Builder

Page 18: Confidential Computing - Analysing Data Without Seeing Data

Device Analytics

Model of normal behaviour

OK OK NG OK

Private Modeling

learn

deploy

OK NG OK

Page 19: Confidential Computing - Analysing Data Without Seeing Data

Private Computation

Page 20: Confidential Computing - Analysing Data Without Seeing Data

Homomorphic encryption

Partial Homomorphic Encryption: allows either addition or multiplication of encrypted numbers

Somewhat Homomorphic Encryption: allows evaluation of low order polynomials

Fully Homomorphic Encryption: allows evaluation of arbitrary functions

[Arrows: "More general" towards fully homomorphic encryption; "Faster" towards partial homomorphic encryption]

Page 21: Confidential Computing - Analysing Data Without Seeing Data

Paillier Encryption

Encryption of m:

$c = g^m r^n \bmod n^2$

Addition of encrypted numbers:

$D\left(E(m_1) \cdot E(m_2) \bmod n^2\right) = m_1 + m_2 \bmod n$

Multiplication of encrypted number by a scalar:

$D\left(E(m_1)^{m_2} \bmod n^2\right) = m_1 m_2 \bmod n$

Page 22: Confidential Computing - Analysing Data Without Seeing Data

Paillier Encryption

Encryption of m:

$c = g^m r^n \bmod n^2$

Addition of encrypted numbers:

$g^{m_1} \times g^{m_2} = g^{m_1 + m_2}$

Multiplication of encrypted number by a scalar:

$\left(g^{m_1}\right)^{m_2} = g^{m_1 m_2}$
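The two identities on this slide can be checked with a toy Paillier implementation. What follows is a minimal sketch, not the code in the libraries the deck mentions: the primes 293 and 433, the choice g = n + 1, and all function names are assumptions made for illustration, and keys this small offer no security.

```python
import math
import random

def keygen(p=293, q=433):
    """Toy key pair from two tiny primes (assumed values; far too small to be secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    g = n + 1                 # a common simple choice for g (assumption)
    mu = pow(lam, -1, n)      # modular inverse of lam; valid when g = n + 1
    return (n, g), (lam, mu)

def encrypt(pub, m):
    """c = g^m * r^n mod n^2 for a random r coprime to n."""
    n, g = pub
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(pub, priv, c):
    """m = L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) // n."""
    n, _ = pub
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def add_encrypted(pub, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    n, _ = pub
    return (c1 * c2) % (n * n)

def mul_scalar(pub, c, k):
    """Raising a ciphertext to a plain scalar multiplies the plaintext by it."""
    n, _ = pub
    return pow(c, k, n * n)

pub, priv = keygen()
c3, c2 = encrypt(pub, 3), encrypt(pub, 2)
print(decrypt(pub, priv, add_encrypted(pub, c3, c2)))  # decrypts the sum 3 + 2
print(decrypt(pub, priv, mul_scalar(pub, c3, 4)))      # decrypts the product 3 * 4
```

The random factor r^n disappears on decryption, which is why two encryptions of the same number look different yet still add correctly.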

Page 23: Confidential Computing - Analysing Data Without Seeing Data

Paillier Implementations

• Python – open source
  • www.github.com/nicta/python-paillier

• Java – open source
  • www.github.com/nicta/javallier

• Javascript – still under closed development

Page 24: Confidential Computing - Analysing Data Without Seeing Data

Distributed, Confidential Analytics

Page 25: Confidential Computing - Analysing Data Without Seeing Data

Distributed Computing with a Twist

[Diagram labels: Org 1 (Data, Compute); Org 2 (Data, Compute); N1 Coordinator (Compute). Legend: Secure compute, Confidentiality boundary]

Data always remains confidential to the source organisation

Messages containing ONLY encrypted data
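As a rough illustration of "messages containing ONLY encrypted data", the sketch below has two organisations send Paillier-encrypted local sums to a coordinator, which combines them without decrypting. It is not the N1 protocol: the JSON field names, the split of roles, and the tiny toy key (primes 293 and 433, g = n + 1) are all assumptions made for illustration.

```python
import json
import math
import random

# Compact toy Paillier with fixed tiny primes (assumed values; insecure).
P, Q = 293, 433
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # private: lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)                               # private: inverse of LAM mod N

def encrypt(m):
    """Public-key operation: c = (N + 1)^m * r^N mod N^2."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2

def decrypt(c):
    """Private-key operation, run only by the key holder."""
    return ((pow(c, LAM, N2) - 1) // N * MU) % N

def org_message(org_name, local_values):
    """An organisation encrypts its local sum; raw values never leave it."""
    return json.dumps({"from": org_name, "enc_sum": encrypt(sum(local_values))})

def coordinator_aggregate(messages):
    """The coordinator multiplies ciphertexts (adding plaintexts) without decrypting."""
    total = 1  # 1 is a valid encryption of 0
    for raw in messages:
        total = (total * json.loads(raw)["enc_sum"]) % N2
    return json.dumps({"from": "coordinator", "enc_total": total})

messages = [org_message("Org1", [10, 20, 30]), org_message("Org2", [5, 7])]
reply = coordinator_aggregate(messages)
print(decrypt(json.loads(reply)["enc_total"]))  # the key holder recovers the joint sum
```

In this sketch only the party holding LAM and MU can read the aggregate; the coordinator handles ciphertexts and never sees either organisation's values.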

Page 26: Confidential Computing - Analysing Data Without Seeing Data

Graph Computation Engine

[Diagram labels: Domains, Coordinator, Worker, Workers, Properties, Messages; nodes marked CE, DF and M]

Legend:

M: JSON Message

CE: AKKA actors

DF: Data frames

Page 27: Confidential Computing - Analysing Data Without Seeing Data

N1 Analytics Platform

Privacy Technologies: Partial homomorphic encryption, Private Record Linkage, Irreversible aggregation

Distributed Graph Computation Engine

Analytics: Statistics, Regression, Clustering

Machine Learning: Learn, Evaluate, Deploy

Data, Auth, Network

Page 28: Confidential Computing - Analysing Data Without Seeing Data

Logistic Regression

Logistic function

$p(x;\theta) = \frac{1}{1 + e^{-\theta \cdot x}}$

Log likelihood

$L(\theta) = \sum_{i=0}^{n} \left[ y_i \log p(x_i;\theta) + (1 - y_i) \log\left(1 - p(x_i;\theta)\right) \right]$

Minimise for: $\theta$

Evaluate:

Requires "secure log" and "secure inverse" protocol using Paillier encryption

Builds on Han et al. 2010 "Privacy Preserving Gradient Descent Methods"
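For reference, the same two formulas can be written out on plain, unencrypted numbers. This is only a sketch of ordinary logistic regression fitted by gradient steps; it does not implement the "secure log" or "secure inverse" protocols, and the function names, learning rate, step count and toy data are assumptions. It minimises the negative of L(θ), which is the same as maximising L(θ).

```python
import math

def p(x, theta):
    """Logistic function p(x; theta) = 1 / (1 + exp(-theta . x))."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(xs, ys, theta):
    """L(theta) = sum_i y_i log p_i + (1 - y_i) log(1 - p_i)."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, y in zip(xs, ys):
        pi = p(x, theta)
        total += y * math.log(pi + eps) + (1 - y) * math.log(1 - pi + eps)
    return total

def fit(xs, ys, lr=0.1, steps=500):
    """Plain gradient steps that increase L(theta), i.e. decrease -L(theta)."""
    theta = [0.0] * len(xs[0])
    for _ in range(steps):
        grad = [0.0] * len(theta)
        for x, y in zip(xs, ys):
            err = y - p(x, theta)          # dL/dtheta_j = sum_i (y_i - p_i) * x_ij
            for j, xj in enumerate(x):
                grad[j] += err * xj
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta

# Toy data: first component is a constant bias term.
xs = [[1.0, 0.5], [1.0, 1.0], [1.0, 3.0], [1.0, 3.5]]
ys = [0, 0, 1, 1]
theta = fit(xs, ys)
print(theta, log_likelihood(xs, ys, theta))
```

The log in `log_likelihood` and the division inside `p` are the two operations that partial homomorphic encryption cannot evaluate directly, which is where the slide's secure protocols would come in.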

Page 29: Confidential Computing - Analysing Data Without Seeing Data

Example Paillier Logistic Regression

[Diagram labels: Org A, Org B, N1Analytics, Coordinator, Worker, Secure Log, Logistic Learner, Secure Inverse, Gradient Descent, Private key holder, Features & labels, Features]

Legend:

M: JSON Message

CE: AKKA actors

DF: Data frames

Page 30: Confidential Computing - Analysing Data Without Seeing Data

Performance

• Learning
  • Learnt models have the same accuracy as unencrypted calculations
  • "Private learning" is (1000x) slower due to encrypted computations. Learning times are several hours.

• Deployment
  • A score can be generated in real time (<50 ms)
  • Customer data that contributes to the score remains private.

Page 31: Confidential Computing - Analysing Data Without Seeing Data

Scaling

[Diagram: a Coordinator, Data Provider 1 and Data Provider 2, with many Workers]

[Chart: Learning time scaling. x-axis: Cores (0 to 400); y-axis: Minutes (ticks at 5, 10, 50, 100, 500). Series: ● 10,000x10 features; ■ 100,000x10 features; ◆ 1,000,000x10 features]

Page 32: Confidential Computing - Analysing Data Without Seeing Data

Confidential Record Linkage

Page 33: Confidential Computing - Analysing Data Without Seeing Data

Record Linkage Challenge

Dataset A, Dataset B

Tori Mckone 7/06/1921 F

Tori Mackon 6/07/1921 F

Victoria Mckon 7/06/1921 F ?

?

Page 34: Confidential Computing - Analysing Data Without Seeing Data

Solution (3): Private Record Linkage

Each side applies one way hash functions to its names; fuzzy matching is done on the hashes.

Name (A)      Hash (A)    Hash (B)    Name (B)
Jane Doe      a8bf342     a8bf242     Janet Doe
Paul Doe      f72630b     b3894f3     Bob Doe
Jim Clark     14oe54      14oe54      Jim Clark
Kate Clark    a72bef4     672bef4     Kat Clark
Shan Bo       7830530     7830530     Shan Bo
Reg Pal       4bf6021     80ac364     Joe Smith

Page 35: Confidential Computing - Analysing Data Without Seeing Data

Private Record Linkage

[Diagram labels: Company A (Personally Identifiable Information → Hasher → Anonymous Bloom filter); Company B (Personally Identifiable Information → Hasher → Anonymous Bloom filter); Shared Secret Salt; N1 (Fuzzy Matcher, Linkage Table)]

PII cannot be recovered from the hashes
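The hasher and fuzzy matcher can be sketched roughly as follows. The slide names only a shared secret salt, a hasher, anonymous Bloom filters and a fuzzy matcher; everything else here is an assumption for illustration: character bigrams as the tokens, HMAC-SHA256 keyed with the shared secret, a 256-bit filter with 4 hash functions, and the Dice coefficient as the similarity score.

```python
import hashlib
import hmac

def bigrams(text):
    """Character bigrams of the lower-cased text, padded at both ends."""
    s = "_" + text.lower().replace(" ", "") + "_"
    return {s[i:i + 2] for i in range(len(s) - 1)}

def bloom_encode(text, secret, size=256, num_hashes=4):
    """Set of Bloom filter bit positions for the text, keyed by the shared secret."""
    bits = set()
    for gram in bigrams(text):
        for k in range(num_hashes):
            digest = hmac.new(secret, f"{k}:{gram}".encode(), hashlib.sha256).digest()
            bits.add(int.from_bytes(digest[:8], "big") % size)
    return bits

def dice(a, b):
    """Dice coefficient of two bit sets: 1.0 for identical filters."""
    return 2 * len(a & b) / (len(a) + len(b))

secret = b"shared-secret-salt"   # hypothetical value agreed by both companies
a = bloom_encode("Tori Mckone", secret)
b = bloom_encode("Tori Mackon", secret)
c = bloom_encode("Joe Smith", secret)
print(dice(a, b), dice(a, c))    # similar names should score higher than unrelated ones
```

Each company would run `bloom_encode` locally and send only the bit sets to the matcher, which compares them with `dice` and never sees the names.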

Page 36: Confidential Computing - Analysing Data Without Seeing Data

Private Record Linkage

[Diagram labels: Organisation A, Organisation B, N1Analytics, Shared Secret Salt, Hasher, Hasher, Fuzzy Matcher]

A's PII data
Name            DOB        Gender
John Smith      12/01/82   M
Mark Gorgon     1/12/90    M
Hanna Smith     4/02/78    F
…               …          …
Juliet Baker    2/11/72    F

B's PII data
Name            DOB        Gender
Mark Gorgon     1/12/90    M
Juliet Baker    2/11/72    F
Andrew Roberts  4/02/93    M
…               …          …
Hanna Smith     4/02/78    F

A's Cryptographic Hashes
Row       Key
1         10110110...00101010
2         01110110...11010101
3         10011001...10100110
…         …
100000    01101011...00101101

B's Cryptographic Hashes
Row       Key
1         01110110…11010101
2         01101011...00101101
3         01111000…00110011
…         …
100000    10011101...10100111

Linkage Table
Row A     Row B
1         X
2         1
3         100000
…         …
100000    X

Similar in approach to MERLIN - Ranbaduge, Vatsalan, Christen (2015)

Page 37: Confidential Computing - Analysing Data Without Seeing Data

Probabilistic Record Linkage

Common categorical features (e.g. postcode, age range, gender)

Record linkage can be a privacy issue

Page 38: Confidential Computing - Analysing Data Without Seeing Data

Classification without identity linking

[Diagram labels: Features, Labels, Rados, Features, Shared feature, Labels*, Label Proportions]

Learning from Label Proportions

Patrini, Nock, Caetano, & Rivera, NIPS (2014), (Almost) No label no cry

Page 39: Confidential Computing - Analysing Data Without Seeing Data

Classification without identity linking

[Diagram labels: Features, Labels, Rados, Features, Shared feature, Labels*, Encrypted Label Proportions]

Learning from Encrypted Label Proportions

Page 40: Confidential Computing - Analysing Data Without Seeing Data

Current Status

Page 41: Confidential Computing - Analysing Data Without Seeing Data

Current Capabilities of N1 platform

• Standard data analytics techniques on confidential data:
  • Correlation analysis
  • Classification / prediction
  • Regression
  • Clustering / outlier detection

• Automated private record linkage

• Fine grained authorisation and access control

[Diagram labels: Dept 1, Org 2, Comp 3; Private record linkage; Private analytics: Statistics, Classifiers, Anomaly Detection]

Federated model – No central database. Data is kept local to the source

Page 42: Confidential Computing - Analysing Data Without Seeing Data

Beta program

• Not open sourced (yet!)

• Looking for partners who want to use our system in their applications

• Still some warts, but working in commercial setting

Page 43: Confidential Computing - Analysing Data Without Seeing Data

Acknowledgements

Engineering: Mr. Brian Thorne, Dr. Mentari Djatmiko, Dr. Guillaume Smith, Dr. Wilko Hanecka, Dr. Hamish Ivey-Law

Research: Dr. Richard Nock, Mr. Giorgio Patrini, Dr. Roksana Borelli, Dr. Arik Friedman, Prof. Hugh Durrant-Whyte

Business: Mr. Warren Bradey, Ms. Shelley Copsey

Lead: Dr. Stephen Hardy

Page 44: Confidential Computing - Analysing Data Without Seeing Data

www.csiro.au

Data Analytics Without Seeing the Data

Max Ott … with input from the entire N1 Team

max.ott@data61.csiro.au