Opaque: An Oblivious and Encrypted Distributed Analytics Platform Raluca Ada Popa Joint work with: Wenting Zheng, Ankur Dave, Jethro Beekman, Joseph Gonzalez, and Ion Stoica UC Berkeley [NSDI’17]


Page 1: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Raluca Ada Popa

Joint work with: Wenting Zheng, Ankur Dave, Jethro Beekman, Joseph Gonzalez, and Ion Stoica

UC Berkeley

[NSDI’17]

Page 4

Complex analytics run on sensitive data

[Diagram labels: client; Spark SQL, MLlib, GraphX, Spark Streaming; cloud provider; sensitive data]

Page 10

Cloud attackers

[Diagram labels: client; cloud provider; sensitive data]

Page 11

Threat model

Attacker has full access to all cloud software

Page 13

How to protect data and computation while preserving functionality?

relational algebra

Page 18

Cryptographic approaches

• Generic functionality: fully homomorphic encryption [RAD’78, Gentry’09], ObliVM: too slow

• Specialized solutions: CryptDB, Cipherbase, Monomi, [..], BlindSeer, [FJK+15], Arx: restricted functionality

Alternative: hardware enclaves

Page 19

Hardware enclaves 101

Page 24

Hardware enclaves (Intel SGX)

• Hardware-enforced isolated execution environment

• Data decrypted only on the processor

• Protects against an attacker who has root access or has compromised the OS

[Diagram labels: memory; on die: core, cache, MEE]

Page 33

Remote attestation

Enables verifying which code runs in the enclave and performing key exchange

[Diagram labels: Client; Server; untrusted OS; enclave; client code; hash]
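
The attestation idea on this slide (the client checks a hash of the code loaded into the enclave) can be sketched as a toy simulation. This is a minimal sketch under loud assumptions: an HMAC under a shared `HARDWARE_KEY` stands in for the processor's attestation signature, which in real SGX is an asymmetric signature checked through an attestation service. The names `measure`, `enclave_quote`, and `client_verify` are hypothetical, and the key exchange step is omitted.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

# Assumption: stands in for the secret the hardware uses to sign measurements.
HARDWARE_KEY = b"toy-hardware-key"


def measure(code: bytes) -> bytes:
    """Hash of the code loaded into the enclave (the 'hash' label on the slide)."""
    return hashlib.sha256(code).digest()


def enclave_quote(loaded_code: bytes, nonce: bytes) -> Tuple[bytes, bytes]:
    """Hardware side: report the measurement, bound to the client's fresh nonce."""
    measurement = measure(loaded_code)
    tag = hmac.new(HARDWARE_KEY, measurement + nonce, hashlib.sha256).digest()
    return measurement, tag


def client_verify(expected_code: bytes, nonce: bytes, quote: Tuple[bytes, bytes]) -> bool:
    """Client side: accept only a genuine quote over the expected code."""
    measurement, tag = quote
    expected_tag = hmac.new(HARDWARE_KEY, measurement + nonce, hashlib.sha256).digest()
    genuine = hmac.compare_digest(tag, expected_tag)
    right_code = hmac.compare_digest(measurement, measure(expected_code))
    return genuine and right_code


client_code = b"hypothetical enclave code"
nonce = secrets.token_bytes(16)
print(client_verify(client_code, nonce, enclave_quote(client_code, nonce)))      # True
print(client_verify(client_code, nonce, enclave_quote(b"other code", nonce)))    # False
```

The nonce keeps an untrusted OS from replaying an old quote; only after the check succeeds would the client send keys or data to the enclave.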

Page 36

Assumption

Attacker is restricted to software attacks only, and does not exploit timing. Attacker controls the software stack.

• For the implementation, the attacker does not see accesses to a small memory region (due to T-SGX, page pinning, super pages, …)

[Diagram labels: machine; processor; memory]

Page 39

Enclave-based systems

• Systems supporting relational algebra: Haven, Scone
  • not distributed
  • data access pattern leakage [XCP ’15, OCFGKS ’15]

• Distributed systems: VC3, M2R, Ohrimenko et al. ’16: do not enable relational algebra and query planning

Page 40

Access pattern leakage

Page 43

Access patterns

[Diagram labels: machine 0 (processor, memory, addresses); network messages; machine 1]

Page 46

Example: network access pattern leakage

ID     Name               Age  Disease
12809  Amanda D. Edwards  40   Diabetes
29489  Robert R. McGowan  56   Diabetes
13744  Kimberly R. Seay   51   Cancer
18740  Dennis G. Bates    32   Diabetes
98329  Ronald S. Ogden    53   Cancer
32591  Donna R. Bridges   26   Diabetes

SELECT count(*) FROM medical GROUP BY disease

Page 53

Example: network access pattern leakage

12809 … Diabetes
29489 … Diabetes
13744 … Cancer
18740 … Diabetes
98329 … Cancer
32591 … Diabetes

Public information: diabetes is twice as common as cancer


Page 62

Example: network access pattern leakage

12809 … Diabetes
29489 … Diabetes
13744 … Cancer
18740 … Diabetes
98329 … Cancer
32591 … Diabetes

??? Diabetes
??? Diabetes
??? Cancer
??? Diabetes
??? Cancer
??? Diabetes
??? Cancer

Page 63

Example: network access pattern leakage

12809 … Diabetes
29489 … Diabetes
13744 … Cancer
18740 … Diabetes
98329 … Cancer
32591 … Diabetes

Learns that Alice has cancer
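
The leakage in this example can be reproduced with a small simulation. This is a sketch, not the talk's own code: `shuffle_destinations` and `attacker_guess` are hypothetical names, the first-seen reducer assignment is a deterministic stand-in for hash partitioning, and Alice's row ID (55555) is made up for illustration. The attacker never sees row contents, only which reducer each encrypted row is sent to.

```python
from collections import Counter

# Rows from the slide, reduced to (ID, disease); contents are encrypted in transit.
rows = [
    (12809, "Diabetes"), (29489, "Diabetes"), (13744, "Cancer"),
    (18740, "Diabetes"), (98329, "Cancer"), (32591, "Diabetes"),
]


def shuffle_destinations(rows, num_reducers=2):
    """Non-oblivious GROUP BY shuffle: a row's destination depends on its group key."""
    reducer_of = {}
    destinations = []
    for _row_id, disease in rows:
        # Assumption: first-seen assignment stands in for hash(key) % num_reducers.
        reducer = reducer_of.setdefault(disease, len(reducer_of) % num_reducers)
        destinations.append(reducer)
    return destinations


def attacker_guess(destinations):
    """Uses only traffic counts plus the public prior (diabetes is more common)."""
    traffic = Counter(destinations)
    busiest = max(traffic, key=traffic.get)
    return ["Diabetes" if d == busiest else "Cancer" for d in destinations]


print(attacker_guess(shuffle_destinations(rows)))

# A new encrypted row for Alice arrives (hypothetical ID); its destination gives it away.
with_alice = rows + [(55555, "Cancer")]
print(attacker_guess(shuffle_destinations(with_alice))[-1])
```

An oblivious shuffle would have to make the destination sequence look the same regardless of the disease column.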

Page 64

Leakage from prior work

• Memory access pattern attacks [XCP15] extracted complete text documents and photo outlines

• Network access pattern attacks [OCF+15] extracted age, gender, and address of individuals

Page 66

Goal: oblivious distributed analytics

access patterns are independent of data content

Page 67

Opaque*: oblivious and encrypted distributed analytics platform

* Oblivious Platform for Analytic QUEries

[Diagram labels: SQL, ML, Graph Analytics; Opaque; Spark SQL]

Page 74

Security guarantees (informal)

• Data encryption and authentication

• Computation integrity: the client can check that the computation result was not affected by an attacker

• Obliviousness: the memory and network accesses of a query are the same for any two inputs with the same size characteristics (input/outputs)
  • When padding is enabled, Opaque hides output sizes as well
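
The obliviousness property can be illustrated by recording which positions an algorithm touches and comparing the traces for two same-size inputs. This is a sketch of the definition only; it is not one of Opaque's operators, which this part of the talk does not describe. It uses odd-even transposition sort as a simple data-independent sorting network and insertion sort as a data-dependent contrast. It models index-level accesses only, not timing or branch behaviour inside `min`/`max`.

```python
def oblivious_sort(values):
    """Odd-even transposition sort: compared index pairs depend only on len(values)."""
    data = list(values)
    trace = []
    n = len(data)
    for round_no in range(n):
        for i in range(round_no % 2, n - 1, 2):
            trace.append((i, i + 1))
            lo, hi = min(data[i], data[i + 1]), max(data[i], data[i + 1])
            data[i], data[i + 1] = lo, hi  # both slots are rewritten every time
    return data, trace


def insertion_sort(values):
    """Data-dependent contrast: the number and position of swaps reveal the input order."""
    data = list(values)
    trace = []
    for i in range(1, len(data)):
        j = i
        while j > 0 and data[j - 1] > data[j]:
            trace.append((j - 1, j))
            data[j - 1], data[j] = data[j], data[j - 1]
            j -= 1
    return data, trace


a, b = [3, 1, 2], [1, 2, 3]
print(oblivious_sort(a)[1] == oblivious_sort(b)[1])   # True: same trace for both inputs
print(insertion_sort(a)[1] == insertion_sort(b)[1])   # False: trace depends on the data
```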

Page 76

Achieving practical obliviousness is not easy

Obliviousness typically comes with high overheads

• For example, the state-of-the-art system, ObliVM, is six orders of magnitude slower than regular computation

Page 83

Opaque components

[Diagram labels, in the order the slides introduce them:]

• Data encryption and authentication

• Computation verification

• Distributed oblivious operators: Oblivious Filter, Oblivious Aggregation, Oblivious Join

• Oblivious query planning: rule-based opt., cost-based opt., cost model

Page 84: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Query execution

Client Server

Database

Scheduler

1 2 3

Page 85: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Query execution

Spark Driver

Opaque

Catalyst

Client Server

Database

Scheduler

1 2 3

Page 86: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Query execution

Spark Driver

Opaque

Catalyst

Client Server

Database

Scheduler

1 2 3query = SELECT sum(*) FROM table

Page 87: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Query execution

Spark Driver

Opaque

Catalyst

Client Server

Database

Scheduler

1 2 3

Query

query = SELECT sum(*) FROM table

Page 88: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Query execution

Spark Driver

Opaque

Catalyst

Client Server

Database

Scheduler

1 2 3

Query

query = SELECT sum(*) FROM table

Page 89: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Query execution

Spark Driver

Opaque

Catalyst

Client Server

Database

Scheduler

1 2 3query = SELECT sum(*) FROM table

Page 90: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Query execution

Spark Driver

Opaque

Catalyst

Client Server

Database

Scheduler

1 2 3query = SELECT sum(*) FROM table

Page 91: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Query execution

Spark Driver

Opaque

Catalyst

Client Server

Database

Scheduler

1 2 3query = SELECT sum(*) FROM table

Page 92: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Query execution

Spark Driver

Opaque

Catalyst

Client Server

Database

Scheduler

1 2 3query = SELECT sum(*) FROM table

Page 93: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Query execution

Spark Driver

Opaque

Catalyst

Client Server

Database

Scheduler

1 2 3

query = SELECT sum(*) FROM table

Page 94: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Query execution

Spark Driver

Opaque

Catalyst

Client Server

Database

Scheduler

query = SELECT sum(*) FROM table

10 13 4

Page 95: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Query execution

Spark Driver

Opaque

Catalyst

Client Server

Database

Scheduler

query = SELECT sum(*) FROM table

10

13

4

Page 96: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Query execution

Spark Driver

Opaque

Catalyst

Client Server

Database

Scheduler

query = SELECT sum(*) FROM table

27
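A minimal sketch of the flow on these slides, assuming each partition holds plain integers: every task sums its own partition, and the partial sums are then combined. The partition contents and the names `partial_sum` and `run_query` are hypothetical, chosen only so that the partial results match the slides' 10, 13 and 4.

```python
def partial_sum(partition):
    """One task: aggregate a single partition locally."""
    return sum(partition)


def run_query(partitions):
    """Run one task per partition, then combine the partial sums."""
    partials = [partial_sum(p) for p in partitions]
    return partials, sum(partials)


# Hypothetical rows for partitions 1, 2 and 3.
partitions = [[4, 6], [13], [1, 3]]
partials, total = run_query(partitions)
print(partials, total)
```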

Page 98: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Problem: cloud can alter distributed computation

Page 99: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Problem: cloud can alter distributed computation

• Drop data

Page 100: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Problem: cloud can alter distributed computation

• Drop data

• Modify data

Page 101: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Problem: cloud can alter distributed computation

• Drop data

• Modify data

• Skip task

Page 102: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Problem: cloud can alter distributed computation

• Drop data

• Modify data

• Skip task

• Replay old state

Page 103: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Example: drop data

Spark Driver

Opaque

Catalyst

Client Server

Database

Scheduler

1 2 3

query = SELECT sum(*) FROM table

Page 104: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Example: drop data

Spark Driver

Opaque

Catalyst

Client Server

Database

Scheduler

1 2 3

query = SELECT sum(*) FROM table

Page 105: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Example: drop data

Spark Driver

Opaque

Catalyst

Client Server

Database

Scheduler

query = SELECT sum(*) FROM table

10 13 4

Page 106: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Example: drop data

Spark Driver

Opaque

Catalyst

Client Server

Database

Scheduler

query = SELECT sum(*) FROM table

10

13

Page 107: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Example: drop data

Spark Driver

Opaque

Catalyst

Client Server

Database

Scheduler

query = SELECT sum(*) FROM table

23
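The attack can be illustrated with the same numbers. This sketch assumes the partial results travel as plain values with no integrity protection; `combine` and the task names are made up for illustration.

```python
def combine(partials):
    """Final task: add up whatever partial sums actually arrive."""
    return sum(partials.values())


partials = {"task 1": 10, "task 2": 13, "task 3": 4}
honest = combine(partials)

# A malicious cloud silently drops the third partial result.
delivered = {name: value for name, value in partials.items() if name != "task 3"}
tampered = combine(delivered)

print(honest, tampered)
```

Nothing in the final task can tell the two runs apart, which is why the client receives 23 without any error.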

Page 109: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Self-verifying computation

Invariant: if computation does not abort, the execution completed so far is correct

Page 111: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Self-verifying computation

Invariant: if computation does not abort, the execution completed so far is correct

If the computation is complete, then the entire query was executed correctly

Page 112: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Self-verifying computation

Task 13

Task 14

Task 15

Task 20

query = SELECT sum(*) FROM table

Page 113: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Self-verifying computation

20

14 13 15

Task 13

Task 14

Task 15

Task 20

query = SELECT sum(*) FROM table

Page 115: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Self-verifying computation

20

14 13 15

10

13

4

Task 13

Task 14

Task 15

Task 20

query = SELECT sum(*) FROM table

Page 116: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Self-verifying computation

20

14 13 15

10 13

4

Task 13

Task 14

Task 15

Task 20

query = SELECT sum(*) FROM table
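One way the invariant could be enforced is sketched below. This is not Opaque's actual protocol: it assumes every task runs inside an enclave sharing a secret MAC key, that each output is tagged with the job and task it came from, and that the consuming task knows which task IDs it depends on. `KEY`, `emit`, `final_task` and `Abort` are hypothetical names.

```python
import hashlib
import hmac

KEY = b"hypothetical-key-shared-by-enclaves"  # assumption: provisioned after attestation


class Abort(Exception):
    """Raised when a task detects that the computation was tampered with."""


def tag(job_id, task_id, value):
    message = f"{job_id}|{task_id}|{value}".encode()
    return hmac.new(KEY, message, hashlib.sha256).digest()


def emit(job_id, task_id, value):
    """Output of one task, bound to the job and task that produced it."""
    return {"job": job_id, "task": task_id, "value": value,
            "tag": tag(job_id, task_id, value)}


def final_task(job_id, expected_tasks, inputs):
    """Sum the inputs, aborting on modified, replayed, duplicated or missing data."""
    seen, total = set(), 0
    for msg in inputs:
        valid = hmac.compare_digest(msg["tag"], tag(msg["job"], msg["task"], msg["value"]))
        if msg["job"] != job_id or not valid:
            raise Abort("modified or replayed input")
        if msg["task"] in seen:
            raise Abort("duplicate input")
        seen.add(msg["task"])
        total += msg["value"]
    if seen != set(expected_tasks):
        raise Abort("missing task output")  # dropped data or skipped task
    return total


outputs = [emit("q1", 13, 10), emit("q1", 14, 13), emit("q1", 15, 4)]
print(final_task("q1", [13, 14, 15], outputs))
```

With these checks, dropping the output of task 15 makes task 20 abort instead of returning 23.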

Page 122: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Opaque components

Distributed oblivious operators

Oblivious Filter

Oblivious Aggregation

Oblivious Join

Computation verification

Rule-based opt. Cost-based opt.

Data encryption and authentication

Oblivious query planning

Cost model

Page 124: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Oblivious aggregation

12809 … Diabetes

29489 … Diabetes

13744 … Cancer

18740 … Diabetes

98329 … Cancer

32591 … Diabetes

SELECT count(*) FROM medical GROUP BY disease

1

2

Page 125: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Oblivious aggregation

12809 … Diabetes

29489 … Diabetes

13744 … Cancer

18740 … Diabetes

98329 … Cancer

32591 … Diabetes

SELECT count(*) FROM medical GROUP BY disease

1

2

There can be many partitions

Page 126: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Oblivious aggregation

12809 … Diabetes

29489 … Diabetes

13744 … Cancer

18740 … Diabetes

98329 … Cancer

32591 … Diabetes

Oblivious sort

[CLRS, Leighton ‘85]

Map Sort

SELECT count(*) FROM medical GROUP BY disease
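The slide cites [CLRS, Leighton '85] for the oblivious sort. As a stand-in, the sketch below uses a bitonic sorting network, a different and simpler oblivious sort, to show the defining property: the sequence of compared positions depends only on the input length, never on the data. It assumes the length is a power of two; `bitonic_sort` and the returned `trace` are illustrative names.

```python
def bitonic_sort(items, key=lambda x: x):
    """Sort with a fixed compare-exchange schedule; return (sorted list, trace)."""
    n = len(items)
    assert n > 0 and n & (n - 1) == 0, "sketch assumes a power-of-two length"
    a = list(items)
    trace = []  # every (i, l) pair touched, in order
    k = 2
    while k <= n:
        j = k // 2
        while j >= 1:
            for i in range(n):
                l = i ^ j
                if l > i:
                    trace.append((i, l))
                    ascending = (i & k) == 0
                    # A real enclave would also make this swap branch-free.
                    if (key(a[i]) > key(a[l])) == ascending:
                        a[i], a[l] = a[l], a[i]
            j //= 2
        k *= 2
    return a, trace


rows = ["Diabetes", "Cancer", "Diabetes", "Diabetes",
        "Cancer", "Diabetes", "Cancer", "Cancer"]
sorted_rows, trace = bitonic_sort(rows)
print(sorted_rows)
```

Running it on any other eight rows yields the same `trace`, so an observer of memory accesses learns nothing about the diseases.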

Page 129: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Oblivious aggregation

12809 … Diabetes

29489 … Diabetes

13744 … Cancer

18740 … Diabetes

98329 … Cancer

32591 … Diabetes

Oblivious sort

[CLRS, Leighton ‘85]

Map Sort

SELECT count(*) FROM medical GROUP BY disease

????

Page 130: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Map Sort

Oblivious sort

[CLRS, Leighton ‘85]

12809 … Diabetes

29489 … Diabetes

13744 … Cancer

18740 … Diabetes

98329 … Cancer

32591 … Diabetes

Oblivious aggregation

SELECT count(*) FROM medical GROUP BY disease

Page 131: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Oblivious sort

[CLRS, Leighton ‘85]

12809 … Diabetes

29489 … Diabetes

13744 … Cancer

18740 … Diabetes

98329 … Cancer

32591 … Diabetes

Oblivious aggregation

SELECT count(*) FROM medical GROUP BY disease

Page 132: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

12809 … Diabetes

29489 … Diabetes

13744 … Cancer

18740 … Diabetes

98329 … Cancer

32591 … Diabetes

Oblivious aggregation

SELECT count(*) FROM medical GROUP BY disease

Page 133: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

12809 … Diabetes

29489 … Diabetes

13744 … Cancer

18740 … Diabetes

98329 … Cancer

32591 … Diabetes

Oblivious aggregation

SELECT count(*) FROM medical GROUP BY disease

The “Diabetes” group is split!

Page 134: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

12809 … Diabetes

29489 … Diabetes

13744 … Cancer

18740 … Diabetes

98329 … Cancer

32591 … Diabetes

Oblivious aggregation

SELECT count(*) FROM medical GROUP BY disease

The “Diabetes” group is split!

How to aggregate obliviously and in parallel?

Page 135: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

12809 … Diabetes

29489 … Diabetes

13744 … Cancer

18740 … Diabetes

98329 … Cancer

32591 … Diabetes

Oblivious aggregation

SELECT count(*) FROM medical GROUP BY disease

The “Diabetes” group is split!

How to aggregate obliviously and in parallel?

It can span over many partitions

Page 136: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

12809 … Diabetes

29489 … Diabetes

13744 … Cancer

18740 … Diabetes

98329 … Cancer

32591 … Diabetes

Oblivious aggregation

SELECT count(*) FROM medical GROUP BY disease

Page 137: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

12809 … Diabetes

29489 … Diabetes

13744 … Cancer

18740 … Diabetes

98329 … Cancer

32591 … Diabetes

Scan

Oblivious aggregation

SELECT count(*) FROM medical GROUP BY disease

Page 138: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

12809 … Diabetes

29489 … Diabetes

13744 … Cancer

18740 … Diabetes

98329 … Cancer

32591 … Diabetes

Scan

Statistics

Statistics

Oblivious aggregation

SELECT count(*) FROM medical GROUP BY disease

Page 139: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

12809 … Diabetes

29489 … Diabetes

13744 … Cancer

18740 … Diabetes

98329 … Cancer

32591 … Diabetes

Scan Boundary processing

Statistics

Statistics

Oblivious aggregation

SELECT count(*) FROM medical GROUP BY disease

Page 140: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

12809 … Diabetes

29489 … Diabetes

13744 … Cancer

18740 … Diabetes

98329 … Cancer

32591 … Diabetes

Scan Boundary processing

Statistics

Statistics

Partial agg.

Oblivious aggregation

SELECT count(*) FROM medical GROUP BY disease

Page 141: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

12809 … Diabetes

29489 … Diabetes

13744 … Cancer

18740 … Diabetes

98329 … Cancer

32591 … Diabetes

Scan Boundary processing

Partial agg.

Oblivious aggregation

SELECT count(*) FROM medical GROUP BY disease

Cancer;Diabetes:1

Diabetes;Diabetes:3

Page 142: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

12809 … Diabetes

29489 … Diabetes

13744 … Cancer

18740 … Diabetes

98329 … Cancer

32591 … Diabetes

Scan Boundary processing

Partial agg.

Oblivious aggregation

Diabetes:1

SELECT count(*) FROM medical GROUP BY disease

Page 143: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

12809 … Diabetes

29489 … Diabetes

13744 … Cancer

18740 … Diabetes

98329 … Cancer

32591 … Diabetes

Scan Boundary processing

Oblivious aggregation

DUMMY

Diabetes:1

SELECT count(*) FROM medical GROUP BY disease

Page 144: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

12809 … Diabetes

29489 … Diabetes

13744 … Cancer

18740 … Diabetes

98329 … Cancer

32591 … Diabetes

12809 … Diabetes

29489 … Diabetes

13744 … Cancer

18740 … Diabetes

98329 … Cancer

32591 … Diabetes

Scan Boundary processing

Oblivious aggregation

DUMMY

Diabetes:1

SELECT count(*) FROM medical GROUP BY disease

Page 146: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

12809 … Diabetes

29489 … Diabetes

13744 … Cancer

18740 … Diabetes

98329 … Cancer

32591 … Diabetes

12809 … Diabetes

29489 … Diabetes

13744 … Cancer

18740 … Diabetes

98329 … Cancer

32591 … Diabetes

Scan Boundary processing

Scan

Oblivious aggregation

DUMMY

Diabetes:1

SELECT count(*) FROM medical GROUP BY disease

Page 148: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

12809 … Diabetes

29489 … Diabetes

13744 … Cancer

18740 … Diabetes

98329 … Cancer

32591 … Diabetes

12809 … Diabetes

29489 … Diabetes

13744 … Cancer

18740 … Diabetes

98329 … Cancer

32591 … Diabetes

Scan Boundary processing

Scan

Oblivious aggregation

DUMMY

Diabetes:1

DUMMY

Cancer: 2

DUMMY

SELECT count(*) FROM medical GROUP BY disease

Page 149: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

12809 … Diabetes

29489 … Diabetes

13744 … Cancer

18740 … Diabetes

98329 … Cancer

32591 … Diabetes

12809 … Diabetes

29489 … Diabetes

13744 … Cancer

18740 … Diabetes

98329 … Cancer

32591 … Diabetes

Scan Boundary processing

Scan

Oblivious aggregation

DUMMY

Diabetes:1

DUMMY

Cancer: 2

DUMMY

DUMMY

DUMMY

Diabetes:4

SELECT count(*) FROM medical GROUP BY disease

Page 150: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Oblivious aggregation

12809 … Diabetes

29489 … Diabetes

13744 … Cancer

18740 … Diabetes

98329 … Cancer

32591 … Diabetes

12809 … Diabetes

29489 … Diabetes

13744 … Cancer

18740 … Diabetes

98329 … Cancer

32591 … Diabetes

Scan Boundary processing

Scan

DUMMY

Cancer: 2

DUMMY

DUMMY

DUMMY

Diabetes:4

Diabetes:1

DUMMY

Diabetes:1

SELECT count(*) FROM medical GROUP BY disease

Page 151: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Oblivious aggregation

Diabetes

Diabetes

Cancer

Diabetes

Cancer

Diabetes

DUMMY

Cancer: 2

DUMMY

DUMMY

DUMMY

Diabetes:4

SELECT count(*) FROM medical GROUP BY disease

Page 152: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Oblivious aggregation

DUMMY

Cancer: 2

DUMMY

DUMMY

DUMMY

Diabetes:4

SELECT count(*) FROM medical GROUP BY disease
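The scan, boundary-processing and second-scan steps can be sketched as follows. This is a plain-Python illustration under stated assumptions, not Opaque's enclave code: rows are already obliviously sorted by disease, partitions are non-empty, each row is reduced to its group key, and the function names are hypothetical. Every input row yields exactly one output row, so the output size reveals nothing about group sizes.

```python
DUMMY = "DUMMY"


def scan_stats(part):
    """Scan 1: first group, last group, and the last group's count in this partition."""
    first, last = part[0], part[-1]
    return first, last, sum(1 for g in part if g == last)


def boundary(stats):
    """For each partition: the partial count carried in, and the next partition's first group."""
    hints, running = [], None
    for i, (first, last, count) in enumerate(stats):
        next_first = stats[i + 1][0] if i + 1 < len(stats) else None
        hints.append((running, next_first))
        if running is not None and running[0] == last and first == last:
            running = (last, running[1] + count)  # group spans the whole partition
        else:
            running = (last, count)
    return hints


def scan_emit(part, carry_in, next_first):
    """Scan 2: emit the count on the last row of each group and DUMMY elsewhere."""
    group, count = carry_in if carry_in else (None, 0)
    out = []
    for i, g in enumerate(part):
        if g == group:
            count += 1
        else:
            group, count = g, 1
        following = part[i + 1] if i + 1 < len(part) else next_first
        out.append((group, count) if following != group else DUMMY)
    return out


def oblivious_count(partitions):
    hints = boundary([scan_stats(p) for p in partitions])
    return [scan_emit(p, carry, nxt) for p, (carry, nxt) in zip(partitions, hints)]


partitions = [["Cancer", "Cancer", "Diabetes"],
              ["Diabetes", "Diabetes", "Diabetes"]]
print(oblivious_count(partitions))
```

For the two partitions shown, the first scan yields the statistics `Cancer;Diabetes:1` and `Diabetes;Diabetes:3` from the slides, and the second partition receives `Diabetes:1` as its carried-in partial count.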

Page 153: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Oblivious aggregation

DUMMY

Cancer: 2

DUMMY

DUMMY

DUMMY

Diabetes:4

Sort

Oblivious sort

[CLRS, Leighton ‘85]

SELECT count(*) FROM medical GROUP BY disease

Page 155: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Oblivious aggregation

Cancer: 2

Diabetes:4

Sort

Oblivious sort

[CLRS, Leighton ‘85]

SELECT count(*) FROM medical GROUP BY disease

Page 156: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Oblivious aggregation

Cancer: 2

Diabetes:4

Sort

Oblivious sort

[CLRS, Leighton ‘85]

Final result

SELECT count(*) FROM medical GROUP BY disease
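The last step can be sketched as a sort that pushes every DUMMY row to the end, after which the real rows form the final result. Python's built-in `sorted` stands in here for the oblivious sort, and `final_result` is a hypothetical name.

```python
DUMMY = "DUMMY"


def final_result(rows):
    """Sort real rows first (by group), dummies last, then keep the real rows."""
    ordered = sorted(rows, key=lambda r: (r == DUMMY, () if r == DUMMY else r))
    return [r for r in ordered if r != DUMMY]


rows = [DUMMY, ("Cancer", 2), DUMMY, DUMMY, DUMMY, ("Diabetes", 4)]
print(final_result(rows))
```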

Page 157: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Oblivious aggregation

Cancer: 2

Diabetes:4

Sort

Oblivious sort

[CLRS, Leighton ‘85]

Final result

SELECT count(*) FROM medical GROUP BY disease

Aggregation has two sorts…

Page 158: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Oblivious aggregation

Cancer: 2

Diabetes:4

Sort

Oblivious sort

[CLRS, Leighton ‘85]

Final result

SELECT count(*) FROM medical GROUP BY disease

Aggregation has two sorts…

Can we do better?

Page 159: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Opaque components

Distributed oblivious operators

Oblivious Filter

Oblivious Aggregation

Oblivious Join

Computation verification

Rule-based opt. Cost-based opt.

Data encryption and authentication

Oblivious query planning

Cost model

Page 161: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Rule-based optimization

Page 162: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Rule-based optimization

SELECT count(*) FROM medical WHERE age > 30 GROUP BY disease

Page 163: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Rule-based optimization

SELECT count(*) FROM medical WHERE age > 30 GROUP BY disease

medical

Filter

Aggregation

Logical op.

Page 164: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Insight 1

Page 165: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Insight 1

1. Split each logical operator into smaller Opaque operators

Page 166: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Insight 1

1. Split each logical operator into smaller Opaque operators

2. Take a global view across the plan to remove some Opaque operators
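The two steps can be sketched as a tiny plan rewriter. Everything concrete here is an assumption made for illustration: the decomposition table, the operator tuples, and the single rule, which merges a filter's O-sort into the following aggregation's O-sort by sorting on both keys at once. The slides only state the two insights.

```python
# Hypothetical decomposition of logical operators into smaller Opaque operators.
DECOMPOSE = {
    "Filter": [("Project", ("bit",)), ("O-sort", ("bit",)), ("Filter", ())],
    "Aggregation": [("O-sort", ("disease",)), ("Scan", ()),
                    ("Boundary processing", ()), ("Scan", ()), ("O-sort", ("count",))],
}


def split(logical_plan):
    """Step 1: replace each logical operator by its Opaque operators."""
    return [op for logical in logical_plan for op in DECOMPOSE[logical]]


def merge_sorts(plan):
    """Step 2 (one illustrative rule): O-sort(a), Filter, O-sort(b) -> O-sort(a + b), Filter."""
    out = list(plan)
    i = 0
    while i + 2 < len(out):
        a, b, c = out[i], out[i + 1], out[i + 2]
        if a[0] == "O-sort" and b[0] == "Filter" and c[0] == "O-sort":
            out[i:i + 3] = [("O-sort", a[1] + c[1]), b]
        i += 1
    return out


plan = split(["Filter", "Aggregation"])
optimized = merge_sorts(plan)
print(sum(op[0] == "O-sort" for op in plan), "->",
      sum(op[0] == "O-sort" for op in optimized))
```

Sorting once on `(bit, disease)` still moves filtered-out rows to the end while leaving the kept rows grouped by disease, so one oblivious sort disappears from the plan.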

Page 167: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Rule-based optimization

medical

Filter

Aggregation

Logical op.

Page 168: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Rule-based optimization

Opaque op.

medical

Filter

Aggregation

Logical op.

Page 169: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Rule-based optimization

medical

Opaque op.

medical

Filter

Aggregation

Logical op.

Page 171: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Rule-based optimization

medical

Scan

Opaque op.

medical

Filter

Aggregation

Logical op.

Page 172: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Rule-based optimization

medical

Scan

Opaque op.

medical

Filter

Aggregation

Logical op.

12809 Amanda D. Edwards 40 Diabetes

29489 Robert R. McGowan 56 Diabetes

13744 Kimberly R. Seay 51 Cancer

18740 Dennis G. Bates 32 Diabetes

32591 Donna R. Bridges 26 Diabetes

98329 Ronald S. Ogden 53 Cancer

Page 173: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

O-sort

Filter

Project

Rule-based optimization

medical

Scan

Opaque op.

medical

Filter

Aggregation

Logical op.

12809 Amanda D. Edwards 40 Diabetes

29489 Robert R. McGowan 56 Diabetes

13744 Kimberly R. Seay 51 Cancer

18740 Dennis G. Bates 32 Diabetes

32591 Donna R. Bridges 26 Diabetes

98329 Ronald S. Ogden 53 Cancer

Page 175: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

O-sort

Filter

Project

Rule-based optimization

medical

Scan

Opaque op.

medical

Filter

Aggregation

Logical op.

12809 Amanda D. Edwards 40 Diabetes

29489 Robert R. McGowan 56 Diabetes

13744 Kimberly R. Seay 51 Cancer

18740 Dennis G. Bates 32 Diabetes

32591 Donna R. Bridges 26 Diabetes

98329 Ronald S. Ogden 53 Cancer

Page 178: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

O-sort

Filter

Project

Rule-based optimization

medical

Scan

Opaque op.

medical

Filter

Aggregation

Logical op.

12809 Amanda D. Edwards 40 Diabetes 0

29489 Robert R. McGowan 56 Diabetes 0

13744 Kimberly R. Seay 51 Cancer 0

18740 Dennis G. Bates 32 Diabetes 0

32591 Donna R. Bridges 26 Diabetes 1

98329 Ronald S. Ogden 53 Cancer 0

Page 180: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

O-sort

Filter

Project

Rule-based optimization

medical

Scan

Opaque op.

medical

Filter

Aggregation

Logical op.

12809 Amanda D. Edwards 40 Diabetes 0

29489 Robert R. McGowan 56 Diabetes 0

13744 Kimberly R. Seay 51 Cancer 0

18740 Dennis G. Bates 32 Diabetes 0

32591 Donna R. Bridges 26 Diabetes 1

98329 Ronald S. Ogden 53 Cancer 0

Page 181: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

O-sort

Filter

Project

O-sort

Rule-based optimization

medical

Scan

Opaque op.

medical

Filter

Aggregation

Logical op.

12809 Amanda D. Edwards 40 Diabetes 0
29489 Robert R. McGowan 56 Diabetes 0
13744 Kimberly R. Seay 51 Cancer 0
18740 Dennis G. Bates 32 Diabetes 0
32591 Donna R. Bridges 26 Diabetes 1
98329 Ronald S. Ogden 53 Cancer 0

Page 183: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

O-sort

Filter

Project

O-sort

Rule-based optimization

medical

Scan

Opaque op.

medical

Filter

Aggregation

Logical op.

12809 Amanda D. Edwards 40 Diabetes 0
29489 Robert R. McGowan 56 Diabetes 0
13744 Kimberly R. Seay 51 Cancer 0
18740 Dennis G. Bates 32 Diabetes 0
32591 Donna R. Bridges 26 Diabetes 1
98329 Ronald S. Ogden 53 Cancer 0

Page 185: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

O-sort

Filter

Project

Rule-based optimization

medical

Scan

Opaque op.

medical

Filter

Aggregation

Logical op.

12809 Amanda D. Edwards 40 Diabetes 0
29489 Robert R. McGowan 56 Diabetes 0
13744 Kimberly R. Seay 51 Cancer 0
18740 Dennis G. Bates 32 Diabetes 0
32591 Donna R. Bridges 26 Diabetes 1
98329 Ronald S. Ogden 53 Cancer 0

Page 186: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

O-sort

Filter

Project

Filter

Rule-based optimization

medical

Scan

Opaque op.

medical

Filter

Aggregation

Logical op.

12809 Amanda D. Edwards 40 Diabetes 0
29489 Robert R. McGowan 56 Diabetes 0
13744 Kimberly R. Seay 51 Cancer 0
18740 Dennis G. Bates 32 Diabetes 0
32591 Donna R. Bridges 26 Diabetes 1
98329 Ronald S. Ogden 53 Cancer 0

Page 187: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

O-sort

Filter

Project

Filter

Rule-based optimization

medical

Scan

Opaque op.

medical

Filter

Aggregation

Logical op.

12809 Amanda D. Edwards 40 Diabetes 0
29489 Robert R. McGowan 56 Diabetes 0
13744 Kimberly R. Seay 51 Cancer 0
18740 Dennis G. Bates 32 Diabetes 0
98329 Ronald S. Ogden 53 Cancer 0

Page 188: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

O-sort

Filter

Project

Rule-based optimization

medical

Scan

Opaque op.

medical

Filter

Aggregation

Logical op.

12809 Amanda D. Edwards 40 Diabetes 0
29489 Robert R. McGowan 56 Diabetes 0
13744 Kimberly R. Seay 51 Cancer 0
18740 Dennis G. Bates 32 Diabetes 0
98329 Ronald S. Ogden 53 Cancer 0

Page 189: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

O-sort

Agg.

O-sort

O-sort

Filter

Project

Rule-based optimization

medical

Scan

Opaque op.

medical

Filter

Aggregation

Logical op.

12809 Amanda D. Edwards 40 Diabetes 0
29489 Robert R. McGowan 56 Diabetes 0
13744 Kimberly R. Seay 51 Cancer 0
18740 Dennis G. Bates 32 Diabetes 0
98329 Ronald S. Ogden 53 Cancer 0

Page 191: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

O-sort

Agg.

O-sort

O-sort

Filter

Project

O-sort

Rule-based optimization

medical

Scan

Opaque op.

medical

Filter

Aggregation

Logical op.

12809 Amanda D. Edwards 40 Diabetes 0
29489 Robert R. McGowan 56 Diabetes 0
13744 Kimberly R. Seay 51 Cancer 0
18740 Dennis G. Bates 32 Diabetes 0
98329 Ronald S. Ogden 53 Cancer 0

Page 192: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

O-sort

Agg.

O-sort

O-sort

Filter

Project

O-sort

Rule-based optimization

medical

Scan

Opaque op.

medical

Filter

Aggregation

Logical op.

12809 Amanda D. Edwards 40 Diabetes 0
29489 Robert R. McGowan 56 Diabetes 0
13744 Kimberly R. Seay 51 Cancer 0
18740 Dennis G. Bates 32 Diabetes 0
98329 Ronald S. Ogden 53 Cancer 0

Page 193: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

O-sort

Agg.

O-sort

O-sort

Filter

Project

O-sort

Rule-based optimization

medical

Scan

Opaque op.

medical

Filter

Aggregation

Logical op.

12809 Amanda D. Edwards 40 Diabetes 0
29489 Robert R. McGowan 56 Diabetes 0
13744 Kimberly R. Seay 51 Cancer 0
18740 Dennis G. Bates 32 Diabetes 0
98329 Ronald S. Ogden 53 Cancer 0

Can we remove any sort?

Page 194: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Rule-based optimization

medical

O-sort

Scan

Filter

O-sort

Agg.

O-sort

Opaque op.

Project

medical

Filter

Aggregation

Logical op.

Page 197: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Rule-based optimization

medical

O-sort

Scan

Filter

O-sort

Agg.

O-sort

Opaque op.

Project

medical

Filter

Aggregation

Logical op.

Sort on 0/1 column

Page 199: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Rule-based optimization

medical

O-sort

Scan

Filter

O-sort

Agg.

O-sort

Opaque op.

Project

medical

Filter

Aggregation

Logical op.

Sort on 0/1 column

Sort on Disease

Page 200: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Rule-based optimization

medical

O-sort

Scan

Filter

O-sort

Agg.

O-sort

Opaque op.

Project

medical

Filter

Aggregation

Logical op.

Sort on 0/1 column + Sort on Disease

Page 201: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Rule-based optimization

medical

O-sort

Scan

Filter

O-sort

Agg.

O-sort

Opaque op.

Project

medical

Filter

Aggregation

Logical op.

Sort on 0/1 column + Sort on Disease =

Page 202: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Rule-based optimization

medical

O-sort

Scan

Filter

O-sort

Agg.

O-sort

Opaque op.

Project

medical

Filter

Aggregation

Logical op.

Sort on 0/1 column + Sort on Disease = Sort on (0/1, Disease)
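The equivalence on this slide can be checked on the example table with a small sketch. It uses Python's ordinary `sorted`, which is not oblivious, so it only illustrates that the two plans return the same rows. The predicate (`age < 30` means "drop") is an assumption chosen to match the 0/1 flags shown earlier; the slides do not state the actual filter condition.

```python
# Rows from the slide's "medical" table: (patient_id, age, disease).
rows = [
    (12809, 40, "Diabetes"),
    (29489, 56, "Diabetes"),
    (13744, 51, "Cancer"),
    (18740, 32, "Diabetes"),
    (32591, 26, "Diabetes"),
    (98329, 53, "Cancer"),
]


def flag(row):
    """Return 1 if the row is to be dropped, 0 if kept (hypothetical predicate)."""
    return 1 if row[1] < 30 else 0


def kept_count(table):
    return sum(1 for r in table if flag(r) == 0)


def two_sorts(table):
    # Filter: sort on the 0/1 column so dropped rows go last, then truncate.
    by_flag = sorted(table, key=flag)
    kept = by_flag[: kept_count(table)]
    # Aggregation input: sort the surviving rows on Disease.
    return sorted(kept, key=lambda r: r[2])


def one_sort(table):
    # A single sort on the composite key (0/1, Disease), then truncate.
    by_both = sorted(table, key=lambda r: (flag(r), r[2]))
    return by_both[: kept_count(table)]


print(two_sorts(rows) == one_sort(rows))
```

Both plans leave the kept rows grouped by disease, which is what the aggregation needs, so the second sort is redundant once the first one uses the composite key.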

Page 203: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Rule-based optimization

medical

O-sort

Scan

Filter

O-sort

Agg.

O-sort

Opaque op.

Project

medical

Filter

Aggregation

Logical op.

Page 205: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Rule-based optimization

medical

O-sort

Scan

Filter

Agg.

O-sort

Opaque op.

Project

medical

Filter

Aggregation

Logical op.

Page 206: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Scan

Rule-based optimization

medical

Agg.

O-sort

Opaque op.

medical

Filter

Aggregation

Logical op.

O-sort

Filter

Project

Page 207: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Scan

Rule-based optimization

medical

Scan

Agg.

O-sort

Opaque op.

medical

Filter

Aggregation

Logical op.

O-sort

Filter

Project

Page 208: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Scan

Rule-based optimization

medical

Scan

Agg.

O-sort

Opaque op.

medical

Filter

Aggregation

Logical op.

12809 Amanda D. Edwards 40 Diabetes
29489 Robert R. McGowan 56 Diabetes
13744 Kimberly R. Seay 51 Cancer
18740 Dennis G. Bates 32 Diabetes
32591 Donna R. Bridges 26 Diabetes
98329 Ronald S. Ogden 53 Cancer

O-sort

Filter

Project

Page 209: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Scan

Rule-based optimization

medical

Agg.

O-sort

Opaque op.

medical

Filter

Aggregation

Logical op.

12809 Amanda D. Edwards 40 Diabetes
29489 Robert R. McGowan 56 Diabetes
13744 Kimberly R. Seay 51 Cancer
18740 Dennis G. Bates 32 Diabetes
32591 Donna R. Bridges 26 Diabetes
98329 Ronald S. Ogden 53 Cancer

O-sort

Filter

Project

Page 210: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Scan

Rule-based optimization

medical

Agg.

O-sort

Opaque op.

medical

Filter

Aggregation

Logical op.

12809 Amanda D. Edwards 40 Diabetes
29489 Robert R. McGowan 56 Diabetes
13744 Kimberly R. Seay 51 Cancer
18740 Dennis G. Bates 32 Diabetes
32591 Donna R. Bridges 26 Diabetes
98329 Ronald S. Ogden 53 Cancer

O-sort

Filter

Project

Page 211: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Rule-based optimization

medical

O-sort

Scan

Agg.

O-sort

Opaque op.

Project

medical

Filter

Aggregation

Logical op.

12809 Amanda D. Edwards 40 Diabetes 0
29489 Robert R. McGowan 56 Diabetes 0
13744 Kimberly R. Seay 51 Cancer 0
18740 Dennis G. Bates 32 Diabetes 0
32591 Donna R. Bridges 26 Diabetes 1
98329 Ronald S. Ogden 53 Cancer 0

Project

Filter

Page 212: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Rule-based optimization

medical

O-sort

Scan

Agg.

O-sort

Opaque op.

Project

medical

Filter

Aggregation

Logical op.

12809 Amanda D. Edwards 40 Diabetes 0
29489 Robert R. McGowan 56 Diabetes 0
13744 Kimberly R. Seay 51 Cancer 0
18740 Dennis G. Bates 32 Diabetes 0
32591 Donna R. Bridges 26 Diabetes 1
98329 Ronald S. Ogden 53 Cancer 0

Filter

Page 213: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Rule-based optimization

medical

O-sort

Scan

Agg.

O-sort

Opaque op.

Project

medical

Filter

Aggregation

Logical op.

O-sort

12809 Amanda D. Edwards 40 Diabetes 0
29489 Robert R. McGowan 56 Diabetes 0
13744 Kimberly R. Seay 51 Cancer 0
18740 Dennis G. Bates 32 Diabetes 0
32591 Donna R. Bridges 26 Diabetes 1
98329 Ronald S. Ogden 53 Cancer 0

multi-column sort

Filter

Page 215: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Rule-based optimization

medical

O-sort

Scan

Agg.

O-sort

Opaque op.

Project

medical

Filter

Aggregation

Logical op.

O-sort

12809 Amanda D. Edwards 40 Diabetes 0

29489 Robert R. McGowan 56 Diabetes 0

13744 Kimberly R. Seay 51 Cancer 0

18740 Dennis G. Bates 32 Diabetes 0

32591 Donna R. Bridges 26 Diabetes 1

98329 Ronald S. Ogden 53 Cancer 0

multi-column sort

Filter

Page 216: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Rule-based optimization

medical

O-sort

Scan

Agg.

O-sort

Opaque op.

Project

medical

Filter

Aggregation

Logical op.

12809 Amanda D. Edwards 40 Diabetes 0

29489 Robert R. McGowan 56 Diabetes 0

13744 Kimberly R. Seay 51 Cancer 0

18740 Dennis G. Bates 32 Diabetes 0

32591 Donna R. Bridges 26 Diabetes 1

98329 Ronald S. Ogden 53 Cancer 0

multi-column sort

Filter

Page 217: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Rule-based optimization

medical

O-sort

Scan

Agg.

O-sort

Opaque op.

Project

medical

Filter

Aggregation

Logical op.

12809 Amanda D. Edwards 40 Diabetes 0

29489 Robert R. McGowan 56 Diabetes 0

13744 Kimberly R. Seay 51 Cancer 0

18740 Dennis G. Bates 32 Diabetes 0

32591 Donna R. Bridges 26 Diabetes 1

98329 Ronald S. Ogden 53 Cancer 0

multi-column sort

Filter

Page 218: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Rule-based optimization

medical

O-sort

Scan

Agg.

O-sort

Opaque op.

Project

medical

Filter

Aggregation

Logical op.

12809 Amanda D. Edwards 40 Diabetes 0

29489 Robert R. McGowan 56 Diabetes 0

13744 Kimberly R. Seay 51 Cancer 0

18740 Dennis G. Bates 32 Diabetes 0

98329 Ronald S. Ogden 53 Cancer 0

multi-column sort

Filter

Page 219: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Rule-based optimization

medical

O-sort

Scan

Agg.

O-sort

Opaque op.

Project

medical

Filter

Aggregation

Logical op.

12809 Amanda D. Edwards 40 Diabetes 0

29489 Robert R. McGowan 56 Diabetes 0

13744 Kimberly R. Seay 51 Cancer 0

18740 Dennis G. Bates 32 Diabetes 0

98329 Ronald S. Ogden 53 Cancer 0

multi-column sort

Filter

Page 220: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Rule-based optimization

medical

O-sort

Scan

Agg.

O-sort

Opaque op.

Project

medical

Filter

Aggregation

Logical op.

12809 Amanda D. Edwards 40 Diabetes 0

29489 Robert R. McGowan 56 Diabetes 0

13744 Kimberly R. Seay 51 Cancer 0

18740 Dennis G. Bates 32 Diabetes 0

98329 Ronald S. Ogden 53 Cancer 0

multi-column sort

Filter

Eliminated one oblivious sort!

Page 221: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Opaque components

Distributed oblivious operators

Oblivious Filter

Oblivious Aggregation

Oblivious Join

Computation verification

Rule-based opt. Cost-based opt.

Data encryption and authentication

Oblivious query planning

Cost model

Page 222: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Observation: not all tables are sensitive

Page 223: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Observation: not all tables are sensitive

Hospitalized patients: P_ID, D_ID, Name, Age

Disease: D_ID, Name, G_ID

Medication: M_ID, D_ID, Name, Cost

Page 224: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Observation: not all tables are sensitive

Hospitalized patients: P_ID, D_ID, Name, Age

Disease: D_ID, Name, G_ID

Medication: M_ID, D_ID, Name, Cost

Page 225: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Observation: not all tables are sensitive

Hospitalized patients: P_ID, D_ID, Name, Age

Disease: D_ID, Name, G_ID

Medication: M_ID, D_ID, Name, Cost

Opaque can operate in mixed sensitivity: sensitive tables are run with oblivious operators

Page 226: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

⨝⨝⨝

A B C DC

Observation: not all tables are sensitive

Page 228: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

⨝⨝⨝

A B C DC

Observation: not all tables are sensitive

Not oblivious!

Page 229: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

⨝⨝⨝

A B C DC

Observation: not all tables are sensitive

Page 230: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

⨝⨝⨝

A B C DC

Observation: not all tables are sensitive

Sensitivity propagation: propagate obliviousness from leaf to root

Page 232: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

⨝⨝⨝

A B C D

C

Observation: not all tables are sensitive

Sensitivity propagation: propagate obliviousness from leaf to root
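A minimal sketch of leaf-to-root sensitivity propagation follows, assuming a left-deep join tree over tables A, B, C, D in which only C is marked sensitive. The tree shape, the `PlanNode` class, and the `propagate` function are illustrative names, not Opaque's actual planner API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    name: str
    oblivious: bool = False  # on a leaf: the table was marked sensitive
    children: List["PlanNode"] = field(default_factory=list)


def propagate(node: PlanNode) -> bool:
    """Mark a node oblivious if it, or any node below it, is oblivious."""
    child_flags = [propagate(child) for child in node.children]
    node.oblivious = node.oblivious or any(child_flags)
    return node.oblivious


# Hypothetical plan: ((A join B) join C) join D, with only C sensitive.
a, b, d = PlanNode("A"), PlanNode("B"), PlanNode("D")
c = PlanNode("C", oblivious=True)
join_ab = PlanNode("join(A, B)", children=[a, b])
join_abc = PlanNode("join(join(A, B), C)", children=[join_ab, c])
root = PlanNode("join(join(join(A, B), C), D)", children=[join_abc, d])

propagate(root)
for node in (join_ab, join_abc, root):
    print(node.name, "oblivious" if node.oblivious else "not oblivious")
```

Under these assumptions the join of A and B can stay non-oblivious, while every operator above C must be oblivious. That is why the position of a sensitive table in the plan affects cost.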

Page 233: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Insight 2

Sensitivity propagation introduces a new dimension to query optimization

Page 234: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Cost-based optimization

Hospitalized patients: P_ID, D_ID, Name, Age

Disease: D_ID, Name, G_ID

Medication: M_ID, D_ID, Name, Cost

Find the least costly medication for each patient

Page 235: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Cost-based optimization

Hospitalized patients: P_ID, D_ID, Name, Age

Disease: D_ID, Name, G_ID

Medication: M_ID, D_ID, Name, Cost

Find the least costly medication for each patient

Assumption: |P| < |D| < |M|

Page 236: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Cost-based optimization

Hospitalized patients: P_ID, D_ID, Name, Age

Disease: D_ID, Name, G_ID

Medication: M_ID, D_ID, Name, Cost

SELECT p_name, d_name, med_cost
FROM patient, disease,
  (SELECT d_id, min(cost) AS med_cost FROM medication GROUP BY d_id) AS med
WHERE disease.d_id = patient.d_id AND disease.d_id = med.d_id

Find the least costly medication for each patient

Assumption: |P| < |D| < |M|

Page 238: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Cost-based optimization

Hospitalized patients: P_ID, D_ID, Name, Age

Disease: D_ID, Name, G_ID

Medication: M_ID, D_ID, Name, Cost

SELECT p_name, d_name, med_cost
FROM patient, disease,
  (SELECT d_id, min(cost) AS med_cost FROM medication GROUP BY d_id) AS med
WHERE disease.d_id = patient.d_id AND disease.d_id = med.d_id

Find the least costly medication for each patient

3-way join

Assumption: |P| < |D| < |M|
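To see what the query computes, it can be run on toy data with Python's built-in `sqlite3` module. This is plain SQLite, not Opaque or Spark SQL. The table contents and the column names `p_name`, `d_name`, and `m_name` are assumptions made to fit the query text, and an `ORDER BY` is added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE patient (p_id INTEGER, d_id INTEGER, p_name TEXT, age INTEGER);
    CREATE TABLE disease (d_id INTEGER, d_name TEXT, g_id INTEGER);
    CREATE TABLE medication (m_id INTEGER, d_id INTEGER, m_name TEXT, cost REAL);

    -- Toy rows, invented for illustration.
    INSERT INTO patient VALUES (1, 10, 'Alice', 40), (2, 20, 'Bob', 56);
    INSERT INTO disease VALUES (10, 'Diabetes', 1), (20, 'Cancer', 2);
    INSERT INTO medication VALUES
        (100, 10, 'MedA', 30.0), (101, 10, 'MedB', 12.5), (102, 20, 'MedC', 99.0);
    """
)

query = """
SELECT p_name, d_name, med_cost
FROM patient, disease,
     (SELECT d_id, min(cost) AS med_cost FROM medication GROUP BY d_id) AS med
WHERE disease.d_id = patient.d_id AND disease.d_id = med.d_id
ORDER BY p_name
"""
result = conn.execute(query).fetchall()
print(result)
```

The subquery reduces Medication to one row per disease before the joins, so the three-way join is between Patient, Disease, and the aggregated Medication table.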

Page 239: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Cost-based optimization

Patient Disease Medication

⨝ 𝝪⨝

Patient

Disease

Medication

𝝪⨝

SQL optimizer with new cost:

Page 240: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Cost-based optimization

Patient Disease Medication

⨝ 𝝪⨝

Patient

Disease

Medication

𝝪⨝

SQL optimizer with new cost:

More selective non-oblivious join

Page 242: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Cost-based optimization

Patient Disease Medication

⨝ 𝝪⨝

Patient

Disease

Medication

𝝪⨝

SQL optimizer with new cost and sensitivity propagation:

Page 243: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Cost-based optimization

Patient Disease Medication

⨝ 𝝪⨝

Patient

Disease

Medication

𝝪⨝

SQL optimizer with new cost and sensitivity propagation:

Fewer oblivious joins
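The effect of adding sensitivity to the cost model can be sketched with a toy calculation. Everything here is an assumption for illustration: Patient is taken to be the only sensitive table, a join's cost is the sum of its input sizes, oblivious joins are multiplied by a made-up penalty, and the row counts are invented to respect |P| < |D|. This is not Opaque's actual cost model.

```python
def plan_cost(joins, oblivious_penalty):
    """joins: list of (left_rows, right_rows, is_oblivious) tuples."""
    total = 0
    for left_rows, right_rows, is_oblivious in joins:
        weight = oblivious_penalty if is_oblivious else 1
        total += weight * (left_rows + right_rows)
    return total


# Hypothetical sizes: aggregated Medication has at most one row per disease.
P, D, M_AGG = 100, 1000, 1000

# Plan A: (Patient join Disease) join Medication. Patient sits at the bottom,
# so both joins are oblivious. The first join is assumed to output P rows.
plan_a = [(P, D, True), (P, M_AGG, True)]

# Plan B: (Disease join Medication) join Patient. The first join touches no
# sensitive table; it is assumed to output D rows.
plan_b = [(D, M_AGG, False), (D, P, True)]

for penalty in (1, 10):
    print(penalty, plan_cost(plan_a, penalty), plan_cost(plan_b, penalty))
```

With no penalty the smaller-input plan A looks cheaper, but once oblivious joins cost more, plan B wins because it has only one of them.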

Page 245: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Evaluation setup

Page 247: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Evaluation setup

• Single machine experiments:
  • Intel Xeon E3-1280 v5, 4 cores, 64 GB RAM
  • Intel SGX: 128 MB of enclave page cache (EPC)
• Distributed experiments:
  • A cluster of 5 SGX machines

Page 248: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Evaluation

Page 255: Opaque: An Oblivious and Encrypted Distributed Analytics ...platformlab.stanford.edu/Seminar Talks/Raluca_Popa.pdf · Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Evaluation

• How does Opaque compare to Spark SQL?
  • Big Data Benchmark (BDB); 4 queries total
  • Queries 1, 2, 3: filter, aggregation, join
  • 1 million records
• How does Opaque compare to state-of-the-art oblivious systems?
  • GraphSC (oblivious graph analytics)
  • PageRank

Big Data Benchmark (distributed)

[Figure: two bar charts of run time (s), log scale from 0.01 to 100, for Query 1, Query 2, and Query 3, each comparing Spark SQL and Opaque]

• Data encryption, authentication, computation verification. Overhead: -0.47x to 2.3x

• + Obliviousness. Overhead: 21x to 45x
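
The slides do not say how the overhead figures are computed. One common convention, assumed here, is relative overhead: the system's run time divided by the baseline's, minus one. Under that assumption a negative value means the system finished faster than the baseline. The function name and the timings below are hypothetical and are not taken from the slides.

```python
def relative_overhead(baseline_s: float, system_s: float) -> float:
    """Extra run time of the system, expressed as a multiple of the baseline.

    Assumed convention (not stated on the slides):
        overhead = system_s / baseline_s - 1
    A result of 0 means equal run time; a negative result means a speedup.
    """
    if baseline_s <= 0:
        raise ValueError("baseline run time must be positive")
    return system_s / baseline_s - 1.0


# Hypothetical timings in seconds, chosen only to illustrate the sign.
faster = relative_overhead(baseline_s=2.0, system_s=1.0)  # half the baseline time
slower = relative_overhead(baseline_s=2.0, system_s=6.0)  # three times the baseline time
print(f"{faster:+.2f}x, {slower:+.2f}x")  # prints "-0.50x, +2.00x"
```

Read this way, a range that starts below zero would mean that at least one query ran faster than the baseline.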

PageRank: comparison with GraphSC (single machine)

How does Opaque fit among practical encrypted databases*?

*single-cloud, relational DBs

Confidentiality is a beast

• Sharp tradeoff between performance and confidentiality. No practical encrypted database has perfect confidentiality.

• Non-ideal confidentiality is meaningful: it removes classes of attackers, and leaks much less data

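Spectrum of contributions

[Figure: leakage spectrum for encrypted databases. The axis runs from "plaintext" (all leaks) to "no leakage", with intermediate points "property-preserving encryption", "semantic security", "hide access patterns", and "hide result sizes"; a further label reads "timing, query, structure of input". Systems marked on the axis: CryptDB, Monomi, Cipherbase, Haven, SCONE, VC3, Arx, BlindSeer, [FJK+15], Opaque, and Opaque+pad. An arrow is labeled "better performance".]

• Each of these systems protects against classes of attackers

• Opaque offers strong security guarantees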
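
The "hide result sizes" point on the spectrum can be illustrated with a generic padding sketch: every result is filled with dummy rows up to a fixed public bound, so the output length no longer depends on how many rows matched. This is only an illustration of the general idea under stated assumptions, not Opaque's implementation; `padded_filter`, `bound`, and the use of `None` as a dummy row are hypothetical.

```python
from typing import Callable, List, Optional

Row = Optional[dict]  # None stands in for a dummy (padding) row


def padded_filter(rows: List[dict], keep: Callable[[dict], bool], bound: int) -> List[Row]:
    """Filter rows, then pad the result with dummies up to `bound` entries.

    Assumption: in a real system the dummy rows would be encrypted so that an
    observer cannot tell them apart from real rows. Here they are plain None.
    """
    matches = [r for r in rows if keep(r)]
    if len(matches) > bound:
        raise ValueError("bound is smaller than the true result size")
    return matches + [None] * (bound - len(matches))


rows = [{"v": i} for i in range(5)]
few = padded_filter(rows, lambda r: r["v"] > 3, bound=4)   # 1 real row, 3 dummies
many = padded_filter(rows, lambda r: r["v"] > 0, bound=4)  # 4 real rows, 0 dummies
print(len(few), len(many))  # prints "4 4"
```

The cost of this approach is that every query pays for the bound rather than for its true result size, which is one way to read the performance tradeoff drawn on the slide.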

Conclusion

Opaque is an oblivious and encrypted distributed analytics platform

[Figure labels: Spark SQL, Opaque; SQL, ML, Graph Analytics]

Open source: github.com/ucbrise/opaque