Opaque: An Oblivious and Encrypted Distributed Analytics Platform Joint work with: Wenting Zheng, Ankur Dave, Jethro Beekman, Raluca Ada Popa, Joseph Gonzalez, and Ion Stoica UC Berkeley [NSDI’17]

Page 1:

Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Joint work with: Wenting Zheng, Ankur Dave, Jethro Beekman, Raluca Ada Popa, Joseph Gonzalez, and Ion Stoica

UC Berkeley

[NSDI’17]

Page 2:

Complex analytics run on sensitive data

client

Spark SQL, MLlib, GraphX, Spark Streaming

cloud provider

sensitive data

Page 3:

Page 4:

Cloud attackers

client cloud provider

sensitive data

Page 5:

Threat model

Attacker has full access to all cloud software

Page 6:

How to protect data and computation while preserving functionality?

relational algebra

Page 7:

Hardware enclaves 101

Page 8:

Hardware enclaves (Intel SGX)

• Hardware-enforced isolated execution environment

• Data decrypted only on the processor

• Protect against an attacker who has root access or a compromised OS

[Figure labels: on die (core, cache, MEE), memory]

Page 9:

Remote attestation

Enables verifying which code runs in the enclave and performing key exchange

[Figure labels: Client, Server, enclave, untrusted OS, client code, hash]
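The attestation idea on this slide can be sketched in a few lines. This is a minimal illustration under loud assumptions, not the SGX protocol: `HARDWARE_KEY`, `enclave_report`, and `client_accepts` are hypothetical names, an HMAC stands in for the hardware-signed report, and the key exchange step is omitted.

```python
import hashlib
import hmac
import os

# Hypothetical stand-in for the processor's attestation key. Real hardware
# signs reports with a key the client never holds; HMAC is used here only
# to keep the sketch within the standard library.
HARDWARE_KEY = os.urandom(32)


def measure(code: bytes) -> bytes:
    """Hash of the code loaded into the enclave (the 'measurement')."""
    return hashlib.sha256(code).digest()


def enclave_report(loaded_code: bytes, nonce: bytes):
    """Server side: report the measurement, bound to the client's nonce."""
    measurement = measure(loaded_code)
    tag = hmac.new(HARDWARE_KEY, measurement + nonce, hashlib.sha256).digest()
    return measurement, tag


def client_accepts(expected_code: bytes, nonce: bytes, report) -> bool:
    """Client side: accept only an authentic report for the expected code."""
    measurement, tag = report
    expected_tag = hmac.new(HARDWARE_KEY, measurement + nonce, hashlib.sha256).digest()
    authentic = hmac.compare_digest(tag, expected_tag)
    right_code = hmac.compare_digest(measurement, measure(expected_code))
    return authentic and right_code


client_code = b"def query(table): return sum(table)"
nonce = os.urandom(16)
honest = client_accepts(client_code, nonce, enclave_report(client_code, nonce))
tampered = client_accepts(client_code, nonce, enclave_report(b"other code", nonce))
```

The nonce ties a report to one attestation request, so a report recorded earlier cannot be replayed for a later request.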

Page 10:

Attacker model

Attacker is restricted to software attacks only, and does not exploit timing. Attacker controls the software stack.

[Figure labels: machine, processor, memory]

Page 11:

Access pattern leakage

Page 12:

Access patterns

[Figure labels: machine 0 (processor, addresses, memory), network messages, machine 1]

Page 13:

Example: network access pattern leakage

ID     Name               Age  Disease
12809  Amanda D. Edwards  40   Diabetes
29489  Robert R. McGowan  56   Diabetes
13744  Kimberly R. Seay   51   Cancer
18740  Dennis G. Bates    32   Diabetes
98329  Ronald S. Ogden    53   Cancer
32591  Donna R. Bridges   26   Diabetes

SELECT count(*) FROM medical GROUP BY disease

Page 14:

Example: network access pattern leakage

12809 … Diabetes

29489 … Diabetes

13744 … Cancer

18740 … Diabetes

98329 … Cancer

32591 … Diabetes

Public information: Diabetes twice as common as cancer

12809 … Diabetes

29489 … Diabetes

13744 … Cancer

18740 … Diabetes

98329 … Cancer

32591 … Diabetes

Page 15:

Example: network access pattern leakage

12809 … Diabetes

29489 … Diabetes

13744 … Cancer

18740 … Diabetes

98329 … Cancer

32591 … Diabetes

Learns that Alice has cancer

??? Diabetes

??? Diabetes

??? Cancer

??? Diabetes

??? Cancer

??? Diabetes

??? Cancer
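The leakage in this example can be reproduced with a toy shuffle. The sketch below is illustrative only: `shuffle_destinations` and `attacker_guess` are hypothetical helpers, the deterministic group-to-partition assignment stands in for hash partitioning, and the attacker is assumed to see only the destination of each encrypted record.

```python
from collections import Counter


def shuffle_destinations(diseases, num_partitions=2):
    """Non-oblivious shuffle: a record's destination depends on its group key."""
    assignment = {}
    destinations = []
    for disease in diseases:
        # Each new group key is assigned to the next partition in turn.
        partition = assignment.setdefault(disease, len(assignment) % num_partitions)
        destinations.append(partition)
    return destinations


def attacker_guess(observed_destinations):
    """Label partitions using public knowledge: diabetes is the larger group."""
    (bigger, _), (smaller, _) = Counter(observed_destinations).most_common(2)
    return {bigger: "Diabetes", smaller: "Cancer"}


table = ["Diabetes", "Diabetes", "Cancer", "Diabetes", "Cancer", "Diabetes"]
alice = "Cancer"  # Alice's record arrives after the others

observed = shuffle_destinations(table + [alice])
labels = attacker_guess(observed[:-1])   # learned from traffic volume alone
leaked = labels[observed[-1]]            # destination of Alice's record
```

The attacker never decrypts a record; the traffic volume per destination plus one public statistic is enough to label the partitions.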

Page 16:

Leakage from prior work

• Memory access pattern attacks [XCP15] extracted complete text documents and photo outlines

• Network access pattern attacks [OCF+15] extracted the age, gender, and address of individuals

Page 17:

Goal: oblivious distributed analytics

access patterns are independent of data content

Page 18:

Opaque*: oblivious and encrypted distributed analytics platform

* Oblivious Platform for Analytic QUEries

[Figure labels: Spark SQL, Opaque, SQL, ML, Graph Analytics]

Page 19:

Security guarantees (informal)

• Data encryption and authentication

• Computation integrity: the client can check that the computation result was not affected by an attacker

• Obliviousness: the memory and network accesses of a query are the same for any two inputs with the same size characteristics (input/outputs)

• When padding is enabled, Opaque hides output sizes as well
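The obliviousness guarantee can be stated as a property of access traces. The sketch below is a toy model, not Opaque's operators: `leaky_count`, `oblivious_count`, and `trace_of` are hypothetical, and the recorded trace stands in for what an attacker observing memory accesses would see.

```python
def leaky_count(rows, target, trace):
    """Touches the counter only on a match, so the trace depends on the data."""
    count = 0
    for index, row in enumerate(rows):
        trace.append(("read", index))
        if row == target:
            trace.append(("write", "counter"))
            count += 1
    return count


def oblivious_count(rows, target, trace):
    """Touches the counter for every row, so the trace depends only on len(rows)."""
    count = 0
    for index, row in enumerate(rows):
        trace.append(("read", index))
        trace.append(("write", "counter"))
        count += int(row == target)
    return count


def trace_of(operator, rows, target):
    """Run an operator and return the access trace it produced."""
    trace = []
    operator(rows, target, trace)
    return trace


input_a = ["Cancer", "Diabetes", "Diabetes"]
input_b = ["Diabetes", "Diabetes", "Diabetes"]  # same size, different content
```

For two inputs of the same size, `trace_of(oblivious_count, ...)` is identical while `trace_of(leaky_count, ...)` differs, which is the distinction the definition draws.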

Page 20:

Achieving practical obliviousness is not easy

Obliviousness typically comes with high overheads

• For example, the state-of-the-art system, ObliVM, is six orders of magnitude slower than regular computation

Page 21:

Opaque components

Distributed oblivious operators: Oblivious Filter, Oblivious Aggregation, Oblivious Join

Computation verification

Rule-based opt. Cost-based opt.

Data encryption and authentication

Oblivious query planning

Cost model

Page 22:

Query execution

Spark Driver

Opaque

Catalyst

Client Server

Database

Scheduler

1 2 3

Query

query = SELECT sum(*) FROM table

10 13 4 27

Page 23:

Problem: cloud can alter distributed computation

• Drop data

• Modify data

• Skip task

• Replay old state

Page 24:

Example: drop data

Spark Driver

Opaque

Catalyst

Client Server

Database

Scheduler

1 2 3

query = SELECT sum(*) FROM table

10 13 4 23

Page 25:

Self-verifying computation

Invariant: if the computation does not abort, the execution completed so far is correct

If the computation is complete, then the entire query was executed correctly

Page 26:

Self-verifying computation

query = SELECT sum(*) FROM table

[Figure labels: Task 13, Task 14, Task 15, Task 20; values 10, 13, 4]
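One way to picture the abort-on-tampering invariant is a downstream task that refuses to run unless every expected upstream output arrives intact. The sketch below is an assumption-heavy illustration, not Opaque's verification protocol: `KEY`, `seal`, `open_sealed`, and `sum_task` are hypothetical, and the mapping of Tasks 13, 14, and 15 to the values 10, 13, and 4 is assumed for the example.

```python
import hashlib
import hmac

# Hypothetical key shared by the enclaves after attestation.
KEY = b"hypothetical-enclave-shared-key"


def seal(task_id, value):
    """Attach a MAC binding a task's output to the task that produced it."""
    message = f"{task_id}:{value}".encode()
    return task_id, value, hmac.new(KEY, message, hashlib.sha256).digest()


def open_sealed(sealed):
    """Abort if the output was modified outside an enclave."""
    task_id, value, tag = sealed
    message = f"{task_id}:{value}".encode()
    expected = hmac.new(KEY, message, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise RuntimeError("abort: task output was modified")
    return task_id, value


def sum_task(task_id, expected_inputs, received):
    """Sum the upstream outputs, aborting if any is missing or unexpected."""
    seen = dict(open_sealed(sealed) for sealed in received)
    if set(seen) != set(expected_inputs):
        raise RuntimeError("abort: an input was dropped or a task was skipped")
    return seal(task_id, sum(seen.values()))


outputs = [seal(13, 10), seal(14, 13), seal(15, 4)]
_, total, _ = sum_task(20, [13, 14, 15], outputs)  # total is 27
```

Dropping the output of Task 15 makes `sum_task` abort instead of returning 23. Replay of old state is not covered here; it would need the MAC to also bind a per-query identifier.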

Page 27:

Opaque components

Distributed oblivious operators: Oblivious Filter, Oblivious Aggregation, Oblivious Join

Computation verification

Rule-based opt. Cost-based opt.

Data encryption and authentication

Oblivious query planning

Cost model

Page 28:

Oblivious aggregation

12809 … Diabetes

29489 … Diabetes

13744 … Cancer

18740 … Diabetes

98329 … Cancer

32591 … Diabetes

SELECT count(*) FROM medical GROUP BY disease

1

2

There can be many partitions

Page 29:

Oblivious aggregation

12809 … Diabetes

29489 … Diabetes

13744 … Cancer

18740 … Diabetes

98329 … Cancer

32591 … Diabetes

Oblivious sort [CLRS, Leighton ‘85]

Map Sort

SELECT count(*) FROM medical GROUP BY disease

????

Page 30:

Oblivious aggregation

SELECT count(*) FROM medical GROUP BY disease

Map Sort

Oblivious sort [CLRS, Leighton ‘85]

12809 … Diabetes

29489 … Diabetes

13744 … Cancer

18740 … Diabetes

98329 … Cancer

32591 … Diabetes
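An oblivious sort compares a fixed sequence of position pairs regardless of the data. The sketch below uses a bitonic sorting network as a stand-in; it is not the column sort of [Leighton ‘85] cited on the slide. The names `bitonic_network` and `oblivious_sort` are hypothetical, and the input length is assumed to be a power of two (shorter inputs are padded).

```python
def bitonic_network(n):
    """Yield (i, j, ascending) compare-exchange steps for n a power of two.

    The sequence depends only on n, never on the values being sorted.
    """
    k = 2
    while k <= n:
        j = k // 2
        while j >= 1:
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    yield i, partner, (i & k) == 0
            j //= 2
        k *= 2


def oblivious_sort(items, key=lambda item: item):
    """Sort by running every compare-exchange step of the fixed network."""
    data = list(items)
    n = len(data)
    if n & (n - 1) != 0:
        raise ValueError("length must be a power of two; pad the input first")
    for i, j, ascending in bitonic_network(n):
        a, b = data[i], data[j]
        swap = (key(a) > key(b)) == ascending
        # Both positions are written on every step, whether or not they swap.
        data[i], data[j] = (b, a) if swap else (a, b)
    return data


PAD = "~pad"  # sorts after the disease names used here
rows = ["Diabetes", "Diabetes", "Cancer", "Diabetes", "Cancer", "Diabetes", PAD, PAD]
sorted_rows = oblivious_sort(rows)
```

After the sort, rows of the same group are adjacent, which is what lets a single scan count each group.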

Page 31:

12809 … Diabetes

29489 … Diabetes

13744 … Cancer

18740 … Diabetes

98329 … Cancer

32591 … Diabetes

Oblivious aggregation

SELECT count(*) FROM medical GROUP BY disease

The “Diabetes” group is split! It can span many partitions

How to aggregate obliviously and in parallel?

Page 32:

12809 … Diabetes

29489 … Diabetes

13744 … Cancer

18740 … Diabetes

98329 … Cancer

32591 … Diabetes

12809 … Diabetes

29489 … Diabetes

13744 … Cancer

18740 … Diabetes

98329 … Cancer

32591 … Diabetes

Oblivious aggregation

SELECT count(*) FROM medical GROUP BY disease

Scan → Boundary processing → Scan

Statistics: Cancer;Diabetes:1 and Diabetes;Diabetes:3

Partial agg.: DUMMY and Diabetes:1

Second scan output: DUMMY, Cancer: 2, DUMMY, DUMMY, DUMMY, Diabetes:4

Page 33:

Oblivious aggregation

12809 … Diabetes

29489 … Diabetes

13744 … Cancer

18740 … Diabetes

98329 … Cancer

32591 … Diabetes

12809 … Diabetes

29489 … Diabetes

13744 … Cancer

18740 … Diabetes

98329 … Cancer

32591 … Diabetes

Scan → Boundary processing → Scan

Boundary processing output: DUMMY and Diabetes:1

Second scan output: DUMMY, Cancer: 2, DUMMY, DUMMY, DUMMY, Diabetes:4

SELECT count(*) FROM medical GROUP BY disease
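The scan, boundary processing, scan pipeline can be sketched on the slide's six rows. This is a minimal model under stated assumptions, not Opaque's implementation: the rows are assumed already sorted by group and split into non-empty partitions, the function names are hypothetical, and Python branches stand in for the data-independent memory accesses a real enclave would need.

```python
DUMMY = "DUMMY"


def first_scan(partition):
    """Per-partition statistics: first group, last group, count of the last group."""
    first, last = partition[0], partition[-1]
    return first, last, sum(1 for group in partition if group == last)


def boundary_processing(stats):
    """For each partition, compute the partial aggregate carried in from earlier
    partitions and the first group of the next partition."""
    carries, next_firsts = [], []
    carry = None
    for index, (_, last, last_count) in enumerate(stats):
        carries.append(carry)
        if carry is not None and carry[0] == last:
            carry = (last, carry[1] + last_count)  # group spans this whole partition
        else:
            carry = (last, last_count)
        has_next = index + 1 < len(stats)
        next_firsts.append(stats[index + 1][0] if has_next else None)
    return carries, next_firsts


def second_scan(partition, carry, next_first):
    """Emit exactly one output per input row: a group total or a dummy."""
    group, count = carry if carry is not None else (None, 0)
    outputs = []
    for index, row in enumerate(partition):
        if row != group:
            group, count = row, 0
        count += 1
        is_last_row = index + 1 == len(partition)
        following = next_first if is_last_row else partition[index + 1]
        outputs.append((row, count) if following != row else DUMMY)
    return outputs


partitions = [["Cancer", "Cancer", "Diabetes"], ["Diabetes", "Diabetes", "Diabetes"]]
stats = [first_scan(p) for p in partitions]
carries, next_firsts = boundary_processing(stats)
results = [second_scan(p, c, n) for p, c, n in zip(partitions, carries, next_firsts)]
```

Each partition emits as many outputs as it has rows, so the output size reveals nothing about how the groups are distributed; the dummies are separated from the real results afterwards.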

Page 34:

Oblivious aggregation

DUMMY

Cancer: 2

DUMMY

DUMMY

DUMMY

Diabetes:4

Sort

Oblivious sort [CLRS, Leighton ‘85]

Final result

SELECT count(*) FROM medical GROUP BY disease

Aggregation has two sorts…

Can we do better?