sgx ir - secure information retrieval with trusted processors · secure information retrieval with...

33
UT DALLAS Erik Jonsson School of Engineering & Computer Science SGX IR Secure Information Retrieval with Trusted Processors Fahad Shaon, Murat Kantarcioglu The University of Texas at Dallas FEARLESS engineering 1 / 29

Upload: others

Post on 14-Aug-2020

8 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: SGX IR - Secure Information Retrieval with Trusted Processors · Secure Information Retrieval with Trusted Processors Fahad Shaon, Murat Kantarcioglu ... Challenge: Access Pattern

UT DALLAS Erik%Jonsson%School%of%Engineering%&%Computer%Science

FEARLESS engineering

SGX IRSecure Information Retrieval with Trusted Processors

Fahad Shaon, Murat Kantarcioglu

The University of Texas at Dallas

FEARLESS engineering 1 / 29

Page 2: SGX IR - Secure Information Retrieval with Trusted Processors · Secure Information Retrieval with Trusted Processors Fahad Shaon, Murat Kantarcioglu ... Challenge: Access Pattern

Problem - Secure Cloud based Information Retrieval

Encrypted Data

Encrypted Result

Encrypted Search Query

Build a secure information retrieval system

I User stores encrypted files in cloud server

I Perform selective retrieval

FEARLESS engineering 2 / 29

Page 3: SGX IR - Secure Information Retrieval with Trusted Processors · Secure Information Retrieval with Trusted Processors Fahad Shaon, Murat Kantarcioglu ... Challenge: Access Pattern

Build Block - Intel SGX

I We use Intel SGX - Software Guard Extensions

I SGX is new Intel instruction set

I Allows us to create secure compartment inside processor,called Enclave

I Privileged softwares, such as, OS, Hypervisor, can not directlyobserve data and computation inside enclave

FEARLESS engineering 3 / 29

Page 4: SGX IR - Secure Information Retrieval with Trusted Processors · Secure Information Retrieval with Trusted Processors Fahad Shaon, Murat Kantarcioglu ... Challenge: Access Pattern

Threat Model - Intel SGX

Encrypted Data & Code

Encrypted Result

Disk

Memory CPU

Enclave

Server

Adversary can control hypervisor, OS, memory, disk of the server

FEARLESS engineering 4 / 29

Page 5: SGX IR - Secure Information Retrieval with Trusted Processors · Secure Information Retrieval with Trusted Processors Fahad Shaon, Murat Kantarcioglu ... Challenge: Access Pattern

State of The Art

I Relevant search or indexing systems that uses SGX -HardIDX (Fuhry et al., 2017), Rearguard (Sun et al., 2018),Oblix (Mishra et al., 2018), Hardware-supportedORAM (Hoang et al., 2019)

I These works mainly focus on building efficient data structuresfor searching using SGX

I Assume inverted index is built and/or build the index in client

I Did not look into ranked retrieval

FEARLESS engineering 5 / 29

Page 6: SGX IR - Secure Information Retrieval with Trusted Processors · Secure Information Retrieval with Trusted Processors Fahad Shaon, Murat Kantarcioglu ... Challenge: Access Pattern

Challenges - Access Pattern Leakage

Challenge: Access Pattern Leakage

I Adversary can observe memory accesses in SGX

I Memory access reveals about encrypted data (Islam, Kuzu,and Kantarcioglu, 2012; Naveed, Kamara, and Wright, 2015)

Solution

I Data Obliviousness - we build custom data obliviousindexing algorithms

FEARLESS engineering 6 / 29

Page 7: SGX IR - Secure Information Retrieval with Trusted Processors · Secure Information Retrieval with Trusted Processors Fahad Shaon, Murat Kantarcioglu ... Challenge: Access Pattern

Challenges - Access Pattern Leakage

Challenge: Access Pattern Leakage

I Adversary can observe memory accesses in SGX

I Memory access reveals about encrypted data (Islam, Kuzu,and Kantarcioglu, 2012; Naveed, Kamara, and Wright, 2015)

Solution

I Data Obliviousness - we build custom data obliviousindexing algorithms

FEARLESS engineering 6 / 29

Page 8: SGX IR - Secure Information Retrieval with Trusted Processors · Secure Information Retrieval with Trusted Processors Fahad Shaon, Murat Kantarcioglu ... Challenge: Access Pattern

Data Obliviousness - Oblivious Select

I Data Obliviousness: Program executes same path for allinput of same size

I Example: x == y ? a : b

oblivousSelect(a, b, x, y):

...

mov %[x],%%eax

mov %[y],%%ebx

xor %%eax , %%ebx

...

mov %[a],%%ecx

mov %[b],%%edx

cmovz %%ecx ,%%edx

...

mov %%edx , %[out]

FEARLESS engineering 7 / 29

Page 9: SGX IR - Secure Information Retrieval with Trusted Processors · Secure Information Retrieval with Trusted Processors Fahad Shaon, Murat Kantarcioglu ... Challenge: Access Pattern

Data Obliviousness - Oblivious Select

I Data Obliviousness: Program executes same path for allinput of same size

I Example: x == y ? a : b

oblivousSelect(a, b, x, y):

...

mov %[x],%%eax

mov %[y],%%ebx

xor %%eax , %%ebx

...

mov %[a],%%ecx

mov %[b],%%edx

cmovz %%ecx ,%%edx

...

mov %%edx , %[out]

FEARLESS engineering 7 / 29

Page 10: SGX IR - Secure Information Retrieval with Trusted Processors · Secure Information Retrieval with Trusted Processors Fahad Shaon, Murat Kantarcioglu ... Challenge: Access Pattern

Data Obliviousness - Oblivious Select

I Data Obliviousness: Program executes same path for allinput of same size

I Example: x == y ? a : b

oblivousSelect(a, b, x, y):

...

mov %[x],%%eax

mov %[y],%%ebx

xor %%eax , %%ebx

...

mov %[a],%%ecx

mov %[b],%%edx

cmovz %%ecx ,%%edx

...

mov %%edx , %[out]

FEARLESS engineering 7 / 29

Page 11: SGX IR - Secure Information Retrieval with Trusted Processors · Secure Information Retrieval with Trusted Processors Fahad Shaon, Murat Kantarcioglu ... Challenge: Access Pattern

Challenge - Memory Constraint

Challenge: Memory Constraint

I SGX (v1) only 90MB enclave

Solution

I Blocking - Break large data into small blocks

I We utilize SGXBigMatrix (Shaon et al., 2017) primitives

I BigMatrix handles the complexity of data blocking

FEARLESS engineering 8 / 29

Page 12: SGX IR - Secure Information Retrieval with Trusted Processors · Secure Information Retrieval with Trusted Processors Fahad Shaon, Murat Kantarcioglu ... Challenge: Access Pattern

Challenge - Memory Constraint

Challenge: Memory Constraint

I SGX (v1) only 90MB enclave

Solution

I Blocking - Break large data into small blocks

I We utilize SGXBigMatrix (Shaon et al., 2017) primitives

I BigMatrix handles the complexity of data blocking

FEARLESS engineering 8 / 29

Page 13: SGX IR - Secure Information Retrieval with Trusted Processors · Secure Information Retrieval with Trusted Processors Fahad Shaon, Murat Kantarcioglu ... Challenge: Access Pattern

Objectives - Summary

Encrypted Intermediate Data

Encrypted Result

Pre-Processing

Encrypted Search Query

Final Processing

I Very low client side processing

I Build index securely in the cloud using SGX

I Build data oblivious algorithms

I Support ranked retrieval

FEARLESS engineering 9 / 29

Page 14: SGX IR - Secure Information Retrieval with Trusted Processors · Secure Information Retrieval with Trusted Processors Fahad Shaon, Murat Kantarcioglu ... Challenge: Access Pattern

SGX IR - Document and Query Types

I Text DataI Ranked document retrieval using TF-IDF (Token Frequency

and Inverse Document Frequency)

I Image DataI Face recognition using Eigenface

FEARLESS engineering 10 / 29

Page 15: SGX IR - Secure Information Retrieval with Trusted Processors · Secure Information Retrieval with Trusted Processors Fahad Shaon, Murat Kantarcioglu ... Challenge: Access Pattern

Text Pre-Processing - Client

TokenizationStemming

TokenIDGeneration

Cryptography is the practice and

study of techniques for secure

communication ...

cryptographi practic studi

techniqu secur commun

cryptographi

practic

studi

techniqu

secur

1

2

3

4

5

EncryptedBigMatrixGeneration

tok-id

1

2

3

doc-id

11

1

1

...

freq

...

2

10

6

...

I We tokenize and stem the input text files

I We build a matrix I with token id, document id, andfrequency columns

I Finally, we encrypt I and upload

I Single round of read and write is required

FEARLESS engineering 11 / 29

Page 16: SGX IR - Secure Information Retrieval with Trusted Processors · Secure Information Retrieval with Trusted Processors Fahad Shaon, Murat Kantarcioglu ... Challenge: Access Pattern

Text Indexing - Server

doc-idtok-id freq

1 1 22 1 3... ... ...

8 2 11 2 5... ... ...

17 3 81 4 1... ... ...

counttok-id sum

1 8 20

2 4 9

3 7 154 5 3

# # #... ... ...

... ... ...

5 1 2

6 1 1

doc-idtok-id freq

1,0 1 2

7 2

... ... ...

1 3

9 10... ... ...

1,1... ...

2,0... ... ...

2,1

9 103,0

...Indexing

I Input I, we output two matrices

I U ′ containing total frequencies of the tokens, for IDFcalculation

I T containing equal length blocks of token to documentfrequency mapping for TF calculation

FEARLESS engineering 12 / 29

Page 17: SGX IR - Secure Information Retrieval with Trusted Processors · Secure Information Retrieval with Trusted Processors Fahad Shaon, Murat Kantarcioglu ... Challenge: Access Pattern

Text Indexing - IDF - Server

doc-idtok-id freq

1 1 22 1 3... ... ...

8 2 11 2 5... ... ...

17 3 81 4 1... ... ...

Sort Count & Sum Sort and

Adjust

doc-idtok-id freq

1 1 21 2 5

... ... ...

1 4 1... ... ...

2 1 3

2 5 10

3 6 4

... ... ...

counttok-id sum

1 0 0# # #... ... ...

2 8 20# # #... ... ...

3 4 9# # #... ... ...

counttok-id sum

1 8 20

2 4 9

3 7 154 5 3

# # #... ... ...

... ... ...

5 1 2

6 1 1

I I ′ ← Obliviously sort I on token id columnI We generate U , to keep count and sum of frequencies

I c← I ′[i].token id 6= I ′[i− 1].token idI U [i].sum← obliviousSelect(sum,#, 1, c)I sum← obliviousSelect(sum, 0, 1, c) + I[i].frequency

I Finally, we sort this matrix so that the dummy entries go tothe bottom

FEARLESS engineering 13 / 29

Page 18: SGX IR - Secure Information Retrieval with Trusted Processors · Secure Information Retrieval with Trusted Processors Fahad Shaon, Murat Kantarcioglu ... Challenge: Access Pattern

Text Indexing - TF - Block Size Optimization

I We can read document frequency of tokens from matrix I ′

I This will reveal number of documents having a specific token

I So, we split I ′ into equal length blocks

I We optimize block size b from count column of U ′ usingtechnique outline in (Shaon and Kantarcioglu, 2016)

I We assume the frequency follow Pareto distributionI Mathematically find the value minimize the padding

FEARLESS engineering 14 / 29

Page 19: SGX IR - Secure Information Retrieval with Trusted Processors · Secure Information Retrieval with Trusted Processors Fahad Shaon, Murat Kantarcioglu ... Challenge: Access Pattern

Text Indexing - TF - Padding Generation

We regenerate token id with bucket number function σ

doc-idtok-id freq

1 1 21 2 5

... ... ...

1 4 1... ... ...

2 1 3

2 5 10

3 6 4

... ... ...

Regenerate TokenId

doc-idtok-id freq

1,0 1 2

7 2

... ... ...

1 3

9 10... ... ...

1,1... ... ...

2,0... ... ...

2,1

9 103,0

We generate padding

Generate Padding Rows

counttok-id sum

1 8 20

2 4 9

3 7 154 5 3

# # #... ... ...

... ... ...

5 1 2

6 1 1

doc-idtok-id freq

1,1 # #... ... ...

... ... ...

... ... ...

2,1... ... ...

3,1

# # #

# #

## #

# #

Finally we merge and sort X and J to get the output T matrix.

FEARLESS engineering 15 / 29

Page 20: SGX IR - Secure Information Retrieval with Trusted Processors · Secure Information Retrieval with Trusted Processors Fahad Shaon, Murat Kantarcioglu ... Challenge: Access Pattern

TF - IDF Calculation

I On T we run term frequency functions - (log normalization)

1 + log(tft,d)

I On U ′ we run document frequency functions, such as, IDF

logN

dft

I Query result we use T for TF and U ′ for IDF

FEARLESS engineering 16 / 29

Page 21: SGX IR - Secure Information Retrieval with Trusted Processors · Secure Information Retrieval with Trusted Processors Fahad Shaon, Murat Kantarcioglu ... Challenge: Access Pattern

Bitonic Sorting of Arbitrary Input Size

I Sorting is one of the most frequently used operations

I We use arbitrary length Bitonic sort version (Lang, 1998)

I However, existing definition is recursive

I Not suitable for memory constrained environments like SGX

I So, we propose a non-recursive algorithm without usingstack

FEARLESS engineering 17 / 29

Page 22: SGX IR - Secure Information Retrieval with Trusted Processors · Secure Information Retrieval with Trusted Processors Fahad Shaon, Murat Kantarcioglu ... Challenge: Access Pattern

Bitonic Sort Non Recursive Algorithm - Concept

Concept

I We can express a number asN = 2xm+...+2x3+2x2+2x1

I Merge network can sort adescending and an ascendingblock into ascending orderblock

I We sort then merge fromsmallest to biggest block

Sort Merge

FEARLESS engineering 18 / 29

Page 23: SGX IR - Secure Information Retrieval with Trusted Processors · Secure Information Retrieval with Trusted Processors Fahad Shaon, Murat Kantarcioglu ... Challenge: Access Pattern

Bitonic Sort Non Recursive Algorithm

1: for d = 0 to dlog2(N)e do2: if ((N >> d) & 1) 6= 0 then3: start← (−1 << (d+ 1)) & N4: size← 1 << d5: dir ← (size & N &−N) 6= 06: bitonicSort2K(start, size, dir)7: if !dir then8: bitonicMerge(start,N − start, 1)9: end if

10: end if11: end for

FEARLESS engineering 19 / 29

Page 24: SGX IR - Secure Information Retrieval with Trusted Processors · Secure Information Retrieval with Trusted Processors Fahad Shaon, Murat Kantarcioglu ... Challenge: Access Pattern

Face recognition indexing

I We adopt EigenFace

I Pre-processing and matching face are simple matrix operations

I Core problem to solve obliviously is eigenvector calculation

I We adopt Jacobi method of eigenvector calculation

FEARLESS engineering 20 / 29

Page 25: SGX IR - Secure Information Retrieval with Trusted Processors · Secure Information Retrieval with Trusted Processors Fahad Shaon, Murat Kantarcioglu ... Challenge: Access Pattern

Eigenvector calculation - Jacobi method

Oblivious Value Extract

Oblivious Column Extract

Rotate

Oblivious Column Assign

Oblivious Row Assign

Calculate & Ressign

We find the max off-diagonal element at Ak,l, then rotate columnk and l. Repeat until A becomes diagonal. The diagonal valuesare eigenvalues.

FEARLESS engineering 21 / 29

Page 26: SGX IR - Secure Information Retrieval with Trusted Processors · Secure Information Retrieval with Trusted Processors Fahad Shaon, Murat Kantarcioglu ... Challenge: Access Pattern

Experimental Evaluations

We implemented a prototype using Intel SGX SDK 2.6 for Linux

Setup

I Processor Intel Xeon E3-1270

I Memory 64GB

I OS Ubuntu 18.04

I SGX SDK Version 2.6 for Linux

FEARLESS engineering 22 / 29

Page 27: SGX IR - Secure Information Retrieval with Trusted Processors · Secure Information Retrieval with Trusted Processors Fahad Shaon, Murat Kantarcioglu ... Challenge: Access Pattern

Experimental Results - Bitonic Sort and Text Indexing

0

5

10

15

20

25

30

0

5x10

6

1x10

7

1.5

x107

2x10

7

2.5

x107

3x10

7

3.5

x107

So

rtin

g t

ime

(min

)

Number of rows

Sorting timeSorting time next 2

k

Bitonic sort

0 20 40 60 80

100 120 140 160

950

1000

1050

1100

1150

1200

1250

1300

1350

Ind

ex p

rep

arat

ion

(se

c)

Data set size (MB)

Encryption onlyIncremental id

MD5 hashSHA256 hashMurmur hash

Client end processing cost onEnron Dataset

FEARLESS engineering 23 / 29

Page 28: SGX IR - Secure Information Retrieval with Trusted Processors · Secure Information Retrieval with Trusted Processors Fahad Shaon, Murat Kantarcioglu ... Challenge: Access Pattern

Experimental Results - Text Indexing

100

120

140

160

180

200

220

240

260

280

950

1000

1050

1100

1150

1200

1250

1300

1350

Ind

exin

g t

ime

(min

)

Data set size (MB)

Oblivious index buildingNon-Oblivious index building

SGX index processing on EnronDataset

0

0.2

0.4

0.6

0.8

1

950

1000

1050

1100

1150

1200

1250

1300

1350

ND

CG

Sco

re

Data set size (MB)

NDCG Score compare to Lucene

NDCG results compare to ApaceLucene on Enron Dataset

FEARLESS engineering 24 / 29

Page 29: SGX IR - Secure Information Retrieval with Trusted Processors · Secure Information Retrieval with Trusted Processors Fahad Shaon, Murat Kantarcioglu ... Challenge: Access Pattern

Experimental Result - Eigenvector Calculation

15

20

25

30

35

40

800

850

900

950

1000

1050

1100

1150

1200

Sca

lin

g a

nd

pro

ject

tim

e (s

ec)

Number of faces

Scaling and project time (sec)

Pre-processing overhead

0

50

100

150

200

250

300

350

400

800

850

900

950

1000

1050

1100

1150

1200

1250

Cal

cula

tio

n t

ime

(Ho

urs

)

Matrix elements

Oblivious Eigenface calculationNon-oblivious Eigenface calculation

Eigenvector calculation time

FEARLESS engineering 25 / 29

Page 30: SGX IR - Secure Information Retrieval with Trusted Processors · Secure Information Retrieval with Trusted Processors Fahad Shaon, Murat Kantarcioglu ... Challenge: Access Pattern

Thank you

Questions / Comments

I Fahad Shaon - [email protected]

I Murat Kantarcioglu - [email protected]

Acknowledgments: This research is supported in part by NIHaward 1R01HG006844, NSF awards CICI-1547324, IIS-1633331,CNS-1837627, OAC-1828467 and ARO award W911NF-17-1-0356.

FEARLESS engineering 26 / 29

Page 31: SGX IR - Secure Information Retrieval with Trusted Processors · Secure Information Retrieval with Trusted Processors Fahad Shaon, Murat Kantarcioglu ... Challenge: Access Pattern

References I

Fuhry, Benny et al. (2017). “HardIDX: Practical and secure indexwith SGX”. In: IFIP Annual Conference on Data andApplications Security and Privacy. Springer, pp. 386–408.

Hoang, Thang et al. (2019). “Hardware-supported ORAM ineffect: Practical oblivious search and update on very largedataset”. In: Proceedings on Privacy Enhancing Technologies2019.1, pp. 172–191.

Islam, Mohammad Saiful, Mehmet Kuzu, and Murat Kantarcioglu(2012). “Access Pattern disclosure on Searchable Encryption:Ramification, Attack and Mitigation.”. In: NDSS. Vol. 20, p. 12.

Lang, H.W. (1998). Bitonic sorting network for n not a power of 2.Accessed 05/31/2020. url: http://www.iti.fh-flensburg.de/lang/algorithmen/sortieren/bitonic/oddn.htm.

FEARLESS engineering 27 / 29

Page 32: SGX IR - Secure Information Retrieval with Trusted Processors · Secure Information Retrieval with Trusted Processors Fahad Shaon, Murat Kantarcioglu ... Challenge: Access Pattern

References II

Mishra, Pratyush et al. (2018). “Oblix: An efficient oblivioussearch index”. In: 2018 IEEE Symposium on Security andPrivacy (SP). IEEE, pp. 279–296.

Naveed, Muhammad, Seny Kamara, and Charles V Wright (2015).“Inference attacks on property-preserving encrypted databases”.In: Proceedings of the 22nd ACM SIGSAC Conference onComputer and Communications Security. ACM, pp. 644–655.

Shaon, Fahad and Murat Kantarcioglu (2016). “A practicalframework for executing complex queries over encryptedmultimedia data”. In: IFIP Annual Conference on Data andApplications Security and Privacy. Springer, pp. 179–195.

FEARLESS engineering 28 / 29

Page 33: SGX IR - Secure Information Retrieval with Trusted Processors · Secure Information Retrieval with Trusted Processors Fahad Shaon, Murat Kantarcioglu ... Challenge: Access Pattern

References III

Shaon, Fahad et al. (2017). “SGX-BigMatrix: A PracticalEncrypted Data Analytic Framework With Trusted Processors”.In: Proceedings of the 2017 ACM SIGSAC Conference onComputer and Communications Security. CCS 17. Dallas, Texas,USA: Association for Computing Machinery, 12111228. isbn:9781450349468. doi: 10.1145/3133956.3134095.

Sun, Wenhai et al. (2018). “REARGUARD: Secure keyword searchusing trusted hardware”. In: IEEE INFORM.

FEARLESS engineering 29 / 29