
Lowering the Bar: Deep Learning for Side Channel Analysis

Guilherme Perin, Baris Ege, Jasper van Woudenberg (@jzvw)

Blackhat, August 9, 2018

Before

Leakage modeling

Signal processing

After

Power / EM side channel analysis


Power analysis

Some crypto algorithm

Example (huge) leakage

Figure labels: Data leakage, Noise

Signal processing (demo)

Raw trace

Processed trace

Misalignment (demo)

AES-128 first round attack

The 16-byte state is laid out as a 4×4 matrix, filled column by column:

i0  i4  i8   i12
i1  i5  i9   i13
i2  i6  i10  i14
i3  i7  i11  i15

The key bytes k0..k15, the Key Addition outputs x0..x15 and the S-BOX outputs s0..s15 use the same layout.

Known: input bytes i0..i15
Unknown: key bytes k0..k15
Key Addition: xj = ij ⊕ kj
S-BOX: sj = S-BOX(xj)

Leakage model, power prediction
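The first-round computation on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' tooling: the helper names (`gf_mul`, `build_sbox`, `predict_leakage`) are made up for this example, the S-box is derived from its standard definition instead of a lookup table, and Hamming weight (HW) is assumed as the leakage model because later slides use it.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def build_sbox():
    """Derive the AES S-box: multiplicative inverse, then the affine transform."""
    sbox = []
    for x in range(256):
        inv = 1
        for _ in range(254):          # x^254 is the inverse of x; 0 maps to 0
            inv = gf_mul(inv, x)
        s = inv
        for shift in range(1, 5):     # XOR in the four left rotations of inv
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()


def hamming_weight(value):
    return bin(value).count("1")


def predict_leakage(input_byte, key_guess):
    """Power prediction for one byte: HW of the first-round S-box output."""
    x = input_byte ^ key_guess        # Key Addition
    s = SBOX[x]                       # S-BOX
    return hamming_weight(s)


# One prediction per key byte candidate for a known input byte.
predictions = [predict_leakage(0x32, k) for k in range(256)]
```

An attack compares such per-candidate predictions against the measured traces; only the correct key byte produces predictions that follow the real power consumption.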

Points of interest selection

Figure labels: Data leakage, Noise

Correlation, T-test, Difference of Means

Points of interest are the samples showing statistical dependency between intermediate (key-related) data and power consumption.
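As a rough sketch of the correlation variant, the snippet below builds synthetic traces in which a single sample leaks the Hamming weight of a random intermediate byte, then picks the sample with the highest absolute Pearson correlation. The trace count, noise level and leaking position are arbitrary values chosen for illustration, not measurements from the talk.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(1)
NUM_TRACES, NUM_SAMPLES, LEAKY_SAMPLE = 500, 20, 7   # illustrative values

# Synthetic data: every sample is Gaussian noise, and one sample additionally
# leaks the Hamming weight of a random intermediate byte.
intermediates = [rng.randrange(256) for _ in range(NUM_TRACES)]
hw = [bin(v).count("1") for v in intermediates]
traces = []
for h in hw:
    trace = [rng.gauss(0.0, 1.0) for _ in range(NUM_SAMPLES)]
    trace[LEAKY_SAMPLE] += h
    traces.append(trace)

# Correlate the leakage model with every sample position.
correlation = [
    pearson(hw, [trace[s] for trace in traces]) for s in range(NUM_SAMPLES)
]
point_of_interest = max(range(NUM_SAMPLES), key=lambda s: abs(correlation[s]))
```

On real traces the intermediate values come from known inputs and keys, and several neighbouring samples usually stand out instead of one.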

Concept of Template Analysis

Learn (Profiling) Phase: Open Sample

Attack (Exploitation) Phase: Closed Sample

Diagram labels: Keys, Input, Ciphertext, Measure, Leakage Model, Templates, Fixed Key, Analysis
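The two phases can be illustrated with a deliberately small sketch. It assumes a single leaking sample, a Hamming weight leakage model and Gaussian noise; the `measure` function is a hypothetical stand-in for acquiring a trace from the device, and real templates use several points of interest with a covariance matrix.

```python
import math
import random

rng = random.Random(2)


def hw(value):
    return bin(value).count("1")


def measure(intermediate):
    """Hypothetical one-sample leakage: 0.5 * HW plus Gaussian noise."""
    return 0.5 * hw(intermediate) + rng.gauss(0.0, 0.2)


# Learn (profiling) phase on the open sample: the intermediates are known,
# so measurements can be grouped by leakage class.
by_class = {h: [] for h in range(9)}
for _ in range(20):
    for value in range(256):
        by_class[hw(value)].append(measure(value))

# One template (mean, variance) per Hamming weight class.
templates = {}
for h, samples in by_class.items():
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / len(samples)
    templates[h] = (mean, var)


def log_likelihood(sample, mean, var):
    """Log of the Gaussian density N(mean, var) evaluated at sample."""
    return -0.5 * math.log(2 * math.pi * var) - (sample - mean) ** 2 / (2 * var)


def classify(sample):
    """Attack phase: pick the template that best explains a new measurement."""
    return max(templates, key=lambda h: log_likelihood(sample, *templates[h]))
```

In the attack phase the per-class likelihoods of many closed-sample traces are combined per key candidate, in the same way the deep learning attack later combines network output probabilities.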

Key recovery

Figure: key byte rank versus number of traces, for AES key bytes 0-15

The actual process

Setup

Processing

Acquisition

Analysis

Deep learning background

Deep Learning

Data with labels: dog, dog, dog, cat, cat, cat

Deep Learning

Train a machine to classify these data

Data with labels → machine → Cat (%), Dog (%) → Error function

BACK-PROPAGATION ALGORITHM

Deep Learning

Train a machine to classify these data (Data with labels)

Test the machine on new data: Trained machine → Cat (%), Dog (%)

Deep Learning

Train a machine to classify these data (Data with labels)

Test the machine on new data (Trained machine → Cat)

Is classification accuracy good enough?

No: Change parameters

Yes: We are done!

Machine = Deep Neural Network

Convolutional Neural Networks (CNNs)

Input Layer (the size is equivalent to the number of samples)

Conv. Layers (feature extractor + encoding)

Dense Layers (classifiers)

Output Layer (the size is equivalent to the number of classes)

The convolutional layers are able to detect the features independently of their positions
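The position independence can be seen in a toy example. A filter slides over the whole trace, so the same pattern produces the same peak response wherever it occurs; taking the maximum over the feature map then removes the position altogether. The pattern, the filter weights and the use of global max pooling are assumptions made for this sketch, not details of the networks used in the talk.

```python
def conv1d(trace, kernel):
    """Valid cross-correlation, as computed by a 1-D convolutional layer."""
    width = len(kernel)
    return [
        sum(kernel[j] * trace[i + j] for j in range(width))
        for i in range(len(trace) - width + 1)
    ]


def global_max_pool(feature_map):
    """Keep only the strongest response, discarding where it occurred."""
    return max(feature_map)


def make_trace(length, pattern, position):
    """All-zero trace with `pattern` written at `position`."""
    trace = [0.0] * length
    trace[position:position + len(pattern)] = pattern
    return trace


PATTERN = [1.0, 3.0, 1.0]   # hypothetical leakage shape
KERNEL = [1.0, 3.0, 1.0]    # a filter that has learned that shape

aligned = make_trace(20, PATTERN, 5)
shifted = make_trace(20, PATTERN, 12)   # same leakage, 7 samples later

map_aligned = conv1d(aligned, KERNEL)
map_shifted = conv1d(shifted, KERNEL)

# The peak moves with the pattern, but its height stays the same.
feature_aligned = global_max_pool(map_aligned)
feature_shifted = global_max_pool(map_shifted)
```

This is why a CNN can tolerate misaligned traces that would otherwise need a separate alignment step.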

Creating training/test/validation data sets

features   label (from the leakage model)
samples    HW = 5
samples    HW = 7
samples    HW = 3
samples    HW = 4
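A minimal sketch of building such a labelled set and splitting it three ways is shown below. The function names, the one-hot encoding of the nine Hamming weight classes and the 90/5/5 split are illustrative choices (the split mirrors the proportions quoted for a later experiment), and the toy traces are random placeholders.

```python
import random


def one_hot(label, num_classes=9):
    """Encode a class index as a vector with a single 1.0."""
    return [1.0 if c == label else 0.0 for c in range(num_classes)]


def make_dataset(traces, intermediates):
    """Pair each trace (features) with the HW of its intermediate byte (label)."""
    return [
        (trace, one_hot(bin(value).count("1")))
        for trace, value in zip(traces, intermediates)
    ]


def split(dataset, validation_fraction, test_fraction, seed=0):
    """Shuffle once, then cut into training, validation and test sets."""
    shuffled = dataset[:]
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * validation_fraction)
    n_test = int(len(shuffled) * test_fraction)
    validation = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    training = shuffled[n_val + n_test:]
    return training, validation, test


# Toy data: 100 placeholder traces of 4 samples and their intermediate bytes.
rng = random.Random(3)
traces = [[rng.random() for _ in range(4)] for _ in range(100)]
intermediates = [rng.randrange(256) for _ in range(100)]

training, validation, test = split(make_dataset(traces, intermediates), 0.05, 0.05)
```

The validation set guides hyper-parameter changes during training, while the test set is kept aside for the final key recovery.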

Classification

Trace (samples) → Trained Model → Softmax (Σ p_i = 1)

Output probabilities shown in the figure: 0.65, 0.15, 0.05, 0.08, 0.01, 0.02, 0.01, 0.02, 0.02

Class labels shown in the figure: HW = 4, HW = 5, HW = 6, HW = 7

Key enumeration using output probabilities (Bayes)
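Written out, the softmax output and the Bayesian combination over traces look roughly as follows. The symbols are notation chosen for this note, not taken from the slides: z_c is the network's raw output for class c, p_{t,c} the softmax probability of class c for trace t, i_t the known input byte of trace t, and k a key byte candidate. The HW-of-S-box label matches the leakage model used in the later experiments; a uniform prior over k and independent traces are assumed.

```latex
p_{c} = \frac{e^{z_c}}{\sum_{c'=0}^{8} e^{z_{c'}}},
\qquad \sum_{c=0}^{8} p_{c} = 1

\log P(k \mid \text{traces})
  = \sum_{t=1}^{N} \log p_{t,\,\mathrm{HW}(\mathrm{Sbox}(i_t \oplus k))}
  + \text{const}
```

Sorting the 256 candidates by this score gives the key byte ranking that the key recovery plots track as the number of traces N grows.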

Deep learning on side channels in practice

Step 1: Define initial hyper-parameters (demo)

Figure labels: HW = 0, HW = 1

Step 2: Make sure it’s capable of learning

• Increase the number of training traces and observe the training and validation accuracy

• Overfitting too fast?
  • Training accuracy: 100% | Validation accuracy: low
  • Neural network is too big for the number of traces and samples

Step 3: Make it generalize

Make sure the training accuracy/recall is increasing (the NN is learning from its training set)

Validation recall stays above the minimum threshold value = model is generalizing

Minimum threshold: 0.111 = 1/9, where 9 is the number of classes (the HW of a byte ranges from 0 to 8)
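A short calculation shows why recall with a 1/9 floor is a more telling metric here than plain accuracy. It assumes the recall on the slide is averaged over the nine classes (macro recall) and that the intermediate byte is uniformly distributed, so the Hamming weight classes follow a binomial distribution; neither assumption is stated on the slide.

```python
from math import comb

# Probability of each Hamming weight class for a uniformly random byte.
class_probability = [comb(8, h) / 256 for h in range(9)]

# A model that always answers "HW = 4" never looks at the trace.
constant_prediction = 4
accuracy = class_probability[constant_prediction]

# Its recall is 1 for class 4 and 0 for every other class.
per_class_recall = [1.0 if h == constant_prediction else 0.0 for h in range(9)]
macro_recall = sum(per_class_recall) / 9

print(f"accuracy of the constant model:     {accuracy:.3f}")
print(f"macro recall of the constant model: {macro_recall:.3f}")
```

The constant model reaches about 27% accuracy purely from class imbalance, yet its macro recall sits exactly at 1/9, so validation recall above that floor indicates the network has learned something from the traces.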

Step 3: Make it generalize

Regularization techniques:

L1, L2 (penalty applied to the weights)

Dropout

Data Augmentation (+traces)

Early Stopping

Figure (three decision boundaries in the x1/x2 plane):

Linear Separation: Low Training Accuracy, Low Validation Accuracy

Good Regularization: Good Training Accuracy, Good Validation Accuracy

Overfitting: High Training Accuracy, Low Validation Accuracy
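Of these techniques, data augmentation is the one the later experiments lean on. The slides do not say how the extra traces were generated; the sketch below assumes random time shifts, a common choice for misaligned traces, and pads with the edge value. Function names and parameters are illustrative.

```python
import random


def shift_trace(trace, offset):
    """Shift a trace by `offset` samples, repeating the edge value as padding."""
    if offset > 0:
        return [trace[0]] * offset + trace[:-offset]
    if offset < 0:
        return trace[-offset:] + [trace[-1]] * (-offset)
    return list(trace)


def augment(traces, labels, copies, max_shift, seed=0):
    """Return the original traces plus `copies` randomly shifted versions of each."""
    rng = random.Random(seed)
    out_traces, out_labels = list(traces), list(labels)
    for trace, label in zip(traces, labels):
        for _ in range(copies):
            offset = rng.randint(-max_shift, max_shift)
            out_traces.append(shift_trace(trace, offset))
            out_labels.append(label)   # shifting does not change the label
    return out_traces, out_labels


# Toy data: 25 traces of 10 samples; 7 extra copies each gives 8x the data.
traces = [[float(i + s) for s in range(10)] for i in range(25)]
labels = [i % 9 for i in range(25)]
aug_traces, aug_labels = augment(traces, labels, copies=7, max_shift=3)
```

Because the network sees each leakage at many positions, it is pushed to learn the shape of the leakage instead of memorizing sample indices.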

Step 4: Key Recovery

In this analysis, we only need slightly-above coin flip accuracy!

Figure: key byte rank versus number of traces
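The simulation below illustrates why a weak classifier is enough. The "model" is a stand-in, not a trained network: it names the true Hamming weight class only 10% of the time and guesses uniformly otherwise, which gives about 20% accuracy against 11% for pure guessing. Summing log-probabilities over a few thousand traces still ranks the correct key byte first. The key value, trace count and probability values are arbitrary choices for this sketch.

```python
import math
import random


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def build_sbox():
    """Derive the AES S-box: multiplicative inverse, then the affine transform."""
    sbox = []
    for x in range(256):
        inv = 1
        for _ in range(254):
            inv = gf_mul(inv, x)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


# Leakage model: Hamming weight of the first-round S-box output.
HW_SBOX = [bin(s).count("1") for s in build_sbox()]

rng = random.Random(4)
TRUE_KEY, NUM_TRACES = 0x2B, 3000


def simulated_model(true_class):
    """Stand-in for a trained network: right 10% of the time, else guessing."""
    guess = true_class if rng.random() < 0.10 else rng.randrange(9)
    return [0.2 if c == guess else 0.1 for c in range(9)]   # sums to 1


inputs = [rng.randrange(256) for _ in range(NUM_TRACES)]
outputs = [simulated_model(HW_SBOX[i ^ TRUE_KEY]) for i in inputs]

# For every key candidate, add the log-probability of the class it predicts.
scores = [0.0] * 256
for i, probs in zip(inputs, outputs):
    for k in range(256):
        scores[k] += math.log(probs[HW_SBOX[i ^ k]])

ranking = sorted(range(256), key=lambda k: scores[k], reverse=True)
key_rank = ranking.index(TRUE_KEY)   # 0 means the correct key is ranked first
```

Each trace contributes only a small bias toward the correct candidate, but the bias accumulates while the noise averages out, which is what the key rank curves on the slide show.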

Getting keys from the thingz!

Piñata AES-128 with misalignment (demo)

Bypassing Misalignment with CNNs

Neural Network: Input Layer > ConvLayer > 36 > 36 > 36 > Output Layer

Training/validation/test sets: 90000/5000/5000 traces of 500 samples

Leakage Model: HW of S-Box Out (Round 1) → 9 classes

Results for key byte 0 (figure: key byte rank versus number of traces)

Use Data Augmentation as regularization technique to improve generalization

Breaking protected ECC on Piñata

Supervised deep learning attack:

- Curve25519, Montgomery ladder, scalar blinding

- Messy signal

- Brute-force methods for ECC are needed if test accuracy < 100%

- Need to get (almost) all bits from one trace!

Breaking protected ECC

Misaligned traces

Unsupervised/Supervised Horizontal Attack: 60% success rate

Deep learning: 90% success rate

Deep learning (+ data augmentation): 99.4% success rate

Data augmentation: 25k → 200k traces.

Network: Input (4000) > 3 Conv Layers (10 filters) > 4 Dense Layers (100 Neurons) > Output (2 Classes)

Activations shown in the figure: RELU, TANH, SOFTMAX

Breaking AES with First-Order Masking (demo)

Target published in 2013 (http://www.dpacontest.org/v4/)

40k traces available

AES-256 (Atmel ATMega-163 smart card)

Countermeasure: Rotating S-box Masking (RSM)

Breaking AES with First-Order Masking

Masking: remove the relationship between power consumption (EM) and predictable data

Figure: the trace leaks x ⊕ m at sample i and y ⊕ m at sample j

Combine data: (x ⊕ m) ⊕ (y ⊕ m) = x ⊕ y

Combine samples: t_i × t_j

Brute-force i, j → Quadratic complexity
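Both halves of this classical second-order attack can be checked in a few lines. The sketch verifies that the mask cancels when the two masked values are XORed, and counts how many sample pairs a brute-force search over i and j has to consider. The function name and the trace lengths are illustrative; practical attacks often subtract the mean of each sample before multiplying.

```python
from itertools import combinations


def combine_samples(trace):
    """Multiply every pair of samples t[i] * t[j] with i < j."""
    return {
        (i, j): trace[i] * trace[j]
        for i, j in combinations(range(len(trace)), 2)
    }


# The mask cancels when the two masked values are XORed together.
mask_cancels = all(
    ((x ^ m) ^ (y ^ m)) == (x ^ y)
    for x in range(0, 256, 17)
    for y in range(0, 256, 17)
    for m in range(256)
)

# A 4-sample toy trace already needs 6 combined samples.
trace = [0.5, 1.0, 2.0, 4.0]
pairs = combine_samples(trace)

# The number of (i, j) pairs grows quadratically with the trace length n.
pair_counts = {n: n * (n - 1) // 2 for n in (4, 100, 1000)}
```

A trace of 1000 samples expands to 499500 combined samples, which is the cost the deep learning approach avoids by letting the network find the useful combinations itself.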

Breaking AES with First-Order Masking

Challenge: Training key == validation key

Correct key byte candidate (good generalization)

Wrong key byte candidate (poor generalization)

Threshold shown in the figure: 1/9

Breaking AES with First-Order Masking

Overfitting can be verified by checking where the NN is learning

Correct key byte candidate (CNN learns from specific and leaky samples)

Wrong key byte candidate (CNN overfits because it can’t distinguish leaky samples from noise)

Breaking AES with First-Order Masking

Neural Network: Input Layer > ConvLayer > 50 > 50 > 50 > Output Layer

Training/validation/test sets: 36000/2000/2000 traces

Leakage Model: HW of S-Box Out (Round 1) → 9 classes

Threshold shown in the figure: 1/9

Results for key byte 0:

The processing of 8 traces is sufficient to recover the key

1st cool thing

This shouldn’t work

2nd cool thing

DL is up there with dozens of SCA research teams

Wrapping up

I want to learn more!

Deeplearningbook.org

introtodeeplearning.com

bookstores

nostarch

?

By Colin & Jasper

Key takeaways

• DL does SCA art + science and scales

• DL requires network fiddling, the bar is low, not yet at 0

• Automation needed to put a dent in embedded insecurity

References

• http://www.deeplearningbook.org/

• S. Haykin, “Neural Networks and Learning Machines”.

• H. Maghrebi et al., “Breaking Cryptographic Implementations Using Deep Learning Techniques”, https://eprint.iacr.org/2016/921.pdf

• Benadjila et al., “Study of Deep Learning Techniques for Side-Channel Analysis and Introduction to ASCAD Database”, https://eprint.iacr.org/2018/053.pdf

• E. Cagli et al., “Convolutional Neural Networks with Data Augmentation Against Jitter-Based Countermeasures”, https://eprint.iacr.org/2017/740

• Zhang et al., “Understanding Deep Learning Requires Rethinking Generalization”, https://arxiv.org/abs/1611.03530

• Keskar et al., “On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima”, https://arxiv.org/pdf/1609.04836.pdf

• Shwartz-Ziv and Tishby, “Opening the Black Box of Deep Neural Networks via Information”, https://arxiv.org/abs/1703.00810

Challenge your security

Riscure B.V.
Frontier Building, Delftechpark 49
2628 XJ Delft
The Netherlands
Phone: +31 15 251 40 90
www.riscure.com

Riscure North America
550 Kearny St., Suite 330
San Francisco, CA 94108 USA
Phone: +1 650 646 99 79

Riscure China
Room 2030-31, No. 989, Changle Road, Shanghai 200031
China
Phone: +86 21 5117 5435

@jzvw