TRANSCRIPT
Slide 1: Lowering the Bar: Deep Learning for Side Channel Analysis
Guilherme Perin, Baris Ege, Jasper van Woudenberg (@jzvw)
Black Hat, August 9, 2018
Slide 2: Before
Leakage modeling, signal processing
Slide 3: After
Slide 4: Power / EM side channel analysis
Slide 5
Slide 6: Power analysis
Some crypto algorithm
Slide 7: Example (huge) leakage
Data leakage vs. noise
Slide 8: Signal processing (demo)
Raw trace vs. processed trace
Slide 9: Misalignment (demo)
Slide 10: AES-128 first round attack
The 16 known input bytes i0…i15 are XORed with the 16 unknown key bytes k0…k15 (key addition), giving the state bytes x_j = i_j ⊕ k_j. Each x_j then goes through the S-box: s_j = S-BOX(x_j). A leakage model applied to s_j gives the power prediction.
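The first-round leakage model above can be sketched in a few lines. This is a hedged illustration, not the presenters' tooling: only the first 16 entries of the standard AES S-box are included, so inputs are kept small; a real attack uses the full 256-entry table.

```python
# First 16 entries of the standard AES S-box; a real attack uses all 256.
SBOX = [
    0x63, 0x7C, 0x77, 0x7B, 0xF2, 0x6B, 0x6F, 0xC5,
    0x30, 0x01, 0x67, 0x2B, 0xFE, 0xD7, 0xAB, 0x76,
]

def hamming_weight(v: int) -> int:
    """Number of set bits, the HW leakage model used throughout the talk."""
    return bin(v).count("1")

def predict_leakage(input_byte: int, key_guess: int) -> int:
    x = input_byte ^ key_guess   # key addition: known input XOR guessed key
    s = SBOX[x]                  # S-box lookup -> attacked intermediate s_j
    return hamming_weight(s)     # power prediction under the HW model
```

The prediction is later correlated against measured power samples, one key guess at a time.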
Slide 11: Points of interest selection
Data leakage vs. noise. Points of interest are found with correlation, the T-test, or difference of means: samples showing a statistical dependency between intermediate (key-related) data and power consumption.
Slide 12: Concept of Template Analysis
Learn (profiling) phase: on an open sample, measure traces for known inputs and keys, and build templates from a leakage model.
Attack (exploitation) phase: on a closed sample with a fixed key, measure traces for known inputs and match them against the templates.
Slide 13: Key recovery
[Plot: key byte rank vs. number of traces, AES key bytes 0-15]
Slide 14: The actual process
Setup → Acquisition → Processing → Analysis
Slide 15: Deep learning background
Slide 16: Deep Learning
Data with labels: images labeled "dog" and "cat".
Slide 17: Deep Learning
Train a machine to classify the labeled data: the machine outputs Cat (%) and Dog (%), an error function scores the output, and the back-propagation algorithm updates the machine.
Slide 18: Deep Learning
Test the trained machine on new data: it outputs Cat (%) and Dog (%) for unseen inputs.
Slide 19: Deep Learning
Is classification accuracy good enough? If yes, we are done. If no, change parameters and train again. Machine = deep neural network.
Slide 20: Convolutional Neural Networks (CNNs)
Input layer (size equal to the number of samples) → convolutional layers (feature extraction + encoding) → dense layers (classifiers) → output layer (size equal to the number of classes). The convolutional layers can detect features independently of their positions.
Slide 21: Creating training/test/validation data sets
Each trace (features) gets a label from the leakage model, e.g. samples with HW = 3, 4, 5, 7, …
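Building such a labeled set can be sketched as below. This is a hedged sketch under assumptions: the helper names are hypothetical, and `sbox` is a placeholder argument standing in for the real AES S-box table. During profiling the key is known, so the S-box output, and hence its Hamming weight, is computable.

```python
import numpy as np

def hw(v: int) -> int:
    return bin(v).count("1")

def make_labels(plaintext_bytes, key_byte, sbox):
    # Label each trace with the HW of the round-1 S-box output
    return np.array([hw(sbox[p ^ key_byte]) for p in plaintext_bytes])

def split_sets(traces, labels, n_train, n_val):
    # Sequential split into training / validation / test sets
    return ((traces[:n_train], labels[:n_train]),
            (traces[n_train:n_train + n_val], labels[n_train:n_train + n_val]),
            (traces[n_train + n_val:], labels[n_train + n_val:]))
```

With the HW of a byte there are 9 possible labels (0 through 8), which is the 9-class setup used in the attacks later in the talk.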
Slide 22: Classification
A trace (samples) is fed to the trained model. The softmax output (Σ p_i = 1) assigns a probability to each class, e.g. 0.65 for HW = 4, 0.15 for HW = 5, 0.05 for HW = 6, 0.08 for HW = 7, with the remaining probability spread over the other classes. Key enumeration then combines the output probabilities across traces (Bayes).
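The "key enumeration using output probabilities (Bayes)" step can be sketched as a log-likelihood sum. A hedged sketch, not the presenters' implementation: `rank_keys` is a hypothetical name and `sbox` is a placeholder table.

```python
import numpy as np

def hw(v: int) -> int:
    return bin(v).count("1")

def rank_keys(probs, plaintexts, sbox):
    """probs: (n_traces, 9) softmax outputs for HW classes 0..8.

    For each key guess k, look up the probability of the HW class that k
    predicts for each trace and sum the log-probabilities; the guess with
    the highest total is ranked first.
    """
    scores = np.zeros(256)
    idx = np.arange(len(plaintexts))
    for k in range(256):
        classes = [hw(sbox[p ^ k]) for p in plaintexts]
        scores[k] = np.log(probs[idx, classes] + 1e-36).sum()
    return np.argsort(-scores)  # most likely key guess first
```

Because the scores accumulate over traces, even slightly-above-chance per-trace accuracy drives the correct key to rank 0 given enough traces.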
Slide 23: Deep learning on side channels in practice
Slide 24: Step 1: Define initial hyper-parameters (demo)
Slide 25: Step 2: Make sure it's capable of learning
• Increase the number of training traces and observe the training and validation accuracy.
• Overfitting too fast? Training accuracy at 100% but validation accuracy low means the neural network is too big for the number of traces and samples.
Slide 26: Step 3: Make it generalize
Make sure the training accuracy/recall is increasing: the NN is learning from its training set. Validation recall staying above the minimum threshold value of 0.111 means the model is generalizing (0.111 = 1/9, where 9 is the number of classes for the HW of a byte).
Slide 27: Step 3: Make it generalize
Regularization techniques:
• L1, L2 (penalty applied to the weights)
• Dropout
• Data augmentation (more traces)
• Early stopping
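The last technique on the list can be sketched without any DL framework. A minimal, hypothetical early-stopping monitor: halt training when the validation metric (e.g. recall) has not improved for `patience` consecutive epochs.

```python
class EarlyStopping:
    """Stop training once the validation metric stalls for `patience` epochs."""

    def __init__(self, patience: int = 5):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def should_stop(self, val_metric: float) -> bool:
        if val_metric > self.best:
            self.best = val_metric      # new best: reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1        # no improvement this epoch
        return self.bad_epochs >= self.patience
```

Frameworks such as Keras ship an equivalent callback; the point is simply to stop before the network starts memorizing the training set.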
[Figure: three decision boundaries over (x1, x2): linear separation (low training accuracy, low validation accuracy), good regularization (good training accuracy, good validation accuracy), overfitting (high training accuracy, low validation accuracy).]
Slide 28: Step 4: Key recovery
In this analysis, we only need slightly-above-coin-flip accuracy!
[Plot: key byte rank vs. number of traces]
Slide 29: Getting keys from the thingz!
Slide 30: Piñata AES-128 with misalignment (demo)
Slide 31: Bypassing misalignment with CNNs
Neural network: input layer > conv layer > 36 > 36 > 36 > output layer
Training/validation/test sets: 90000/5000/5000 traces of 500 samples
Leakage model: HW of S-box output (round 1) → 9 classes
Results for key byte 0: use data augmentation as a regularization technique to improve generalization.
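The data augmentation used against misalignment can be sketched as random shifts of existing traces. This is a hedged sketch under assumptions: the function name, the circular-shift choice (`np.roll`), and the parameters are illustrative, not the presenters' exact pipeline.

```python
import numpy as np

def augment_shifts(traces, labels, n_copies=3, max_shift=10, seed=0):
    """Grow the training set by adding randomly shifted copies of each trace.

    Each copy keeps its original label; the shifts mimic the misalignment
    the CNN must learn to tolerate (cf. 25k -> 200k traces on slide 33).
    """
    rng = np.random.default_rng(seed)
    out_t, out_l = [traces], [labels]
    for _ in range(n_copies):
        shifts = rng.integers(-max_shift, max_shift + 1, size=len(traces))
        out_t.append(np.stack([np.roll(t, s) for t, s in zip(traces, shifts)]))
        out_l.append(labels)
    return np.concatenate(out_t), np.concatenate(out_l)
```

Because convolutional layers are shift-tolerant, training on shifted copies teaches the network that the leaky feature's position is irrelevant.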
[Plot: key byte rank vs. number of traces]
Slide 32: Breaking protected ECC on Piñata
Supervised deep learning attack:
• Curve25519, Montgomery ladder, scalar blinding
• Messy signal
• Brute-force methods for ECC are needed if test accuracy < 100%
• Need to get (almost) all bits from one trace!
Slide 33: Breaking protected ECC
On misaligned traces: unsupervised/supervised horizontal attack: 60% success rate; deep learning: 90% success rate; deep learning + data augmentation: 99.4% success rate. Data augmentation grew the training set from 25k to 200k traces.
Network: input (4000 samples) → 3 conv layers (10 filters, ReLU) → 4 dense layers (100 neurons, tanh) → output (2 classes, softmax).
Slide 34: Breaking AES with first-order masking (demo)
Target published in 2013 (http://www.dpacontest.org/v4/)
40k traces available
AES-256 (Atmel ATMega-163 smart card)
Countermeasure: Rotating S-box Masking (RSM)
Slide 35: Breaking AES with first-order masking
Masking removes the relationship between power consumption (EM) and predictable data: the two shares x ⊕ m and y ⊕ m leak at sample positions i and j.
Combine data: (x ⊕ m) ⊕ (y ⊕ m) = x ⊕ y
Combine samples: t_i × t_j
Brute-forcing the pair (i, j) gives quadratic complexity.
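Both combining steps can be sketched as below. A hedged sketch, assuming the classic centered-product combination for the sample side; the function name and shapes are illustrative.

```python
import numpy as np

def combine_samples(traces):
    """traces: (n_traces, n_samples) -> centered products t_i * t_j, i < j.

    Each share alone is independent of the secret, but the product of the
    two centered leaky samples depends on x XOR y, so a first-order model
    applies again. Enumerating all pairs is quadratic in the trace length.
    """
    c = traces - traces.mean(axis=0)          # center each sample point
    n = c.shape[1]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    combined = np.stack([c[:, i] * c[:, j] for i, j in pairs], axis=1)
    return combined, pairs
```

A CNN sidesteps the explicit (i, j) brute force: it can learn the relevant sample combination itself, which is what the following slides exploit.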
Slide 36: Breaking AES with first-order masking
Challenge: the training key == the validation key. The correct key byte candidate shows good generalization (validation recall above 1/9); a wrong key byte candidate shows poor generalization.
Slide 37: Breaking AES with first-order masking
Overfitting can be verified by checking where the NN is learning. For the correct key byte candidate, the CNN learns from specific, leaky samples; for a wrong key byte candidate, the CNN overfits because it can't distinguish leaky samples from noise.
Slide 38: Breaking AES with first-order masking
Neural network: input layer > conv layer > 50 > 50 > 50 > output layer
Training/validation/test sets: 36000/2000/2000 traces
Leakage model: HW of S-box output (round 1) → 9 classes
Results for key byte 0: processing 8 traces is sufficient to recover the key.
Slide 39: 1st cool thing
This shouldn't work.
Slide 40: 2nd cool thing
DL is up there with dozens of SCA research teams.
Slide 41: Wrapping up
Slide 42: I want to learn more!
deeplearningbook.org · introtodeeplearning.com · bookstores: a No Starch Press book by Colin & Jasper
Slide 43: Key takeaways
• DL does SCA: it's art + science, and it scales
• DL requires network fiddling; the bar is lowered, but not yet at zero
• Automation is needed to put a dent in embedded insecurity
Slide 44: References
• http://www.deeplearningbook.org/
• S. Haykin, "Neural Networks and Learning Machines".
• H. Maghrebi et al., "Breaking Cryptographic Implementations Using Deep Learning Techniques", https://eprint.iacr.org/2016/921.pdf
• R. Benadjila et al., "Study of Deep Learning Techniques for Side-Channel Analysis and Introduction to the ASCAD Database", https://eprint.iacr.org/2018/053.pdf
• E. Cagli et al., "Convolutional Neural Networks with Data Augmentation Against Jitter-Based Countermeasures", https://eprint.iacr.org/2017/740
• C. Zhang et al., "Understanding deep learning requires re-thinking generalization", https://arxiv.org/abs/1611.03530
• N. Keskar et al., "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima", https://arxiv.org/pdf/1609.04836.pdf
• R. Shwartz-Ziv and N. Tishby, "Opening the black box of Deep Learning via Information", https://arxiv.org/abs/1703.00810
Slide 45: Challenge your security
Riscure B.V., Frontier Building, Delftechpark 49, 2628 XJ Delft, The Netherlands. Phone: +31 15 251 40 90. www.riscure.com
Riscure North America, 550 Kearny St., Suite 330, San Francisco, CA 94108, USA. Phone: +1 650 646 99 79
Riscure China, Room 2030-31, No. 989, Changle Road, Shanghai 200031, China. Phone: +86 21 5117 5435
@jzvw