tutorial on adversarial machine learning with · 2020. 7. 19. · machine learning with tensorflow...

46
Tutorial on Adversarial Machine Learning with CleverHans Nicholas Carlini University of California, Berkeley Nicolas Papernot Pennsylvania State University Did you git clone https://github.com/carlini/odsc_adversarial_nn? November 2017 - ODSC @NicolasPapernot

Upload: others

Post on 21-Sep-2020

19 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

Tutorial on Adversarial Machine Learning with CleverHansNicholas CarliniUniversity of California, Berkeley

Nicolas PapernotPennsylvania State University

Did you git clone https://github.com/carlini/odsc_adversarial_nn?

November 2017 - ODSC

@NicolasPapernot

Page 2: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

If you have not already:

git clone https://github.com/carlini/odsc_adversarial_nn

cd odsc_adversarial_nn

python test_install.py

Getting setup

2

Page 3: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

Why neural networks?

3

Page 4: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

4

Machine Learning Classifier

[0.01, 0.84, 0.02, 0.01, 0.01, 0.01, 0.05, 0.01, 0.03, 0.01]

[p(0|x,θ), p(1|x,θ), p(2|x,θ), …, p(7|x,θ), p(8|x,θ), p(9|x,θ)] f(x,θ)x

Classifier: map inputs to one class among a predefined set

Classification with neural networks

Page 5: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

5

Page 6: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

6

Page 7: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

7

Page 8: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

8

D

E

F

G

H

I

J

K

L

M

N

O

P

Q

R

A

B

C

S

T

Page 9: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

9

Page 10: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

10

Page 11: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

11

Machine Learning Classifier

[0 1 0 0 0 0 0 0 0 0][0 1 0 0 0 0 0 0 0 0]

[1 0 0 0 0 0 0 0 0 0][0 0 0 0 0 0 0 1 0 0]

[0 0 0 0 0 0 0 0 0 1][0 0 0 1 0 0 0 0 0 0]

[0 0 0 0 0 0 0 0 1 0][0 0 0 0 0 0 1 0 0 0]

[0 1 0 0 0 0 0 0 0 0][0 0 0 0 1 0 0 0 0 0]

Learning: find internal classifier parameters θ that minimize a cost/loss function (~model error)

Page 12: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

NNs give better results than any other approach

But there’s a catch ...

12

Page 13: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

13

Adversarial examples

[GSS15] Goodfellow et al. Explaining and Harnessing Adversarial Examples

Page 14: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

14

Page 15: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

Crafting adversarial examples: fast gradient sign method

15[GSS15] Goodfellow et al. Explaining and Harnessing Adversarial Examples

During training, the classifier uses a loss function to minimize model prediction errors

After training, attacker uses loss function to maximize model prediction error

1. Compute its gradient with respect to the input of the model

2. Take the sign of the gradient and multiply it by a threshold

Page 16: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

Transferability

16

Page 17: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

Not specific to neural networks

Logistic regression SVM

Nearest Neighbors Decision Trees 17

Page 18: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

Machine Learning with TensorFlow

import tensorflow as tf

sess = tf.Session()

five = tf.constant(5)

six = tf.constant(6)

sess.run(five+six) # 11

18

Page 19: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

Machine Learning with TensorFlow

import tensorflow as tf

sess = tf.Session()

five = tf.constant(5)

number = tf.placeholder(tf.float32, [])

added = five+number

sess.run(added, {number: 6}) # 11

sess.run(added, {number: 8}) # 13

19

Page 20: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

Machine Learning with TensorFlow

import tensorflow as tf

number = tf.placeholder(tf.float32, [])

squared = number * number

derivative = tf.gradients(squared, [number])[0]

sess.run(derivative, {number: 5}) # 10

20

Page 21: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

Classifying ImageNet with the Inception Model [Hands On]

21

Page 22: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

Attacking ImageNet

22

Page 23: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

23

Page 24: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

24

Growing community

1.3K+ stars300+ forks

40+ contributors

Page 25: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

Attacking the Inception Model for ImageNet [Hands On]

python attack.py

Replace panda.png with adversarial_panda.png

python classify.py

Things to try:

1. Replace the given image of a panda with your own image2. Change the target label which the adversarial example should be classified as

25

Page 26: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

Adversarial Training

26

Training72

Page 27: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

Adversarial Training

27

AttackAttack72

Page 28: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

Adversarial Training

28

Training7272

Page 29: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

Adversarial training

Intuition: injecting adversarial example during training with correct labels

Goal: improve model generalization outside of training manifold

29Figure by Ian Goodfellow

Training time (epochs)

Page 30: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

Adversarial training

Intuition: injecting adversarial example during training with correct labels

Goal: improve model generalization outside of training manifold

30Figure by Ian Goodfellow

Training time (epochs)

Page 31: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

Adversarial training

Intuition: injecting adversarial example during training with correct labels

Goal: improve model generalization outside of training manifold

31Figure by Ian Goodfellow

Training time (epochs)

Page 32: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

Adversarial training

Intuition: injecting adversarial example during training with correct labels

Goal: improve model generalization outside of training manifold

32Figure by Ian Goodfellow

Training time (epochs)

Page 33: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

Efficient Adversarial Training through Loss Modification

33

Small when prediction is correct on legitimate input

Page 34: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

Efficient Adversarial Training through Loss Modification

34

Small when prediction is correct on legitimate input

Small when prediction is correct on adversarial input

Page 35: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

Adversarial Training Demo

35

Page 36: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

Attacking remotely hosted black-box models

36

Remote ML sys

“0”“1”

“4”

(1) The adversary queries remote ML system for labels on inputs of its choice.

Page 37: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

37

Remote ML sys

Local substitute

(2) The adversary uses this labeled data to train a local substitute for the remote system.

Attacking remotely hosted black-box models

“0”“1”

“4”

Page 38: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

38

Remote ML sys

Local substitute

(3) The adversary selects new synthetic inputs for queries to the remote ML system based on the local substitute’s output surface sensitivity to input variations.

Attacking remotely hosted black-box models

“0”“2”

“9”

Page 39: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

39

Remote ML sys

Local substitute

“yield sign”

(4) The adversary then uses the local substitute to craft adversarial examples, which are misclassified by the remote ML system because of transferability.

Attacking remotely hosted black-box models

Page 40: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

40

Defended model

Undefended model

“yield sign”

(4) The adversary then uses the local substitute to craft adversarial examples, which are misclassified by the remote ML system because of transferability.

Attacking with transferability

Page 41: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

Attacking Adversarial Training with Transferability Demo

41

Page 42: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

How to test your model for adversarial examples?

White-box attacks

● One shot

FastGradientMethod

● Iterative/Optimization-based

BasicIterativeMethod, CarliniWagnerL2

Transferability attacks

● Transfer from undefended● Transfer from defended

42

Page 43: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

Defenses

Adversarial training:

- Original variant- Ensemble adversarial training- Madry et al.

Reduce dimensionality of input space:

- Binarization of the inputs- Thermometer-encoding

43

Page 44: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

44

Adversarial examples represent worst-case distribution drifts

[DDS04] Dalvi et al. Adversarial Classification (KDD)

Page 45: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

45

Adversarial examples are a tangible instance of hypothetical AI safety problems

Image source: http://www.nerdist.com/wp-content/uploads/2013/07/Space-Odyssey-4.jpg

Page 46: Tutorial on Adversarial Machine Learning with · 2020. 7. 19. · Machine Learning with TensorFlow import tensorflow as tf sess = tf.Session() five = tf.constant(5) number = tf.placeholder(tf.float32,

46

How to reach out to us?

Nicholas Carlini [email protected]

Nicolas Papernot [email protected]