
Deep Learning: Vulnerabilities,

Defenses and Beyond

Prof. Yossi Keshet

Since 2013, deep neural networks have matched human performance at…

Face detection (Taigman et al., 2013)

Street address detection (Goodfellow et al., 2013)

Solving CAPTCHAs

Object detection

Automatic speech recognition

Machine learning is everywhere

Deep learning is everywhere

… but how secure are deep learning models?

Outline

• Deep learning review

• Adversarial examples

• Introduction

• Houdini: generalize to any machine learning algorithm (Cisse, Adi, Neverova, & Keshet, 2017)

• Speaker verification attack (Kreuk, Adi, Cisse, & Keshet, 2017)

• Adversarial malware (Kreuk, Barak, Aviv-Reuven, Baruch, Pinkas & Keshet, 2018)

• Watermarking machine learning models (Adi, Baum, Cisse, Pinkas, and Keshet, 2018)

• Defenses and detection of adversarial attacks (Shalev, Adi, and Keshet, 2018)

Deep learning review

Deep learning model: a neural network with parameters maps an input example to a predicted label.

Measuring performance: the loss function compares the predicted label to the target label.

– “0-1” loss in binary classification

– Intersection-over-union in object segmentation

– Word Error Rate in speech recognition, e.g. reference "It is easy to recognize speech" vs. hypothesis "It is easy to wreck a nice beach" (1 substitution, 3 insertions)
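Word error rate is a normalized word-level edit distance; a minimal sketch (plain dynamic programming, not any particular toolkit's implementation) that reproduces the four edits counted on the slide's example:

# Minimal sketch: Word Error Rate as word-level edit distance
# (substitutions + insertions + deletions) divided by the reference length.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitute = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(substitute, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("it is easy to recognize speech",
          "it is easy to wreck a nice beach"))   # 4 edits / 6 words ≈ 0.67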

Training

The parameters are fit so that the predicted label matches the desired label, by minimizing a surrogate loss plus a regularization term.
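The objective on this slide did not survive extraction; the standard form it refers to, an average surrogate loss over the labeled examples plus a regularization term, is (my notation, not necessarily the slide's):

\[
\min_{\theta} \; \frac{1}{m} \sum_{i=1}^{m} \bar{\ell}\big(\theta, x_i, y_i\big) \; + \; \lambda\, R(\theta)
\]

where \(\bar{\ell}\) is the surrogate loss, \((x_i, y_i)\) are the labeled examples, and \(R\) is the regularizer with weight \(\lambda\).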

Surrogate loss functions

- Negative log-likelihood:

- Hinge loss function (similar to SVM):

- And there are more...
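For reference, since the slide's formulas are missing from the transcript, standard forms of these two surrogates for a model with label scores \(g_\theta(x, y)\) are:

\[
\bar{\ell}_{\mathrm{NLL}}(\theta, x, y) = -\log \frac{e^{g_\theta(x, y)}}{\sum_{y'} e^{g_\theta(x, y')}},
\qquad
\bar{\ell}_{\mathrm{hinge}}(\theta, x, y) = \max\Big(0,\; 1 + \max_{y' \ne y} g_\theta(x, y') - g_\theta(x, y)\Big).
\]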

Optimization for training

The optimum is found using the (stochastic) gradient descent method:

for t = 1, 2, …
    pick an example uniformly at random
    compute the gradient of the loss on that example
    update the parameters by a step in the negative gradient direction
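As a concrete illustration, a minimal sketch of this loop for a logistic-regression-style model on synthetic data; the model, data, and hyper-parameters are placeholders rather than anything from the talk:

# Minimal sketch: stochastic gradient descent on a surrogate loss
# (negative log-likelihood) with an L2 regularizer.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))                 # input examples
w_true = rng.normal(size=20)
y = (X @ w_true > 0).astype(float)              # binary target labels

w = np.zeros(20)                                # model parameters
lr, lam = 0.1, 1e-3                             # step size and regularization weight

for step in range(5000):
    i = rng.integers(len(X))                    # pick an example uniformly at random
    p = 1.0 / (1.0 + np.exp(-X[i] @ w))         # predicted probability
    grad = (p - y[i]) * X[i] + lam * w          # gradient of NLL + L2 regularizer
    w -= lr * grad                              # update the parameters

print(f"training accuracy: {(((X @ w) > 0) == y).mean():.2f}")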

Adversarial example attacks

Figure: a deep learning model classifies a clean image as PANDA; after an adversarial perturbation is added, the same model classifies it as GIBBON.

Goodfellow, Shlens & Szegedy (2015)

Figure: an impersonator wearing an adversarial perturbation is recognized as the real Milla Jovovich.

Sharif, Bhagavatula, Bauer, & Reiter (2016)

Figure: a deep learning model correctly recognizes a STOP SIGN; with a physical adversarial perturbation applied to the sign, the model no longer does.

Eykholt, et al. (2018)

Attacking Remotely Hosted Black-Box Models

All remote classifiers were trained on the MNIST data set (60,000 training examples, 10 classes).

Papernot, McDaniel, Goodfellow, Jha, Celik, & Swami (2017)

Figure: input pattern + perturbation = adversarial example.

How is the adversarial example found?

Take a first-order Taylor expansion of the loss around the input.

How is the perturbation found?

Solve for the perturbation that maximizes the linearized loss under a norm constraint.
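The equations on these slides were lost in extraction; for reference, the construction in the cited Goodfellow, Shlens & Szegedy (2015) paper follows exactly this first-order argument (reconstructed here, so the notation may differ from the slides):

\[
\ell(\theta, x + \delta, y) \;\approx\; \ell(\theta, x, y) + \delta^{\top} \nabla_x \ell(\theta, x, y)
\]

Maximizing the right-hand side subject to \(\|\delta\|_\infty \le \epsilon\) gives

\[
\delta^{*} = \epsilon \,\operatorname{sign}\!\big(\nabla_x \ell(\theta, x, y)\big),
\qquad
x_{\mathrm{adv}} = x + \delta^{*}.
\]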

Main result:

We showed that almost any machine learning-based model

can be attacked, including models solving complex and

structured tasks.

Cisse, Adi, Neverova, & Keshet (2017)

Deep networks for structured tasks

In structured tasks the set of possible labels is exponentially large.

Example: machine translation. "Machine translation is awesome!" → "機械翻訳は素晴らしいです!" (the same sentence in Japanese)

Example: speech recognition (音声認識, "speech recognition" in Japanese)

A new loss function called Houdini

Compare to the negative log-likelihood:

Cisse, Adi, Neverova, & Keshet (2017)
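The definition itself is missing from the transcript; as a reference, the Houdini loss in the cited paper has roughly the following form (reconstructed from memory of the paper, so treat the notation as approximate):

\[
\bar{\ell}_{H}(\theta, x, y)
= \mathbb{P}_{\gamma \sim \mathcal{N}(0,1)}
\Big[\, g_\theta(x, y) - g_\theta(x, \hat{y}) < \gamma \,\Big]\cdot \ell(\hat{y}, y)
\]

where \(g_\theta(x, y)\) is the model's score for label \(y\), \(\hat{y}\) is the predicted (decoded) label, and \(\ell\) is the task loss (e.g. word error rate or intersection-over-union). Unlike the negative log-likelihood, the surrogate is scaled by the task loss itself.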

Houdini properties I

Strong consistency: the model found using Houdini yields the infimum loss

achievable by any predictor

Cisse, Adi, Neverova, & Keshet (2017)

Houdini properties II

Lower bound to the loss

Cisse, Adi, Neverova, & Keshet (2017)

Houdini properties III

Gradients can be found analytically

Cisse, Adi, Neverova, & Keshet (2017)

Houdini properties IV

The Houdini loss function

is very similar to the generalized Probit loss function:

Cisse, Adi, Neverova, & Keshet (2017)

Image segmentation: example 1

Figure: (a) source image, (b) initial prediction, (c) adversarial prediction, (d) perturbed image, (e) noise; target vs. source.

Cisse, Adi, Neverova, & Keshet (2017)

Image segmentation: example 2

Figure: (a) source image, (b) initial prediction, (c) adversarial prediction, (d) perturbed image, (e) noise; target vs. source.

Cisse, Adi, Neverova, & Keshet (2017)

Image segmentation: example 3

Figure: original semantic segmentation framework vs. the compromised framework under adversarial attack; (a) source image, (b) initial prediction, (c) adversarial prediction, (d) perturbed image, (e) noise.

Cisse, Adi, Neverova, & Keshet (2017)

Pose estimation (Xbox Kinect): example 2

Figure: panel annotations: PCKh0.5 = 87.5, SSIM = 0.9802, distortion = 0.2112, iter = 1000; PCKh0.5 = 87.5, SSIM = 0.9824, distortion = 0.211, iter = 1000; PCKh0.5 = 100.0, SSIM = 0.9922, distortion = 0.145, iter = 435; PCKh0.5 = 100.0, SSIM = 0.9906, distortion = 0.016, iter = 574. Perceptibility values: 0.145, 0.016, 0.211, 0.210.

Cisse, Adi, Neverova, & Keshet (2017)

Speech recognition (Google Voice)

Original: "if she could only see Phronsie for just one moment"

Adversarial: "if she ou down take shee throwns purhdress luon ellwon"

Cisse, Adi, Neverova, & Keshet (2017)

Speaker verification (YOHO)

Original: speaker 148    Adversarial: speaker 23

Kreuk, Adi, Cisse, & Keshet (2017)

Adversarial Malware

Kreuk, Barak, Aviv-Reuven, Baruch, Pinkas & Keshet (2018)

Figure: a deep learning model labels a binary file as “malicious” or “benign”; can an adversarial perturbation turn a “malicious” verdict into “benign”?

Nearest Neighbour Decoding

Figure: each byte value 0, 1, 2, …, 255 has an embedding vector; a perturbed embedding is decoded back to the byte with the closest embedding (here, the nearest neighbour is byte 5).
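A minimal sketch of this decoding step, which maps perturbed embedding vectors back to valid byte values; the embedding table, shapes, and the random perturbation below are illustrative assumptions, not the paper's exact setup:

# Minimal sketch: nearest-neighbour decoding from embedding space back to bytes.
import numpy as np

rng = np.random.default_rng(0)
embed_dim = 8
embedding_table = rng.normal(size=(256, embed_dim))    # one vector per byte value 0..255

original_bytes = np.array([5, 6, 5, 7], dtype=np.uint8)
z = embedding_table[original_bytes]                    # embed the bytes
z_adv = z + 0.1 * rng.normal(size=z.shape)             # perturbation in embedding space

# For each perturbed vector, pick the byte whose embedding is closest
# in Euclidean distance.
dists = np.linalg.norm(z_adv[:, None, :] - embedding_table[None, :, :], axis=-1)
decoded_bytes = dists.argmin(axis=1)
print(decoded_bytes)   # with a small perturbation, typically recovers 5 6 5 7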

Better Decoding

Figure: example decoded byte values 5, 6, 5, 7.

Are We There Yet?

Figure: “Segment n+1 Data”, nearest neighbor decoding, “Attention”.

No real defense for now…

Turning your weakness into a strength

Figure: labeled data → machine learning model (Machine Learning as a Service).

Watermarking Deep Neural Networks by Backdooring

Adi, Baum, Cisse, Pinkas, and Keshet (2018)

Figure: watermark embedding, either by training a model “from scratch” on the training set and trigger set together, or by adapting a “pre-trained” model using the trigger set.
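A minimal sketch of the “from scratch” setting, assuming a PyTorch-style setup; the stand-in model, data, and trigger set below are illustrative, not the paper's construction:

# Minimal sketch: embed a backdoor-based watermark by training on the union of
# the ordinary training set and a small trigger set with randomly assigned labels.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset, ConcatDataset

torch.manual_seed(0)
num_classes = 10

# Stand-in data: ordinary training examples, plus a tiny trigger set of
# out-of-distribution inputs with randomly chosen target labels.
train_x, train_y = torch.randn(512, 3 * 32 * 32), torch.randint(0, num_classes, (512,))
trigger_x, trigger_y = torch.rand(20, 3 * 32 * 32), torch.randint(0, num_classes, (20,))

model = nn.Sequential(nn.Linear(3 * 32 * 32, 256), nn.ReLU(), nn.Linear(256, num_classes))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Train on the training set and trigger set together ("from scratch" setting).
data = ConcatDataset([TensorDataset(train_x, train_y), TensorDataset(trigger_x, trigger_y)])
loader = DataLoader(data, batch_size=64, shuffle=True)
for epoch in range(30):
    for x, y in loader:
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

# Ownership check: the watermarked model should reproduce the (random)
# trigger-set labels far more often than chance.
with torch.no_grad():
    acc = (model(trigger_x).argmax(dim=1) == trigger_y).float().mean()
print(f"trigger-set accuracy: {acc:.2f}")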

Watermarking Deep Neural Networks by Backdooring

Functionality-preserving: a model with a watermark is as accurate as a model without it.

Non-trivial ownership: an adversary cannot claim ownership of the model, even if he knows the watermarking algorithm.

Unforgeability: an adversary, even when possessing several trigger set examples and their targets, will not be able to convince a third party about ownership.

Unremovability: an adversary cannot remove the watermark, even if he knows about its existence.

Adi, Baum, Cisse, Pinkas, and Keshet

(2018)

Detecting adversarial attacks by adding redundancies

Figure: an input image classified as “computer keyboard”, where the label is predicted through several different vector representations of words.

Shalev, Adi, and Keshet (2018)

Keshet, Aviv-Reuven, and NEC (under submission)
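One way to read the diagram: the network predicts the label through several word-embedding spaces, and disagreement among those redundant predictions flags a suspicious input. A minimal sketch of that idea, where all shapes, names, and the toy inputs are illustrative assumptions rather than the papers' recipe:

# Minimal sketch: redundant word-embedding heads and a disagreement score.
import numpy as np

rng = np.random.default_rng(0)
num_classes, dims = 10, [50, 100, 300]          # three different word-embedding spaces

# Hypothetical label embeddings, one table per representation.
label_tables = [rng.normal(size=(num_classes, d)) for d in dims]

def predicted_labels(head_outputs):
    """Nearest label in each embedding space for one input."""
    return [int(np.argmax(table @ out)) for table, out in zip(label_tables, head_outputs)]

def disagreement(head_outputs):
    """Fraction of head pairs that disagree; high values flag suspicious inputs."""
    preds = predicted_labels(head_outputs)
    pairs = [(i, j) for i in range(len(preds)) for j in range(i + 1, len(preds))]
    return sum(preds[i] != preds[j] for i, j in pairs) / len(pairs)

# A clean input: all heads point to (roughly) the same label's embedding.
clean = [table[3] + 0.05 * rng.normal(size=table.shape[1]) for table in label_tables]
# An adversarial-looking input: heads pulled toward different labels.
adversarial = [label_tables[0][3], label_tables[1][7], label_tables[2][1]]

print("clean disagreement:", disagreement(clean))               # ~0.0
print("adversarial disagreement:", disagreement(adversarial))   # ~1.0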

Speech, Language and Deep Learning Lab

Moustapha Cisse

Carsten Baum

Benny Pinkas

Morna Baruch

Natalia Neverova

Assi Barak

Gabi Shalev

Thanks !!