STRIP: A Defence Against Trojan Attacks on Deep Neural Networks

Yansong Gao, Chang Xu, Derui Wang, Shiping Chen, Damith C. Ranasinghe, Surya Nepal

Presented by Damith C. Ranasinghe


The University of Adelaide: founded in 1874 and the third-oldest university in Australia.

2017: Deep Neural Networks are shown to be vulnerable to Trojan (“backdoor”) attacks

Gu, T., Dolan-Gavitt, B., & Garg, S. (2017). Badnets: Identifying vulnerabilities in the machine learning model supply chain.

Chen, X., Liu, C., Li, B., Lu, K., & Song, D. (2017). Targeted backdoor attacks on deep learning systems using data poisoning.

Trojan Model Behaviour

[Figure: face-recognition example. On clean inputs the model keeps state-of-the-art performance (Alice → Alice, Bob → Bob). “backdoor” inputs are labelled B. Gates.]

“backdoor”: a secret physical trigger, known only by the attacker, makes the model predict the class targeted by the attacker.

Chen, X., Liu, C., Li, B., Lu, K., & Song, D. (2017). Targeted backdoor attacks on deep learning systems using data poisoning.

Trojan inputs (inputs carrying the trigger)

The Trojaned model misclassifies them to the targeted class. Attack success rates are often 100%.

Input-agnostic attack: any input carrying the trigger is misclassified to the targeted class

Consequences: Input-agnostic Trojan Attack

Face Recognition → targeted class
Chen, X., Liu, C., Li, B., Lu, K., & Song, D. (2017). Targeted backdoor attacks on deep learning systems using data poisoning.

Self-driving car → targeted class
Gu, T., Dolan-Gavitt, B., & Garg, S. (2017). Badnets: Identifying vulnerabilities in the machine learning model supply chain.

Inserting a Trojan into a Model

1. Stamp the trigger onto a small fraction of training samples. Less than 10% is needed; often 1% or 2% is enough.

2. Change the label of each Trojaned input to the target class (B. Gates in the example) and train the model.
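The two steps on this slide (stamp the trigger on a small fraction of samples, then relabel them to the target class) can be sketched as below. This is a minimal illustration, not the authors' code: the function name `poison_dataset`, the mask-based stamping, the toy 8x8 images and the 2x2 white-square trigger are all assumptions made for the example.

```python
import numpy as np


def poison_dataset(images, labels, trigger, mask, target_class, rate=0.02, seed=0):
    """Return poisoned copies of (images, labels) and the poisoned indices.

    images: float array (n, H, W, C) in [0, 1].
    trigger, mask: arrays (H, W, C); mask is 1 where the trigger replaces a pixel.
    All names here are illustrative, not taken from the STRIP code base.
    """
    rng = np.random.default_rng(seed)
    n = len(images)
    n_poison = max(1, int(round(rate * n)))
    idx = rng.choice(n, size=n_poison, replace=False)

    poisoned_images = images.copy()
    poisoned_labels = labels.copy()
    # Step 1: stamp the trigger onto the selected samples.
    poisoned_images[idx] = images[idx] * (1 - mask) + trigger * mask
    # Step 2: relabel the stamped samples to the attacker's target class.
    poisoned_labels[idx] = target_class
    return poisoned_images, poisoned_labels, idx


# Toy data standing in for a real training set.
rng = np.random.default_rng(1)
images = rng.random((500, 8, 8, 1))
labels = rng.integers(0, 10, size=500)

# Hypothetical trigger: a 2x2 white square in the bottom-right corner.
mask = np.zeros((8, 8, 1))
mask[-2:, -2:, :] = 1.0
trigger = np.ones((8, 8, 1))

x_p, y_p, idx = poison_dataset(images, labels, trigger, mask, target_class=7, rate=0.02)
```

Training any classifier on `(x_p, y_p)` instead of `(images, labels)` would then be the "train the model" part of step 2; with `rate=0.02` only 10 of the 500 toy samples are modified.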

Trojan Attack Threats

DL requires a huge amount of labeled data, computational power and expertise to achieve state-of-the-art results.

• Outsourcing
• Transfer Learning
• Insider threat
• Federated learning

Often only a small fraction of the data needs to be poisoned.

Detecting Trojan Attack is challenging

1. There is no access to Trojaned samples, and the trigger is often inconspicuous (for example, a Post-it note trigger).

2. A Trojan trigger can have any shape, size and pattern. It is freely chosen by the attacker and is impossible to guess. (Gu et al., “BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain,” Aug. 2017.)

3. Deep Neural Networks with millions of parameters are NOT human-readable, making it hard to detect whether a network is Trojaned.

4. A Trojaned DNN has essentially the same (state-of-the-art) accuracy on clean inputs as a benign (NOT Trojaned) model, so model prediction accuracy on test data does not help.


Trojan Defence Techniques

Fine-pruning

Model inspection

Inputs inspection

Offline & White Box

Online and Black Box (Detection)

Liu et al. 2018 RAID

Trigger Reverse engineering

Liu et al. 2019 CCS

Wang et al. 2019 SP

Our work


STRIP: Strong Intentional Perturbation

Observation: As long as the trigger is present (that is, the input is Trojaned), the prediction of a Trojaned model is insensitive to input perturbations.

Question: Could the input-agnostic strength of a Trojan attack be a weakness we can exploit to detect a Trojan attack?

STRIP: Observation

Create Strong Perturbations

[Figure: predictions on perturbed inputs: “This is Alice”, “Maybe this is Alice”, “Who is this person???”. Labels: Clean model, Trigger.]

Threat Model

• The defender has no access to information about the Trojan trigger, the poisoning process or the network architecture (black-box).

• The defender has a small, clean and labelled test dataset to evaluate the model [1].

[1] Wang, B., Yao, Y., Shan, S., Li, H., Viswanath, B., Zheng, H., & Zhao, B. Y. (2019). Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks. IEEE Symposium on Security & Privacy.

STRIP: Approach

Detection boundary: if the output entropy is below the boundary, the input is flagged as Trojaned; otherwise it is treated as clean.
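The decision rule on this slide can be sketched in a few lines. This is a hedged illustration under stated assumptions, not the released implementation: the perturbation is taken to be a simple pixel-wise blend of the incoming input with randomly drawn clean images, `model_predict` is any callable returning class probabilities, and `strip_entropy`, `is_trojaned`, `alpha`, `n_perturb`, the two toy models and the boundary value 0.2 are all names and values invented for the example.

```python
import numpy as np


def strip_entropy(model_predict, x, clean_images, n_perturb=20, alpha=0.5, seed=0):
    """Mean Shannon entropy (bits) of predictions on perturbed copies of x."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(clean_images), size=n_perturb, replace=False)
    # Strong intentional perturbation (assumed form): superimpose x on clean images.
    perturbed = alpha * x[None, ...] + (1.0 - alpha) * clean_images[idx]
    probs = model_predict(perturbed)        # shape (n_perturb, n_classes)
    probs = np.clip(probs, 1e-12, 1.0)      # avoid log(0)
    per_copy = -np.sum(probs * np.log2(probs), axis=1)
    return float(per_copy.mean())


def is_trojaned(entropy, boundary):
    """Slide rule: output entropy < bound ? Trojaned : Clean."""
    return entropy < boundary


# Two toy stand-ins for a classifier with 10 classes.
def constant_model(batch):
    # Mimics a triggered Trojaned model: always fully confident in class 3.
    out = np.zeros((len(batch), 10))
    out[:, 3] = 1.0
    return out


def uniform_model(batch):
    # Mimics maximal confusion under perturbation: uniform probabilities.
    return np.full((len(batch), 10), 0.1)


rng = np.random.default_rng(0)
clean_images = rng.random((100, 8, 8, 1))
x = rng.random((8, 8, 1))

h_low = strip_entropy(constant_model, x, clean_images)
h_high = strip_entropy(uniform_model, x, clean_images)
boundary = 0.2  # hypothetical detection boundary
```

With these toy models `h_low` is close to 0 bits and `h_high` is close to log2(10) bits, so only the first input falls below the boundary. Because the function only calls `model_predict`, it needs no access to weights or architecture, which matches the black-box setting of the threat model.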


STRIP System Overview


Experimental Evaluation

DNNs

Dataset | # of labels | Image size | # of samples | Model architecture | Total parameters
MNIST | 10 | 28*28*1 | 60,000 | 2 Conv + 2 Dense | 80,758
CIFAR10 | 10 | 32*32*3 | 60,000 | 8 Conv + 3 Pool + 3 Dropout + 1 Flatten + Dense | 308,394
GTSRB | 43 | 32*32*3 | 51,839 | ResNet 20 | 276,587

Triggers

Yingqi Liu, Shiqing Ma, Yousra Aafer, Wen-Chuan Lee, Juan Zhai, Weihang Wang, and Xiangyu Zhang. 2018. Trojaning attack on neural networks. In Network and Distributed System Security Symposium (NDSS).

Bolun Wang, Yuanshun Yao, Shawn Shan, Huiying Li, Bimal Viswanath, Haitao Zheng, and Ben Y. Zhao. 2019. Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks. In Proceedings of the 40th IEEE Symposium on Security and Privacy.

Dataset | Clean model classification rate (clean input) | Trojaned model classification rate (clean input) | Trojaned model attack success rate (Trojaned input)
MNIST | 98.62% | 99.86% | 99.86%
MNIST | 98.62% | 98.86% | 100%
CIFAR10 | 88.27% | 87.23% | 100%
CIFAR10 | 88.27% | 87.34% | 100%
GTSRB | 96.38% | 96.22% | 100%

[Figure: panels labelled MNIST, MNIST, CIFAR10, CIFAR10, GTSRB]


Trojan and Clean Inputs Entropy Distribution
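The slides do not spell out how the entropy of an input is computed. A plausible reading, assuming the standard Shannon entropy of the model's class probabilities averaged over the perturbed copies of an input (the symbols N, M and y below are notation introduced here, not taken from the slides), is:

```latex
\mathbb{H}_n = -\sum_{i=1}^{M} y_i^{(n)} \log_2 y_i^{(n)},
\qquad
\mathbb{H}(x) = \frac{1}{N} \sum_{n=1}^{N} \mathbb{H}_n
```

Here N is the number of perturbed copies of the input x, M is the number of classes, and y_i^{(n)} is the probability the model assigns to class i for the n-th perturbed copy. Under the observation stated earlier, a Trojaned input keeps being assigned to the target class whatever the perturbation, so its entropy stays near zero, whereas a clean input produces varying predictions and a higher entropy.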


Detection Capability

False Acceptance Rate (FAR) and False Rejection Rate (FRR) of STRIP System

[Figure: FRR marked against the detection boundary (threshold).]

If the input entropy is below the threshold, the input is flagged as Trojaned; otherwise it is treated as clean.
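To make the two error rates concrete: FRR is the fraction of clean inputs wrongly flagged as Trojaned, and FAR is the fraction of Trojaned inputs wrongly accepted as clean. The sketch below picks a detection boundary for a preset FRR and then measures FAR. It rests on assumptions: the slides do not say how the boundary is derived, so an empirical quantile of clean-input entropies is used here, the entropy values are synthetic, and `boundary_for_frr` and `far_frr` are hypothetical helper names.

```python
import numpy as np


def boundary_for_frr(clean_entropies, frr=0.005):
    """Entropy value below which a fraction `frr` of clean inputs falls."""
    return float(np.quantile(clean_entropies, frr))


def far_frr(clean_entropies, trojan_entropies, boundary):
    """Return (FAR, FRR) for the rule: entropy < boundary means Trojaned."""
    clean = np.asarray(clean_entropies)
    trojan = np.asarray(trojan_entropies)
    frr = float(np.mean(clean < boundary))     # clean inputs wrongly rejected
    far = float(np.mean(trojan >= boundary))   # Trojaned inputs wrongly accepted
    return far, frr


# Synthetic entropies standing in for values measured on real inputs.
rng = np.random.default_rng(0)
clean_h = rng.normal(loc=1.0, scale=0.2, size=2000)
trojan_h = rng.normal(loc=0.05, scale=0.02, size=2000)

boundary = boundary_for_frr(clean_h, frr=0.005)
far, frr = far_frr(clean_h, trojan_h, boundary)
```

Only clean entropies are needed to set the boundary, which fits a defender who holds a small clean dataset and has never seen a Trojaned sample; the Trojaned entropies are used here purely to evaluate FAR afterwards.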

Trojan Variants/Adaptive Attacks

Input Agnostic Trojan Attacks: Tested

How about these?

Large Trigger Sizes

Chen et al. 2017 arXiv; Eykholt et al. 2018 CVPR

Trojan Variants/Adaptive Attacks
1. Large Trigger Sizes

Chen et al. 2017 arXiv

We set the trigger transparency to 70% and use 100% overlap with the input image.

Both FAR and FRR are 0%.

Trojan Variants/Adaptive Attacks
2. Trigger Transparency

[Figure: triggers at 90%, 80%, 70%, 60% and 50% transparency]

FRR is preset to 0.5%.

Trojan Variants
3. Separate Triggers to Separate Target Labels

Each digit (0 to 9) is a trigger targeting a different class in CIFAR10.

Given a preset FRR of 0.5%, the worst-case FAR is 0.10% for the trigger targeting ‘airplane’.

Trojan Variants
4. Separate Triggers to Same Target Label

Each digit (0 to 9) is a trigger targeting the same class in CIFAR10.

For any trigger, we achieve 0% for both FAR and FRR.

Contributions

1. A new defense concept: exploit information leaked from misclassification distributions

2. Run-time detection capability

3. Operates in a black-box setting

4. Plug-and-play compatible with pre-existing DNN systems in deployment

5. Full source code release: https://github.com/garrisongys/STRIP


Future Work

Tested on vision domain

Text? Audio?

Our initial work: https://arxiv.org/abs/1911.10312


Thank you

Damith Ranasinghe

The University of Adelaide

The School of Computer Science

[email protected]
