How to Build Realistic Machine Learning Systems for Security?

Sadia Afroz ICSI and Avast

Rajarshi Gupta Avast


Machine Learning is necessary for detecting malware at scale

…but Machine Learning is unreliable, inexplicable and easily fooled

Evtimov, I., et al. (2017). Robust physical-world attacks on deep learning models. arXiv preprint arXiv:1707.08945.

Goodfellow, I. J., Shlens, J., & Szegedy, C. (2014). Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.


Is machine learning useful for security?

Let’s build a malware detector using machine learning

• Malware + Benign -> Extract features -> Features -> Train a model -> Model
• New file -> Model -> Malware

Quality of the data ==> Quality of the model
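The pipeline on this slide can be sketched in a few lines. This is a minimal illustration, not the detector from the talk: the byte-histogram features, the nearest-centroid "model", and the four training samples are all assumptions chosen to keep the example self-contained.

```python
from collections import Counter
from typing import Dict, List, Tuple

BINS = 16  # assumption: a coarse byte-value histogram serves as the feature vector


def extract_features(data: bytes) -> List[float]:
    """Toy feature extractor: share of bytes falling into each of BINS buckets."""
    if not data:
        return [0.0] * BINS
    counts = Counter(b * BINS // 256 for b in data)
    return [counts.get(i, 0) / len(data) for i in range(BINS)]


def train(samples: List[Tuple[bytes, str]]) -> Dict[str, List[float]]:
    """Toy 'model': the mean feature vector (centroid) of each label."""
    grouped: Dict[str, List[List[float]]] = {}
    for data, label in samples:
        grouped.setdefault(label, []).append(extract_features(data))
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(model: Dict[str, List[float]], data: bytes) -> str:
    """Return the label whose centroid is closest to the new file's features."""
    features = extract_features(data)

    def squared_distance(centroid: List[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(features, centroid))

    return min(model, key=lambda label: squared_distance(model[label]))


# Made-up training set: the labels are the "ground truth" the model inherits.
TRAINING = [
    (bytes([0x90] * 64), "malware"),
    (bytes([0xCC] * 64), "malware"),
    (b"hello world, plain text", "benign"),
    (b"another readable string", "benign"),
]
MODEL = train(TRAINING)
print(predict(MODEL, bytes([0x90] * 32)))
```

Whatever labels go into `TRAINING` are what the model learns to reproduce, which is why the quality of the labels bounds the quality of the model.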

Is this malware?

[Figure: two code samples, the second marked with an X]

The answer depends on WHO you ask and WHEN you ask

According to VirusTotal…

[Figure: code sample]

• Sep 2019: ~42% of AVs considered it malware
• Jan 2020: ~72% of AVs considered it malware

https://www.virustotal.com/gui/file/3120b563781b5ead9fdebc906818836329f362bf8e3ea7ee3dbfd4ceb0ebd8dd/detection


How can we protect users from malware when we don’t know what malware is?

What is malware?

Malware -> Run the file (on users’ machines, in a virtual machine, or in a sandbox) -> Analyze (static + dynamic)

Malware is any highly suspicious file

Too time consuming!

What is malware? Solution: Get labels from other sources

We studied 40 papers from 2001-2019 to check where they get their ground truth from

[Figure: bar chart of ground-truth sources (Collection, AV Label, Manual); axis from 0 to 50]

Papers that use AV labels (counts are the number of AVs that flag a file):

• 9 papers: labels from one AV
• 2 papers: Malware >= 4, Benign == 0
• 2 papers: Malware >= 5, Benign <= 1
• 1 paper: Malware >= 10, Benign == 0
• 1 paper: Malware == ALL, Benign == 0
• 1 paper: Malware == Majority, Benign == 0
• 1 paper: Malware == Weighted Majority, Benign == 0
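To see how much the choice of rule matters, the thresholds listed above can be applied to the same detection counts. The sketch below is illustrative only: the total of 100 engines, the reading of "Majority" as more than half, and the "unlabeled" outcome for files that match neither condition are assumptions. The weighted-majority rule is left out because the weights are not given.

```python
from typing import Callable, Dict

Rule = Callable[[int, int], str]


def make_rule(malware_min: Callable[[int], int], benign_max: int) -> Rule:
    """Build a labeling rule from a malware threshold and a benign threshold."""

    def rule(detections: int, total: int) -> str:
        if detections >= malware_min(total):
            return "malware"
        if detections <= benign_max:
            return "benign"
        return "unlabeled"  # assumption: files in between are left out

    return rule


RULES: Dict[str, Rule] = {
    "Malware >= 4, Benign == 0": make_rule(lambda n: 4, 0),
    "Malware >= 5, Benign <= 1": make_rule(lambda n: 5, 1),
    "Malware >= 10, Benign == 0": make_rule(lambda n: 10, 0),
    "Malware == ALL, Benign == 0": make_rule(lambda n: n, 0),
    # assumption: "majority" means strictly more than half of the engines
    "Malware == Majority, Benign == 0": make_rule(lambda n: n // 2 + 1, 0),
}


def label_all(detections: int, total: int) -> Dict[str, str]:
    """Label one file under every rule."""
    return {name: rule(detections, total) for name, rule in RULES.items()}


# Hypothetical file flagged by 42 of 100 engines, later by 72 of 100.
for detections in (42, 72):
    print(detections, label_all(detections, 100))
```

With 42 detections the fixed-count rules call the file malware while the majority rule leaves it unlabeled; with 72 the majority rule flips to malware. The same file therefore lands in different training sets depending on the paper.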


How to compare different approaches?

What is malware?

• A number of very large, professional companies share their labels on VirusTotal
• Great correlation in general, especially for top companies
  • 96% agreement after 3 days
  • 99% agreement after 3 weeks

(AISec 2015)


Professional Heuristics for Ground Truth

[Figure: Avast results (100k samples in Sep 2019); x-axis: # of days since first occurrence of sample]

Our (professional) rule of thumb for malware ground truth: one-week-delayed results on VT from the top few (<10) companies are good enough
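The rule of thumb can be written down as a small labeling function. This is a sketch under stated assumptions: the vendor names are placeholders, the scan records are a made-up structure rather than VirusTotal's format, and aggregating the trusted vendors by simple majority is my choice, since the slide does not say how their verdicts are combined.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional, Set, Tuple

TRUSTED_VENDORS: Set[str] = {"VendorA", "VendorB", "VendorC"}  # hypothetical names
DELAY = timedelta(days=7)  # "one week delayed"

Scan = Tuple[date, Dict[str, bool]]  # (scan date, vendor -> flagged as malware?)


def ground_truth(first_seen: date, scans: List[Scan]) -> Optional[str]:
    """Label a sample from the first scan taken at least DELAY after first_seen.

    Only verdicts from TRUSTED_VENDORS count. Returns None when no scan is
    old enough or no trusted vendor has reported yet.
    """
    eligible = [scan for scan in scans if scan[0] - first_seen >= DELAY]
    if not eligible:
        return None
    _, verdicts = min(eligible, key=lambda scan: scan[0])
    votes = [verdicts[v] for v in TRUSTED_VENDORS if v in verdicts]
    if not votes:
        return None
    # assumption: simple majority among the trusted vendors
    return "malware" if sum(votes) * 2 > len(votes) else "benign"


first_seen = date(2019, 9, 1)
scans = [
    (date(2019, 9, 2), {"VendorA": False, "VendorB": False, "VendorC": True}),
    (date(2019, 9, 9), {"VendorA": True, "VendorB": True, "VendorC": False}),
]
print(ground_truth(first_seen, scans))
```

The day-one scan is ignored because it is too fresh; only the scan from eight days after first sight contributes to the label.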

Does the overall performance of the classifiers matter?

Which of the classifiers is best?

Depends upon where you look!
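The plot for this slide did not survive in the transcript. One plausible reading is that the classifiers' ROC curves cross, so the ranking changes with the false-positive budget. The sketch below illustrates that reading with invented scores for two hypothetical classifiers A and B; none of the numbers come from the talk.

```python
from typing import List


def tpr_at_fpr(benign: List[float], malware: List[float], max_fpr: float) -> float:
    """Best true-positive rate over thresholds whose false-positive rate <= max_fpr.

    A file is flagged when its score is strictly above the threshold.
    """
    best = 0.0
    thresholds = [float("-inf")] + sorted(set(benign) | set(malware))
    for t in thresholds:
        fpr = sum(s > t for s in benign) / len(benign)
        if fpr <= max_fpr:
            tpr = sum(s > t for s in malware) / len(malware)
            best = max(best, tpr)
    return best


# Invented scores: higher means "more likely malware".
A_BENIGN = [0.1, 0.2, 0.3, 0.4, 0.9]
A_MALWARE = [0.5, 0.6, 0.7, 0.8, 0.95]
B_BENIGN = [0.1, 0.2, 0.3, 0.4, 0.5]
B_MALWARE = [0.3, 0.45, 0.6, 0.7, 0.8]

for budget in (0.0, 0.2):
    a = tpr_at_fpr(A_BENIGN, A_MALWARE, budget)
    b = tpr_at_fpr(B_BENIGN, B_MALWARE, budget)
    print(f"FPR <= {budget}: A detects {a:.0%}, B detects {b:.0%}")
```

With no false positives allowed, B detects more malware than A; with a 20% budget the order reverses. A single overall score hides this, which matters for products that must run at very low false-positive rates.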

Adversarial attacks

More than 1500 papers on adversarial ML

Only 36 (2.4%) papers focus on evading malware detectors

Graph credit: Nicholas Carlini, Google Brain


Can adversarial malware evade malware detectors?


Are adversarial attacks harmful for users?

Adversarial attacks: feature space vs problem space

[Figure: "Extract features" turns a file into a binary feature vector (0 1 1 0 / 1 1 1 1 / 1 0 0 0 / 0 0 0 0 / 1 1 1 1)]

• Evading Machine Learning Model
• Checking Harm to Users
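A feature-space attack only edits the vector the model sees. The sketch below assumes a toy linear model with made-up weights over four binary features and greedily flips the bits that lower the score most. It is not a method from the talk, only an illustration of why evading the model and harming users are separate questions.

```python
from typing import List

WEIGHTS = [2.0, -1.0, 1.5, -0.5]  # hypothetical weights of a linear detector
BIAS = -1.0


def score(x: List[int]) -> float:
    """Linear score over binary features; positive means 'malware'."""
    return sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS


def is_malware(x: List[int]) -> bool:
    return score(x) > 0


def evade_in_feature_space(x: List[int], max_flips: int = 4) -> List[int]:
    """Greedily flip the bit that lowers the score most until the model says benign."""
    x = list(x)
    for _ in range(max_flips):
        if not is_malware(x):
            break
        # Score change caused by flipping bit i: w_i * (1 - 2 * x_i)
        delta, index = min((w * (1 - 2 * xi), i) for i, (w, xi) in enumerate(zip(WEIGHTS, x)))
        if delta >= 0:
            break  # no single flip helps any more
        x[index] = 1 - x[index]
    return x


original = [1, 0, 1, 0]
adversarial = evade_in_feature_space(original)
print(original, is_malware(original))
print(adversarial, is_malware(adversarial))
```

The result is only a vector. Nothing here shows that a real file with exactly those features exists, that it still runs, or that it still does anything harmful; those are problem-space checks.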

[Figure: original file + New Section = file with the new section appended]

The new section can overwrite an existing section

When a new section is added at the end of the last section and the sample has overlay data, the new section will overwrite the overlay data.
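The overlay problem can be shown with a toy byte layout. This is not a PE parser: the 8-byte "sections" region, the overlay contents, and the function name are invented for illustration.

```python
def append_section_naive(file_bytes: bytes, end_of_last_section: int, new_section: bytes) -> bytes:
    """Write new_section directly at the end of the last section.

    Any bytes already stored past that offset (the overlay) are overwritten.
    """
    head = file_bytes[:end_of_last_section]
    tail = file_bytes[end_of_last_section + len(new_section):]
    return head + new_section + tail


SECTIONS = b"SECTIONS"   # stand-in for headers plus section data
OVERLAY = b"OVERLAY!"    # stand-in for data appended after the last section
sample = SECTIONS + OVERLAY

modified = append_section_naive(sample, len(SECTIONS), b"NEW!")
print(modified)
print(OVERLAY in sample, OVERLAY in modified)
```

If the program reads its overlay at run time, for example an installer carrying its payload there, the modified file may evade the model while no longer behaving like the original malware.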

[Figure: file layout with section headers followed by sections; adding "New section 4" also requires a "New section header"]

The new section header can overwrite existing sections
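My reading of this diagram is that the section table has to grow by one entry, and if there is no spare room before the first section's data, the new header lands on top of it. The check below is a sketch of that reading. The 40-byte header size is the size of `IMAGE_SECTION_HEADER` in the PE format; the offsets in the example are invented.

```python
SECTION_HEADER_SIZE = 40  # bytes per IMAGE_SECTION_HEADER entry in a PE file


def new_header_fits(section_table_offset: int, num_sections: int, first_section_raw_offset: int) -> bool:
    """True if one more section header fits before the first section's raw data."""
    end_of_table = section_table_offset + num_sections * SECTION_HEADER_SIZE
    return end_of_table + SECTION_HEADER_SIZE <= first_section_raw_offset


# Invented layout: section table at 0x178 with 3 entries, so the table ends at 0x1F0.
print(new_header_fits(0x178, 3, 0x200))  # first section starts right after: no room
print(new_header_fits(0x178, 3, 0x400))  # padding before the first section: room
```

A tool that adds the fourth header without this check would write into the first section, which is one way an "adversarial" file can stop working.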

Are adversarial attacks harmful to users?

• 9/36 papers changed the malware files
• 4/36 papers tried to execute the adversarial samples
• 0/36 papers check if the modified malware is harmful to users

[1] Xu et al., NDSS Talk: Automatically Evading Classifiers (including Gmail’s).

Is evading one classifier enough?

• Sample -> Signature* -> Malware / Benign / Not Matched
• Not Matched -> Static -> Malware / Benign / Maybe benign
• Maybe benign -> Dynamic -> Malware / Benign / Maybe Malware
• Maybe Malware -> More Analysis

[Marker on the diagram: We are here]

* Hashes and hand written rules
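The layered flow can be sketched as a cascade in which each stage either gives a final verdict or passes the sample on. The stage logic below is a toy stand-in: the byte-string checks, the sample contents, and the "undecided" outcome are assumptions, not the production pipeline described in the talk.

```python
import hashlib
from typing import Callable, List, Tuple

FINAL_VERDICTS = {"malware", "benign"}
Stage = Tuple[str, Callable[[bytes], str]]


def run_cascade(sample: bytes, stages: List[Stage]) -> Tuple[str, str]:
    """Return (verdict, stage that decided). Non-final verdicts fall through."""
    for name, classify in stages:
        verdict = classify(sample)
        if verdict in FINAL_VERDICTS:
            return verdict, name
    return "undecided", "more analysis"


ORIGINAL = b"evil_string encrypt_files"
KNOWN_BAD = {hashlib.sha256(ORIGINAL).hexdigest()}  # toy signature database

STAGES: List[Stage] = [
    # Signature: exact hash match
    ("signature", lambda s: "malware" if hashlib.sha256(s).hexdigest() in KNOWN_BAD else "not matched"),
    # Static: stand-in for a model over file contents
    ("static", lambda s: "malware" if b"evil_string" in s else "maybe benign"),
    # Dynamic: stand-in for behavior observed while running the file
    ("dynamic", lambda s: "malware" if b"encrypt_files" in s else "maybe malware"),
]

EVASIVE = b"ev1l_string encrypt_files"  # changes what static analysis sees, not the behavior
print(run_cascade(ORIGINAL, STAGES))
print(run_cascade(EVASIVE, STAGES))
```

The modified sample slips past the signature and static stages but is still caught by the dynamic stage, so evading one classifier does not by itself mean evading the product.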

Who is the adversary?

• White box: adversary has full access
• Grey box (in between):
  • Adversary has full access to the features
  • Adversary can do unlimited queries
  • Adversary has access to the training data
  • Adversary can build substitute classifiers
• Black box: adversary has no access
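One way to make the adversary explicit is to record each capability as a field and derive the box from it. The mapping below is an assumption for illustration: the `knows_model` field and the rule that white box means every capability and black box means none are mine, not definitions from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Adversary:
    knows_features: bool = False
    unlimited_queries: bool = False
    has_training_data: bool = False
    can_build_substitute: bool = False
    knows_model: bool = False  # hypothetical: full knowledge of the deployed model

    def box(self) -> str:
        """Place the adversary on the white/grey/black spectrum."""
        partial = [
            self.knows_features,
            self.unlimited_queries,
            self.has_training_data,
            self.can_build_substitute,
        ]
        if self.knows_model and all(partial):
            return "white box"
        if not self.knows_model and not any(partial):
            return "black box"
        return "grey box"


outsider = Adversary()
prober = Adversary(unlimited_queries=True, can_build_substitute=True)
insider = Adversary(True, True, True, True, True)
print(outsider.box(), prober.box(), insider.box())
```

Stating the capabilities as fields means an evaluation can report which of them an attack actually used.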

How to Build Realistic Machine Learning Systems for Security?

• Consistent ground truth
• Measurable adversary
• Proper evaluation

Questions?

Sadia Afroz
[email protected]

Research contributors

• Rajarshi Gupta, VP, Head of AI, Avast
• Deepali Garg, Senior Data Scientist, Avast
• Fabrizio Bondi, AI Manager, Avast
• Heng Yin, Associate Professor, UC Riverside
• Wei Song, PhD Student, UC Riverside
• Xuezixiang Li, PhD Student, UC Riverside