Applied Machine Learning: Defeating Modern Malicious Documents

Evan Gaustad, Sr. Manager CSIRT, Target Corporation
#RSAC Session ID: HT-W02


Post on 13-Apr-2017


Page 1: Applied machine learning defeating modern malicious documents

Applied Machine Learning: Defeating Modern Malicious Documents

Evan Gaustad
Sr. Manager CSIRT, Target Corporation

#RSAC Session ID: HT-W02

Page 2: Applied machine learning defeating modern malicious documents

Agenda

Office Macro Use and Abuse

Malicious documents in attack lifecycle

Machine Learning for Malware Detection

Demo Project: Malicious Macro Bot

Conclusion

Page 3: Applied machine learning defeating modern malicious documents

Macro-enabled Microsoft Office Documents

An Office macro is code that automates tasks in Office documents:
  Automatically fill out forms
  Update graphs and display data
  Make web requests
  Perform computations

Written in Visual Basic for Applications (VBA)

VBA support is built into MS Office

“99.7% of documents used in attachment-based campaigns relied on social engineering and macros, rather than exploits.” - Proofpoint [1]

Page 4: Applied machine learning defeating modern malicious documents

Attacker motivation for malicious office docs

Barrier to entry is very low

Uses built-in, cross-platform features
  “Exploit” reliability is high

Can implement sandbox evasion

Easy to update to evade AV signatures

Page 5: Applied machine learning defeating modern malicious documents

Malicious Macro-enabled Office Documents

Used by an attacker to gain code execution on the targeted system(s)

Common Attacker VBA Techniques:
  Download and execute malicious payload
  Drop and execute embedded payloads or scripts
  Obfuscation to hide intent
  Sandbox evasion techniques
  Payload targeting

“98% of Office-targeted threats use macros” - Microsoft [2]

Page 11: Applied machine learning defeating modern malicious documents

Example: Maldocs in Attack Lifecycle

1) Phishing email with attachment “Invoice Past Due”
2) Victim opens file, allows macros to run
3) Malicious macro executes
4) Downloads / drops executables or PowerShell
5) Installs additional malware, e.g. Pony, Hancitor, Vawtrak
6) Steal credentials, data, maintain persistence, command and control

Diagram labels: Victim, Attacker, http://.../gate.php

Page 13: Applied machine learning defeating modern malicious documents

Detecting Malicious Macros

How hard is it to create:
  a malicious macro…
  that runs an executable…
  on victim’s machine…
  and evades AV?

Really easy

Some easy to find tools:
  CrunchCode [7]
  MacroShop [8]
  Veil Framework [9]
  Generate-Macro [10]
  Criminals sell their own

Page 14: Applied machine learning defeating modern malicious documents

Detecting Malicious Macros

Page 16: Applied machine learning defeating modern malicious documents

Why Machine Learning?

Existing anti-virus and sandbox techniques can be subverted

Automates extracting insight from file samples

Generalizes better to previously unseen variations

Reduces human analysis time

Page 17: Applied machine learning defeating modern malicious documents

Project Approach

Goals:
  Triage: Determine whether a new Microsoft Office document contains a malicious or benign macro
  Detection: Provide useful detection when signature-based methods fail
  Threat Intelligence: Identify phishing campaigns

Guiding Principles:
  Supervised Machine Learning: Classification
  Well thought-out features
  Generalized and interpretable model output

Page 18: Applied machine learning defeating modern malicious documents

Applied Machine Learning Steps

Benign Files

Malicious Files

Collect labeled data

Page 19: Applied machine learning defeating modern malicious documents

Applied Machine Learning Steps

Diagram: Benign Files and Malicious Files → Feature Extraction → numeric feature vectors (5.7 10 98 …, 1.2 23 15 …, 0.7 57 20 …)

Steps shown: Collect labeled data, Feature extraction

Page 20: Applied machine learning defeating modern malicious documents

“Feature Engineering”

DOCUMENT #1
…
BHJASD = Chr(102 + 8)
Set uHhdBhd = CreateObject("" & "W" & "" & "or" & "d." & "Applicatio" & BHJASD)
uHhdBhd.Documents.Open(FFFNNNF)
Module1.Tyryka (2)
HYUASGD = Module1.Girow(WOIEW)
Module1.Tyryka (3)
uHhdBhd.Quit
Set uHhdBhd = Nothing
End Sub
Public Function Girow(qqa As String)
Dim jjz As Variant
jjz = Shell(qqa, 0)
…

DOCUMENT #2
…
'#############################
'# Code to Add Total Value Formula #
'##############################

'Go to the top of the Price column
Range("H10").Select

'Find the bottom value - there are no values in the Non Stock Items
Selection.End(xlDown).Select
'Check to see if still in the order form range - if not there were no Standard Items Selected
If ActiveCell.Row > 1000 Then GoTo TidyUp
…

Which one is malicious?

Why?

How would you measure that?
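On the "Why?" question: one thing that stands out in DOCUMENT #1 is that the string passed to CreateObject is assembled piece by piece. The following is a minimal Python sketch (not part of the original slides) that resolves that concatenation. The helper names `eval_chr` and `resolve_concat` are hypothetical, and the parsing is an assumption that only handles string literals, known variables, and Chr() of integer sums.

```python
import re


def eval_chr(expr):
    """Evaluate a VBA Chr(a + b + ...) call that contains only integer sums."""
    inner = re.fullmatch(r"Chr\((.*)\)", expr.strip()).group(1)
    return chr(sum(int(part) for part in inner.split("+")))


def resolve_concat(expr, variables):
    """Resolve a VBA '&' concatenation of string literals and known variables.

    Simplification: splits on '&', so literals containing '&' are not supported.
    """
    pieces = []
    for token in expr.split("&"):
        token = token.strip()
        if token.startswith('"') and token.endswith('"'):
            pieces.append(token[1:-1])          # string literal
        elif token.startswith("Chr("):
            pieces.append(eval_chr(token))      # inline Chr() call
        else:
            pieces.append(variables[token])     # previously assigned variable
    return "".join(pieces)


# Values taken from DOCUMENT #1 on the slide.
variables = {"BHJASD": eval_chr("Chr(102 + 8)")}
expr = '"" & "W" & "" & "or" & "d." & "Applicatio" & BHJASD'
prog_id = resolve_concat(expr, variables)
print(prog_id)  # prints Word.Application
```

The macro hides an ordinary "Word.Application" object name behind string splitting and a Chr() call, which is the kind of obfuscation a benign macro rarely needs.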

Page 21: Applied machine learning defeating modern malicious documents

“Feature Engineering”

DOCUMENT #1 and DOCUMENT #2: same macro excerpts as on Page 20.

Feature                 Doc1   Doc2
# Lines of Code           74    584
# Comments                 8    161
# Functions                9     14
# Shell Instructions       1      0
Entropy                  4.3    3.8
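The features in the table can be computed directly from macro source text. Below is a minimal Python sketch under stated assumptions: the slides do not say how each feature is counted, so the definitions here (non-empty lines as lines of code, lines starting with an apostrophe as comments, Sub/Function declarations as functions, character-level Shannon entropy) are illustrative choices, and `extract_features` is a hypothetical name. The sample macro is adapted from DOCUMENT #1 with a closing line added so it is complete.

```python
import math
import re
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def extract_features(vba_source):
    """Turn VBA source text into a small dictionary of numeric features."""
    lines = [ln.strip() for ln in vba_source.splitlines() if ln.strip()]
    declaration = re.compile(r"(?i)^(public |private )?(sub|function)\b")
    return {
        "lines_of_code": len(lines),  # assumption: comment lines are counted too
        "comments": sum(1 for ln in lines if ln.startswith("'")),
        "functions": sum(1 for ln in lines if declaration.match(ln)),
        "shell_instructions": len(re.findall(r"(?i)\bShell\s*\(", vba_source)),
        "entropy": round(shannon_entropy(vba_source), 2),
    }


# Illustrative sample adapted from DOCUMENT #1.
sample = """Public Function Girow(qqa As String)
Dim jjz As Variant
jjz = Shell(qqa, 0)
End Function
"""
features = extract_features(sample)
print(features)
```

Each document becomes one row of numbers, which is the form a classifier needs as input.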

Page 22: Applied machine learning defeating modern malicious documents

Feature Engineering

Page 24: Applied machine learning defeating modern malicious documents

Applied Machine Learning Steps

Diagram: Benign Files and Malicious Files → Feature Extraction → numeric feature vectors (5.7 10 98 …, 1.2 23 15 …, 0.7 57 20 …, …) → Classification Model

Steps shown: Collect labeled data, Feature extraction, Train and test model

Classification Models
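The "train and test" step can be sketched as a holdout evaluation: fit on one part of the labeled data and measure accuracy on the rest. The Python below is a minimal illustration, not the project's code. The data is synthetic and invented for this example, the model is a single-threshold rule on entropy (a deliberately simple stand-in for a real classifier), and the names `train_stump`, `predict`, and `accuracy` are hypothetical.

```python
import random


def train_stump(rows):
    """Pick the entropy threshold that classifies the most training rows correctly.

    Each row is (entropy, label) with label 1 = malicious and 0 = benign.
    """
    best_threshold, best_correct = None, -1
    for threshold in sorted({entropy for entropy, _ in rows}):
        correct = sum((entropy > threshold) == bool(label) for entropy, label in rows)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


def predict(threshold, entropy):
    """Predict malicious (1) when entropy is above the learned threshold."""
    return 1 if entropy > threshold else 0


def accuracy(threshold, rows):
    """Fraction of rows whose prediction matches the label."""
    return sum(predict(threshold, e) == label for e, label in rows) / len(rows)


# Synthetic toy data (invented): benign entropy near 3.8, malicious near 4.5.
rng = random.Random(0)
data = [(rng.gauss(3.8, 0.2), 0) for _ in range(50)]
data += [(rng.gauss(4.5, 0.2), 1) for _ in range(50)]
rng.shuffle(data)

split = int(0.7 * len(data))          # 70% train, 30% test
train, test = data[:split], data[split:]
threshold = train_stump(train)
print(f"threshold={threshold:.2f}")
print(f"train accuracy={accuracy(threshold, train):.2f}")
print(f"test accuracy={accuracy(threshold, test):.2f}")
```

Accuracy on the held-out rows is the number that matters, since it estimates how the model behaves on documents it has not seen.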

Page 25: Applied machine learning defeating modern malicious documents

Choose and Test Model

DOCUMENT #1 and DOCUMENT #2: same macro excerpts as on Page 20.

Feature                 Doc1   Doc2
# Lines of Code           74    584
# Comments                 8    161
# Functions                9     14
# Shell Instructions       1      0
Entropy                  4.3    3.8

Page 27: Applied machine learning defeating modern malicious documents

Simple Decision Tree Model

Node: entropy <= 4.27 (samples = 88), branches True / False
  Leaf: samples = 47, class = benign
  Node: # comments <= 39.0 (samples = 41), branches True / False
    Leaf: samples = 47, class = benign
    Leaf: samples = 47, class = malicious

Doc #1 and Doc #2 are placed on the tree using these feature values:

Feature       Doc1   Doc2
Entropy        4.3    3.8
# Comments       8    161
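Walking the two documents through the tree can be written as two nested comparisons. This Python sketch uses the thresholds and feature values from the slide. The branch-to-leaf assignment is an assumption, since the transcript does not preserve which leaf hangs off which branch: here low entropy leads to benign, and high entropy combined with few comments leads to malicious. The function name `classify` is hypothetical.

```python
def classify(entropy, comments):
    """Apply the two-split tree from the slide to one document's features.

    Assumed branch layout (not recoverable from the transcript):
      entropy <= 4.27                      -> benign
      entropy >  4.27 and comments <= 39   -> malicious
      entropy >  4.27 and comments >  39   -> benign
    """
    if entropy <= 4.27:
        return "benign"
    if comments <= 39.0:
        return "malicious"
    return "benign"


# Feature values from the slide.
docs = {
    "Doc #1": {"entropy": 4.3, "comments": 8},
    "Doc #2": {"entropy": 3.8, "comments": 161},
}
results = {name: classify(f["entropy"], f["comments"]) for name, f in docs.items()}
for name, label in results.items():
    print(name, "->", label)
# prints:
# Doc #1 -> malicious
# Doc #2 -> benign
```

Under that assumed layout, Doc #1 fails the entropy test and has few comments, so it lands in the malicious leaf, while Doc #2 is classified benign at the first split.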

Page 28: Applied machine learning defeating modern malicious documents

Applied Machine Learning Steps

Diagram: Benign Files and Malicious Files → Feature Extraction → numeric feature vectors (5.7 10 98 …, 1.2 23 15 …, 0.7 57 20 …, …) → Classification Model

Deployment: New Files → Classification Model → “Benign” or “Malicious”

Steps shown: Collect labeled data, Feature extraction, Train and test model, Deploy model

Classification Models

Page 29: Applied machine learning defeating modern malicious documents

Malicious Macro Bot Project

Model factored in over 20,000 samples

Analyzed over 121,000 samples drawn from 7 months of VirusTotal data

Over a thousand features:
  VBA built-in language keywords and functions (base language), e.g. Shell, Dim, If, …
  Code heuristics, e.g. LOC, # functions, entropy, …

Uses a Random Forest Classifier:
  Fits many decision trees on many subsets of the dataset
  Combines the trees' predictions into a single “ensemble” result
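The ensemble idea can be illustrated without any library. The Python sketch below is a simplified stand-in for a random forest, not the project's classifier: each "tree" is a one-split stump trained on a bootstrap sample using one randomly chosen feature, and the final answer is a majority vote. The toy data is synthetic and invented for the example, and all function names are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, feature):
    """Find the best single threshold on one feature.

    rows are (feature_tuple, label) with label 1 = malicious, 0 = benign.
    polarity +1 predicts malicious above the threshold, -1 below it.
    """
    best_threshold, best_polarity, best_correct = None, 1, -1
    for threshold in sorted({x[feature] for x, _ in rows}):
        for polarity in (1, -1):
            correct = sum(
                (polarity * (x[feature] - threshold) > 0) == bool(y) for x, y in rows
            )
            if correct > best_correct:
                best_threshold, best_polarity, best_correct = threshold, polarity, correct
    return feature, best_threshold, best_polarity


def stump_predict(stump, x):
    feature, threshold, polarity = stump
    return 1 if polarity * (x[feature] - threshold) > 0 else 0


def fit_forest(rows, n_trees, rng):
    """Train n_trees stumps, each on its own bootstrap sample and random feature."""
    n_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        bootstrap = [rng.choice(rows) for _ in rows]  # sample with replacement
        feature = rng.randrange(n_features)           # random feature for this tree
        forest.append(fit_stump(bootstrap, feature))
    return forest


def forest_predict(forest, x):
    """Majority vote across all stumps."""
    votes = Counter(stump_predict(stump, x) for stump in forest)
    return votes.most_common(1)[0][0]


# Synthetic toy data (invented): features are (entropy, number of comments).
rng = random.Random(42)
benign = [((rng.gauss(3.8, 0.2), rng.randint(40, 200)), 0) for _ in range(40)]
malicious = [((rng.gauss(4.5, 0.2), rng.randint(0, 30)), 1) for _ in range(40)]
forest = fit_forest(benign + malicious, n_trees=25, rng=rng)

print(forest_predict(forest, (5.5, 2)))    # high entropy, few comments
print(forest_predict(forest, (2.5, 180)))  # low entropy, many comments
```

Using an odd number of trees avoids tied votes in a two-class problem. A production random forest grows full trees and samples features at every split, which this sketch does not attempt.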

Page 30: Applied machine learning defeating modern malicious documents

Demo: Malicious Macro Bot Project

Demonstrate classification

Gaining insight from machine learning features

Identifying phishing campaigns through featureprints

Search and visualize in Elasticsearch / Kibana
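The slides do not define "featureprint". One plausible reading, used here purely as an assumption, is a stable fingerprint of a document's feature values, so that documents produced by the same macro builder share a fingerprint and can be grouped into a campaign. The Python sketch below illustrates that reading only. The function names and file names are hypothetical, and the feature values reuse the Doc1/Doc2 numbers from the earlier table.

```python
import hashlib
import json
from collections import defaultdict


def featureprint(features):
    """Assumed definition: a short hash of the canonicalized feature values."""
    canonical = json.dumps(features, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def group_by_featureprint(samples):
    """Map each fingerprint to the list of sample names that share it."""
    groups = defaultdict(list)
    for name, features in samples.items():
        groups[featureprint(features)].append(name)
    return dict(groups)


# Hypothetical file names; two of them share identical macro features.
samples = {
    "invoice_a.doc": {"lines_of_code": 74, "comments": 8, "functions": 9, "shell": 1},
    "invoice_b.doc": {"lines_of_code": 74, "comments": 8, "functions": 9, "shell": 1},
    "order_form.xls": {"lines_of_code": 584, "comments": 161, "functions": 14, "shell": 0},
}
groups = group_by_featureprint(samples)
campaigns = {fp: names for fp, names in groups.items() if len(names) > 1}
print(len(campaigns))  # prints 1
```

Grouping on an exact hash only catches identical feature vectors. Looser matching would need a distance measure, which is outside what the slides describe.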

Page 31: Applied machine learning defeating modern malicious documents

Conclusion

Project Uses:
  Threat Intelligence: Identify new phishing campaigns
  Detection: Fill traditional detection gaps
  Incident Response: Rapid triage of office documents

Prevention would be best

Page 32: Applied machine learning defeating modern malicious documents

Thank You!

https://github.com/egaus/MaliciousMacroBot

Page 33: Applied machine learning defeating modern malicious documents

References

[1] Proofpoint, “Human Factor Report”, 2016. https://www.proofpoint.com/sites/default/files/human-factor-report-2016.pdf
[2] Microsoft, “New feature in Office 2016 can block macros and help prevent infection”, Mar 22, 2016. https://blogs.technet.microsoft.com/mmpc/2016/03/22/new-feature-in-office-2016-can-block-macros-and-help-prevent-infection/
[3] Proofpoint, “The Cybercrime Economics of Malicious Macros”, 2016. https://www.proofpoint.com/sites/default/files/documents/bnt_download/pp-macroeconomics-rr.pdf
[4] Ankit Anubhav, Dileep Kumar Jallepalli, “Hancitor (aka Chanitor) Observed Using Multiple Attack Approaches”, FireEye, Sept 23, 2016. https://www.fireeye.com/blog/threat-research/2016/09/hancitor_aka_chanit.html
[5] Damballa, “PonyUp: Tracing Pony’s Threat Cycle and Multi-Stage Infection Chain”, Aug. 2015. https://www.damballa.com/wp-content/uploads/2015/08/Damballa_PonyUp.pdf
[6] Minerva Labs Research Team, “New Hancitor: Pimp my Downloader”, Aug 19, 2016. http://www.minerva-labs.com/post/new-hancitor-pimp-my-downloader
[7] CrunchCode. http://www.crunchcode.de/en/index.html
[8] MacroShop. https://github.com/khr0x40sh/MacroShop
[9] Veil Evasion Framework. https://github.com/Veil-Framework/Veil-Evasion
[10] Generate-Macro. https://github.com/enigma0x3/Generate-Macro
[11] SciKit Learn Algorithm Cheat Sheet. http://scikit-learn.org/stable/tutorial/machine_learning_map/

Page 34: Applied machine learning defeating modern malicious documents

Thank You!

Questions?

Page 35: Applied machine learning defeating modern malicious documents

Offline Demo

Page 36: Applied machine learning defeating modern malicious documents

Identifying Phishing Campaigns
