Post on 17-May-2018
#RSAC
SESSION ID: HT-W02
Applied Machine Learning: Defeating Modern Malicious Documents
Evan Gaustad
Sr. Manager CSIRT, Target Corporation
Agenda
Office Macro Use and Abuse
Malicious documents in attack lifecycle
Machine Learning for Malware Detection
Demo Project: Malicious Macro Bot
Conclusion
Macro-enabled Microsoft Office Documents
An Office macro is code that automates tasks in Office documents:
Automatically fill out forms
Update graphs and display data
Make web requests
Perform computations
Written in Visual Basic for Applications (VBA)
VBA support is built into MS Office
“99.7% of documents used in attachment-based campaigns relied on social engineering and macros, rather than exploits.” - Proofpoint [1]
Attacker motivation for malicious Office docs
Barrier to entry is very low
Uses built-in, cross-platform features
“Exploit” reliability is high
Can implement sandbox evasion
Easy to update to evade AV signatures
Malicious Macro-enabled Office Documents
Used by an attacker to gain code execution on the targeted system(s)
Common attacker VBA techniques:
Download and execute malicious payload
Drop and execute embedded payloads or scripts
Obfuscation to hide intent
Sandbox evasion techniques
Payload targeting
…
“98% of Office-targeted threats use macros” - Microsoft [2]
Example: Maldocs in Attack Lifecycle
1) Phishing email with attachment (“Invoice Past Due”)
2) Victim opens file, allows macros to run
3) Malicious macro executes
4) Downloads / drops executables or PowerShell
5) Installs additional malware, e.g. Pony, Hancitor, Vawtrak
6) Steals credentials and data, maintains persistence, establishes command and control
[Diagram: Victim, Attacker, http://.../gate.php]
Detecting Malicious Macros
How hard is it to create:
a malicious macro…
that runs an executable…
on victim’s machine…
and evades AV?
Really easy
Some easy-to-find tools:
CrunchCode [7]
MacroShop [8]
Veil Framework [9]
Generate-Macro [10]
Criminals sell their own
Why Machine Learning?
Existing anti-virus and sandbox techniques can be subverted
Automates extracting insight from file samples
Generalizes better to previously unseen variants
Reduces human analysis time
Project Approach
Goals:
Triage: Determine whether a new Microsoft Office document contains a malicious or benign macro
Detection: Provide useful detection when signature-based methods fail
Threat Intelligence: Identify phishing campaigns
Guiding Principles:
Supervised Machine Learning: Classification
Well-thought-out features
Generalized and interpretable model output
Applied Machine Learning Steps
[Diagram: benign files and malicious files → feature extraction → rows of numeric features (5.7 10 98 …, 1.2 23 15 …, 0.7 57 20 …)]
1) Collect labeled data
2) Feature extraction
“Feature Engineering”
DOCUMENT #1
…
BHJASD = Chr(102 + 8)
Set uHhdBhd = CreateObject("" & "W" & "" & "or" & "d." & "Applicatio" & BHJASD)
uHhdBhd.Documents.Open(FFFNNNF)
Module1.Tyryka (2)
HYUASGD = Module1.Girow(WOIEW)
Module1.Tyryka (3)
uHhdBhd.Quit
Set uHhdBhd = Nothing
End Sub
Public Function Girow(qqa As String)
Dim jjz As Variant
jjz = Shell(qqa, 0)
…
DOCUMENT #2
…
'#############################
'# Code to Add Total Value Formula #
'#############################
'Go to the top of the Price column
Range("H10").Select
'Find the bottom value - there are no values in the Non Stock Items
Selection.End(xlDown).Select
'Check to see if still in the order form range - if not there were no Standard Items Selected
If ActiveCell.Row > 1000 Then GoTo TidyUp
…
Which one is malicious?
Why?
How would you measure that?
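One visible clue in Document #1 is that the object name passed to CreateObject is split into fragments and one character is hidden behind Chr(102 + 8). The sketch below resolves that one expression by hand to show what the obfuscation hides. It is a minimal illustration, not a general VBA deobfuscator: the helper names `eval_chr` and `fold_concat` are hypothetical, and the code assumes string literals that contain no `&` characters.

```python
import re

# Matches the simple pattern "Chr(<int> + <int>)" seen in Document #1.
CHR_SUM = re.compile(r"Chr\(\s*(\d+)\s*\+\s*(\d+)\s*\)")


def eval_chr(expr: str) -> str:
    """Evaluate a VBA expression of the form Chr(a + b) to a single character."""
    match = CHR_SUM.fullmatch(expr.strip())
    if match is None:
        raise ValueError(f"unsupported expression: {expr!r}")
    return chr(int(match.group(1)) + int(match.group(2)))


def fold_concat(expr: str, variables: dict) -> str:
    """Join a VBA '&' concatenation of string literals and known variables.

    Assumption: no literal contains '&', so a plain split is good enough here.
    """
    pieces = []
    for part in (p.strip() for p in expr.split("&")):
        if len(part) >= 2 and part.startswith('"') and part.endswith('"'):
            pieces.append(part[1:-1])       # string literal, quotes removed
        elif part in variables:
            pieces.append(variables[part])  # previously resolved variable
        else:
            raise ValueError(f"unknown token: {part!r}")
    return "".join(pieces)


# Values copied from Document #1.
variables = {"BHJASD": eval_chr("Chr(102 + 8)")}
resolved = fold_concat(
    '"" & "W" & "" & "or" & "d." & "Applicatio" & BHJASD', variables
)
print(resolved)
```

The fragments join into an ordinary COM object name, which a keyword signature looking for the plain string would not see in the source text.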
Feature               Doc1   Doc2
# Lines of Code       74     584
# Comments            8      161
# Functions           9      14
# Shell Instructions  1      0
Entropy               4.3    3.8
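A rough sketch of how features like the ones in the table might be computed from macro source text follows. The slides do not give the exact definitions, so everything here is an assumption: lines of code are counted as non-blank lines, comments as lines starting with an apostrophe, functions as Sub/Function declarations, and entropy as character-level Shannon entropy in bits. The function names and the sample macro are illustrative only.

```python
import math
import re
from collections import Counter

# Assumed definition: a declaration line such as "Public Function Foo(...)".
DECLARATION = re.compile(r"\s*(?:(?:Public|Private)\s+)?(?:Sub|Function)\b", re.IGNORECASE)
SHELL_CALL = re.compile(r"\bShell\s*\(", re.IGNORECASE)


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy of the text, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def extract_features(vba_source: str) -> dict:
    """Turn VBA source text into a small dictionary of numeric features."""
    lines = [line for line in vba_source.splitlines() if line.strip()]
    return {
        "lines_of_code": len(lines),
        "comments": sum(1 for line in lines if line.lstrip().startswith("'")),
        "functions": sum(1 for line in lines if DECLARATION.match(line)),
        "shell_instructions": len(SHELL_CALL.findall(vba_source)),
        "entropy": shannon_entropy(vba_source),
    }


# Illustrative snippet modelled on the tail of Document #1 ("End Function" added here).
SAMPLE = """Public Function Girow(qqa As String)
Dim jjz As Variant
jjz = Shell(qqa, 0)
End Function
"""

features = extract_features(SAMPLE)
print(features)
```

Each document becomes one row of numbers, which is the form the classification step needs.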
Applied Machine Learning Steps
[Diagram: benign files and malicious files → feature extraction → rows of numeric features → classification model]
1) Collect labeled data
2) Feature extraction
3) Train and test model
Choose and Test Model
Simple Decision Tree Model
[Decision tree diagram]
entropy <= 4.27 (samples = 88)
  True: class = benign (samples = 47)
  False: # comments <= 39.0 (samples = 41)
    Leaves under this node: class = benign and class = malicious (each shown with samples = 47)
Simple Decision Tree Model
Feature      Doc1   Doc2
Entropy      4.3    3.8
# Comments   8      161
Doc #1: entropy 4.3 > 4.27 (False branch), then # comments 8 <= 39.0 (True branch)
Doc #2: entropy 3.8 <= 4.27 (True branch), class = benign
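The two-level tree can be written out as a small data structure and walked for each document. This is a sketch under one stated assumption: the extracted slide text does not make clear which leaf under the # comments test is benign and which is malicious, so the code assumes that few comments (<= 39.0) leads to the malicious leaf. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str


@dataclass
class Split:
    feature: str
    threshold: float
    if_true: "Node"   # branch taken when feature <= threshold
    if_false: "Node"  # branch taken otherwise


Node = Union[Leaf, Split]

# Thresholds come from the slide; the leaf order under "comments" is an ASSUMPTION.
TREE = Split(
    feature="entropy",
    threshold=4.27,
    if_true=Leaf("benign"),
    if_false=Split(
        feature="comments",
        threshold=39.0,
        if_true=Leaf("malicious"),  # assumed: high entropy and few comments
        if_false=Leaf("benign"),    # assumed: high entropy but well commented
    ),
)


def predict(node: Node, features: dict) -> str:
    """Walk the tree from the root until a leaf is reached."""
    while isinstance(node, Split):
        node = node.if_true if features[node.feature] <= node.threshold else node.if_false
    return node.label


doc1 = {"entropy": 4.3, "comments": 8}
doc2 = {"entropy": 3.8, "comments": 161}
print(predict(TREE, doc1), predict(TREE, doc2))
```

A tree this small is easy to read, which fits the stated principle of interpretable model output, but it only looks at two of the available features.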
Applied Machine Learning Steps
[Diagram: benign files and malicious files → feature extraction → classification model; new files → classification model → “Benign” / “Malicious”]
1) Collect labeled data
2) Feature extraction
3) Train and test model
4) Deploy model
Malicious Macro Bot Project
Model factored in over 20,000 samples
Analyzed over 121,000 samples collected from VirusTotal over 7 months
Over a thousand features:
VBA built-in language semantics for base language, e.g. Shell, Dim, If, …
Code heuristics, e.g. LOC, # functions, entropy, …
Uses a Random Forest classifier (an “ensemble”):
Fits many decision trees on many subsets of the dataset
Combines the predictions of the individual trees
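The ensemble idea can be sketched in plain Python: fit many small trees on random resamples of the training data and let them vote. This is a deliberately simplified stand-in, not the project's model or a full random forest. Each "tree" is a one-split decision stump, there is no random feature selection, and the training rows are made-up toy values rather than real samples. All names are hypothetical.

```python
import random
from collections import Counter

# Toy, made-up training rows: (entropy, number of comments) and their labels.
ROWS = [(3.8, 161), (3.9, 120), (4.0, 90), (4.1, 60),
        (4.3, 8), (4.4, 12), (4.5, 3), (4.6, 0)]
LABELS = ["benign"] * 4 + ["malicious"] * 4


def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels):
    """Return (feature, threshold, label_if_le, label_if_gt) with the fewest training errors."""
    best, best_errors = None, None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
            if not right:
                continue  # threshold at the maximum value does not split anything
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if best_errors is None or errors < best_errors:
                best, best_errors = (feature, threshold, l_lab, r_lab), errors
    if best is None:  # every sampled row was identical: always predict the one label
        only = majority(labels)
        return (0, float("inf"), only, only)
    return best


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Fit each stump on a bootstrap resample (sampling rows with replacement)."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_trees):
        picks = [rng.randrange(len(rows)) for _ in rows]
        stumps.append(fit_stump([rows[i] for i in picks], [labels[i] for i in picks]))
    return stumps


def predict(stumps, row):
    """Majority vote over all stumps."""
    votes = Counter(l_lab if row[feature] <= threshold else r_lab
                    for feature, threshold, l_lab, r_lab in stumps)
    return votes.most_common(1)[0][0]


forest = fit_forest(ROWS, LABELS)
print(predict(forest, (4.3, 8)), predict(forest, (3.8, 161)))
```

Because each stump sees a different resample, individual stumps can disagree, and the vote smooths out those differences. A library implementation with deeper trees and per-split feature sampling would be the practical choice for over a thousand features.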
Demo: Malicious Macro Bot Project
Demonstrate classification
Gaining insight from machine learning features
Identifying phishing campaigns through featureprints
Search and visualize in Elasticsearch / Kibana
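The extracted slide text does not say how a featureprint is computed. One possible reading, assumed here purely for illustration, is a short fingerprint derived from a chosen set of feature values, so that documents generated by the same tool or campaign collapse to the same identifier. The function names, the selected features, and the sample values below are all hypothetical.

```python
import hashlib
from collections import defaultdict


def featureprint(features: dict, selected: list) -> str:
    """Hash the selected feature values into a short, stable identifier."""
    canonical = "|".join(f"{name}={features.get(name, 0)}" for name in selected)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def group_by_featureprint(docs: dict, selected: list) -> dict:
    """Map each featureprint to the list of document names that share it."""
    groups = defaultdict(list)
    for name, features in docs.items():
        groups[featureprint(features, selected)].append(name)
    return dict(groups)


# Made-up feature values for three documents.
DOCS = {
    "invoice_a.doc": {"functions": 9, "shell_instructions": 1, "comments": 8},
    "invoice_b.doc": {"functions": 9, "shell_instructions": 1, "comments": 8},
    "order_form.xls": {"functions": 14, "shell_instructions": 0, "comments": 161},
}
SELECTED = ["functions", "shell_instructions", "comments"]

groups = group_by_featureprint(DOCS, SELECTED)
for print_id, names in groups.items():
    print(print_id, names)
```

Under this reading, a cluster of new documents sharing one identifier would be a candidate campaign, and the identifier is a convenient field to index and search on.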
Conclusion
Project Uses:
Threat Intelligence: Identify new phishing campaigns
Detection: Fill traditional detection gaps
Incident Response: Rapid triage of Office documents
Prevention would be best
References
[1] Proofpoint, “Human Factor Report”, 2016. https://www.proofpoint.com/sites/default/files/human-factor-report-2016.pdf
[2] Microsoft, “New feature in Office 2016 can block macros and help prevent infection”, Mar 22, 2016. https://blogs.technet.microsoft.com/mmpc/2016/03/22/new-feature-in-office-2016-can-block-macros-and-help-prevent-infection/
[3] Proofpoint, “The Cybercrime Economics of Malicious Macros”, 2016. https://www.proofpoint.com/sites/default/files/documents/bnt_download/pp-macroeconomics-rr.pdf
[4] Ankit Anubhav, Dileep Kumar Jallepalli, “Hancitor (aka Chanitor) Observed Using Multiple Attack Approaches”, FireEye, Sept 23, 2016. https://www.fireeye.com/blog/threat-research/2016/09/hancitor_aka_chanit.html
[5] Damballa, “PonyUp: Tracing Pony’s Threat Cycle and Multi-Stage Infection Chain”, Aug 2015. https://www.damballa.com/wp-content/uploads/2015/08/Damballa_PonyUp.pdf
[6] Minerva Labs Research Team, “New Hancitor: Pimp my Downloader”, Aug 19, 2016. http://www.minerva-labs.com/post/new-hancitor-pimp-my-downloader
[7] CrunchCode. http://www.crunchcode.de/en/index.html
[8] MacroShop. https://github.com/khr0x40sh/MacroShop
[9] Veil Evasion Framework. https://github.com/Veil-Framework/Veil-Evasion
[10] Generate-Macro. https://github.com/enigma0x3/Generate-Macro
[11] scikit-learn Algorithm Cheat Sheet. http://scikit-learn.org/stable/tutorial/machine_learning_map/