UNDERSTANDING HOW MACHINE LEARNING DEFENDS AGAINST ZERO-DAY THREATS

Vinoo Thomas
Sr. Product Manager, Intel Security

© 2016 ISACA. All Rights Reserved

WELCOME

• Have a question for the speaker? Text it in using the Ask A Question button!

• Audio is streamed over your computer

• Technical issues? Click the ? button

• Use the Feedback button to share your feedback about today’s event

• Questions or suggestions? Email them to [email protected]

Use the Attachments Button to find the following:

• PDF Copy of today’s presentation

• Link to the Event Home Page where ISACA members can find the CPE Quiz

• Upcoming ISACA Events

• More assets from today’s webcast



TODAY’S SPEAKER

Vinoo Thomas, Sr. Product Manager, Intel Security


AGENDA

Malware Trends

Unpacking Challenges

Machine Learning Approaches

Feature Extraction and Clustering

Q&A


Timeline: mid-1980s, 2000, 2005, 2010, today

Historic Stages of Cyber Attacks

MALWARE TRENDS


Everything of value is being harvested!

Looking for Credentials:

FB or Twitter account

Dropbox account (corporate documents)

IT credentials

Looking for Source Code:

Design document

Blueprints and drawings

Intellectual Property

Looking for digital money:

Bitcoin

PayPal

Looking for certificates and private keys


MALWARE WANTS YOUR INFORMATION


The relentless climb of total malware continues.

We crossed the half-billion-sample barrier at the end of 2015.

Source: McAfee Labs

December 31, 2015: 516 million samples (~480,000 new and unique malicious binaries classified daily)

MALWARE TRENDS



There are 327 new threats every minute, or more than 5 every second.

Source: McAfee Labs
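The quoted rates can be cross-checked with simple arithmetic. The sketch below uses only the figures from these slides; it assumes that "threats" and "new and unique malicious binaries" are counted in roughly the same way, which the slides do not state.

```python
# Cross-check the rates quoted on the slides (figures attributed to McAfee Labs).
THREATS_PER_MINUTE = 327        # "327 new threats every minute"
NEW_BINARIES_PER_DAY = 480_000  # "~480,000 new and unique malicious binaries classified daily"

MINUTES_PER_DAY = 60 * 24

per_second = THREATS_PER_MINUTE / 60
per_day_from_minute_rate = THREATS_PER_MINUTE * MINUTES_PER_DAY
per_minute_from_daily_rate = NEW_BINARIES_PER_DAY / MINUTES_PER_DAY

print(f"{per_second:.2f} threats per second")                      # 5.45
print(f"{per_day_from_minute_rate:,} threats per day")             # 470,880
print(f"{per_minute_from_daily_rate:.0f} binaries per minute")     # 333
```

Both figures land in the same range: 327 per minute is about 471,000 per day, close to the ~480,000 daily figure.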

MALWARE TRENDS




Signatures identify with near certainty whether an object is malicious or clean.

This technique is reactive by nature. Although it is very precise, the sheer number of malware variants, and the rate at which they grow, is making it unsustainable.

Malware authors continuously monitor AV vendor detection and release new variants in response.

Commercial, open source and underground packers and protectors make repacking new variants trivial.

THE AGE OF “SIGNATURES” IS FADING…
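A minimal sketch of why exact-match signatures are reactive, assuming the simplest possible signature: a whole-file SHA-256 hash. Real AV signatures are richer than this (byte patterns, heuristics), and the sample bytes and the `signature_db` name are illustrative only.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Stand-in bytes, not a real executable.
original = b"MZ" + b"\x90" * 64 + b"payload-v1"
# A "new variant": the same content with one byte appended.
variant = original + b"\x00"

# Hypothetical signature database holding hashes of known-bad files.
signature_db = {sha256_hex(original)}


def is_known_bad(data: bytes) -> bool:
    """Exact-match lookup: detects only files that were already seen."""
    return sha256_hex(data) in signature_db


print(is_known_bad(original))  # True
print(is_known_bad(variant))   # False
```

The known sample is caught with certainty, but the trivially modified variant is missed until a new signature is released.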



• Think of a packed file as an executable inside another executable, which can itself be inside yet another executable.

• Think Russian dolls (Matryoshka).

• When executed, the ‘outer’ executable unpacks the contents of the ‘inner’ executable into memory and executes it.

• The innermost executable is the ‘real’ executable!

UNPACKING CHALLENGES
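The Russian-doll idea can be sketched with nested compression layers. This is an analogy only: zlib stands in for a packer, and a real packer wraps the payload in an executable stub that unpacks it in memory at run time. The function names are hypothetical.

```python
import zlib


def pack(payload: bytes, layers: int) -> bytes:
    """Wrap the payload in the given number of compression layers."""
    data = payload
    for _ in range(layers):
        data = zlib.compress(data)
    return data


def unpack_all(data: bytes, max_layers: int = 16):
    """Peel layers until the data no longer decompresses; return (payload, layers)."""
    layers = 0
    while layers < max_layers:
        try:
            data = zlib.decompress(data)
        except zlib.error:
            break  # not a zlib stream: the innermost content has been reached
        layers += 1
    return data, layers


payload = b"the 'real' executable would be here"
packed = pack(payload, layers=3)
inner, depth = unpack_all(packed)
print(depth, inner == payload)
```

An analysis tool has to peel every layer before it can examine the innermost content, which is what makes packed samples expensive to inspect statically.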




What did this Snake eat for lunch? ;)


VBS/HOUDINI – INITIAL VARIANT



VBS/HOUDINI – SUBSEQUENT VARIANTS





Malware developers work hard to obfuscate their malicious intent


Sources: http://blog.gentilkiwi.com/mimikatz

www.matcode.com/mpress.htm

UNPACKING CHALLENGES - MIMIKATZ



MIMIKATZ – COMPILED BINARY



MIMIKATZ – PACKED WITH MPRESS



NATIVE BINARY



PACKED BINARY

Previously available static features are destroyed and made unavailable by the packer.
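A rough illustration of this effect, assuming zlib compression as a stand-in for a packer such as MPRESS (a real packer also adds an unpacking stub and rewrites the import table). The `native` bytes are a made-up blob of API names plus zero padding, not a real binary.

```python
import math
import re
import zlib
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for empty input)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in Counter(data).values())


def printable_strings(data: bytes, min_len: int = 6):
    """Return runs of printable ASCII characters of at least min_len bytes."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]


# Stand-in for a native binary: readable import names followed by zero padding.
names = [b"kernel32.dll", b"CreateFileW", b"WriteFile", b"advapi32.dll", b"CryptEncrypt"]
native = b"\x00".join(names) + b"\x00" * 4000
packed = zlib.compress(native, 9)

print(printable_strings(native))
print("CryptEncrypt" in printable_strings(packed))
print(shannon_entropy(native) < shannon_entropy(packed))
```

In the stand-in data the API names are readable before packing and absent afterwards, while the byte entropy rises. High entropy is therefore often used as a hint that a file is packed.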


• Static Analysis

• Fuzzy Hashing

• Import Hash

• Dynamic Analysis

• Memory Analysis

• Machine Learning

DIFFERENT APPROACHES
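One of the approaches listed, the import hash, can be sketched as follows. This is a simplified version of the commonly used "imphash" convention (MD5 over the ordered, lowercased `library.function` pairs). It assumes the import list has already been extracted; a real tool parses the PE import table and resolves imports by ordinal, which is omitted here.

```python
import hashlib


def import_hash(imports):
    """Hash an ordered list of (library, function) import pairs.

    Simplified sketch: lowercases names, strips common library extensions,
    joins the pairs with commas and returns the MD5 hex digest.
    """
    parts = []
    for library, function in imports:
        lib = library.lower()
        for ext in (".dll", ".sys", ".ocx"):
            if lib.endswith(ext):
                lib = lib[: -len(ext)]
                break
        parts.append(f"{lib}.{function.lower()}")
    return hashlib.md5(",".join(parts).encode("ascii")).hexdigest()


# Hypothetical import lists for three samples.
sample_a = [("KERNEL32.dll", "CreateFileW"), ("ADVAPI32.dll", "CryptEncrypt")]
sample_b = [("kernel32.DLL", "createfilew"), ("advapi32.dll", "CRYPTENCRYPT")]
sample_c = [("KERNEL32.dll", "CreateFileW")]

print(import_hash(sample_a) == import_hash(sample_b))  # True
print(import_hash(sample_a) == import_hash(sample_c))  # False
```

Samples built from the same code base tend to share an import hash even when other bytes differ. A packed sample exposes only the packer stub's imports, so the hash stops being useful for it.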



Extract features

Select algorithm

Model & Classifier

Evaluate results

MACHINE LEARNING APPROACH
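The last step, evaluating results, can be sketched with a confusion matrix. The labels and predictions below are made-up illustration data, not results from the presentation, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives (1 = malicious, 0 = clean)."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t == 1 and p == 1),
        "fp": sum(1 for t, p in pairs if t == 0 and p == 1),
        "tn": sum(1 for t, p in pairs if t == 0 and p == 0),
        "fn": sum(1 for t, p in pairs if t == 1 and p == 0),
    }


def rates(counts):
    """Return (detection rate, false positive rate, precision).

    Assumes at least one malicious sample, one clean sample and one
    positive prediction, so that no denominator is zero.
    """
    detection_rate = counts["tp"] / (counts["tp"] + counts["fn"])
    false_positive_rate = counts["fp"] / (counts["fp"] + counts["tn"])
    precision = counts["tp"] / (counts["tp"] + counts["fp"])
    return detection_rate, false_positive_rate, precision


# Made-up ground truth and classifier output for ten samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

counts = confusion_counts(y_true, y_pred)
detection_rate, false_positive_rate, precision = rates(counts)
print(counts)
print(f"detection={detection_rate:.2f} fpr={false_positive_rate:.2f} precision={precision:.2f}")
```

For a security product the false positive rate matters as much as the detection rate, since blocking clean files is costly.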



Identify a suspicious characteristic or activity

If existing signature-based methods do not detect the object, it is given a reputation and a confidence level.

Pre-execution: static file feature extraction (file type, import hash, entry point, resources, strings, packer and compiler details, compile time, APIs, section names, etc.)

Post-execution: behavioral features and memory analysis (behavioral sequence, process tree, file system events, registry events, network communication events, strings from memory)

The key is to combine multiple sources of features for classification!

LEVERAGING MULTIPLE SOURCES OF KNOWLEDGE
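A minimal sketch of combining the two feature sources into one vector for a classifier. The feature names and the `SCHEMA` list are hypothetical; the slides do not describe the actual representation.

```python
def combine_features(static_features, behavioral_features=None):
    """Merge pre- and post-execution features into one namespaced dict."""
    combined = {f"static.{name}": value for name, value in static_features.items()}
    for name, value in (behavioral_features or {}).items():
        combined[f"behavior.{name}"] = value
    return combined


def to_vector(combined, schema):
    """Order features by a fixed schema; missing features become 0.0."""
    return [float(combined.get(name, 0)) for name in schema]


# Hypothetical schema mixing static and behavioral features.
SCHEMA = [
    "static.is_packed",
    "static.section_count",
    "behavior.registry_writes",
    "behavior.network_connections",
]

static = {"is_packed": True, "section_count": 3}
behavior = {"registry_writes": 4, "network_connections": 2}

pre_execution = to_vector(combine_features(static), SCHEMA)
post_execution = to_vector(combine_features(static, behavior), SCHEMA)
print(pre_execution)   # [1.0, 3.0, 0.0, 0.0]
print(post_execution)  # [1.0, 3.0, 4.0, 2.0]
```

The same schema serves both stages: before execution only the static slots are filled, and the behavioral slots are added once the sample has run.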


CLASSIFICATION OF MALICIOUS PROGRAMS

• Analyze PE (.exe, .dll, .sys), Office, PDF, VBS and other executable and script file formats

• Static Features

– Static file features are abstracted to check a file prior to execution on the system

– Can block malicious files with rich features pre-execution (think Adware!)

• Runtime execution tracing

– Traces program execution events (file system, registry and network events) using a light-weight sensor

– The program's runtime trace is abstracted and sent to the machine learning classifier, which makes a real-time classification based on behavior

– Once the machine learning classifier responds with a malicious classification, the sensor performs genealogy-based repair

A hybrid approach provides the best classification rates!
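One way to picture "abstracting" a runtime trace is to normalize machine-specific details so that the same behavior on different machines yields the same tokens. The event format, the normalization rules and the function names below are assumptions for illustration; the slides do not describe the sensor's actual format.

```python
import re
from collections import Counter

# Normalization rules (illustrative): user profile directories and IPv4 addresses.
USER_DIR = re.compile(r"^c:\\users\\[^\\]+")
IPV4 = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")


def abstract_event(event):
    """Turn a (kind, action, target) event into a machine-independent token."""
    kind, action, target = event
    target = target.lower()
    target = USER_DIR.sub("%userprofile%", target)
    target = IPV4.sub("<ip>", target)
    return f"{kind}:{action}:{target}"


def abstract_trace(trace):
    """Count abstracted tokens; the counts can be sent to a classifier."""
    return Counter(abstract_event(event) for event in trace)


trace_alice = [
    ("file", "write", r"C:\Users\alice\report.doc"),
    ("network", "connect", "203.0.113.7:443"),
]
trace_bob = [
    ("file", "write", r"C:\Users\BOB\report.doc"),
    ("network", "connect", "198.51.100.9:443"),
]

print(abstract_trace(trace_alice) == abstract_trace(trace_bob))  # True
```

Two traces that differ only in user name and remote address collapse to the same abstract trace, so the classifier sees behavior and not incidental detail.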


Supervised learning: training data with classifications (labels) is used as input to prepare an algorithm that can then be applied to classify unknown data.

SUPERVISED MACHINE LEARNING
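A minimal sketch of supervised learning, using a nearest-centroid classifier as a simple stand-in. The slides do not name the algorithm actually used, and the two features (byte entropy and a count of suspicious APIs) and all training values are made up for illustration.

```python
import math


def train_centroids(samples):
    """Compute the mean feature vector (centroid) for each label."""
    sums, counts = {}, {}
    for vector, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vector))
        for i, value in enumerate(vector):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, vector):
    """Assign the label whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vector))


# Labeled training data: ([entropy, suspicious_api_count], label). Values are invented.
training = [
    ([7.5, 6], "malicious"), ([7.2, 4], "malicious"), ([7.8, 5], "malicious"),
    ([5.1, 0], "clean"), ([4.8, 1], "clean"), ([5.5, 0], "clean"),
]

model = train_centroids(training)
print(predict(model, [7.4, 5]))  # malicious
print(predict(model, [5.0, 1]))  # clean
```

The labeled data shapes the model during training, and the model is then applied to samples it has never seen.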



EXTRACTING STATIC FEATURES

API Calls, Commands and Patterns

Ransomware Example
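A rough sketch of turning API calls, commands and patterns into static features. The indicator list is illustrative and is not taken from the slides; the patterns match ASCII strings only, whereas Windows binaries often store strings as UTF-16, which a real extractor would also need to handle.

```python
import re

# Illustrative indicators often associated with ransomware (assumed, not from the slides).
INDICATORS = {
    "api:CryptEncrypt": rb"CryptEncrypt",
    "api:FindNextFile": rb"FindNextFile[AW]?",
    "cmd:delete_shadow_copies": rb"vssadmin(\.exe)?\s+delete\s+shadows",
    "pattern:ransom_note": rb"your files have been encrypted",
}


def extract_static_features(data: bytes):
    """Return a 0/1 feature per indicator, based on a case-insensitive search."""
    return {
        name: int(re.search(pattern, data, re.IGNORECASE) is not None)
        for name, pattern in INDICATORS.items()
    }


# Stand-in bytes, not a real sample.
sample = (
    b"\x00CryptEncrypt\x00FindNextFileW\x00"
    b"cmd.exe /c vssadmin delete shadows /all /quiet\x00"
)

features = extract_static_features(sample)
print(features)
```

Each indicator becomes one dimension of the feature vector, so no single string decides the verdict on its own.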



EXTRACTING BEHAVIORAL FEATURES

Ransomware Example



RUNTIME EXECUTION TRACE

Behavioral trace when sample was executed
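A minimal sketch of deriving behavioral features from such a trace. The trace format and the feature names are hypothetical, and the events are invented. The one ransomware-specific feature shown is the number of files renamed to a common new extension.

```python
import ntpath
from collections import Counter


def behavioral_features(trace):
    """Count events per type and detect mass renames to one new extension."""
    features = dict(Counter(f"{kind}.{action}" for kind, action, _ in trace))
    new_extensions = Counter()
    for kind, action, detail in trace:
        if kind == "file" and action == "rename":
            old_path, new_path = detail
            old_ext = ntpath.splitext(old_path)[1].lower()
            new_ext = ntpath.splitext(new_path)[1].lower()
            if new_ext and new_ext != old_ext:
                new_extensions[new_ext] += 1
    if new_extensions:
        common_ext, count = new_extensions.most_common(1)[0]
    else:
        common_ext, count = "", 0
    features["renames_to_common_extension"] = count
    features["common_new_extension"] = common_ext
    return features


# Invented trace: (kind, action, detail) tuples.
trace = [
    ("process", "create", "sample.exe"),
    ("file", "rename", (r"C:\Users\alice\a.docx", r"C:\Users\alice\a.docx.locked")),
    ("file", "rename", (r"C:\Users\alice\b.xlsx", r"C:\Users\alice\b.xlsx.locked")),
    ("file", "rename", (r"C:\Users\alice\c.pdf", r"C:\Users\alice\c.pdf.locked")),
    ("network", "connect", "203.0.113.7:443"),
]

print(behavioral_features(trace))
```

Unlike static strings, these features survive packing, because they are observed only after the sample has unpacked itself and started running.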


UNSUPERVISED MACHINE LEARNING



SIMILARITY: PROTOTYPE BASED CLUSTERING
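A minimal sketch of prototype-based clustering. The slides do not specify the similarity measure or the algorithm, so this assumes Jaccard similarity over feature sets and a single-pass scheme in which the first member of each cluster serves as its prototype. Sample names and features are invented.

```python
def jaccard(a, b):
    """Jaccard similarity of two feature sets (1.0 when both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def prototype_clustering(samples, threshold=0.5):
    """Assign each sample to its most similar prototype, or start a new cluster."""
    clusters = []  # each cluster: {"prototype": set, "members": [names]}
    for name, features in samples.items():
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = jaccard(features, cluster["prototype"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(name)
        else:
            clusters.append({"prototype": set(features), "members": [name]})
    return clusters


samples = {
    "a": {"CryptEncrypt", "FindNextFileW", "vssadmin", "http"},
    "b": {"CryptEncrypt", "FindNextFileW", "vssadmin", "tor"},
    "c": {"SetWindowsHookEx", "GetAsyncKeyState", "smtp"},
    "d": {"SetWindowsHookEx", "GetAsyncKeyState", "smtp", "ftp"},
}

for cluster in prototype_clustering(samples):
    print(cluster["members"])
```

No labels are needed: samples group purely by how similar their features are, which is what makes the approach unsupervised.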



SIMILARITY: CLASSIFICATION BASED ON CLUSTERING




Graphic representation of clusters of similar samples


MD5: 207B87475A36BAF2ED6F4A57B7C8A453

Similarity: 63.19%

CLUSTERING SIMILAR SAMPLES
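Classification based on clustering can be sketched as follows: an unknown sample takes the label of the most similar cluster prototype, provided the similarity clears a threshold. Jaccard similarity, the threshold, the family labels and the feature sets are all assumptions for illustration. The slides do not say how the similarity percentage they show was computed.

```python
def jaccard(a, b):
    """Jaccard similarity of two feature sets (1.0 when both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def classify_by_cluster(features, prototypes, threshold=0.5):
    """Return (label, similarity) of the best prototype, or ("unknown", similarity)."""
    best_label, best_sim = "unknown", 0.0
    for label, prototype in prototypes.items():
        sim = jaccard(features, prototype)
        if sim > best_sim:
            best_label, best_sim = label, sim
    if best_sim < threshold:
        return "unknown", best_sim
    return best_label, best_sim


# Hypothetical labeled cluster prototypes.
prototypes = {
    "ransomware-like": {"CryptEncrypt", "FindNextFileW", "vssadmin", "http"},
    "keylogger-like": {"SetWindowsHookEx", "GetAsyncKeyState", "smtp"},
}

label, similarity = classify_by_cluster(
    {"CryptEncrypt", "FindNextFileW", "vssadmin", "tor"}, prototypes
)
print(f"{label}, similarity: {similarity:.2%}")  # ransomware-like, similarity: 60.00%
```

A sample that resembles no prototype closely enough is reported as unknown, which keeps weak matches from being treated as detections.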


Real Protect gathers static features and behavioral traces to determine whether a sample matches patterns seen in known malware

Offers a platform for researchers to conduct forensics investigations and hunt for IOCs

Real Protect augments Windows endpoint products with machine learning based classification to protect users from new malware

Cloud-based analytics for malware detection

INTRODUCING REAL PROTECT


• Detects zero-day malware in near real-time

– Uses machine learning to detect unknown malware and zero-day threats

– Signature-less, small client footprint

• Improves detection up to 30% on top of DAT and Cloud detections

– Augments McAfee endpoint security products for Windows

• Produces actionable threat intelligence

– Real Protect classification can be used to create Indicators of Compromise

– Useful for patient zero discovery, forensic investigations and remediation

• Beta available now

– www.mcafee.com/us/downloads/free-tools/raptor.aspx

– Part of Consumer Cloud AV product

– Enterprise availability planned for later this year.

REAL PROTECT



QUESTIONS?



THANK YOU FOR ATTENDING THIS WEBINAR


For more information, go to www.isaca.org/webinars
