opml shooting the moving target - usenix · shooting the moving target : machine learning in...

Shooting the moving target: machine learning in cybersecurity

Ankit Arun*, Ignacio Arnaldo

MIT CSAIL top 16 2016

Outline

1. Machine Learning in Cybersecurity: problem statement and state of the art
2. Machine Learning Platform
3. Current state of the system
4. Ongoing efforts

Vast number of data sources and attacks

100+ log types

1000+ security attacks reported in 2018

~24k malicious mobile apps blocked every day

600% increase in IoT attacks in 2017

Ransomware attacks: 350% growth annually

USA faced 303 targeted attacks between 2015 and …

The Need for AI in InfoSec: Data Problem

80% of attacks go undetected by machines (i.e., logs and network systems) during or after the attack

86% are investigated successfully by human analysts, after an attack is known to have occurred

Detection approaches: threat intel and signatures, rules, anomaly detection, and supervised models, compared on coverage, false positives, and dwell time

Challenges…

Data properties compared (cybersecurity vs. computer vision): availability, variety, labeled, static/dynamic

Cybersecurity data: siloed with barriers; more expert knowledge required; adversarial and dynamic

State-of-the-art ML in Cybersecurity

[1] M. Darling, G. Heileman, G. Gressel, A. Ashok, and P. Poornachandran, “A lexical approach for classifying malicious urls”

[2] M. S. I. Mamun, M. A. Rathore, A. H. Lashkari, N. Stakhanova, and A. A. Ghorbani, “Detecting malicious urls using lexical analysis”

[3] Woodbridge, H. S. Anderson, A. Ahuja, and D. Grant, “Predicting domain generation algorithms with long short-term memory networks”

[4] H. S. Anderson, J. Woodbridge, and B. Filar, “DeepDGA: Adversarially-Tuned Domain Generation and Detection”

[5] J. Saxe et al., “eXpose: A Character-Level Convolutional Neural Network with Embeddings For Detecting Malicious URLs, File Paths and Registry Keys”

Academia:

● 2015-2016: lexical analysis to detect spam, malware-hosting, and phishing URLs [1][2]
● 2016: LSTMs for DGA detection [3][4]
● 2017: Char-level CNNs for URL classification [5]

Industry:

● Web traffic open by default
● Blacklists based on threat intelligence
● ML is rarely used for live detection
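The lexical approaches cited above ([1], [2]) classify a URL from its string alone, with no page fetch or reputation lookup. The sketch below extracts a few such features in Python; the feature selection and names are illustrative assumptions, not the feature sets used in [1] or [2].

```python
import math
from collections import Counter
from urllib.parse import urlparse


def shannon_entropy(text: str) -> float:
    """Shannon entropy in bits per character; 0.0 for an empty string."""
    if not text:
        return 0.0
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def lexical_features(url: str) -> dict:
    """Illustrative lexical features computed from the URL string only."""
    # urlparse needs a scheme to populate hostname, so add one if missing.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_hyphens": url.count("-"),
        "path_depth": len([p for p in parsed.path.split("/") if p]),
        "has_at_sign": "@" in url,
        # Crude dotted-quad check; does not validate octet ranges.
        "host_is_ipv4": host.replace(".", "").isdigit() and host.count(".") == 3,
        # Random-looking (e.g. generated) hostnames tend to have higher entropy.
        "host_entropy": shannon_entropy(host),
    }
```

Each URL becomes a flat dict of numbers and booleans that can be stacked into a feature matrix for any classifier; the character-level models of [3]-[5] instead learn such patterns directly from the raw characters.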

Risks preventing ML adoption:

● Are the approaches still valid, or are they outdated?
● How do the models perform in real-world scenarios?
● Will the models work in my environment?

Continuous improvement process across data pipelines, labels, and models:

• Adding or changing data
• Changing the entity to model
• Adding more attack examples
• Changing the modeling strategy

Machine Learning Platform

The cloud repositories

Golden data sets and models, used by threat researchers, ML engineers, and data scientists

The cloud repositories

Example: Horizontal Brute Force Attack → environment_1 → raw_logs, normalized_logs, features, label.csv, labeled_feature_matrix, models (Brute_force_attack_classifier_v1.1, Brute_force_attack_outlier_v1.1)
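The repository layout suggests that computed features are joined with label.csv to produce the labeled_feature_matrix. A minimal sketch of such a join, assuming (hypothetically) that features arrive as one dict per entity and that label.csv has `entity` and `label` columns; the real file schema is not given in the slides.

```python
import csv
import io
from typing import Any, Dict, List, Optional


def build_labeled_feature_matrix(
    feature_rows: List[Dict[str, Any]],
    label_csv_text: str,
    entity_key: str = "entity",           # hypothetical join column
    default_label: Optional[str] = None,  # used for entities absent from label.csv
) -> List[Dict[str, Any]]:
    """Attach a 'label' column to each feature row by joining on entity_key."""
    labels = {
        row[entity_key]: row["label"]
        for row in csv.DictReader(io.StringIO(label_csv_text))
    }
    matrix = []
    for row in feature_rows:
        labeled = dict(row)  # copy so the input rows are not mutated
        labeled["label"] = labels.get(row[entity_key], default_label)
        matrix.append(labeled)
    return matrix
```

Unlabeled entities keep `default_label` instead of being dropped, so the same matrix can serve supervised training (after filtering on the label) as well as unsupervised scoring.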

Configurable data pipelines

fields {
  name: 'protocol'
  display_name: 'Protocol'
  index: 'proto'
  data_type: string
}

Log Parsing Engine
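The fields block above declares how one raw log attribute is normalized. A minimal Python sketch of a parsing engine driven by such declarations, assuming that `index` names the key of the value in the raw record and that `data_type` selects a type cast; both readings, and the `dst_port` field below, are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

# Assumed mapping from config data_type strings to Python casts.
CASTS = {"string": str, "int": int, "float": float}


@dataclass(frozen=True)
class FieldSpec:
    name: str          # normalized field name, e.g. 'protocol'
    display_name: str  # label shown to analysts, e.g. 'Protocol'
    index: str         # assumed: key of the value in the raw record, e.g. 'proto'
    data_type: str     # one of the keys of CASTS


def parse_record(raw: Dict[str, Any], fields: List[FieldSpec]) -> Dict[str, Any]:
    """Normalize one raw log record; fields missing from the record become None."""
    normalized: Dict[str, Any] = {}
    for spec in fields:
        value = raw.get(spec.index)
        normalized[spec.name] = None if value is None else CASTS[spec.data_type](value)
    return normalized


FIELDS = [
    FieldSpec("protocol", "Protocol", "proto", "string"),
    FieldSpec("dst_port", "Destination Port", "dport", "int"),  # hypothetical field
]
```

For example, `parse_record({"proto": "tcp", "dport": "22"}, FIELDS)` returns `{"protocol": "tcp", "dst_port": 22}`. Because the mapping lives in configuration, supporting another log type would mean adding declarations rather than parser code.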

Configurable data pipelines

feature {
  name: 'distinct_protocol'
  display_name: 'Distinct Protocol'
  definition: 'count_distinct(protocol)'
  data_type: int
}

Feature Compute Engine
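The feature block above defines distinct_protocol as count_distinct(protocol). A minimal sketch of a compute engine that evaluates such definitions over normalized records, grouped per entity; the tiny definition grammar, the extra `count` aggregate, and the `src_ip` entity key used in the example are assumptions for illustration.

```python
import re
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List

# Hypothetical aggregates the definition language might support; None values are ignored.
AGGREGATES: Dict[str, Callable[[List[Any]], int]] = {
    "count_distinct": lambda values: len({v for v in values if v is not None}),
    "count": lambda values: sum(1 for v in values if v is not None),
}

# Accepts definitions of the form  aggregate(field).
DEFINITION = re.compile(r"^\s*(\w+)\(\s*(\w+)\s*\)\s*$")


def compute_feature(
    definition: str, records: Iterable[Dict[str, Any]], entity_key: str
) -> Dict[Any, int]:
    """Evaluate a definition such as 'count_distinct(protocol)' per entity."""
    match = DEFINITION.match(definition)
    if match is None or match.group(1) not in AGGREGATES:
        raise ValueError(f"unsupported feature definition: {definition!r}")
    func, field = AGGREGATES[match.group(1)], match.group(2)
    grouped: Dict[Any, List[Any]] = defaultdict(list)
    for record in records:
        grouped[record[entity_key]].append(record.get(field))
    return {entity: func(values) for entity, values in grouped.items()}
```

Changing `entity_key` (say, from a source IP to a user name) changes the entity being modeled without touching the feature definition.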

Model Versioning

Brute_force_attack_Classifier_V2.3 (major version 2, minor version 3)

Figure: Brute Force Attack Classifier, param version (1 to 4) by month, Sep 2018 to Apr 2019
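The naming scheme above encodes a major and a minor version in the model identifier. A minimal sketch for parsing and bumping such identifiers; the regular expression is inferred from the example names in these slides, and the policy in the docstrings (minor bump for a retrain, major bump for a new feature set or modeling strategy) is an assumption, not the authors' stated rule.

```python
import re
from typing import NamedTuple

# Inferred convention: <model_name>_V<major>.<minor>, accepting 'v' or 'V'.
VERSION_PATTERN = re.compile(r"^(?P<name>.+)_[vV](?P<major>\d+)\.(?P<minor>\d+)$")


class ModelVersion(NamedTuple):
    name: str
    major: int
    minor: int

    def __str__(self) -> str:
        return f"{self.name}_V{self.major}.{self.minor}"


def parse_model_version(model_id: str) -> ModelVersion:
    match = VERSION_PATTERN.match(model_id)
    if match is None:
        raise ValueError(f"not a versioned model id: {model_id!r}")
    return ModelVersion(match["name"], int(match["major"]), int(match["minor"]))


def bump_minor(v: ModelVersion) -> ModelVersion:
    """Assumed policy: retrain on fresh data with the same features and strategy."""
    return v._replace(minor=v.minor + 1)


def bump_major(v: ModelVersion) -> ModelVersion:
    """Assumed policy: new feature set or modeling strategy; resets the minor version."""
    return v._replace(major=v.major + 1, minor=0)
```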

Current state of the system

Attack stages: Reconnaissance, Delivery, Privilege Escalation, Lateral Movement, Command and Control, Exfiltration

Attacks: Ping Sweep, Port Scan, DNS Reconnaissance, Zone Transfer, Social Engineering Domains, Phishing Domains, Redirects, DLL Hijack, Task Sched, Mimikatz, Winroot, Domain Enumeration, Brute Force Login, Overpass the Hash, Skeleton Key Attack, Kerberoasting, DC Replication, Golden Ticket Attack, SSO Login Attack, Malware Backdoor, TOR Connections, ICMP Tunneling, HTTP Tunneling, Twittor, SSH Tunneling, DNS Tunneling, DNS Beaconing, ICMP Exfiltration, HTTP Exfiltration, Gmail Exfiltration, Twitter Exfiltration, NTP Exfiltration, SMTP Exfiltration, DNS Exfiltration, Cloud Takeover

Log sources: Fwd Proxy Logs / NGFW, AD Logs, EDR Logs, DNS Logs, App Logs

Network
• Proxy logs: Zscaler, BlueCoat, Bro HTTP, Intersafe
• FW logs: Cisco ASA, FortiGate, NetScreen, Bro Conn
• Flow logs: NetFlow, VPC Flow, IBM QFlow
• DNS logs: Windows DNS, Suricata, Bro DNS

Authentication
• Auth/Auth: Active Directory

Endpoint
• EDR logs: Carbon Black, osquery

Applications
• App logs: Apache, OneDrive
• Audit trail: AWS CloudTrail

Contextual
• Tenable, OpenIOC, Alexa Top 1M

31 data sources

27 golden datasets

70 models

1000 model deployments

Weekly model updates

Ongoing efforts

• Automating Feature Computation

• Data Shift Detection

• Automating Model Review/Update Process
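The slides do not say how data shift is detected. One common option, swapped in here as an assumption, is to compare each feature's current distribution with its distribution in the golden dataset using the two-sample Kolmogorov-Smirnov statistic, the largest gap between the two empirical CDFs. The 0.2 threshold is a hypothetical placeholder.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(reference: Sequence[float], current: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    ref, cur = sorted(reference), sorted(current)
    d = 0.0
    # The ECDFs are right-continuous step functions, so the largest gap
    # occurs at one of the observed values.
    for x in ref + cur:
        f_ref = bisect_right(ref, x) / len(ref)
        f_cur = bisect_right(cur, x) / len(cur)
        d = max(d, abs(f_ref - f_cur))
    return d


def has_shifted(
    reference: Sequence[float], current: Sequence[float], threshold: float = 0.2
) -> bool:
    """Flag a feature for review when the KS statistic exceeds the threshold."""
    return ks_statistic(reference, current) > threshold
```

If SciPy is available, `scipy.stats.ks_2samp` computes the same statistic along with a p-value.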

References

• https://www.ptsecurity.com/ww-en/analytics/cybersecurity-threatscape-2018-q3/
• https://www.checkpoint.com/downloads/product-related/report/2018-security-report.pdf
• https://www.varonis.com/blog/cybersecurity-statistics/

Questions?
