ANOMALY DETECTION AT SCALE: A CYBERSECURITY STREAMING DATA PIPELINE USING KAFKA AND AKKA CLUSTERING
O'Reilly Security Conference NYC, November 2, 2016
Jeff Henrikson
Groovescale
http://www.groovescale.com
Why build predictive models?
Models continue to do useful work after humans stop looking
Models are based on assumptions
Only humans can make assumptions
INTRUSION DETECTION
1) Log data
2) Configure rules
3) Human awareness examines alarms and logs
4) Quick action taken (e.g. deauthorize)
5) Re-authorize once human awareness deems longer-term mitigation adequate
Sometimes, for high-confidence rules, we allow 2) to trigger 4) without human intervention
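A minimal sketch of steps 2) through 4), assuming rules are simple predicates over flow-like event records. The `Rule` type, the `triage` function, and the event fields `src`, `dst_port`, and `octets` are hypothetical. Every match still raises an alarm for human review; only rules marked high-confidence also call the deauthorize hook directly, which is the shortcut described above.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List, Tuple

# Hypothetical flow-like event, e.g. {"src": "10.0.0.5", "dst_port": 23, "octets": 100}
Event = Dict[str, Any]


@dataclass(frozen=True)
class Rule:
    name: str
    matches: Callable[[Event], bool]
    high_confidence: bool = False


def triage(events: Iterable[Event], rules: List[Rule],
           deauthorize: Callable[[str], None]) -> List[Tuple[str, Event]]:
    """Step 2: match rules. Every match becomes an alarm for human review (step 3).
    High-confidence matches also trigger the quick action (step 4) directly."""
    alarms: List[Tuple[str, Event]] = []
    for event in events:
        for rule in rules:
            if rule.matches(event):
                alarms.append((rule.name, event))
                if rule.high_confidence:
                    deauthorize(event["src"])
    return alarms
```

Re-authorization (step 5) deliberately stays outside the code: it happens only after a human has reviewed the alarm.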
HOW CAN A SKILLED PERSON'S AWARENESS BE MORE EFFECTIVELY GUIDED?
1) Matching of network behavior against localized rules
2) Predictive modeling of the aggregate network behavior
Hypothesis: Let's see if 2 is better.
AI: Artificial Intelligence
"IA": Intelligence Augmented
From "Building practical AI systems", Adam Cheyer (Siri, Sentient, and Viv Labs), Strata 2016
INTRUSION DETECTION TOOLS AS "INTELLIGENCE AUGMENTED"
Intruders are trying to evade detection.
Let's not worry about making the human protector of the network go away. That is probably not possible, given the intruders' evasive response.
NETFLOW (V5) BASICS
Attributes: Source/Destination IP, Source/Destination Port, Input interface
Metrics: Number of Packets, Sum of Bytes, Start Time, End Time
IPv4 only
https://nsrc.org/workshops/2015/sanog25-nmm-tutorial/materials/netflow.pdf
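The attributes and metrics above travel in a fixed-size binary record. The sketch below parses one v5 flow record with Python's `struct` module, assuming the 48-byte big-endian layout commonly published for NetFlow v5 (verify it against your exporter's documentation). The 24-byte packet header that precedes the records is not handled, and `FlowV5` and `parse_v5_record` are illustrative names.

```python
import socket
import struct
from dataclasses import dataclass

# Assumed v5 record layout: srcaddr, dstaddr, nexthop, input, output, dPkts, dOctets,
# First, Last, srcport, dstport, pad1, tcp_flags, prot, tos, src_as, dst_as,
# src_mask, dst_mask, pad2 (48 bytes, network byte order).
V5_RECORD = struct.Struct(">4s4s4sHHIIIIHHBBBBHHBBH")


@dataclass(frozen=True)
class FlowV5:
    src_ip: str
    dst_ip: str
    input_if: int
    packets: int
    octets: int
    first_ms: int  # router sysUptime (ms) at the first packet of the flow
    last_ms: int   # router sysUptime (ms) at the last packet of the flow
    src_port: int
    dst_port: int
    protocol: int


def parse_v5_record(buf: bytes) -> FlowV5:
    """Decode one 48-byte record, keeping only the fields listed on the slide."""
    (src, dst, _nexthop, input_if, _output_if, packets, octets, first, last,
     sport, dport, _pad1, _tcp_flags, proto, _tos, _src_as, _dst_as,
     _src_mask, _dst_mask, _pad2) = V5_RECORD.unpack(buf)
    return FlowV5(socket.inet_ntoa(src), socket.inet_ntoa(dst), input_if,
                  packets, octets, first, last, sport, dport, proto)
```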
Functional Requirements
Produce netflow from PCAP
Score netflow for anomalies
Control the number of anomalous events brought to the human expert's attention
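Producing netflow from PCAP amounts to grouping packets by a flow key and accumulating the metrics. A minimal sketch, assuming packets have already been decoded from the capture into tuples (pcap parsing is omitted), keyed on the 5-tuple, with a hypothetical 15-second idle timeout. Timestamps come from the capture, not the wall clock.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# Hypothetical decoded packet: (capture_ts_s, src_ip, dst_ip, src_port, dst_port, protocol, length_bytes)
Packet = Tuple[float, str, str, int, int, int, int]
# Flow key: the classic 5-tuple
FlowKey = Tuple[str, str, int, int, int]


@dataclass
class FlowStats:
    first: float  # capture time of the first packet in the flow
    last: float   # capture time of the most recent packet
    packets: int = 0
    octets: int = 0


def packets_to_flows(packets: Iterable[Packet],
                     idle_timeout: float = 15.0) -> List[Tuple[FlowKey, FlowStats]]:
    """Aggregate packets into flow records, closing a flow after idle_timeout seconds of silence."""
    active: Dict[FlowKey, FlowStats] = {}
    exported: List[Tuple[FlowKey, FlowStats]] = []
    for ts, src, dst, sport, dport, proto, length in packets:
        key = (src, dst, sport, dport, proto)
        flow = active.get(key)
        if flow is not None and ts - flow.last > idle_timeout:
            exported.append((key, flow))  # idle too long: export it and start a new flow
            flow = None
        if flow is None:
            flow = FlowStats(first=ts, last=ts)
            active[key] = flow
        flow.packets += 1
        flow.octets += length
        flow.last = max(flow.last, ts)
    exported.extend(active.items())  # flush flows still open at end of capture
    return exported
```

This sketch only notices an idle flow when another packet with the same key arrives; a long-running exporter would also sweep the active table periodically.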
Nonfunctional Requirements
Process line rate 10Gb/s
Be within 2x perf of tcpdump
Be within 4x of netflow latency
Do not add single points of failure
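To put the 10Gb/s requirement in perspective, a back-of-the-envelope calculation. It assumes standard Ethernet framing, which the slides do not state: 64-byte minimum and 1518-byte maximum frames, plus 20 bytes of preamble and inter-frame gap per frame on the wire.

```python
# Assumed Ethernet framing overhead per frame: 8-byte preamble/SFD + 12-byte inter-frame gap
WIRE_OVERHEAD_BYTES = 20
MIN_FRAME_BYTES = 64
MAX_FRAME_BYTES = 1518
LINE_RATE_BITS_PER_SEC = 10_000_000_000


def bytes_per_second(bits_per_sec: int) -> float:
    return bits_per_sec / 8


def packets_per_second(bits_per_sec: int, frame_bytes: int) -> int:
    """Whole frames per second when every frame has the given size."""
    return bits_per_sec // ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)


if __name__ == "__main__":
    rate = LINE_RATE_BITS_PER_SEC
    print(f"{bytes_per_second(rate) / 1e9:.2f} GB/s")
    print(packets_per_second(rate, MIN_FRAME_BYTES), "packets/s with minimum-size frames")
    print(packets_per_second(rate, MAX_FRAME_BYTES), "packets/s with full-size frames")
    print(f"{bytes_per_second(rate) * 3600 / 1e12:.1f} TB of capture per hour")
```

Under these assumptions, full line rate is 1.25 GB/s, about 14.9 million packets per second in the worst case, and roughly 4.5 TB of capture per hour, which bounds how long pcap temporary files can sit on local disk.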
EXTERNAL DESIGN
System coupling:
Do not prescribe deploying Kafka upstream or downstream (Which Kafka version? Which language binding?)
External APIs:
Ingress: HTTP POST, octet encoding
Egress: HTTP GET, long polling
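Long polling means an egress GET blocks until there is something new to return or a timeout expires, so clients get low latency without busy polling. A minimal in-process sketch of those semantics, assuming offset-based resumption; `LongPollBuffer` is hypothetical and the HTTP layer is omitted (a request handler would call `poll`).

```python
import threading
from typing import List, Tuple


class LongPollBuffer:
    """Hypothetical egress buffer behind an HTTP GET long-poll endpoint."""

    def __init__(self) -> None:
        self._records: List[bytes] = []
        self._cond = threading.Condition()

    def append(self, record: bytes) -> int:
        """Add a record and wake any waiting pollers; returns the next offset."""
        with self._cond:
            self._records.append(record)
            self._cond.notify_all()
            return len(self._records)

    def poll(self, offset: int, timeout: float) -> Tuple[List[bytes], int]:
        """Block until records exist at or after `offset`, or until `timeout`
        seconds pass. Returns (records, next_offset); records is empty on timeout."""
        with self._cond:
            self._cond.wait_for(lambda: len(self._records) > offset, timeout)
            batch = self._records[offset:]
            return batch, offset + len(batch)
```

The client passes back the returned `next_offset` on its next GET, so a reconnect neither skips nor repeats records.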
INTERNAL DESIGN
Record state only in:
Kafka
Pcap temporary files on local fs
Need to write block id to EFH and dedupe, for sums to be correct in the presence of retries
Prefer late delivery to dropping data
Prefer reading capture time in data stream to wall clock time
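A minimal sketch of the first and last points, assuming each message carries a block id, a capture timestamp, and a byte count (the message shape is hypothetical). Redelivered blocks are skipped so retries do not double-count, and sums are bucketed by capture time so a late delivery still lands in the minute it was captured.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical message shape: (block_id, capture_time_s, byte_count)
Message = Tuple[str, float, int]


def bytes_per_capture_minute(messages: Iterable[Message]) -> Dict[int, int]:
    """Sum bytes per capture-time minute, counting each block_id at most once."""
    seen: Set[str] = set()  # held in memory only for illustration
    totals: Dict[int, int] = defaultdict(int)
    for block_id, capture_time, nbytes in messages:
        if block_id in seen:
            continue  # a retry redelivered this block; counting it again would inflate sums
        seen.add(block_id)
        minute = int(capture_time // 60)  # event time from the capture, not the wall clock
        totals[minute] += nbytes
    return dict(totals)
```

The in-memory `seen` set grows without bound; under the "record state only in Kafka" rule it would have to be rebuilt from Kafka and expired per window, which this sketch does not attempt.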
Akka-cluster in one slide:
Framework for actor-based concurrency
Program in Scala or Java
Akka-cluster is more general than map-reduce or data pipelines
Makes using local and remote resources work the same
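Akka itself is a Scala/Java library, so the following is not its API. It is a toy Python illustration of the core actor idea: each actor owns private state and a mailbox and handles one message at a time, so callers communicate by sending messages (in the spirit of Akka's `tell`) instead of sharing locks. Supervision, location transparency, and clustering are out of scope, and the `Actor` class is hypothetical.

```python
import queue
import threading
from typing import Any, Callable


class Actor:
    """Toy actor: one thread drains a private mailbox, so the handler never
    runs concurrently with itself. Not Akka's API."""

    _STOP = object()  # sentinel marking the end of the mailbox

    def __init__(self, handler: Callable[[Any], None]) -> None:
        self._mailbox: "queue.Queue[Any]" = queue.Queue()
        self._handler = handler
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, message: Any) -> None:
        """Fire-and-forget send: enqueue and return immediately."""
        self._mailbox.put(message)

    def stop(self) -> None:
        """Enqueue the stop sentinel and wait until earlier messages are processed."""
        self._mailbox.put(Actor._STOP)
        self._thread.join()

    def _run(self) -> None:
        while True:
            message = self._mailbox.get()
            if message is Actor._STOP:
                return
            self._handler(message)  # unlike Akka, no supervisor restarts us on error
```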
MINIMUM VIABLE PREDICTIVE MODEL
1) Take Netflow metrics: sum(bytes), sum(packets), count
2) For each metric, compute mean and variance
3) Emit an "anomaly" when signal exceeds (mean + 3.0*sqrt(variance))
Meets minimum requirement: controls the number of events brought to the human expert's attention
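The three steps above can be sketched with an online (Welford) mean and variance per metric, so no history needs to be stored. This is a sketch under assumptions: the 30-sample warm-up and the choice to score each value before folding it into the statistics are hypothetical, not taken from the talk. Run one `RunningStats` per metric: sum(bytes), sum(packets), and count.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class RunningStats:
    """Welford's online mean/variance for one metric (e.g. sum(bytes))."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # population variance; 0.0 until there are at least two samples
        return self.m2 / self.n if self.n > 1 else 0.0


def is_anomaly(stats: RunningStats, x: float, k: float = 3.0, min_samples: int = 30) -> bool:
    """True when x exceeds mean + k * sqrt(variance) of the history so far."""
    if stats.n < min_samples:  # hypothetical warm-up: too little history to judge
        return False
    return x > stats.mean + k * math.sqrt(stats.variance)


def score_stream(values: Iterable[float], k: float = 3.0, min_samples: int = 30) -> List[int]:
    """Return indices of anomalous values; each value is scored before it is learned."""
    stats = RunningStats()
    flagged: List[int] = []
    for i, x in enumerate(values):
        if is_anomaly(stats, x, k, min_samples):
            flagged.append(i)
        stats.update(x)
    return flagged
```

Raising `k` is the knob that controls how many events reach the human expert. Feeding anomalies back into the statistics, as this sketch does, slowly raises the threshold after a burst; excluding them is the other obvious choice.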
EXERCISE FOR THE READER
Model for periodicity:
Ihler et al., "Adaptive Event Detection with Time-Varying Poisson Processes", ACM SIGKDD 2006
http://www.datalab.uci.edu/papers/event_detection_kdd06.pdf
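One possible starting point for the exercise, sketched under loud assumptions: instead of one global mean, keep a separate Poisson rate per hour-of-week bucket and flag counts whose upper-tail probability under that bucket's rate is very small. This is a much simpler stand-in, not the model from the paper. `PeriodicRateModel`, the pseudo-count prior, and the 1e-3 threshold are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict

SECONDS_PER_HOUR = 3600
HOURS_PER_WEEK = 168


def hour_of_week(epoch_seconds: float) -> int:
    """Bucket 0..167 from UTC epoch seconds (bucket 0 starts Thursday 00:00, the epoch's weekday)."""
    return int(epoch_seconds // SECONDS_PER_HOUR) % HOURS_PER_WEEK


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - CDF(k - 1) with log-space terms."""
    if k <= 0:
        return 1.0
    log_lam = math.log(lam)
    cdf = sum(math.exp(-lam + i * log_lam - math.lgamma(i + 1)) for i in range(k))
    return max(0.0, 1.0 - cdf)


class PeriodicRateModel:
    """Per-bucket Poisson rate estimate with a small pseudo-count prior (keeps lam > 0)."""

    def __init__(self, prior_count: float = 1.0, prior_periods: float = 1.0) -> None:
        self._counts: Dict[int, float] = defaultdict(float)
        self._periods: Dict[int, float] = defaultdict(float)
        self._prior_count = prior_count
        self._prior_periods = prior_periods

    def rate(self, bucket: int) -> float:
        return (self._counts[bucket] + self._prior_count) / (self._periods[bucket] + self._prior_periods)

    def observe(self, bucket: int, count: int) -> None:
        """Record one period's event count for this bucket."""
        self._counts[bucket] += count
        self._periods[bucket] += 1.0

    def is_anomaly(self, bucket: int, count: int, p_threshold: float = 1e-3) -> bool:
        return poisson_upper_tail(count, self.rate(bucket)) < p_threshold
```

The payoff of periodicity: the same count can be routine at 9 a.m. on a weekday and anomalous at 3 a.m.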
RESULTS
Qualitatively, users can find relevant anomalies in a reasonably sized stream
System operates reliably
Numbers are correct within assumptions
SO WHY KAFKA VS ANY OTHER STREAMING COMPONENT?
https://databaseline.wordpress.com/2016/03/12/an-overview-of-apache-streaming-technologies/comment-page-1/
STREAMING DATA LITERATURE:
"A data entity is created by one module, is passed from module to module until it is no longer needed and is then destroyed. . . . Punched card accounting systems exemplify this environment."
J. P. Morrison, "Data Stream Linkage Mechanism", IBM Systems Journal, 1978.
http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=45DED06EC91474F5938A9E05CC3D5A61?doi=10.1.1.89.2601&rep=rep1&type=pdf
BIND ARCHITECTURAL COUPLINGS EARLY SO THAT ARCHITECTURAL COMPONENTS CAN BE CHOSEN WITH AMPLE EVIDENCE
Examples of components:
which database
which streaming engine
Examples of couplings:
format of data (e.g. newline-delimited JSON)
how to notify
how to checkpoint
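A data-format coupling such as newline-delimited JSON is small enough to pin down early and stays valid whichever database or streaming engine is chosen later. A minimal codec sketch; the function names and the example record fields are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def encode_ndjson(records: Iterable[Dict[str, Any]]) -> bytes:
    """One compact JSON object per line. json.dumps escapes newlines inside
    strings, so a raw newline byte can only be a record delimiter."""
    lines = (json.dumps(r, separators=(",", ":")) + "\n" for r in records)
    return "".join(lines).encode("utf-8")


def decode_ndjson(payload: bytes) -> Iterator[Dict[str, Any]]:
    """Yield one record per non-blank line, splitting on newline characters only."""
    for line in payload.decode("utf-8").split("\n"):
        if line.strip():
            yield json.loads(line)
```

Splitting on the newline character alone, rather than with `str.splitlines`, avoids breaking records that contain other Unicode line separators emitted by a producer that does not escape them.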
HTTP COUPLING: WINS
Win #1: Can't get access to pcap over API
Win #2: Only RHEL-distributed reqs (perl-core, curl) required for ingress
Win #3: Upgrade Kafka when improved
HTTP COUPLING: WIN #3: UPGRADE WHEN READY
Kafka version | 0.9.0 | 0.10.0.1 | 0.10.1.0
Partition by hash | x | x | x
Write timestamp to message | | x | x
Read seek by timestamp | | | x
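Seek by timestamp lets a consumer restart from a point in capture time instead of a remembered offset. Kafka's Java consumer exposes this as `KafkaConsumer.offsetsForTimes`, which looks up the earliest offset whose timestamp is greater than or equal to a target. The sketch below models only that lookup rule over a hypothetical in-memory partition; it does not use any Kafka client.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical in-memory partition: (offset, timestamp_ms, value), in offset order
Record = Tuple[int, int, bytes]


def offset_for_time(partition: Sequence[Record], target_ms: int) -> Optional[int]:
    """Earliest offset whose timestamp is >= target_ms, or None if there is none.
    Producer timestamps need not be monotonic, so scan instead of binary-searching."""
    for offset, ts, _value in partition:
        if ts >= target_ms:
            return offset
    return None


def replay_from_time(partition: Sequence[Record], target_ms: int) -> List[bytes]:
    """Values from the seek point onward, as a consumer would re-read them."""
    start = offset_for_time(partition, target_ms)
    if start is None:
        return []
    return [value for offset, _ts, value in partition if offset >= start]
```

Note that replay starts at the first qualifying offset, so a later record with an older timestamp (offset 2 in the test data) is still replayed.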
LEARNING #1
https://github.com/akka/reactive-kafka
Using this library in place of KafkaConsumer
FAVOR INTEGRATION TESTING OVER UNIT TESTING
Ingress and egress have an optional flag placebo={true,false}, defaulting to true.
Every deployment simulates low-volume placebo sources and sinks.
Transmit heartbeats when each component is sure to have made forward progress.
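A minimal sketch of the heartbeat rule, assuming "forward progress" means a component's committed offset has advanced since its last heartbeat. `ProgressHeartbeat`, `placebo_source`, and the heartbeat fields are hypothetical; the `emit` callback stands in for whatever channel carries heartbeats.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class ProgressHeartbeat:
    """Emits a heartbeat only when the committed offset has moved forward."""
    name: str
    emit: Callable[[Dict[str, Any]], None]
    last_reported: int = -1

    def on_committed(self, offset: int, now: Optional[float] = None) -> bool:
        if offset <= self.last_reported:
            return False  # no new forward progress, so stay silent
        self.last_reported = offset
        self.emit({
            "component": self.name,
            "committed_offset": offset,
            "time": time.time() if now is None else now,
        })
        return True


def placebo_source(n: int) -> List[bytes]:
    """Hypothetical low-volume synthetic records, tagged so sinks can recognize them."""
    return [b"placebo-%d" % i for i in range(n)]
```

Because placebo traffic always flows, a monitor that stops seeing heartbeats can tell a stuck component from a quiet network.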
ON EVALUATING FAULT TOLERANCE AND SCALABILITY
My smart buddy
LinkedIn runs it in production
The NSA
Can we do better?
ON EVALUATING FAULT TOLERANCE AND SCALABILITY:
The idea:
Create linked containers for the app
Use tc (netem) to make the kernel drop and/or delay packets
Run a simulated data source
ON EVALUATING FAULT TOLERANCE AND SCALABILITY:
Hands on, create the container:
docker run -it --rm ubuntu:14.04.2 bash
Hands on with the container:
root@07e330775e98:/# apt-get update && apt-get install -y ethtool
root@07e330775e98:/# ethtool -S eth0
NIC statistics:
     peer_ifindex: 875
Hands on with the host (docker-machine's boot2docker has tc built-in):
dev=$(ip link | grep '^875:' | awk -F': ' '{print $2}' | cut -d@ -f1)  # host-side veth name
tc qdisc replace dev $dev root netem delay 100ms 20ms distribution normal
tc qdisc replace dev eth0 root netem loss 0.1%
Myth: Code should always go into docker containers through an image
Alternative:
docker run -v $dirSrc:$dirSrc  # to convey source code
docker exec  # to restart program
Myth: A docker image is something that came from a Dockerfile
Alternative: run ansible-playbook -c local inside a container started with docker run, then save the result with docker commit
RECOMMENDED READING
I Heart Logs, Jay Kreps (creator of Kafka)
https://www.amazon.com/Heart-Logs-Stream-Processing-Integration/dp/1491909382
Akka in Action, Roestenburg et al. (released Sept 30, 2016)
https://www.amazon.com/Akka-Action-Raymond-Roestenburg/dp/1617291013
Scala for the Impatient, 1e, Cay Horstmann (second edition coming December 2016)
https://www.amazon.com/Scala-Impatient-Cay-S-Horstmann/dp/0321774094
READINGS ON LOW-LATENCY DATA ENGINEERING (ORGANIZED BY COMMUNITY)
Community | Title | URL
Reactive | The Reactive Manifesto | http://www.reactivemanifesto.org/
Reactive | Reactive Streams | http://www.reactive-streams.org/
Kafka | I Heart Logs, Jay Kreps, 2014 | https://www.amazon.com/Heart-Logs-Stream-Processing-Integration/dp/1491909382
Kafka | Kafka: The Definitive Guide, prerelease/2017 | https://www.amazon.com/Kafka-Definitive-Real-time-stream-processing/dp/1491936169
NiFi | The core concepts of NiFi | http://nifi.apache.org/docs/nifi-docs/html/overview.html#the-core-concepts-of-nifi
Flow-Based Programming | Flow-Based Programming, J. Paul Morrison, 2010 | https://www.amazon.com/Flow-Based-Programming-2nd-Application-Development/dp/1451542321
Storm | Big Data, Nathan Marz, 2015 | https://www.amazon.com/Big-Data-Principles-practices-scalable/dp/1617290343