DC Spark Bake-off: Real-time TCP Packet Analysis Using Spark and Azure Event Hubs


Posted on 28-Jul-2015


Washington DC Area Apache Spark Interactive

Spark Bake-off

Team Name: Silvio Fiorito Solution Title: Real-time Packet Analysis using Spark


Team Introductions

Silvio Fiorito
– Background in development and app security
– Started working with Hadoop in 2012
– Started using Spark at v0.6 in early 2013
– Built a few prototypes for low-latency query services with Spark/Shark and then SparkSQL
– Twitter: @granturing


Solution Overview

Real-time TCP packet analysis of geographically distributed hosts
– Must support high throughput from many hosts
– 3 demo VMs (2 x Azure & 1 x AWS)

Local Flume agent pushes events to Azure Event Hub

Events are partitioned and persisted up to 7 days

Spark Streaming app ingests streams
– Reconstruct packets
– Lookups for geo-ip and port description
– Clusters using pre-trained k-means model
– Saves data to Azure Table Storage and publishes on Service Bus Topic
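
The per-packet enrichment and clustering steps listed above can be sketched in plain Python, without Spark, to show the idea. This is a minimal illustration under stated assumptions, not the code from the talk: the packet field names, the port table, the two feature dimensions, and the centroid values are all hypothetical. The geo-ip lookup is left out because it depends on an external database.

```python
import math

# Hypothetical port descriptions; the talk does not show the real lookup table.
PORT_DESCRIPTIONS = {22: "ssh", 53: "dns", 80: "http", 443: "https"}

# Hypothetical "pre-trained" centroids over (packet_length, dst_port) features.
# A real model would be trained offline and loaded by the streaming app.
CENTROIDS = [
    (60.0, 53.0),     # e.g. small DNS-like packets
    (1400.0, 443.0),  # e.g. large HTTPS-like packets
]


def describe_port(port):
    """Return a human-readable description for a port, or 'unknown'."""
    return PORT_DESCRIPTIONS.get(port, "unknown")


def nearest_cluster(features, centroids=CENTROIDS):
    """Return the index of the centroid closest to features (Euclidean distance)."""
    distances = [math.dist(features, c) for c in centroids]
    return distances.index(min(distances))


def enrich(packet):
    """Attach a port description and a cluster id to one reconstructed packet."""
    features = (float(packet["length"]), float(packet["dst_port"]))
    return {
        **packet,
        "port_description": describe_port(packet["dst_port"]),
        "cluster": nearest_cluster(features),
    }


# Hypothetical reconstructed packet.
sample = {"src_ip": "10.0.0.5", "dst_port": 443, "length": 1200}
enriched = enrich(sample)
```

In a streaming job, a function like `enrich` would be applied to every record of each micro-batch before the results are written to storage and published.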


Solution Overview


Sample Dashboard with Power BI


Final Comments & Questions

With more time
– Add true anomaly detection with MLlib
– Test on hosts with real traffic
– Wire up end-to-end with d3.js viz and SparkSQL backend
– Integrate with existing IDS/IPS rules
– Bad IPs lookup
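
Two of the follow-up items, anomaly detection and a bad-IP lookup, could look roughly like the sketch below. The talk does not say which anomaly-detection method was planned. This example assumes one common approach: flag any packet whose distance to its nearest k-means centroid exceeds a threshold. The centroids, threshold, blocklist, and field names are all hypothetical.

```python
import math

# Hypothetical centroids over (packet_length, dst_port) features.
CENTROIDS = [(60.0, 53.0), (1400.0, 443.0)]

# Hypothetical distance threshold; in practice it would be tuned on real traffic.
THRESHOLD = 500.0

# Hypothetical blocklist (address taken from a documentation-only range).
BAD_IPS = {"203.0.113.7"}


def distance_to_nearest(features, centroids=CENTROIDS):
    """Euclidean distance from features to the closest centroid."""
    return min(math.dist(features, c) for c in centroids)


def is_suspicious(packet):
    """Flag a packet that is far from every cluster or comes from a listed IP."""
    features = (float(packet["length"]), float(packet["dst_port"]))
    too_far = distance_to_nearest(features) > THRESHOLD
    return too_far or packet["src_ip"] in BAD_IPS


normal = {"src_ip": "10.0.0.5", "dst_port": 443, "length": 1200}
outlier = {"src_ip": "10.0.0.6", "dst_port": 31337, "length": 9000}
listed = {"src_ip": "203.0.113.7", "dst_port": 53, "length": 60}
flags = [is_suspicious(p) for p in (normal, outlier, listed)]
```

A distance threshold is only one option. It reuses the existing clustering model, which is why it is shown here.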