Big Data Technologies for InfoSec

Big Data Technologies for InfoSec Dive Deeper. See Further. Ram Sripracha ([email protected]) UCLA / Sift Security

Post on 06-Feb-2016

Page 1: Big  Data Technologies  for InfoSec

Big Data Technologies for InfoSec

Dive Deeper. See Further.

Ram Sripracha ([email protected])
UCLA / Sift Security

Page 2: Big  Data Technologies  for InfoSec

Experiences

RR Systems

Page 3: Big  Data Technologies  for InfoSec

What are “Big Data” systems?

• XXL in Size
• Data Volume
• TBs - PBs

• Computation Scalability
• Horizontally Scalable
• Multi-host Deployment
• Commodity Hardware

Page 4: Big  Data Technologies  for InfoSec

Why now?

• Rich Ecosystem
• Well-Supported Open Source Software

• High Adoption Rate
• Commercial Backing
• “Red Hat” Model

• Heavily Invested

Page 5: Big  Data Technologies  for InfoSec

Platform Providers

Page 6: Big  Data Technologies  for InfoSec

Technologies

Page 7: Big  Data Technologies  for InfoSec

Is it a “Big Data” problem?

• Many moving parts
• May be overwhelming initially

• 100s of configuration settings
• Requires some level of expertise
• Overkill for some problems
• Larger resource footprint

Page 8: Big  Data Technologies  for InfoSec

Big Data Stack

Page 9: Big  Data Technologies  for InfoSec

Big Data Stack

Page 10: Big  Data Technologies  for InfoSec

DFS

Page 11: Big  Data Technologies  for InfoSec

• NoSQL
• Columnar
• Sits on HDFS
• Million Rows x Million Columns
• Cell-level Security

Page 12: Big  Data Technologies  for InfoSec

Titan

• Graph-based Datastore
• Optimized for (E, V)
• Key/Value attributes for vertices and edges
• 100s of millions of vertices x 100s of billions of edges
• Capturing relationships
• Sits on top of HBase, Cassandra,
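
The property-graph model described on this slide (key/value attributes on both vertices and edges) can be illustrated with a minimal in-memory sketch. This is plain Python and not Titan's API; the class names, the `logged_into` edge label, and the sample users and hosts are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Vertex:
    vid: str
    props: Dict[str, Any] = field(default_factory=dict)  # key/value attributes


@dataclass
class Edge:
    src: str
    dst: str
    label: str
    props: Dict[str, Any] = field(default_factory=dict)  # key/value attributes


class PropertyGraph:
    """Toy in-memory property graph (illustration only, not Titan)."""

    def __init__(self) -> None:
        self.vertices: Dict[str, Vertex] = {}
        self.out_edges: Dict[str, List[Edge]] = {}

    def add_vertex(self, vid: str, **props: Any) -> Vertex:
        vertex = Vertex(vid, dict(props))
        self.vertices[vid] = vertex
        self.out_edges.setdefault(vid, [])
        return vertex

    def add_edge(self, src: str, dst: str, label: str, **props: Any) -> Edge:
        # The source vertex must have been added first.
        edge = Edge(src, dst, label, dict(props))
        self.out_edges[src].append(edge)
        return edge

    def neighbors(self, vid: str, label: str) -> List[str]:
        # Follow outgoing edges that carry the given label.
        return [e.dst for e in self.out_edges.get(vid, []) if e.label == label]


# Hypothetical data: which hosts a user has logged into.
graph = PropertyGraph()
graph.add_vertex("alice", kind="user")
graph.add_vertex("db01", kind="host")
graph.add_vertex("web02", kind="host")
graph.add_edge("alice", "db01", "logged_into", count=3)
graph.add_edge("alice", "web02", "logged_into", count=1)
```

Calling `graph.neighbors("alice", "logged_into")` walks the relationship directly, which is the kind of query a graph store is built to answer.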

Page 13: Big  Data Technologies  for InfoSec

Map-Reduce
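
As a rough illustration of the Map-Reduce model named on this slide, the sketch below runs the three phases (map, shuffle, reduce) in a single Python process to total bytes per source address. The record layout and the sample flows are assumptions made for the example; a real job would distribute each phase across hosts.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record layout: (src_ip, dst_ip, bytes)
Record = Tuple[str, str, int]


def map_phase(records: Iterable[Record]) -> Iterator[Tuple[str, int]]:
    # Emit one (key, value) pair per input record.
    for src, _dst, nbytes in records:
        yield src, nbytes


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group all values that share a key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    # Collapse each group of values to a single result.
    return {key: sum(values) for key, values in groups.items()}


flows: List[Record] = [
    ("10.0.0.1", "8.8.8.8", 500),
    ("10.0.0.2", "8.8.4.4", 120),
    ("10.0.0.1", "1.1.1.1", 300),
]
totals = reduce_phase(shuffle(map_phase(flows)))
```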

Page 14: Big  Data Technologies  for InfoSec

• Resilient Distributed Dataset (RDD)
• In-Memory RDD
• Iterative Algorithms
• Machine Learning

Page 15: Big  Data Technologies  for InfoSec
Page 16: Big  Data Technologies  for InfoSec

Impala

• Near-real-time analysis
• Micro-batch processing
• Pipelining of micro-batches
• Stream annotations
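
The micro-batch idea on this slide can be sketched in a few lines: a stream is cut into small batches, and each batch passes through a pipeline of stages (here an annotation stage followed by a count). This is a single-process illustration only. Batching by record count rather than by time window, the `internal` annotation, and the sample events are all assumptions.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List

Event = Dict[str, Any]


def micro_batches(stream: Iterable[Event], batch_size: int) -> Iterator[List[Event]]:
    # Cut the stream into consecutive batches of at most batch_size events.
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def annotate(batch: List[Event]) -> List[Event]:
    # Stage 1: add a hypothetical 'internal' flag to every event.
    return [dict(event, internal=event["src"].startswith("10.")) for event in batch]


def count_internal(batch: List[Event]) -> int:
    # Stage 2: a per-batch metric computed on the annotated events.
    return sum(1 for event in batch if event["internal"])


events: List[Event] = [
    {"src": "10.0.0.5"},
    {"src": "203.0.113.9"},
    {"src": "10.0.0.7"},
    {"src": "10.1.2.3"},
    {"src": "198.51.100.2"},
]
per_batch = [count_internal(annotate(batch)) for batch in micro_batches(events, 2)]
```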

Page 17: Big  Data Technologies  for InfoSec

• Sits on top of
• Distributed indexing and search
• Indexes
  • Raw text files from HDFS
  • HBase content
  • Titan properties
  • Other replicated data streams

Page 18: Big  Data Technologies  for InfoSec

Application Log Search

• Full Text Indexes
• Flexible Faceting
• Automatic field extraction
• Dashboard-able search interface
• Low-cost alternative to Splunk and other search solutions
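
To make the full-text index and faceting bullets concrete, here is a minimal sketch of an inverted index over log messages with a facet count over the search hits. It is not the API of any search product; the log fields (`host`, `level`, `msg`) and sample entries are hypothetical.

```python
import re
from collections import Counter, defaultdict
from typing import Any, Dict, List, Set

# Hypothetical application log entries.
logs: List[Dict[str, Any]] = [
    {"id": 1, "host": "web01", "level": "ERROR", "msg": "login failed for admin"},
    {"id": 2, "host": "web02", "level": "INFO", "msg": "login ok for alice"},
    {"id": 3, "host": "web01", "level": "ERROR", "msg": "disk quota exceeded"},
]


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: List[Dict[str, Any]]) -> Dict[str, Set[int]]:
    # Inverted index: term -> set of document ids containing it.
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc in docs:
        for term in tokenize(doc["msg"]):
            index[term].add(doc["id"])
    return index


def search(index: Dict[str, Set[int]], query: str) -> Set[int]:
    # AND semantics: a hit must contain every query term.
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


def facet(docs: List[Dict[str, Any]], ids: Set[int], field_name: str) -> Counter:
    # Count the values of one field across the matching documents.
    return Counter(doc[field_name] for doc in docs if doc["id"] in ids)


index = build_index(logs)
hits = search(index, "login")
by_level = facet(logs, hits, "level")
```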

Page 19: Big  Data Technologies  for InfoSec

Real-time Blacklist Alerting

• Fault tolerance
• Netflow annotation
• Match alerting
• Application access alerting
• Authentication alerting
• Network metrics
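
The netflow annotation and match alerting steps can be sketched as follows: each flow record is annotated with whether either endpoint falls inside a blacklisted network, and annotated hits become alerts. The blacklist entries, field names, and sample flows are hypothetical, and the sketch ignores the streaming and fault-tolerance aspects listed on the slide.

```python
import ipaddress
from typing import Any, Dict, List

# Hypothetical blacklist (documentation address ranges).
BLACKLIST = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]


def is_blacklisted(ip: str) -> bool:
    address = ipaddress.ip_address(ip)
    return any(address in network for network in BLACKLIST)


def annotate_flow(flow: Dict[str, Any]) -> Dict[str, Any]:
    # Return a copy of the flow with a 'blacklist_hit' annotation added.
    annotated = dict(flow)
    annotated["blacklist_hit"] = is_blacklisted(flow["src"]) or is_blacklisted(flow["dst"])
    return annotated


def alerts(flows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    # Keep only the annotated flows that matched the blacklist.
    return [flow for flow in map(annotate_flow, flows) if flow["blacklist_hit"]]


flows = [
    {"src": "10.0.0.5", "dst": "203.0.113.40", "bytes": 1200},
    {"src": "10.0.0.6", "dst": "192.0.2.10", "bytes": 300},
    {"src": "198.51.100.7", "dst": "10.0.0.9", "bytes": 80},
]
matched = alerts(flows)
```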

Page 20: Big  Data Technologies  for InfoSec

Netflow Data Warehouse

• 3x Nodes
• 2x 8-Core Intel E5-2450 per node
• 16 GB RAM per node
• 72 TB Storage Total
• ~5B Netflow records/day
• >1 year retention
• Supports complex SQL-like queries
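
The figures on this slide imply a tight per-record storage budget. The arithmetic below is a back-of-the-envelope sketch that assumes decimal terabytes, exactly 365 days of retention, and that all 72 TB is available for netflow data; the replication factor is a parameter because the slides do not state one.

```python
# Figures taken from the slide; the unit interpretation is an assumption.
TOTAL_STORAGE_BYTES = 72 * 10**12   # 72 TB, decimal
RECORDS_PER_DAY = 5 * 10**9         # ~5B netflow records/day
RETENTION_DAYS = 365                # ">1 year", taken as exactly one year


def bytes_per_record(replication: int = 1) -> float:
    """Average on-disk bytes available per record for a given replication factor."""
    total_records = RECORDS_PER_DAY * RETENTION_DAYS
    return TOTAL_STORAGE_BYTES / (total_records * replication)


unreplicated = bytes_per_record()
replicated_3x = bytes_per_record(replication=3)
```

Under these assumptions the budget is roughly 39 bytes per record without replication and about 13 bytes with three copies, which suggests a compact or compressed record format.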

Page 21: Big  Data Technologies  for InfoSec

Netflow Data Warehouse

• Continuous scanning
• Direct querying of delimited files
• Perform metrics and diffs
• Compute trending
• Firewall rule validations
• Long retention

DFS
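
Querying delimited files directly and computing metrics and diffs can be sketched with the standard library alone. The column names, the two in-memory "daily files", and the choice of bytes per destination port as the metric are all assumptions for illustration; in the deployment described, the files would sit on the DFS and be queried by a SQL-like engine.

```python
import csv
import io
from collections import Counter
from typing import Dict

# Hypothetical delimited netflow extracts for two consecutive days.
DAY1 = (
    "src,dst,dport,bytes\n"
    "10.0.0.5,192.0.2.10,443,1200\n"
    "10.0.0.6,192.0.2.10,22,300\n"
)
DAY2 = (
    "src,dst,dport,bytes\n"
    "10.0.0.5,192.0.2.10,443,2500\n"
    "10.0.0.7,192.0.2.11,3389,90\n"
)


def bytes_by_port(text: str) -> Counter:
    # Metric: total bytes per destination port, read straight from the file.
    totals: Counter = Counter()
    for row in csv.DictReader(io.StringIO(text)):
        totals[int(row["dport"])] += int(row["bytes"])
    return totals


def diff(before: Counter, after: Counter) -> Dict[int, int]:
    # Day-over-day change per port; a port missing on one day counts as zero.
    ports = set(before) | set(after)
    return {port: after.get(port, 0) - before.get(port, 0) for port in sorted(ports)}


delta = diff(bytes_by_port(DAY1), bytes_by_port(DAY2))
```

A port that appears only on the second day shows up as a positive delta from zero, which is one simple signal for the trending and firewall rule validation bullets above.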

Page 22: Big  Data Technologies  for InfoSec

EMR Access Anomalies

• Category of insider threat
• Relational networks of
  • Users/Groups
  • Department
  • Document Access

• Community structure-based anomaly detection
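
The slides do not describe the detection algorithm, so the following is only a simplified stand-in for community structure-based anomaly detection. It treats each user's department as a given community label (a real system would more likely derive communities from the access graph itself) and flags an access when the user's department accounts for a small share of all accesses to that document. The users, documents, and the 0.3 threshold are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical directory and access log.
DEPARTMENT: Dict[str, str] = {
    "ann": "cardiology",
    "bob": "cardiology",
    "cy": "cardiology",
    "dee": "billing",
}
ACCESSES: List[Tuple[str, str]] = [
    ("ann", "chart-17"),
    ("bob", "chart-17"),
    ("cy", "chart-17"),
    ("dee", "chart-17"),
    ("dee", "invoice-3"),
]


def anomalous_accesses(
    accesses: List[Tuple[str, str]],
    department: Dict[str, str],
    min_share: float = 0.3,
) -> List[Tuple[str, str]]:
    # Count, per document, how many accesses come from each department.
    by_doc: Dict[str, Counter] = defaultdict(Counter)
    for user, doc in accesses:
        by_doc[doc][department[user]] += 1

    # Flag accesses whose department is a small minority for that document.
    flagged = []
    for user, doc in accesses:
        counts = by_doc[doc]
        share = counts[department[user]] / sum(counts.values())
        if share < min_share:
            flagged.append((user, doc))
    return flagged


flagged = anomalous_accesses(ACCESSES, DEPARTMENT)
```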