Big Data Visualization

Raffael Marty, CEO: Big Data Visualization, London, February 2015


Posted on 14 July 2015


TRANSCRIPT

Page 1: Big Data Visualization

Raffael Marty, CEO

Big Data Visualization

London, February 2015

Page 2: Big Data Visualization

Security. Analytics. Insight.

Overview

• Visualization

• Design Principles

• Dashboards

• SOC Dashboard

• Data Discovery and Exploration

• Data Requirements for Visualization

• Big Data Lake

Page 3: Big Data Visualization


I am Raffy - I do Viz!

IBM Research

Page 4: Big Data Visualization


Visualization

Page 5: Big Data Visualization


Why Visualization? The stats ...

http://en.wikipedia.org/wiki/Anscombe%27s_quartet

the data...

Page 6: Big Data Visualization


Why Visualization?

http://en.wikipedia.org/wiki/Anscombe%27s_quartet

Human analyst:

• pattern detection

• remembers context

• fantastic intuition

• can predict
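Anscombe's quartet makes the point numerically: four small datasets share nearly identical summary statistics yet look completely different when plotted. A short Python sketch (values from Anscombe's 1973 paper, as listed on the Wikipedia page cited above; the `summary` helper is illustrative, not from the talk) computes what a numbers-only report would show:

```python
from statistics import mean, variance

# Anscombe's quartet (F. J. Anscombe, 1973). Sets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summary(xs, ys):
    """Summary statistics a 'stats only' view would report for one x/y set."""
    mx, my = mean(xs), mean(ys)
    vx, vy = variance(xs), variance(ys)  # sample variances
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    slope = cov / vx  # least-squares regression slope
    return {
        "mean_x": mx, "var_x": vx, "mean_y": my, "var_y": vy,
        "corr": cov / (vx * vy) ** 0.5,  # Pearson correlation
        "slope": slope, "intercept": my - slope * mx,
    }


if __name__ == "__main__":
    for name, (xs, ys) in QUARTET.items():
        print(name, {k: round(v, 3) for k, v in summary(xs, ys).items()})
```

Every set reports a mean of x of 9, a mean of y of about 7.50, a correlation of about 0.816, and a fitted line of roughly y = 3.00 + 0.500x; only a plot reveals the curve in set II, the outlier in set III, or the single high-leverage point in set IV.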

Page 7: Big Data Visualization


Visualization To …

• Present / Communicate

• Discover / Explore

Page 8: Big Data Visualization

Design Principles

Page 9: Big Data Visualization


Choosing Visualizations

• Objective

• Audience

• Data

Page 10: Big Data Visualization


For Example: Lateral Movement

• Objective: find attackers moving laterally in the network

  • Defines the data needed (NetFlow, sFlow, …)

  • Maybe restrict to a network segment

• Audience: security analyst, risk team, …

  • Informs how to visualize / present the data

(Kill chain: Recon → Weaponize → Deliver → Exploit → Install → C2 → Act)
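The lateral-movement objective starts from flow data. As a minimal sketch, assuming flow records have already been parsed into hypothetical `(src_ip, dst_ip, dst_port)` tuples, one simple heuristic (fan-out counting; an illustration, not a method from the talk) flags internal hosts that talk to unusually many other internal hosts. The `threshold` value is an arbitrary assumption:

```python
from collections import defaultdict
from ipaddress import ip_address


def lateral_fanout(flows, threshold=3):
    """Map each internal source to its count of distinct internal destinations,
    keeping only sources above `threshold` (a hypothetical tuning value)."""
    peers = defaultdict(set)
    for src, dst, _dst_port in flows:  # flows: iterable of (src_ip, dst_ip, dst_port)
        # Only internal-to-internal traffic (RFC 1918 etc.) counts as lateral.
        if ip_address(src).is_private and ip_address(dst).is_private:
            peers[src].add(dst)
    return {src: len(dsts) for src, dsts in peers.items() if len(dsts) > threshold}
```

Hosts such as patch or backup servers legitimately contact many machines, so in practice the threshold would depend on the host's role.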

Page 11: Big Data Visualization


Principles of Analytic Design (by Edward Tufte)

• Show comparisons, contrasts, differences.

• Show causality, mechanism, explanation, systematic structure.

• Show multivariate data; that is, show more than 1 or 2 variables.

Page 12: Big Data Visualization


Show Context

42

Page 13: Big Data Visualization


Show Context

42 is just a number and means nothing without context.

Page 14: Big Data Visualization
Page 15: Big Data Visualization


Use Numbers To Highlight Most Important Parts of Data

• Numbers

• Summaries

Page 16: Big Data Visualization


Add Context

Additional information about objects, such as:

• machine: roles, criticality, location, owner, …

• user: roles, office location, …

(Figure: source and destination nodes annotated with machine and user context, such as machine role and user role.)
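Adding machine and user context amounts to a lookup against inventory data. A minimal sketch, assuming hypothetical `assets` (keyed by IP, e.g. exported from an asset inventory) and `users` (keyed by user name, e.g. from AD / LDAP) dictionaries; field names such as `role` and `criticality` are illustrative:

```python
from typing import Dict

Context = Dict[str, Dict[str, str]]


def enrich(event: Dict[str, str], assets: Context, users: Context) -> Dict[str, str]:
    """Return a copy of `event` with machine and user context attached."""
    out = dict(event)
    for side in ("src", "dst"):
        asset = assets.get(event[side], {})  # unknown hosts get an empty context
        out[f"{side}_role"] = asset.get("role", "unknown")
        out[f"{side}_criticality"] = asset.get("criticality", "unknown")
    out["user_role"] = users.get(event.get("user", ""), {}).get("role", "unknown")
    return out
```

A visualization can then color or size nodes by `dst_criticality` or group them by role instead of showing bare IP addresses.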

Page 17: Big Data Visualization


Traffic Flow Analysis With Context

Page 18: Big Data Visualization


Aesthetics Matter

http://www.scifiinterfaces.com/

• Black background

• Blue or green colors

• Glow

Page 19: Big Data Visualization


B O R I N G

Page 20: Big Data Visualization


Sexier

Page 21: Big Data Visualization


Dashboard Design Principles

• Audience, audience, audience!

• Comprehensive information (enough context)

• Highlight important data

• Use graphics when appropriate

• Good choice of graphics and design

• Aesthetically pleasing

• Enough information to decide if action is necessary

• No scrolling

• Real-time vs. batch? (refresh rates)

• Clear organization

Page 22: Big Data Visualization


SOC Dashboards

Page 23: Big Data Visualization


Mostly Blank

Page 24: Big Data Visualization


Dashboards For Discovery

• Disappears too quickly

• Analysts' focus is on their own screens

• SOC dashboard just distracts

• Detailed information not legible

• Put the detailed dashboards on the analysts' screens!

Page 25: Big Data Visualization


Use the SOC Dashboard For Context

• Provide the analyst with context

  • "What else is going on in the environment right now?"

• Bring into focus

  • Turn something benign into something interesting

• Disprove

  • Turn something interesting into something benign

Environment informs detection policies

Page 26: Big Data Visualization


Show Comparisons

(Figure: the current measure plotted against the same measure for the week prior.)
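Comparing a current measure with the week prior is a simple calculation once daily counts exist. A minimal sketch, assuming a hypothetical mapping of calendar day to event count; comparing against the same weekday one week earlier avoids mixing weekday and weekend volumes:

```python
from datetime import date, timedelta
from typing import Dict, Optional


def week_over_week(counts: Dict[date, int]) -> Dict[date, Optional[float]]:
    """Percent change of each day's count versus the same weekday one week earlier.

    Days without a prior-week value, or with a prior value of 0, map to None.
    """
    result: Dict[date, Optional[float]] = {}
    for day, current in counts.items():
        prior = counts.get(day - timedelta(days=7))
        if prior:  # skips both a missing (None) and a zero baseline
            result[day] = 100.0 * (current - prior) / prior
        else:
            result[day] = None
    return result
```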

Page 27: Big Data Visualization


What To Put on Screens

Provide context to individual security alerts

• News feed summary (FS-ISAC feeds, mailing lists, threat feeds)

• Monitoring Twitter or IRC for certain activity / keywords

• Volumes or metrics (e.g., # of firewall blocks, # of IDS alerts, # of failed transactions)

• Top N metrics:

  • Top 10 suspicious users

  • Top 10 servers connecting outbound

http://raffy.ch/blog/2015/01/15/dashboards-in-the-security-opartions-center-soc/
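A "Top N" panel is an aggregation plus a sort. A minimal sketch of "Top 10 servers connecting outbound", assuming hypothetical `(src_ip, dst_ip)` flow pairs and treating every private-address source as internal (a real deployment would restrict sources to known servers using asset context):

```python
from collections import Counter
from ipaddress import ip_address


def top_outbound(flows, n=10):
    """Rank internal hosts by number of connections to external addresses."""
    counts = Counter(
        src for src, dst in flows
        if ip_address(src).is_private and not ip_address(dst).is_private
    )
    return counts.most_common(n)  # list of (host, count), highest first
```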

Page 28: Big Data Visualization


Data Discovery & Exploration

Page 29: Big Data Visualization


Visualize Me Lots (>1TB) of Data

Page 30: Big Data Visualization


Information Visualization Mantra

Overview → Zoom / Filter → Details on Demand

Principle by Ben Shneiderman

• summary / aggregation

• data mining

• signal detection (IDS, behavioral, etc.)
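The mantra maps onto three query steps over an event store. A minimal in-memory sketch, assuming hypothetical event dicts with a `ts` datetime and a `src` field; with more than a terabyte of data these steps would be pushed down to the data store rather than run in Python:

```python
from collections import Counter
from datetime import datetime
from typing import Dict, List, Optional

Event = Dict[str, object]  # hypothetical shape: {"ts": datetime, "src": str, ...}


def overview(events: List[Event]) -> Counter:
    """Overview: aggregate events into per-hour counts instead of showing every record."""
    return Counter(e["ts"].replace(minute=0, second=0, microsecond=0) for e in events)


def zoom(events: List[Event], start: datetime, end: datetime,
         src: Optional[str] = None) -> List[Event]:
    """Zoom / filter: keep events with start <= ts < end, optionally for one source."""
    return [e for e in events
            if start <= e["ts"] < end and (src is None or e["src"] == src)]


def details(events: List[Event], limit: int = 20) -> List[Event]:
    """Details on demand: return the raw records, capped for display."""
    return events[:limit]
```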

Page 31: Big Data Visualization


Visualization Challenges

• Access to data

• Parsed data and data context

• Data architecture for central data access and fast queries

• Application of data mining (how?, what?, scalable, …)

• Visualization tools that support:

  • Complex visual types (parallel coordinates, treemaps, heat maps, link graphs)

  • Linked views

  • Data mining (clustering, …)

  • Collaboration, information sharing

  • Visual analytics workflow
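Heat maps are among the complex visual types listed. The data preparation behind a common security heat map, event counts by weekday and hour, can be sketched as follows, assuming only a hypothetical iterable of event timestamps; the rendering itself is left to whatever charting tool is in use:

```python
from datetime import datetime
from typing import Iterable, List


def weekday_hour_matrix(timestamps: Iterable[datetime]) -> List[List[int]]:
    """Count events per cell: 7 rows (Monday..Sunday) x 24 columns (hour of day)."""
    grid = [[0] * 24 for _ in range(7)]
    for ts in timestamps:
        grid[ts.weekday()][ts.hour] += 1  # weekday(): Monday == 0
    return grid
```

Activity at 3 a.m. on a Sunday stands out immediately in such a grid, while it disappears in a daily total.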

Page 32: Big Data Visualization

Big Data Lake

Page 33: Big Data Visualization


The Big Data Lake

• One central location to store all cyber security data

• "Data collected only once and third party software leveraging it"

• Scalability and interoperability

• More than deploying an off-the-shelf product from a vendor

• Data use influences both data formats and technologies to store the data:

  • search, analytics, relationships, and distributed processing

  • correlation and statistical summarization

• What to do with context? Enrich or join?

• Hard problems:

  • Parsing: can you re-parse? Common naming scheme!

  • Data store capabilities (search, analytics, distributed processing, etc.)

  • Access to data: SQL (even in a Hadoop context); how can products access the data?
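The "enrich or join?" question can be illustrated with the join-at-query-time option. A minimal sketch using Python's built-in `sqlite3` purely as a stand-in for a data-lake SQL engine such as Impala or SparkSQL; the `events` and `assets` tables are hypothetical:

```python
import sqlite3

# Hypothetical schema: raw events and a separate asset-context table.
SCHEMA = """
CREATE TABLE events (src TEXT, dst TEXT);
CREATE TABLE assets (ip TEXT PRIMARY KEY, role TEXT, criticality TEXT);
"""


def events_with_context(conn: sqlite3.Connection):
    """Join context at query time; LEFT JOIN keeps events with no matching asset."""
    return conn.execute(
        "SELECT e.src, e.dst, a.role, a.criticality "
        "FROM events AS e LEFT JOIN assets AS a ON e.dst = a.ip "
        "ORDER BY e.src"
    ).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO events VALUES (?, ?)",
                     [("10.0.0.5", "10.0.1.9"), ("10.0.0.6", "10.0.2.2")])
    conn.execute("INSERT INTO assets VALUES (?, ?, ?)", ("10.0.1.9", "database", "high"))
    rows = events_with_context(conn)
    conn.close()
    return rows
```

Joining at query time keeps context in its own table, so updating an asset's role does not require rewriting stored events; the trade-off is paying the join cost on every query, whereas enriching at ingest freezes the context as it was when the data was collected.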

Page 34: Big Data Visualization


Federated Data Access

(Diagram components: SIEM with SIEM connector and SIEM console; a dispatcher; Prod A and Prod B; data sources AD / LDAP, HR, IDS, FW, DBs, and SNMP; the Data Lake.)

Caveats:

• Dispatcher?

• Standard access to dispatcher / products enabled

• Data lake technology?
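The slide leaves the dispatcher itself as an open question. Purely as a hypothetical sketch of the idea, one query interface in front of many sources, the class below fans a search term out to registered source callables and tags each result with its origin; none of these names correspond to a real product API:

```python
from typing import Callable, Dict, Iterable, List, Optional

Record = Dict[str, object]
Source = Callable[[str], Iterable[Record]]


class Dispatcher:
    """Hypothetical federated-access dispatcher in front of many data sources."""

    def __init__(self) -> None:
        self._sources: Dict[str, Source] = {}

    def register(self, name: str, source: Source) -> None:
        """Register a callable that answers queries for one data source."""
        self._sources[name] = source

    def query(self, term: str, sources: Optional[List[str]] = None) -> List[Record]:
        """Send `term` to the named sources (default: all) and tag each hit.

        Raises KeyError for an unregistered source name.
        """
        names = list(self._sources) if sources is None else sources
        results: List[Record] = []
        for name in names:
            for record in self._sources[name](term):
                results.append({"source": name, **record})
        return results
```

The caveats on the slide remain: this only works if every product exposes some standard way to be queried.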

Page 35: Big Data Visualization


Multiple Data Stores

(Diagram components: raw logs; (un)-structured data; context; queue; real-time processing; distributed processing; storage; key-value; structured; index; graph; stats; SQL; access.)

Caveat:

• Need multiple types of data stores

Page 36: Big Data Visualization


Technologies (Example)

• queue: Kafka

• real-time processing: Spark

• distributed processing: Spark

• storage: HDFS

• key-value: Cassandra

• columnar: Parquet

• index: ES (Elasticsearch)

• graph: GraphX

• SQL: Impala, SparkSQL

• other diagram labels: raw logs, (un)-structured data, context, aggregates, access

Caveat:

• No out-of-the-box solution available

Page 37: Big Data Visualization


SIEM Integration - Log Management First

(Diagram components: raw logs; HDFS; processing and filtering, e.g., Pig parsing; a columnar store, search engine, or log management tool; an SQL or search interface; SIEM connector, SIEM, and SIEM console.)

Page 38: Big Data Visualization


Simple SIEM Integration

(Diagram components: log data in raw, CSV, or JSON form; Flume; HDFS; SQL via Impala with a SerDe; an SQL interface for Tableau, etc.; SIEM connector, SIEM, and SIEM console.)

Requirement:

• SIEM connector to forward text-based data to Flume.
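Before text-based log data can be queried through SQL, CSV and JSON records have to end up with the same field names, the "common naming scheme" problem noted earlier. A minimal Python sketch of that normalization step, with a hypothetical four-field schema (in the Hadoop setup on the slide this role is played by a SerDe, not by Python code):

```python
import csv
import json

FIELDS = ["ts", "src", "dst", "action"]  # hypothetical common schema


def normalize(line: str) -> dict:
    """Parse one JSON or CSV log line into the common field names.

    CSV columns are assumed to appear in FIELDS order; missing fields become None.
    """
    line = line.strip()
    if line.startswith("{"):
        record = json.loads(line)
    else:
        record = dict(zip(FIELDS, next(csv.reader([line]))))
    return {name: record.get(name) for name in FIELDS}
```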

Page 39: Big Data Visualization


SIEM Integration - Advanced

(Diagram components: syslog data, raw logs, and other data sources; a queue (Kafka); processing; HDFS; columnar storage (Parquet); an index (ES); SQL (Impala, SparkSQL); an SQL and search interface for Tableau, Kibana, etc.; SIEM connector, SIEM, and SIEM console.)

Note: requires parsing and formatting into a SIEM-readable format (e.g., CEF).
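Formatting data as CEF means producing a pipe-delimited header (`CEF:Version|Device Vendor|Device Product|Device Version|Signature ID|Name|Severity|Extension`) followed by space-separated `key=value` extensions, with `\` and `|` escaped in header fields and `\` and `=` escaped in extension values. A minimal sketch; the vendor, product, and field values in the usage are hypothetical:

```python
from typing import Mapping


def _escape_header(value: object) -> str:
    # Header fields: escape backslash first, then pipe.
    return str(value).replace("\\", "\\\\").replace("|", "\\|")


def _escape_extension(value: object) -> str:
    # Extension values: escape backslash, equals sign, and line breaks.
    return (str(value).replace("\\", "\\\\").replace("=", "\\=")
            .replace("\r", "\\r").replace("\n", "\\n"))


def to_cef(vendor, product, version, signature_id, name, severity,
           extension: Mapping[str, object]) -> str:
    """Build one CEF (version 0) line from header fields and key=value extensions."""
    header = "|".join(_escape_header(v) for v in
                      (vendor, product, version, signature_id, name, severity))
    ext = " ".join(f"{key}={_escape_extension(val)}" for key, val in extension.items())
    return f"CEF:0|{header}|{ext}"
```

For example, `to_cef("Acme", "DataLake", "1.0", "100", "Outbound connection", 5, {"src": "10.0.0.5", "dst": "8.8.8.8", "dpt": 53})` yields a line with standard CEF keys (`src`, `dst`, `dpt`) that a SIEM connector can ingest.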

Page 40: Big Data Visualization


What I am Working On

(Screenshot of a web-based analytics product: navigation tabs Data Stores, Analytics, Forensics, Models, Admin; a panel of source --> destination IP connections; an anomaly view with a time-series decomposition into data, seasonal, and trend components; anomaly details; and elements labeled "Hunt", "Explain", and "Visual Search".)

• Big data backend

• Own visualization engine (web-based)

• Visualization workflows
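The screenshot shows anomalies found by decomposing a time series into trend and seasonal components. The product's actual method is not described; as a generic illustration only, here is a naive additive decomposition (moving-average trend, per-phase seasonal mean) that flags points whose residual exceeds `k` standard deviations. `period` and `k` are assumed parameters:

```python
from statistics import mean, pstdev
from typing import List, Tuple


def decompose(series: List[float], period: int) -> Tuple[List[float], List[float], List[float]]:
    """Naive additive decomposition: series = trend + seasonal + residual.

    Trend: mean of a one-period window roughly centered on each point, shifted
    inward at the edges. Seasonal: mean detrended value for each phase of the
    period. Requires len(series) >= period.
    """
    n, half = len(series), period // 2
    trend = []
    for i in range(n):
        start = max(0, min(i - half, n - period))  # clamp window inside the series
        trend.append(mean(series[start:start + period]))
    detrended = [x - t for x, t in zip(series, trend)]
    phase_mean = [mean(detrended[p::period]) for p in range(period)]
    seasonal = [phase_mean[i % period] for i in range(n)]
    residual = [x - t - s for x, t, s in zip(series, trend, seasonal)]
    return trend, seasonal, residual


def anomalies(series: List[float], period: int, k: float = 3.0) -> List[int]:
    """Indices whose residual lies more than k standard deviations from zero."""
    _, _, residual = decompose(series, period)
    sigma = pstdev(residual)
    if sigma < 1e-9:  # essentially no residual spread: nothing stands out
        return []
    return [i for i, r in enumerate(residual) if abs(r) > k * sigma]
```

Production systems typically use more robust methods (for example, decompositions based on local regression and median-based thresholds), since a single large spike also inflates the trend, seasonal estimate, and standard deviation used here.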

Page 41: Big Data Visualization


Black Hat Workshop

Visual Analytics: Delivering Actionable Security Intelligence

August 1-6, 2015, Las Vegas, USA

big data | analytics | visualization

Page 42: Big Data Visualization


Security Visualization Community

Share, discuss, challenge, and learn about security visualization.

http://secviz.org

List: secviz.org/mailinglist

Twitter: @secviz

Page 43: Big Data Visualization


Further resources:

[email protected]

http://slideshare.net/zrlram

http://secviz.org and @secviz