Network Traffic Visibility and Anomaly Detection @Scale: October 27th, 2016 Dan Ellis


Page 1: Network Traffic Visibility and Anomaly Detectioninfo.kentik.com/.../BOSAtScale-DanEllis-Network-Traffic.pdfUse cases of traffic visibility • Network Planning • Peering Analytics

Network Traffic Visibility and Anomaly Detection

@Scale: October 27th, 2016

Dan Ellis

Page 2:

Introduction

• Network traffic visibility?

Page 3:

Introduction

• Network traffic visibility?
• What data is available on your network
• What can you do with this data
• Tools available

Page 4:

Introduction

• Network traffic visibility?
• What data is available on your network
• What can you do with this data
• Tools available
• 20+ years running blind (ISPs, CDNs, enterprise)
• Who is Kentik

Goal of this talk: Make your life easier

Page 5:

Traffic Visibility Problem

• Data networks can be compared to FedEx
• Imagine FedEx without package tracking
• The majority of data networks operate in this vacuum of visibility
• Hard to believe? The problem is massive data scale, a lack of tools, and little network + systems collaboration

Page 6:

Not Helping…

Page 7:

A Poll

• Interface Volume (Mb/s, pps)?

Page 8:

A Poll

• Interface Volume (Mb/s, pps)?
• Src/Dst IP+Port, ASN, BGP Path?

Page 9:

A Poll

• Interface Volume (Mb/s, pps)?
• Src/Dst IP+Port, ASN, BGP Path?
• IP, Port, ASN or Path Thresholds?

Page 10:

Is this really a problem?

Maybe there isn’t a traffic visibility problem

Maybe no one really needs this data

Page 11:

Complaints of high latency… BGP Path to Comcast

Page 12:

Dyn attack last week – ISP recursive inbound

Page 13:

Dyn attack last week – Traffic / source_ip

Page 14:

Dyn attack last week – ISP recursive outbound

Page 15:

Use cases of traffic visibility

• Network Planning
• Peering Analytics and Abuse
• Congestion detection
• Is it the network?
• Where on the network?
• Proactive alerting
• Distributed DDoS Detection
• What Changed Post Deploy?
• Security and Breach Detection
• Cost Analytics
• Revenue Identification (New + Risk)
• Enabling Internal Groups

Page 16:

Tenets

• Infinite granularity storage for months

• Drillable visibility, network specific UI

• Real-time and fast (< 10s queries)

• Anomaly detection + actions

• Open / API

• Scale

Page 17:

Now that we know what we need, how do we do it?

Page 18:

Where to find this data?

[Diagram: data sources on the NETWORK (app server, routers, PCAP agent), added together (+) to give "Something actually useful"]

• TCP stats data / app-specific data
• Flow data: NetFlow, sFlow, IPFIX
• SNMP interface info
• Sys/event logs: TACACS or Syslog
• BGP path info

Page 19:

What kind of tools

• Current Open Source: pmacct, ntop, SiLK, cacti
• Older Open Source: cflowd, AS-PATH, RRDtool
• Commercial software: Arbor, Plixer, SevOne, Solarwinds, ManageEngine
• DIY Big Data: Kafka + ELK, Hadoop, druid, grafana, tsdb
• On-Prem Big Data: Cisco Tetration, Deepfield…
• SaaS Big Data: Kentik, Datadog, Appneta, Splunk

Page 20:

Many tools get you almost there

Page 21:

Really though… (tools)

Open source (ish):
• pmacct
• nProbe / ntop
• Elasticsearch + Kibana (ELK)

Commercial:
• Arbor
• Kentik

Page 22:

Three layer approach

[Diagram: data sources, Ingest & Fusion layer, Storage layer (flow specific), Query layer (query engine and UI), clients]

Query interfaces: SQL, WWW, REST

SELECT flow FROM router WHERE …

Each layer has separate and different scaling characteristics
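The SQL query interface on the "Three layer approach" slide can be illustrated with a minimal sketch. It uses an in-memory SQLite database as a stand-in for a flow-specific storage layer. The `flows` table, its columns, and the `top_src_asns` helper are hypothetical names chosen for illustration; they are not the schema or API of any tool named in the talk.

```python
import sqlite3

# Stand-in storage layer: one row per flow record (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flows (ts INTEGER, router TEXT, src_asn INTEGER, "
    "dst_port INTEGER, bytes INTEGER)"
)
conn.executemany(
    "INSERT INTO flows VALUES (?, ?, ?, ?, ?)",
    [
        (0, "edge1", 64500, 443, 1500),
        (1, "edge1", 64501, 53, 4000),
        (2, "edge1", 64501, 53, 5000),
        (3, "edge2", 64500, 80, 700),
        (4, "edge1", 64500, 443, 500),
    ],
)


def top_src_asns(db, router, limit=2):
    """Query layer: total bytes per source ASN on one router, largest first."""
    cur = db.execute(
        "SELECT src_asn, SUM(bytes) AS total FROM flows "
        "WHERE router = ? GROUP BY src_asn ORDER BY total DESC LIMIT ?",
        (router, limit),
    )
    return cur.fetchall()


print(top_src_asns(conn, "edge1"))
```

The same query could sit behind a REST or web frontend; the point of the layering is that ingest, storage, and query can each be scaled on their own.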

Page 23:

Seriously?

How much data

• Small network (< 10 Gb/s traffic): 10k flows/sec (+ rows/sec)
• Large network (1 Tb/s traffic): 500k flows/sec
• Querying over 30+ days @ 200k fps (518 B rows, 207 TB): in < 10s
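The row count on the "How much data" slide can be checked with simple arithmetic. This is a sketch that assumes one stored row per flow record (the slide equates flows/sec with rows/sec) and decimal terabytes; the bytes-per-row figure is derived here and does not appear on the slide.

```python
SECONDS_PER_DAY = 86_400


def rows_stored(flows_per_sec: int, days: int) -> int:
    """Rows accumulated when every flow record becomes one stored row."""
    return flows_per_sec * SECONDS_PER_DAY * days


rows = rows_stored(200_000, 30)     # 200k fps over 30 days
bytes_per_row = 207e12 / rows       # 207 TB, assuming 1 TB = 10**12 bytes

print(f"{rows / 1e9:.1f} B rows")
print(f"{bytes_per_row:.0f} bytes per row")
```

At 200k flows/sec, 30 days comes to about 518 billion rows, which matches the slide, and 207 TB works out to roughly 400 bytes per fused row.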

Page 24:

Data fusion is a key enabler to useful data

Page 25:

DATA FUSION

[Diagram: flow from the ROUTER and the PCAP agent passes through a Proxy, decoder modules, and MemTables into DATA FUSION; a single fused flow row is sent to storage in the FLOW FRIENDLY DATASTORE]

• Flow inputs: NetFlow v5, NetFlow v9, IPFIX, sFlow, PCAP
• Fused with: BGP RIB (BGP Daemon), SNMP Poller, Custom Tags, Enrichment DB (Geo ↔ IP, ASN ↔ IP)

Page 26:

DATA FUSION

[Diagram: same pipeline as the previous slide, with an Agent added and the Proxy, decoder modules, and MemTables grouped as the CLIENT PROCESS; the decoders produce a Unified Flow that enters DATA FUSION, and a single fused flow row is sent to storage in the FLOW FRIENDLY DATASTORE]

• Flow inputs: NetFlow v5, NetFlow v9, IPFIX, sFlow
• Fused with: BGP RIB (BGP Daemon), SNMP Poller, Custom Tags, Enrichment DB (Geo ↔ IP, ASN ↔ IP)

Serious. Flux Capacitor. Is. Serious.

Page 27:

Fusing should be:

• near real-time
• performed at ingest
• data specific
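Fusion at ingest can be sketched as follows: each decoded flow record is joined with routing and geo data before it is stored, so a single fused row reaches the datastore. This is a minimal illustration, not the pipeline from the talk. The `Route` type, the `ROUTES` table, the field names, and the use of the last AS in the path as the destination ASN are all assumptions, and the addresses and ASNs come from documentation ranges.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Route:
    prefix: ipaddress.IPv4Network
    as_path: Tuple[int, ...]   # as it might be learned from a BGP RIB
    country: str               # as it might come from a geo enrichment DB


# Hypothetical enrichment table standing in for the BGP RIB and geo data.
ROUTES = [
    Route(ipaddress.ip_network("203.0.113.0/24"), (64500, 64501), "US"),
    Route(ipaddress.ip_network("203.0.113.128/25"), (64500, 64502), "DE"),
]


def lookup(ip: str) -> Optional[Route]:
    """Longest-prefix match over the enrichment table (linear scan for clarity)."""
    addr = ipaddress.ip_address(ip)
    matches = [r for r in ROUTES if addr in r.prefix]
    return max(matches, key=lambda r: r.prefix.prefixlen, default=None)


def fuse(flow: dict) -> dict:
    """Return a copy of the flow record with BGP and geo fields attached."""
    route = lookup(flow["dst_ip"])
    fused = dict(flow)
    fused["dst_as_path"] = route.as_path if route else ()
    fused["dst_asn"] = route.as_path[-1] if route else None
    fused["dst_country"] = route.country if route else None
    return fused


print(fuse({"src_ip": "198.51.100.7", "dst_ip": "203.0.113.200", "bytes": 1500}))
```

Doing the join at ingest means the row records the path that was in effect when the traffic was seen. A production system would replace the linear scan with a prefix trie.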

Page 28:

Network planning: traffic by BGP hop

Page 29:

Network planning: collapsed path, exclude 1st

Page 30:

Looking at existing architectures out there

Page 31:

pmacct-based implementation

[Diagram: data sources (BGP, Flow, PCAP), Ingest & Fusion layer, Storage layer, Query layer (frontend), clients]

Page 32:

nProbe + ntop + Elasticsearch

[Diagram: data sources (BGP, Flow) feeding nProbe(s) in the Ingest & Fusion layer, then Storage layer, Query layer, clients]

Page 33:

Kafka + Elasticsearch

• Dropbox implementation of a (mostly) open-source NetFlow solution here: Dropbox blog
• Requires custom ingest, fusing, UI

Page 34:

Caveats

• Ingest: distributing and scaling (1 x nProbe = 1 x device); no SNMP (= no IF info available for fusion); aggregation (no infinite granularity)
• Data-store: challenging at scale with ES; very hard for MySQL/MongoDB
• Query frontends very generic: tailoring of meaningful dashboards is difficult; no anomaly detection

Commercial HW solutions (Arbor):
• Appliance based
• Not truly distributed
• Pre-determined list of aggregated data (no infinite granularity)

Page 35:

And so…

Page 36:

Why isn’t everybody already doing it?

Required areas of expertise (because every presentation needs a Venn diagram)

[Venn diagram of four roles, with "Unicorn" at the intersection]

• Distributed systems engineers: resilience / reliability; geo-distributed ingest; flow friendly data-store
• Low-level network developers: BGP daemon; flow inspection & conversion; network protocols hacking
• SREs: make all of the above work reliably
• Network engineers: train all the other teams on the involved network protocols and their usage

Page 37:

Looking beyond the basics

Page 38:

Building on top of the platform

Once you have a platform, what’s next?

• Augmented flow (retransmits, latency, URL, DNS)
• Anomaly detection
• Multi-hop exit determination
• BGP-path congestion detection

Page 39:

Building on top of the platform

Imagine if we could get performance data from the network:
• Q Depth
• Retransmits per flow
• TCP latency
• Application Latency

You can:
• nProbe (ntop) collects Latency, Rxmits, URL, DNS -> IPFIX flow
• Deploy on a host or a sensor
• Cisco, Juniper, Arista working to expose Q Depth into flow
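Once flow records carry retransmit counts, views such as retransmits per interface or per ASN reduce to a group-by over the augmented rows. The sketch below assumes each record is a dict with `packets` and `retransmits` fields plus the dimension to group on. These field names are hypothetical and are not the IPFIX element names that nProbe exports.

```python
from collections import defaultdict


def retransmit_ratio_by(flows, dimension):
    """Retransmitted packets divided by total packets, grouped by one dimension."""
    packets = defaultdict(int)
    retransmits = defaultdict(int)
    for flow in flows:
        key = flow[dimension]
        packets[key] += flow["packets"]
        retransmits[key] += flow["retransmits"]
    # Skip groups with no packets to avoid dividing by zero.
    return {k: retransmits[k] / packets[k] for k in packets if packets[k] > 0}


flows = [
    {"dst_asn": 64500, "interface": "xe-0/0/1", "packets": 1000, "retransmits": 10},
    {"dst_asn": 64500, "interface": "xe-0/0/2", "packets": 1000, "retransmits": 30},
    {"dst_asn": 64501, "interface": "xe-0/0/1", "packets": 500, "retransmits": 0},
]

print(retransmit_ratio_by(flows, "dst_asn"))
print(retransmit_ratio_by(flows, "interface"))
```

Grouping the same rows by `interface` instead of `dst_asn` gives the per-interface view without re-collecting anything, which is the benefit of storing unaggregated rows.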

Page 40:

Retransmits enhanced flow: rexmits / interface

Page 41:

Retransmits enhanced flow: rexmits / ASN

Page 42:

Retransmits enhanced flow: TCP latency / ASN

Page 43:

Building on top of the platform

You shouldn’t have to stare at dashboards or watch logs to detect badness

Monitor top-x of any dimension combination (IP, ASNs, Geo, Interface)

Create baselines based on time of day

Be able to look at things beyond pps/bps, such as retransmits, latency, logs

Detect shifts: did an ASN or IP on a particular interface suddenly move from top-x #200 to #2, and is that unusual for this time of day?

This is available today (Open Source: Hadoop, Spark, Storm, Samza, Flink)
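The rank-shift idea above (a key jumping from top-x #200 to #2 at an unusual time of day) can be sketched with a per-hour baseline of past ranks. This is a simplified illustration under stated assumptions, not the detection method of any product in the talk: the `RankShiftDetector` class, the median baseline, and the `min_jump` and `min_history` thresholds are all hypothetical choices.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Tuple


class RankShiftDetector:
    """Flags keys whose top-x rank jumps far above their usual rank for that hour."""

    def __init__(self, min_jump: int = 50, min_history: int = 3) -> None:
        self.min_jump = min_jump          # assumed threshold, in rank positions
        self.min_history = min_history    # samples needed before alerting
        self.history: Dict[Tuple[str, int], List[int]] = defaultdict(list)

    def observe(self, hour: int, traffic: Dict[str, float]) -> List[tuple]:
        """Rank keys by traffic for one interval; return (key, baseline, rank) alerts."""
        ranked = sorted(traffic, key=traffic.get, reverse=True)
        alerts = []
        for rank, key in enumerate(ranked, start=1):
            past = self.history[(key, hour)]
            if len(past) >= self.min_history:
                baseline = median(past)   # typical rank at this hour of day
                if baseline - rank >= self.min_jump:
                    alerts.append((key, baseline, rank))
            past.append(rank)             # note: anomalous samples also enter the baseline
        return alerts


# Synthetic data: AS1 is the largest talker, AS200 the smallest.
normal = {f"AS{k}": 1000.0 - k for k in range(1, 201)}
detector = RankShiftDetector()
for _ in range(3):
    detector.observe(hour=14, traffic=normal)

spike = dict(normal, AS200=998.5)   # AS200 now sits between AS1 and AS2
alerts = detector.observe(hour=14, traffic=spike)
print(alerts)
```

Keying the history on (key, hour) is what makes the baseline time-of-day aware. The same structure works for any metric used for ranking, including retransmits or latency in place of bps.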

Page 44:

Use case: anomaly detection

Traffic from one ASN (network) unusually high. Operator notified at redline.

Page 45:

Use case: traffic anomaly detection & annotation

Page 46:

Use case: traffic annotated w/ multiple events

Page 47:

Anomaly detection: DDoS detection & characteristics

Page 48:

Building on top of the platform

Once you have a platform, what’s next?

✓ Augmented flow (retransmits, latency, URL, DNS)
✓ Anomaly detection
☐ Multi-hop exit determination: challenging to map traffic from ingest to exit point, multi-hop
☐ BGP-path congestion detection: detect individual congested paths within a circuit that isn’t congested

Page 49:

Summary

Networks can produce large amounts of data that will make your life easier

Big Data platforms are able to consume this data

Specific tools for Network Operators are beginning to appear (free & paid)

• Paid tools are more specific to network use (UI, easy setup, etc.)
• Free tools have the “power” but require cobbling together pieces
• Much work to be done re fusing data such as logs, changes, alerts, DNS

SaaS providers will provide community views and enable data-sharing

Page 50:

QUESTIONS ?

Dan [email protected]

CREDITS