rethinking netflow: a case for a coordinated “risc” architecture for flow monitoring vyas sekar...

55
Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen, Anupam Gupta, Ramana Kompella, Walter Willinger 1

Post on 19-Dec-2015

217 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

1

Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring

Vyas SekarJoint work with

Mike Reiter, Hui ZhangDavid Andersen, Anupam Gupta,

Ramana Kompella, Walter Willinger

Page 2: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

Flow Monitoring is critical for effective Network Management

Traffic Engineering

Analyze new user apps

AnomalyDetection

Network Forensics

Worm Detection

Accounting

Botnet analysis

…….

Many management applications Evolving and growing over time Need high-fidelity measurements

Page 3: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

3

Requirements for monitoring Network

Operations Center

Flow reports

report = ( flow = same src-dst, ports, proto) + pkt/byte counters

Respect resource constraints

High flow coverage

Provide network-wide goals

Low data management overhead

High-fidelity for all applications

Page 4: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

4

Sampling due to resource constraints

• Routers cannot record every packet/flow– Constraints: CPU, Memory, Bandwidth

• Resource constraints don’t go away!– Network demands scale even as routers become

more powerful

• Some form of sampling is inevitable– Record/report only a subset of the traffic

Page 5: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

5

Current solution

Respect resource constraints

High flow coverage

Provide network-wide goals

Low data management overhead

Uniform packet sampling, e.g., Cisco NetFlow• Each router independently samples packets• Aggregates sampled packets into flow reports

Biased towardslarge flows

Redundant measurements

Too coarse

High-fidelity for all applications Not very goodfor security

Page 6: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

6

How do we meet the requirements?

High-fidelity for all applications

Respect resource constraints

High flow coverage

Provide network-wide goals

Low data mgmt overhead

Part 1: Coordinated Sampling

Part 2: “RISC” monitoring

Page 7: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

7

How do we meet the requirements?

High-fidelity for all applications

Respect resource constraints

High flow coverage

Provide network-wide goals

Low data mgmt overhead

Part 1: Coordinated Sampling

Part 2: “RISC” monitoring

Page 8: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

8

High-level idea

Sampling algorithm not biased to large flows

Packet sampling has low flow coverage due to bias toward large flows

Routers sample independently Wasted measurements

Can’t reason about network-wide goals

Treat routers in the network as a system to be managed in a coordinated fashion!

Page 9: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

9

Part 1 Outline

• Motivation

• Design of cSamp (Coordinated Sampling)

• Evaluation

• Practical deployment

Page 10: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

10

Design

• Random flow sampling (single router)– Sample flows not packets

• Hash-based coordination (single path)– Efficient, non-redundant sampling– Coordination without explicit communication

• Network-wide optimization (whole network)– Satisfy network-wide constraints and objectives

Page 11: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

11

Design (single router)

• Random flow sampling– Sample flows not packets

Page 12: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

12

Flow sampling

1613111

Flow memory(flow, counter #pkts)

3

[3,10]Hash range

6

Sample flows, not packets, to increase flow coverage

1131611

Compute hash, log if in range

Version IHL TOS LengthIdentification Flags OffsetTTL Protocol ChecksumSource IP addressDestination IP address ……SourcePort DestinationPort

Hash Flowid [0,Max]

1131611

Pack

et h

eade

r

1

1

Page 13: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

13

Design (single path)

• Random flow sampling (single router)– Sample flows not packets

• Hash-based coordination – Efficient, non-redundant sampling– Coordination without explicit communication

Page 14: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

14

Hash-based coordination

Flow memory

1[1,4]

Hash range

3

Flow memory

8[7,9]

Hash range

Non-overlapping hash-ranges avoids redundant monitoringCoordination without communication

Stream:11816135

R1R2

4

1

1

Page 15: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

15

Design (whole network)

• Random flow sampling (single router)– Sample flows not packets

• Hash-based coordination (single path)– Efficient, non-redundant sampling– Coordination without explicit communication

• Network-wide optimization– Satisfy network-wide constraints and objectives

Page 16: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

16

Network-wide view

Many paths = Origin-Destination (OD) pairs in a networke.g., NYC-PIT, PIT-SFO

Moving from a single-path to network?

Page 17: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

17

Network-wide coordination

Assign non-overlapping ranges per OD-pair/path

[1,5]

[1,3]

[3,7]

[1,2]

[7,9]

[5,8]

Page 18: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

18

cSamp algorithm on each router

[5,10]

[1,4]

Sampling Manifest

1. Get OD-Pair from packet

3. Look up hash-range for OD-pair from sampling manifest 2. Compute hash (flow = packet 5-tuple)

4.Log if hash falls in range for this OD-pair

Red vs. Green?

Flow memory

2

2

1

OD Range

Page 19: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

19

Overall system architecture

[1,5]

[5,9]

[3,7]

[1,2]

[7,9]

[5,8]

Network Operations

Center

Generate sampling manifests

Applications

Configuration Dissemination

Flow reports

Page 20: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

20

Framework for generating manifests

Network-wide optimization

OD-pair infoTraffic, Path(routers)

Router constraintse.g., SRAM for flowrecords

Sampling manifests

{<OD-Pair,Hash-range>} per router

Objective:Max i ε ODPairs Coveragei Traffici

Subject to achieving maximum Mini ε ODPairs {Coveragei}

LinearProgram

Inputs

Output

Page 21: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

21

Part 1 Outline

• Motivation

• Design of cSamp (Coordinated Sampling)

• Evaluation

• Practical deployment

Page 22: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

22

cSamp vs. other sampling solutions

• Metrics reflect initial goals– Coverage, network-wide goals, redundancy

• Flow sampling– Fixed-rate and Maximal flow sampling– Use same memory (400K flow records)

• Packet sampling– 1-in-100 and 1-in-50 (edge)– Allow infinite memory

Page 23: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

23

Total flow coveragecSamp is 2-3X better than packet sampling, 30% over maximal flow sampling

Page 24: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

24

Minimum fractional coveragecSamp is significantly better than other solutions!

Maximal flow sampling is inadequate for network-wide objectives

Page 25: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

25

How do these solutions fare?

Requirements

Packet Sampling(NetFlow)

Flow Sampling cSamp

Respect resource constraints

High flow coverage Network-wide goals Minimize redundant measurements

Page 26: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

26

Part 1 Outline

• Motivation

• Design of cSamp (Coordinated Sampling)

• Evaluation

• Practical deployment

Page 27: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

27

Practical Issues

2. Is the optimization scalable?Need two improvements

(binary search + max-flow)

3. What about multi-path routing?Simple, lightweight extension

1. What about traffic dynamics?History + short-term adaptation

4. How do interior routers identify OD-pairs?Assume ingress routers mark packets

Page 28: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

28

Why we may want to avoid this ….

Extra overhead on ingress

OD-pair id might be ambiguous (multi-egress peers)

Need to modify packet headers or add shim header

May require overhaul of routing infrastructure

How do interior routers identify OD-pairs?Assume ingress routers mark packets

Page 29: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

29

Can we realize the benefits of cSamp withoutrequiring OD-pair identification?

Use local info. at router to make sampling decisions

“Stitch” coverage for a path across routers on that path

Page 30: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

30

R1 R3R2R1 R4R0

What local info can I get from

packet and routing table?

{Previous Hop, My Id, NextHop}

SamplingSpecGranularity at which

sampling decisions are made

How much traffic to sample for this SamplingSpec?

SamplingAtomDiscrete hash-

ranges, select some of them to log

Page 31: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

31

=

=

“Stitching” together coverage

union

union

R1

R2

R4R3

R5

R6

R7

Page 32: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

32

Problem Formulation

Coverage for path Pi

Load on router Rj

Maximize: Total flow coverage: i TiCi

Minimum fractional coverage: mini {Ci } Subject To:

j, Loadj Lj

Page 33: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

33

Maximize: Total flow coverage: i TiCi

Min. frac coverage: mini {Ci } Subject To:

j, Loadj Lj

Sorry .. NP-hard!Can’t even approximate

min without resource augmentation

Total flow coverage: Submodular maximization with partition-knapsack constraintsEfficient greedy algorithm with near-optimal performance

Min. fractional flow coverage: Intelligent augmentation much better than theoretical guaranteePartial/incremental deployment of adding OD-pair identifiers

Page 34: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

34

Total flow coverage

cSamp-T (tuple+) gives near-ideal total flow coverage vs. cSamp

Page 35: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

35

Minimum fractional coverage

With smart resource augmentation, cSamp-T gives good min. frac. coverage

Page 36: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

36

How do we meet the requirements?

High-fidelity for all applications

Respect resource constraints

High flow coverage

Provide network-wide goals

Low data mgmt overhead

Part 1: Coordinated Sampling

Part 2: “RISC” monitoring

Page 37: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

37

FSD

HeavyHitters

Entropy SuperSpreaders

ChangeDetection

Outdegreehistogram

PortAddr

PortAddr

SrcDst

What functionality should we put on

routers ?

Page 38: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

38

Current Research: Application-Specific!

FSD

PortAddr

Separate Counters & Estimation algorithms

Per App

HeavyHitters

Entropy SuperSpreaders

ChangeDetection

Outdegreehistogram

PortAddr Src

Dst

Why? Application-specific approaches provide higher fidelity

Traffic

Page 39: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

39

Alternative: “RISC”

Traffic

Why? Late-binding to applications, Easier to implement, “Future-proof”

FSD

HeavyHitters

Entropy SuperSpreaders

ChangeDetection

Outdegreehistogram

PortAddr

PortAddr

SrcDst

GenericData

CollectionDecouple Collection

and Computation

Page 40: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

40

RISC vs. Application-Specific

Revisit this perception that RISC does not provide good performance

Requirements

Application Specific

RISC 1.0NetFlow

RISC 2.0

Fidelity acrossapplications ?ImplementationComplexity ?Processing Overhead ?Changing Applications ?Enable Diagnostics ?

Page 41: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

41

Why this might make sense?

Each app-specific algorithm requires dedicated counters

Primary bottleneck for high-speed monitoring = SRAM counters

Look at aggregate memory usage across applications

Pool in these resources into a few sampling primitivesRun these with sufficient fidelity!

Page 42: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

42

Challenges

What RISC primitives should we implement?

Does it perform comparably to application-specific approaches?

Combination of flow sampling, sample and hold, cSamp

Yes! RISC with aggregate resources is comparable or even better

Page 43: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

43

Challenges

What RISC primitives should we implement?

Does it perform comparably to application-specific approaches?

Combination of flow sampling, sample and hold, cSamp

Yes! RISC with aggregate resources is comparable or even better

Page 44: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

44

RequirementsRISC 2.0

Fidelity acrossapplications ?ImplementationComplexity ?Processing Overhead ?Changing Applications ?Enable Diagnostics ?

Two broad classes“Structure”

Flow Sampling“Volume”

Sample and Hold

Provide flow reports like

NetFlow

Coordination Network-wideOptimization

What RISC primitives should we implement?

Page 45: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

45

Sample and Hold

1613111

Flow memory(flow, counter #pkts)

1

6

Accurate counts of “heavy hitters” with few counters

1131611

AlgorithmIf flow is already logged updateSample packet with probability p

If new flow create counter

1131611

1

1

234

Page 46: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

46

Putting the pieces together

Page 47: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

47

Challenges

What RISC primitives should we implement?

Does it perform comparably to application-specific approaches?

Combination of flow sampling, sample and hold, cSamp

Yes! RISC with aggregate resources is comparable or even better

Page 48: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

48

FSD

HeavyHitters

Entropy SuperSpreaders

ChangeDetection

Outdegreehistogram

PortAddr

PortAddr

SrcDst

FlowSamp + Sample & Hold

Calculate aggregate memory usage

Compute “Relative Accuracy

Difference” + good

- bad

Page 49: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

49

Sensitivity to Application Portfolio

Bigger app. portfolio or Some resource intensive apps Better gains for RISC approach

“Relative Accuracy

Difference” + good

- bad

Bigger portfolio More resources

Page 50: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

50

Evaluation: Single Router

RISC > Application-specific for most applications

Worse for heavyhitter, but not by much!

“Relative Accuracy

Difference” + good

- bad

Page 51: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

51

Network-wide evaluation: Per-Ingress

Results for network-wide per ingress mirror single router evaluation

RISC > Application-specific for most applications

“Relative Accuracy

Difference” + good

- bad

Page 52: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

52

RISC vs. Application-Specific

Requirements

Application Specific

RISC 1.0NetFlow

RISC 2.0FS, SH, cSamp

Fidelity acrossapplications ImplementationComplexity Processing Overhead Changing Applications Enable Diagnostics

Yes, We Can!

Page 53: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

53

Meeting requirements for Flow Monitoring

High-fidelity for all applications

Respect resource constraints

High flow coverage

Provide network-wide goals

Low data mgmt overhead

CoordinatedSamplingWith “RISC” primitives

Page 54: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

54

Looking beyond flow monitoring

• WAN optimization, Content-Aware Networking“Intelligent and targeted caching of traffic at selected nodes of the network can help engineer traffic by reducing load and avoiding hot spots, cut end-to-end latency, and enhance the end-user experience.”

– SmartRE (with Ashok Anand, Aditya Akella)

• Intrusion Detection/Prevention for Enterprises (ongoing work .. )

Page 55: Rethinking NetFlow: A Case for a Coordinated “RISC” Architecture for Flow Monitoring Vyas Sekar Joint work with Mike Reiter, Hui Zhang David Andersen,

55

Thanks!www.cs.cmu.edu/~vyass

www.cs.cmu.edu/[email protected]