
Madan Jampani, 3/12/2015

Networking Seminar, Stanford University
netseminar.stanford.edu/seminars/03_12_15.pdf

Can the SDN control plane scale without sacrificing abstractions and performance?

[Diagram: Control Plane on top of the Data Plane]

A simple yet powerful abstraction: the Global Network View

Scaling strong consistency

Managing Complexity

What is the simplest controller?

Observe → Program

[Scorecard (Abstractions, Performance, Correctness, Availability, Scale): Single Instance]

How to improve availability?

Primary/Standby: when the primary fails, the standby takes over as the new primary.
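To make the failover concrete, here is a minimal sketch, assuming heartbeat-based failure detection; the class, method names, and timeout are illustrative, not from the talk:

```java
import java.time.Duration;
import java.time.Instant;

// Illustrative sketch of primary/standby failover: the standby watches the
// primary's heartbeats and promotes itself when they stop arriving.
class StandbyController {
    private static final Duration FAILOVER_TIMEOUT = Duration.ofSeconds(3); // assumed value

    private volatile Instant lastHeartbeat = Instant.now();
    private volatile boolean primary = false;

    void onHeartbeatFromPrimary() {
        lastHeartbeat = Instant.now();
    }

    // Called periodically; if the primary has been silent for too long,
    // the standby takes over as the new primary.
    void checkPrimaryLiveness() {
        if (!primary && Instant.now().isAfter(lastHeartbeat.plus(FAILOVER_TIMEOUT))) {
            primary = true; // standby becomes primary
        }
    }
}
```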

[Scorecard (Abstractions, Performance, Correctness, Availability, Scale): Single Instance, Primary/Standby]

Why might you need to scale?

[Diagram: the network grows until a single Primary/Standby pair can no longer keep up]

Fully Distributed Control Plane

[Scorecard (Abstractions, Performance, Correctness, Availability, Scale): Single Instance, Primary/Standby, Multi-Instance]

“Tell me about your slice?”

Cache: instances keep a locally cached copy of their peers' slices instead of asking every time.

[Scorecard (Abstractions, Performance, Correctness, Availability, Scale): Single Instance, Primary/Standby, Multi-Instance, Multi-Instance + Cache]

Can we build a complete solution?

Let’s look at prior art...

[Diagram (prior art): topology events are written to a Distributed Topology Store; each instance reads topology state back from the store, in later variants through a local cache]

Can we build a solution that meets all criteria?

Topology as a State Machine

Events are Switch/Port/Link up/down

[Diagram: applying an event takes the Current Topology to the Updated Topology; each event E is broadcast to all instances, and each instance applies it to its local copy]
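To make the state-machine view concrete, here is a minimal sketch in Java; the event types come straight from the slide, while the class and field names are illustrative, not ONOS code:

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch: the network topology as a state machine.
// Applying a switch/port/link up/down event to the current topology
// yields the updated topology.
enum EventType { SWITCH_UP, SWITCH_DOWN, PORT_UP, PORT_DOWN, LINK_UP, LINK_DOWN }

record TopologyEvent(EventType type, String elementId) {}

class Topology {
    private final Set<String> activeSwitches = new HashSet<>();
    private final Set<String> activePorts = new HashSet<>();
    private final Set<String> activeLinks = new HashSet<>();

    // The single state-transition function: current topology + event -> updated topology.
    void apply(TopologyEvent e) {
        switch (e.type()) {
            case SWITCH_UP   -> activeSwitches.add(e.elementId());
            case SWITCH_DOWN -> activeSwitches.remove(e.elementId());
            case PORT_UP     -> activePorts.add(e.elementId());
            case PORT_DOWN   -> activePorts.remove(e.elementId());
            case LINK_UP     -> activeLinks.add(e.elementId());
            case LINK_DOWN   -> activeLinks.remove(e.elementId());
        }
    }
}
```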

Pros
● Simple
● Fast

Cons
● Dropped messages
● Reordered messages

[Diagram: a later event F overtakes an earlier event E at some instances, so replicas apply them in different orders and diverge]

We have a single-writer problem

Switch Mastership Terms

[Diagram: mastership of a switch moves C1 → C2 → C1 over terms 1, 2, 3]

We track mastership terms in a strongly consistent store.

Switch Event Numbers

[Diagram: within each term the master numbers the switch's events sequentially: 1.1 ... 1.4, 2.1 ... 2.8, 3.1 ... 3.3]
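A sketch of how a mastership term might be bumped; `MastershipService` and the in-memory map standing in for the consensus-backed store are illustrative, not ONOS code:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative sketch: mastership terms per switch, tracked in a strongly
// consistent store. Each time mastership changes hands the term is bumped,
// so (term, event number) totally orders all events from one switch.
class MastershipService {
    record Mastership(String controllerId, long term) {}

    // Stand-in for a strongly consistent map backed by consensus.
    private final ConcurrentMap<String, Mastership> store = new ConcurrentHashMap<>();

    // Controller 'me' becomes master of 'switchId'; the atomic compute()
    // models a linearizable conditional update in the real store.
    Mastership acquireMastership(String switchId, String me) {
        return store.compute(switchId, (sw, current) ->
                current == null
                        ? new Mastership(me, 1)
                        : new Mastership(me, current.term() + 1));
    }
}
```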

Partial Ordering of Topology Events

Each event has a unique logical timestamp

(Switch ID, Term Number, Event Number)

[Diagram: event E carries timestamp (S1, 1, 2); a later event F carries (S1, 2, 1). F's term number is higher, so F supersedes E, and an instance that receives E after F drops E as stale]
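The (Switch ID, Term Number, Event Number) timestamp and the stale-drop rule fit in a few lines; a sketch with illustrative names:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: the per-event logical timestamp (Switch ID, Term
// Number, Event Number) and the "stale events are dropped on receipt" rule.
record LogicalTimestamp(String switchId, long term, long eventNumber)
        implements Comparable<LogicalTimestamp> {

    // Orders events from the same switch: term first, then event number.
    @Override
    public int compareTo(LogicalTimestamp other) {
        int byTerm = Long.compare(term, other.term);
        return byTerm != 0 ? byTerm : Long.compare(eventNumber, other.eventNumber);
    }
}

class TopologyReplica {
    // Latest timestamp applied per switch (simplified; a real store would
    // track timestamps per topology element).
    private final Map<String, LogicalTimestamp> latestApplied = new HashMap<>();

    // Apply the event only if it is newer than everything already seen for
    // that switch; e.g. E = (S1, 1, 2) arriving after F = (S1, 2, 1) is dropped.
    boolean onEvent(LogicalTimestamp ts) {
        LogicalTimestamp last = latestApplied.get(ts.switchId());
        if (last != null && ts.compareTo(last) <= 0) {
            return false; // stale or duplicate: drop
        }
        latestApplied.put(ts.switchId(), ts);
        return true;
    }
}
```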

To summarize

Each instance has a full copy of network topology

Events are timestamped on arrival and broadcast

Stale events are dropped on receipt

There is one additional step...

What about dropped messages?

“Did you hear about the port that went offline?”

Anti-Entropy

Lightweight, gossip-style, peer-to-peer

Quickly bootstraps newly joined nodes
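A minimal sketch of the anti-entropy exchange, assuming a per-switch version digest; all names are illustrative, since the slides do not give protocol details:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of gossip-style anti-entropy: each node keeps a digest
// (per-switch version of the topology state it holds); periodically it picks
// a random peer, exchanges digests, and pulls anything it is missing.
class AntiEntropyTask {
    private final Map<String, Long> localDigest = new HashMap<>(); // switchId -> version

    void recordVersion(String switchId, long version) {
        localDigest.merge(switchId, version, Math::max);
    }

    // Compare a peer's digest against ours and list the switches where the
    // peer is ahead; the caller then fetches that state from the peer. This
    // both repairs dropped broadcasts and bootstraps newly joined nodes,
    // whose digest is initially empty.
    List<String> entriesToPullFrom(Map<String, Long> peerDigest) {
        List<String> behind = new ArrayList<>();
        peerDigest.forEach((switchId, peerVersion) -> {
            long mine = localDigest.getOrDefault(switchId, 0L);
            if (mine < peerVersion) {
                behind.add(switchId);
            }
        });
        return behind;
    }
}
```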

[Scorecard (Abstractions, Performance, Correctness, Availability, Scale): Single Instance, Primary/Standby, Multi-Instance]

A model for state tracking

If you are the observer, Eventual Consistency is the best option

View should be consistent with the network, not with other views

Hiding the complexity

EventuallyConsistentMap<K, V, T>

Plug in your own timestamp generation
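A guess at the shape of such an abstraction, modeled on the EventuallyConsistentMap<K, V, T> named on the slide but not the actual ONOS interface: updates are stamped by a caller-supplied timestamp function, and older-stamped updates lose.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

// Illustrative sketch of an eventually consistent map with pluggable
// timestamps. T orders concurrent updates: last-writer-wins according to
// the timestamp function supplied by the caller.
class EventuallyConsistentMap<K, V, T extends Comparable<T>> {
    private record Entry<V, T>(V value, T timestamp) {}

    private final Map<K, Entry<V, T>> entries = new ConcurrentHashMap<>();
    private final BiFunction<K, V, T> timestampFn; // plug in your own timestamps

    EventuallyConsistentMap(BiFunction<K, V, T> timestampFn) {
        this.timestampFn = timestampFn;
    }

    // Local put: stamp the update, apply it if it is newer, then (in a real
    // system) broadcast (key, value, timestamp) to peers.
    void put(K key, V value) {
        merge(key, value, timestampFn.apply(key, value));
    }

    // Called for local puts and for updates received from peers: updates with
    // an older timestamp are dropped, so all replicas converge.
    void merge(K key, V value, T timestamp) {
        entries.merge(key, new Entry<>(value, timestamp),
                (current, incoming) ->
                        incoming.timestamp().compareTo(current.timestamp()) > 0
                                ? incoming : current);
    }

    V get(K key) {
        Entry<V, T> e = entries.get(key);
        return e == null ? null : e.value();
    }
}
```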

What about other Control Plane state...

● Switch to Controller mapping
● Resource Allocations
● Flows
● Various application-generated state


In all cases, strong consistency is either required or highly desirable

Consistency => Coordination

2PC: all participants need to be available.

Consensus: only a majority needs to participate.
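The availability difference can be stated as a one-line commit rule for each; an illustrative sketch:

```java
// Illustrative sketch of the availability difference:
// 2PC can only commit when every participant answers,
// while consensus commits once a majority has accepted.
class CommitRules {
    // 2PC: a single unreachable participant blocks the commit.
    static boolean twoPhaseCommitCanCommit(int acks, int participants) {
        return acks == participants;
    }

    // Consensus (e.g. Paxos/Raft): a majority of acceptors suffices,
    // so the system tolerates a minority of failures.
    static boolean consensusCanCommit(int acks, int participants) {
        return acks > participants / 2;
    }
}
```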

State Consistency through Consensus

[Diagram: controllers agree on a single replicated log via consensus; scaling out means running many replicated logs side by side]

Scaling Consistency

[Diagram: the state is partitioned into shards, each replicated by its own consensus group]

Consistency Cost

● Atomic updates within a shard are “cheap”
● Atomic updates spanning shards use 2PC
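A sketch of where the cost shows up, assuming hash partitioning of keys across shards (illustrative, not from the talk): an update confined to one shard is a single consensus commit, while an update spanning shards needs 2PC across them.

```java
import java.util.List;

// Illustrative sketch of the consistency cost model: keys are hashed across
// N shards, each backed by its own consensus group. An update touching one
// shard is one consensus round; an update spanning shards needs 2PC.
class ShardedStore {
    private final int numShards;

    ShardedStore(int numShards) {
        this.numShards = numShards;
    }

    int shardOf(String key) {
        return Math.floorMod(key.hashCode(), numShards);
    }

    // "Cheap" path: all keys land in one shard -> one consensus commit.
    // Expensive path: keys span shards -> two-phase commit across shards.
    boolean needsTwoPhaseCommit(List<String> keysInUpdate) {
        return keysInUpdate.stream()
                .map(this::shardOf)
                .distinct()
                .count() > 1;
    }
}
```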

Hiding Complexity

ConsistentMap<K, V>
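A hedged sketch of the kind of contract such a map offers (the interface and helper below are illustrative, not the real ONOS API): because every operation is linearizable, atomic claims like resource allocation are safe across controllers.

```java
import java.util.Optional;

// Illustrative sketch of a strongly consistent map's contract. Every
// operation is linearizable, so atomic read-modify-write primitives like
// putIfAbsent and replace are safe for cluster-wide claims such as
// resource allocation or switch-to-controller mapping.
interface ConsistentMap<K, V> {
    Optional<V> get(K key);
    Optional<V> putIfAbsent(K key, V value);      // atomic claim
    boolean replace(K key, V expected, V update); // atomic compare-and-set
}

class ResourceAllocator {
    private final ConsistentMap<String, String> allocations;

    ResourceAllocator(ConsistentMap<String, String> allocations) {
        this.allocations = allocations;
    }

    // Claim a resource for an owner; exactly one controller succeeds even
    // if several try concurrently, because putIfAbsent is atomic.
    boolean tryAllocate(String resourceId, String owner) {
        return allocations.putIfAbsent(resourceId, owner).isEmpty();
    }
}
```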

To Conclude

● SDN at scale is possible if we can take care of state management

● Abstractions are important

Thank you
