
Stefan Schmid 1

Radia Perlman: Principles

Radia Perlman: “mother of the Internet”
- Contributions to the spanning tree protocol (for network bridges)
- PhD @ MIT, now with Sun Microsystems

Entertaining summary of the memo, have a look: http://www.usenix.org/event/usenix01/invitedtalks/perlman.pdf


Radia Perlman’s Folklore of protocol design

Collect various tricks and “gotchas” in protocol design

“Here are several ways to solve problem X”, with technical explanation of pros/cons

Some “real world” examples

We’ll cover most, not all, “tricks and gotchas”


Simplicity vs. flexibility vs. optimality??

- Is a more complex protocol reasonable?
- Is “optimal” important? (Is an approximation enough, e.g., because things are dynamic anyway?)
- KISS: “The simpler the protocol, the more likely it is to be successfully implemented and deployed.”

Why are protocols overly complex?
- Design by committee (different stakeholders)
- Backward compatibility
- Flexibility: heavyweight Swiss army knife
- Unreasonable striving for optimality
- Underspecification
- Exotic/unneeded features

“Making the simple complicated is commonplace; making the complicated simple, awesomely simple, that’s creativity!”

Charles Mingus


Know the problem you are trying to solve:
- Have at least one well-defined problem in mind
- Then: can other problems be solved without complicating the solution?

Make it Well-defined and Scalable!

Think about scaling:
- Think about what happens if you’re successful and the protocol is used by millions (prevent a “success disaster”)
- Think also about the other extreme: does the protocol make sense in small settings as well?


Operation above capacity

Protocol should degrade gracefully in overload, at least detect overload and complain

How does protocol break and die?

Can’t just die under overload…!

Think about Overload/Failures!


Example: How to design identifiers?

Identifiers: Protocols often contain a field identifying something, e.g., the protocol type.

Two approaches: global or hierarchical
- Highly encoded universal numbers, e.g., upper-layer protocol numbers assigned by IANA: compact and interoperable, but require central administration
- General-purpose object identifiers, as in ASN.1 (Abstract Syntax Notation One): hierarchical structure; not compact (memory, bandwidth, CPU, …), name clashes, but “federated”

e.g., “Next Header” field in IPv6: 1=ICMP, 9=IGP, etc.


SNMP naming

Question: How to name every possible standard object (protocol, data, more...) in every possible network standard?

Answer: ISO Object Identifier tree:
- Hierarchical naming of all objects
- Each branch point has a name and a number

Example: 1.3.6.1.2.1.7.1
  1  ISO (0 = ITU)
  3  ISO-identified organization
  6  US DoD
  1  Internet
  2  management
  1  MIB2
  7  UDP
  1  udpInDatagrams

SNMP: Simple Network Management Protocol (central protocol to monitor network elements)

Object identifiers are also used elsewhere, e.g., for X.509 public key certificates (objects therein...)
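As a rough illustration of hierarchical naming, the sketch below resolves a dotted object identifier one branch point at a time. The table `OID_TREE` and the function `resolve_oid` are hypothetical names introduced here, and the table is a hand-written subset covering only the udpInDatagrams branch, not a real MIB.

```python
# Hypothetical, hand-written subset of the OID tree: only one branch.
OID_TREE = {
    (1,): "iso",
    (1, 3): "identified-organization",
    (1, 3, 6): "dod",
    (1, 3, 6, 1): "internet",
    (1, 3, 6, 1, 2): "mgmt",
    (1, 3, 6, 1, 2, 1): "mib-2",
    (1, 3, 6, 1, 2, 1, 7): "udp",
    (1, 3, 6, 1, 2, 1, 7, 1): "udpInDatagrams",
}


def resolve_oid(dotted):
    """Return the name of every branch point along a dotted OID string."""
    arcs = tuple(int(part) for part in dotted.split("."))
    names = []
    for depth in range(1, len(arcs) + 1):
        prefix = arcs[:depth]
        # Branch points missing from the table are reported by number.
        names.append(OID_TREE.get(prefix, str(prefix[-1])))
    return names
```

Each organization only needs to administer the numbers directly below its own branch point, which is the “federated” property; the price is that the full identifier is much longer than a single assigned protocol number.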


Check out www.alvestrand.no/harald/objectid/top.html

OSI Object Identifier Tree


Assigned Internet Protocol numbers
From RFC 1700:

Decimal  Keyword      Protocol                        References
-------  -------      --------                        ----------
0                     Reserved                        [JBP]
1        ICMP         Internet Control Message        [RFC792,JBP]
2        IGMP         Internet Group Management       [RFC1112,JBP]
3        GGP          Gateway-to-Gateway              [RFC823,MB]
4        IP           IP in IP (encapsulation)        [JBP]
5        ST           Stream                          [RFC1190,IEN119,JWF]
6        TCP          Transmission Control            [RFC793,JBP]
7        UCL          UCL                             [PK]
8        EGP          Exterior Gateway Protocol       [RFC888,DLM1]
9        IGP          any private interior gateway    [JBP]
10       BBN-RCC-MON  BBN RCC Monitoring              [SGC]
11       NVP-II       Network Voice Protocol          [RFC741,SC3]
12       PUP          PUP                             [PUP,XEROX]
13       ARGUS        ARGUS                           [RWS4]
14       EMCON        EMCON                           [BN7]
15       XNET         Cross Net Debugger              [IEN158,JFH2]


Optimize for the common case
- Seen this before…
- Nice example: IPv6 payload (packet) length field

Example: Design for Common Case

- Payload length field: only 2 bytes
- If the packet is longer: payload length = 0, and a 4-byte length field is carried in an IP option
- Designers chose against a 4-byte header field to optimize the common case: 2 bytes are typically enough

Of course, if no alternative is available, it is better to overestimate than to underestimate! (E.g., the IP packet identifier is arguably too small.)
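The length-field rule can be sketched as follows. This is a simplified illustration, not a packet encoder: the function name `encode_payload_length` and its return convention are assumptions made for this example, and the exact accounting of what counts toward the length is ignored.

```python
MAX_16BIT = 0xFFFF      # largest value that fits the 2-byte payload length field
MAX_32BIT = 0xFFFFFFFF  # largest value that fits the 4-byte option field


def encode_payload_length(payload_len):
    """Return (header_field, option_length) for a payload of payload_len bytes.

    option_length is None in the common case and carries the 4-byte length
    only when the payload does not fit into the 2-byte header field.
    """
    if payload_len < 0 or payload_len > MAX_32BIT:
        raise ValueError("payload length must fit into 4 bytes")
    if payload_len <= MAX_16BIT:
        return payload_len, None  # common case: no option needed
    return 0, payload_len         # rare case: header field 0, length in option
```

The common case pays nothing for the rare one: ordinary packets carry only the 2-byte field, and only oversized packets pay for the extra option.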


Forward compatibility
- Think about future changes, evolution
- Make fields large enough
- Reserve some spare bits
- Specify an options field that can be used/augmented later (see the IP length discussion before!)

Compatibility & Use of Parameters

Parameters: yes or no?

Protocol parameters can be useful?
- Designers can’t determine reasonable values
- Tradeoffs exist: leave the parameter choice to users

Parameters can be bad?
- Users (often not well informed) will need to choose values
- Try to make values plug-and-play!


Making systems “robust”: many forms of robustness
- Immediately adapt to failure/change
- Self-stabilization: eventually adapt to failure/change (example: self-stabilizing peer-to-peer overlays like SKIP+, which regain logarithmic degree and diameter from any initially connected overlay network!)
- Byzantine robustness: will work in spite of malicious users
- Maybe better to crash than to degrade when problems occur: signals that a problem exists
- Techniques for limiting the spread of failures

Robustness: Notions

A Polylogarithmic Time Algorithm for Distributed Self-Stabilizing Skip Graphs, PODC 2009.


Missing folklore/advice?


Summary: Implementation principles

Identify, study principles that can guide implementation of network protocols
- Common principles among many protocols
- “Folklore” of protocol design

Synthesis: big picture

Architecture and implementation: both more art than science


Where we are now…

Goals:
- Identify, study common architectural components, protocol mechanisms
- Synthesis: big picture
- Depth: important topics not covered in introductory courses

Overview:
- Signaling
- State
- Multiplexing/Resource Allocation
- Randomization
- Indirection
- Service location
- Network virtualization


Randomization

Randomization is used in many protocols, e.g., to:
- break symmetries (e.g., among symmetric elements)
- desynchronize (e.g., when only one answer is needed)
- “avoid worst cases”
- ... or just make the protocol simpler!

We’ll study examples:
- Shared medium/bus access: Ethernet multiple access protocol
- Router (de)synchronization
- Switch scheduling


Ethernet

Metcalfe’s Ethernet sketch

- Single shared broadcast channel
- 2+ simultaneous transmissions by nodes: interference; only one node can send successfully at a time
- Multiple access protocol: distributed algorithm that determines how nodes share the channel, i.e., determines when a node can transmit
- Inspired by the ALOHANet of Hawaii (first radio network, connecting the Hawaiian islands...), quite efficient (close to 100% at low utilization)
- Initially star topology connected by hub, nowadays switch in center...

TAP: “vampire tap”, “T-piece”, … (“voltage measurement” etc.)

Transceiver: Transmitter and Receiver


Deterministic Algorithms

How to share the medium using deterministic algorithms...?

- Time Division Multiplexing? But how to organize it? What if someone has nothing to send? What if additional hosts are added and removed? Etc.
- Polling?
- Virtual ring?
- Etc.

Randomized is often simpler and more efficient!


Ethernet: uses CSMA/CD

A: sense channel (“CS”)
   if idle then {
       transmit and monitor the channel    // “asynchronous protocol”!
       if another transmission is detected (“CD”) then {
           abort and send jam signal
           update # collisions
           delay as required by exponential backoff algorithm
           goto A
       } else {
           done with the frame
           set collisions to zero
       }
   } else {
       wait until ongoing transmission is over
       goto A
   }

Carrier Sense Multiple Access / Collision Detection


Ethernet’s CSMA/CD: Jam Signal

Jam signal: makes sure all other transmitters are aware of the collision (48 bits)

Why? A starts to send; shortly before A’s signal reaches B, B starts to send:
- B immediately notices the collision and stops; but to make sure A notices the collision too and also stops its transmission, a higher-power signal is needed
- Ethernet limits the spatial extension... (a sender must notice the collision before it has finished sending!)

(Figure: stations A and B on the shared cable.)


Ethernet’s CSMA/CD: Backoff

Exponential backoff algorithm:
- First collision for a given packet: choose K randomly from {0,1}; the delay is K x 512 bit transmission times
- After the second collision: choose K randomly from {0,1,2,3}; after the third, from {0,1,2,3,4,5,6,7}; etc.
- After ten or more collisions: choose K randomly from {0,1,2,3,4,…,1023} (limited scale!)
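The backoff rule can be sketched in a few lines. This is a minimal illustration of the rule as stated on the slide, not an implementation of the Ethernet standard; the function names and the `rng` parameter are assumptions made for this example.

```python
import random

SLOT_BITS = 512    # one backoff slot = 512 bit transmission times
MAX_EXPONENT = 10  # the range stops growing after ten collisions


def backoff_slots(collisions, rng=random):
    """Pick K uniformly from {0, ..., 2**min(collisions, 10) - 1}."""
    if collisions < 1:
        raise ValueError("backoff is only defined after at least one collision")
    exponent = min(collisions, MAX_EXPONENT)
    return rng.randrange(2 ** exponent)


def backoff_delay_bits(collisions, rng=random):
    """Delay before the next attempt, in bit transmission times: K x 512."""
    return backoff_slots(collisions, rng) * SLOT_BITS
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which is convenient when experimenting with collision counts.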


Ethernet’s use of randomization

Resulting behavior: probability of retransmission attempt (equivalently: length of randomization interval) adapted to current load

simple, load-adaptive, multiple access!

more collisions => heavier load (most likely), more nodes trying to send => randomize retransmissions over a longer time interval, to reduce collision probability


Ethernet Comments

- Upper bounding K at 1023 limits the max network size!
- The max spatial extension of Ethernet makes sure the sender with the lowest K value has a good chance to successfully send its entire packet before the next collision
- Could remember the last value of K when we were successful... but rather, each new packet is tried with minimal backoff again! (Analogy: TCP remembers the last value of its congestion window size)
- Q: Why use binary backoff rather than something more sophisticated such as AIMD? A: Simplicity


The bottom line

Why does Ethernet use randomization?

E.g., to desynchronize:

A distributed (=“each host runs the protocol independently”) adaptive algorithm to spread out load over time when there is contention for multiple access channel


Efficiency of Ethernet?

Approximation formulas, e.g.

Eff = 1/(1+5*prop/dur)

where
- prop = max propagation time between two adapters
- dur = time to transmit a packet of max size

Intuition:
- If prop is very small, transmissions are stopped immediately when colliding, so efficient!
- If dur is very large, the channel is used for a long time without collisions, which is efficient again.
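To get a feeling for the approximation, the formula can be evaluated directly. The function name `ethernet_efficiency` is an assumption made for this sketch; `prop` and `dur` are the quantities defined above and must use the same time unit.

```python
def ethernet_efficiency(prop, dur):
    """Approximate efficiency Eff = 1 / (1 + 5 * prop / dur)."""
    if prop < 0 or dur <= 0:
        raise ValueError("prop must be >= 0 and dur must be > 0")
    return 1.0 / (1.0 + 5.0 * prop / dur)
```

Both intuitions show up in the ratio prop/dur: shrinking prop or growing dur pushes the ratio toward zero and the efficiency toward 1.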


Excursion: Medium Access on Wireless Networks: What changes…?

Typical wireless networks…:
- are not full-duplex (just one channel...)
- nodes cannot sense the medium during their own transmissions (just one antenna...)
- have no bounded propagation domain
- are multihop (hidden and exposed terminal problems):

A B C

Hidden terminal: C does not notice that B is currently receiving a transmission from A => no “remote carrier sense”

A B C

Exposed terminal: B sends to A, and C wants to send to someone on its right: C waits because it hears B, but B’s signal would not reach C’s recipient, so C could actually send! => inefficient


Excursion: Medium Access on Wireless Networks?

Therefore, CD is often replaced by (best effort) Collision Avoidance (CA)

Side note: still ongoing research, e.g., there are randomized distributed medium access protocols which optimally coordinate medium access probabilities and exploit the unpredictable non-jammed time periods (jamming may be due, e.g., to external interference); one example is the Jade protocol.

A Jamming-Resistant MAC Protocol for Multi-Hop Wireless Networks, DISC 2010.


Randomization to avoid synchronization!

Phenomenon: many apparently independent processes synchronize over time

Classic example: 17th century (Huygens)
- Two pendulums synchronize if attached to the same wall!
- Try putting two metronomes on the same floor...

Similar phenomena: blinking of fireflies; road traffic and car kinetics (one car reduces speed: collective decrease in flow); TCP window increase/decrease cycles in the presence of a shared bottleneck gateway; client/server scenarios where the server is busy; etc.


Youtube!

http://www.youtube.com/watch?v=tlYIyKic3w8



Fireflies...


- Routing messages can get synchronized over time!
- Emergent phenomenon: no synchronization up to a certain scale, and then fully synchronized!
- Can result in long delays...
- Randomization can help, but quite a lot is needed!


(de)Synchronization of periodic routing updates

Periodic losses observed in end-to-end Internet traffic at 90-second intervals
- Ping messages to Harvard and MIT (1-second intervals)
- Round-trip times in figure: losses shown as negative RTT

Why?

IGRP routing updates: routers could not forward other packets while large routing updates were processed; similar phenomena with RIP...
Found paths with 318 sec / 45 sec / 15 sec spikes...


Router Updates

A simplified model (for EGP, IGRP, RIP, etc.):

Routers transmit routing messages at periodic intervals (ensures consistent tables even after losses)

1. A router prepares and sends its routing message. In the absence of incoming routing messages, the router resets its timer Tc seconds after Step 1 begins, where Tc is the time to process an outgoing or incoming message. Other nodes receive this router’s message after Td seconds.

2. If a router receives an incoming routing message while preparing its own outgoing routing message, it also processes the incoming routing message, which takes another Tc seconds.

3. After Steps 1 and 2, the router sets its timer, which expires after a time drawn uniformly from the interval [Tp-Tr, Tp+Tr], where Tr describes the random fluctuation (e.g., OS overhead). When the timer expires, the router goes back to Step 1.

4. If a router receives a message after its timer has been set, the routing message is processed immediately. If it is a triggered update (e.g., link failure), the router goes directly to Step 1 without waiting for the timer to expire.
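The coupling in this model can be explored with a small simulation. The sketch below is a deliberately simplified reading of the model and not the simulator behind the lecture's figures: it assumes Td = 0, ignores triggered updates, and lets all routers of a cluster reset their timers at the same instant. The function name `simulate_clusters` and its parameters are assumptions made for this example.

```python
import random


def simulate_clusters(expiry, tp, tr, tc, steps, rng):
    """Run `steps` busy periods and return the size of each cluster.

    expiry: initial timer expiry time of each router.
    tp, tr: timers are redrawn uniformly from [tp - tr, tp + tr].
    tc:     time to process one outgoing or incoming routing message.
    """
    expiry = list(expiry)
    sizes = []
    for _ in range(steps):
        order = sorted(range(len(expiry)), key=lambda r: expiry[r])
        # The first router to time out starts a busy period of length tc.
        busy_end = expiry[order[0]] + tc
        cluster = [order[0]]
        for r in order[1:]:
            if expiry[r] >= busy_end:
                break
            # A timer expiring inside the busy period adds one more message
            # to process, which extends the busy period by another tc.
            cluster.append(r)
            busy_end += tc
        # Simplification: all cluster members reset their timers together.
        for r in cluster:
            expiry[r] = busy_end + rng.uniform(tp - tr, tp + tr)
        sizes.append(len(cluster))
    return sizes
```

With `tr = 0` the model is deterministic: routers that start together stay in one cluster forever, while routers that start far apart never meet. Varying `tr` relative to `tc` and the number of routers is one way to experiment with when clusters form or break up.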


Router Update Operation:

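(State diagram, recoverable labels:)
- State “prepare own routing update” (time: Tc); on “receive update from neighbor”: process (time: Tc) and stay in this state. The time spent in this state depends on the messages received from others (weak coupling between the routers’ processing).
- Transition <ready>: send update (time: Td to arrive at dest), start_timer (uniform: Tp +/- Tr)
- State “wait”; on “receive update from neighbor”: process
- Transition “timeout, or link fail update”: back to “prepare own routing update”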


Router Synchronization

Simulation: 20 routers broadcasting updates to each other
- x-axis: time until routing update sent relative to start of round
- By t=100,000 all router rounds are of length 120 and synchronized! Yields long delays... (20*Tc instead of Tc seconds!)
- Synchronization or lack thereof depends on system parameters… (e.g., crucially on network size according to the paper)
- Often a robust trend toward or away from synchronization...


Details

Blowup of previous graph

Note expansion of computation phase → increased period

A’s timer expires and A begins to send its message, but before A finishes, B’s timer expires. A needs to process B’s message as well before resetting its timer, so this takes time 2*Tc: A and B are synchronized and become a cluster... (for Td=0)

Desynchronization due to some random event...

(Figure labels: short interval, long interval)


Sync

Coupled routers

Example of spontaneous synchronization
- fireflies
- sleep cycle
- heart beat
- etc.

Steven Strogatz. Sync, Hyperion Books, 2003.


Avoiding Synchronization?

- Enforce max time spent in prepare state
- Make things independent of external events (e.g., spec of RIP)? Problem: if initially in sync, never desync...
- Choose random timer component Tr large

(Figure: router update state diagram, same as on the “Router Update Operation” slide.)


Router (de)synchronization

One use of randomization:

Desynchronization of routers!

Our model was simplistic: ignores collisions, Ethernet retransmissions, etc.


Randomization in Reliable Multicast

Reliable Multicast (RM): how to transfer data “reliably” from source(s) to R receivers.

“Like in real life”: all current RM error and congestion control approaches have an analogy in human-human communication


Scalability: Feedback Implosion

(Figure: every receiver sends an ACK back to the single sender.)

If all receivers ACK immediately upon reception, the sender has to process a large number of messages!

Smart and scalable reliable MC?


Reliable Mcast

Thus, we can distinguish between two main types of multicasts:

Sender-oriented multicast: how to implement? Pro and con?

Receiver-oriented multicast: how to implement? Pro and con?

What is better? Two sample implementations next...


Sender Oriented Reliable Mcast

Sender:
- mcasts all (re)transmissions
- selective repeat on loss (only the lost packet is resent, but it is mcast to all again)
- timers for loss detection
- (positive) ACK table: for each packet, a list of who has ACKed already
- pkt removed when all ACKs are in

Rcvr:
- ACKs received pkts

Note: group membership is important (sender needs to know…)

(Figure: sender multicasts to receivers; one copy is lost (X); the other receivers return ACKs.)

How to do it reliably with less burden at the sender?!

Burden: ACK lists, timers, ACK implosion, ...

Without ACKs?!


(simple) Rcvr Oriented Reliable Mcast

Sender:
- mcasts (re)transmissions
- selective repeat (but to all)
- responds to NAKs
- Problem: when to stop buffering a pkt? (The sender does not know who is there and interested!)

Rcvr:
- NAKs (unicast to sender) missing pkts (e.g., gap in seq numbers)
- timer to detect lost retransmission

Note: easy to allow joins/leaves: no list at sender

(Figure: sender multicasts to receivers; one copy is lost (X) and that receiver returns a NAK.)


Receiver- vs Sender-oriented RM: Observations? (Dis)Advantages?

Rcvr-oriented: shift recovery burden to rcvrs
- loss detection “responsibility”, timers
- scaling: protocol computational resources grow as R (# receivers) grows (“receivers scale, sender does not!”)
- weaker notion of “group” (no explicit lists at server…)
- also cool: receivers can transparently choose their own, individual reliability semantics!

but …
- when does the sender “release” data rcvd by all?
- heartbeat needed to detect a lost last pkt (receivers won’t notice a lost last packet: no gap in seq numbers…)


Evaluation of Approaches

Let’s examine resource requirements!

Processing requirements:
- expected time to process a pkt
  • at sender: X, E[X]
  • at rcvr: Y, E[Y]
- mean value approach

Network requirements


Processing in Sender-Initiated Protocol

For mcast, the sender must:
- Obtain data from higher layers (app)
- Construct packet
- Set timer
- Process every ACK for each packet and receiver
- Timer interrupts and context switches...
- If error: rebroadcast, set timer again...


Assumptions for Analysis

- one sender, R receivers
- computational load matters!
- independent errors (not true for spanning tree propagation!), loss probability p per rcvr
- lossless signaling (okay: short ACKs are less likely to get lost, and sometimes get a “better service”)

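M = total number of transmissions per packet:

$$P[M \le m] = (1 - p^m)^R, \quad m = 1, 2, \ldots$$

$$E[M] = \sum_{m=1}^{\infty} \left(1 - (1 - p^{m-1})^R\right)$$

Explanation: the probability that none of the m transmissions arrives at a given rcvr is $p^m$, so it works for that rcvr with probability $1 - p^m$; the probability that it works for all R rcvrs is the product, $(1 - p^m)^R$.

For the expectation: $E[M] = \sum_m m \cdot P[M = m]$; counting each term multiple times gives $E[M] = \sum_{m \ge 0} P[M > m]$, which is the sum above after shifting the index by one.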
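The expectation can be evaluated numerically by truncating the tail sum E[M] = sum over m >= 0 of P[M > m]. The function below is a sketch for experimentation; its name and the cutoff parameters `tol` and `max_terms` are assumptions made for this example.

```python
def expected_transmissions(p, receivers, tol=1e-12, max_terms=100_000):
    """Approximate E[M] for loss probability p and R = receivers."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    if receivers < 1:
        raise ValueError("need at least one receiver")
    total = 0.0
    for m in range(max_terms):
        # P[M > m] = 1 - P[M <= m] = 1 - (1 - p**m)**R
        tail = 1.0 - (1.0 - p ** m) ** receivers
        if tail < tol:
            break  # remaining terms are negligible
        total += tail
    return total
```

For a single receiver the result reduces to the familiar geometric mean 1/(1-p); for more receivers the expected number of transmissions grows, since every receiver must get the packet.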


Analysis…

E.g.:


Sender vs Receiver: Simulation

Metric: rcvr-oriented throughput / sender-oriented throughput

- The sender is the bottleneck (see paper), so sender throughput = overall system throughput
- Much better throughput in receiver-oriented MC
- Especially for many receivers (scales better) and low error probability (hardly any NAKs…)
- In many-to-many multicasts, less so…

(Figure, two panels: “One-to-Many Comparison” and “Many-to-Many Comparison”. x-axis: No. Receivers, 0 to 1000; y-axis: throughput ratio; one curve each for p=0.01, p=0.05, p=0.10, p=0.25.)

RM: Coping with Scale, Heterogeneity

Issues:
- avoid feedback implosion in reverse path
- avoid receiving unneeded data (retrans.) in forward path
- recover data quickly, avoid long repair times

Techniques:
• feedback suppression
• local recovery: “local retransmission”

How to do even better?


Feedback Suppression

- randomly delay NAKs
- “listen” to NAKs generated by others
- if no NAK for the lost pkt has been heard when the timer expires, multicast a NAK

Widely used in RM

Tradeoffs:
- reduces bandwidth, especially with correlated errors (e.g., along a spanning tree)
- but: additional complexity at receivers (timers, etc.), maybe higher delay…

(Figure: two receivers lose the packet (X X); a single NAK travels back to the sender.)
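How many NAKs still get through depends on the random delay range and on how long a NAK takes to reach the other receivers. The sketch below counts NAKs for one lost packet under simplifying assumptions that are not from the slides: all receivers that lost the packet detect the loss at the same time, draw their delay uniformly from [0, max_delay], and hear each other's NAK after the same propagation delay. The function name `naks_sent` and its parameters are hypothetical.

```python
import random


def naks_sent(num_losers, max_delay, prop_delay, rng):
    """Count NAKs multicast for one lost packet under random-delay suppression."""
    if num_losers == 0:
        return 0
    timers = sorted(rng.uniform(0.0, max_delay) for _ in range(num_losers))
    first = timers[0]
    # The earliest NAK is heard by everyone prop_delay later; any receiver
    # whose timer fires before that moment has not been suppressed yet.
    return 1 + sum(1 for t in timers[1:] if t < first + prop_delay)
```

A larger `max_delay` relative to `prop_delay` suppresses more NAKs but delays the repair, which is the tradeoff named above.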


Feedback Suppression: Performance Gains

Metric: throughput with suppression / throughput without suppression

- With high error rates, suppression helps more, and the gain scales almost linearly in the number of receivers!
- Gain/loss depends on whether 1-many or many-many (receivers are also senders, additional complexity now matters, etc.)

(Figure, two panels: “One-to-Many Comparison” and “Many-to-Many Comparison”. x-axis: No. Receivers, 0 to 1000; y-axis: throughput ratio; one curve each for p=0.01, p=0.05, p=0.10, p=0.25.)

Local Recovery in SRM

Allow a rcvr to recover a lost pkt from a “nearby” rcvr
- “ask your neighbor”: send localized NAK (repair request)
- multicast: randomize local repair transmission time to avoid too many replies
- orthogonal (complementary) to feedback suppression

Who to recover from?
- don’t want the repair request to go to everyone
- scoping: how to restrict how far the request will travel: IP time-to-live field

Another idea: fix locally!


Local Recovery: Example

- R2 detects lost pkt
- multicasts repair request with limited scope: not seen by R4
- R1 and R3 have the pkt
- R3 times out first and sends the repair

(Figure: receivers R1 to R4; R2’s NAK reaches R1 and R3 but not R4; R3 sends the repair.)


Reliable multicast (SRM)

Use of randomization:
- avoid synchronizing all replies, to reduce feedback implosion
- in local recovery, to reduce the number of retransmissions of the same message
- could scale the randomization interval to be load-adaptive…


Sidenote: Multicast vs N Unicasts

Multicast “group concept” (e.g., IP multicast, indirection) preferable to N unicasts (e.g., N TCP connections):
- no redundant transmissions over a link

Challenges for (reliable) multicast:
- “fate-sharing” is clear in unicast: either sender or receiver must detect and recover from errors (e.g., in TCP: the sender); but in multicast, receivers can come and go at any time?
- smart round-trip time estimate with heterogeneous receivers?? (clear in unicast...)
- congestion window size?

=> receiver-based is often better... (e.g., IP multicast, RSVP)


To be continued next week…
