
Networks and Operating Systems Chapter 9 – Link Layer

(252-0062-00)

Donald Kossmann & Torsten Höfler

Frühjahrssemester 2013

© Systems Group | Department of Computer Science | ETH Zürich

Last time

• Network layer
  – Services, addressing, structure

• Routing protocols
  – Link-state protocols (e.g., OSPF)
  – Distance-vector protocols (e.g., BGP, which is strictly a path-vector protocol)

Link Layer: setting the context

• Two or more physically connected devices – host-router, router-router, host-host

• unit of data is a frame




[Figure: the protocol stack (Application, Transport, Network, Link, Physical) at two nodes. Going down the stack, the data gains a transport header (HT), a network header (HN), and a link header (HL). The resulting frame (Data HT HN HL) crosses the physical link between the nodes' network interfaces under the data link protocol.]

Link Layer Services

• Basic problem: you can’t just send IP datagrams over the link!

• Encoding: How to represent the bits (optically, electrically, …)
• Framing: Where do the datagrams start and end?
  – Byte-level framing
  – Bit-level framing
  – Clock-based framing
• Error Detection: How to know if the datagram is corrupted?
  – Checksums, CRCs
  – Error Correction: Can we fix an error?
• (Flow Control: Can we adjust the rate properly?)
• Media Access: What if 3 or more machines share a link?

Aside: The End-to-End Argument

“The function in question can completely and correctly be implemented only with the knowledge and help of the application standing at the end points of the communication system. Therefore, providing that questioned function as a feature of the communication system itself is not possible. (Sometimes an incomplete version of the function provided by the communication system may be useful as a performance enhancement.)”

“End-to-end arguments in system design”, J.H. Saltzer, D.P. Reed and D.D. Clark. ACM Transactions on Computer Systems, vol. 2, no. 4, November 1984, pages 277-288.

Encoding

How to represent the bits?

Encoding: The Problem

• Suppose you can send high and low discrete signals
• How to send (represent) bits?
• First option: Non-Return to Zero (NRZ) encoding

Example bit sequence: 0 0 1 0 1 1 1 1 0 1 0 0 0 0 1 0 0

• Problem: long runs of zeros or ones
  – Baseline wander
  – No clock recovery


Alternative: Non Return to Zero Inverted (NRZI)

• Signal transitions for a “1”, stays for a “0”
  – Solves half the problem!
  – Long runs of “0” still an issue
• But still useful – in two slides’ time…

[Figure: NRZI waveform for the bit sequence 0 0 1 0 1 1 1 1 0 1 0 0 0 0 1 0 0]

Manchester encoding

• Encode “1” as hi→lo transition, “0” as lo→hi
• + Each bit has a transition: no loss of sync
• - Requires double the frequency (baud rate)
  – N.B. Baud rate is different from bit rate!

[Figure: Manchester waveform for the bit sequence 0 0 1 0 1 1 1 1 0 1 0 0 0 0 1 0 0]
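The three encodings can be compared with a small sketch. It models a signal as a list of levels (1 = high, 0 = low); the function names and the assumption that NRZI starts from a low level are illustrative choices, not part of the slides.

```python
def nrz(bits):
    """NRZ: the signal level is simply the bit value."""
    return [int(b) for b in bits]

def nrzi(bits, level=0):
    """NRZI: toggle the level for a 1, keep it for a 0 (start level assumed low)."""
    out = []
    for b in bits:
        if b == "1":
            level ^= 1
        out.append(level)
    return out

def manchester(bits):
    """Manchester as on the slide: 1 = high then low, 0 = low then high."""
    out = []
    for b in bits:
        out += [1, 0] if b == "1" else [0, 1]
    return out

bits = "00101111010000100"
print(nrz(bits))
print(nrzi(bits))
print(manchester(bits))  # twice as many signal elements as bits
```

Manchester emits two signal elements per bit, which is the doubled baud rate mentioned above; NRZI still produces a flat signal for a run of zeros.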


4B/5B encoding

• Key idea: break up long sequences by inserting bits
• Encode every 4-bit symbol as 5 bits
  – at most 1 leading zero
  – at most 2 trailing zeroes
• Send using NRZI
  – 80% efficiency, vs 50% for Manchester
• Can use other symbols for control information
  – 11111 idle
  – 00000 dead
  – 00100 halt

4-bit symbol   5-bit code
0000           11110
0001           01001
0010           10100
0011           10101
0100           01010
0101           01011
0110           01110
0111           01111
1000           10010
1001           10011
1010           10110
1011           10111
1100           11010
1101           11011
1110           11100
1111           11101
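As a rough illustration, the table can be turned into a lookup-based encoder. The names `CODE_4B5B`, `encode_4b5b`, and `longest_zero_run` are hypothetical helpers; the code words are copied from the table.

```python
CODE_4B5B = {
    "0000": "11110", "0001": "01001", "0010": "10100", "0011": "10101",
    "0100": "01010", "0101": "01011", "0110": "01110", "0111": "01111",
    "1000": "10010", "1001": "10011", "1010": "10110", "1011": "10111",
    "1100": "11010", "1101": "11011", "1110": "11100", "1111": "11101",
}

def encode_4b5b(bits):
    """Replace each 4-bit symbol by its 5-bit code word."""
    if len(bits) % 4:
        raise ValueError("input length must be a multiple of 4")
    return "".join(CODE_4B5B[bits[i:i + 4]] for i in range(0, len(bits), 4))

def longest_zero_run(bits):
    """Length of the longest run of consecutive zeros."""
    return max(len(run) for run in bits.split("1"))

print(encode_4b5b("00000000"))                    # data that is all zeros
print(longest_zero_run(encode_4b5b("00000000")))
```

Because every code word has at most one leading and at most two trailing zeros, two adjacent code words never produce more than three zeros in a row, which is what keeps NRZI transitions frequent.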

Framing

How to tell where the messages start and end?

Framing: Where do frames start and end?

Example: suppose the 1st byte gives the length of the frame (example from Tanenbaum)

Without errors (the first byte of each frame is the length byte):
  5 1 2 3 4 | 5 6 7 8 9 | 8 0 1 2 3 4 5 6 | 8 7 8 9 0 1 2 3
  5-character frame | 5-character frame | 8-character frame | 8-character frame

Single error (the second length byte is corrupted from 5 to 7):
  5 1 2 3 4 | 7 6 7 8 9 8 0 | 1 2 3 4 5 6 8 7 8 9 0 1 2 3
  5-character frame | 7-character frame (wrong!) | receiver has lost the frame boundaries

• Framing should not rely on a character count!
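The loss of synchronization can be reproduced with a short sketch that splits the example stream by its length bytes. `split_by_count` is a hypothetical helper; the two streams are the ones from the example.

```python
def split_by_count(stream):
    """Split a stream into frames, trusting the first byte of each frame as its length."""
    frames = []
    i = 0
    while i < len(stream):
        n = stream[i]
        if n == 0:          # a zero length would never advance; stop instead
            break
        frames.append(stream[i:i + n])
        i += n
    return frames

good = [5, 1, 2, 3, 4, 5, 6, 7, 8, 9, 8, 0, 1, 2, 3, 4, 5, 6, 8, 7, 8, 9, 0, 1, 2, 3]
bad = list(good)
bad[5] = 7                  # single error in the second length byte

print([len(f) for f in split_by_count(good)])
print([len(f) for f in split_by_count(bad)])
```

After the corrupted length byte, every later "length byte" is really a data byte, so the receiver never finds the true frame boundaries again.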

Point to Point Protocol (PPP)
Example of byte-level framing

• Byte-oriented protocols: view each frame as a collection of bytes (characters)
• Scenario: Point-to-point data link
  – dialup link, ISDN line, serial cable, ADSL, TCP tunnel, etc.
• Bidirectional unicast link:
  – only two machines
• No sharing:
  – no Media Access Control
  – no explicit MAC addressing

PPP Design Requirements [RFC 1547]

• packet framing
  – encapsulation of network-layer datagram in data link frame
  – carry network layer data of any network layer protocol (not just IP) at same time
  – ability to demultiplex received frames to the right network-layer protocol
• bit transparency: must carry any bit pattern in the data field
• error detection (no correction)
• connection liveness: detect, signal link failure to network layer
• network layer address negotiation: endpoints can learn/configure each other’s network addresses
• No error correction/recovery, flow control, in-order delivery
  – all relegated to higher layers

PPP Data Frame

• Flag: delimiter (framing)
• Address: does nothing (only one option)
• Control: does nothing; in future possible multiple control fields
• Protocol: upper layer protocol to which frame delivered (e.g. PPP-LCP, IP, IPCP, etc.)
• info: upper layer data being carried
• check: cyclic redundancy check for error detection

Field   flag      address   control   protocol   info (payload)  check      flag
Value   01111110  11111111  00000011                                        01111110
Size    1 byte    1 byte    1 byte    1-2 bytes  variable        2-4 bytes  1 byte

Byte stuffing in PPP

• Problem: need for “data transparency”:
  – What if payload contains <01111110>?
  – If receiver sees <01111110>, is it data or flag?
• Solution:
  – Sender: adds (“stuffs”) extra flag after every flag byte in the payload
  – Receiver:
    • replaces 2 successive flags with a single flag payload byte
    • interprets single flag byte as a real flag

[Figure: a payload containing a flag byte (payload | flag | payload) is transmitted with that flag byte doubled (payload | flag | flag | payload)]

Alternative byte stuffing: Escape bytes

Original bytes     After stuffing
A flag B           A esc flag B
A esc B            A esc esc B
A esc flag B       A esc esc esc flag B
A esc esc B        A esc esc esc esc B
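A minimal sketch of escape-byte stuffing as in the examples above. The byte values are assumptions: 0x7E is the flag pattern 01111110 used earlier, and 0x7D is picked here as the escape byte. This sketch only prefixes an escape byte and does not try to reproduce the full PPP rules.

```python
FLAG = 0x7E  # 01111110
ESC = 0x7D   # assumed escape value for this sketch

def stuff(payload: bytes) -> bytes:
    """Prefix every flag or escape byte in the payload with an escape byte."""
    out = bytearray()
    for b in payload:
        if b in (FLAG, ESC):
            out.append(ESC)
        out.append(b)
    return bytes(out)

def unstuff(data: bytes) -> bytes:
    """Drop each escape byte and keep the byte that follows it verbatim."""
    out = bytearray()
    it = iter(data)
    for b in it:
        if b == ESC:
            b = next(it)  # raises StopIteration if the data ends with a lone escape
        out.append(b)
    return bytes(out)

frame_body = stuff(bytes([0x41, ESC, FLAG, 0x42]))  # "A esc flag B"
print(frame_body.hex(" "))
print(unstuff(frame_body).hex(" "))
```

An unescaped flag byte can then only mean a frame boundary, so the receiver can resynchronize at the next flag even after an error.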

High-level Data Link Control (HDLC)
Example of bit-level framing

• Superficially similar to PPP
• Checksum for error detection
• Bit-oriented protocol ⇒ bit stuffing for framing
  – Sender inserts a “0” after 5 consecutive “1”s (except in flag)
  – Receiver interprets a “0” followed by six “1”s (0111111) as either flag or error

Before stuffing: 011011111111111111110010
After stuffing:  011011111011111011111010010

Field   flag      address  control  payload   check    flag
Value   01111110                                       01111110
Size    8 bits    8 bits   8 bits   variable  16 bits  8 bits
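The stuffing rule can be sketched on bit strings as follows; `bit_stuff` and `bit_unstuff` are hypothetical names, and the input is the example bit pattern from this slide.

```python
def bit_stuff(bits):
    """Insert a 0 after every run of five consecutive 1s."""
    out, run = [], 0
    for b in bits:
        out.append(b)
        run = run + 1 if b == "1" else 0
        if run == 5:
            out.append("0")
            run = 0
    return "".join(out)

def bit_unstuff(bits):
    """Remove the 0 that follows every run of five consecutive 1s."""
    out, run, i = [], 0, 0
    while i < len(bits):
        out.append(bits[i])
        run = run + 1 if bits[i] == "1" else 0
        if run == 5:
            i += 1      # skip the stuffed 0
            run = 0
        i += 1
    return "".join(out)

data = "011011111111111111110010"
print(bit_stuff(data))
print(bit_unstuff(bit_stuff(data)) == data)
```

Since the stuffed payload never contains six 1s in a row, the flag 01111110 cannot appear inside it.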

HDLC (contd)

• Large family of similar protocols
  – SDLC, HDLC, LAP, LAPB, etc.
• 3 types of frames:
  – Information (most frames, control information piggybacked)
  – Supervisory (flow and error control)
  – Unnumbered (all kinds of misc. things)
• Sliding window protocol, 3-bit sequence number
  – ⇒ 7 unacknowledged outstanding frames
  – Piggybacked acknowledgements (‘next’ field)
• “Poll/Final” bit: for multiple terminals
  – “P”: used for polling terminals / sending data from one
  – “F”: indicates final frame from polled terminal

Synchronous Optical NETwork (SONET)
Example of clock-based framing

• The dominant standard for long-distance data transmission over optical networks
• No bit- or byte-stuffing ⇒ all frames have the same duration: 125 µs
  – STS-1 ~ 51.84 Mbps, 810-byte frames
  – STS-192 ~ 10 Gbps, 155,520-byte frames
• Framing is clock based:
  – Flag word: first 16 bits of each frame
  – Receiver looks for regular flag every 125 µs
• Encoding: NRZ but scrambled
  – XOR-ed with 127-bit pattern
• Lots of other complexity!
  – e.g. 64 bytes reserved for voice channel, etc.
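The quoted rates follow directly from the frame size and the fixed 125 µs frame time. A quick check, with `line_rate_mbps` as a hypothetical helper:

```python
FRAME_TIME_US = 125  # every SONET frame lasts 125 microseconds

def line_rate_mbps(frame_bytes):
    """Bits per microsecond equals megabits per second."""
    return frame_bytes * 8 / FRAME_TIME_US

print(line_rate_mbps(810))      # STS-1
print(line_rate_mbps(155_520))  # STS-192, roughly 10 Gbps
```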


Error Detection

How to tell if bits have been lost or changed?

Error Detection

• EDC = Error Detection and Correction bits (redundancy)
• D = Data protected by error checking, may include header
• Error detection is not 100% reliable!
  – protocol may miss some errors, but rarely
  – larger EDC field yields better detection and correction

[Figure: the sender appends EDC to the d data bits D of a datagram; D and EDC cross a link with bit errors and arrive as D’ and EDC’; the receiver checks “All bits in D’ OK?” and passes the datagram on (yes) or flags an error (no)]

Simple single-bit parity

• E.g. 1-bit odd parity checking:

d data bits         parity bit
0111000110101011    0
0110001010011101    1
0000000100100100    0

• Number of 1’s in the data+parity should always be odd
• Detect single-bit errors

Two-dimensional bit parity

• Detect and correct single-bit errors

                                 row parity
d1,1     …   d1,j                d1,j+1
d2,1     …   d2,j                d2,j+1
…        …   …                   …
di,1     …   di,j                di,j+1
di+1,1   …   di+1,j              di+1,j+1      (last row: column parity)

Example (last column: row parity; last row: column parity):

No errors:      Single-bit error:
10101 1         10101 1
11110 0         10110 0    ← row parity error
01110 1         01110 1
00101 0         00101 0
                (column parity error in the second column)
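A small sketch of how the row and column parities locate a single flipped bit. It uses even parity, which is what the example block works out to; `parity_2d` and `locate_error` are hypothetical names.

```python
def parity_2d(rows):
    """Append an even-parity bit to each row, then an even-parity row for the columns."""
    data = [[int(c) for c in r] for r in rows]
    with_row_parity = [r + [sum(r) % 2] for r in data]
    column_parity = [sum(col) % 2 for col in zip(*with_row_parity)]
    return with_row_parity + [column_parity]

def locate_error(block):
    """Return (row, column) of a single-bit error, or None if all parities hold."""
    bad_rows = [i for i, r in enumerate(block) if sum(r) % 2]
    bad_cols = [j for j, c in enumerate(zip(*block)) if sum(c) % 2]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0], bad_cols[0]
    raise ValueError("more than one bit in error; cannot correct")

block = parity_2d(["10101", "11110", "01110"])
block[1][1] ^= 1                 # flip one bit, as in the example
row, col = locate_error(block)
block[row][col] ^= 1             # correct it
print(row, col, locate_error(block))
```

The failing row and the failing column intersect at exactly one bit, which is why a single error can be corrected and not just detected.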

Cyclic Redundancy Check (CRC)

• Polynomials with binary coefficients: $b_k x^k + b_{k-1} x^{k-1} + \dots + b_0 x^0$
• Order of polynomial: max $i$ with $b_i \neq 0$
• Binary coefficients $b_i$ (0 or 1) form a field with operations: “+” (XOR) “•” (AND)

• Pick a generator polynomial: G(x)
• Let the whole frame (D+EDC) be polynomial T(x)
• Idea: fill EDC (CRC) field such that:
    T(x) mod G(x) = 0

Cyclic Redundancy Check (CRC)

• How to divide with polynomials?
• Example with $G(x) = 1x^2 + 0x^1 + 1x^0$ = 101; D = 111011:

    11101100 / 101 = 110110, remainder 10
    100
     011
      111
       100
        010

• Idea:
  – generate T’(x) with EDC’ = 00 (i.e., T’(x) = D concat EDC’)
  – compute EDC = T’(x) mod G(x)
  – generate T(x) with EDC (i.e., T(x) = D concat EDC)
• Calculating and testing CRC is the same operation
  – can be implemented efficiently in hardware (XOR, AND, shift ops)
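The division can be sketched in a few lines using XOR on integers. `mod2_remainder` and `crc_frame` are hypothetical names; the bit strings are the ones from the example (D = 111011, G = 101).

```python
def mod2_remainder(bits, gen):
    """Remainder of the polynomial division bits / gen with XOR arithmetic."""
    r = len(gen) - 1                 # order of G(x) = number of EDC bits
    g = int(gen, 2)
    rem = 0
    for b in bits:
        rem = (rem << 1) | int(b)    # bring down the next bit
        if rem >> r:                 # leading bit set: subtract (XOR) G(x)
            rem ^= g
    return format(rem, f"0{r}b")

def crc_frame(data, gen):
    """Build T(x): the data followed by the remainder of (data + zeros) / gen."""
    r = len(gen) - 1
    return data + mod2_remainder(data + "0" * r, gen)

frame = crc_frame("111011", "101")
print(frame)
print(mod2_remainder(frame, "101"))  # receiver-side check of the sent frame
```

The sender and the receiver run the same `mod2_remainder` routine: the sender to fill the EDC field, the receiver to test that the remainder is zero.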

Notes

• Why does it work?
  – T’(x) mod G(x) = EDC
  – T(x) mod G(x) = (T’(x) + EDC) mod G(x)
                  = T’(x) mod G(x) + EDC mod G(x)
                  = EDC + EDC = 0
• EDC is always one bit shorter than G(x)

CRC Calculation in Hardware

• Use cyclic shift register with r registers, where r is the order of G(x)
• Example (4 bit G(x); 3 bit EDC): $G(x) = x^3 + x^2 + 1$

[Figure: T(x) is shifted into a chain of registers with XOR gates (+) placed according to G(x)]

⇒ Remainder of the division ends up in the registers

CRC: 11101100 / 101

$G(x) = x^2 + 1$

[Figure: T(x) is shifted into Register 0 and Register 1, with XOR gates (+) placed according to G(x)]

Output   Register 1   Register 0   Input
0        0            0            (initial)
0        0            1            1
0        1            1            1
1        1            0            1
1        0            1            0
0        1            1            1
1        1            0            1
1        0            1            0
0        1            0            0

The Output column, read from top to bottom, is the result of the division; the final register contents (10) are the remainder.
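The register table can be reproduced with a small simulation. This is a sketch under the assumption that the bit leaving the highest register is fed back through an XOR wherever G(x) has a non-zero coefficient; `shift_register_divide` and its `taps` parameter (coefficients of $x^0, x^1, \dots$ below the leading term) are hypothetical names.

```python
def shift_register_divide(bits, taps):
    """Simulate the CRC shift register; regs[0] is Register 0."""
    regs = [0] * len(taps)
    outputs = []
    for ch in bits:
        out = regs[-1]                      # bit leaving the highest register
        outputs.append(out)
        shifted = [int(ch)] + regs[:-1]     # shift the input bit in
        regs = [s ^ (out & t) for s, t in zip(shifted, taps)]
    return outputs, regs

# G(x) = x^2 + 1: coefficient of x^0 is 1, coefficient of x^1 is 0
outputs, regs = shift_register_divide("11101100", taps=[1, 0])
print(outputs)                 # the Output column
print(regs[1], regs[0])        # Register 1, Register 0 after the last bit
```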

CRCs: How to choose G(x)?

• Typical generator polynomial: $G(x) = x^{16} + x^{12} + x^5 + 1$
• Why does G(x) look like this? (What does the EDC look like?)

• Let M(x) be the frame as sent, E(x) the transmission errors, and T(x) the frame as received. Then:
    T(x) = M(x) + E(x)
• So:
    T(x) mod G(x) = (M(x) + E(x)) mod G(x) = M(x) mod G(x) + E(x) mod G(x)
• But recall:
    M(x) mod G(x) = 0

⇒ A transmission error is detected iff E(x) is not divisible by G(x) without remainder

CRCs: How to choose G(x)?

• One can show that G(x) of order r can detect
  – All single-bit errors if G(x) has $x^r$ and $x^0$ terms with non-zero coefficients
  – All double-bit errors if G(x) has a factor of at least three terms
  – Any odd number of errors if G(x) contains the factor (x+1)
  – Any burst of errors of length k ≤ r
  – Any burst of errors of length greater than r+1 bits with probability $1 - 2^{-r}$

Media Access Control (MAC)

How can 3 or more machines share a single link?

Multiple Access Links and Protocols

Three types of “links”
• point-to-point (single wire; e.g. PPP)
  – Just seen this
• broadcast (shared wire or medium; e.g. Ethernet, WLAN)
  – Today: how to share a broadcast medium
• switched (e.g. switched Ethernet, ATM)
  – Next lecture: packet switching

Multiple Access Protocols

• Ideal features with a channel of rate R
  – when only one node has data to send: throughput of R bps
  – when M nodes have data to send: throughput of R/M bps each
  – Need a decentralized protocol to do that! How?
• (cannot assume that senders coordinate magically)

Turn-taking Protocols (e.g., Round Robin)

• No master node
• Token-passing protocol
  – Station k sends after station k–1
  – If a station needs to transmit, when it receives the token, it sends up to a max number of frames (m) and then forwards the token
  – If a station does not need to transmit, then it forwards the token
• Questions
  – How efficient is this?
  – How to handle station failures or leaves?
  – How to handle new stations joining?
• Some MAC protocols (e.g. Token Ring, Slotted Ring, etc.) deal with this
• Others use random access (e.g. Aloha, Ethernet)

Random Access Protocols

• When node has packet to send
  – transmit at full channel data rate R
  – no a priori coordination among nodes
• Two or more transmitting nodes ⇒ “collision”
• Random access MAC protocol specifies
  – how to detect collisions
  – how to recover from collisions
    • via delayed retransmissions

Example: Slotted Aloha

• Time is divided into equal size slots
  – A slot is equal to the packet transmission time
• Node with new arriving packet: transmit at start of next slot
• Collision → retransmit packet in future slots with probability p, until successful

[Figure: transmissions of Node 1, Node 2, and Node 3 over a sequence of slots, with collisions and successes marked]

Slotted Aloha (slightly simplified)

• We assume that the stations are fully synchronized.
• At each time slot, each of the n stations sends with probability p.
• $P_1 = \Pr(S_1 \text{ is successful}) = p\,(1-p)^{n-1}$
• $P = \Pr(\text{any station is successful}) = n P_1 = n\,p\,(1-p)^{n-1}$

• Goal: maximize P
  – solve $dP/dp = 0$
  – (condition: $d^2P/dp^2 < 0$)
  – solution: $np = 1$
• So, set $p = 1/n$: $P = (1 - 1/n)^{n-1} \approx 1/e$ (for large n)
• So the effective bandwidth of the channel is not R bps; it is R/e ≈ 0.37 R bps
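A quick numerical check of this result, evaluating the success probability for a few values of p and n; `slotted_aloha_success` is a hypothetical helper implementing $n\,p\,(1-p)^{n-1}$.

```python
import math

def slotted_aloha_success(n, p):
    """Probability that exactly one of n stations sends in a slot."""
    return n * p * (1 - p) ** (n - 1)

n = 10
for p in (0.05, 1 / n, 0.2):
    print(f"n={n}, p={p:.2f}: P={slotted_aloha_success(n, p):.4f}")

# With p = 1/n the success probability approaches 1/e as n grows
for n in (10, 100, 1000):
    print(n, slotted_aloha_success(n, 1 / n), 1 / math.e)
```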

Slotted Aloha vs. Round Robin

– Slotted Aloha does not use every slot of the channel: less efficient than Round Robin
+ What happens in Round Robin when a new station joins? What about more than one new station? Slotted Aloha is more flexible.

Pure (unslotted) Aloha

• Unslotted Aloha: simpler, no synchronization
• Packet needs transmission
  – send without waiting for start of slot
• Collision probability increases:
  – packet sent at $t_0$ collides with packets sent in $(t_0 - 1, t_0 + 1)$

[Figure: node i’s frame starts at $t_0$ and ends at $t_0 + 1$. A frame started in $(t_0 - 1, t_0)$ will overlap with the start of i’s frame; a frame started in $(t_0, t_0 + 1)$ will overlap with the end of i’s frame.]

Pure Aloha analysis

• There are N stations
• Each station transmits with probability p
• For a node i to have a successful transmission means no overlapping transmissions before or after, each with probability $(1-p)^{N-1}$
• So: $P = N\,p\,(1-p)^{2(N-1)}$
• The maximum value for large N is $1/(2e)$
• Half the rate of slotted Aloha!
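The factor of two can be checked numerically. The sketch below evaluates $N p (1-p)^{2(N-1)}$ at its maximum; the closed form $p = 1/(2N-1)$ for the maximizing p is not on the slide and comes from setting the derivative to zero. `pure_aloha_success` and `best_p` are hypothetical names.

```python
import math

def pure_aloha_success(n, p):
    """Success probability N p (1-p)^(2(N-1)) from the analysis above."""
    return n * p * (1 - p) ** (2 * (n - 1))

def best_p(n):
    """Maximizer of pure_aloha_success, from d/dp = 0 (derived, not from the slide)."""
    return 1 / (2 * n - 1)

for n in (10, 100, 1000):
    print(n, pure_aloha_success(n, best_p(n)), 1 / (2 * math.e))
```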

Slotted Aloha vs. Pure Aloha

[Figure: throughput versus offered load G (x-axis from 0.5 to 2.0, y-axis from 0.1 to 0.4), with one curve for Pure Aloha and one for Slotted Aloha]

• A small increase in the channel load, that is G, can drastically reduce its performance
• The protocol constrains effective channel throughput!

Demand Assigned Multiple Access (DAMA)

• Channel efficiency only 37% for Slotted Aloha, and even worse for Aloha.
• Practical systems use reservation whenever possible.
• Reservation
  – a sender reserves a future time-slot
  – sending within this reserved time-slot is possible without collision
  – reservation also causes higher delays
• But: Every scalable system needs an Aloha style component.
  – e.g., what happens if new sender arrives after reservation?

Example Reservation-based Protocol

Distributed Polling
• time divided into slots
• begins with N short reservation slots
  – reservation slot time equal to channel end-end propagation delay
  – station with message to send posts reservation
  – reservation seen by all stations
• after reservation slots, message transmissions ordered by known priority

[Figure: reservation slots 1 0 1 1 0 0, followed by packet transmissions from Node 1, Node 3, and Node 4; then reservation slots 0 1 0 0 0 0, followed by a packet transmission from Node 2]

CSMA: Carrier Sense Multiple Access

Idea of CSMA: listen before transmit!
• If channel sensed idle: transmit entire packet
• If channel sensed busy, defer transmission. Two variants:
  – Persistent CSMA
    • retry immediately with probability p when channel becomes idle (may cause instability)
  – Non-persistent CSMA
    • retry after random interval
• Human analogy
  – Don’t interrupt anybody already speaking

CSMA collisions

• Collisions can still occur: because of propagation delay, two nodes may not hear each other’s transmission
• On a collision, the entire packet transmission time is wasted
• Note: role of distance and propagation delay in determining collision probability

[Figure: space-time diagram of a collision, showing the spatial layout of nodes along the Ethernet]

CSMA/CD (Collision Detection)

CSMA/CD: carrier sensing, as in CSMA
  – collisions detected within short time
  – colliding transmissions aborted, reducing channel wastage
  – persistent or non-persistent retransmission
• collision detection
  – easy in wired LANs: measure signal strengths, compare transmitted, received signals
  – difficult in wireless LANs
• Human analogy (the polite conversationalist)
  1. Don’t interrupt anybody already speaking
  2. If another starts speaking at the same time as you, then back off.

CSMA/CD collision detection


Ethernet

The predominant LAN technology
• cheap: $2 for 100 Mbps!
• first widely used LAN technology
• Simpler/cheaper than token rings and ATM
• Keeps up with speed race: 10 Mbps, 100 Mbps, 1 Gbps, 10 Gbps, 40 Gbps, 100 Gbps…

[Figure: Metcalfe’s original 3 Mbps Ethernet sketch at Xerox PARC, May 22, 1973]

Ethernet Frame Structure

• Sending adapter encapsulates IP datagram (or other network layer protocol packet) in Ethernet frame

Preamble | Dest. address | Source address | Type | Data (payload) | CRC

• Preamble
  – 7 bytes with pattern 10101010
  – Followed by 1 byte with pattern 10101011
  – Used to synchronize receiver, sender clock rates
• Addresses
  – 6 bytes each; the frame is received by all adapters on a LAN and dropped if the destination address does not match
• Type (2 bytes): indicates the higher layer protocol, mostly IP
• CRC (4 bytes): checked at receiver; if an error is detected, the frame is simply dropped
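To make the field layout concrete, here is a sketch that unpacks the two addresses and the type field from the start of a frame. It assumes the preamble has already been stripped (as is usual by the time software sees a frame); `parse_ethernet_header` and the sample frame are made up for illustration, and 0x0800 is the type value for IPv4.

```python
import struct

def parse_ethernet_header(frame: bytes):
    """Return (destination, source, type) from the first 14 bytes of a frame."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return dst.hex(":"), src.hex(":"), ethertype

# Hypothetical frame: broadcast destination, made-up source, type 0x0800 (IPv4)
frame = bytes.fromhex("ffffffffffff" "020000000001" "0800") + b"payload"
dst, src, ethertype = parse_ethernet_header(frame)
print(dst, src, hex(ethertype))
```

An adapter would compare `dst` with its own address (or the broadcast address) and drop the frame on a mismatch.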

Ethernet CSMA/CD algorithm

1. Adapter gets datagram from network layer and creates frame

2. If adapter senses channel idle, it starts to transmit frame. If it senses channel busy, it waits until channel idle and then transmits

3. If adapter transmits entire frame without detecting another transmission, the adapter is done with frame!

4. If adapter detects another transmission while transmitting, it aborts and sends jam signal

5. After aborting, adapter enters exponential backoff: after the nth collision, adapter chooses a K at random from $\{0, 1, 2, \dots, 2^m - 1\}$ where m = min(n, 10). Adapter waits K • 512 bit times and returns to Step 2

Ethernet’s CSMA/CD (more)

Jam Signal
• make sure all other transmitters are aware of collision

Bit time
• 0.1 μsec for 10 Mbps Ethernet, so 512 bit times are 51.2 μsec (2500 m long Ethernet)
• for K=1023, wait time is about 50 msec

Exponential Backoff
• Goal: adapt retransmission attempts to estimated current load
  – heavy load: random wait will be longer
• first collision: choose K from {0,1}; delay is K • 512 bit transmission times
• after second collision: choose K from {0,1,2,3}
• after ten collisions, choose K from {0,1,2,3,4,…,1023}
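The backoff rule is easy to sketch. `backoff_bit_times` and `max_wait_seconds` are hypothetical helpers; the constants 512 and 10 are the ones given in the algorithm.

```python
import random

SLOT_BIT_TIMES = 512   # backoff is counted in units of 512 bit times
MAX_EXPONENT = 10      # m = min(n, 10)

def backoff_bit_times(n, rng=random):
    """Wait after the nth collision: K * 512 bit times, K uniform in 0 .. 2^m - 1."""
    m = min(n, MAX_EXPONENT)
    k = rng.randrange(2 ** m)
    return k * SLOT_BIT_TIMES

def max_wait_seconds(n, bit_rate_bps):
    """Longest possible wait after the nth collision at the given bit rate."""
    m = min(n, MAX_EXPONENT)
    return (2 ** m - 1) * SLOT_BIT_TIMES / bit_rate_bps

for n in (1, 2, 3, 10, 15):
    print(n, backoff_bit_times(n), max_wait_seconds(n, 10e6))
```

For ten or more collisions on 10 Mbps Ethernet the maximum wait is 1023 × 51.2 μs, which is the "about 50 msec" figure above.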


CSMA/CD efficiency


Very Old Ethernet Technologies: 10Base2

• 10: 10 Mbps; 2: under 200 meters maximal cable length
• thin coaxial cable in a bus topology
• repeaters used to connect multiple segments
• repeater repeats bits it hears on one interface to its other interfaces: physical layer device only!
• Rarely seen at all these days

[Figure: nodes whose adapters attach through T-connectors to a coaxial bus ending in a terminator; a transmitted packet travels in both directions]

Hubbed 10BaseT and 100BaseT

• 10/100 Mbps rate; latter a.k.a. “fast ethernet”
• T stands for Twisted Pair
• Nodes connect to a hub: “star topology”; 100 m max distance between nodes and hub
• Hubs are essentially physical-layer repeaters
  – bits coming in on one link go out on all other links
  – no frame buffering
  – no CSMA/CD at hub: adapters detect collisions
  – provides network management functionality

[Figure: nodes connected to a central hub]

Gigabit Ethernet: today’s commodity Ethernet

• Standard Ethernet frame format
  – plus “Jumbo frames”
• point-to-point links and shared broadcast channels
  – CSMA/CD is used for shared mode
  – short distances between nodes to be efficient
  – Full-Duplex at 1 Gbps for point-to-point links
• 10 Gig, 40 Gig Ethernet now available
