Networks and Operating Systems Chapter 9 – Link Layer
(252-0062-00)
Donald Kossmann & Torsten Höfler
Frühjahrssemester 2013
© Systems Group | Department of Computer Science | ETH Zürich
Last time
• Network layer
  – Services, addressing, structure
• Routing protocols
  – Link-state protocols (e.g., OSPF)
  – Distance-vector protocols (e.g., BGP, which is strictly a path-vector protocol)
Link Layer: setting the context
• Two or more physically connected devices – host-router, router-router, host-host
• unit of data is a frame
[Figure: the sending host's stack (Application, Transport, Network, Link, Physical) and the next hop's stack (Network, Link, Physical). Each layer adds its header on the way down: Data, then Data HT, then Data HT HN, then Data HT HN HL. The resulting frame crosses the physical link between the two network interfaces, governed by the data link protocol.]
Link Layer Services
• Basic problem: you can’t just send IP datagrams over the link!
• Encoding: How to represent the bits (optically, electrically, …)
• Framing: Where do the datagrams start and end?
  – Byte-level framing
  – Bit-level framing
  – Clock-based framing
• Error Detection: How to know if the datagram is corrupted?
  – Checksums, CRCs
  – Error Correction: Can we fix an error?
• (Flow Control: Can we adjust the rate properly?)
• Media Access: What if 3 or more machines share a link?
Aside: The End-to-End Argument
“The function in question can completely and correctly be implemented only with the knowledge and help of the application standing at the end points of the communication system. Therefore, providing that questioned function as a feature of the communication system itself is not possible. (Sometimes an incomplete version of the function provided by the communication system may be useful as a performance enhancement.)”
J.H. Saltzer, D.P. Reed and D.D. Clark, “End-to-end arguments in system design”, ACM Transactions on Computer Systems, vol. 2, no. 4, November 1984, pages 277-288.
Encoding
How to represent the bits?
Encoding: The Problem
• Suppose you can send high and low discrete signals
• How to send (represent) bits?
• First option: Non-Return to Zero (NRZ) encoding
• Problem: long runs of zeros or ones
  – Baseline wander
  – No clock recovery
[Figure: NRZ waveform for the bit sequence 0 0 1 0 1 1 1 1 0 1 0 0 0 0 1 0 0]
Alternative: Non Return to Zero Inverted (NRZI)
• Signal transitions for a “1”, stays for a “0”
  – Solves half the problem!
  – Long runs of “0” still an issue
• But still useful – in two slides’ time…
[Figure: NRZI waveform for the bit sequence 0 0 1 0 1 1 1 1 0 1 0 0 0 0 1 0 0]
Manchester encoding
• Encode “1” as hi→lo transition, “0” as lo→hi
• + Each bit has a transition: no loss of sync
• - Requires double the frequency (baud rate)
  – N.B. Baud rate is different from bit rate!
[Figure: Manchester waveform for the bit sequence 0 0 1 0 1 1 1 1 0 1 0 0 0 0 1 0 0]
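The three encodings can be compared on the bit sequence from the figures. The following is a minimal sketch, assuming signal levels are modelled as 0 (low) and 1 (high) and that NRZI starts at the low level; the function names and the `longest_run` helper are illustrative, not part of the lecture.

```python
def nrz(bits):
    """NRZ: the signal level is the bit value itself (1 = high, 0 = low)."""
    return list(bits)


def nrzi(bits, start_level=0):
    """NRZI: toggle the level for a 1, keep the level for a 0."""
    level, signal = start_level, []
    for bit in bits:
        if bit == 1:
            level ^= 1
        signal.append(level)
    return signal


def manchester(bits):
    """Manchester, slide convention: 1 -> (high, low), 0 -> (low, high)."""
    signal = []
    for bit in bits:
        signal.extend((1, 0) if bit == 1 else (0, 1))
    return signal


def longest_run(signal):
    """Length of the longest stretch without a level change."""
    best = run = 0
    previous = None
    for level in signal:
        run = run + 1 if level == previous else 1
        best = max(best, run)
        previous = level
    return best


bits = [0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0]
for name, encode in (("NRZ", nrz), ("NRZI", nrzi), ("Manchester", manchester)):
    signal = encode(bits)
    print(f"{name:10s} longest run = {longest_run(signal)}, samples = {len(signal)}")
```

Manchester never holds a level for more than two half-bit samples, but it needs twice as many samples as bits, which is the doubled baud rate mentioned above.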
4B/5B encoding
• Key idea: break up long sequences by inserting bits
• Encode every 4-bit symbol as a 5-bit code
  – 0 or 1 leading zero
  – <= 2 trailing zeroes
• Send using NRZI
  – 80% efficiency, vs 50% for Manchester
• Can use other symbols for control information
  – 11111 idle
  – 00000 dead
  – 00100 halt
4-bit symbol 5-bit code
0000 11110
0001 01001
0010 10100
0011 10101
0100 01010
0101 01011
0110 01110
0111 01111
1000 10010
1001 10011
1010 10110
1011 10111
1100 11010
1101 11011
1110 11100
1111 11101
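The table can be used directly as a lookup. This is a small sketch, assuming bit strings as input; `encode_4b5b` and `decode_4b5b` are illustrative names. It also checks the property the code is designed for: no encoded stream contains more than three consecutive zeros.

```python
# 4-bit symbol -> 5-bit code, copied from the table above.
FOUR_B_FIVE_B = {
    "0000": "11110", "0001": "01001", "0010": "10100", "0011": "10101",
    "0100": "01010", "0101": "01011", "0110": "01110", "0111": "01111",
    "1000": "10010", "1001": "10011", "1010": "10110", "1011": "10111",
    "1100": "11010", "1101": "11011", "1110": "11100", "1111": "11101",
}
FIVE_B_FOUR_B = {code: symbol for symbol, code in FOUR_B_FIVE_B.items()}


def encode_4b5b(bits: str) -> str:
    """Replace each 4-bit group by its 5-bit code."""
    if len(bits) % 4:
        raise ValueError("input length must be a multiple of 4")
    return "".join(FOUR_B_FIVE_B[bits[i:i + 4]] for i in range(0, len(bits), 4))


def decode_4b5b(code: str) -> str:
    """Inverse mapping; raises KeyError on a control or invalid code."""
    if len(code) % 5:
        raise ValueError("input length must be a multiple of 5")
    return "".join(FIVE_B_FOUR_B[code[i:i + 5]] for i in range(0, len(code), 5))


# Worst case across symbol boundaries: at most three zeros in a row.
worst = max(
    len(max(encode_4b5b(a + b).split("1"), key=len))
    for a in FOUR_B_FIVE_B for b in FOUR_B_FIVE_B
)
print("longest run of zeros:", worst)
```

Because every code contains enough ones, sending the result with NRZI gives regular transitions while spending 5 signal bits per 4 data bits (80% efficiency).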
Framing
How to tell where the messages start and end?
Framing: Where do frames start and end?
Example: suppose the 1st byte gives the length of the frame (example from Tanenbaum)

Without errors:
  5 1 2 3 4 | 5 6 7 8 9 | 8 0 1 2 3 4 5 6 | 8 7 8 9 0 1 2 3
  – two 5-character frames, then two 8-character frames; the first byte of each frame is the length byte

Single error:
  5 1 2 3 4 | 7 6 7 8 9 8 0 | 1 2 3 4 5 6 8 7 8 9 0 1 2 3
  – the second length byte arrives as 7 instead of 5 (Error!)
  – the receiver reads a 7-character frame (wrong!), and the next byte it treats as a length byte (1) is really data: it should not be a character count!
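The loss of synchronisation in the Tanenbaum example can be reproduced in a few lines. This is a sketch under the example's convention that the length byte counts itself; `split_by_length` is a hypothetical helper name.

```python
def split_by_length(stream):
    """Split a list of byte values into frames.

    Each frame starts with a length byte that counts the whole frame,
    including the length byte itself (as in the example).
    """
    frames, i = [], 0
    while i < len(stream):
        length = stream[i]
        if length == 0:  # a zero count would never advance; stop instead
            break
        frames.append(stream[i:i + length])
        i += length
    return frames


good = [5, 1, 2, 3, 4,  5, 6, 7, 8, 9,  8, 0, 1, 2, 3, 4, 5, 6,  8, 7, 8, 9, 0, 1, 2, 3]
bad = list(good)
bad[5] = 7  # single error: the second length byte 5 becomes 7

print([len(f) for f in split_by_length(good)])
print(split_by_length(bad))
```

After the corrupted count, every later frame boundary is wrong, and the receiver has no way to find the next real length byte again.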
Point to Point Protocol (PPP)
Example of byte-level framing
• Byte-oriented protocols: view each frame as a collection of bytes (characters)
• Scenario: Point-to-point data link
  – dialup link, ISDN line, serial cable, ADSL, TCP tunnel, etc.
• Bidirectional unicast link:
  – only two machines
• No sharing:
  – no Media Access Control
  – no explicit MAC addressing
PPP Design Requirements [RFC 1547]
• packet framing
  – encapsulation of network-layer datagram in data link frame
  – carry network layer data of any network layer protocol (not just IP) at same time
  – ability to demultiplex received frames to the network layer
• bit transparency: must carry any bit pattern in the data field
• error detection (no correction)
• connection liveness: detect, signal link failure to network layer
• network layer address negotiation: endpoints can learn/configure each other’s network addresses
• No error correction/recovery, flow control, in-order delivery
  – all relegated to higher layers
PPP Data Frame
• Flag: delimiter (framing)
• Address: does nothing (only one option)
• Control: does nothing; in future possible multiple control fields
• Protocol: upper layer protocol to which frame delivered (e.g. PPP-LCP, IP, IPCP, etc.)
• Info: upper layer data being carried
• Check: cyclic redundancy check for error detection

Field      Content     Size
flag       01111110    1 byte
address    11111111    1 byte
control    00000011    1 byte
protocol   protocol    1-2 bytes
info       payload     variable
check      check       2-4 bytes
flag       01111110    1 byte
Byte stuffing in PPP
• Problem: need for “data transparency”:
  – What if payload contains <01111110>?
  – If receiver sees <01111110>, is it data or flag?
• Solution:
  – Sender: adds (“stuffs”) an extra flag byte after every flag byte in the payload
  – Receiver:
    • replaces 2 successive flag bytes with a single flag payload byte
    • interprets a single flag byte as a real flag
[Figure: a payload containing a flag byte, before and after stuffing; the flag byte inside the payload is sent twice.]
Alternative byte stuffing: Escape bytes

Original bytes      After stuffing
A flag B            A esc flag B
A esc B             A esc esc B
A esc flag B        A esc esc esc flag B
A esc esc B         A esc esc esc esc B
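The escape-byte scheme can be sketched as follows. The byte values are assumptions for illustration: the flag is 01111110 (0x7E) as above, and 0x7D is picked for esc. Real PPP additionally XORs the escaped byte with 0x20, which this sketch leaves out so that it matches the figure.

```python
FLAG = 0x7E  # 01111110, the frame delimiter
ESC = 0x7D   # assumed value for the escape byte


def stuff(payload: bytes) -> bytes:
    """Put an esc byte in front of every flag or esc byte in the payload."""
    out = bytearray()
    for byte in payload:
        if byte in (FLAG, ESC):
            out.append(ESC)
        out.append(byte)
    return bytes(out)


def unstuff(data: bytes) -> bytes:
    """Remove the escapes; the byte after an esc is always taken as data."""
    out = bytearray()
    escaped = False
    for byte in data:
        if escaped:
            out.append(byte)
            escaped = False
        elif byte == ESC:
            escaped = True
        elif byte == FLAG:
            raise ValueError("unescaped flag inside frame body")
        else:
            out.append(byte)
    if escaped:
        raise ValueError("frame ends in the middle of an escape")
    return bytes(out)


A, B = 0x41, 0x42
for payload in ([A, FLAG, B], [A, ESC, B], [A, ESC, FLAG, B], [A, ESC, ESC, B]):
    stuffed = stuff(bytes(payload))
    print(stuffed.hex(" "), "->", unstuff(stuffed).hex(" "))
```

Unlike flag doubling, an unescaped flag byte can only ever be a frame boundary, so a receiver can resynchronise by scanning for the next flag.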
High-level Data Link Control (HDLC)
Example of bit-level framing
• Superficially similar to PPP
• Checksum for error detection
• Bit-oriented protocol ⇒ bit stuffing for framing
  – Sender inserts a “0” after 5 consecutive “1”s (except in flag)
  – Receiver sees “0” followed by six “1”s: the next bit decides between flag (“0”, i.e. 01111110) and error (“1”)

Original bits: 011011111111111111110010
Stuffed bits:  011011111011111011111010010

Field      Content     Size
flag       01111110    8 bits
address                8 bits
control                8 bits
payload                variable
check                  16 bits
flag       01111110    8 bits
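The stuffing rule is short enough to write out. A minimal sketch, assuming bits are handled as strings of "0" and "1" and that the flags themselves are added and removed elsewhere; the function names are illustrative.

```python
def bit_stuff(bits: str) -> str:
    """Insert a 0 after every run of five consecutive 1s."""
    out, run = [], 0
    for bit in bits:
        out.append(bit)
        run = run + 1 if bit == "1" else 0
        if run == 5:
            out.append("0")  # stuffed bit
            run = 0
    return "".join(out)


def bit_unstuff(bits: str) -> str:
    """Drop the 0 that follows every run of five consecutive 1s."""
    out, run, expect_stuffed = [], 0, False
    for bit in bits:
        if expect_stuffed:
            if bit != "0":
                raise ValueError("six consecutive 1s: flag or error, not data")
            expect_stuffed, run = False, 0
            continue
        out.append(bit)
        run = run + 1 if bit == "1" else 0
        if run == 5:
            expect_stuffed = True
    return "".join(out)


original = "011011111111111111110010"
stuffed = bit_stuff(original)
print(stuffed)
print(bit_unstuff(stuffed) == original)
```

Since stuffed data never contains six 1s in a row, the flag pattern 01111110 cannot appear inside the payload.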
HDLC (contd)
• Large family of similar protocols
  – SDLC, HDLC, LAP, LAPB, etc.
• 3 types of frames:
  – Information (most frames, control information piggybacked)
  – Supervisory (flow and error control)
  – Unnumbered (all kinds of misc. things)
• Sliding window protocol, 3-bit sequence number
  – ⇒ 7 unacknowledged outstanding frames
  – Piggybacked acknowledgements (‘next’ field)
• “Poll/Final” bit: for multiple terminals
  – “P”: used for polling terminals / sending data from one
  – “F”: indicates final frame from polled terminal
Synchronous Optical NETwork (SONET)
Example of clock-based framing
• The dominant standard for long-distance data transmission over optical networks
• No bit- or byte-stuffing ⇒ frames all have the same duration: 125 µs
  – STS-1 ~ 51.84 Mbps, 810-byte frames (810 bytes × 8 bits / 125 µs = 51.84 Mbps)
  – STS-192 ~ 10 Gbps, 155,520-byte frames
• Framing is clock based:
  – Flag word: first 16 bits of each frame
  – Receiver looks for regular flag every 125 µs
• Encoding: NRZ but scrambled
  – XOR-ed with 127-bit pattern
• Lots of other complexity!
  – e.g. 64 bytes reserved for voice channel, etc.
Error Detection
How to tell if bits have been lost or changed?
Error Detection
• EDC = Error Detection and Correction bits (redundancy)
• D = Data protected by error checking, may include header
• Error detection is not 100% reliable!
  – protocol may miss some errors, but rarely
  – larger EDC field yields better detection and correction
[Figure: the sender appends EDC to the d data bits D of a datagram; D and EDC cross a link with bit errors and arrive as D’ and EDC’; the receiver asks “All bits in D’ OK?” and passes the datagram on if yes, not if no.]
Simple single-bit parity
• E.g. 1-bit odd parity checking: the number of 1’s in data + parity should always be odd

d data bits         parity bit
0111000110101011    0
0110001010011101    1
0000000100100100    0

• Detects single-bit errors
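The three rows above can be checked mechanically. A small sketch with illustrative function names, assuming bit strings as input:

```python
def odd_parity_bit(data: str) -> str:
    """Parity bit that makes the total number of 1s (data + parity) odd."""
    return "0" if data.count("1") % 2 == 1 else "1"


def odd_parity_ok(word: str) -> bool:
    """Check a received word (data followed by its parity bit)."""
    return word.count("1") % 2 == 1


def flip(word: str, position: int) -> str:
    """Flip one bit, to model a transmission error."""
    flipped = "1" if word[position] == "0" else "0"
    return word[:position] + flipped + word[position + 1:]


for data in ("0111000110101011", "0110001010011101", "0000000100100100"):
    word = data + odd_parity_bit(data)
    one_error = flip(word, 3)
    two_errors = flip(one_error, 7)
    print(word, odd_parity_ok(word), odd_parity_ok(one_error), odd_parity_ok(two_errors))
```

A single flipped bit always changes the parity, while two flipped bits cancel out and go unnoticed, which motivates the stronger schemes that follow.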
Two-dimensional bit parity
• Detect and correct single-bit errors
• Arrange the data bits in $i$ rows and $j$ columns; the last column holds the row parity and the last row holds the column parity:

$$
\begin{array}{cccc}
d_{1,1} & \dots & d_{1,j} & d_{1,j+1} \\
d_{2,1} & \dots & d_{2,j} & d_{2,j+1} \\
\dots & \dots & \dots & \dots \\
d_{i,1} & \dots & d_{i,j} & d_{i,j+1} \\
d_{i+1,1} & \dots & d_{i+1,j} & d_{i+1,j+1}
\end{array}
$$

Example (even parity):

No errors        Single-bit error
10101 1          10101 1
11110 0          10110 0   ← row parity error
01110 1          01110 1
00101 0          00101 0

In the right-hand block the second column also shows a parity error; the erroneous bit is where the failing row and the failing column cross (row 2, column 2).
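Locating and correcting the flipped bit is a matter of finding the one row and the one column whose parity fails. The sketch below uses the example block with even parity; the function names are illustrative and rows are given as bit strings that already include the parity column and parity row.

```python
def find_single_error(block):
    """Return (row, column) of a single flipped bit, or None if all parities hold."""
    bad_rows = [r for r, row in enumerate(block) if row.count("1") % 2]
    bad_cols = [c for c in range(len(block[0]))
                if sum(row[c] == "1" for row in block) % 2]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0], bad_cols[0]
    raise ValueError("more than one bit in error: detected but not correctable")


def correct_single_error(block):
    """Flip the bit at the crossing of the failing row and column."""
    position = find_single_error(block)
    if position is None:
        return list(block)
    r, c = position
    fixed = list(block)
    flipped = "1" if fixed[r][c] == "0" else "0"
    fixed[r] = fixed[r][:c] + flipped + fixed[r][c + 1:]
    return fixed


sent = ["101011", "111100", "011101", "001010"]
received = ["101011", "101100", "011101", "001010"]  # one bit flipped in row 2

print(find_single_error(received))  # zero-based (row, column)
print(correct_single_error(received) == sent)
```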
Cyclic Redundancy Check (CRC)
• Polynomials with binary coefficients $b_i$: $b_k x^k + b_{k-1} x^{k-1} + \dots + b_0 x^0$
• Order of polynomial: max $i$ with $b_i \neq 0$
• Binary coefficients $b_i$ (0 or 1) form a field with operations “+” (XOR) and “•” (AND)
• Pick a generator polynomial: $G(x)$
• Let the whole frame (D+EDC) be polynomial $T(x)$
• Idea: fill EDC (CRC) field such that $T(x) \bmod G(x) = 0$
Cyclic Redundancy Check (CRC)
• How to divide with polynomials?
• Example with $G(x) = 1x^2 + 0x^1 + 1x^0$ = 101; D = 111011:
  11101100 / 101 = 110110, remainder 10
  (intermediate values of the long division: 100, 011, 111, 100, 010)
• Idea:
  – generate T’(x) with EDC’ = 00 (i.e., T’(x) = D concat EDC’)
  – compute EDC = T’(x) mod G(x)
  – generate T(x) with EDC (i.e., T(x) = D concat EDC)
• Calculating and testing CRC is the same operation
  – can be implemented efficiently in hardware (XOR, AND, shift ops)
Notes
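• Why does it work?
  – $T'(x) \bmod G(x) = \mathrm{EDC}$
  – $T(x) \bmod G(x) = (T'(x) + \mathrm{EDC}) \bmod G(x) = T'(x) \bmod G(x) + \mathrm{EDC} \bmod G(x) = \mathrm{EDC} + \mathrm{EDC} = 0$ (addition is XOR, so anything added to itself is 0)
• EDC always has one bit fewer than G(x)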
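The division example and the “why does it work” argument can be checked with a short modulo-2 division. This is a sketch, assuming bit strings and a generator whose leading bit is 1; `mod2_remainder` and `append_crc` are illustrative names.

```python
def mod2_remainder(dividend: str, generator: str) -> str:
    """Remainder of dividend / generator with XOR arithmetic (no carries)."""
    r = len(generator) - 1           # order of G(x) = number of EDC bits
    g = int(generator, 2)
    remainder = 0
    for bit in dividend:
        remainder = (remainder << 1) | int(bit)
        if remainder >> r:           # leading bit set: subtract (XOR) G(x)
            remainder ^= g
    return format(remainder, f"0{r}b")


def append_crc(data: str, generator: str) -> str:
    """T(x) = D concat EDC, with EDC = (D concat zeros) mod G(x)."""
    r = len(generator) - 1
    edc = mod2_remainder(data + "0" * r, generator)
    return data + edc


frame = append_crc("111011", "101")
print(frame)                                   # D followed by EDC
print(mod2_remainder(frame, "101"))            # check at the receiver
print(mod2_remainder("11101111", "101"))       # same frame with the last bit flipped
```

The receiver runs exactly the same division as the sender; a non-zero remainder means the frame was corrupted.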
CRC Calculation in Hardware
• Use cyclic shift register with r registers, where r is the order of G(x)
• Example (4 bit G(x); 3 bit EDC): $G(x) = x^3 + x^2 + 1$
⇒ Remainder of the division ends up in the registers
[Figure: shift register fed with T(x), with XOR gates (+) at the positions given by the terms of G(x)]
CRC: 11101100 / 101
$G(x) = x^2 + 1$; T(x) is shifted in one bit per step.

Output   Register 1   Register 0   Input
0        0            0            Initial
0        0            1            1
0        1            1            1
1        1            0            1
1        0            1            0
0        1            1            1
1        1            0            1
1        0            1            0
0        1            0            0

The Output column, read top to bottom, is the result (110110); the final register contents (1 0) are the remainder.
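The table can be reproduced by simulating the shift register. This is a sketch of one common wiring, assumed here because the figure is not available: the bit leaving the highest register is fed back (XOR) into every register position where G(x) has a term. `crc_shift_register` and its `taps` parameter are illustrative.

```python
def crc_shift_register(bits, taps, r):
    """Simulate an r-stage CRC shift register.

    bits: input bits of T(x), most significant first
    taps: exponents i < r for which G(x) contains the term x^i
    Returns (output bits, final registers as [R0, R1, ..., R(r-1)]).
    """
    regs = [0] * r
    outputs = []
    for bit in bits:
        out = regs[r - 1]                  # bit leaving the register
        outputs.append(out)
        for i in range(r - 1, 0, -1):      # shift towards the high end
            regs[i] = regs[i - 1] ^ (out if i in taps else 0)
        regs[0] = bit ^ (out if 0 in taps else 0)
    return outputs, regs


# G(x) = x^2 + 1: two registers, feedback only at x^0.
outputs, regs = crc_shift_register([1, 1, 1, 0, 1, 1, 0, 0], taps={0}, r=2)
print("output   :", outputs)
print("remainder:", regs[::-1])  # Register 1, Register 0
```

The output sequence matches the Output column of the table (after the initial row), and the registers end with the remainder 1 0.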
CRCs: How to choose G(x)?
• Typical generator polynomial: $G(x) = x^{16} + x^{12} + x^5 + 1$
• Why does G(x) look like this? (What does the EDC look like?)
• Let M(x) be the frame as sent and E(x) the transmission errors. Then the received frame is:
  $T(x) = M(x) + E(x)$
• So: $T(x) \bmod G(x) = (M(x) + E(x)) \bmod G(x) = M(x) \bmod G(x) + E(x) \bmod G(x)$
• But recall: $M(x) \bmod G(x) = 0$
⇒ A transmission error is detected iff $E(x)$ is not divisible by $G(x)$ without remainder
CRCs: How to choose G(x)?
• One can show that G(x) of order r can detect
  – All single-bit errors if G(x) has $x^r$ and $x^0$ terms with non-zero coefficients
  – All double-bit errors if G(x) has a factor of at least three terms
  – Any odd number of errors if G(x) contains the factor $(x+1)$
  – Any burst of errors of length $k \le r$
  – Any burst of errors of length greater than $r+1$ bits with probability $1 - 2^{-r}$
Media Access Control (MAC)
How can 3 or more machines share a single link?
Multiple Access Links and Protocols
Three types of “links”
• point-to-point (single wire; e.g. PPP)
  – Just seen this
• broadcast (shared wire or medium; e.g. Ethernet, WLAN)
  – Today: how to share a broadcast medium
• switched (e.g. switched Ethernet, ATM)
  – Next lecture: packet switching
Multiple Access Protocols
• Ideal features with a channel of rate R
  – when only one node has data to send: throughput of R bps
  – when M nodes have data to send: throughput of R/M bps
  – Need a decentralized protocol to do that! How?
    • (cannot assume that senders coordinate magically)
Turn-taking Protocols (e.g., Round Robin)
• No master node
• Token-passing protocol
  – Station k sends after station k–1
  – If a station needs to transmit, when it receives the token, it sends up to a max number of frames (m) and then forwards the token
  – If a station does not need to transmit, then it forwards the token
• Questions
  – How efficient is this?
  – How to handle station failures or leaves?
  – How to handle new stations joining?
• Some MAC protocols (e.g. Token Ring, Slotted Ring, etc.) deal with this
• Others use random access (e.g. Aloha, Ethernet)
Random Access Protocols
• When node has packet to send
  – transmit at full channel data rate R
  – no a priori coordination among nodes
• Two or more transmitting nodes ⇒ “collision”
• Random access MAC protocol specifies
  – how to detect collisions
  – how to recover from collisions
    • via delayed retransmissions
Example: Slotted Aloha
• Time is divided into equal size slots
  – A slot is equal to the packet transmission time
• Node with new arriving packet: transmit at start of next slot
• Collision → retransmit packet in future slots with probability p, until successful
[Figure: timeline of slots for nodes 1, 2 and 3, marking slots with collisions and slots with successes]
Slotted Aloha (slightly simplified)
• We assume that the stations are fully synchronized.
• At each time slot, each station sends with probability p. There are n stations.
• $P_1 = \Pr(S_1 \text{ is successful}) = p (1 - p)^{n-1}$
• $P = \Pr(\text{any station is successful}) = n P_1$
• Goal: maximize P
  – solve $dP/dp = 0$
  – (condition: $d^2P/dp^2 < 0$)
  – solution: $np = 1$
• So, set $p = 1/n$: $P = (1 - 1/n)^{n-1} \approx 1/e$ (for large n)
• So the effective bandwidth of the channel is not R bps; it is R/e ≈ 0.37 R bps
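The limit can be checked numerically. A minimal sketch using the success probability from the analysis above; the function name is illustrative.

```python
import math


def slotted_aloha_success(n: int, p: float) -> float:
    """P = n * p * (1 - p)^(n - 1): exactly one of n stations sends in a slot."""
    return n * p * (1 - p) ** (n - 1)


for n in (2, 10, 100, 1000):
    print(n, round(slotted_aloha_success(n, 1 / n), 4))
print("1/e =", round(1 / math.e, 4))
```

For small n the channel does better than 1/e (two stations with p = 1/2 succeed in half of the slots), and the value falls towards 1/e as n grows.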
Slotted Aloha vs. Round Robin
– Slotted Aloha does not use every slot of the channel: less efficient than Round Robin
+ What happens in Round Robin when a new station joins? What about more than one new station? Slotted Aloha is more flexible.
Pure (unslotted) Aloha
• Unslotted Aloha: simpler, no synchronization
• Packet needs transmission
  – send without waiting for start of slot
• Collision probability increases:
  – packet sent at $t_0$ collides with packets sent in $(t_0 - 1, t_0 + 1)$
[Figure: time axis with $t_0 - 1$, $t_0$, $t_0 + 1$ and node i’s frame starting at $t_0$; a frame started in $(t_0 - 1, t_0)$ will overlap with the start of i’s frame, and a frame started in $(t_0, t_0 + 1)$ will overlap with the end of i’s frame]
Pure Aloha analysis
• There are N stations
• Each station transmits with probability p
• For a node i to have a successful transmission means no overlapping transmissions before or after, each with probability $(1-p)^{N-1}$
• So: $P = N p (1-p)^{2(N-1)}$
• The maximum value for large N is $1/(2e) \approx 0.18$
• Half the rate of slotted Aloha!
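The maximum can again be checked numerically. Setting the derivative of $N p (1-p)^{2(N-1)}$ to zero gives $p = 1/(2N-1)$; this step is not on the slide and is stated here as the assumption the sketch relies on. The function name is illustrative.

```python
import math


def pure_aloha_success(n: int, p: float) -> float:
    """P = N * p * (1 - p)^(2 (N - 1)): no overlap before or after the frame."""
    return n * p * (1 - p) ** (2 * (n - 1))


for n in (2, 10, 100, 1000):
    best_p = 1 / (2 * n - 1)
    print(n, round(pure_aloha_success(n, best_p), 4))
print("1/(2e) =", round(1 / (2 * math.e), 4))
```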
Slotted Aloha vs. Pure Aloha
[Figure: curves for Pure Aloha and Slotted Aloha plotted against G = offered load (horizontal axis, 0.5 to 2.0); vertical axis ticks from 0.1 to 0.4]
• A small increase in the channel load, that is G, can drastically reduce its performance
• The protocol constrains effective channel throughput!
Demand Assigned Multiple Access (DAMA)
• Channel efficiency only 37% for Slotted Aloha, and even worse for Aloha.
• Practical systems use reservation whenever possible.
• Reservation
  – a sender reserves a future time-slot
  – sending within this reserved time-slot is possible without collision
  – reservation also causes higher delays
• But: Every scalable system needs an Aloha style component.
  – e.g., what happens if new sender arrives after reservation?
Example Reservation-based Protocol
Distributed Polling
• time divided into slots
• begins with N short reservation slots
  – reservation slot time equal to channel end-end propagation delay
  – station with message to send posts reservation
  – reservation seen by all stations
• after reservation slots, message transmissions ordered by known priority
[Figure: reservation slots 1 0 1 1 0 0 followed by packet transmissions from nodes 1, 3 and 4; then reservation slots 0 1 0 0 0 0 followed by a packet transmission from node 2]
CSMA: Carrier Sense Multiple Access
Idea of CSMA: listen before transmit!
• If channel sensed idle: transmit entire packet
• If channel sensed busy, defer transmission. Two variants
  – Persistent CSMA
    • retry immediately with probability p when channel becomes idle (may cause instability)
  – Non-persistent CSMA
    • retry after random interval
• Human analogy
  – Don’t interrupt anybody already speaking
CSMA collisions
• collisions can occur
  – propagation delay: two nodes may not hear each other’s transmission
• collision
  – entire packet transmission time wasted
• Note: role of distance and propagation delay in determining collision probability
[Figure: spatial layout of nodes along Ethernet]
CSMA/CD (Collision Detection)
• CSMA/CD: carrier sensing, as in CSMA
  – collisions detected within short time
  – colliding transmissions aborted, reducing channel wastage
  – persistent or non-persistent retransmission
• collision detection
  – easy in wired LANs: measure signal strengths, compare transmitted, received signals
  – difficult in wireless LANs
• Human analogy (the polite conversationalist)
  1. Don’t interrupt anybody already speaking
  2. If another starts speaking at the same time as you, then back off.
CSMA/CD collision detection
Ethernet
The predominant LAN technology
• cheap: $2 for 100Mbps!
• first widely used LAN technology
• Simpler/cheaper than token rings and ATM
• Keeps up with speed race: 10 Mbps, 100 Mbps, 1 Gbps, 10 Gbps, 40 Gbps, 100 Gbps…
Metcalfe’s original 3Mbps Ethernet sketch at Xerox PARC, May 22, 1973
Ethernet Frame Structure
• Sending adapter encapsulates IP datagram (or other network layer protocol packet) in Ethernet frame
• Preamble
  – 7 bytes with pattern 10101010
  – Followed by 1 byte with pattern 10101011
  – Used to synchronize receiver, sender clock rates
• Addresses
  – 6 bytes each; frame is received by all adapters on a LAN and dropped if address does not match
• Type (2 bytes): indicates the higher layer protocol, mostly IP
• CRC (4 bytes): checked at receiver; if an error is detected, the frame is simply dropped

Frame layout: Preamble | Dest. address | Source address | Type | Data (payload) | CRC
Ethernet CSMA/CD algorithm
1. Adapter gets datagram from network layer and creates frame
2. If adapter senses channel idle, it starts to transmit frame. If it senses channel busy, waits until channel idle and then transmits
3. If adapter transmits entire frame without detecting another transmission, the adapter is done with frame!
4. If adapter detects another transmission while transmitting, aborts and sends jam signal
5. After aborting, adapter enters exponential backoff: after the nth collision, adapter chooses a K at random from $\{0, 1, 2, \dots, 2^m - 1\}$ where $m = \min(n, 10)$. Adapter waits K • 512 bit times and returns to Step 2
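Step 5 can be sketched as follows. The function names and the `rng` parameter are illustrative; the constants 512 and 10 come from the algorithm above, and the sketch covers only the choice of the waiting time, not carrier sensing or transmission.

```python
import random

SLOT_BIT_TIMES = 512   # the backoff unit from step 5
MAX_EXPONENT = 10      # m = min(n, 10)


def backoff_bit_times(n_collisions: int, rng=random) -> int:
    """Waiting time in bit times after the n-th collision."""
    m = min(n_collisions, MAX_EXPONENT)
    k = rng.randrange(2 ** m)          # uniform in {0, 1, ..., 2^m - 1}
    return k * SLOT_BIT_TIMES


def max_backoff_seconds(n_collisions: int, bit_rate_bps: float) -> float:
    """Longest possible wait after the n-th collision, in seconds."""
    m = min(n_collisions, MAX_EXPONENT)
    return (2 ** m - 1) * SLOT_BIT_TIMES / bit_rate_bps


rng = random.Random(42)  # seeded only to make the demo repeatable
for n in (1, 2, 3, 10, 16):
    wait = backoff_bit_times(n, rng)
    print(f"collision {n:2d}: wait {wait:6d} bit times, "
          f"at most {max_backoff_seconds(n, 10e6) * 1e3:.2f} ms at 10 Mbps")
```

Doubling the range after each collision spreads retransmissions out quickly under heavy load while keeping the delay short when collisions are rare.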
Ethernet’s CSMA/CD (more)
Jam Signal
• make sure all other transmitters are aware of collision

Bit time
• 0.1 μsec for 10 Mbps Ethernet, so 512 bit times = 51.2 μsec (for a network up to 2500m long)
• for K=1023, wait time is about 50 msec

Exponential Backoff
• Goal: adapt retransmission attempts to estimated current load
  – heavy load: random wait will be longer
• first collision: choose K from {0,1}; delay is K • 512 bit transmission times
• after second collision: choose K from {0,1,2,3}
• after ten collisions, choose K from {0,1,2,3,4,…,1023}
CSMA/CD efficiency
Very Old Ethernet Technologies: 10Base2
• 10: 10Mbps; 2: under 200 meters maximal cable length
• thin coaxial cable in a bus topology
• repeaters used to connect multiple segments
• repeater repeats bits it hears on one interface to its other interfaces: physical layer device only!
• Rarely seen at all these days
[Figure: nodes attached through adapters and T-connectors to a coaxial bus with a terminator; a transmitted packet travels in both directions]
Hubbed 10BaseT and 100BaseT
• 10/100 Mbps rate; latter a.k.a. “fast ethernet”
• T stands for Twisted Pair
• Nodes connect to a hub: “star topology”; 100 m max distance between nodes and hub
• Hubs are essentially physical-layer repeaters
  – bits coming in on one link go out on all other links
  – no frame buffering
  – no CSMA/CD at hub: adapters detect collisions
  – provides network management functionality
[Figure: nodes connected in a star to a central hub]
Gigabit Ethernet: today’s commodity Ethernet
• Standard Ethernet frame format
  – plus “Jumbo frames”
• point-to-point links and shared broadcast channels
  – CSMA/CD is used for shared mode
  – short distances between nodes to be efficient
  – Full-Duplex at 1 Gbps for point-to-point links
• 10 Gig, 40 Gig Ethernet now available