lecture 2: basic routing, arp, and basic ip - kth · • target ethernet and ip address • arp is...
TRANSCRIPT
lecture_2
Lecture 2: Basic routing, ARP, and basic IP
• Literature:– Forouzan, TCP/IP Protocol Suite: Ch 6 - 8
Internetworking
lecture_2
Basic Routing
Delivery, Forwarding, and Routingof IP packets
lecture_2
Connection-oriented vs Connectionless• Connection-Oriented Services
– The network layer establishes a connection between a source and a destination
– Packets are sent along the connection.– The decision about the route is made once at connection
establishment– Routers/switches in connection-oriented networks are stateful
• Connectionless Services– The network layer treats each packet independently– Route lookup for each packet (routing table)– IP is connectionless– IP routers are stateless
lecture_2
Direct vs Indirect delivery• Direct delivery
– The final destination is connected to the same physical network as the sender.
– IP destination address and local interface has same netmask
– Map IP address to physical address: ARP
• Indirect delivery– From router to router, last delivery is direct– Destination address and routing table: Routing
direct delivery
R1
A
R2
R3
B
indirect delivery
indirect delivery
indirect delivery
A B
R
direct delivery
direct delivery
lecture_2
Next-hop Routing• How do you hold information about route from A to all other hosts?
– A R1 R2 R3 B
• Store table of host/network address and nexthop in every node
A
C D E F
BR1 R2 R3
R4N1
N2 N3
N4
N1, -N2, R1N3, R1N4, R1
N1, -N2, R2N3, R2N4, R2
N1, R1N2, R4N3, R4N4, R3
N1, R2N2, R2N3, R2N4, -
N1, R3N2, R3N3, R3N4, -
lecture_2
Routing Table Search - Classful
• Determine class from destination address• Search within class• Routing table often divided into ”buckets”
destination IP addressdestination IP address
Class A bucket
Class B bucket
Class C bucket
lecture_2
Routing Table Search - Classless• Longest prefix first
• Conceptually: divide table in 32 ”buckets” - one for each netmask length and match destination with longest prefixes first
• SW algorithms: tree, binary trees, tries (different data structures)• HW support: TCAMs – Content Addressable Memory
Netid
Netid...
0
1
32
31
Masklen
destination IP addressdestination IP address
lecture_2
Routing Tables• The basic idea with IP addressing (and CIDR) is to aggregate
addresses– more specific networks (with longer prefixes)
less specific networks (with shorter prefixes)
• More aggregation leads to smaller routing tables• The ideal situation is to have domains publishing (exporting) only a
small set of prefixes– Effective address assignment policy
• Some mechanisms lead to increased fragmentation– # of available addresses decreasing distribution of long prefixes (/24)
– Multihoming - sites having several subnetworks – from different providers
• Current routing tables (# of entries) is ~150000 (~60% are /24 prefixes)
lecture_2
Routing Table – Common Fields
• Mask – netmask applied for the entry [255.255.255.0]• Network address – destination network [192.168.15.0]• Next-hop address – next router [130.237.15.1] • Interface – outgoing interface [eth0]• Flags – status/info [U(p), G(ateway), H(ost-specific)...]• Reference count – # of users using this route• Use – # of packets transmitted for this destination
..................................................................................
UseReferencecount
FlagsInterfaceNext-hopAddress
NetworkAddress
Mask
lecture_2
IP Router Model
• A Router can be partitioned into a dataplane and a controlplane– The dataplane is fast and special purpose – handles packet
forwarding in real-time
– The control plane is general purpose– handles routing in the background
IPForwarding
EthernetInterface
FDDIInterface
Router
FIB
RIB
Routing Information
Base
Forwarding Information
Base
IPRoutingControl
Plane
Data Plane
lecture_2
IP Forwarding• A router switches packets between network interfaces• Extracts header information from the incoming datagram
– Destination IP address
• Makes a lookup in the forwarding information base by making a match against networks– Next-Hop IP address,– Outgoing interface,...
• Modifies datagram header• Sends on outgoing interface• But a router performs much more than IPv4 lookup
– Access lists, filtering– Traffic management– Other protocols: Bridging, MPLS, IPv6, ...
lecture_2
ARP
Mapping between logical IP addresses and physical addresses
lecture_2
Logical and Physical Addresses
• A host’s network interface card (NIC) has:– a hardcoded, physical MAC address
• e.g., 48-bit Ethernet address
– a configured, logical IP address
– a configured name
0:0:c0:6f:2d:40 0:0:c0:c2:9b:26 140.252.13.35 140.252.13.34
8:0:20:3:f6:42140.252.13.33
Name:
MAC addr:IP addr:
bsdibsdi sunsun svr4svr4
lecture_2
Communicating with a next-hop
• Problem: bsdi wants to send an IP packet to svr4– No routers between sender and receiver – directly connected host
• Getting the IP address of svr4– Static configuration– DNS: Name Address (Later lectures)
• Getting the MAC address of svr4– Static configuration– Dynamic Address Resolution - ARP
bsdibsdi sunsun svr4svr4
0:0:c0:6f:2d:40 0:0:c0:c2:9b:26 140.252.13.35 140.252.13.34
8:0:20:3:f6:42140.252.13.33
Name:
MAC addr:IP addr:
lecture_2
ARP - Address Resolution Protocol
• Problem: we are to send a packet to an interface on a directly attached network - we know the IP-address of the destination but not the MAC address.
• Idea: Broadcast a request - “On which MAC address can IP-address X be reached?”.– ARP request
• The host/router with the destination replies with its MAC address– ARP reply
• This is the basic functionality of ARP
lecture_2
ARP Examplebsdi intends to send an IP datagram to svr4 (140.252.13.34)
1. Send an ARP request on broadcast to all stations:– who has 140.252.13.34?
2. svr4 identifies it as its own address and sends an ARP reply on unicast back to bsdi– I have 140.252.13.34 and its mac address is 0:0:c0:c2:9b:26
3. bsdi sends the datagram to svr4 using the resolved mac address4. Note that sun and svr4 can update their ARP caches with bsdi!
sun svr4bsdi
1 23
lecture_2
ARP Packet• Two length fields
– Hardware (Ethernet address length: 6)
– Protocol (IP address length: 4)
• Sender Ethernet and IP address
• Target Ethernet and IP address• ARP is encapsulated directly into a data link frame (e.g., Ethernet)
senderEthernet addr
target Ethernet addr
senderIP addr
targetIP addr
1 1 6 64 4
hardware size
protocol size
2
ophwlen
protlen
hwtype
prottype
22
lecture_2
ARP Optimizations• ARP cache
– Resolved addresses are saved in a cache.
– Works because of correlations in use of addresses
– Limits ARP traffic
• Entries in the ARP cache times out
• Network is snooped– Since the sender’s Internet-to-Physical address binding is in every
ARP broadcast; (all) receivers update their caches before processing an ARP packet
lecture_2
ARP Timeouts• If there is no reply to an ARP request
– The machine is down or not responding
– Request was lost, therefore retry (but not too often)
– Eventually give up (When?)
• ARP cache timeouts
– completed entry in 20 minutes (BSD Unix)
– incomplete entry in 3 minutes (BSD Unix)
lecture_2
Indirect/Direct Delivery and ARP• A sends an IP packet to B through router R• Ethernet links to connect A and B to R
IP A IP R IP B
MAC a MAC r1 MAC r2 MAC b
IP Header Src: A, Dst: B Src: A, Dst: B
Ethernet Header Src: a, Dst: r1 Src: r2, Dst: b
Indirect delivery Direct delivery
lecture_2
Proxy ARP (RFC 826)• Proxy ARP - someone
responds to ARP requests on someone else’s behalf
• Allows sub-networks to be hidden
• Example: sun is hidden behind netb: Netb responds on behalf of sun.
gemini
netb
sun
slip
140.252.1.183
140.252.1.129
arp request for 140.252.1.129
arp reply
lecture_2
Gratuitous ARP• Host sends an ARP request of its own address
– Generally done at boot time to inform other machines of its address (possibly a new address) - they get a chance to update their cache entries immediately
– Lets hosts check to see if there is another machine claiming the same address ⇒ “duplicate IP address sent from Ethernet address a:b:c:d:e:f”
• As noted before, hosts have paid the price by servicing the broadcast, so they can cache this information - this is one of the ways the proxy ARP server could know the mapping
• Note that faking that you are another machine can be used to provide failover for servers
lecture_2
RARP: Reverse Address Resolution Protocol (RFC 903)
• How to get your own IP address, when all you know is your link address
• Necessary if you don’t have a disk or other stable storage
• RARP request - broadcast to every host on the network (i.e., EtherDST=0xFFFFFF), TYPE=0x8035
• RARP server: “I know that address!” and sends an RARP reply
• Source host - receives the RARP reply, and now knows its own IP addr
• RARP packet has exactly the same format as ARP packet
• BOOTP/DHCP is a more powerful alternative to RARP
lecture_2
RARP Server• Someone has to know the mappings - quite often this is in
the file “/etc/ethers”• Since this information is generally in a file, RARP servers
are generally implemented as user processes• Unlike ARP responses which are generally part of the
TCP/IP implementation (often part of the kernel)• How does the process get the packets - since they aren’t IP
and won’t come across a socket?– PCAP – Packet Capture (used by Tcpdump/Ethereal)– BPF – Berkeley Packet Filter (older)
• RARP requests are sent as hardware level broadcasts -therefore are not forwarded across routers
lecture_2
IP
Basic functionality and the IP packet header
lecture_2
Issues in IP• Following the end2end argument, only the absolutely
necessary functionality is in IP– Best Effort Service: Unreliable and Connectionless
– Application or Transport layer handles reliability
• How to deliver datagrams over multiple links (hops) in an internetwork?– Addressing– Best-effort delivery service
• Forwarding of packets from one link to another
– Error handling
lecture_2
IPv4 Header – RFC 791• Version
• HLEN – Header Length
• Type of Service
• Total Length– Header + Payload
• Fragmentation – ID, Flags, Offset
• TTL – Time To Live– Limits lifetime
• Protocol– Higher level protocol
• Header checksum
• IP Addresses– Source, Destination
• Options©The McGraw-Hill Companies, Inc., 2000
lecture_2
The Version Field• Version 3 (IEN 21)
– Stems from when TCP was being split into one component handling hop-by-hop communication (IP) and one component handling end-to-end communication (TCP). IEN 21 1 February 1978.
• Version 4 (RFC 791)– IPv4
• Version 5 (RFC 1190)– ST-II - Multimedia streaming protocol
• Version 6 (RFC 2460)– IPv6
lecture_2
The Length Fields• Header Length (4 bits)
– Size of IPv4 header including options.
– Expressed in number of 32-bit words (4-byte words)
– Min is 5 words (=20 bytes)– Max is 15 words (=60 bytes) – limited size limited use
• Total Length (16 bits)– Total length of datagram including header.– If datagram is fragmented: length of fragment.
– Expressed in bytes.• Max: 65535 bytes. (This is IPs length limit)
• Many systems only accept 8K bytes.
lecture_2
The Type of Service Field• Type of Service (ToS): 8 bits• Intended as a field for specifying Quality of Service on a
per-packet basis.• Few applications set the TOS field.
– Unless an added cost/policy check/… associated with usage of a precedence level - it is very likely going to be abused.
• Long history of experimental use– RFC 791 – original– RFC 1122, 1349, 1455 modified the meaning of the ToS field– Current proposal: RFC 2474
• Differentiated Services
– Early Congestion Notification (ECN): RFC 2481, 3168
lecture_2
The ToS Byte – Original proposal
• Original Proposal – RFC 791• Bits 0-2: Precedence
– Defines priority e.g., when packets must be dropped
• Bits 3-5: TOS– Bit 3: 0 = Normal Delay, 1 = Low Delay– Bit 4: 0 = Normal Throughput, 1 = High Throughput
– Bit 5: 0 = Normal Reliability, 1 = High Reliability.
Precedence TOS
Bit 0 Bit 7
lecture_2
DSField – Current Proposal
• Differentiated Services (DiffServ) proposes to use 6 of these bits to provide 64 priority levels - calling it the Differentiated Service (DS) field– RFC 2474– Bits 0-6: Differentiated Services CodePoint (DSCP)
• The DSCP is set when entering an area and determines the QoS handling of the IP datagram in the routers within that area– Scheduling – Shaping– Queue Dropping
• Explicit Congestion Avoidance (ECN)– ECN Capable Transport (ECT)– Congestion Experienced (CE)
DSCP
Bit 0 Bit 7
ECN
lecture_2
Fragmentation – MTU
• If the IP datagram is larger than the MTU of the link layer, itmust be divided into several pieces to fit the MTU – this is called fragmentation
©The McGraw-Hill Companies, Inc., 2000
lecture_2
Fragmentation cont’d• Physical networks maximum frame size
– MTU Maximum Transfer Unit.
• A host or router transmitting datagram larger than MTU of link must divide it into smaller pieces - fragments.
• Both hosts and router may fragment– But only destination host reassemble!– Each fragment routed separately as independent datagram
• In effect, only datagram service (e.g. UDP)– TCP uses 576 byte MTU or path MTU discovery
• 3 fields of the IP header concerns fragmentation
lecture_2
The Fragmentation Fields• Identification: 16 bits
– ID + src IP addr uniquely identifies each datagram sent by a host– The ID is copied to all fragments of a datagram upon fragmentation
• Flags: 3 bits– RF (Reserved Fragment) – for future use (set to 0)– DF (Dont Fragment).
• Set to 1 if datagram should not be fragmented.
• If set and fragmentation needed, datagram will be discarded and an error message will be returned to the sender
– MF (More Fragments)• Set to 1 for all fragments, except the last.
• Fragmentation Offset: 13 bits– 8-byte units: (ip ip_frag << 3)– Shows relative position of a fragment with respect to the whole datagram
lecture_2
Fragmentation Example – Offset
©The McGraw-Hill Companies, Inc., 2000
lecture_2
Fragmentation Example – Detailed
IPv4 hdrid=0, DF=0
UDP hdr Data
IPv4 hdrid=n, DF=0
MF=1, off=0
UDP hdr DataIPv4 hdr
id=n, DF=0MF=0, off=185
Data
8 bytes20 bytes
20 bytes 20 bytes8 bytes
1473 bytes
1472 bytes 1 byte
Offset = 185 185x8 = 1480 bytes
MTU = 1500 bytes
lecture_2
The TTL field• TTL - Time To Live: 8 bits• Limit the lifetime of a datagram - avoid infinite loops• A router receiving a TTL>1 decrements the TTL and
forwards it• A TTL <= 1 shall not be forwarded
– ICMP “time exceeded” is returned to the sender (later slide)
• Recommended value is 64• Should really be called Hop Limit (as in IPv6)
– Historically: Every router holding a datagram for more than 1 second should decrement the TTL by the number of seconds.
lecture_2
The Protocol Field
• Demultiplexing to higher layers
• Assigned by IANA – Internet Assigned
Numbers Authority
• A subset (out of 134) assigned
Reservation ProtocolRSVP46
IPv6 in IPv4IPv641
User DatagramUDP17
Transmission ControlTCP6
IP in IP (encapsulation)IP4
Internet Control MessageICMP1protocolkeyworddecimal
lecture_2
Header Checksum• Ensures integrity of header fields
– Hop-by-hop (not end-to-end)
– The header fields must be correct for proper and safe processing.
– The payload is not covered.
• Other checksums– Link-level CRC. IP assumes a strong L2 checksum/CRC. Hop-by-hop.
– L4 checksums, eg TCP/ICMP/UDP checksums cover payload. End-to-end.
• Internet Checksum Algorithm, RFC 1071– Treat header as sequence of 16-bit integers.
– Add them together
– Take the one’s complement of the result.
lecture_2
Summary• Basic Routing
– Connectionless, next-hop routing
– Routing tables: RIBs and FIBs
– Longest prefix match
• Address resolution– ARP
– RARP
• IP – Internet Protocol– Basic functionality
– Header fields