The Internet: A Distributed System
http://people.freebsd.org/~nik/dist-sys.ppt
Copyright © 2002 Nik Clayton
All rights reserved.
Redistribution and use, with or without modification, are permitted provided that the following condition is met:
• Redistributions of this presentation must retain the above copyright notice, this list of conditions and the following disclaimer.
THIS PRESENTATION IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Obligatory biographical bit
• Used to be [email protected]
• Now [email protected], [email protected]
• One of five running mail for Citigroup
– 11m msgs/week, 850MB/day
• Editor, “FreeBSD Handbook”
Looking at…
• How the Internet works
• How the Domain Name System (DNS) works on top of this
• How the Simple Mail Transfer Protocol (SMTP) uses both of these to shuffle e-mail around the place
So, how does the Internet work?
• Three key protocols involved:
– IP: Internet Protocol
– UDP: User Datagram Protocol
– TCP: Transmission Control Protocol (the protocol suite as a whole is often called TCP/IP)
• IP is lowest layer, UDP and TCP sit on top of it.
• Not going to look at the physical layer (ethernet, etc)
• Not going to look at IPv6
Internet and the OSI 7 layer model
Layer            Protocols
7 Application    TELNET (RFC 854), FTP (RFC 959), SMTP (RFC 821), SNMP (RFC 1098), DNS (RFC 1034)
6 Presentation
5 Session
4 Transport      TCP (RFC 793), UDP (RFC 768)
3 Network        ARP (RFC 826), RARP (RFC 903), ICMP (RFC 792), BOOTP (RFC 951), IP (RFC 791)
2 Link           802.2; 802.3, 802.5, other medium access protocols
1 Physical
The 7 Layer Burrito
7. Sour cream
6. Cheese
5. Guacamole
4. Tomato
3. Lettuce
2. Seasoned rice
1. Refried beans
A Networking Analogy
• Two office blocks, each contains a number of different companies
• Each company has one or more phone numbers (so there are several phone numbers for the office block)
• Each phone number has a few hundred extensions
• To call anyone, you need their company phone number, and their extension
• 4 numbers identify any call -- source phone number, source extension, destination phone number, destination extension
A Networking Analogy (cont.)
• Imagine if everybody agreed on certain standard phone extensions.
– #25 gets you to the mail room
– #80 is the marketing department
– #123 calls the speaking clock
• That’s almost how the Internet works
In an IP network…
• You have a host (an office building)
• Each host has one or more network interfaces (companies within the building)
• Each interface has one or more IP addresses attached to it (phone numbers)
• Each IP address has 65535 ports (extensions)
• Connections are made from a port on an IP address to another port on an IP address
• 4 numbers identify a connection on the Internet -- source IP address, source port, destination IP address, destination port
Packet switching
• Internet is a packet switched network
• Data is split into packets
• Each packet has a source IP/port, and a destination IP/port, as well as other meta-information
• Packets may not arrive in the same order as sent
• Packets may not even arrive at all
IP Address: A definition
• 32 bit number
– So there are 2^32 = 4,294,967,296 of them
• Normally written as 4 octets (8 bit values) separated by dots, e.g., 10.10.1.1 (dotted quad notation)
• Are assigned by the network people, who arranged a block of addresses for the company, who were given them by your ISP, who was allocated them by their regional IP authority, who were assigned a regional block by the Internic.
So, tell me what ports are
• Like a telephone extension
• Each IP address has 2^16 - 1 = 65535 ports
• A server listens on an IP address:port pair for incoming connections
• A client is typically allocated a port at random for outgoing connections, and specifies the destination port it wants to connect to
• Some services (mail, web, etc) have “well known ports” assigned that servers are expected to listen on (25, 80, etc)
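A minimal sketch of the server port / client port split, assuming the loopback address 127.0.0.1 is usable on the machine running it. The function name loopback_connection is made up for illustration. The server asks the OS for any free port rather than a well known one, and the client is handed a port by the OS when it connects.

```python
import socket


def loopback_connection():
    """Open a TCP connection to ourselves and return its four identifying numbers."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))        # port 0 asks the OS to pick a free port
    server.listen(1)

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(server.getsockname()) # the OS allocates the client's source port
    conn, _peer = server.accept()

    src_ip, src_port = client.getsockname()
    dst_ip, dst_port = client.getpeername()
    for sock in (conn, client, server):
        sock.close()
    return src_ip, src_port, dst_ip, dst_port


print(loopback_connection())
```

The two port numbers change from run to run, which is the point: only the server's port needs to be known in advance.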
Networks are groups of IP addresses
• IP addresses are grouped into collections, called networks
• Network membership is determined by the netmask
• Netmask splits the IP address into two portions: the host portion, and the network portion
• Two hosts are in the same network if the network portions of their IP addresses are identical
How netmasks work
• 10.10.1.1 is really
00001010 00001010 00000001 00000001
      10       10        1        1
• and 10.10.2.1 is really
00001010 00001010 00000010 00000001
      10       10        2        1
How netmasks work (cont.)
• Netmask is another 32 bit binary number
• It is binary-ANDed with the IP address
• The bits still on after this form the network portion of the IP address
• The address bits that the netmask switches off form the host portion
How netmasks work (cont.)
• IP: 10.10.1.1
• Netmask: 255.255.255.0
•     00001010 00001010 00000001 00000001   (10.10.1.1)
  AND 11111111 11111111 11111111 00000000   (255.255.255.0)
  =   00001010 00001010 00000001 00000000   (10.10.1.0)
• So this is the .1 host in the 10.10.1.0 network
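The AND above can be tried out with a few lines of Python. This is only a sketch; the function name split_address is invented for the example, and the standard ipaddress module is used purely to convert between dotted quads and integers.

```python
import ipaddress


def split_address(ip, netmask):
    """Return (network portion, host portion) of an address as dotted quads."""
    ip_bits = int(ipaddress.IPv4Address(ip))
    mask_bits = int(ipaddress.IPv4Address(netmask))
    network = ip_bits & mask_bits                 # bits the netmask leaves on
    host = ip_bits & ~mask_bits & 0xFFFFFFFF      # bits the netmask switches off
    return str(ipaddress.IPv4Address(network)), str(ipaddress.IPv4Address(host))


print(split_address("10.10.1.1", "255.255.255.0"))  # ('10.10.1.0', '0.0.0.1')
```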
How netmasks work (cont.)
• Netmask doesn’t have to be a contiguous string of 1s, then contiguous 0s
– 170.170.170.0
– 10101010 10101010 10101010 00000000
• That would be bloody stupid though
• In practice, netmasks are all 1s, then all 0s
How netmasks work (cont.)
• Leads to another common notation for netmasks, /n
• /24 means 24 x 1, then all 0
– 11111111 11111111 11111111 00000000
– Same as 255.255.255.0
• /16 would be 16 x 1, then all 0
– 11111111 11111111 00000000 00000000
– Same as 255.255.0.0
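Going from the /n notation back to a dotted quad is a shift and a mask. A small sketch, with prefix_to_netmask as an illustrative name:

```python
def prefix_to_netmask(n):
    """Turn a /n prefix length into a dotted-quad netmask."""
    if not 0 <= n <= 32:
        raise ValueError("prefix length must be between 0 and 32")
    mask = (0xFFFFFFFF << (32 - n)) & 0xFFFFFFFF   # n ones, then (32 - n) zeros
    return ".".join(str((mask >> shift) & 0xFF) for shift in (24, 16, 8, 0))


print(prefix_to_netmask(24))  # 255.255.255.0
print(prefix_to_netmask(16))  # 255.255.0.0
```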
How netmasks work (cont.)
• Are these two hosts on the same network?
– 10.10.1.1/24
– 10.10.2.1/24
• No. The first is on the 10.10.1.0 net, the second is on the 10.10.2.0 net
• What about these?
– 10.10.1.1/16
– 10.10.2.1/16
• Yes, they’re both on the 10.10.0.0 net
How netmasks work (cont.)
• Netmasks do not need to be on an octet boundary
– 11111111 11111111 11111111 11000000
– 255.255.255.192
– /26
– 10.10.1.33 = 00001010 00001010 00000001 00100001
– 10.10.1.67 = 00001010 00001010 00000001 01000011
– With a /26 these are on different networks (10.10.1.0 and 10.10.1.64)
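The "same network?" questions can be checked mechanically. A sketch using Python's standard ipaddress module; same_network is a made-up helper name.

```python
import ipaddress


def same_network(a, b):
    """True if two 'address/prefix' strings fall in the same network."""
    return ipaddress.ip_interface(a).network == ipaddress.ip_interface(b).network


print(same_network("10.10.1.1/24", "10.10.2.1/24"))    # False
print(same_network("10.10.1.1/16", "10.10.2.1/16"))    # True
print(same_network("10.10.1.33/26", "10.10.1.67/26"))  # False
```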
The Network Addresses
• Network address is used to indicate the whole network
• No host can be given the network address
• Consists of the network portion as normal, with the host portion set to all zero
• For 10.10.1/24, the network address is 10.10.1.0
• 10.10.1/26 defines four networks
– 10.10.1.0 = 00001010 00001010 00000001 00000000
– 10.10.1.64 = 00001010 00001010 00000001 01000000
– 10.10.1.128 = 00001010 00001010 00000001 10000000
– 10.10.1.192 = 00001010 00001010 00000001 11000000
The Broadcast Addresses
• Broadcast address is used to send to all hosts on the network
• No host can be given the broadcast address
• Consists of the network portion as normal, with the host portion set to all ones
• For 10.10.1/24, the broadcast address is 10.10.1.255
• 10.10.1/26 defines four networks and broadcast addresses
– 10.10.1.63 = 00001010 00001010 00000001 00111111
– 10.10.1.127 = 00001010 00001010 00000001 01111111
– 10.10.1.191 = 00001010 00001010 00000001 10111111
– 10.10.1.255 = 00001010 00001010 00000001 11111111
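The network and broadcast addresses for each /26 can be listed rather than worked out by hand. A sketch with the standard ipaddress module; network_and_broadcast is an illustrative name.

```python
import ipaddress


def network_and_broadcast(block, new_prefix):
    """List (network address, broadcast address) for each subnet of a block."""
    subnets = ipaddress.ip_network(block).subnets(new_prefix=new_prefix)
    return [(str(s.network_address), str(s.broadcast_address)) for s in subnets]


for network, broadcast in network_and_broadcast("10.10.1.0/24", 26):
    print(network, broadcast)
# 10.10.1.0 10.10.1.63
# 10.10.1.64 10.10.1.127
# 10.10.1.128 10.10.1.191
# 10.10.1.192 10.10.1.255
```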
Shrinking address space
• /24 has 256 host addresses available
– .0 through to .255
• Lose .0, reserved for network
• Lose .255, reserved for broadcast
• Leaves you with (256 - 2) = 254 available addresses for hosts
Shrinking address space (cont.)
• /25 creates two networks
• .0 network
– Network address is .0
– Broadcast address is .127
– Host addresses are .1 through to .126 (126 addresses)
• .128 network
– Network address is .128
– Broadcast address is .255
– Host addresses are .129 through to .254 (126 addresses)
• Only (126 * 2) = 252 available host addresses now
Smaller subnets, fewer hosts
• /26 creates four networks
• Each network reserves 2 addresses
• So there are 4 * 2 = 8 addresses reserved
• 256 - 8 = 248 host addresses available
• And so on
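The "and so on" follows a simple formula. As a sketch (usable_hosts is an invented name, and it assumes the 256 address block is split into equal subnets no smaller than /30):

```python
def usable_hosts(prefix, block_prefix=24):
    """Host addresses left when a /block_prefix block is cut into /prefix subnets."""
    subnets = 2 ** (prefix - block_prefix)      # how many networks we created
    per_subnet = 2 ** (32 - prefix) - 2         # minus network and broadcast address
    return subnets * per_subnet


for prefix in (24, 25, 26, 27):
    print(prefix, usable_hosts(prefix))
# 24 254
# 25 252
# 26 248
# 27 240
```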
Routing
• Hosts on the same network can contact each other directly
• E.g., 10.10.1.1/24 wants to talk to 10.10.1.2/24.
• It puts a packet on the wire with a destination address of 10.10.1.2, and 10.10.1.2 receives it
• It’s like magic, you don’t need to know how this bit works, it just does
• If you become a network administrator, you will learn, in long, tedious detail, how this magic works
Routing (cont.)
• Hosts on two different networks can’t talk directly, they need a router to route the packets between them
• A router is a device with at least 2 network interfaces present on 2 or more different networks
• Hosts send packets for other networks to the router
• Router looks at the destination address information in the packet, and works out where to send it
Routing (cont.)
• Each Internet host has to maintain a routing table
• The routing table details how packets get from a to b
• The routing table only contains information about the networks the host is directly connected to, plus a default route for everything else
Routing (cont.)
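[Diagram: a router with interfaces 10.10.1.1/24, 10.10.2.1/24 and 80.194.99.103/24 joins the 10.10.1/24 network (host 10.10.1.2/24), the 10.10.2/24 network (host 10.10.2.2/24) and the Internet]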
Routing (cont.)
• Here’s the routing table for the workstations on the 10.10.1/24 network
• If it’s on the local network then we know we can reach it directly
• Otherwise send it on to the router, and hope that it knows how to deal with it
Destination Gateway
10.10.1/24 Local interface
Default 10.10.1.1
Routing (cont.)
• Here’s the routing table for the workstations on the 10.10.2/24 network
• If it’s on the local network then we know we can reach it directly
• Otherwise send it on to the router, and hope that it knows how to deal with it
Destination Gateway
10.10.2/24 Local interface
Default 10.10.2.1
Routing (cont.)
• Here’s the routing table for the router
Destination Gateway
10.10.1.0/24 Interface 1
10.10.2.0/24 Interface 2
Default Interface 3
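A sketch of how a host or router might consult tables like these. The names next_hop, WORKSTATION and ROUTER are made up for the example, and the lookup simply takes the first matching network. Real IP stacks pick the most specific (longest prefix) match, which gives the same answer here because the networks do not overlap.

```python
import ipaddress

# The workstation table for the 10.10.1/24 network, and the router's table.
WORKSTATION = [("10.10.1.0/24", "local interface"), ("default", "10.10.1.1")]
ROUTER = [
    ("10.10.1.0/24", "interface 1"),
    ("10.10.2.0/24", "interface 2"),
    ("default", "interface 3"),
]


def next_hop(routing_table, destination):
    """Return the gateway for a destination, falling back to the default route."""
    dest = ipaddress.ip_address(destination)
    default = None
    for network, gateway in routing_table:
        if network == "default":
            default = gateway
        elif dest in ipaddress.ip_network(network):
            return gateway
    return default


print(next_hop(WORKSTATION, "10.10.1.2"))   # local interface
print(next_hop(WORKSTATION, "10.10.2.2"))   # 10.10.1.1
print(next_hop(ROUTER, "10.10.2.2"))        # interface 2
```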
Routing (cont.)
• This is very scalable
– No host needs to know the complete route to the destination, or the Internet’s topology
– They just need to know the IP address of the nearest router
– The nearest router hands it off to the next nearest router, and so on
User Datagram Protocol (UDP)
• Runs on top of IP
• Connectionless, just send data
– No guarantee packets will be delivered in order, the applications must deal with this
– No guarantee packets will even arrive, applications must resend data as necessary
– A bit like the Post Office
• But very low overhead
Transmission Control Protocol (TCP)
• Runs on top of IP
• Connection oriented (open/send/close)
• Network stack ensures
– Packets are delivered to the application in the correct order
– Missing packets are automatically resent
• Has more overhead than UDP, particularly on the initial connection (three way handshake)
• Handles network congestion well
Internet summary
• Hosts have interfaces
• Interfaces have IP addresses
• IP addresses are subdivided into the network portion and the host portion by the netmask
• Subdividing networks consumes available IP addresses (for network and broadcast address)
• Hosts on the same network can talk to one another directly
• Hosts on different networks need to know the address of the correct router to use
Internet Summary (cont.)
• Data sent using either UDP or TCP
• UDP is faster, but the application has to do more bookkeeping
• TCP starts slower, but the application has to do less work
IP Design Good Points
• Very scalable
• Easy to understand, simple rules
• Does not enforce specific policy
– Networks can be any size
– Does not require particular cabling standard
– Hardware and OS agnostic
• Open
IP Design Bad Points
• Large networks send a lot of meta data around
– Hosts announcing themselves
• Basic IP design is not secure
– Easy to spoof the source address on a packet
– Leads to denial of service attacks
– Malicious router can sniff traffic, or replace data
– Security in layers 5, 6, and 7 (SSL, SSH, etc)
Domain Name System (DNS)
The Definitive Reference
• DNS and BIND, Paul Albitz & Cricket Liu
• Everything you ever wanted to know about the DNS
• Can’t recommend this book highly enough
IP Addresses are a pain
• Working with IP addresses is
– Cumbersome
– Error prone
– Hard to remember
• We prefer to name things where possible
• Which is why we have domain names
Fully Qualified Domain Names
• FQDN is two or more names, separated by dots
• Reading left to right, the first part is the host name
• The rest is the domain name
• IP addresses are mapped to FQDNs
• FQDNs are mapped back to IP addresses
• How?
One way: The hosts file
10.10.1.1 gateway.example.com
10.10.1.2 me.example.com
10.10.1.3 another.example.com
. . .
This does not scale (!)
So the DNS was invented
• A hierarchical name space, read from right to left
• me.example.com (FQDN) is
. <- The root
.com <- Top level domain
.com.example <- Sub-domain
.com.example.me <- FQDN
• Converting a hostname to an IP address is called “resolving” the address
• “zone” and “domain” are almost interchangeable terms
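Reading a name from right to left is easy to show in code. A small sketch (hierarchy is an illustrative name) that lists each level from the root down, written in the usual left-to-right notation:

```python
def hierarchy(fqdn):
    """List the levels of a name, from the root down to the full name."""
    labels = fqdn.rstrip(".").split(".")
    levels = ["."]                               # the root
    for i in range(len(labels) - 1, -1, -1):     # walk the labels right to left
        levels.append(".".join(labels[i:]))
    return levels


print(hierarchy("me.example.com"))
# ['.', 'com', 'example.com', 'me.example.com']
```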
How the DNS is used
• 3 types of host
– DNS servers know how addresses and names map to one another for one or more domains
– DNS caches, given a domain, know how to find out which DNS server knows about that domain, and query it for info
– DNS clients (resolvers) know how to talk to caches
• DNS clients contact their nearest cache when they need to resolve an address. The cache works out which DNS server will have this information, and makes the queries
The root nameservers
• 13 named servers (a.root-servers.net through m.root-servers.net), scattered around the world, that know the nameservers immediately below them
• Every DNS server in the world needs to know the IP addresses of the root nameservers
• That’s the only bit of static configuration required
• Everything else is looked up as necessary
• Which is pretty cool
DNS Hierarchy
Root Nameservers
  GTLD Nameservers
    .com
      citigroup.com
    .org
      freebsd.org
        www.freebsd.org
        freefall.freebsd.org
      slashdot.org
    .net
    .uk
      .co.uk
      .ac.uk
        brunel.ac.uk
          www.brunel.ac.uk
        ic.ac.uk
          doc.ic.ac.uk
            src.doc.ic.ac.uk
    ...
Primary and Secondary DNS
• Each domain has exactly one primary (master) DNS server, and 0 to ‘n’ secondary (slave) servers
• To a client, there is no distinction between the two
• DNS information is updated on the primary DNS server
• Secondary servers periodically check for updates, and copy changes over as necessary
DNS in action
• dns.example.com is the local DNS cache
• me.example.com is a host that uses the DNS server
• You are a user running applications on me.example.com
• You type ‘www.freebsd.org’ in your web browser
• What happens?
DNS in action (cont.)
• First, me.example.com checks to see if it knows the IP address of www.freebsd.org
• It doesn’t
• So it sends a DNS query to dns.example.com
• This query says “Please give me the A record for the FQDN www.freebsd.org”
DNS in action (cont.)
• dns.example.com knows nothing about www.freebsd.org
• So it asks one of the root name servers
• They don’t know either, but they say “Go talk to the .org name servers, here are their IP addresses”
• So dns.example.com goes and asks the .org name servers
DNS in action (cont.)
• They say “We don’t know, but we do know that ns.freebsd.org is the nameserver that’s authoritative for *.freebsd.org, here’s its address, go ask it”
• So dns.example.com says to ns.freebsd.org “Please give me the A record for www.freebsd.org”
• ns.freebsd.org says “Sure, it’s 216.136.204.117”
DNS in action (cont.)
• dns.example.com caches this information (so if it’s asked again it doesn’t need to redo all the above), and sends the info back to me.example.com
• All this happens in a few seconds
• This is what your browser is doing when it says something like “Resolving hostname”
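The walk from the root down can be simulated with toy data. This sketch does not talk to real name servers: the SERVERS dictionary, its keys and the resolve function are all invented for illustration, and only the final address comes from the slides.

```python
# Toy stand-ins for real name servers (hypothetical data, not live DNS).
SERVERS = {
    "root": {"referrals": {"org": "org-servers"}},
    "org-servers": {"referrals": {"freebsd.org": "ns.freebsd.org"}},
    "ns.freebsd.org": {"answers": {"www.freebsd.org": "216.136.204.117"}},
}


def resolve(name, cache):
    """Play the part of the DNS cache: return (address, servers asked)."""
    if name in cache:
        return cache[name], []                   # answered from the cache
    server, asked = "root", []
    while True:
        asked.append(server)
        zone = SERVERS[server]
        if name in zone.get("answers", {}):
            cache[name] = zone["answers"][name]  # remember it for next time
            return cache[name], asked
        # No answer here, so follow the referral that matches the name.
        for suffix, next_server in zone.get("referrals", {}).items():
            if name == suffix or name.endswith("." + suffix):
                server = next_server
                break
        else:
            raise LookupError("nobody knows about " + name)


cache = {}
print(resolve("www.freebsd.org", cache))  # asks root, then .org, then ns.freebsd.org
print(resolve("www.freebsd.org", cache))  # second time: nobody is asked
```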
Other types of DNS record
• That example used “A” records
– They map FQDNs to IP addresses
• Called a “Forward” lookup
• Not the only type of records in the DNS
– PTR records map IP addresses to FQDNs
• Called a “Reverse” lookup
– NS records list the domain’s name servers
– MX records are used for mail routing
– SOA record is the ‘Start of Authority’
SOA Record
• Every zone has one SOA record
• Describes characteristics for the zone
– Serial number, which is incremented every time the data changes
– Time-to-live, which says how long data should be cached for
– E-mail address of DNS info maintainer
Example of a DNS Zone File
$ORIGIN brunel.ac.uk.
brunel.ac.uk. IN SOA sirius.brunel.ac.uk. hostmaster.brunel.ac.uk. (
                2002103001 ; Serial number
                8000       ; Refresh after 2hrs 13min
                7200       ; Retry after 2hrs
                604800     ; Expire after 1wk
                21600      ; Minimum TTL of 6hrs
                )
                IN NS sirius.brunel.ac.uk.
                IN NS ns3.ja.net.
                IN MX 5 nemesis.brunel.ac.uk.
                IN MX 4 eros.brunel.ac.uk.
s70n133         IN A 134.83.70.133
s249n88         IN A 134.83.249.88
s249n90         IN A 134.83.249.90
… … …
IP Characteristics of DNS
• DNS servers listen on port 53
• Generally uses UDP
– Very short communication lifespan
– TCP overhead is too high
– Protocol is simple and robust
• Didn’t get an answer? Just send the query again
• May use TCP where appropriate
– Zone transfers between primary and secondary servers
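To see why a single UDP packet is usually enough, here is a sketch that builds the bytes of an "A record, please" query by hand, following the message layout in RFC 1035. The function name build_query and the query ID are arbitrary choices for the example, and nothing is sent over the network.

```python
import struct


def build_query(name, query_id=0x1234):
    """Build the raw bytes of a DNS query for the A record of a name."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # The name is sent as length-prefixed labels, ending with a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)   # type A, class IN
    return header + question


query = build_query("www.freebsd.org")
print(len(query))  # 33
```

The whole question fits in 33 bytes, so setting up and tearing down a TCP connection would cost more than the query itself.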
Smart things about DNS
• Simple mechanism for synchronising primary and secondary servers
• Distributes data throughout the network, no real single point of failure for the Internet
– With the exception of the root nameservers
– DDoS Attacks
Bad things about DNS
• Not secure, you have to trust your DNS server
– Always do a forward lookup after a reverse lookup
• DNS server is a single point of failure for a network’s presence on the Internet
– So make sure that multiple secondary servers exist
– On different, geographically disparate networks
Bad things about DNS (cont.)
• Difficult to do updates ‘on demand’
– There are enhancements that try to address this
– But they’re not widely deployed
– Commercial interests
Simple Mail Transfer Protocol (SMTP)
SpaM Transport Protocol
What it sometimes feels like
A word from our sponsor…
• Wed 13th to 16th November 2003
• Compass Theatre, Ickenham
• £5.00, £6.50 or £7.50
• 07050 605081
• I’m in it as myself.
• “Nail it to the counter Lord Fergason and damn the cheesemongers!”
An e-mail message consists of…
• Envelope
– Contains addressing information
– Discarded once the message is successfully delivered
• Header
– Contains 1-n “name: value” fields
– From:, To:, CC:, BCC:, Subject:, Date:, Received:, X-Foo:, X-Bar:, etc…
• Body
– Unstructured text of the actual message
Sample SMTP conversation
# telnet eros.brunel.ac.uk 25
220 ************
HELO ngo.dnsalias.org
250 eros.brunel.ac.uk OK
MAIL FROM: [email protected]
250 2.1.0 OK
RCPT TO: [email protected]
250 2.1.5 Recipient OK
DATA
354 Enter Mail, end by a line with only ‘.’
From: [email protected] (Nik Clayton)
To: [email protected] (Simon Taylor)
Subject: Slides for lecture

Sorry mate, no chance I’ll have the slides ready in time, we’ll need to fake something. But keep it to yourself, I don’t think they’ll notice.

Nik
.
250 2.1.5 Submitted & queued (msg.22684-0)
QUIT
221 2.0.0 eros.brunel.ac.uk says goodbye to ngo.dnsalias.org
SMTP Highlights
• Protocol is entirely plain text
– Easy to debug
– Easy to test by hand
– Easy to script
• Protocol is relatively simple
– Easy to write code for (Microsoft excepted)
• Protocol is unambiguous
– All information is contained in the status codes. The explanatory text is useful but ignored by implementations
SMTP Highlights (cont.)
• Protocol is consistent
– 2xx codes indicate success
– 3xx codes indicate ‘send more data’
– 4xx codes indicate temporary failures
– 5xx codes indicate permanent failures
• The ‘xx’s provide further delineation
• SMTP implementations are supposed to be paranoid
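Because all the information is in the three digit code, a client only needs the first digit to decide what to do next. A sketch, with classify_reply as an invented name:

```python
def classify_reply(line):
    """Return (code, meaning) for one SMTP reply line, ignoring the text."""
    code = int(line[:3])
    meanings = {
        2: "success",
        3: "send more data",
        4: "temporary failure",
        5: "permanent failure",
    }
    return code, meanings.get(code // 100, "unknown")


print(classify_reply("250 2.1.0 OK"))                        # (250, 'success')
print(classify_reply("354 Enter Mail"))                      # (354, 'send more data')
print(classify_reply("502 Error: command not implemented"))  # (502, 'permanent failure')
```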
A real SMTP failure
• We had an application that was a buggy SMTP server
• Sometimes it failed to send back a valid SMTP response after generating a bounce message
• The client didn’t know whether or not the message was delivered, temp. failed, or perm. failed
• So it tried, tens of times a second, to resend the message
• This generated thousands of bounce messages very quickly
The Envelope and Bcc:
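• The message as written:
From: [email protected]
To: [email protected]
Bcc: [email protected]
. . .
• The SMTP conversation that delivers it:
220 . . .
MAIL FROM: [email protected]
250 . . .
RCPT TO: [email protected]
250 2.1.5 Recipient OK
RCPT TO: [email protected]
250 2.1.5 Recipient OK
DATA
354 . . .
From: [email protected] (Nik Clayton)
To: [email protected] (Simon Taylor)
• The Bcc: recipient appears only in the envelope (a RCPT TO), never in the headers that are sent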
. . .
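A sketch of how a mail client might derive the envelope from the headers and then drop Bcc: before sending. It uses Python's standard email package. The function name split_envelope and the example.org / example.com / example.net addresses are placeholders, not the addresses from the slide.

```python
from email.message import EmailMessage
from email.utils import getaddresses


def split_envelope(msg):
    """Return (envelope sender, envelope recipients, message without Bcc:).

    Note that this removes the Bcc: header from msg itself.
    """
    sender = getaddresses(msg.get_all("From", []))[0][1]
    everyone = (
        msg.get_all("To", []) + msg.get_all("Cc", []) + msg.get_all("Bcc", [])
    )
    recipients = [address for _name, address in getaddresses(everyone)]
    del msg["Bcc"]          # the headers that go out must not mention Bcc:
    return sender, recipients, msg


msg = EmailMessage()
msg["From"] = "Nik <nik@example.org>"
msg["To"] = "simon@example.com"
msg["Bcc"] = "boss@example.net"
msg.set_content("Slides attached.")

sender, recipients, outgoing = split_envelope(msg)
print(sender)       # nik@example.org
print(recipients)   # ['simon@example.com', 'boss@example.net']
```

Each address in the recipients list becomes one RCPT TO command.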
Sample Received: Lines
Received: from localhost ([email protected] [127.0.0.1])
by crf-consulting.co.uk (8.12.3/8.12.3) with ESMTP id g9GFo4Tk093919
for <nik@localhost>; Wed, 16 Oct 2002 16:50:04 +0100 (BST)
(envelope-from [email protected])
Received: from ngo.org.uk [212.219.216.39]
by localhost with POP3 (fetchmail-5.9.11)
for nik@localhost (single-drop); Wed, 16 Oct 2002 16:50:04 +0100 (BST)
Received: from nemesis.brunel.ac.uk (nemesis.brunel.ac.uk [134.83.108.17])
by ngo.org.uk (8.9.3/8.9.3) with ESMTP id RAA07600
for <[email protected]>; Wed, 16 Oct 2002 17:01:18 +0100 (BST)
Received: from csstsjt (actually s76n96.brunel.ac.uk) by nemesis.brunel.ac.uk
with SMTP-BRUNEL (PP) with ESMTP; Wed, 16 Oct 2002 16:47:25 +0100
Re-ordered Received: lines
Received: from csstsjt (actually s76n96.brunel.ac.uk) by nemesis.brunel.ac.uk
with SMTP-BRUNEL (PP) with ESMTP; Wed, 16 Oct 2002 16:47:25 +0100
Received: from nemesis.brunel.ac.uk (nemesis.brunel.ac.uk [134.83.108.17])
by ngo.org.uk (8.9.3/8.9.3) with ESMTP id RAA07600
for <[email protected]>; Wed, 16 Oct 2002 17:01:18 +0100 (BST)
Received: from ngo.org.uk [212.219.216.39]
by localhost with POP3 (fetchmail-5.9.11)
for nik@localhost (single-drop);
Wed, 16 Oct 2002 16:50:04 +0100 (BST)
Received: from localhost ([email protected] [127.0.0.1])
by crf-consulting.co.uk (8.12.3/8.12.3) with ESMTP id g9GFo4Tk093919
for <nik@localhost>; Wed, 16 Oct 2002 16:50:04 +0100 (BST)
(envelope-from [email protected])
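Each host adds its Received: line at the top of the message, so reading the hops in travel order just means reversing them. A sketch using Python's standard email package; hops_in_travel_order and the a/b/c.example hosts are made up for the example.

```python
from email import message_from_string


def hops_in_travel_order(raw_message):
    """Return the Received: header values, earliest hop first."""
    msg = message_from_string(raw_message)
    received = msg.get_all("Received", [])   # top of the message first
    return list(reversed(received))


RAW = (
    "Received: from b.example by c.example; Wed, 16 Oct 2002 17:01:18 +0100\n"
    "Received: from a.example by b.example; Wed, 16 Oct 2002 16:47:25 +0100\n"
    "Subject: test\n"
    "\n"
    "body\n"
)

for hop in hops_in_travel_order(RAW):
    print(hop)
```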
Acronyms
• MTA = Mail Transfer Agent
– The software that routes messages from host to host (Sendmail, Postfix, Qmail, Exchange (cough))
• MUA = Mail User Agent
– The software that lets users send and receive e-mail (Outlook, Eudora, etc)
• PBCK = Problem Between Chair and Keyboard
– A user. See also “DFU”
Mail Routing
• I tap [email protected] into my MUA. What happens?
• MUA hands message off to local MTA
• Local MTA uses the DNS to look up MX records for brunel.ac.uk
• MX record?
MX Records
• Are entries in the DNS
• Unlike most other DNS entries (A records, etc), they contain two pieces of information
– A FQDN
– A weight / preference
• A domain (brunel.ac.uk) may have multiple MX records, listing different FQDNs and weights, providing redundancy
• Hosts acting as MXs for a domain do not need to be in the same domain as the domain they are acting as MXs for (!)
Brunel and Citigroup MX records
Weight Host
4 eros.brunel.ac.uk
5 nemesis.brunel.ac.uk
Weight Host
50 mail1.citigroup.com
50 mail2.citigroup.com
50 mail3.citigroup.com
50 mail4.citigroup.com
50 mail5.ssmb.com
Mail Routing (cont.)
• The local MTA sorts the MX results in order of their weight (lowest first)
• It does a DNS lookup for the IP address(es) of the first FQDN in the list
• It tries to connect to that IP address on port 25
• If the connection succeeds it tries to deliver the message
• If the connection fails, or the delivery attempt failed with a temporary error, it tries again, with the next MX record in the list
Mail Routing (cont.)
• The MTA will queue messages for a period of time (5 days is typical)
• It will make regular attempts to re-deliver messages that generated temporary failures
– Failure after a certain period (normally 4 hours) may generate a “We are still trying to deliver your message” note to the envelope sender address
• Messages that generate a permanent failure from any of the MX hosts are not retried, and are bounced
• Bounces go to the envelope sender address, not the From: address
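The delivery rules above can be sketched as a loop over the MX hosts. Here deliver is an invented name, and attempt is a hypothetical callable standing in for "connect to port 25 and try to hand over the message"; it returns the SMTP reply code, or None if the connection failed.

```python
def deliver(mx_records, attempt):
    """Try each MX host, lowest weight first.

    mx_records is a list of (weight, host) pairs.
    """
    for _weight, host in sorted(mx_records):
        code = attempt(host)
        if code is None or 400 <= code < 500:
            continue                      # temporary problem: try the next MX
        if 200 <= code < 300:
            return ("delivered", host)
        return ("bounced", host)          # permanent failure: give up, tell the sender
    return ("queued", None)               # nobody took it: keep it and retry later


BRUNEL = [(5, "nemesis.brunel.ac.uk"), (4, "eros.brunel.ac.uk")]

print(deliver(BRUNEL, lambda host: 250))
# ('delivered', 'eros.brunel.ac.uk')
print(deliver(BRUNEL, lambda host: None if host.startswith("eros") else 250))
# ('delivered', 'nemesis.brunel.ac.uk')
```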
Citigroup Mail Backbone Structure
[Diagram: mail flows between the Internet and the Exchange servers via anti-spam, address re-writing, archiving and anti-virus systems]
IP Characteristics of SMTP
• SMTP servers listen on port 25
• Always uses TCP
– Relatively long communication lifespan
– TCP overhead is acceptable
– TCP ensures packets are resent as necessary
Extending SMTP
• Turns out that, as originally specified, SMTP doesn’t do some useful things
• So ESMTP was invented
• But how do you do this without breaking all the existing implementations?
• Hmm…
Extending SMTP (cont.)
• Get out clause in the original SMTP spec
• If an SMTP server receives a command it doesn’t understand, it:
– Does not drop the connection
– Returns an error code (5xx)
– Pretends it never received the command
• Robustness in action, and a stroke of genius
Extending SMTP (cont.)
• EHLO - Extended HELO
• Replaces ‘HELO’ at the beginning of the SMTP conversation
• If a server responds to EHLO with a 2xx code you know it speaks ESMTP
• If it responds with a 5xx code then you fall back to regular SMTP, and immediately send a HELO.
EHLO in action
220 issaspam-ny01.ssmb.com ESMTP Go ahead
EHLO ngo.dnsalias.org
250-issaspam-ny01.ssmb.com Hello
250-ENHANCEDSTATUSCODES
250-PIPELINING
250-8BITMIME
250-SIZE 26214400
250-DSN
250-DELIVERBY
250 HELP
MAIL FROM: [email protected]
. . .
EHLO failing
220 smtp.example.com
EHLO ngo.dnsalias.org
502 Error: command not implemented
HELO ngo.dnsalias.org
250 OK
MAIL FROM: [email protected]
. . .
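The fallback in the two transcripts above fits in a few lines. This is a sketch only: greet is an invented name, and old_server and new_server are hypothetical stand-ins for a real connection that take a command and return a (code, text) pair.

```python
def greet(send_command, client_name):
    """Try EHLO first; if the server rejects it with a 5xx code, fall back to HELO."""
    code, _text = send_command("EHLO " + client_name)
    if 200 <= code < 300:
        return "ESMTP"
    if 500 <= code < 600:
        code, _text = send_command("HELO " + client_name)
        if 200 <= code < 300:
            return "SMTP"
    raise ConnectionError("server refused both EHLO and HELO")


def old_server(command):
    """Hypothetical pre-ESMTP server: only understands HELO."""
    if command.startswith("HELO "):
        return 250, "OK"
    return 502, "Error: command not implemented"


def new_server(command):
    """Hypothetical ESMTP server: understands both greetings."""
    if command.startswith(("EHLO ", "HELO ")):
        return 250, "Hello"
    return 502, "Error: command not implemented"


print(greet(new_server, "ngo.dnsalias.org"))  # ESMTP
print(greet(old_server, "ngo.dnsalias.org"))  # SMTP
```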
A better way of solving the problem
• Always embed version information into your protocols
• The version should be the first piece of information in any transaction
• Defines the format of the rest of the transaction
• But, still allow unimplemented commands to fail gracefully
Nice things about SMTP
• It’s distributed from the get-go, and it scales
– Need more servers? Add them, and update your MX records
• It’s open and royalty free
– SMTP is fully documented in RFC2821
– Message format is in RFC2822
• Heterogeneous
– Nothing in SMTP ties it to a particular platform
More nice things about SMTP
• It’s resilient, and failures are handled
– MX server not responding? Go try another one
– Are they all down? Wait a bit, and try again
– It distinguishes between temporary errors
• Disk’s full, I can’t accept any mail at the moment, so try again later
– And permanent errors
• The e-mail address you’ve provided is invalid, I’m never going to be able to deliver it.
• Hides implementation details from the user
– User doesn’t need to know the route the message takes
Nice things about SMTP..?
• Secure?
– Not really
– Relatively simple to forge mail
– Harder to forge it perfectly
– Does not address encryption or authentication of message contents
• Nobody’s perfect
Thanks
Questions?
Bonus Slides
Things I wish I knew 10 years ago
• Work for a small company
– You learn a lot very quickly
– The hours can be insane
– You can accomplish a lot very fast
• Work for a large company
– You tend to specialise
– Regular hours
– Bureaucracy is ever-present
More things to know
• Attend conferences
– You learn a lot
– The networking (people kind) is invaluable
– Speaking at them is great for the CV
• It also forces you to think clearly about a subject
– Never neglect the social side
• Travel whenever possible
– San Francisco is great in the summer
Still more things to know
• Always be aware of the Peter Principle
• Read “The Mythical Man Month”, Brooks
• Learn the Perl programming language
• Stay up to date with the technical journals
• Find time to have a life
Pseudo-code for a server
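#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(void) {
    int s, client;                  // The listening socket, and one socket per connection
    struct sockaddr_in addr, peer;  // Our address, and address info of the client
    socklen_t peerlen = sizeof(peer);

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);  // Listen on every local IP address
    addr.sin_port = htons(80);      // We'll listen on port 80 (usually needs root)

    s = socket(AF_INET, SOCK_STREAM, 0);       // Create socket

    // Assign the address info we specify to the socket
    if (s < 0 || bind(s, (struct sockaddr *) &addr, sizeof(addr)) < 0)
        return 1;

    listen(s, 5);                   // Queue up to 5 pending connections

    while ((client = accept(s, (struct sockaddr *) &peer, &peerlen)) >= 0) {
        // If we're here then something's connected to us.
        // Do whatever we're supposed to do when this happens
        close(client);
        peerlen = sizeof(peer);
    }
    return 0;
}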
Pseudo-code for a client
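#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(void) {
    int s;                          // The socket handle
    struct sockaddr_in addr;        // The socket address
    struct hostent *he;             // Info about the remote host

    s = socket(AF_INET, SOCK_STREAM, 0);       // Create socket

    // Get the IP address of the host we want to connect to
    he = gethostbyname("www.freebsd.org");
    if (s < 0 || he == NULL)
        return 1;

    // Store the IP address, and the port we connect to
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    memcpy(&addr.sin_addr, he->h_addr_list[0], sizeof(addr.sin_addr));
    addr.sin_port = htons(80);

    if (connect(s, (struct sockaddr *) &addr, sizeof(addr)) == 0) {
        // Connected to the remote host.
        // …
    }
    close(s);                       // All done
    return 0;
}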
[Diagram labels for “DNS in action”: User, me.example.com, dns.example.com, Root Nameserver, .org Nameserver, ns.freebsd.org Nameserver]