CERN DNS Load Balancing
Vladimír Bahyl, IT-FIO; Nicholas Garfield, IT-CS
13 February 2006 – CHEP 2006, Mumbai, India
Outline
Problem description
Possible solutions
DNS – an ideal medium
Round robin vs. load balancing
Evolution of the DNS setup at CERN
Application load balancing system: server process, client configuration
Operational examples
Statistics
Conclusion
Problem description
User expectations of IT services:
100% availability
Response time converging to zero
Several approaches:
Bigger and better hardware (= increasing MTBF)
Redundant architecture
Load balancing + failover
Situation at CERN:
Has to provide uninterrupted services
Has to transparently migrate nodes in and out of production, whether because of a scheduled intervention or because of high load
Very large and complex network infrastructure
Possible solutions
Network Load Balancing
A device/driver monitors network traffic flow and makes packet-forwarding decisions
Example: Microsoft Windows 2003 Server NLB
Disadvantages: not application aware; simple network topologies only; proprietary
OSI Layer 4 (the Transport Layer – TCP/UDP) switching
The cluster is hidden behind a single virtual IP address on a switch
The switch role also includes: monitoring all nodes in the cluster, keeping track of the network flows, and forwarding packets according to policies
Examples: Linux Virtual Server, Cisco Catalyst switches
Disadvantages: simplistic tests; all cluster nodes should be on the same subnet; expensive for large subnets with many services; the switch becomes a single point of failure
Domain Name System – an ideal medium
Ubiquitous, standardized, globally accessible database
Connections to any service have to contact DNS first
Provides a way for rapid updates
Offers round-robin load distribution (see later)
But DNS is unaware of the applications, so an arbitration process is needed to select the best nodes – otherwise the decision is not affected by the load on the service at all
→ application load balancing and failover
DNS Round Robin

lxplus001 ~ > host lxplus
lxplus.cern.ch has address 137.138.4.171 (1)
lxplus.cern.ch has address 137.138.4.177 (2)
lxplus.cern.ch has address 137.138.4.178 (3)
lxplus.cern.ch has address 137.138.5.72 (4)
lxplus.cern.ch has address 137.138.4.169 (5)

lxplus001 ~ > host lxplus
lxplus.cern.ch has address 137.138.4.177 (2)
lxplus.cern.ch has address 137.138.4.178 (3)
lxplus.cern.ch has address 137.138.5.72 (4)
lxplus.cern.ch has address 137.138.4.169 (5)
lxplus.cern.ch has address 137.138.4.171 (1)

Allows basic load distribution: every query returns the same set of addresses, rotated by one position (a way to observe this is sketched below).
No withdrawal of overloaded or failed nodes.
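The rotation is easy to observe programmatically. The following minimal Perl sketch using the Net::DNS module (an illustration, not part of the original presentation) resolves the alias and prints the A records in the order the server returned them; repeated runs show the rotation:

use strict;
use warnings;
use Net::DNS;

# Resolve the lxplus alias and print the A records in server order.
my $resolver = Net::DNS::Resolver->new;
my $reply    = $resolver->query('lxplus.cern.ch', 'A')
    or die 'DNS query failed: ' . $resolver->errorstring . "\n";

print $_->address, "\n" for grep { $_->type eq 'A' } $reply->answer;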
DNS Load Balancing and Failover
Requires an additional server = the arbiter
Monitors the cluster members
Adds and withdraws nodes as required
Updates are transactional – the client never sees an empty list

lxplus001 ~ > host lxplus
lxplus.cern.ch has address 137.138.4.168
lxplus.cern.ch has address 137.138.5.71
lxplus.cern.ch has address 137.138.5.74
lxplus.cern.ch has address 137.138.4.165
lxplus.cern.ch has address 137.138.4.166

lxplus001 ~ > host lxplus
lxplus.cern.ch has address 137.138.4.171
lxplus.cern.ch has address 137.138.4.177
lxplus.cern.ch has address 137.138.4.178
lxplus.cern.ch has address 137.138.5.72
lxplus.cern.ch has address 137.138.4.169

lxplus001 ~ > host lxplus
lxplus.cern.ch has address 137.138.5.80
lxplus.cern.ch has address 137.138.4.168
lxplus.cern.ch has address 137.138.4.171
lxplus.cern.ch has address 137.138.4.174
lxplus.cern.ch has address 137.138.5.76

Note that successive queries return different sets of nodes, not just a rotation of one fixed list.
DNS evolution at CERN – static model
[Diagram: the Network Database Server generates static data files for the Primary DNS Server, which is synchronized to the Secondary DNS Server via AXFR or NFS; separately, an Arbiter feeds a Delegated Primary DNS Server, which updates a Delegated Secondary DNS Server via AXFR.]
AXFR = full zone transfer; IXFR = incremental zone transfer; zone = [view, domain] pair (example: [internal, cern.ch])
Segmented DNS name space – sub-domain delegation
Not scalable. No high availability. Obsolete.
DNS evolution at CERN – scalable model
[Diagram: the Network Database Server generates static data files for the Master DNS Server; the master feeds the slave pairs via AXFR – Slaves 1A and 1B serving DNS 1 (e.g. the internal view), Slaves 2A and 2B serving DNS 2 (e.g. the external view).]
AXFR = full zone transfer; IXFR = incremental zone transfer; zone = [view, domain] pair (example: [internal, cern.ch])
No support for frequent 3rd-party updates. Only full zone transfers.
DNS evolution at CERN – 3rd-party updates
[Diagram: same master/slave layout as before – the Master DNS Server feeds Slaves 1A/1B (DNS 1, e.g. internal view) and 2A/2B (DNS 2, e.g. external view) via AXFR; in addition, an Arbiter sends an update trigger, and dynamic data flows through a dedicated Database Server into the Master DNS Server alongside the generated static data files from the Network Database Server.]
AXFR = full zone transfer; IXFR = incremental zone transfer; zone = [view, domain] pair (example: [internal, cern.ch])
Multiple points of failure. Management nightmare.
DNS evolution at CERN – Dynamic DNS
[Diagram: the Network Database Server still provides generated static data files to the Master DNS Server, while the Arbiter injects changes directly via Dynamic DNS; the master synchronizes Slaves 1A/1B (DNS 1, e.g. internal view) and 2A/2B (DNS 2, e.g. external view) via AXFR or IXFR.]
AXFR = full zone transfer; IXFR = incremental zone transfer; zone = [view, domain] pair (example: [internal, cern.ch])
Application Load Balancing System
[Diagram: the Load Balancing Arbiter polls the Application Cluster over SNMP (node1: metric=24, node2: metric=48, node3: metric=35, node4: metric=27) and publishes the 2 best nodes for application.cern.ch – node1 and node4 – to the DNS server via DynDNS.
Client Q: What is the IP address of application.cern.ch?
DNS A: application.cern.ch resolves to node4.cern.ch, node1.cern.ch
Client: Connecting to node4.cern.ch]
Load Balancing Arbiter
Collects metric values:
Polls the data over SNMP
Sequentially scans all cluster members
Selects the best candidates:
Lowest positive value = best value
Other options are possible as well, e.g. round robin over the alive nodes
Updates the master DNS:
Uses Dynamic DNS with transaction signature (TSIG) key authentication
At most once per minute per cluster
Active and standby setup:
Simple failover mechanism
A heartbeat file is periodically fetched over HTTP
The daemon is written in Perl, packaged as an RPM, and configured by a Quattor NCM component (one cycle of its loop is sketched below).
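The daemon itself is not shown in the talk, so the following Perl sketch only illustrates one arbiter cycle under stated assumptions: the OID, node names, addresses, TSIG key name/secret and name server below are hypothetical placeholders, and Net::SNMP/Net::DNS stand in for whatever the real daemon uses.

use strict;
use warnings;
use Net::SNMP;
use Net::DNS;

# All names below are illustrative placeholders, not CERN's real values.
my $metric_oid = '.1.3.6.1.4.1.96.255.1';     # hypothetical OID served by lbclient
my @nodes      = qw(node1 node2 node3 node4);
my $alias      = 'application.cern.ch';
my $best_n     = 2;

# 1. Poll the metric of every cluster member sequentially over SNMP.
my %metric;
foreach my $node (@nodes) {
    my ($session, $error) = Net::SNMP->session(
        -hostname  => "$node.cern.ch",
        -community => 'public',
        -timeout   => 2,
    );
    next unless defined $session;
    my $result = $session->get_request(-varbindlist => [$metric_oid]);
    $metric{$node} = $result->{$metric_oid} if defined $result;
    $session->close;
}

# 2. Select the best candidates: lowest positive value = best value.
my @best = sort { $metric{$a} <=> $metric{$b} }
           grep { $metric{$_} > 0 } keys %metric;
splice(@best, $best_n) if @best > $best_n;

# 3. Publish the winners via Dynamic DNS, signed with a TSIG key.
my %address = (node1 => '192.0.2.1', node2 => '192.0.2.2',
               node3 => '192.0.2.3', node4 => '192.0.2.4');
my $update = Net::DNS::Update->new('cern.ch');
$update->push(update => rr_del("$alias A"));    # withdraw the old records first
$update->push(update => rr_add("$alias 60 A $address{$_}")) for @best;
$update->sign_tsig('arbiter-key.cern.ch', 'SGVsbG8gd29ybGQh');  # placeholder key
Net::DNS::Resolver->new(nameservers => ['192.0.2.53'])->send($update);

Because the deletion and additions travel in a single signed update packet, the published list is replaced atomically – the client never sees an empty list, as the earlier slide requires.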
Application Cluster nodes
SNMP daemon:
Expects to receive a request for a specific MIB OID
Passes control to an external program
Load balancing metric: /usr/local/bin/lbclient
Examines the conditions of the running system
Computes a metric value
Written in C, available as an RPM, configured by a Quattor NCM component (a stand-in sketch of the calling convention follows below)
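With net-snmp, handing an OID over to an external program is typically done with the "pass" directive in snmpd.conf; for a GET, snmpd runs the program as "<program> -g <OID>" and reads back three lines: the OID, the SMI type, and the value. The real lbclient is a C binary; the Perl stand-in below only sketches that calling convention, and the OID in the comment is a made-up placeholder.

#!/usr/bin/perl
# Minimal net-snmp "pass" handler, an illustrative stand-in for lbclient.
# snmpd.conf would contain something like (OID is a placeholder):
#   pass .1.3.6.1.4.1.96.255.1 /usr/local/bin/lbclient
use strict;
use warnings;

my ($flag, $oid) = @ARGV;
exit 0 unless defined $flag and $flag eq '-g';   # only answer GET requests

print "$oid\n";              # line 1: the OID being answered
print "integer\n";           # line 2: the SMI type
printf "%d\n", compute_metric();   # line 3: the value

# Placeholder metric: scaled load average; the real formula combines
# several system indicators (see the next slide).
sub compute_metric {
    open my $fh, '<', '/proc/loadavg' or return -1;
    my ($load) = split ' ', scalar <$fh>;
    return int($load * 10) + 1;
}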
Load Balancing Metric
System checks – return a Boolean value:
Are the daemons running (FTP, HTTP, SSH)?
Is the node open for users?
Is there some space left on /tmp?
System state indicators – return a (positive) number and compose the metric formula:
System CPU load
Number of unique users logged in
Swapping activity
Number of running X sessions
Integration with monitoring:
Decouples checking from reporting
Replaces the internal formula by a monitoring metric
Disadvantage – introduces a delay
The binary is easily replaceable by another, site-specific one (a sketch of such a formula follows this list).
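The exact formula used at CERN is not given on the slide, so the following Perl sketch only shows the shape described above – Boolean gates that disqualify the node, plus weighted positive indicators; the thresholds and weights are assumptions.

#!/usr/bin/perl
# Illustrative load-balancing metric: Boolean gates plus weighted indicators.
use strict;
use warnings;
use IO::Socket::INET;

# Boolean system checks: any failure disqualifies the node.
sub daemon_up {
    my ($port) = @_;
    my $s = IO::Socket::INET->new(PeerAddr => 'localhost',
                                  PeerPort => $port, Timeout => 2);
    return defined $s;
}

sub node_open_for_users { return !-e '/etc/nologin' }

sub tmp_has_space {
    my @df = `df -P /tmp`;
    return 0 unless @df > 1;
    my $free_kb = (split ' ', $df[1])[3];        # "Available" column
    return defined $free_kb && $free_kb > 100_000;   # keep > ~100 MB free
}

# Positive indicators composed into the formula.
sub cpu_load {
    open my $fh, '<', '/proc/loadavg' or return 99;
    return (split ' ', scalar <$fh>)[0];
}

sub unique_users {
    my %u;
    $u{ (split)[0] }++ for `who`;
    return scalar keys %u;
}

my $metric = -1;    # negative = withdraw this node from the alias
if (daemon_up(22) && node_open_for_users() && tmp_has_space()) {
    # Assumed weights; lowest positive value = best candidate.
    $metric = int(10 * cpu_load() + 2 * unique_users()) + 1;
}
print "$metric\n";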
Operational examples
LXPLUS – interactive login cluster:
SSH protocol; users log on to a server and interact with it
CASTORGRID – GridFTP cluster:
Specific application on a specific port
Experimented with live evaluation of the network traffic by the metric binary
WWW servers
Could be any application – the client metric concept is sufficiently universal!
Stateless vs. State-Aware
Stateless application:
The system is not aware of the state of connections
For any connection, any server will do
Our system only keeps the list of available hosts up to date
Example: WWW server serving static content
State-aware application:
Initial connection to a server; subsequent connections must reach the same server
Our load balancing system cannot help here
Solution: after the initial connection, the application must indicate to the client where to connect – effectively bypassing the load balancing
Example: ServerName directive in the Apache daemon
LXPLUS statistics
[Bar chart: selection process – totals over 2 weeks. X-axis: the cluster nodes lxplus001 through lxplus074; Y-axis (0–1000): number of times each node was part of the DNS alias, i.e. selected as one of the best candidates.]
LXPLUS statistics
[Line chart: metric value of 2 nodes, LXPLUS071 and LXPLUS073, compared over 2 weeks (11–23 January 2006). Y-axis (0–800): metric value returned by the host – low values mark a good candidate, high values a bad one.]
Conclusion
Dynamic DNS switching offers the possibility to implement an automated and intelligent load balancing and failover system.
Scalable: from a two-node cluster to complex application clusters, and decoupled from the complexity of the network topology.
Needs an arbiter to monitor the cluster members, select the best candidates and update the published DNS records.
Built around open-source tools – easy to adopt anywhere.
Thank you.
http://cern.ch/dns (accessible from inside the CERN network only)
Vladimír Bahyl
http://cern.ch/vlado