building network resilience – a ucl view - jisc · building network resilience – a ucl view bob...
Building Network Resilience – a UCL View
Bob Lawrence
Network Services Group, Information Systems
University College London
Definition
résile´ (-z-), v.i … ; have or show elasticity or buoyancy or recuperative power. Hence or cogn. résil´iENCE, résil´iENCY, nn., résil´iENT a., …
The Concise Oxford Dictionary of Current English
Agenda
Technology
Pros/Cons
Guidelines
Campus Network
Connects departments, merged institutes, UCL Medical School
Other HEIs in the Bloomsbury area connected at border devices, ie. Birkbeck College, IoE, LSHTM, SOAS
Third parties connected under JANET-sponsored connection arrangements, eg. British Museum, National Gallery, Institute for Fiscal Studies
The UCL part of the network has >70,000 data points and >30,000 connected end systems
Topology
Access (Cisco Catalyst 2950, 2960, 3750, 4948, 3011X/3012 IBM BladeCenter)
Distribution (Cisco Catalyst 6509/Sup720, 3550, 3750)
Core (Cisco Catalyst 6509/Sup720)
Institutional Firewall (Cisco Catalyst 6500 FWSM)
Border (Cisco Catalyst 6509/Sup720)
Other (Cisco Catalyst 6500 FWSM, ACE, WLSM)
Figure 1 UCL Multi-tier switch schematic
[Schematic tiers: Border 2 x 6500; Core 2 x 6500; IFW 2 x FWSM; Distribution ~ 20 x 6500; Access > 1500 various]
Catalyst 6500 network
Cisco Catalyst 6500 switched network from Easter 2000 onwards
Investment protection
20+ 6509s in distribution/core/border layers
Distribution switches have dual uplinks
Four 6504s for “special” projects
1Gb/10Gb interconnectivity
Other 6500 modules, eg. FWSM, ACE, WLSM
Diverse fibre routing (eg. across public roads)
Figure 2 6500 switch network
[Diagram tiers: Border 2 x 6500; Core 2 x 6500; Distribution ~ 20 x 6500]
Routing technology
Distribution-Core layer
– Dual 1/10Gb links per distribution device into core network
– Each link is a routed point-to-point “Switched Virtual Interface”
– Active/active with L3 load-sharing courtesy of routing protocol
– Cisco proprietary EIGRP
– Routing convergence comparable with a Link State Routing Protocol (typically sub-one second)
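The active/active load-sharing over the two uplinks can be pictured as per-flow hashing over equal-cost next hops. The following is a minimal sketch only: the uplink names `core-a` and `core-b`, the flow tuple, and the use of SHA-256 are assumptions for illustration, and the forwarding hash used by the switches themselves is not described in these slides.

```python
import hashlib


def pick_next_hop(flow, next_hops):
    """Choose one of the equal-cost next hops for a flow.

    flow: (src_ip, dst_ip, protocol, src_port, dst_port)
    next_hops: uplinks currently offered by the routing protocol
               (hypothetical names, not taken from the slides).
    """
    if not next_hops:
        raise RuntimeError("no usable uplink")
    # Hash the whole flow tuple so that every packet of a flow
    # follows the same path and is not reordered.
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


uplinks = ["core-a", "core-b"]
flow = ("192.0.2.10", "198.51.100.20", "tcp", 40000, 443)

# While both uplinks are up, a given flow always maps to the same one.
chosen = pick_next_hop(flow, uplinks)

# If core-a is lost, the routing protocol withdraws it and every flow
# is carried by the surviving uplink.
after_failure = pick_next_hop(flow, ["core-b"])
```

Different flows spread across both links, so both carry traffic in normal operation and either can carry all of it after a failure.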
Routing technology
Core-Border layer
– Dual 10Gb links between Core and Border switches
– All forwarding via IFW
– EIGRP internally, static routing externally
Routing technology
Border-Provider layer
– Regional provider is LMN
– BGP peering arrangement with provider (aggregated prefixes out/default in)
– Internal peering between border routers
– Originally active/standby configurations
– Now 10Gb (active/active policy-based forwarding) to Stewart House/Imperial College
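"Aggregated prefixes out/default in" can be illustrated with Python's standard `ipaddress` module. This is a sketch under stated assumptions: the prefixes below are documentation ranges chosen for the example, not UCL's real address space, and the helper name `aggregate` is hypothetical.

```python
import ipaddress


def aggregate(prefixes):
    """Collapse contiguous internal prefixes into the fewest announcements."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(networks)]


# Hypothetical internal prefixes (documentation ranges).
internal = ["192.0.2.0/25", "192.0.2.128/25", "198.51.100.0/24"]

# Outbound: announce only the aggregates to the provider.
announced = aggregate(internal)

# Inbound: a single default route covers every external destination.
default_route = ipaddress.ip_network("0.0.0.0/0")
covers_anything = ipaddress.ip_address("203.0.113.9") in default_route
```

Announcing aggregates keeps internal changes invisible to the provider, and accepting only a default route keeps the border routing table small.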
Figure 3 MAN/Border peerings (both links forwarding)
[Diagram: eBGP peerings over 10Gb/s links to the MAN at Stewart House and Imperial College; iBGP between the border routers; 0.0.0.0/0 received, UCL prefixes announced; other labels: IFW, KLB, Wolfson House, other prefixes]
Institutional FireWall (IFW)
Cisco Catalyst 6500 FWSM in each core switch
Inside/outside/DMZ virtual interfaces
Active/Standby configuration with stateful failover
10Gb between core switches
Default permit outbound, default deny inbound policies
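The combination of "default permit outbound, default deny inbound" with stateful failover can be sketched as a small connection-tracking model. It is illustrative only: the class and method names are hypothetical, the addresses are documentation ranges, and nothing here reflects how the FWSM is actually configured.

```python
class StatefulFirewall:
    """Toy model: permit outbound by default, deny inbound unless known."""

    def __init__(self):
        self.connections = set()      # flows opened from the inside
        self.inbound_permits = set()  # (dst_ip, dst_port) explicitly allowed

    def allow_inbound(self, dst_ip, dst_port):
        """Add an explicit exception to the default-deny inbound policy."""
        self.inbound_permits.add((dst_ip, dst_port))

    def outbound(self, src, sport, dst, dport):
        """Default permit: record the flow so that replies are recognised."""
        self.connections.add((src, sport, dst, dport))
        return True

    def inbound(self, src, sport, dst, dport):
        """Permit replies to tracked flows and explicit exceptions only."""
        if (dst, dport, src, sport) in self.connections:
            return True
        return (dst, dport) in self.inbound_permits

    def sync_to(self, standby):
        """Stateful failover: copy connection state to the standby unit."""
        standby.connections = set(self.connections)
        standby.inbound_permits = set(self.inbound_permits)


active, standby = StatefulFirewall(), StatefulFirewall()
active.outbound("192.0.2.10", 50000, "203.0.113.7", 443)
active.sync_to(standby)

# After a failover the standby still recognises the reply traffic...
reply_ok = standby.inbound("203.0.113.7", 443, "192.0.2.10", 50000)
# ...while unsolicited inbound traffic is still dropped.
unsolicited_ok = standby.inbound("203.0.113.7", 4444, "192.0.2.10", 22)
```

Without the state copy, the standby would drop the reply and established sessions would break at failover.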
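Figure 4 IFW resilient configuration (normal)
[Diagram labels: HSRP on the outside and inside interfaces; failover link between the two FWSMs]
Figure 5 IFW resilient configuration (secondary FWSM active)
[Diagram labels: HSRP on the outside and inside interfaces; failover link between the two FWSMs]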
Server Load Balancing
Cisco Catalyst 6500 Application Control Engine (ACE), in Wolfson House/Torrington Place Edge 6509 switches
Supports Server Load Balancing of web, email, and other corporate applications
Virtualisation providing multiple SLB virtual machines
Active/Active configuration with stateful failover
Server Load Balancing
Provides for flexible management of real servers in server farms
Health monitoring of real servers
Graceful insertion/removal of real servers
“Sticky” server farms where necessary
Sub-one second failover
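The server-farm behaviours listed above (health monitoring, graceful removal, stickiness) can be sketched as follows. This is a simplified model with hypothetical names (`ServerFarm`, `web1`, `web2`) and round-robin selection assumed for the example; it does not represent the ACE's own algorithms.

```python
class ServerFarm:
    """Toy server farm with health state, draining and sticky clients."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(servers)
        self.draining = set()
        self.sticky = {}  # client -> server it was last sent to
        self._next = 0    # round-robin counter

    def mark_health(self, server, ok):
        """Record the result of a health probe."""
        if ok:
            self.healthy.add(server)
        else:
            self.healthy.discard(server)

    def drain(self, server):
        """Graceful removal: keep existing clients, accept no new ones."""
        self.draining.add(server)

    def pick(self, client):
        """Return the real server for this client."""
        pinned = self.sticky.get(client)
        if pinned in self.healthy:
            return pinned  # sticky clients stay put, even while draining
        candidates = [s for s in self.servers
                      if s in self.healthy and s not in self.draining]
        if not candidates:
            raise RuntimeError("no real server available")
        server = candidates[self._next % len(candidates)]
        self._next += 1
        self.sticky[client] = server
        return server


farm = ServerFarm(["web1", "web2"])
first = farm.pick("client-1")       # new client, round-robin choice
farm.drain("web1")                  # take web1 out gracefully
still = farm.pick("client-1")       # existing client stays on web1
newcomer = farm.pick("client-2")    # new client avoids the draining server
farm.mark_health("web1", False)     # probe fails
moved = farm.pick("client-1")       # client is moved to a healthy server
```

Draining lets maintenance proceed without cutting off existing sessions, while a failed health probe moves clients immediately.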
Figure 6 ACE/SLB resilient configuration (normal)
[Diagram labels: Core; ACE/Switch; ACE; “HSRP”; real server VLAN eg. 144.82.108.0/24]
Data centre
2x6509 switches
Dual 1Gb/10Gb uplinks from access switches to 6500 switches
2x10Gb bonded between 6500 switches
4x10Gb to core switches
VLANs distributed across uplinks so both links active
“Uplink fast” to avoid default spanning tree behaviour
Configuration replication between 6500s
HSRP active/standby routing for hosts
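Distributing VLANs across the two uplinks so that both stay active can be sketched as below. The even/odd split and the names `uplink-a` and `uplink-b` are assumptions for illustration; the slides do not say how VLANs are actually assigned.

```python
UPLINKS = ("uplink-a", "uplink-b")  # hypothetical uplink names


def forwarding_uplink(vlan_id, uplinks_up):
    """Return the uplink that forwards a VLAN, or None if both are down.

    Each VLAN prefers one uplink and treats the other as a blocked
    backup, so across all VLANs both uplinks carry traffic.
    """
    preferred = UPLINKS[vlan_id % 2]        # even VLANs -> a, odd -> b
    backup = UPLINKS[(vlan_id + 1) % 2]
    if preferred in uplinks_up:
        return preferred
    if backup in uplinks_up:
        return backup                       # blocked port takes over
    return None


both_up = {"uplink-a", "uplink-b"}
normal = {vlan: forwarding_uplink(vlan, both_up) for vlan in (10, 11)}
degraded = {vlan: forwarding_uplink(vlan, {"uplink-b"}) for vlan in (10, 11)}
```

In normal operation each uplink forwards for half of the VLANs; when one uplink fails, the survivor forwards for all of them.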
Data centre
Two major data centres, geographically dispersed
Services replicated within/across data centres
Full 10Gb mesh between 6509s
1xFWSM per data centre for resilient firewalling service
1xACE per data centre for resilient load-balancing service
Figure 7 Dual uplinks from access switch (one blocking)
[Diagram: access-layer switch with 1Gb/10Gb uplinks to distribution-layer switches WH-A and WH-B; one uplink port up/forwarding, the other up/blocked]
Figure 8 Dual uplinks from access switch (one down)
[Diagram: access-layer switch with 1Gb/10Gb uplinks to distribution-layer switches WH-A and WH-B; one uplink port down/down, the other up/forwarding]
Figure 9 Gateway resilience with HSRP (normal)
[Diagram: HSRP pair WH-A (Priority=100) and TP-A (Priority=95); router addresses .252 and .253; host Gateway=.254]
Figure 10 Gateway resilience with HSRP (tracked interface down)
[Diagram: HSRP pair WH-A (Priority=90) and TP-A (Priority=95); router addresses .252 and .253; host Gateway=.254]
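The priority change between Figures 9 and 10 (WH-A drops from 100 to 95's wrong side, at 90, once its tracked interface fails) can be sketched as a simple election. This is a model only: it assumes preemption is enabled and a decrement of 10, which matches the 100 to 90 change in the figures; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Router:
    name: str
    priority: int
    decrement: int = 10       # assumed: matches 100 -> 90 in the figures
    tracked_up: bool = True   # state of the tracked interface
    alive: bool = True

    def effective_priority(self):
        """Configured priority, reduced while the tracked interface is down."""
        if self.tracked_up:
            return self.priority
        return self.priority - self.decrement


def active_router(routers):
    """Return the name of the router that owns the gateway address.

    Assumes preemption, so the highest effective priority always wins.
    """
    candidates = [r for r in routers if r.alive]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.effective_priority()).name


wh_a = Router("WH-A", priority=100)
tp_a = Router("TP-A", priority=95)

normal = active_router([wh_a, tp_a])      # WH-A: 100 beats 95

wh_a.tracked_up = False                   # tracked interface goes down
after_track = active_router([wh_a, tp_a])  # TP-A: 95 beats 90
```

Hosts keep the same gateway address (.254) throughout; only the router answering for it changes.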
Wireless
RoamNet/eduroam/VisiNet services
FWSM devices in core switches
WLSM devices in distribution switches
Active/standby firewall configuration
Active/standby wireless tunnel termination in core switches
Active/standby WLSM configurations providing management of APs and user L2/L3 roaming between APs
Pros/Cons
Pros:
– Continuity of service
– Limits disruption due to power outages
– Permits seamless infrastructure upgrades
– Permits seamless server upgrades/maintenance
– Helps defend against infrastructure “bugs”
– Helps defend against “own goals”
Cons:
– Cost (doubles up on hardware)
– Complexity
– Verification
Guidelines
Simplicity!
Design resilience in from outset (eg. in High Level Designs)
Avoid features which add nothing useful or whose cost is excessive complexity or unacceptable risk (eg. VTP)
Protect your network against misconfiguration by others (eg. filter routing updates)
Isolate research networks from production facilities (eg. RCN, Legion)
Out-of-band network management
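The guideline on filtering routing updates can be illustrated with a small prefix filter. It is a sketch under assumptions: the prefixes are documentation ranges, the function name `filter_updates` is hypothetical, and on real devices this would be done with the routing protocol's own filtering features rather than in Python.

```python
import ipaddress


def filter_updates(received, allowed):
    """Accept only prefixes that fall inside the ranges agreed for a neighbour.

    received: prefixes offered by the neighbour (IPv4 strings)
    allowed:  ranges the neighbour is expected to originate
    """
    allowed_nets = [ipaddress.ip_network(a) for a in allowed]
    accepted = []
    for prefix in received:
        network = ipaddress.ip_network(prefix)
        if any(network.subnet_of(a) for a in allowed_nets):
            accepted.append(prefix)
    return accepted


# A neighbour that should only originate routes within 192.0.2.0/24
# mistakenly offers a default route and someone else's prefix as well.
offered = ["192.0.2.0/26", "0.0.0.0/0", "198.51.100.0/24"]
accepted = filter_updates(offered, ["192.0.2.0/24"])
```

A misconfigured neighbour can then only affect reachability of its own address space, not of the rest of the network.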
Guidelines (cont)
Use virtualisation where possible (eg. router interfaces, firewalls, load balancers)
Set STP roots
Use NTP
Avoid changing timers!
Over-provision bandwidth where possible (avoids compensatory complexity, eg. QoS)
Single vendor, end-to-end
Challenges
Remote Access project
New wireless network
VoIP network
Final thoughts
“However beautiful the strategy, you should occasionally look at the results.”
Winston S Churchill