building physical in a virtual world
Who am I?
Infrastructure Operations @ HootSuite
Chris Maxwell
Lead Operations Engineer
@[email protected]
Previously
Coral Princess, 2010
Left: bow thrusters, core network
Right: improvised cooling
Princess Cruises – Drydock / datacenter refit team
Why should I listen to you?
Just a guy who’s been in the trenches a long time.
• Learned to code in C long ago. BSD kernel hacking, secure messaging, managed security appliances, nomadic file systems.
• >1000 wireless access points deployed to 14 cruise ships
• 6 Cisco core network replacements from Nortel Passport
• First live-voyage core network replacement (Diamond Princess)
• Built 22 broadband wireless towers (of 75)
• Regional Voice-over-IPX (DSP on OS/2 over Novell!)
Why HootSuite went physical
“Unique” workload:
• 95% write
• 12TB dimension
• I/O bound
• Noisy neighbours
• Pre-PIOPS (AWS 100 IO/vol)
• Need >68GB
• No lock-in
What is “cloud”
Not a cloud definition slide
• Just datacenter best practices from 1998 (infrastructures.org)
• Gold disk deploy - AMI
• Version Control - config mgmt
• Automate everything - APIs
“Cloud is like cutting your legs off at the knee - stop trying to walk somewhere, just clone a new server in place” – me.
Compromising
Balancing best vs. budget
• We chose software routers. OpenBSD + OpenBGPD on Dell
• We chose Cisco core switching
• We chose software firewalls. OpenBSD + PF on Dell
• We chose CloudStack on VMware
• We chose SAN + iSCSI
Compromising
We chose software routers. OpenBSD + OpenBGPD on Dell
• OpenBSD is secure, OpenBGPD is stable
• Scales to 1.5-2 Gbps per host, depending on packet size
• Redundant pairs instead of internally redundant (live upgrades!)
• Ops team understands BSD tools
• Added support for Intel X520 (82599) 10GE NICs
• Much lower cost than hardware routers
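The "depending on packet size" caveat is easier to see in packets per second, since a software router's work is mostly per packet rather than per byte. A rough sketch, assuming standard Ethernet framing (20 bytes of preamble and inter-frame gap per frame on the wire); the function name and the frame sizes are illustrative, not measurements from the talk:

```python
# Per-frame wire overhead on Ethernet: 8-byte preamble/SFD + 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def packets_per_second(gbps: float, frame_bytes: int) -> float:
    """Frames per second needed to carry `gbps` using frames of `frame_bytes`."""
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return gbps * 1e9 / bits_per_frame


# Same 2 Gbps, very different per-packet load on the host.
for size in (64, 512, 1518):
    pps = packets_per_second(2.0, size)
    print(f"{size:>5}-byte frames at 2 Gbps: {pps:,.0f} pps")
```

Small frames multiply the packet rate by more than an order of magnitude for the same bandwidth, which is why a single Gbps figure for a software router only makes sense alongside a packet size.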
Compromising
We chose Cisco core switching
• Cisco is solid. Cisco engineers can be hired
• OSPF with millisecond timers = sub-second convergence
• Wanted 10Gig in the network core
• Needed minimal port count
• Ops team has Cisco experience.
Compromising
We chose software firewalls. OpenBSD + PF on Dell
• OpenBSD is secure, PF is stable
• Scales to 1-1.5 Gbps per host, depending on states/rules (~300k)
• CARP + pfsync is great! We run Active+Standby, alternating Masters.
• Redundant pairs instead of internally redundant (live upgrades!)
• Ops team understands BSD tools. Scripts sync security groups from AWS to PF tables.
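The security-group sync mentioned above is not shown in the slides. A minimal sketch of the PF side only, assuming the group's addresses have already been fetched from AWS by some other step; the table name, file path, and sample addresses are hypothetical, and the command is built but not executed:

```python
import ipaddress


def render_pf_table(cidrs):
    """Return a PF table file body: one normalized, de-duplicated network per line."""
    nets = {ipaddress.ip_network(c, strict=False) for c in cidrs}
    ordered = sorted(
        nets, key=lambda n: (n.version, int(n.network_address), n.prefixlen)
    )
    return "".join(f"{n}\n" for n in ordered)


def pfctl_replace_command(table, path):
    """Build (but do not run) the pfctl call that replaces a table's contents from a file."""
    return ["pfctl", "-t", table, "-T", "replace", "-f", path]


# Hypothetical input: addresses an AWS security group lookup might have returned.
web_group = ["10.0.1.15/32", "10.0.1.15", "10.0.0.0/24"]
print(render_pf_table(web_group), end="")
print(" ".join(pfctl_replace_command("web_servers", "/etc/pf.tables/web_servers")))
```

Replacing the whole table from a file keeps each sync idempotent: the firewall ends up with exactly the addresses in the latest snapshot, whatever it held before.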
Compromising
We chose CloudStack on VMware
• 2012: CloudStack more mature than OpenStack
• Wanted VMware hypervisor for core data services (MySQL, Mongo)
• We use vMotion + HA on core services
• Did not want vendor lock-in, layered CloudStack for future options
• Original plan was mixed VMware + XenServer, but small Ops team
Compromising
We chose SAN + iSCSI
• We chose iSCSI for flexibility:
• We need snapshots. Most backups are sync+snap
• We like live migration of virtual machines
• We tolerate latency penalty of SAN for snapshot flexibility
• We run RAID-6 (2 parity disks)
  Tolerate 2 disk failures per slice before data loss
  Painful on write – 5,000 writes → 30,000 read + write
  Remote equipment – time to replacement is not instant
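The 5,000 → 30,000 figure matches the usual RAID-6 small-write penalty of six back-end I/Os per front-end write. A short sketch of that arithmetic, assuming every write is a read-modify-write of a single block with no cache coalescing or full-stripe writes; the function and constant names are illustrative:

```python
# Read-modify-write of one block on RAID-6 touches the data block and both parity blocks.
RAID6_READS_PER_WRITE = 3   # old data, old P parity, old Q parity
RAID6_WRITES_PER_WRITE = 3  # new data, new P parity, new Q parity


def raid6_backend_ios(frontend_writes: int) -> dict:
    """Back-end disk I/Os generated by `frontend_writes` small random writes."""
    reads = frontend_writes * RAID6_READS_PER_WRITE
    writes = frontend_writes * RAID6_WRITES_PER_WRITE
    return {"reads": reads, "writes": writes, "total": reads + writes}


print(raid6_backend_ios(5000))
```

On a 95%-write workload almost every front-end I/O pays this multiplier, which is the cost being traded for surviving two disk failures at a remote site.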
Thank You!
Chris Maxwell
@[email protected]