
Page 1: Swift at Scale: The IBM SoftLayer Story

Swift at Scale: The IBM SoftLayer Story
Brian Cline, Object Storage Development Lead

OpenStack Summit • Ocata series • 2016.10.25 • Barcelona, Spain

twitter/irc: @briancline

Page 2: Swift at Scale: The IBM SoftLayer Story

Our History with Public Object Storage

• 2012 — First three clusters go live (DAL, AMS, SNG)
• 2014 — Dedicated development team
• 2014 — Launch 11 clusters in new datacenters
• 2015 — Launch 5 clusters in new DCs
• 2015 — Product integrations with IBM Bluemix
• 2016 — Launch 3 clusters in new DCs (and expand an existing cluster into multiple DCs)

Page 3: Swift at Scale: The IBM SoftLayer Story

2012: When things were [mostly] simpler…

• 7-10 nodes in each cluster
• Two node types
  • Proxy
  • Data - account, container, object services
• Load balancer
• FreeBSD with ZFS ⚠ Do not attempt.
• No centralized logs
• No log analysis tools

Page 4: Swift at Scale: The IBM SoftLayer Story

2016: Adjusted for scale (blood, sweat, tears, dreams, starlight…)

• Up to hundreds of nodes per cluster
• Three node types
  • Proxy
  • Meta - account and container services
  • Data - object services
• Load balancer cluster
• Debian Linux
• Centralized and searchable logs
• Analytics via Spark and Hadoop

Page 5: Swift at Scale: The IBM SoftLayer Story

Our Scale

Page 6: Swift at Scale: The IBM SoftLayer Story
Page 7: Swift at Scale: The IBM SoftLayer Story

22 Swift Clusters

24 Datacenters

16 Countries

Page 8: Swift at Scale: The IBM SoftLayer Story

Tens of Billions of Objects

7 Million Containers

Hundreds of thousands of Swift Accounts

Page 9: Swift at Scale: The IBM SoftLayer Story

90 PB of Capacity

Thousands of Nodes

40,000+ Disks

Page 10: Swift at Scale: The IBM SoftLayer Story

Tens of thousands of requests per second

GET HEAD PUT DELETE

(with notable variability between clusters)

Page 11: Swift at Scale: The IBM SoftLayer Story

Hardware

Page 12: Swift at Scale: The IBM SoftLayer Story

Hardware we like

• Supermicro 36-disk chassis
• 12-16 physical cores (24-32 HT cores)
• 128GB RAM for proxies
• 256GB RAM for data nodes
• 10Gbps NICs (separate API vs. storage/replication networks)
• 3-4 TB disks
• Controller card
  • 2 disks for OS (RAID 1)
  • 1 disk for OS hotswap
  • 4 disks for SSD caching
  • 29 disks for data storage
• Usually expand by ½ row or a full row at a time

Page 13: Swift at Scale: The IBM SoftLayer Story

Software

Page 14: Swift at Scale: The IBM SoftLayer Story

Our Stack — Software

• OS: Debian
• Base: Swift (duh) — sometimes with backports
• Authentication: Swauth — some internal patches and enhancements; Keystone (API v3) — starting with Bluemix accounts
• Metadata Search: Elasticsearch
• Monitoring & Logging: collectd, Nagios, Capacity Dashboard, Logstash, Kibana, Graphite, Grafana, slogging
• Automation: Chef, Jenkins, Fabric

Page 15: Swift at Scale: The IBM SoftLayer Story

Our Stack — Custom Middlewares

• CDN operations (purge, load, CNAMEs, TTL, compression, etc.)
• CDN origin pull
• Search indexer (on successful PUT/POST/DELETE)
• Search query operations
• Checkpoint (account enable/disable/etc. abilities for resellers)
• Internal management (sysmeta read/write, proxy-level recon)
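
All of these follow the standard Swift middleware pattern: a WSGI filter wired into the proxy pipeline via paste.deploy. As a minimal sketch (not our production code; the class name and enqueue hook are hypothetical stand-ins), the search indexer takes roughly this shape, reacting only to writes the backend actually accepted:

    # A minimal sketch of a Swift WSGI middleware in the shape of the
    # search indexer above. SearchIndexerMiddleware and the enqueue hook
    # are hypothetical, not SoftLayer's actual code.
    from swift.common.swob import Request


    class SearchIndexerMiddleware(object):
        """Queue index updates after successful PUT/POST/DELETE requests."""

        def __init__(self, app, conf):
            self.app = app
            self.conf = conf

        def __call__(self, env, start_response):
            req = Request(env)

            def _start_response(status, headers, exc_info=None):
                # Only index writes the backend actually accepted (2xx).
                if req.method in ('PUT', 'POST', 'DELETE') and status.startswith('2'):
                    pass  # enqueue (account, container, object) for indexing here
                return start_response(status, headers, exc_info)

            return self.app(env, _start_response)


    def filter_factory(global_conf, **local_conf):
        conf = global_conf.copy()
        conf.update(local_conf)

        def factory(app):
            return SearchIndexerMiddleware(app, conf)
        return factory

The filter is then added to the proxy-server.conf pipeline like any other middleware.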

Page 16: Swift at Scale: The IBM SoftLayer Story

Lessons Learned

Page 17: Swift at Scale: The IBM SoftLayer Story

Lessons Learned: Automation

• Make automation a must-have, day-one deliverable
• Never launch something new without test/deploy automation
• Must work across all environments (dev, QA, UAT/staging, prod)
• Automation needs tests and metrics, too — it is code!
• Functional testing should be an automated part of every deploy
• Remember your orchestration (knowledge of Swift zones)
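
To make the functional-testing point concrete, here is a sketch of the kind of smoke test a deploy pipeline can run automatically, using python-swiftclient to write, read back, and delete a canary object (the endpoint, credentials, and container name are placeholders):

    # A sketch of a post-deploy functional smoke test with
    # python-swiftclient. URL and credentials are placeholders.
    import uuid

    from swiftclient import client as swift_client


    def smoke_test(auth_url, user, key):
        conn = swift_client.Connection(authurl=auth_url, user=user, key=key)
        container = 'deploy-smoke'
        name = 'canary-%s' % uuid.uuid4()
        payload = b'hello from the deploy pipeline'

        conn.put_container(container)
        conn.put_object(container, name, contents=payload)

        headers, body = conn.get_object(container, name)
        assert body == payload, 'read-after-write mismatch'

        conn.delete_object(container, name)


    if __name__ == '__main__':
        smoke_test('https://swift.example.com/auth/v1.0', 'test:tester', 'testing')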

Page 18: Swift at Scale: The IBM SoftLayer Story

Lessons Learned: Monitoring

• Scale test any monitoring/logging infrastructure you put into place
• Very obvious stuff:
  • Space and IOPS, errors from SMART/XFS/kernel/controller, etc.
  • HTTP response code aggregates, latency aggregates by verb, etc.
• Swift metrics:
  • If nothing else, async pendings
  • Replicator failures and partitions/sec rates
  • Replicator last completion timestamp vs. ring push timestamp
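
The last two checks are worth spelling out. A sketch, assuming the default recon cache and ring locations (adjust the paths for your deployment): compare the object replicator's last completed pass against the ring's push time, and count async pendings on the node.

    # A sketch of two of the checks above, assuming default Swift paths:
    # compare the object replicator's last completed pass (from the recon
    # cache) against the object ring's mtime, and count async pendings.
    import glob
    import json
    import os
    import time

    RECON_CACHE = '/var/cache/swift/object.recon'
    OBJECT_RING = '/etc/swift/object.ring.gz'
    ASYNC_GLOB = '/srv/node/*/async_pending/*/*'


    def check_node():
        with open(RECON_CACHE) as fh:
            recon = json.load(fh)

        last_pass = recon.get('object_replication_last', 0)
        ring_pushed = os.path.getmtime(OBJECT_RING)
        if last_pass < ring_pushed:
            print('WARN: no completed replication pass since the last '
                  'ring push (%.0fs behind)' % (ring_pushed - last_pass))

        print('async pendings: %d' % len(glob.glob(ASYNC_GLOB)))
        print('last replication pass: %.0fs ago' % (time.time() - last_pass))


    if __name__ == '__main__':
        check_node()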

Page 19: Swift at Scale: The IBM SoftLayer Story

Lessons Learned: Monitoring

• Any middleware you create needs to emit ops metrics
• New features benefit from emitting usage metrics
• Don’t forget debug-level log messages
• Automatic checks for precipitating conditions that lead to failures (not just for the error log lines that result from them afterwards)
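
Swift makes the first point cheap: its logger is statsd-aware once log_statsd_host is set in the filter's config section. A minimal sketch (the metric and route names are hypothetical):

    # A sketch of emitting ops metrics from a middleware with Swift's
    # statsd-aware logger (active when log_statsd_host is configured).
    # Metric and route names here are hypothetical.
    import time

    from swift.common.utils import get_logger


    class InstrumentedMiddleware(object):
        def __init__(self, app, conf):
            self.app = app
            self.logger = get_logger(conf, log_route='search-indexer')

        def __call__(self, env, start_response):
            start = time.time()
            self.logger.debug('handling %s %s', env.get('REQUEST_METHOD'),
                              env.get('PATH_INFO'))
            try:
                return self.app(env, start_response)
            finally:
                # These land in Graphite/Grafana next to Swift's
                # built-in proxy metrics.
                self.logger.increment('indexer.requests')
                self.logger.timing('indexer.request_time',
                                   (time.time() - start) * 1000)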

Page 20: Swift at Scale: The IBM SoftLayer Story

Lessons Learned: Rebalancing

• Keep tabs on your rebalance times (and keep them small when possible)
• Coordinate rebalances around node/cluster maintenance
• Don’t let IOPS levels grow too high before expanding capacity
• Customer IOPS vs. Replicator & Auditor IOPS — know your limits
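
One cheap way to keep tabs on rebalance times is to time the ring rebalance itself wherever it runs. A sketch (the builder path is a placeholder for wherever your ring builders live):

    # A sketch of timing a ring rebalance so growth in rebalance time is
    # visible before it becomes a problem. Builder path is a placeholder.
    import subprocess
    import time

    BUILDER = '/etc/swift/object.builder'


    def timed_rebalance():
        start = time.monotonic()
        result = subprocess.run(['swift-ring-builder', BUILDER, 'rebalance'])
        # swift-ring-builder exits 0 on success, 1 with warnings, 2 on error.
        if result.returncode == 2:
            raise RuntimeError('rebalance failed for %s' % BUILDER)
        print('rebalance of %s took %.1fs' % (BUILDER, time.monotonic() - start))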

Page 21: Swift at Scale: The IBM SoftLayer Story

Lessons Learned: Swift

• Use 256-byte inode sizes (or the smallest you can get away with)
• Using swauth? Use an SSD storage policy for AUTH_.auth containers
• Namespace any custom API additions (and be consistent)
• When possible, ask the community for feedback on new middleware ideas
• Upstream is important! Stay involved and give back when possible
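
For the inode-size point: the setting is applied when the XFS filesystem is created, so it belongs in provisioning automation. A sketch (device and label are placeholder arguments; this is destructive, so do not run it against a disk you care about):

    # A sketch of the format step as it might appear in provisioning
    # automation: XFS with 256-byte inodes, per the lesson above.
    # Device and label arguments are placeholders.
    import subprocess


    def format_data_disk(device, label):
        subprocess.run(['mkfs.xfs', '-f', '-i', 'size=256', '-L', label, device],
                       check=True)


    # Example (destructive!): format_data_disk('/dev/sdc1', 'd29')

The SSD policy for AUTH_.auth containers is an ordinary Swift storage policy, defined in swift.conf and selected with the X-Storage-Policy header when the container is created.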

Page 22: Swift at Scale: The IBM SoftLayer Story

Thank you!

Questions?