A Year with Cinder and Ceph at TWC
Photo by Navin (https://flic.kr/p/7vSNe7)
By Craig DeLatte & Bryan Stillwell - May 20, 2015
What we will cover
• All views are from a systems admin perspective
• Cinder
– Evaluating storage
– Traditional vs. “grid” type storage
• Ceph
• Adding more backends
Craig DeLatte & Bryan Stillwell
Our Criteria for Evaluating Storage
• Must have passed the base Cinder matrix for the release
• Must have open API access to allow for monitoring and gathering statistics
• Must support Nova’s live migration
• Ideally support a rack-anywhere methodology
What Led Us to Using Ceph
• Supports live migration
• Ability to use x86 architecture instead of vendor-specific hardware
Our First Ceph Design
• What led to our design and where it went wrong
– In our environment (and maybe yours), customers plan only for capacity; they assume unlimited performance
First OpenStack Deployment
• Live migration testing
• Ceph to the rescue
Early Life with Ceph
• Out-of-family upgrades require a leap of faith
• Be prepared to scare your co-workers
• How your first production upgrade will feel
Initial Ceph Cluster
OSDs: 60
Journal Ratio: 5:1
Drive Size: 1TB
Raw Capacity: 60TB
Usable Capacity: 20TB
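The usable figure follows from replication (the slide numbers imply the Ceph default of size = 3): usable = raw / replicas. A quick sanity check of the arithmetic (helper name is ours):

```python
def usable_tb(osds, drive_tb, replicas=3):
    """Usable capacity of a replicated Ceph cluster: raw capacity / replica count."""
    raw = osds * drive_tb
    return raw / replicas

# Initial cluster: 60 OSDs x 1TB raw = 60TB, /3 replicas = 20TB usable
print(usable_tb(60, 1.0))  # 20.0
```

The same formula reproduces each expansion slide (e.g. 189 OSDs × 1.2TB / 3 ≈ 75.6TB usable).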
First Expansion
Before:
OSDs: 60
Journal Ratio: 5:1
Drive Size: 1TB
Raw Capacity: 60TB
Usable Capacity: 20TB

After:
OSDs: 75
Journal Ratio: 5:1
Drive Size: 1.2TB
Raw Capacity: 90TB
Usable Capacity: 30TB
What Went Wrong
• Performance issues
– Too high an HDD:SSD ratio for journals
– Not enough placement groups (PGs)
• VMs lost sight of storage (libvirt)
• Legacy tunables
• VMs lost sight of storage again! (version mismatch)
Corrections Made
• Ordered more SSDs to reduce the HDD:SSD journal ratio
• Re-used mon IPs
• Placement groups went from 512 PGs/pool to 4096 PGs/pool
• Tunables switched to ‘firefly’
• Need to make sure ALL systems are upgraded to the new version
ceph osd set nobackfill
ceph osd set noscrub
ceph osd set nodeep-scrub
osd max backfills = 1
osd recovery max active = 1
osd recovery op priority = 1
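The jump from 512 to 4096 PGs per pool tracks the era's rule of thumb from the Ceph docs: roughly 100 PGs per OSD, divided by the pool's replica count, rounded up to a power of two. A sketch of that heuristic (the helper name is ours):

```python
def target_pgs(osds, replicas=3, pgs_per_osd=100):
    """Rule-of-thumb PG count: (OSDs * 100) / replicas, rounded up to a power of two."""
    target = osds * pgs_per_osd / replicas
    power = 1
    while power < target:
        power *= 2
    return power

print(target_pgs(60))  # 60*100/3 = 2000 -> 2048
print(target_pgs(75))  # 75*100/3 = 2500 -> 4096
```

At 75 OSDs the heuristic lands on 4096, matching the correction above. Note that at the time PG counts could be raised but not lowered, which is why sizing them early matters (see Takeaways).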
Second Expansion
Before:
OSDs: 75
Journal Ratio: 5:1
Drive Size: 1.2TB
Raw Capacity: 90TB
Usable Capacity: 30TB

After:
OSDs: 189
Journal Ratio: 3:1
Drive Size: 1.2TB
Raw Capacity: 226.8TB
Usable Capacity: 75.6TB
What Went Wrong
• More performance problems during expansion
• Unintentional upgrades (Giant)
Corrections Made
• Decided we needed dedicated mon nodes
• Added a couple more options to improve performance
• Started work on replacing ceph-deploy with puppet-ceph
osd max backfills = 1
osd recovery max active = 1
osd recovery op priority = 1
osd recovery max single start = 1
osd op threads = 12
Third Expansion
Before:
OSDs: 189
Journal Ratio: 3:1
Drive Size: 1.2TB
Raw Capacity: 226.8TB
Usable Capacity: 75.6TB

After:
OSDs: 297
Journal Ratio: 3:1
Drive Size: 1.2TB
Raw Capacity: 356.4TB
Usable Capacity: 118.8TB
What Went Wrong
• Performance problems when adding OSDs
• Started removing OSDs before the data was off them
Corrections Made
• Work continued on replacing ceph-deploy with puppet-ceph
• Added an option to bring in new OSDs with a weight of 0
osd max backfills = 1
osd recovery max active = 1
osd recovery op priority = 1
osd recovery max single start = 1
osd op threads = 12
osd crush initial weight = 0
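With `osd crush initial weight = 0`, new OSDs join the CRUSH map without immediately receiving data, and can then be brought in gradually. A hypothetical helper that emits the `ceph osd crush reweight` commands for such a ramp (the command is real; the OSD ids, step count, and final weight here are illustrative):

```python
def ramp_commands(osd_ids, final_weight, steps=4):
    """Generate 'ceph osd crush reweight' commands to bring new OSDs in gradually.

    Each step raises every OSD's CRUSH weight by final_weight/steps; run one
    step, wait for backfill to finish, then run the next.
    """
    cmds = []
    for step in range(1, steps + 1):
        weight = final_weight * step / steps
        for osd in osd_ids:
            cmds.append(f"ceph osd crush reweight osd.{osd} {weight:.2f}")
    return cmds

for cmd in ramp_commands([297, 298], final_weight=1.2, steps=2):
    print(cmd)
```

Run one step at a time and wait for backfill to settle (cluster back to HEALTH_OK) before issuing the next.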
Fourth Expansion (most recent)
Before:
OSDs: 297
Journal Ratio: 3:1
Drive Size: 1.2TB
Raw Capacity: 356.4TB
Usable Capacity: 118.8TB

After:
OSDs: 306
Journal Ratio: 3:1
Drive Size: 1.2TB
Raw Capacity: 367.2TB
Usable Capacity: 122.4TB
Multiple Ceph Clusters
• 2 production
• 2 staging
• 2 lab
• Virtual clusters for each member of the team
The Next Cinder Hurdle
• Going from a single backend to multi-backend
• Naming of backends needs to be planned for
• Not all lab testing will reveal issues when going to production
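For reference, multi-backend Cinder is driven by `enabled_backends` in `cinder.conf`; volume types bind to each section's `volume_backend_name`, which is why those names are hard to change later and worth planning up front. A sketch with illustrative section and pool names (not TWC's actual config):

```ini
[DEFAULT]
enabled_backends = ceph-capacity,ceph-performance

# Hypothetical HDD-backed tier
[ceph-capacity]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = ceph-capacity
rbd_pool = volumes

# Hypothetical SSD-backed tier
[ceph-performance]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = ceph-performance
rbd_pool = volumes-ssd
```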
Looking Forward
• New storage tiers (Performance-SSD, Capacity-HDD)
• Emerging drive technologies
• NewStore
Takeaways
• Don't start small if you're going big
• Order the right number and type of SSDs
• Determine the right number of PGs early
• Dedicated mon nodes (fsync)
• Be careful with mon nodes in OpenStack
• Ceph upgrades (don't forget the compute nodes)
What Made It Worth the Effort
• We are no longer locked into vendor-specific hardware
• Scaling across racks, rows, and rooms
• Nasty data migrations are a thing of the past
• It allows us to future-proof our data against EOL hardware support
• We have a say!
– The Ceph working session is today at 11:50 in room 217
Questions or Comments
• Email: [email protected]
• irc: cdelatte
• Email: [email protected]
• irc: bstillwell
More TWC Talks
Wednesday, May 20th
9:50a – Getting DNSaaS to Production with Designate
11:00a – Growing OpenStack at Time Warner Cable
11:50a – Changing Culture at Time Warner Cable
1:50p – Neutron in the Real World - TWC Implementation and Evolution
Thursday, May 21st
2:20p – Real World Experiences with Upgrading OpenStack at Time Warner Cable