Case Study: The University of Alabama at Birmingham, Presented by OpenStack, Ceph and Dell
DESCRIPTION
The University of Alabama at Birmingham gives scientists and researchers a massive, on-demand, virtual storage cloud using OpenStack and Ceph for less than $0.41 per gigabyte. OpenStack, Ceph and Dell joined forces at OpenStack Summit 2014 to outline how the university IT staff deployed a private storage cloud infrastructure using the Dell OpenStack cloud solution with Dell servers, storage, networking and OpenStack, and Inktank Ceph.
TRANSCRIPT
Case Study: The University of Alabama at Birmingham
OpenStack, Ceph, Dell
Kamesh Pemmaraju, Dell
John-Paul Robinson, UAB
OpenStack Summit 2014
Atlanta, GA
Dell - Internal Use - Confidential
An overview
• Dell – UAB backgrounder
• What we were doing before
• How the implementation went
• What we’ve been doing since
• Where we’re headed
Dell – UAB background
• 900 researchers working on cancer and genomic projects
• Their growing data sets challenged available resources
– Research data distributed across laptops, USB drives, local servers, HPC clusters
– Transferring datasets to HPC clusters took too much time and clogged shared networks
– Distributed data management reduced researcher productivity and put data at risk
• They therefore needed a centralized data repository for researchers to ensure compliance with data-retention requirements
• They also wanted a scale-out, cost-effective solution and hardware that could be repurposed for compute and storage
Dell – UAB background (cont’d)
• Potential solutions investigated
– Traditional SAN
– Public cloud storage
– Hadoop
UAB chose Dell/Inktank to architect a platform that would be highly scalable, provide a low cost per GB, and offer the best of all worlds by providing compute and storage on the same hardware.
A little background…
• We didn’t get here overnight
• 2000s-era High Performance Computing
• ROCKS-based compute cluster
• The Grid and proto-clouds
• GridWay Meta-scheduler
• OpenNebula, an early entrant that connected grids with this thing called the cloud
• Virtualization through-and-through
• DevOps is US
Challenges and Drivers
• Technology
– Many hypervisors
– Many clouds
– We have the technology…can we rebuild it here?
• Applications
– Researchers started shouting “Data”!
NextGen Sequencing, Research Data Repositories, Hadoop
– Researchers kept on shouting “Compute”!
Data Intensive Scientific Computing
• We knew we needed storage and computing
• We knew we wanted to tie it together with an HPC commodity scale-out philosophy
• So in August 2012 we bought 10 Dell 720xd servers
– 16-core
– 96GB RAM
– 36TB Disk
• A 192-core, ~1TB RAM, 360TB expansion to our HPC fabric
• Now to integrate it…
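The aggregate RAM and disk figures follow from the per-server specs. A minimal Python sketch of that arithmetic, using only the numbers quoted on this slide (the constant names are illustrative, not from the talk). Note that ten 16-core servers give 160 cores, so the 192-core total does not follow from these per-server numbers alone; it is left as quoted.

```python
# Per-server specs as quoted on the slide; names are illustrative.
SERVERS = 10
CORES_PER_SERVER = 16
RAM_GB_PER_SERVER = 96
DISK_TB_PER_SERVER = 36

# Aggregate capacity added to the HPC fabric.
total_cores = SERVERS * CORES_PER_SERVER      # 160
total_ram_gb = SERVERS * RAM_GB_PER_SERVER    # 960 GB, i.e. roughly 1TB
total_disk_tb = SERVERS * DISK_TB_PER_SERVER  # 360 TB raw disk

print(f"{total_cores} cores, {total_ram_gb} GB RAM, {total_disk_tb} TB disk")
```

These are raw totals; usable Ceph capacity depends on the replication settings, which the talk does not state.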
December 2012
• Bob said:
Hearing good things about OpenStack and Ceph this week at Dell World. Simon Anderson, CEO of DreamHost, spoke highly of Dell, OpenStack, and Ceph today. He is also chair of company that supports … He also spoke highly of Dell Crowbar deployment tool.
• I said:
Good to hear. I've been thinking a lot about Dell in this picture too. We have the building blocks in place. Might be a good way to speed the construction.
Lesson 1:
Recognize when a partnership will help you achieve your goals.
The 2013 Implementation
• The Timeline
– In January we started our discussions with Dell and Inktank
– By March we had committed to the fabric
– A week in April and we had our own cloud in place
• The Experience
– Vendors committed to their product
– Direct engagement through open communities
– Bright people who share your development ethic
Next Step…Build Adoption
• Defined a new storage product based on the commodity scale-out fabric
– Able to focus on strengths of Ceph to aggregate storage across servers
– Provision any sized image to provide Flexible Block Storage
• Promote cloud adoption within IT and across the research community
• Demonstrate utility with applications
Applications
• CrashPlan Backup in the cloud
– A couple hours to provision the VM resources
– An easy half-day deploy with the vendor because we controlled our resources a.k.a. firewall
– Add storage containers on the fly as we grow…10TB in a few clicks
• GitLab hosting
– Start a VM spec’d according to project site
– Work with Omnibus install. Hey it uses Chef!
• Research Storage
– 1TB storage containers for cluster users
– Uses Ceph RBD images and NFS
– The storage infrastructure part was easy
– Scaled provisioning, 100+ user containers (100TB) created in about 5 minutes
– Add storage servers as existing ones fill
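One way to script the bulk creation of per-user containers is to generate one `rbd create` call per user. The sketch below only builds the command lines and does not run them. The pool name `research`, the `user-` image prefix and the user IDs are hypothetical, and the talk does not say how UAB actually scripted this step. RBD images are thin-provisioned, which is consistent with 100TB of containers being created in minutes.

```python
from typing import Iterable, List

# `rbd create --size` takes megabytes by default, so 1TB = 1024 * 1024 MB.
TB_IN_MB = 1024 * 1024


def rbd_create_command(pool: str, image: str, size_tb: int = 1) -> List[str]:
    """Build the argv for creating one RBD image of size_tb terabytes."""
    return ["rbd", "create", "--size", str(size_tb * TB_IN_MB), f"{pool}/{image}"]


def provision_commands(users: Iterable[str], pool: str = "research") -> List[List[str]]:
    """One 1TB image per user; pool and image naming are hypothetical."""
    return [rbd_create_command(pool, f"user-{user}") for user in users]


# Hypothetical user IDs u001..u100, standing in for the cluster users.
users = [f"u{i:03d}" for i in range(1, 101)]
commands = provision_commands(users)

print(len(commands), "commands, first:", " ".join(commands[0]))
```

On a host with Ceph client access, each argv list could be passed to `subprocess.run`; exporting the resulting images over NFS is a separate step not covered here.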
Ceph Rebalances as Storage Grows :)
Lesson 2:
Use it! That’s what it’s for!
The sooner you start using the cloud the sooner you start thinking like the cloud.
How PoC Decisions Age Over Time
• Pick the environment you want when you are in operation…you’ll be there before you know it
• Simple networking is good
– But don’t go basic unless you are able to reinstall the fabric
– Class B ranges to match the campus fabric
– We chose a split admin range to coordinate with our HPC admin range
– We chose a collapsed admin/storage network due to a single switch…probably would have been better to keep separate and allow growth
– It’s OK to add non-provisioned interfacing nodes…know your net
• Avoid painting yourself into a corner
– Don’t let the Paranoid Folk box in your deployment
– An inaccessible fabric is an unusable fabric
• Fixed IP range mismatch with “fake” reservations
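The addressing choices above can be explored with Python's standard `ipaddress` module. This is a sketch under assumptions: the 172.16.0.0/16 and 10.0.0.0/28 ranges are made-up examples, and the talk does not describe how the split or the “fake” reservations were actually done. It shows splitting a /16 into two halves, and listing the addresses in a fixed range that fall outside the usable part so they can be marked as reserved.

```python
import ipaddress


def split_in_half(network: str):
    """Split a network into two equal halves (e.g. a /16 into two /17s)."""
    net = ipaddress.ip_network(network)
    lower, upper = net.subnets(prefixlen_diff=1)
    return lower, upper


def addresses_to_reserve(fixed_range: str, usable_range: str):
    """Host addresses in fixed_range that lie outside usable_range."""
    fixed = ipaddress.ip_network(fixed_range)
    usable = ipaddress.ip_network(usable_range)
    return [host for host in fixed.hosts() if host not in usable]


# Hypothetical class B sized range split between cloud and HPC admin use.
cloud_admin, hpc_admin = split_in_half("172.16.0.0/16")

# Hypothetical mismatch: the configured fixed range is wider than what is usable.
reserved = addresses_to_reserve("10.0.0.0/28", "10.0.0.8/29")

print(cloud_admin, hpc_admin)
print([str(address) for address in reserved])
```

Reserving the out-of-range addresses up front keeps the allocator from handing them to instances, which is one reading of the “fake” reservations mentioned on the slide.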
Lesson 3:
The fabric is flexible. Let it help you solve your problems.
Problems will Arise
• The release version of the ixgbe driver in the Ubuntu 12.04.1 kernel didn’t perform well with our 10Gbit cards
– Open source has an upstream
– Use it as part of debug network
– Upgrading the drivers was a simple fix
• Sometimes when you fix something you break something else
• There are still a lot of moving parts but each has a strong open source community
– Work methodically
– You will learn as you go
– Recognize the stack is integrated and respect tool boundaries
Sometimes a Problem is just a Problem
• Code ex
Lesson 4:
The code *is* the documentation
…and that’s a *good* thing
Where we are today
• OpenStack plus Ceph are here to stay for our Research Computing System
– They give us the flexibility we need for an ever-expanding research applications portfolio
– Move our UAB Galaxy NextGen Sequencing platform to our Cloud
– Add Object Storage services
– Put the cloud in the hands of researchers
• The big question…
…how far can we take it?
• The goal of process automation is scale
• Incompatible, non-repeatable, manual processes are a cost
• Success is in dual-use
– Satisfy your needs and customer demand
– Automating process implies documenting process…great for compliance and repeatability
– Recognize the latent talent in your staff: today’s system admins are tomorrow’s systems developers
• Traditional infrastructure models are ripe for replacement
To learn more…
Please visit Dell.com/RedHat