2014 Ceph NYLUG Talk
DESCRIPTION
Talk from 05 June 2014 NYLUG meeting at Bloomberg NYC. Short history of where Ceph came from, an architectural overview, and the current state of the community.
TRANSCRIPT
Ceph @ NYLUG
2014 New York, NY

Copyright © 2014 by Inktank | Private and Confidential
WHO?
Patrick McGarry
Director, Community – Red Hat
/. -> ALU -> P4 -> Inktank
scuttlemonkey
Lies and misinformation!
AGENDA
INDUSTRY MUSINGS
INTRO TO CEPH
ARCHITECTURE
COMMUNITY
INKTANK CEPH ENTERPRISE
THE FORECAST
By 2020, over 15 ZB of data will be stored. 1.5 ZB are stored today.

THE PROBLEM
Existing systems don’t scale
Increasing cost and complexity
Need to invest in new platforms ahead of time
[Chart, 2010 to 2020: growth of data vs. IT storage budget]

THE SOLUTION
PAST: SCALE UP
FUTURE: SCALE OUT
INTRO TO CEPH
HISTORICAL TIMELINE
2004: Project starts at UCSC
2006: Open source
2010: Mainline Linux kernel
2011: OpenStack integration
2012: CloudStack integration
MAY 2012: Launch of Inktank
SEPT 2012: Production-ready Ceph
2013: Xen integration
OCT 2013: Inktank Ceph Enterprise launch
FEB 2014: RHEL-OSP & RHEV support

A STORAGE REVOLUTION
PROPRIETARY HARDWARE, PROPRIETARY SOFTWARE, SUPPORT & MAINTENANCE
STANDARD HARDWARE, OPEN SOURCE SOFTWARE, ENTERPRISE PRODUCTS & SERVICES
[Diagram: two stacks, each drawn on top of a row of COMPUTER + DISK nodes]

ARCHITECTURE
ARCHITECTURAL COMPONENTS
RGW: A web services gateway for object storage, compatible with S3 and Swift
LIBRADOS: A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP)
RADOS: A software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors
RBD: A reliable, fully-distributed block device with cloud platform integration
CEPHFS: A distributed file system with POSIX semantics and scale-out metadata management
[Diagram: APP, HOST/VM, and CLIENT sit on top of these components]
OBJECT STORAGE DAEMONS
[Diagram: each OSD runs on top of a local filesystem (FS: btrfs, xfs, or ext4) on its own DISK; monitors (M) are shown alongside]

RADOS CLUSTER
[Diagram: an APPLICATION talks to a RADOS CLUSTER made up of storage nodes and monitors (M)]

RADOS COMPONENTS
OSDs:
• 10s to 10000s in a cluster
• One per disk (or one per SSD, RAID group…)
• Serve stored objects to clients
• Intelligently peer for replication & recovery
Monitors:
• Maintain cluster membership and state
• Provide consensus for distributed decision-making
• Small, odd number
• These do not serve stored objects to clients
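The slide asks for a "small, odd number" of monitors without saying why. A minimal sketch of the arithmetic, assuming the monitors reach consensus by strict majority vote (the function names here are illustrative, not part of any Ceph API):

```python
def quorum_size(monitors: int) -> int:
    """Smallest strict majority among `monitors` voting members."""
    if monitors < 1:
        raise ValueError("need at least one monitor")
    return monitors // 2 + 1


def tolerated_failures(monitors: int) -> int:
    """How many monitors can be lost while a strict majority remains."""
    return monitors - quorum_size(monitors)


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} monitors: quorum of {quorum_size(n)}, "
              f"survives {tolerated_failures(n)} failure(s)")
```

Under that assumption, going from 3 to 4 monitors does not buy any extra failure tolerance, which is one way to read the "odd number" advice.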
WHERE DO OBJECTS LIVE?
[Diagram: an APPLICATION holds an OBJECT and does not know (??) which node in the cluster should store it]

A METADATA SERVER?
[Diagram: the APPLICATION takes two steps, (1) ask a metadata server where the object lives, then (2) contact that node]

CALCULATED PLACEMENT
[Diagram: the APPLICATION works out where object "F" belongs from its name; the nodes own the name ranges A-G, H-N, O-T, and U-Z]
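The idea on this slide can be sketched in a few lines: the client computes the target from the object name instead of asking a lookup service. The four ranges come from the slide; the node names and the `locate` function are hypothetical.

```python
# Name ranges as drawn on the slide; the node names are made up for illustration.
RANGES = [
    ("A", "G", "node-1"),
    ("H", "N", "node-2"),
    ("O", "T", "node-3"),
    ("U", "Z", "node-4"),
]


def locate(object_name: str) -> str:
    """Return the node responsible for an object, based on its first letter."""
    if not object_name:
        raise ValueError("object name must not be empty")
    first = object_name[0].upper()
    for low, high, node in RANGES:
        if low <= first <= high:
            return node
    raise ValueError(f"no range covers names starting with {first!r}")


if __name__ == "__main__":
    print(locate("F"))
```

A fixed table like this needs no lookup, but it has to be redrawn by hand whenever nodes are added or removed.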
EVEN BETTER: CRUSH!
[Diagram: an OBJECT's data (drawn as bit patterns) spread across the nodes of the RADOS CLUSTER]

CRUSH IS A QUICK CALCULATION
[Diagram: an OBJECT's data (drawn as bit patterns) mapped onto nodes of the RADOS CLUSTER]

CRUSH: DYNAMIC DATA PLACEMENT
CRUSH:
• Pseudo-random placement algorithm
  • Fast calculation, no lookup
  • Repeatable, deterministic
• Statistically uniform distribution
• Stable mapping
  • Limited data migration on change
• Rule-based configuration
  • Infrastructure topology aware
  • Adjustable replication
  • Weighting
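The properties listed above (deterministic, no lookup, weighted, limited migration on change) can be illustrated with a much simpler technique. The sketch below is weighted rendezvous hashing, not the CRUSH algorithm itself: it ignores rules and topology, and the node names and weights are hypothetical.

```python
import hashlib
import math
from typing import Dict, List


def _unit_hash(object_name: str, node: str) -> float:
    """Deterministic pseudo-random value strictly between 0 and 1."""
    digest = hashlib.sha256(f"{object_name}|{node}".encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "big") >> 12  # keep 52 bits
    return (value + 0.5) / 2**52


def place(object_name: str, weights: Dict[str, float], replicas: int = 2) -> List[str]:
    """Choose `replicas` distinct nodes by weighted rendezvous hashing."""
    if replicas > len(weights):
        raise ValueError("not enough nodes for the requested replica count")

    def score(node: str) -> float:
        # Higher weight or a luckier hash gives a higher score.
        return -weights[node] / math.log(_unit_hash(object_name, node))

    return sorted(weights, key=score, reverse=True)[:replicas]


if __name__ == "__main__":
    cluster = {"osd.0": 1.0, "osd.1": 1.0, "osd.2": 2.0, "osd.3": 1.0}
    print(place("my-object", cluster))
```

Any client with the same node list computes the same answer, and removing a node only moves the objects that had a replica on it.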
ACCESSING A RADOS CLUSTER
[Diagram: an APPLICATION links LIBRADOS and talks over a socket to the RADOS CLUSTER to store an OBJECT]

LIBRADOS: RADOS ACCESS FOR APPS
LIBRADOS:
• Direct access to RADOS for applications
• C, C++, Python, PHP, Java, Erlang
• Direct access to storage nodes
• No HTTP overhead
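As a rough sketch of what direct access looks like from Python, the helper below writes one object and reads it back through a librados I/O context. The config path, pool name, and object name are assumptions for illustration, and `main()` needs a reachable cluster plus the `rados` Python binding installed.

```python
def store_and_fetch(ioctx, name: str, payload: bytes) -> bytes:
    """Write a whole object, then read it back through the same I/O context."""
    ioctx.write_full(name, payload)
    return ioctx.read(name, len(payload))


def main() -> None:
    # Imported here so the helper above can be used without the binding installed.
    import rados

    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")  # assumed config path
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx("data")  # assumed pool name
        try:
            print(store_and_fetch(ioctx, "hello-object", b"hello from NYLUG"))
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()
```

Call `main()` on a host that can reach the monitors; there is no HTTP layer between the application and the storage nodes.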
THE RADOS GATEWAY
[Diagram: APPLICATIONs speak REST to RADOSGW instances, which use LIBRADOS over a socket to reach the RADOS CLUSTER]

RADOSGW MAKES RADOS WEBBY
RADOSGW:
• REST-based object storage proxy
• Uses RADOS to store objects
• API supports buckets, accounts
• Usage accounting for billing
• Compatible with S3 and Swift applications
STORING VIRTUAL DISKS
[Diagram: a VM runs on a HYPERVISOR that uses LIBRBD to reach the RADOS CLUSTER]

SEPARATE COMPUTE FROM STORAGE
[Diagram: two HYPERVISORs, each with LIBRBD, attached to the same RADOS CLUSTER; the VM is not tied to either host]

KERNEL MODULE FOR MAX FLEXIBLE!
[Diagram: a LINUX HOST uses KRBD to reach the RADOS CLUSTER]

RBD STORES VIRTUAL DISKS
RADOS BLOCK DEVICE:
• Storage of disk images in RADOS
• Decouples VMs from host
• Images are striped across the cluster (pool)
• Snapshots
• Copy-on-write clones
• Support in:
  • Mainline Linux Kernel (2.6.39+)
  • Qemu/KVM, native Xen coming soon
  • OpenStack, CloudStack, Nebula, Proxmox
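"Striped across the cluster" can be made concrete with a little arithmetic: a disk image is cut into fixed-size chunks, and each chunk is stored as its own RADOS object. The sketch below assumes a 4 MiB chunk size and the simplest possible layout; the function names are illustrative and are not librbd calls.

```python
from typing import Tuple

OBJECT_SIZE = 4 * 1024 * 1024  # assumed chunk size in bytes (4 MiB)


def locate_byte(offset: int, object_size: int = OBJECT_SIZE) -> Tuple[int, int]:
    """Map a byte offset in the image to (object index, offset inside that object)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, object_size)


def objects_for_range(offset: int, length: int, object_size: int = OBJECT_SIZE) -> range:
    """Indexes of every object touched by a read or write of `length` bytes."""
    if length <= 0:
        raise ValueError("length must be positive")
    first = offset // object_size
    last = (offset + length - 1) // object_size
    return range(first, last + 1)


if __name__ == "__main__":
    print(locate_byte(10 * 1024 * 1024))
    print(list(objects_for_range(4194300, 10)))
```

Because each chunk is an independent object, a large sequential read ends up spread over many storage nodes rather than a single disk.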
SEPARATE METADATA SERVER
[Diagram: a LINUX HOST with the KERNEL MODULE has separate metadata and data paths into the RADOS CLUSTER]

SCALABLE METADATA SERVERS
METADATA SERVER
• Manages metadata for a POSIX-compliant shared filesystem
  • Directory hierarchy
  • File metadata (owner, timestamps, mode, etc.)
• Stores metadata in RADOS
• Does not serve file data to clients
• Only required for shared filesystem

CEPH AND OPENSTACK
[Diagram: OPENSTACK (KEYSTONE, SWIFT, CINDER, GLANCE, NOVA) on top of the RADOS CLUSTER, connected through RADOSGW (LIBRADOS) and through LIBRBD, including LIBRBD in the HYPERVISOR]

Ceph Developer Summit
• Recent: “Giant”
• March 04-05
• wiki.ceph.com
• Virtual (irc, hangout, pad, blueprint, youtube)
• 2 days (soon to be 3?)
• Discuss all work
• Recruit for your projects!

New Contribute Page
• http://ceph.com/community/Contribute
• Source tree
• Issues
• Share experiences
• Standups
• One-stop shop

New Ceph Wiki
Google Summer of Code 2014
• Accepted as a mentoring organization
• 8 mentors from Inktank & Community
• http://ceph.com/gsoc2014/
• 2 student proposals accepted
• Hope to turn this into academic outreach

Ceph Days
• inktank.com/cephdays
• Recently: London, Frankfurt, NYC, Santa Clara
• Aggressive program
• Upcoming: Sunnyvale, Austin, Boston, Kuala Lumpur

Meetups
• Community organized
• World wide
• Wiki
• Ceph-community
• Goodies available
• Logistical support
• Drinkup to tradeshow

Ceph Foundation
• We haven’t forgotten!
• Looking for potential founding members
• Especially important to keep the IP clean

Coordinated Efforts
• Always need help
• CentOS SIG
• OCP
• Xen
• Hadoop
• OpenStack
• CloudStack
• Ganeti
• Many more!

http://metrics.ceph.com
THE PRODUCT
INKTANK CEPH ENTERPRISE: WHAT’S INSIDE?
Ceph Object and Ceph Block
Calamari
Enterprise Plugins (2014)
Support Services
Subscription-based
Priced on capacity
Single price for all protocols
ROADMAP: INKTANK CEPH ENTERPRISE
Releases: 1.2 (Ceph 0.77 Firefly, April 2014); 2.0 (Ceph 0.87 “H-Release”, September 2014); 2015
Tracks: CEPH; CALAMARI; PLUGINS
Items: Erasure Coding; RHEL7 Support; Cache Tiering; User Quotas; RADOS Management; Analytics Hosted/SaaS; SNMP, Hyper-V; CephFS; Ubuntu 14.04 Support; VMware; HDFS Support; iSCSI; Intelligent Objects; QoS

RELEASE SCHEDULE
[Timeline chart, 2013 Q3 through 2015 Q2: Ceph releases Dumpling, Emperor, Firefly, Giant, H, I, J]
Inktank Ceph Enterprise v1.1 (Dumpling LTS until May 2015)
Inktank Ceph Enterprise v1.2 (Firefly LTS until November 2015)

GETTING STARTED WITH CEPH
• Read about the latest version of Ceph. The latest stuff is always at http://ceph.com/get
• Deploy a test cluster using ceph-deploy. Read the quick-start guide at http://ceph.com/qsg
• Read the rest of the docs! Find docs for the latest release at http://ceph.com/docs
• Ask for help when you get stuck! Community volunteers are waiting for you at http://ceph.com/help

THANK YOU!
Patrick McGarry
Director, Community
Red Hat
@scuttlemonkey