Federating Grid and Cloud Storage in EUDAT
DESCRIPTION
Federating Grid and Cloud Storage in EUDAT. International Symposium on Grids and Clouds 2014, 23-28 March 2014. Shaun de Witt, STFC; Maciej Brzeźniak, PSNC; Martin Hellmich, CERN. Agenda: Introduction … Test results. Future work.

TRANSCRIPT
Shaun de Witt, STFC
Maciej Brzeźniak, PSNC
Martin Hellmich, CERN
Federating Grid and Cloud Storage in EUDAT
International Symposium on Grids and Clouds 2014,
23-28 March 2014
Agenda
• Introduction
• …
• …
• …
• Test results
• Future work
Introduction
• We present and analyze the results of Grid and Cloud storage integration
• In EUDAT we used:
– iRODS as the Grid storage federation mechanism
– OpenStack Swift as a scalable object storage solution
• Scope:
– Proof of concept
– Pilot OpenStack Swift installation at PSNC
– Production iRODS servers at PSNC (Poznań) and EPCC (Edinburgh)
EUDAT project introduction
• A pan-European data storage & management infrastructure
• Long-term data preservation:
• Storage safety and availability – replication, integrity control
• Data accessibility – visibility, and the ability to reference data over the years
• Partners: data centres & communities
EUDAT challenges:
• Federate heterogeneous data management systems:
• dCache, AFS, DMF, GPFS, SAM-FS
• File systems, HSMs, file servers
• Object storage systems (!)
while ensuring:
• Performance, scalability
• Data safety, durability, HA, fail-over
• Unified access, federation transparency
• Flexibility (rule engine)
• Implement the core services:
• Safe and long-term storage: B2SAFE
• Efficient analysis: B2STAGE
• Easy deposit & sharing: B2SHARE
• Data & metadata exploration: B2FIND
[Picture showing various storage systems federated under iRODS]
EUDAT CDI domain of registered data:
Grid – Cloud storage integration
• Need to integrate Grids and Cloud/Object storage
• Grids gain another cost-effective, scalable backend
• Many institutions and initiatives are testing object storage or already using it in production
• Most Cloud storage services build on the object storage concept
• Object storage solutions have limited support for federation, which is well addressed in Grids
• In EUDAT we integrated:
• an object storage system – OpenStack Swift
• iRODS servers and federations
Context: the Object Storage concept
• The concept enables building low-cost, scalable, efficient storage:
• Within a data centre
• In DR / distributed configurations
• Reliability thanks to redundancy of components:
• Many cost-efficient storage servers with disk drives (12-60 HDDs/SSDs each)
• Typical (cheap) network: 1/10 Gbit Ethernet
• Limitations of traditional approaches:
• High investment and maintenance costs
• Vendor lock-in, closed architecture, limited scalability
• Slower adoption of new technologies than in the commodity market
Context: Object Storage importance
• Many institutions and initiatives (DCs, NRENs, companies, R&D projects) are testing object storage or using it in production, including:
• Open source / private cloud:
• OpenStack Swift
• Ceph / RADOS
• Sheepdog, Scality…
• Commercial:
• Amazon S3, RackSpace Cloud Files…
• MS Azure Object Storage…
• Most promising open-source options: OpenStack Swift & Ceph
Object Storage: Architectures
[Diagram: OpenStack Swift – user apps reach a load balancer that fronts several proxy nodes, which handle uploads/downloads against a set of storage nodes. Ceph – applications, hosts/VMs and clients use RadosGW, RBD or CephFS through librados, all on top of RADOS with its MDS, MON and OSD daemons.]
Object Storage: concepts
• No metadata lookups, no metadata DB – data placement/location is computed!
• Swift ring: represents the space of all possible computed hash values, divided into equal parts (partitions); partitions are spread across the storage nodes
• Ceph CRUSH map: a list of storage devices, a failure-domain hierarchy (e.g., device, host, rack, row, room), and rules for traversing the hierarchy when storing data

[Figures: the OpenStack Swift ring (source: The Riak Project) and Ceph's CRUSH map (source: http://www.sebastien-han.fr/blog/2012/12/07/ceph-2-speed-storage-with-crush/)]
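To make the "computed placement" idea concrete, here is a minimal illustrative sketch (not Swift's actual code; the partition count, device names and round-robin assignment are simplified assumptions) of how a hash ring maps an object path to a partition and a set of replica devices without any metadata lookup:

```python
import hashlib

# Tiny demo ring: 2**8 = 256 partitions over 4 devices, 3 replicas.
# A real Swift ring balances by device weight and keeps replicas in
# distinct failure domains; round-robin is only for illustration.
PART_POWER = 8
DEVICES = ['node1/sdb', 'node2/sdb', 'node3/sdb', 'node4/sdb']
REPLICAS = 3

part2dev = [[DEVICES[(p + r) % len(DEVICES)] for r in range(REPLICAS)]
            for p in range(2 ** PART_POWER)]

def placement(account, container, obj):
    """Return (partition, replica devices) for an object path --
    computed from the hash alone, no database lookup needed."""
    key = f'/{account}/{container}/{obj}'.encode()
    digest = hashlib.md5(key).digest()
    # Top PART_POWER bits of the hash select the partition.
    partition = int.from_bytes(digest[:4], 'big') >> (32 - PART_POWER)
    return partition, part2dev[partition]

print(placement('AUTH_test', 'photos', 'cat.jpg'))
```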
Grid – Cloud storage integration
• Most cloud/object storage solutions expose:
• The S3 interface
• Other native interfaces: Swift's OSS API; Ceph's RADOS
• S3 (by Amazon) is the de facto standard in cloud storage:
• Many petabytes, global systems
• Vendors use it (e.g. Dropbox) or provide it
• Large take-up
• Similar concepts:
• CDMI: Cloud Data Management Interface – a SNIA standard, with few implementations: http://www.snia.org/cdmi
• Nimbus.IO: https://nimbus.io
• MS Azure Blob Storage: http://www.windowsazure.com/en-us/manage/services/storage/
• RackSpace Cloud Files: www.rackspace.com/cloud/files/
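As an illustration of how uniform the S3 interface is across providers, a minimal boto (v2) sketch of uploading and downloading an object against an S3-compatible endpoint; the host, port, credentials and bucket name are placeholders, and for OpenStack Swift the endpoint would be a proxy node running the S3 middleware:

```python
import boto
import boto.s3.connection

# Connect to an S3-compatible endpoint (placeholder host/credentials).
conn = boto.connect_s3(
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
    host='swift-proxy.example.org',
    port=8080,
    is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)

bucket = conn.create_bucket('eudat-demo')

# The same put/get calls work against Amazon S3, Swift's S3
# middleware, and other S3 clones.
key = bucket.new_key('hello.txt')
key.set_contents_from_string('Hello from EUDAT')
print(key.get_contents_as_string())
```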
S3 and S3-like in commercial systems
• S3 re-sellers:
• Lots of services
• Including Dropbox
• Services similar to the S3 concept:
• Nimbus.IO: https://nimbus.io
• MS Azure Blob Storage: http://www.windowsazure.com/en-us/manage/services/storage/
• RackSpace Cloud Files: www.rackspace.com/cloud/files/
• S3 implementations "in the hardware":
• Xyratex
• Amplidata
Why build PRIVATE S3-like storage?
• Features / benefits:
• Reliable storage on top of commodity hardware
• The user stays in control of the data
• Easy scalability; the system can grow
• Resources can be added and data redistributed in a non-disruptive way
• Open-source software solutions and standards are available:
• e.g. OpenStack Swift, with both the OpenStack native API and the S3 API (see the native-API sketch below)
• Other S3-enabled storage: e.g. RADOS
• CDMI: Cloud Data Management Interface
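For comparison with the S3 path above, a minimal sketch of the same put/get cycle through Swift's native API using python-swiftclient; the auth URL and credentials are placeholders (TempAuth-style v1 auth is assumed for brevity):

```python
import swiftclient

# Native OpenStack Swift API (placeholder endpoint and credentials).
conn = swiftclient.client.Connection(
    authurl='http://swift-proxy.example.org:8080/auth/v1.0',
    user='test:tester',
    key='testing',
)

conn.put_container('eudat-demo')
conn.put_object('eudat-demo', 'hello.txt',
                contents=b'Hello from EUDAT',
                content_type='text/plain')

# get_object returns the response headers and the object body.
headers, body = conn.get_object('eudat-demo', 'hello.txt')
print(headers['etag'], body)
```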
Why federate iRODS with S3/OpenStack?
• Some communities have data stored in OpenStack
• VPH is building a reliable storage cloud on top of OpenStack Swift within the p-medicine project (together with PSNC)
• These data should be available to EUDAT
• Data staging: Cloud -> EUDAT -> PRACE HPC and back (see the sketch below)
• Data replication: Cloud -> EUDAT -> other back-end storage
• We could apply the rule engine to data in the cloud and assign PIDs
• We were asked to consider cloud storage:
• From the EUDAT 1st-year review report:
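A minimal client-side sketch of the first staging hop (Cloud -> EUDAT), assuming python-swiftclient and python-irodsclient with placeholder hosts, credentials and paths; EUDAT's actual B2SAFE/B2STAGE pipelines drive this through iRODS rules rather than a script like this:

```python
import swiftclient
from irods.session import iRODSSession

# Fetch an object from the community's Swift cloud (placeholder
# endpoint, credentials, container and object names).
swift = swiftclient.client.Connection(
    authurl='http://swift-proxy.example.org:8080/auth/v1.0',
    user='vph:tester',
    key='secret',
)
headers, body = swift.get_object('vph-data', 'scan-001.dat')

with open('/tmp/scan-001.dat', 'wb') as f:
    f.write(body)

# Ingest it into the EUDAT iRODS zone, from where it can be
# replicated to other back-end storage or staged to PRACE HPC.
with iRODSSession(host='irods.example.org', port=1247,
                  user='eudat', password='secret', zone='EUDAT') as session:
    session.data_objects.put('/tmp/scan-001.dat',
                             '/EUDAT/home/eudat/scan-001.dat')
```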
EUDAT’s iRODS federation
VPH case analysis:
[Diagram: an S3/OSS client ingests data into the cloud storage; an iRODS server with the S3 driver accesses it over the S3/OSS APIs and registers the data, with a PID assigned by EUDAT's PID Service; the data is replicated to another iRODS server with a different storage driver and staged to an HPC system, where iRODS clients ingest and access it.]
Our 7.2 project
• Purpose:
• To examine the existing iRODS-S3 driver
• (possibly) to improve it or provide another one
• Steps/status:
• 1st stage:
• Play with what is there – done for OpenStack/S3 + iRODS
• Examine functionality
• Evaluate scalability – some issues found already
• Follow-up:
• Try to improve the existing S3 driver
• Functionality
• Performance
• Implement a native OpenStack driver?
• Get in touch with the iRODS developers
iRODS-OpenStack tests
TEST SETUP:
• iRODS server:
• Cloud as a compound resource
• Disk cache in front of it
• OpenStack Swift:
• 3 proxies, 1 with S3 enabled
• 5 storage nodes
• Extensive functionality and performance tests
• Amazon S3:
• Only limited functionality tests
[Diagram: iRODS server(s) talking to OpenStack Swift via the S3/OpenStack API and to Amazon S3 via the S3 API]
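The performance figures on the next slide come from transfer tests; a minimal sketch of the kind of timing harness one might use around the iRODS icommands (it assumes iput/iget are configured against the compound S3 resource; the file size and paths are placeholders):

```python
import os
import subprocess
import time

def timed(cmd):
    """Run a command and return elapsed wall-clock seconds."""
    start = time.time()
    subprocess.check_call(cmd)
    return time.time() - start

# Create a 1 GiB test file in 1 MiB chunks (size is a placeholder;
# the tests covered a range of file sizes).
size = 1 << 30
with open('/tmp/testfile', 'wb') as f:
    for _ in range(size >> 20):
        f.write(os.urandom(1 << 20))

# iput/iget go through the iRODS server, whose compound resource
# forwards data to Swift via the S3 driver (disk cache in front).
up = timed(['iput', '-f', '/tmp/testfile', 'testfile'])
down = timed(['iget', '-f', 'testfile', '/tmp/testfile.out'])
print('upload:   %.1f MB/s' % (size / up / 1e6))
print('download: %.1f MB/s' % (size / down / 1e6))
```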
iRODS-OpenStack tests
TEST RESULTS:
• S3 vs native OSS overhead:
• Upload: ~0%
• Download: ~8%
• iRODS overhead:
• Upload: ~19%
• Download:
• From compound S3: ~0%
• Cached: 230% speed-up (cache resources are faster than S3)
iRODS-OpenStack tests
Conclusions and future plans
• Conclusions:
• Performance-wise, iRODS does not add much overhead for files <2 GB
• Problems arise for files >2 GB – the iRODS-S3 driver has no support for multipart upload, which prevents iRODS from storing files >2 GB in clouds
• Some functional limits (e.g. the imv problem)
• Using iRODS to federate S3 clouds at large scale would require improving the existing driver or developing a new one
• Future plans:
• Test the integration with VPH's cloud using the existing driver
• Ask SAF to support the driver development
• Get in touch with the iRODS developers to ensure the sustainability of our work
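Multipart upload – the feature missing from the iRODS-S3 driver – splits a large file into independently uploaded parts that the server reassembles; a minimal boto (v2) sketch of the mechanism, with the bucket name, file path and part size as placeholders:

```python
import os
import boto

# Parts are uploaded separately and stitched together server-side;
# this is what the driver would need to store files larger than ~2 GB.
conn = boto.connect_s3()                    # credentials from env/config
bucket = conn.get_bucket('eudat-demo')      # placeholder bucket name

path = '/data/big.dat'                      # placeholder file
part_size = 512 * 1024 * 1024               # 512 MB per part
total = os.path.getsize(path)

mp = bucket.initiate_multipart_upload('big.dat')
with open(path, 'rb') as f:
    part_num = 1
    while f.tell() < total:
        # Each call reads and uploads the next part of the file.
        mp.upload_part_from_file(
            f, part_num=part_num,
            size=min(part_size, total - f.tell()))
        part_num += 1
mp.complete_upload()
```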
EUDAT’s iRODS federation
Object storage on top of iRODS?
[Diagram: an S3/OSS client and an iRODS client both reach an iRODS server that exposes an S3 API through the S3 driver; behind it, the iRODS federation spans other iRODS servers with other storage drivers, each backed by a storage system, offering data access/ingest over both the iRODS API and the S3 API.]
Problems:
• Data organisation mapping: filesystem vs objects; big files vs fragments
• Identity mapping: S3 keys/accounts vs X.509?
• Out of scope for EUDAT? A lot of work would be needed