Peter Tiernan - Ceph at the Digital Repository of Ireland
DESCRIPTION
The DRI needs highly scalable, dynamic storage. In this presentation we explore four storage solutions and describe how we chose Ceph.
TRANSCRIPT
![Page 1: Peter Tiernan - Ceph at the Digital Repository of Ireland](https://reader034.vdocuments.site/reader034/viewer/2022051610/5496edd6ac7959042e8b5203/html5/thumbnails/1.jpg)
Peter Tiernan
Systems and Storage Engineer
Digital Repository of Ireland, TCHPC
Ceph at the DRI
![Page 2: Peter Tiernan - Ceph at the Digital Repository of Ireland](https://reader034.vdocuments.site/reader034/viewer/2022051610/5496edd6ac7959042e8b5203/html5/thumbnails/2.jpg)
DRI:
The Digital Repository of Ireland (DRI) is an interactive, national trusted digital repository for contemporary and historical, social and cultural data held by Irish institutions.
The DRI follows the Open Archival Information System (OAIS) ISO reference model and the Trusted Repository Audit Checklist (TRAC).
![Page 3: Peter Tiernan - Ceph at the Digital Repository of Ireland](https://reader034.vdocuments.site/reader034/viewer/2022051610/5496edd6ac7959042e8b5203/html5/thumbnails/3.jpg)
OAIS Model:
- Is concerned with all technical aspects of digital repositories
- Describes ‘components and services required to develop and maintain archives’
- Is broken down into 'Functional Entities' and 'Work Packages'
- WP8 is responsible for the ‘Archival Storage’ functional entity
![Page 4: Peter Tiernan - Ceph at the Digital Repository of Ireland](https://reader034.vdocuments.site/reader034/viewer/2022051610/5496edd6ac7959042e8b5203/html5/thumbnails/4.jpg)
OAIS Model:
Source: www.digital-preservation.com
![Page 5: Peter Tiernan - Ceph at the Digital Repository of Ireland](https://reader034.vdocuments.site/reader034/viewer/2022051610/5496edd6ac7959042e8b5203/html5/thumbnails/5.jpg)
DRI Storage Requirements:
OAIS/TRAC requires the following from storage:
- Minimal conditions for performing long-term preservation of digital assets
- Long Term Preservation of digital assets, even if the OAIS (repository) itself is not permanent or present.
![Page 6: Peter Tiernan - Ceph at the Digital Repository of Ireland](https://reader034.vdocuments.site/reader034/viewer/2022051610/5496edd6ac7959042e8b5203/html5/thumbnails/6.jpg)
DRI Storage Requirements:
- Open Source/Open Standards
- Independence
- High Availability
- Dynamically Configurable
- Ease of Interoperability (Interfaces, APIs)
- Data Security/Placement (Replication, Erasure Coding, Placement, Tiering, Federation)
- Self Contained
- Commodity Hardware
![Page 7: Peter Tiernan - Ceph at the Digital Repository of Ireland](https://reader034.vdocuments.site/reader034/viewer/2022051610/5496edd6ac7959042e8b5203/html5/thumbnails/7.jpg)
Storage Solutions We Tested:
![Page 8: Peter Tiernan - Ceph at the Digital Repository of Ireland](https://reader034.vdocuments.site/reader034/viewer/2022051610/5496edd6ac7959042e8b5203/html5/thumbnails/8.jpg)
Why we didn't choose HDFS:
- Interfaces limited. Not POSIX compliant due to the immutable nature of the filesystem.
- Performance geared towards large data streams. I/O of many small files is poor.
- Single point of failure and bottleneck at its NameNode.
- Doesn't provide any federation.
![Page 9: Peter Tiernan - Ceph at the Digital Repository of Ireland](https://reader034.vdocuments.site/reader034/viewer/2022051610/5496edd6ac7959042e8b5203/html5/thumbnails/9.jpg)
Why we didn't choose iRODS:
- Default interfaces limited. No RESTful, RBD.
- Single point of failure at its iCAT metadata server.
- Overlapping functionality with Fedora Commons.

Why we didn't choose GPFS:
- Default interfaces limited. No RESTful, RBD.
- Data replica limit of 2.
- Closed source.
![Page 10: Peter Tiernan - Ceph at the Digital Repository of Ireland](https://reader034.vdocuments.site/reader034/viewer/2022051610/5496edd6ac7959042e8b5203/html5/thumbnails/10.jpg)
Why we chose Ceph:
- We like its distributed, clustered architecture
- Provides complete high availability on install
- Scales out horizontally to massive levels
- Data Security: Distributed, Replicated
- Many interface options
- Rich, documented, multi-level APIs
- Dynamically configurable
- Very good performance for general use (I/O on many small files)
- Solid release schedule, new features
![Page 11: Peter Tiernan - Ceph at the Digital Repository of Ireland](https://reader034.vdocuments.site/reader034/viewer/2022051610/5496edd6ac7959042e8b5203/html5/thumbnails/11.jpg)
Findings:

| | HDFS | iRODS | Ceph | GPFS |
|---|---|---|---|---|
| API | Yes | Yes | Yes | Yes |
| Fedora 3.6.x Driver | Yes | No | No | No |
| Interface: POSIX | No | Yes | Yes | Yes |
| Interface: RBD | No | No | Yes | No |
| Interface: RESTful | Yes | No | Yes | No |
| Dynamic Configuration | Yes | Yes | Yes | Yes |
| High Availability: Data | Yes | Yes | Yes | Yes |
| High Availability: Service | No | No | Yes | Yes |
| Max Raw Storage (Petabytes) | >100 | N/A | >100 | 4 - 10^14 |
| On-Read Data Checking | No | Yes | No | No |
| Max Replicas | 512 | >2 | ~2.1 Billion | 2 |
| Federation | No | Yes | No | Yes |
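
As a rough illustration of how the findings narrow the field, the sketch below transcribes a few rows of the table as booleans and filters the four solutions against a shortlist of must-have features. The shortlist in `REQUIRED` and the key names are assumptions chosen for this example, not a requirements list stated in the slides.

```python
# A few rows of the findings table, transcribed as booleans.
FINDINGS = {
    "HDFS":  {"posix": False, "rbd": False, "restful": True,
              "ha_service": False, "federation": False},
    "iRODS": {"posix": True,  "rbd": False, "restful": False,
              "ha_service": False, "federation": True},
    "Ceph":  {"posix": True,  "rbd": True,  "restful": True,
              "ha_service": True,  "federation": False},
    "GPFS":  {"posix": True,  "rbd": False, "restful": False,
              "ha_service": True,  "federation": True},
}

# Hypothetical must-have features, picked for illustration only.
REQUIRED = ("posix", "rbd", "restful", "ha_service")


def candidates(findings, required):
    """Return the solutions that offer every required feature."""
    return [name for name, features in findings.items()
            if all(features[r] for r in required)]


print(candidates(FINDINGS, REQUIRED))  # ['Ceph']
```

With this particular shortlist only Ceph remains. Requiring federation instead would select iRODS and GPFS, which is why the weighting of requirements matters as much as the table itself.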
![Page 12: Peter Tiernan - Ceph at the Digital Repository of Ireland](https://reader034.vdocuments.site/reader034/viewer/2022051610/5496edd6ac7959042e8b5203/html5/thumbnails/12.jpg)
DRI Infrastructure
![Page 13: Peter Tiernan - Ceph at the Digital Repository of Ireland](https://reader034.vdocuments.site/reader034/viewer/2022051610/5496edd6ac7959042e8b5203/html5/thumbnails/13.jpg)
Performance
Poor performance with low number of OSDs (6) and replication.
![Page 14: Peter Tiernan - Ceph at the Digital Repository of Ireland](https://reader034.vdocuments.site/reader034/viewer/2022051610/5496edd6ac7959042e8b5203/html5/thumbnails/14.jpg)
Performance
Adding OSDs (26) improves replicated performance
Source: Diana Gudu, KIT
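
One back-of-envelope way to see why more OSDs help replicated writes: every client write is stored `replicas` times, and those copies are spread over the available OSDs. The sketch below assumes perfectly even placement and a replication factor of 3. Both are assumptions for illustration, since the slides do not state the replication factor used in the tests, and the model ignores network, journal, and CPU effects.

```python
def per_osd_write_share(num_osds, replicas):
    """Fraction of the client write volume that lands on each OSD,
    assuming replicas are spread evenly across all OSDs."""
    return replicas / num_osds


for osds in (6, 26):
    # replicas=3 is an assumed value, not one taken from the slides.
    share = per_osd_write_share(osds, replicas=3)
    print(f"{osds} OSDs: each handles {share:.1%} of client write volume")
```

Under these assumptions each of 6 OSDs absorbs half of the client write volume, while each of 26 OSDs absorbs roughly a ninth of it.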
![Page 15: Peter Tiernan - Ceph at the Digital Repository of Ireland](https://reader034.vdocuments.site/reader034/viewer/2022051610/5496edd6ac7959042e8b5203/html5/thumbnails/15.jpg)
Features we want from Ceph:
- Asynchronous Replication
- Erasure Coding
- Tiering
- Multi-datacenter/RADOS-level async replication
- Micro-services
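
To show why erasure coding is attractive alongside replication, the sketch below compares the raw capacity each scheme needs for the same usable capacity. The 100 TB figure, the replica count of 3, and the `k=4, m=2` profile are hypothetical values chosen for the example, not DRI settings.

```python
def replication_raw(usable_tb, replicas):
    """Raw capacity needed when every object is stored as full copies."""
    return usable_tb * replicas


def erasure_raw(usable_tb, k, m):
    """Raw capacity needed when each object is split into k data chunks
    plus m coding chunks."""
    return usable_tb * (k + m) / k


usable = 100  # TB, hypothetical
print(replication_raw(usable, 3))  # 300
print(erasure_raw(usable, 4, 2))   # 150.0
```

Both example configurations survive the loss of two copies or chunks, but the erasure-coded layout needs half the raw capacity, at the cost of extra computation on writes and recovery.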
![Page 16: Peter Tiernan - Ceph at the Digital Repository of Ireland](https://reader034.vdocuments.site/reader034/viewer/2022051610/5496edd6ac7959042e8b5203/html5/thumbnails/16.jpg)
Other Ceph Projects:
- TCHPC: 100TB cluster used for backups
- Collaboration with KIT/PSNC: performance testing, WAN-scale replication testing (sync/async)
![Page 17: Peter Tiernan - Ceph at the Digital Repository of Ireland](https://reader034.vdocuments.site/reader034/viewer/2022051610/5496edd6ac7959042e8b5203/html5/thumbnails/17.jpg)
DRI: www.dri.ie
Trinity HPC: www.tchpc.tcd.ie
Trinity College Dublin: www.tcd.ie
Questions?
![Page 18: Peter Tiernan - Ceph at the Digital Repository of Ireland](https://reader034.vdocuments.site/reader034/viewer/2022051610/5496edd6ac7959042e8b5203/html5/thumbnails/18.jpg)
Links:
Ceph: www.ceph.com
HDFS: hadoop.apache.org
iRODS: www.irods.org
GPFS: www.ibm.com/systems/software/gpfs/
Project Hydra: projecthydra.org
Fedora Commons: www.fedora-commons.org
Apache Solr: lucene.apache.org/solr/
HAProxy: haproxy.1wt.eu