Academic Workflows with iRODS FINAL

ddn.com © 2016 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. Academic Workflow for Research Repositories Using iRODS and Object Storage 2016 iRODS User’s Group Meeting 9 June 2016 Randall Splinter, Ph.D. HPC Research Computing Solutions Architect [email protected] 770.633.2994

Upload: randy-splinter

Post on 13-Apr-2017


TRANSCRIPT

Page 1: Academic Workflows with iRODS FINAL

Academic Workflow for Research Repositories Using iRODS and Object Storage
2016 iRODS User’s Group Meeting

9 June 2016
Randall Splinter, Ph.D.
HPC Research Computing Solutions Architect
[email protected]
770.633.2994

Page 2: Academic Workflows with iRODS FINAL

Agenda

▶  Introduction to the Problem
   • HPC Workflows
   • The Problem of Long Term Archiving
▶  Object Store to the Rescue
▶  How iRODS Enables Object Storage
▶  Why iRODS with DDN WOS is a Superior Solution for Research Repositories
▶  A Case Study

Page 3: Academic Workflows with iRODS FINAL

Introduction to the Problem

▶  The Problem of Collaboration
   • NAS technologies (NFS, CIFS) are local
      o They don’t tend to scale well over WAN distances
         – But collaborators are frequently widely separated in distance
      o How to enable researchers to share data securely and without administrative overhead
         – Typically, only system administrators can control the ACLs on NFS/CIFS mounts
   • FTP
      o Security
   • Tape
      o Yech

Page 4: Academic Workflows with iRODS FINAL

HPC Workflows

▶  Ingest data from a source (Analysis of data)
   • Pre-analysis on low-end storage
   • Move cleaned-up data to a PFS (parallel file system) and compute
   • After full analysis, data is moved to long term storage
      o Can this be automated?
         – Yes, with iRODS.
▶  No ingest (Pure simulation)
   • Compute models are run on a compute cluster with PFS
   • After full analysis, data is moved to long term storage
      o Again, automation is key.
▶  Data sets are exploding!

Page 5: Academic Workflows with iRODS FINAL

The Problem of Long Term Archiving

▶  Data must be secure
   • From deletion (accidental or deliberate)
   • From loss through theft
      o Security hacks
      o Faculty or students leave and take IP with them
▶  Must satisfy regulatory restrictions
   • HIPAA, for instance
▶  Changing hardware standards
   • In particular tape standards
▶  Hardware availability
   • Spinning media is mechanical and will not last forever

Page 6: Academic Workflows with iRODS FINAL

Object Store to the Rescue

▶  All object stores provide a way to replicate data over large distances
   • Some more effectively than others
      o Provides a way to effectively share data over WAN scales
   • Most object stores were designed for cloud storage
      o Security has always been important
      o Ease of data sharing has been important
▶  This now enables more effective data sharing and data security than with traditional storage solutions at price points that traditional NAS systems cannot approach.

Page 7: Academic Workflows with iRODS FINAL

How iRODS Enables Object Storage

▶  iRODS is a very effective middleware layer for accessing multiple storage resources
▶  iRODS handles the security and database management of the ingested data
▶  Provides a powerful metadata search capability for ingested data
▶  Provides a rules engine for the processing of incoming data and the manipulation of data on the back-end
   • For instance,
      o Data can be moved to slower storage resources as it ages or another criterion is met (essentially HSM)
      o Data can be secured from removal, editing or modification based upon criteria using the “null” chmod
         – Retention policies!
      o Anything else?
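The two rule-engine examples on this slide (age-based tiering and retention) come down to a small decision function. The Python sketch below models only that decision logic; it is not iRODS rule language and calls no iRODS API. The resource names, the 90-day and 7-year thresholds, and the `DataObject` record are assumptions made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical resource names and thresholds, chosen only for illustration.
FAST_RESOURCE = "pfs_scratch"
ARCHIVE_RESOURCE = "object_archive"
TIERING_AGE = timedelta(days=90)            # idle time before data is tiered down
RETENTION_PERIOD = timedelta(days=365 * 7)  # window in which data is locked


@dataclass
class DataObject:
    path: str
    resource: str
    created: datetime
    last_accessed: datetime


def target_resource(obj: DataObject, now: datetime) -> str:
    """Return the resource the object should live on (the HSM-style decision)."""
    if now - obj.last_accessed >= TIERING_AGE:
        return ARCHIVE_RESOURCE
    return obj.resource


def is_locked(obj: DataObject, now: datetime) -> bool:
    """True while the object is inside its retention window."""
    return now - obj.created < RETENTION_PERIOD


def plan_actions(objects, now):
    """List (path, action) pairs a policy run would take; nothing is moved."""
    actions = []
    for obj in objects:
        dest = target_resource(obj, now)
        if dest != obj.resource:
            actions.append((obj.path, f"move to {dest}"))
        if is_locked(obj, now):
            actions.append((obj.path, "deny delete/modify"))
    return actions
```

In a real deployment the equivalent checks would live in rules run by the iRODS rules engine, with the lock enforced through permissions such as the “null” chmod mentioned above; the sketch only shows what such a rule has to decide.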

Page 8: Academic Workflows with iRODS FINAL

DDN WOS: Key Features

▶  Fully-Integrated Object Storage Appliance
   • WOS7000, 60 drives in 4U, with 1 or 2 object storage servers per appliance
▶  Federated, Global Namespace
   • Locally or across multiple geographies, with smart policies for performance and/or disaster recovery on a per-object basis
▶  Pure Object Storage
   • Formats drives with custom WOS disk file system: no Linux file I/Os, no fragmentation, fully contiguous object read and write operations for maximum disk efficiency
▶  Latency-Aware Access Manager
   • WOS intelligently decides which geography to read from, based upon location, access load and latency
▶  User Defined Metadata and Metadata Search
   • Applications can assign their own metadata via the object storage API; WOS now also supports parallel search of WOS user metadata
▶  Self-healing Architecture
   • No hard tie between physical disks and data. Failed drives are recovered through dispersed data placement; rebuilds happen at read, not write, speed, and rebuild only data.
▶  Flexible Data Protection
   • Select multiple policy-driven data protection schemes to meet application, workflow and disaster recovery requirements
▶  Exabyte Scalability
   • Create virtually limitless data repositories; scale non-disruptively and seamlessly to over an exabyte of capacity

Page 9: Academic Workflows with iRODS FINAL

Why iRODS with DDN WOS is a Superior Solution for Repositories

▶  Ease of Scalability with WOS
▶  Ease of administration – Once rules are tested and in place the system can be managed with a minimum of administrative overhead
▶  Automating workflows to guarantee consistency and reproducibility in the science that is produced

Page 10: Academic Workflows with iRODS FINAL

Why iRODS with DDN WOS is a Superior Solution for Repositories

▶  Ease of auditing for both usage and back charging and for maintaining adequate data security compliance

▶  DDN WOS makes remote replication simple and provides a straightforward way to manage DR systems

Page 11: Academic Workflows with iRODS FINAL

Why iRODS with WOS is a Superior Solution for Repositories

▶  Central to any repository is the ability to add metadata tags and search metadata.
   • iRODS has extensive abilities to do that – Significantly better than any competing options

    # imeta add -d filename "Date" "2 Feb 2016"
    # imeta ls -d filename
    AVUs defined for dataObj filename:
    attribute: Meta1
    value: hello
    units:
    ---
    attribute: Date
    value: 2 Feb 2016
    units:
    # imeta rm -d filename "Meta1" "hello"
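The `imeta` commands above attach attribute-value-unit (AVU) triples to a data object, list them, and remove one. As a rough illustration of that data model, the Python sketch below keeps AVUs in memory and adds a simple search. The `MetadataCatalog` class and its method names are hypothetical; this is a toy stand-in, not the iRODS catalog or any iRODS client API.

```python
from collections import namedtuple

# One metadata entry: the attribute-value-unit triple shown by "imeta ls".
AVU = namedtuple("AVU", ["attribute", "value", "units"])


class MetadataCatalog:
    """Toy in-memory stand-in for a metadata catalog; not the iRODS API."""

    def __init__(self):
        self._avus = {}  # object path -> list of AVU triples

    def add(self, path, attribute, value, units=""):
        """Attach an AVU to an object, ignoring exact duplicates."""
        avu = AVU(attribute, value, units)
        entries = self._avus.setdefault(path, [])
        if avu not in entries:
            entries.append(avu)

    def ls(self, path):
        """Return the AVUs attached to an object, in insertion order."""
        return list(self._avus.get(path, []))

    def rm(self, path, attribute, value, units=""):
        """Remove one AVU; raises ValueError if it is not present."""
        self._avus.get(path, []).remove(AVU(attribute, value, units))

    def search(self, attribute, value):
        """Return the sorted paths of objects carrying the given attribute/value."""
        return sorted(
            path
            for path, entries in self._avus.items()
            if any(a.attribute == attribute and a.value == value for a in entries)
        )
```

Mirroring the slide: after `add("filename", "Meta1", "hello")` and `add("filename", "Date", "2 Feb 2016")`, `ls("filename")` returns both triples, and `rm("filename", "Meta1", "hello")` leaves only the `Date` entry.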

Page 12: Academic Workflows with iRODS FINAL

A Case Study

[Diagram; surviving labels: Hrothgar Compute Cluster, Lustre Filesystem]

Page 13: Academic Workflows with iRODS FINAL

9351 Deering Avenue Chatsworth, CA 91311

1.800.837.2298 1.818.700.4000

company/datadirect-networks

@ddn_limitless

[email protected]

Thank You! Keep in touch with us

Questions?
