SGI's Lustre HSM - National Computational Infrastructure

  • 2016 SGI

    SGI's Lustre HSM

    Rob Mollard, Senior Storage Specialist (APAC)

    Some names and brands may be claimed as the property of others

  • 2016 SGI

    Why consider Lustre HSM?

    Galen Shipman (LANL), former OpenSFS Chairman, November 2013

  • 2016 SGI

    Data always lives longer than the hardware it's stored on.

    Forward migration to new technology should never adversely impact the users.

  • 2016 SGI

    Overview of Lustre HSM

    Features:
    - Migrate data to and from external storage (HSM)
    - Free disk space when needed
    - Restore data on cache miss
    - Policy management (migration, purge, soft rm, ...)
    - Import from existing backend
    - Disaster recovery (restore Lustre filesystem from backend)

  • 2016 SGI

    Supported Lustre HSM Actions

    Archive
    - Archiving a file means pre-copying it from Lustre to an external HSM.
    - A copy tool (copytool) reads the file content and copies it to the external HSM.
    - Once it has been copied, the file is ready to be released.

    Release
    - Removes all file data objects.
    - A synchronous action that involves neither the copytool nor the coordinator.

    Restore
    - All file accesses are blocked until the file is fully restored.
    - The copytool writes the file data back from the external HSM to Lustre.
    - File data accesses are unblocked when the restore is finished.
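
The three actions above can be read as a small state machine. The following is a minimal sketch in Python, not the Lustre API: the `HsmFile` class, the `State` names and the rule that a read of a released file triggers a restore are illustrative assumptions drawn from the slide text. On a real system these actions are normally requested with the `lfs hsm_archive`, `lfs hsm_release` and `lfs hsm_restore` commands.

```python
from enum import Enum, auto


class State(Enum):
    ONLINE = auto()     # data only in Lustre
    ARCHIVED = auto()   # data in Lustre plus a copy in the external HSM
    RELEASED = auto()   # data objects freed; only the HSM copy remains


class HsmFile:
    """Toy model of one file's HSM life cycle (hypothetical, not the Lustre API)."""

    def __init__(self, name):
        self.name = name
        self.state = State.ONLINE

    def archive(self):
        # The copytool copies the content out; the Lustre copy stays in place.
        if self.state is State.ONLINE:
            self.state = State.ARCHIVED

    def release(self):
        # Only a file that already has an HSM copy may drop its data objects.
        if self.state is not State.ARCHIVED:
            raise ValueError(f"{self.name}: must be archived before release")
        self.state = State.RELEASED

    def restore(self):
        # The copytool writes the data back; the HSM copy is kept.
        if self.state is State.RELEASED:
            self.state = State.ARCHIVED

    def read(self):
        # Access to a released file is a cache miss: restore first, then read.
        if self.state is State.RELEASED:
            self.restore()
        return f"data of {self.name}"
```

The model refuses to release a file that has never been archived, which mirrors the slide's ordering: a file is ready to be released only once it has been copied.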

  • 2016 SGI

    Components of Lustre HSM

    Coordinator

    Policy Engine (robinhood)

    Copytool (DMF copytool)

    HSM (DMF)

    [Diagram: Lustre clients; Lustre MDS/MDT with the Coordinator; Lustre OSS/OST building blocks; Lustre HSM Client Agent running the copytool; Policy Engine with its DB; DMF Managed Environment with Tier 2 (Disk Cache Manager), Tier 3 (Secondary Storage), ZWS and tape]

  • 2016 SGI

    Lustre HSM | Coordinator

    - Centralises reception of HSM requests and ignores duplicate ones.
    - Dispatches and balances requests across the available copytools.
    - Manages a log of all requests.
      - Requests are replayed if the MDS/MDT has crashed.
      - Requests can be manually cancelled.
    - Behaviour is tuneable.
      - Can be stopped, resumed, purged
      - Retry/no retry on error
      - Timeouts, number of simultaneous requests, etc.
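
The coordinator's duties listed above (ignore duplicates, balance across copytools, keep a replayable log, allow manual cancellation) can be sketched as follows. This is a toy model under stated assumptions, not the MDT implementation: the `Coordinator` class, the use of round-robin as the balancing strategy, and the in-memory dictionary standing in for the request log are all hypothetical.

```python
from itertools import cycle


class Coordinator:
    """Toy request coordinator: de-duplicates, logs and dispatches requests."""

    def __init__(self, copytools):
        # Assumption: plain round-robin stands in for "balancing".
        self._next_copytool = cycle(copytools)
        # Request log: (action, file id) -> copytool the request was sent to.
        self.log = {}

    def submit(self, action, fid):
        key = (action, fid)
        if key in self.log:
            return None                      # duplicate request is ignored
        self.log[key] = next(self._next_copytool)
        return self.log[key]

    def cancel(self, action, fid):
        # Manual cancellation removes the request from the log.
        return self.log.pop((action, fid), None) is not None

    def replay(self):
        # After a crash, every request still in the log would be re-dispatched.
        return sorted(self.log.items())
```

For example, with two copytools `"ct0"` and `"ct1"`, submitting the same archive request twice dispatches it once, and a later request for another file goes to the other copytool.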

    [Diagram: Coordinator on the Lustre MDS/MDT, with the Lustre HSM Client Agent (copytool), Policy Engine DB and the DMF Managed Environment (Tier 2 Disk Cache Manager, Tier 3 Secondary Storage, ZWS, tape)]

  • 2016 SGI

    Lustre HSM | Policy Engine

    Policy Engine manages pre-migration and purge policies.
    - A user-space tool that communicates with the MDS and the coordinator.
    - Watches Lustre file system changes.
    - Triggers HSM actions/requests such as pre-migration, purges and removal.

    [Diagram: Policy Engine with its DB, Lustre HSM Client Agent, Lustre MDS/MDT with the Coordinator, Lustre OSS/OST building blocks, Lustre clients]

  • 2016 SGI

    Policy Engine: Robinhood

    Robinhood is a user-space daemon for managing filesystems
    - Purge oldest files when needed
    - Custom policies

    With a database backend
    - Persistent, avoids scanning for each action
    - Currently MySQL or MariaDB are supported

    Supports features like:
    - User/group usage accounting, including file size profiling.
    - Extra-fast 'du' and 'find' clones.
    - Customizable alerts on filesystem entries.
    - Aware of Lustre OSTs and pools (striping).
    - Filesystem disaster recovery tools.

    Robinhood User Group 2016 - September 19th, 2016 - Paris, France

    [Diagram: Robinhood with its DB; inputs labelled MDT ChangeLogs, OST usage and Policies; outputs labelled archive requests (copy data to target), release requests (free disk space in Lustre) and restore requests (copy data from target)]

  • 2016 SGI

    Robinhood Policies

    Robinhood manages 3 types of policies
    - Migration policy
    - Purge policy
    - Removal policy

    Policies schedule actions on filesystem entries according to admin-defined criteria
    - File class definitions, associated with policies
    - Based on file attributes (path, size, owner, age, xattrs, ...)
    - Rules can be combined with boolean operators
    - Least Recently Used (LRU) based migration/purge policies
    - Entries can be white-listed
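
The policy ideas above (file classes built from attribute rules, boolean combination, a white-list, LRU ordering) can be illustrated with a short sketch. It is written in Python for illustration and is not Robinhood's configuration syntax or code: `Entry`, the rule builders and `select_for_purge` are hypothetical names, and the stopping rule (purge until a requested number of bytes is freed) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    path: str
    size: int           # bytes
    owner: str
    last_access: float  # seconds since the epoch


# Rule builders: each returns a predicate over an Entry.
def path_under(prefix):
    return lambda e: e.path.startswith(prefix)


def larger_than(nbytes):
    return lambda e: e.size > nbytes


def owned_by(user):
    return lambda e: e.owner == user


# Boolean combinators over rules.
def all_of(*rules):
    return lambda e: all(rule(e) for rule in rules)


def any_of(*rules):
    return lambda e: any(rule(e) for rule in rules)


def select_for_purge(entries, file_class, whitelist, bytes_needed):
    """Pick least recently used entries of a class until enough space is freed."""
    candidates = [e for e in entries if file_class(e) and not whitelist(e)]
    candidates.sort(key=lambda e: e.last_access)  # oldest access first (LRU)
    chosen, freed = [], 0
    for e in candidates:
        if freed >= bytes_needed:
            break
        chosen.append(e)
        freed += e.size
    return chosen
```

A file class such as `all_of(path_under("/lustre/scratch/"), larger_than(100))` with `owned_by("root")` as the white-list would select the oldest large scratch files first and never touch root's entries; the paths and thresholds here are made up for the example.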

  • 2016 SGI

    Use cases
    - HPC Active Backup
    - HPC Disaster Recovery
    - HPC Workflow Optimisation
    - Persistent HPC Storage
    - Lower HPC storage costs
    - Long Term HPC Archives
    - HPC Active Archive

  • 2016 SGI

    Lustre DMF

    Active Data Protection

  • 2016 SGI

    Data Assurance and Reliability: A 3-2-1 Approach

    3 copies, 2 media types, 1 copy offsite

    Advantage of 3 copies of all data:
    - Optimized use of storage HW
    - Elimination of backup

    Advantage of keeping data on two different media types:
    - Fast data access
    - Data retention
    - Archive resilience

    Advantage of keeping one copy offsite:
    - Lower power consumption
    - Base for compliance
    - Disaster recovery

    [Diagram: (1) Performance copy, (2) Secure copy, (3) Disaster Recovery copy; Primary Data Center (RAID, Flash, Disc, Tape, Object & ZWS); Offsite or Cloud Storage (Tape or Cloud Object)]

  • 2016 SGI

    Lustre DMF Overview

    [Diagram labels: Management Network; Data Network (LNET) (InfiniBand/Ethernet/Omni-Path); DMF Storage Network (InfiniBand/Fibre Channel/SAS); Management Server (MGS); Metadata Server (MDS); Object Storage Servers (OSS); Management Target (MGT); Metadata Target (MDT); Object Storage Targets (OSTs); Storage servers grouped into failover pairs; Lustre Clients; Policy Engine Server; DMF Agent (copytool); Storage Monitoring; Archive]

  • 2016 SGI

    HSM | Data Migration Facility (DMF): Hierarchical Storage Management

    Transparently migrate data to Tape, MAID or Cloud

    Data life cycle management
    - DMF manages the placement of data within multiple tiers of storage

    Automated data migration
    - From expensive, production disk to 2nd or 3rd tier storage

    Transparent to user
    - All data appears online all the time

    Key Benefits
    - DMF reduces tier 1 disk investment
    - DMF reduces power consumption
    - DMF protects data long term

    SGI DMF
    - 25 years in production
    - Active User Community (DMFUG)

    [Diagram labels: Lustre*, DMF, Cloud]

  • 2016 SGI

    Lustre DMF | Communication & Data Flow

    [Diagram: Lustre* Clients; Lustre* MDS/MDT Building Block with the Coordinator; Lustre* OSS/OST Building Blocks; Lustre* HSM Client Agent with the copytool; Policy Engine DB; DMF Managed HSM Environment with DMF MDS, Parallel DMF Data Mover Building Blocks, ZWS and tape; DMF Direct Archiving. Legend: Lustre* HSM Communications, DMF Communications, DMF Metadata update, DMF Data]

    Parallel Data Mover Option
    - Data migration from multiple parallel servers
    - Scales I/O performance
    - Add additional data movers as required

  • 2016 SGI

    DMF Direct Archiving | Data Flow

    [Diagram labels: Lustre* Clients; Lustre* MDS (MDS 1, MDS 2); Lustre* OSS (OSS 1, OSS 2, OSS 3, OSS 4); Storage; DMF Servers (DMF MDS 1, DMF MDS 2, DMF Data Mover); High Performance Disk Cache; Primary Storage; DMF Tier 2 (ZWS); DMF Tier 3 (tape). Legend: Metadata, Data, Logical Path, Physical Path]

  • 2016 SGI

    Key Differentiators

    - High performance data migrations: DMF Parallel Data Movers (pDMO)
    - Simplified Lustre HSM communication: single DMF copytool instance
    - Optimised data migrations: DMF Direct Archiving
    - MAID storage target: DMF Zero Watt Storage
    - Trusted data protection: over 25 years preserving data
    - Active user community: DMF User Group (Feb 2017), http://hpc.csiro.au/users/dmfug/

  • 2016 SGI

    SGI Standardised Lustre Implementation Service: Tasks & Deliverables

    - Project management
    - Pre-sales assessment of storage requirements
      - Provide guidance on Lustre best practices
      - Determine LUN layout
      - Assess Lustre networking needs
    - Connect/verify all HW
    - Install Lustre environment
      - Install & configure Lustre SW for servers
      - Tune Lustre file system environment
      - Install Lustre clients
      - Create failover configuration files, if required
    - Benchmark & test environment
    - Documentation of system and Lustre networking configuration
    - Knowledge transfer & onsite training

  • 2016 SGI

    Lustre DMF

    Customer Examples

  • 2016 SGI

    Lustre Version: 2.5.41.1

    HSM: DMF 6.5 (ISSP 3.5)

    In use: since early 2015

    Capacity: 3PB

    Performance: ~50GB/s

    How is it used: Protection of scientific data. The customer supports users all over France, with Lustre used for persistent storage. Data replication/duplication is not feasible with PBs.

  • 2016 SGI

    Lustre Version: 2.5

    HSM: DMF 6.5 (ISSP 3.5)

    In use: since April 2015

    Capacity: 1PB

    Performance: ~5GB/s

    How is it used: Active Archive (HSM)

    High Performance Automotive Manufacturing: Formula 1 Engine Manufacturer

  • 2016 SGI

    Lustre Version: 2.5 DDN EXAScaler (Intel EEL)

    HSM: DMF 6.5 (ISSP 3.5)

    In use: Testing for several months, just going into production now

    Capacity: 1.9PB

    Performance: ~20GB/s

    How is it used: Robinhood is running in HSM mode, although there is sufficient capacity for all data to remain online in Lustre (dual state)

  • 2016 SGI

    Q & A
