Storage for HPC, HPDA and Machine Learning (ML)
Frank Kraemer, IBM Systems Architect
mailto:[email protected]
2017-07-17


IBM Data Management for Autonomous Driving (AD)

▪ Significantly increase development efficiency by reducing manual effort for video tagging, eliminating time wasted on data search and manual data copy/move processes, and automating workflows
▪ Significantly increase test throughput, allowing more test cases to run in less time and improving both time-to-market and the quality of your camera and ADAS products
▪ Reduce IT costs for local storage hardware by centralizing data globally
▪ Increase overall flexibility through the ability to move workloads from one site to another
▪ Guarantee long-term verifiability and recoverability of test data via archiving

DESY: High Performance Computing with Data

Introducing IBM Spectrum Scale

▪ Remove data-related bottlenecks: demonstrated 400 GB/s throughput, building to 2.5 TB/s; local caching for read and write
▪ Enable global collaboration: a data lake serving HDFS, files and objects across sites; multi-cluster configurations, sync and async
▪ Optimize cost and performance: up to 90% cost savings and 6x flash acceleration; transparent tiering to cloud
▪ Ensure data availability, integrity and security: end-to-end checksums, Spectrum Scale RAID, NIST/FIPS certification; compression, encryption, audit logging

Highly scalable, high-performance unified storage for files and objects with integrated analytics.

The History of Spectrum Scale

* Gartner, Magic Quadrant for Distributed File Systems and Object Storage, 20 October 2016, Document No. G00307798

Infrastructure requirement | IBM answer
---------------------------|-----------------------------------------------
Scalability                | Parallel file system
Flexibility                | Software-defined storage
Agility                    | Unified storage
New-gen workloads          | HDFS connector
Performance                | Parallel file system
Cloud                      | OpenStack integration & Transparent Cloud Tiering
Object                     | Unified storage
Global                     | AFM for multi-cluster storage

IBM Storage for unstructured data

Spectrum Scale: The flexible cognitive Storage Solution

[Architecture diagram: users and applications (client workstations, HPC & AI compute farm, big data analytics, containers via Docker and Kubernetes) access a global namespace through SMB/CIFS, NFS, POSIX and HDFS interfaces. IBM Spectrum Scale automates data placement and migration across flash, disk, tape and storage-rich servers, on or off premises, with OpenStack integration (Cinder, Swift, Glance, Manila), Transparent Cloud Tiering and Cloud Data Sharing across sites A, B and C.]

IBM Spectrum Scale performance features: performance leadership for large and small files

▪ Highly Available Write Cache (HAWC): improves performance of small synchronous writes; small sync writes are written to the log first and rewritten to home as the log fills
▪ Local Read Only Cache (LROC): extends the page pool memory with local DAS/SSD for read caching
▪ Compression: compress what makes sense; extends into the cache
▪ Quality of Service: throttle background functions such as rebuild or async replication; set by flexible policy, such as day-of-week and time-of-day
▪ Distributed and flash-accelerated metadata: metadata includes directories, inodes and indirect blocks
▪ Heat-based tiering: lift data to the highest tiers based on the file's "heat"; automate workload pipelines with Spectrum Computing LSF
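The HAWC idea above (acknowledge small synchronous writes from a fast log, rewrite to home when the log fills) can be sketched in a few lines. This is a simplified conceptual model, not the GPFS implementation; the class, its names and the 8-byte log capacity are all made up for illustration:

```python
# Conceptual sketch of HAWC-style buffering (hypothetical, simplified):
# small synchronous writes land in a fast log first and are rewritten
# to their home location when the log fills.
class WriteCacheLog:
    def __init__(self, capacity_bytes, home):
        self.capacity = capacity_bytes
        self.log = []          # (offset, data) records on fast storage
        self.used = 0
        self.home = home       # backing store: offset -> bytes

    def sync_write(self, offset, data):
        # Acknowledge after the fast log write, not the slow home write.
        self.log.append((offset, data))
        self.used += len(data)
        if self.used >= self.capacity:
            self.flush()

    def flush(self):
        # Rewrite logged records to home, then reclaim the log space.
        for offset, data in self.log:
            self.home[offset] = data
        self.log.clear()
        self.used = 0

home = {}
log = WriteCacheLog(capacity_bytes=8, home=home)
log.sync_write(0, b"abcd")   # buffered in the log
log.sync_write(4, b"efgh")   # log full -> rewritten to home
```

The latency win is that the caller only ever waits for the log write; the rewrite to home happens in bulk.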

Active File Management (AFM): tie together multiple clusters to serve users across the globe

▪ Spans geographic distance and unreliable networks; caches local 'copies' of data distributed to one or more Spectrum Scale clusters
▪ Low-latency 'local' read and write performance
▪ As data is written or modified at one location, all other locations see that same data
▪ Efficient data transfers over the wide area network (WAN)
▪ Speeds data access for collaborators and resources around the world; unifies heterogeneous remote storage
▪ Asynchronous DR is a special case of AFM: bidirectional awareness for fail-over and fail-back with data integrity, and recovery point objectives for volume and application consistency
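The caching semantics described above (serve reads from a local copy, absorb writes locally, replay them at the home cluster asynchronously) can be modeled minimally. A hedged sketch with invented names; real AFM gateways, filesets and caching modes are far richer than this:

```python
# Toy model of AFM-style caching (hypothetical, not the real mechanism):
# reads are fetched once and then served locally; writes are applied
# locally and queued for asynchronous replay at the "home" cluster.
import collections

class CacheSite:
    def __init__(self, home):
        self.home = home                  # remote home fileset: name -> data
        self.cache = {}                   # local copies
        self.queue = collections.deque()  # writes pending replay at home

    def read(self, name):
        if name not in self.cache:        # cache miss: one fetch over the WAN
            self.cache[name] = self.home[name]
        return self.cache[name]           # subsequent reads are local

    def write(self, name, data):
        self.cache[name] = data           # low-latency local write
        self.queue.append((name, data))   # replayed at home asynchronously

    def replay(self):                     # run by a background gateway
        while self.queue:
            name, data = self.queue.popleft()
            self.home[name] = data

home = {"a.dat": b"v1"}
site = CacheSite(home)
site.write("a.dat", b"v2")     # local write returns immediately
site.replay()                  # home converges after async replay
```

The design point this illustrates is that WAN latency is paid in the background, not on the application's write path.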

HPC performance with simplified user access: transparent tiering and data migration

Analyze and archive in place:

▪ Enterprise HPC with flash for performance
▪ Network Shared Disk for modular scaling
▪ Tier data based on policy, user actions or workflow
▪ Lower economics with tape, object or cloud
▪ Data always available to end users
▪ Auto-migrate to higher tiers: full data, not stubs
▪ Global namespace extends across physical storage and multiple sites
▪ General high-capacity tiered NAS with fast data ingest/retention/sharing and long-term retention
▪ Deployed today at multiple clients

[Diagram: a system pool (flash), gold pool (disk) and tape library within one cluster, exported via NFS/SMB, with a second site.]

Unified File & Object + HDFS: store everywhere, run anywhere

Challenge: object storage for data and cloud
▪ Seamless scaling
▪ RESTful data access
▪ Object metadata replaces hierarchy

IBM Spectrum Scale Swift & S3
▪ High performance for object
▪ Native OpenStack Swift support with S3
▪ File or object in; object or file out
▪ Enterprise data protection and features

Full OpenStack cloud support: Cinder (block), Manila (file), Swift (object)

Analytics without complexity: store everywhere, run anywhere

Challenge: separate storage systems for ingest, analysis and results
▪ HDFS requires locality-aware storage (NameNode)
▪ Data transfer slows time to results
▪ Different frameworks and analytics tools use data differently

HDFS Transparency
▪ Map/Reduce on shared or shared-nothing storage
▪ No waiting for data transfer between storage systems
▪ Immediately share results
▪ Single 'data lake' for all applications
▪ Enterprise data management
▪ Archive and analysis in place

[Diagram: ingest of object and file data, direct POSIX access, and raw data analysis against the same storage.]

Comparing HDFS vs. IBM Spectrum Scale: DASS initial serial performance

▪ Compute the average temperature for every grid point (x, y and z), varying the total number of years
▪ Data: MERRA monthly means (reanalysis)
▪ Comparison of serial C code to MapReduce code
▪ Comparison of traditional HDFS (Hadoop), where data is sequenced (modified), with GPFS, where data is native NetCDF (unmodified copy)
▪ Using unmodified data in GPFS with MapReduce is the fastest
▪ Only GPFS results are shown for comparison against HDFS

Preliminary results, presented at the SC16 user group: 20 nodes (compute and storage), Cloudera Map/Reduce.

http://files.gpfsug.org/presentations/2016/SC16/06_-_Carrie_Spear_-_Spectrum_Sclale_and_HDFS.pdf
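The benchmark's kernel is simple to state: reduce the time axis at every grid point. A minimal stand-in for the serial computation, using synthetic data and made-up grid dimensions in place of the actual MERRA NetCDF inputs:

```python
# Illustrative stand-in (synthetic values, invented dimensions; not the
# MERRA data) for the benchmark kernel: the average temperature at every
# (x, y, z) grid point across a number of years.
import random

def average_temperature(fields):
    """fields: one flat temperature grid per year; returns the
    point-wise mean across all years."""
    years = len(fields)
    points = len(fields[0])
    return [sum(f[i] for f in fields) / years for i in range(points)]

random.seed(0)
nx, ny, nz, years = 4, 4, 2, 10
points = nx * ny * nz
fields = [[random.uniform(230.0, 310.0) for _ in range(points)]
          for _ in range(years)]

avg = average_temperature(fields)
print(len(avg))  # one mean per grid point -> 32
```

The MapReduce version of the study distributes exactly this reduction; the slide's point is that GPFS lets it run on the unmodified NetCDF files, skipping the sequencing step HDFS needs.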


IBM Spectrum Scale Transparent Cloud Tiering: a single namespace and control of data placement for hybrid cloud

▪ Intelligent data placement: on- or off-premises objects
▪ Policy-driven tiering: managed data placement or migration of cold data
▪ Automated data movement: recall on user demand
▪ IBM Spectrum Scale: high performance, single namespace, unified file, object and HDFS, encrypted, secure data in cloud
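The "migration of cold data" policy above boils down to a selection rule over file access times. A minimal sketch of that rule; the thresholds and file list are hypothetical, and in practice the selection is done by the Spectrum Scale policy engine, not application code:

```python
# Hypothetical sketch of cold-data selection for cloud tiering: files
# whose last access is older than a cutoff become migration candidates.
# (In Spectrum Scale this is expressed as a policy rule, not Python.)
def cold_candidates(files, max_idle_days, now):
    """files: (name, last_access_epoch) pairs; returns names to migrate."""
    cutoff = now - max_idle_days * 86400
    return [name for name, atime in files if atime < cutoff]

now = 1_000_000_000
files = [("hot.dat",  now - 1 * 86400),    # accessed yesterday: stays
         ("cold.dat", now - 120 * 86400)]  # idle 120 days: migrate
print(cold_candidates(files, max_idle_days=90, now=now))  # -> ['cold.dat']
```

Recall works in the opposite direction: a user access to a migrated file triggers transparent movement back to the local tier.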


IBM Spectrum Scale Cloud Data Sharing: policy-driven data movement for hybrid cloud

▪ Managed data sharing: policy-driven replication and synchronization; granular control by type, action, metadata or heat
▪ Bridging cloud and file: storage-to-storage; data and metadata
▪ Automated data movement: secure, reliable connection; high speed and scalable; clustered configurations
▪ On the IBM Spectrum Scale side: high-performance file, object and HDFS; clustered, tiered and scalable; bridges legacy applications and new workloads
▪ On the cloud storage side: cloud-native applications; DevOps development; new workloads

IBM Spectrum Scale: Features and Benefits

▪ Simplified, self-tuning options; new GUI and health monitoring
▪ Unified file, object and HDFS
▪ Distributed metadata and high-speed scanning
▪ QoS management
▪ 1 billion files and yottabytes of data
▪ Multi-cluster and system-management integration with IBM Spectrum Control
▪ Advanced routing with latency awareness
▪ Read or write caching
▪ Active File Management for WAN deployments
▪ File Placement Optimization
▪ End-to-end data integrity; snapshots; sync or async DR
▪ zLinux support
▪ Seamless tiering; incorporate and share flash
▪ Policy-driven compression
▪ Data protection with erasure coding and replication
▪ Native encryption and secure-erase compliance
▪ Target object store and cloud
▪ Leading performance for backup and archive
▪ Heterogeneous commodity storage: flash, disk and tape
▪ Delivered as software, appliance or cloud
▪ Data-driven migration to practically any target
▪ File/object in/out with OpenStack Swift & S3
▪ Transparent native HDFS
▪ Integration with cloud
▪ Storage management at scale

Store everywhere. Run anywhere. Improve data economics. Software-defined open platform.


ESS New Generation: Performance and Capacity

Announced on April 11, 2017; new all-flash options in Q3. Each model is built from ESS 5U84 storage enclosures running Spectrum Scale.

Model       | Enclosures | Rack space | Drives             | Max raw capacity | Throughput
------------|------------|------------|--------------------|------------------|-----------
GL2         | 2          | 12U        | 116 NL-SAS + 2 SSD | 0.9 PB           | 8 GB/s
GL2S (new)  | 2          | 14U        | 166 NL-SAS + 2 SSD | 1.6 PB           | 12 GB/s
GL4         | 4          | 20U        | 232 NL-SAS + 2 SSD | 1.8 PB           | 17 GB/s
GL4S (new)  | 4          | 24U        | 334 NL-SAS + 2 SSD | 3.3 PB           | 24 GB/s
GL6         | 6          | 28U        | 348 NL-SAS + 2 SSD | 2.8 PB           | 25 GB/s
GL6S (new)  | 6          | 34U        | 502 NL-SAS + 2 SSD | 5 PB             | 36 GB/s


ESS All-Flash: All-Flash Performance (new)

Model | SSD drives | Raw capacity | Throughput
------|------------|--------------|-----------
GS1S  | 24         | 368 TB       | 14 GB/s
GS2S  | 48         | 736 TB       | 26 GB/s
GS4S  | 96         | 1472 TB      | 40 GB/s

Capacity calculations are based on 15.36 TB SSDs. Performance numbers are based on 4 MB block size and 100% reads without any cache hits; writes are typically 20% lower than reads. All numbers assume 100 Gb EDR (4 ports connected per ESS node).
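The quoted raw capacities follow directly from the stated 15.36 TB drive size, which is worth a quick arithmetic check (the slide's figures are rounded slightly below drive-count times drive-size):

```python
# Sanity check of the all-flash capacity figures against the slide's
# stated assumption of 15.36 TB SSDs: quoted raw TB should match
# drives x 15.36 to within rounding.
models = {"GS1S": (24, 368), "GS2S": (48, 736), "GS4S": (96, 1472)}
for name, (drives, quoted_tb) in models.items():
    computed = drives * 15.36
    assert abs(computed - quoted_tb) / quoted_tb < 0.01  # within 1%
    print(f"{name}: {drives} SSDs -> {computed:.0f} TB (quoted {quoted_tb} TB)")
```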

IBM Software-Defined Storage Portfolio

A family of storage management and optimization software spanning flash, any storage and storage-rich servers, deployed on private, public or hybrid cloud: secure, efficient, high-performance hybrid cloud services.

▪ IBM Spectrum CDM: simplified copy data management that can increase business velocity and efficiency
▪ IBM Spectrum Scale: high-performance, highly scalable hybrid cloud storage for unstructured data
▪ IBM Spectrum Accelerate: highly flexible, scale-out enterprise block storage for hybrid clouds that deploys in minutes
▪ IBM Spectrum Archive: long-term retention for active archive data that lowers costs up to 90% by delivering a fast tape file system
▪ IBM Spectrum Virtualize: virtualization and optimization of hybrid cloud block environments that helps improve flexibility and stores up to 5x more data
▪ IBM Spectrum Protect: optimized hybrid cloud data protection that can simplify restores and reduce backup costs by up to 53%
▪ IBM Spectrum Control: hybrid cloud storage and data management that helps optimize applications and reduce costs by up to 73%
▪ IBM Cloud Object Storage: flexible, economically scalable hybrid cloud object storage with geo-dispersed enterprise availability and security