
Page 1

Storage on the Lunatic Fringe

NNSA Advanced Simulation and Computing Program (ASC)

Panel at SC2003, November 19
Bill Boas
Lawrence Livermore National Laboratory
UCRL-PRES-201057

This work was performed under the auspices of the U.S. Department of Energy by the University of California, Lawrence Livermore National Laboratory under Contract No. W-7405-Eng-48.

Page 2

Agenda

• ASC Program Role
• ASC Storage Roadmap
• File System Requirements
• Programming Model
• Cluster Vision
• Q1 CY'04 OCF Cluster Deployment
• Issues on the Fringe

Page 3

Role of the Advanced Simulation and Computing Program (ASC)

• ASC Mission: Provide the computational means to assess and certify the safety, performance, and reliability of the nuclear stockpile and its components

• ASC Goals: Deliver predictive codes based on multi-scale modeling, code verification and validation, small-scale experimental data, test data, engineering analysis, and expert judgment

• Started in 1996: approximately 1/8 of the SSP budget
• PathForward and Alliances: support hardware and software development with research, academia, and industry
• Scalable Global Secure File System (SGSFS): awarded to HP, Intel, and CFS in 2002 to develop a high-performance file system to meet ASC goals

Page 4

ASC Program is more than platforms and physics codes

[Diagram: ASC program elements: Advanced Applications; Materials and Physics Modeling; Integration; Simulation Support; Physical Infrastructure and Platforms; Computational Systems; University Partnerships in Advanced Architectures; Verification and Validation; Problem Solving Environment (VIEWS); PathForward; DISCOM.]

Page 5

ASC Data Storage and I/O Roadmap

Each area progresses across CY 02 through CY 07:

ASC performance targets:
• 30 TF; 1 PB archive; 7-20 GB/s parallel file system; 1 GB/s to archive tape
• 70-100 TF; 7 PB archive; 100 GB/s parallel file system; 10 GB/s to archive tape
• 200 TF; 25 PB archive; 200 GB/s parallel file system; 20 GB/s to archive tape

SGSFS:
• Lustre Lite on Linux → Lustre Lite limited production → Lustre with OST striping → Lustre early production → Lustre stable production

SIO libraries:
• Limited application use → use by key applications → broad application use → performance tuned for Lustre

Archive:
• HPSS 4.1 production → HPSS 4.5 production → HPSS 5.1 metadata fixes → HPSS 6.1 replaces DCE → TBD

DFS:
• DFS in production → pilot NFSv4 on Linux → deploy NFSv4 → integrate NFSv4 with Lustre

COTS:
• 180 GB/disk, 30 MB/s single disk; 300 GB tape capacity, 70 MB/s max tape rate
• 600 GB/disk, 80 MB/s single disk; 600 GB tape capacity, 120 MB/s tape rate
• 1,200 GB/disk, 200 MB/s single disk; 2 TB tape capacity, 200 MB/s tape rate
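The "Lustre with OST striping" milestone meets the SIO library work where applications see it: through MPI-IO, where striping is requested via hints. Below is a minimal sketch, assuming an MPI-2 implementation whose I/O layer (e.g. ROMIO) honors the standard striping_factor and striping_unit hint names and maps them onto OST striping; the path and values are illustrative, not from the slides.

/* Sketch: request striping via standard MPI-IO hints, then do a
 * collective write. striping_factor and striping_unit are reserved
 * MPI-2 hint names; whether they reach Lustre's OST striping depends
 * on the MPI-IO driver. Path and sizes are illustrative. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_File fh;
    MPI_Info info;
    int rank;
    double buf[1024] = {0};

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Info_create(&info);
    MPI_Info_set(info, "striping_factor", "64");    /* stripe across 64 OSTs */
    MPI_Info_set(info, "striping_unit", "1048576"); /* 1 MB stripe width */

    MPI_File_open(MPI_COMM_WORLD, "/p/lustre/dump.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);

    /* each task writes one contiguous block at a rank-based offset */
    MPI_File_write_at_all(fh, (MPI_Offset)rank * sizeof(buf),
                          buf, 1024, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}

The collective write lets the I/O layer aggregate many small per-task requests into the large sequential transfers that striped OSTs serve best.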

Page 6

ASCI Scale File System Requirements

• Global Access
• Multi-Gigabyte-per-Second Performance
• Scalable Infrastructure for Clusters and Archive throughout a Site or Facility
• Integrated Infrastructure for WAN Access
• Scalable Management and Operational Facilities
• Security

Page 7

An Identical Programming Model Across Scalable Platforms (at LLNL)

[Diagram: OpenMP-threaded MPI tasks joined by MPI communication, each with local storage and access to global shared storage, first through shared serial (NFS) I/O and then through scalable I/O.]

Idea: Provide a consistent programming model for multiple platform generations and across multiple vendors!

Idea: Incrementally increase functionality over time!
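A minimal sketch of that model, with MPI carrying the distributed-memory level and OpenMP the shared-memory level; the same source builds unchanged across platform generations. The array size and final print are illustrative.

/* Sketch of the two-level model: OpenMP threads inside each MPI task.
 * Production hybrid codes would use MPI_Init_thread to declare the
 * threading level; here MPI is only called outside parallel regions. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define N (1 << 20)

int main(int argc, char **argv)
{
    static double a[N];
    double local = 0.0, global = 0.0;
    int rank, ntasks, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &ntasks);

    /* shared-memory level: OpenMP threads within the node */
    #pragma omp parallel for reduction(+:local)
    for (i = 0; i < N; i++) {
        a[i] = (double)(rank + i);
        local += a[i];
    }

    /* distributed-memory level: MPI across nodes */
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("%d tasks x %d threads, sum = %g\n",
               ntasks, omp_get_max_threads(), global);

    MPI_Finalize();
    return 0;
}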

Page 8

Architectural Vision for a Linux Cluster

[Diagram: example Linux cluster. 1,498 8-way compute nodes, 4 login nodes with 8x10Gb Ethernet, and 2 service nodes on a QsNet Elan3 interconnect with 100BaseT control and management networks. A dual 1,632-port (51x32D32U + 32x64D0U) IBA 12x fabric and a 1 and 10 GbE federated switch connect 128 gateway nodes at 1 GB/s delivered Lustre I/O each, plus 2 metadata (fail-over) servers (MDS), to the OSTs through SATA switches; 84 RAID subsystems deliver 2 GB/s raw each, for a Lustre total of 106 GB/s.]

Storage parameters:
• 128 OSTs with SATA-attached RAID arrays
• 20 B:F = 2.0 PB of global disk
• Lustre file system with 100 GB/s delivered parallel I/O performance
• Could scale up to 2,048 nodes, or over 130 teraFLOP/s

Compute parameters:
• >100 TF/s and 50 TB of memory
• Dual IBA 12x interconnect
• B:F = 12:64 = 0.1875
• ~5 µs MPI latency and 10 GB/s MPI bandwidth
• 1,466 x 8 = 11,728 MPI tasks (limited by the number of compute nodes)
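Worked out, the sizing arithmetic the slide quotes, taking 20 bytes of global disk per delivered FLOP/s and one MPI task per CPU on the 8-way nodes:

\[
  20\ \tfrac{\mathrm{B}}{\mathrm{F/s}} \times 100\times 10^{12}\ \mathrm{F/s}
  = 2.0\times 10^{15}\ \mathrm{B} = 2.0\ \mathrm{PB}
\]
\[
  1{,}466\ \text{nodes} \times 8\ \text{CPUs/node} = 11{,}728\ \text{MPI tasks}
\]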

Page 9

Multi-Cluster Global Scalable Storage

[Diagram: the OCF SGS File System Cluster (OFC) in B439 serves Lustre to four clusters over a federated Ethernet of multi-mode fiber, single-mode fiber, and copper 1 GigE links: MCR (1,152-port QsNet Elan3; 1,116 dual-P4 compute nodes; 2 login nodes with 4 Gb Ethernet; 32 gateway nodes at 190 MB/s delivered Lustre I/O over 2x1GbE), ALC (960-port QsNet Elan3; 924 dual-P4 compute nodes; 2 login nodes; 32 gateway nodes at 190 MB/s over 2x1GbE), Thunder (1,024-port QsNet Elan4; 1,008 4-way Itanium2 compute nodes; 2 login nodes; 16 Itanium2 gateway nodes), and PVC (128-port Elan3 in B451; 52 dual-P4 render nodes; 6 dual-P4 display nodes). Fail-over MDS pairs and an OCF metadata cluster in B439 handle metadata; 2 x 128 dual-P4 OST heads on 2 Gb FC front FC RAID of 146, 73, and 36 GB drives for 400-600 terabytes. NAS systems, the HPSS archive (24 PFTP), and users in B113, B116, etc. attach through switches to the LLNL external backbone.]
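The per-cluster file system bandwidth implied by the figure's gateway counts, arithmetic only, from the numbers shown:

\[
  32\ \text{gateways} \times 190\ \mathrm{MB/s} \approx 6.1\ \mathrm{GB/s}\ \text{per cluster}
\]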

Page 10

Issues on the Fringe in Parallel/Cluster File Systems

• Multi-cluster interconnection
• Metadata services scaling
• Performance at scale
• Scaling multiple clusters
• Recovery at scale
• Availability at scale
• Geographic dispersion
• Security across multiple clusters and geographic sites
• Cost ($$) of storage hardware at scale
• "Non-exotic" direct interconnect use

Page 11

Storage on the Lunatic Fringe

DISCLAIMER

This document was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor the University of California nor any of their employees makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or the University of California. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or the University of California, and shall not be used for advertising or product endorsement purposes.

This work was performed under the auspices of the U.S. Department of Energy by the University of California, Lawrence Livermore National Laboratory under Contract No. W-7405-Eng-48.