Page 1: User Environment Enhancements in the DoD HPC Modernization Program

Solving the hard problems . . .
Distribution A: Approved for public release, distribution unlimited. HPC User Forum, 7 Apr 2011

User Environment Enhancements in the DoD HPC Modernization Program
7 April 2011

Steve Scherr, DoD HPCMP

Page 2


● Background: HPCMP
● Storage Initiative
● Enhanced User Environment
● HPC EUE Infrastructure
● HPC Portal

MB Revised: 5/4/2009

Page 3

HPC Modernization Program

Vision: A pervasive culture existing among DoD’s scientists and engineers where they routinely use advanced computational environments to solve the most demanding problems, transforming the way DoD does business: finding better solutions faster.

Mission: Accelerate development and transition of advanced defense technologies into superior warfighting capabilities by exploiting and strengthening US leadership in supercomputing, communications and computational modeling.

MB Revised: 12/11/2009

Page 4

HPCMP Serves a Large, Diverse DoD User Community

● FY11 statistics
  – 501 active projects with 4,408 users at 250 sites
  – 5,098 Habus* batch requirements

● FY10 statistics (as of 9/30/2010)
  – 496 projects with 4,345 users
  – 2,866 Habus* non-real-time requirements

* Requirements and usage measured in Habus

92 users self-characterized as “Other”
New CTA: Space and Astrophysical Science (SAS)

Computational Structural Mechanics – 465 Users

Electronics, Networking, and Systems/C4I – 211 Users

Computational Chemistry, Biology & Materials Science – 690 Users

Computational Electromagnetics & Acoustics – 323 Users

Computational Fluid Dynamics – 1,223 Users

Environmental Quality Modeling & Simulation – 163 Users

Signal/Image Processing – 586 Users

Integrated Modeling & Test Environments – 105 Users

Climate/Weather/Ocean Modeling & Simulation – 315 Users

Forces Modeling & Simulation – 235 Users

Source: Portal to the Information Environment – July 2010

MB Revised: 1/26/2011

Customer Focus
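The per-CTA counts above can be cross-checked against the headline user figure. This minimal Python sketch uses only the numbers on this slide: it sums the ten CTA counts plus the 92 users who chose “Other”, and the total matches the 4,408 users quoted in the FY11 statistics, even though the breakdown is attributed to July 2010. The names `CTA_USERS`, `OTHER_USERS`, and `total_users` are illustrative only.

```python
# Users per Computational Technology Area (CTA), as listed on the slide.
CTA_USERS: dict[str, int] = {
    "Computational Structural Mechanics": 465,
    "Electronics, Networking, and Systems/C4I": 211,
    "Computational Chemistry, Biology & Materials Science": 690,
    "Computational Electromagnetics & Acoustics": 323,
    "Computational Fluid Dynamics": 1223,
    "Environmental Quality Modeling & Simulation": 163,
    "Signal/Image Processing": 586,
    "Integrated Modeling & Test Environments": 105,
    "Climate/Weather/Ocean Modeling & Simulation": 315,
    "Forces Modeling & Simulation": 235,
}
OTHER_USERS = 92  # users who self-characterized as "Other"


def total_users(cta_users: dict[str, int], other: int) -> int:
    """Sum users across all CTAs plus the 'Other' category."""
    return sum(cta_users.values()) + other


if __name__ == "__main__":
    total = total_users(CTA_USERS, OTHER_USERS)
    print(total)  # 4408
    largest = max(CTA_USERS, key=CTA_USERS.get)
    print(f"{largest}: {CTA_USERS[largest] / total:.1%}")  # Computational Fluid Dynamics: 27.7%
```

Computational Fluid Dynamics is the largest single community, at a little over a quarter of all users.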

Page 5

● DSRC systems support classified, unclassified and open computing capabilities

● 17 large HPC systems
  – 1 system: 44,000+ cores
  – 6 systems: 10,000 to 22,000+ cores
  – 10 systems: 2,000 to 9,000+ cores
  – 1.873 peak PetaFlops
  – 4,750 Habus

● Three new FY10 HPC systems
  – 773 TeraFlops
  – 2,251 Habus

● 14 Petabytes single-copy data storage
  – 28 Petabytes including Disaster Recovery

● Connections to Customers
  – 212 locations

MB Revised: 12/22/2010

DoD Supercomputing Resource Centers (DSRCs): Six Large HPC Centers

Page 6

HPCMP Data Storage Growth

43% increase over FY 2008

34% increase over FY 2009

MB Revised: 12/22/2010
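Successive annual increases compound rather than add. Assuming the first figure compares FY2009 storage with FY2008 and the second compares FY2010 with FY2009 (the slide does not state the pairing explicitly), this short Python sketch computes the cumulative two-year growth factor. `compound_factor` is an illustrative name.

```python
# Year-over-year storage increases from the slide (assumed pairing:
# FY2009 vs FY2008, then FY2010 vs FY2009).
ANNUAL_INCREASES = [0.43, 0.34]


def compound_factor(increases):
    """Cumulative growth factor: the product of (1 + increase) over successive years."""
    factor = 1.0
    for inc in increases:
        factor *= 1.0 + inc
    return factor


if __name__ == "__main__":
    factor = compound_factor(ANNUAL_INCREASES)
    print(f"{factor:.4f}")      # 1.9162
    print(f"{factor - 1:.1%}")  # 91.6%
```

Under that reading, FY2010 storage is roughly 1.9 times the FY2008 level, not 1.77 times as simply adding the percentages would suggest.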

Page 7




[Figure: HPC file system with $WORKDIR short-term storage; center archive cache]




Computational results used in many different ways

– Source for additional computation

– Interrogated for post-processing

– Archived for scientific value

Users are mobile within HPCMP

User View of HPCMP Storage

Page 8

Storage Lifecycle Management (SLM) Rationale

● HPCMP can provide enough storage for NEW data

● Centers support 2+ generations of storage media
  – Older media unreadable after technology obsolescence

Users: we can live with constraints & manage data
  – Need tools to manage data
  – Need intermediate-length storage

[Figure: Active Use vs. Archival Use]



Page 10

Evolving Enterprise Service Model

Services: Single Authentication, Advance Reservations, Web Portal Framework, Remote SciViz, ezHPC

Customers: Research Community, T&E, Software Development, Acquisition Community

[Diagram: six HPC centers, each with HPC system(s), a utility server, job submission, metadata attributes, and disk/tape storage; HPC Center 4 shows two HPC systems, 30-day disk storage, and archive tape]

Infrastructure: Remote Job Management; Computational Infrastructure for Software Development (Tools / Environment); Data Management Tools – Metadata; Interactive Grid Generation

Layers: Customers, Services, Infrastructure

MB Revised: 8/27/2010

Page 11



HPC Enhanced User Environment Architecture

[Diagram, layered as Services, Compute, Storage:]
  – DR&E Portal; single point of access; center-wide job management
  – Data analysis services; grid-generation capabilities; software development environment
  – HPC temporary storage (10 days)
  – Center-wide ILM-managed file system (30 days)
  – Storage Lifecycle Management with SLM Metadata Catalog Service
  – Replication between all DSRCs; remote disaster recovery facility




MB Revised: 12/22/2010

Page 12

HPC Enhanced User Environment

● Interactive Computing
  – Single point of access
  – Center-wide job management
  – Remote data analysis

● Center-wide filesystem
  – Medium-term storage
  – User-specified metadata

● Data Management Tools
  – Insight into file archives
  – Program-wide visibility

● HPC Portal
  – Supercharge the engineering


MB Revised: 8/3/2010


Page 14

Hardware Components

● Center-wide File System: Panasas PAS 8
  – 340 blades, 4 TB unformatted
  – Arista 7508 switch

● Utility Server: Appro 1U Tetra, 88 nodes
  – 44 compute: 2 AMD Opteron 2.3 GHz CPUs, 16 cores, 128 GB memory
  – 22 large memory: 4 AMD Opteron 2.3 GHz CPUs, 32 cores, 256 GB memory
  – 22 graphics: 2 AMD Opteron 2.3 GHz CPUs, 16 cores, 256 GB memory, NVIDIA Tesla M2050
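Aggregate utility-server capacity follows from the per-node figures. This Python sketch assumes each listed core count and memory size is per node and that each graphics node carries one Tesla M2050; neither assumption is stated explicitly on the slide. `NodeClass` and `totals` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeClass:
    name: str
    count: int
    cores_per_node: int
    memory_gb_per_node: int
    gpus_per_node: int = 0


# Figures from the slide, read as per-node values (assumption).
UTILITY_SERVER = [
    NodeClass("compute", 44, 16, 128),
    NodeClass("large memory", 22, 32, 256),
    NodeClass("graphics", 22, 16, 256, gpus_per_node=1),  # assumes one Tesla M2050 per node
]


def totals(node_classes: list[NodeClass]) -> dict[str, int]:
    """Aggregate node, core, memory, and GPU counts across node classes."""
    return {
        "nodes": sum(n.count for n in node_classes),
        "cores": sum(n.count * n.cores_per_node for n in node_classes),
        "memory_gb": sum(n.count * n.memory_gb_per_node for n in node_classes),
        "gpus": sum(n.count * n.gpus_per_node for n in node_classes),
    }


if __name__ == "__main__":
    print(totals(UTILITY_SERVER))
    # {'nodes': 88, 'cores': 1760, 'memory_gb': 16896, 'gpus': 22}
```

The node classes sum to the 88 nodes stated on the slide, which supports the per-node reading.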


Page 15

System Configuration

● $HOME
  − 10 GB quota

● $WORKDIR
  − 200 TB
  − 100 TB user quota
  − Standard scrubbing

● $CENTER
  − 800 TB
  − Possible user quota (200 TB)
  − 30-day scrub policy
  − SLM compatible

● $ARCHIVE
  − Managed by SLM
  − Accessed through SLM

● Center-wide Job Management
  − qsub, qstat, qdel

● Resource Requests
  − PBS Pro
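A 30-day scrub policy on $CENTER amounts to finding files that have not been touched within the window. The following is a minimal sketch, not the HPCMP scrubber: it walks a directory tree and flags files whose later of access and modification time is older than the cutoff (checking both guards against file systems mounted without access-time updates). `scrub_candidates` is a hypothetical name, and the sketch only reports candidates; it deletes nothing.

```python
import os
import time
from pathlib import Path
from typing import Optional

SCRUB_AGE_DAYS = 30  # $CENTER scrub window quoted on the slide
SECONDS_PER_DAY = 86_400


def scrub_candidates(root, age_days: int = SCRUB_AGE_DAYS, now: Optional[float] = None) -> list[Path]:
    """Return files under `root` not accessed or modified within `age_days` days."""
    now = time.time() if now is None else now
    cutoff = now - age_days * SECONDS_PER_DAY
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            st = path.stat()
            # Use the later of atime and mtime: atime alone is unreliable on noatime mounts.
            if max(st.st_atime, st.st_mtime) < cutoff:
                stale.append(path)
    return sorted(stale)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        old, new = Path(d, "old.dat"), Path(d, "new.dat")
        old.write_text("results")
        new.write_text("results")
        forty_days_ago = time.time() - 40 * SECONDS_PER_DAY
        os.utime(old, (forty_days_ago, forty_days_ago))
        print([p.name for p in scrub_candidates(d)])  # ['old.dat']
```

Separating "find candidates" from "delete" lets a center notify users or register files with SLM before anything is removed.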

Page 16

Storage Lifecycle Management

● Based on Nirvana SRB and SAM-QFS

● Manages $ARCHIVE
  – Set metadata to specify retention period

● Can register files on $CENTER; target is to automate registration by end of 2011

● HPC access to $ARCHIVE through transfer queue
  – Also working on a PBS parameter mechanism; future: just-in-time

● Customer Experience workgroup developing auxiliary commands (Sdata) for user-defined metadata

● Global visibility
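Retention metadata turns archive cleanup into a date comparison. The sketch below models a hypothetical SLM catalog entry with a user-set retention period and lists entries whose period has elapsed. It does not reflect the actual Nirvana SRB schema or the Sdata commands; `ArchiveEntry`, `due_for_review`, and the paths are made up for illustration.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class ArchiveEntry:
    """Hypothetical SLM catalog record: an archived file plus user-set metadata."""
    path: str
    archived_on: date
    retention_days: int
    metadata: dict[str, str] = field(default_factory=dict)

    @property
    def expires_on(self) -> date:
        return self.archived_on + timedelta(days=self.retention_days)

    def is_expired(self, today: date) -> bool:
        return today >= self.expires_on


def due_for_review(entries: list[ArchiveEntry], today: date) -> list[str]:
    """Paths whose retention period has elapsed as of `today`."""
    return [e.path for e in entries if e.is_expired(today)]


if __name__ == "__main__":
    entries = [
        ArchiveEntry("/archive/run01/out.h5", date(2010, 1, 15), 365, {"project": "demo"}),
        ArchiveEntry("/archive/run02/out.h5", date(2010, 12, 1), 5 * 365),
    ]
    print(due_for_review(entries, date(2011, 4, 7)))  # ['/archive/run01/out.h5']
```

Flagging expired entries for review, rather than deleting them outright, leaves room for the owner to extend retention.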


Page 18

HPC Desktop Portal Initiative

Goals
  – Enable DoD scientists and engineers to apply the power of HPC without being HPC experts
  – Provide access to HPC resources using current web technology, to attract and retain new technology experts to DoD

Methods
  – Provide HPC Software as a Service over the web with zero or minimal footprint
  – Provide common analysis tools enabled for seamless HPC use (MATLAB)
  – Provide accessible optimized tools for technology domains (CREATE, institutes)
  – Extension of desktop; interactive response
  – Single sign-on through CAC

Page 19

HPC Portal

● Engaging with DoD engineering organizations
  – Understand their requirements and how we can support them

● Examining Cloud Computing Concepts
  – Software as a Service
  – Infrastructure as a Service

● Phase 1: Parallel MATLAB capability
  – ARL lead, deliver in June
  – Built on Microsoft HPC Server
  – Additional available applications: FMS, CFD, etc.

● Phase 2: Present CREATE capability
  – Identifying API, middleware, design framework

Page 20

HPC Modernization Program

MB Revised: 11/23/2009


Page 22

Storage Lifecycle Management

● Layered Software Capability
  – Information Lifecycle Management
    − Metadata: user and system defined
    − Policies: drive HSM
    − Reporting
  – Hierarchical Storage Management
    − Tiered Storage
    − Disaster Recovery

● Multi-system, multi-center
  – Assign metadata attributes from all HPC systems
  – Work toward “shared” files between centers

Page 23

Storage Lifecycle Management

● Information Lifecycle Management
  – Provide capability to users and administrators
  – Control costs

● Hierarchical Storage Management
  – Based on ILM information
  – Includes disaster recovery

● Common user interface

● Work toward shared files



Page 24

ILM Requirements

● Metadata attributes
  – User-assignable
  – System-assignable
  – Defaults

● Tools and Reports
  – Enable management of data files

● Policies
  – Based on attributes
  – Used to drive HSM
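One way to read “policies based on attributes, used to drive HSM” is as an ordered rule list evaluated against each file's metadata. In this Python sketch the attribute keys (`retention`, `value`, `status`) and the action strings are hypothetical placeholders, not HPCMP policy names; the first matching rule wins and unmatched files fall through to a default.

```python
from dataclasses import dataclass
from typing import Callable, Mapping

Attributes = Mapping[str, str]


@dataclass(frozen=True)
class Policy:
    name: str
    matches: Callable[[Attributes], bool]
    action: str  # hypothetical HSM action label


# Order matters: the first rule whose predicate matches decides the action.
POLICIES = [
    Policy("scratch-only", lambda a: a.get("retention") == "none", "no_archive_copy"),
    Policy("high-value", lambda a: a.get("value") == "high", "tape_plus_dr_copy"),
    Policy("active", lambda a: a.get("status") == "active", "keep_on_disk"),
]
DEFAULT_ACTION = "migrate_to_tape"


def hsm_action(attrs: Attributes, policies=POLICIES, default: str = DEFAULT_ACTION) -> str:
    """Return the HSM action for a file, given its ILM attributes."""
    for policy in policies:
        if policy.matches(attrs):
            return policy.action
    return default


if __name__ == "__main__":
    print(hsm_action({"status": "active"}))                   # keep_on_disk
    print(hsm_action({"status": "active", "value": "high"}))  # tape_plus_dr_copy
    print(hsm_action({"project": "demo"}))                    # migrate_to_tape
```

First-match ordering keeps the policy set easy to audit: administrators can read the list top to bottom to see why a file landed on a given tier.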

Page 25

ILM Attribute Requirements

● Associated with all objects

● Arbitrary number, size, type

● Attribute permissions separate from underlying files
  – System read/write
  – Creator/Owner read/write
  – Collections of other users

● Inheritance or default-setting at creation
  – Settable via templates or functions

● ILM must scale to 1B files today
  – No impact on I/O performance for HSM

● Attributes can be output textually
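Several of these attribute requirements (permissions held on the attribute rather than the file, defaults applied at creation from a template, and textual output) can be modeled with a small in-memory catalog. This is an illustrative sketch under those assumptions; `AttributeCatalog`, the `system` principal, and the directory-keyed templates are inventions for the example, and a catalog scaling to a billion files would need a real database rather than Python dictionaries.

```python
from dataclasses import dataclass, field

SYSTEM = "system"  # hypothetical principal for system-assigned attributes


@dataclass
class Attribute:
    value: str
    readers: set[str] = field(default_factory=set)
    writers: set[str] = field(default_factory=set)


@dataclass
class FileRecord:
    path: str
    owner: str
    attrs: dict[str, Attribute] = field(default_factory=dict)


class AttributeCatalog:
    """Toy ILM catalog: read/write permissions live on each attribute, not on the file."""

    def __init__(self) -> None:
        self.files: dict[str, FileRecord] = {}
        self.templates: dict[str, dict[str, str]] = {}  # directory -> default attributes

    def create(self, path: str, owner: str) -> FileRecord:
        """Register a file, inheriting default attributes from its directory's template."""
        directory = path.rsplit("/", 1)[0]
        defaults = self.templates.get(directory, {})
        attrs = {k: Attribute(v, {SYSTEM, owner}, {SYSTEM, owner}) for k, v in defaults.items()}
        record = FileRecord(path, owner, attrs)
        self.files[path] = record
        return record

    def set_attr(self, user: str, path: str, key: str, value: str, readers=()) -> None:
        """Create or update an attribute; only its writers (system/owner for new keys) may."""
        record = self.files[path]
        current = record.attrs.get(key)
        writers = current.writers if current else {SYSTEM, record.owner}
        if user not in writers:
            raise PermissionError(f"{user} may not write {key!r} on {path}")
        base_readers = current.readers if current else {SYSTEM, record.owner}
        record.attrs[key] = Attribute(value, set(base_readers) | set(readers), set(writers))

    def dump(self, user: str, path: str) -> str:
        """Textual key=value output, limited to attributes the user may read."""
        record = self.files[path]
        return "\n".join(
            f"{k}={a.value}" for k, a in sorted(record.attrs.items()) if user in a.readers
        )


if __name__ == "__main__":
    catalog = AttributeCatalog()
    catalog.templates["/center/alice/run01"] = {"project": "demo", "retention": "1y"}
    catalog.create("/center/alice/run01/out.h5", owner="alice")
    catalog.set_attr("alice", "/center/alice/run01/out.h5", "case", "mach0.8", readers={"bob"})
    print(catalog.dump("alice", "/center/alice/run01/out.h5"))
    print(catalog.dump("bob", "/center/alice/run01/out.h5"))  # case=mach0.8
```

Here bob can read the `case` attribute that alice shared with him, but not the template-inherited attributes, even though both sit on the same file.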

Page 26

ILM Tool Requirements

● Tools for manipulating files under ILM control
  – Attribute-aware
  – Attribute-preserving
  – Operate on files, directories, or lists of objects
  – Create/modify attributes

● Reports
  – Based on multiple criteria, attribute values
  – Status of pending operations
  – Consistent with attribute permissions
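The tool and report requirements can be illustrated with an in-memory catalog mapping paths to attribute dictionaries. In this hypothetical Python sketch, `tag` creates or modifies attributes on a list of objects, `copy_preserving` carries attributes along with a copy, and `report` filters by a path glob plus any number of attribute=value criteria. Permission checks and pending-operation status are deliberately left out.

```python
import fnmatch

# Hypothetical in-memory catalog: object path -> ILM attributes.
Catalog = dict[str, dict[str, str]]


def tag(catalog: Catalog, paths: list[str], **attrs: str) -> None:
    """Create or modify attributes on every object in a list."""
    for path in paths:
        catalog.setdefault(path, {}).update(attrs)


def copy_preserving(catalog: Catalog, src: str, dst: str) -> None:
    """Record a copy of src at dst that carries src's attributes along."""
    catalog[dst] = dict(catalog.get(src, {}))


def report(catalog: Catalog, pattern: str = "*", **criteria: str) -> list[str]:
    """Paths matching a glob pattern and every attribute=value criterion."""
    return sorted(
        path
        for path, attrs in catalog.items()
        if fnmatch.fnmatchcase(path, pattern)
        and all(attrs.get(k) == v for k, v in criteria.items())
    )


if __name__ == "__main__":
    cat: Catalog = {
        "/center/run01/a.h5": {"project": "demo", "status": "active"},
        "/center/run01/b.h5": {"project": "demo", "status": "done"},
        "/center/run02/c.h5": {"project": "other", "status": "done"},
    }
    tag(cat, report(cat, status="done"), retention="5y")
    copy_preserving(cat, "/center/run01/b.h5", "/archive/run01/b.h5")
    print(report(cat, project="demo", retention="5y"))
    # ['/archive/run01/b.h5', '/center/run01/b.h5']
    print(report(cat, "/center/run02/*"))  # ['/center/run02/c.h5']
```

Because `report` returns a plain list of paths, its output can feed straight back into `tag`, which is how the example sets retention on every finished file in one step.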

Page 27

HPCMP Storage Initiative

● Computing power grows annually, and so do stored files

● Archived data is hard for users to use and manage

● Costs: user time, labor, hardware, software and media

● Storage Initiative
  – Objective: Refresh to manage data for the next 10 years
  – Goals: 10-year architecture
    − Leverage advances in technology
    − Improve user productivity
    − Improve reliability & adaptability
    − Sustain within current storage budget

MB Revised: 5/4/2009

Page 28

HPCMP Data Storage Growth

[Chart: single-copy data storage by year, 2001 to 2010; annotated 22x]

● Impact of 16x growth in eight years
  – Data Analysis
  – Data Locality and Movement
  – Data Duplication
  – Disaster Recovery
  – Network Loading
  – Storage

MB Revised: 12/22/2010
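A 16x increase over eight years corresponds to a compound annual growth rate of 16^(1/8) - 1, about 41% per year. The Python sketch below computes that rate and, purely as an illustrative extrapolation (an assumption, not a program forecast), the multiple the same rate would imply over the Storage Initiative's 10-year horizon: 16^(10/8) = 32x. `cagr` and `growth_multiple` are illustrative names.

```python
def cagr(multiple: float, years: float) -> float:
    """Compound annual growth rate implied by growing `multiple`-fold over `years`."""
    return multiple ** (1.0 / years) - 1.0


def growth_multiple(rate: float, years: float) -> float:
    """Growth multiple after `years` at a constant annual `rate`."""
    return (1.0 + rate) ** years


if __name__ == "__main__":
    rate = cagr(16, 8)  # 16x growth in eight years, from the slide
    print(f"annual growth: {rate:.1%}")                           # annual growth: 41.4%
    print(f"10-year multiple: {growth_multiple(rate, 10):.0f}x")  # 10-year multiple: 32x
```

The roughly 41% annual rate is in line with the 43% and 34% year-over-year increases reported for FY2009 and FY2010.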

Page 29

HPC Enhanced User Environment (HEUE)

● Purpose
  – Provide computational scientists more tools and capabilities to perform research more efficiently and effectively

● Benefit
  – Decrease time-to-solution, increase S&E productivity and analytical power, reduce future costs of data archive

● Tasks
  – Storage lifecycle management implementation
    − Metadata for file management and identification
    − Program-wide datafile visibility and access
  – Center-wide filesystem: efficient storage for data analysis and extraction
  – Center-wide job management: single point of access, increase user productivity
  – Remote visualization for large datasets
  – Web-based access to HPC capability

MB Revised: 12/22/2010

Page 30

Requested Software

System Software
● PBS Pro, OpenMPI

● InfiniBand Software Stack

● NVIDIA Linux x86_64 driver set

● Compliance with BCT policies

Development Tools
● PGI Compiler Suite (C/C++/Fortran)

● GNU Compiler Suite & debugger

● TotalView debugger

● NVIDIA GPGPU development environment (OpenCL and CUDA)

● Common Set of Open Source Utilities

● BC policy: PAPI, SCALASCA, TAU, PDT, Valgrind

● DDT and DDT with CUDA debugger

Data Analysis Tools
● CEI – EnSight Suite

● Intelligent Light – FieldView

● RSI, Inc. – IDL

● MathWorks – MATLAB

● NCAR Graphics Library

● Kitware – ParaView

● Tecplot, Inc. – Tecplot

● VisIt Visualization Tool

● Computational Science Environment (CSE)

● ezVIZ

Page 31

Requested Software

Pre/Post Processing Software
● ANSYS CFD

● Abaqus

● LS-PrePost

● Parasolid Designer (pre)

● Pointwise – Gridgen

Math Libraries
● ARPACK, FFTW, PETSc, SuperLU,


New
● Pipeline Pilot (Accelrys product): automates predicting compute intensity on the fly and submitting jobs to the US (Utility Server)

● Isight (DSS product): design optimization & process integration (some portions are interactive, some are for batch processing)

Secure Remote Visualization
● PKI-VNC

● Longhorn
