TRANSCRIPT
@NCInews
Computational Environments and Analysis methods available on the NCI HPC & HPD Platform
IN53E-01
Ben Evans1, Lesley Wyborn1, Adam Lewis2, Clinton Foster2, Stuart Minchin2, Tim Pugh3, Alf Uhlerr4, Bradley Evans5
1ANU, 2Geoscience Australia, 3Bureau of Meteorology, 4CSIRO, 5Macquarie University
Overview
• High Performance Data (HPD) – data that is carefully prepared, standardised and structured so that it can be used in data-intensive science on HPC (Evans et al., in press)
– HPC: turning compute into IO-bound problems
– HPD: turning IO-bound into ontology + semantic problems
• What are the HPC and HPD drivers?
• Build re-usable/sustainable software for use in Virtual Laboratories – an integrated set of software for science, a mix of new and familiar
• What have we done?
• What's next?
© National Computational Infrastructure 2014
IN53E-01:“NCI Computational Environments and HPC/HPD” #AGU14, 19 December, 2014 @BenJKEvans
Numerical Weather Prediction Roadmap
[Figure: model topography of Sydney, NSW (research 1.5 km topography); tropical cyclone imagery]
2013:
• 40 km Global Model – 2 x daily 10-day & 3-day forecasts
• 12 km Regional Model – 4 x daily 3-day forecasts
• 4 km City/State Model – 4 x daily 36-hour forecasts
2020:
• 12 km Global Model – 2 x daily 10-day & 3-day forecasts
• 5 km Regional Model – 8 x daily 3-day forecasts
• 1.0 km City/State Model – 24 x daily 18-hour or 36-hour forecasts
Increasing model resolution for improved local information; future model ensembles for likelihood of significant weather.
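As a rough sketch of why these roadmap steps are so computationally demanding: halving the horizontal grid spacing roughly quadruples the number of grid points and, under a CFL-style timestep constraint, doubles the number of timesteps, so cost grows roughly with the cube of the resolution factor. The figures below are illustrative only — they ignore vertical levels, model changes, forecast frequency and ensembles.

```python
def relative_cost(old_km, new_km):
    """Rough relative cost of a horizontal resolution change:
    grid points scale with (old/new)^2 and timesteps with (old/new)
    under a CFL-style constraint, so cost ~ (old/new)^3."""
    return (old_km / new_km) ** 3

# Roadmap steps from the slide (2013 -> 2020)
print(f"Global   40 km -> 12 km: ~{relative_cost(40, 12):.0f}x")
print(f"Regional 12 km ->  5 km: ~{relative_cost(12, 5):.0f}x")
print(f"City      4 km ->  1 km: ~{relative_cost(4, 1):.0f}x")
```

Even this crude estimate puts the global-model step at tens of times more compute per forecast, before multiplying by the increased forecast frequency shown above.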
C/- Tim Pugh, BoM
Capture, analysis & application of Earth Obs
c/- Adam Lewis, GA
Combining Satellite and Climate
• How do we bring as much observational scrutiny as possible to the CMIP/IPCC process?
• How do we best utilize the wealth of satellite observations for the CMIP/IPCC process?
c/- Robert Ferraro, NASA/JPL, ESGF F2F, 2014
Top 500 Supercomputer list since 1990
• Fast and flexible access to structured data is required
• There needs to be a balance between processing power and the ability to access data (data scaling)
• The focus is on on-demand direct access to large data sources, enabling high-performance analytics and analysis tools directly on that content
[Chart: Top 500 performance development since 1990 (http://www.top500.org/statistics/perfdevel/), with the current and next NCI systems marked]
Elephant Flows Place Great Demands on Networks
• A physical pipe that leaks water at a rate of 0.0046% by volume: 99.9954% of the water is transferred.
• A network 'pipe' that drops packets at a rate of 0.0046%: 100% of the data is transferred, but slowly, at <<5% of optimal speed. The round-trip time is essentially fixed, determined by the speed of light.
With proper engineering, we can minimize packet loss.
Assumptions: 10 Gbps TCP flow, 80 ms RTT. See Eli Dart, Lauren Rotman, Brian Tierney, Mary Hester, and Jason Zurawski, "The Science DMZ: A Network Design Pattern for Data-Intensive Science", in Proceedings of the IEEE/ACM Annual Supercomputing Conference (SC13), Denver, CO, 2013.
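The "<<5% of optimal speed" figure can be reproduced with the well-known Mathis et al. steady-state TCP throughput bound, rate ≤ (MSS/RTT) · C/√p. The loss rate and RTT are the slide's assumptions; the 1460-byte MSS (standard Ethernet) and C ≈ 1.22 are additional assumptions of this sketch.

```python
import math

def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate, c=1.22):
    """Upper bound on steady-state TCP throughput (Mathis et al.):
    rate <= (MSS / RTT) * C / sqrt(p), returned in bits per second."""
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss_rate)

# Slide's assumptions: 10 Gbps flow, 80 ms RTT, 0.0046% packet loss
rate = mathis_throughput_bps(mss_bytes=1460, rtt_s=0.080, loss_rate=0.000046)
print(f"TCP throughput bound: {rate / 1e6:.1f} Mbit/s")
print(f"Fraction of a 10 Gbps link: {rate / 10e9:.2%}")
```

The bound comes out at roughly a few tens of Mbit/s — well under 1% of a 10 Gbps link, consistent with the slide's "<<5%" claim.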
Computational and Cloud Platforms
Raijin:
• 57,472 cores (Intel Xeon Sandy Bridge, 2.6 GHz) in 3,592 compute nodes
• Approx. 160 TBytes of main memory
• Infiniband FDR interconnect
• Approx. 7 PBytes of usable fast filesystem (for short-term scratch space)
• 1.5 MW power; 100 tonnes of water in cooling
Partner Cloud:
• Same generation of technology as Raijin (Intel Xeon Sandy Bridge, 2.6 GHz), but only 1,500 cores
• Infiniband FDR interconnect
• A collaborative platform for services, and the platform for hosting non-batch services
NCI Nectar Cloud:
• Same generation as the Partner Cloud
• Non-managed environment
• Weak integration
NCI Cloud
[Diagram: per-tenant public IP assignments (CIDR boundaries, typically /29); OpenStack private IPs on a quota-managed flat network; compute nodes on FDR InfiniBand with local SSDs; NFS and Lustre filesystems behind the cloud.]
NCI's integrated high-performance environment
[Diagram: Raijin HPC compute, login and data-mover nodes on the Raijin 56 Gb FDR IB fabric, with the Raijin high-speed filesystems (/short 7.6 PB; /home, /system, /images, /apps); persistent global parallel filesystems on the /g/data 56 Gb FDR IB fabric (/g/data1 ~7.4 PB, /g/data2 ~6.7 PB); Massdata tape archive (1.0 PB cache, 12.3 PB tape); NCI data movers connecting via 10 GigE to the Internet and to a second data centre.]
Building the Platform for Earth System Modelling & Analysis
• 10 PB+ of research data
• Server-side analysis and visualization
• Data services (THREDDS)
• VDI: cloud-scale user desktops on the data
• Web-time analytics software
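As a sketch of what THREDDS-style data services enable: a dataset published through a THREDDS catalogue is directly addressable over OPeNDAP, so analysis code reads remote data without downloading files. The host and dataset path below are hypothetical illustrations; opening the URL (e.g. with netCDF4-python) requires network access, so only the URL construction is shown.

```python
from urllib.parse import urljoin

# Hypothetical THREDDS server base URL -- substitute a real endpoint.
THREDDS_BASE = "http://dapds00.nci.org.au/thredds/"

def opendap_url(dataset_path):
    """Build the OPeNDAP access URL for a dataset served by THREDDS.
    THREDDS exposes each dataset's OPeNDAP form under the 'dodsC' service path."""
    return urljoin(THREDDS_BASE, "dodsC/" + dataset_path)

url = opendap_url("example/cmip5/tas_Amon_sample.nc")
print(url)
# A client such as netCDF4-python could then open this URL directly:
#   import netCDF4; ds = netCDF4.Dataset(url)
```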
10 PB of Data for Interdisciplinary Science
Mirrored from major science agencies (BoM, GA, CSIRO, ANU) and other national and international sources:
• CMIP5 – 3 PB
• Atmosphere – 2.4 PB
• Earth Observation – 2 PB
• Water/Ocean – 1.5 PB
• Weather – 340 TB
• Geophysics – 300 TB
• Astronomy (optical) – 200 TB
• Bathymetry, DEM – 100 TB
• Marine videos – 10 TB
Data Collections and Approximate Capacity:
• CMIP5, CORDEX – ~3 PBytes
• ACCESS products – 2.4 PBytes
• LANDSAT, MODIS, VIIRS, AVHRR, INSAR, MERIS – 1.5 PBytes
• Digital Elevation, Bathymetry, Onshore Geophysics – 700 TBytes
• Seasonal Climate – 700 TBytes
• Bureau of Meteorology Observations – 350 TBytes
• Bureau of Meteorology Ocean-Marine – 350 TBytes
• Terrestrial Ecosystem – 290 TBytes
• Reanalysis products – 100 TBytes
National Environment Research Data Collections (NERDC):
1. Climate/ESS model assets and data products
2. Earth and marine observations and data products
3. Geoscience collections
4. Terrestrial ecosystems collections
5. Water management and hydrology collections
Internationally sourced:
• Satellite data (USGS, NASA, JAXA, ESA, …)
• Reanalysis (ECMWF, NCEP, NCAR, …)
• Climate data (CMIP5, AMIP, GeoMIP, CORDEX, …)
• Ocean modelling (Earth Simulator, NOAA, GFDL, …)
These will only increase as we depend on more data, and some will be replicated.
How should we keep this in sync, versioned, and back-referenced for the supplier?
Some Data Challenges
• Allow multiple data types, but convert proprietary ones; standardize record formats and conventions
• Expose all attributes for search – not just collection-level search, not just datasets: all data. What are the handles we need to access the data?
• Provide more programmatic interfaces, link up data and compute resources, and do more server-side processing
• Add semantic meaning to the data. Is it scientifically appropriate for a data service to aggregate/interpolate? CMIP5 was successful because we constrained the problem
• What unique identifiers do we need? DOI is only part of the story; versioning is important
Metadata Hierarchy for discovery
Recording hierarchy in ISO 19139:
1. Data collection – e.g. Climate and Weather modelling
2. Series – e.g. Landsat 7
3. Datasets – semantically the same
4. Attributes – including variables (versions, errata)
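The four-level hierarchy can be sketched as a nested data structure — collection, series, dataset, attributes. All entry names below (the collection, series and dataset labels) are hypothetical illustrations, not actual NCI catalogue records.

```python
# Sketch of the four-level metadata hierarchy (hypothetical entries).
catalogue = {
    "collection": "Earth and Marine Observations",    # 1. data collection
    "series": [{
        "name": "Landsat 7",                          # 2. series
        "datasets": [{
            "name": "surface_reflectance_v2",         # 3. dataset (semantically uniform)
            "attributes": {                           # 4. attributes, incl. variables
                "variable": "surface_reflectance",
                "version": "2.0",
                "errata": [],
            },
        }],
    }],
}

def find_datasets(cat, series_name):
    """Walk the hierarchy and return dataset names under a given series."""
    return [d["name"]
            for s in cat["series"] if s["name"] == series_name
            for d in s["datasets"]]

print(find_datasets(catalogue, "Landsat 7"))
```

Because every level carries its own identifiers and versions, a discovery service can search at any level rather than only at the collection level.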
Metadata Hierarchy implementation
NCI GeoNetwork architecture (basic catalogues)
[Diagram: a collection-level (and series-level?) GeoNetwork sits above dataset-specific GeoNetworks (Dataset 1 … Dataset n). CSW harvesting and cross-walks (e.g. a RIF-CS adapter) feed a full harvest of the metadata into a full-search GeoNetwork (or a domain-specific one), supporting domain-specific or user deep queries.]
Finding data and services
[Diagram: a GeoNetwork catalogue backed by a Lucene database (trialling Elasticsearch) sits alongside DAP, OGC, … services over /g/data1 and /g/data2, with supercomputer and virtual-laboratory access to the same filesystems.]
Recording the full product description … now we need to contextually embed it for programs
Collaborating with Researchers/Developers
The selfish practical researcher:
• Not virtual organisations – interoperable tools in virtual laboratories; make it seamless
• Anti-collaboration: just apply standards
• Micro-ambition: did I get stuff done quicker/better?
• Data handling (and particularly movement!) is a complete waste of time
• Cite me! People should recognise my genius! Do I have to be in PR?
• I've done my bit, and it's really clever. Here you go – I am going to do something else. (Actually, the same issue arises with sub-contracted work and multiparty agreements.)
Sustainability:
• The system should capture my operations. Why am I a secretary? I can't remember what I did! The system did things that I didn't know about anyway!
What's worse? Perhaps the opposite of all these items. We need a strategy to properly address this.
Collaborating with Researchers/Developers
• Project-driven means: define a use-case, and an end-date on the work
• The researcher / leading developers may be ahead of the curve. We want to best tap this time and energy … and to have a reasonable chance of converting it for sustainability

"The Nth Degree", ST:TNG –
Barclay: Computer, begin new program. Create as follows: workstation chair. Now, create a standard alphanumeric console, positioned for the left hand. Now an iconic display console, positioned for the right hand. Tie both consoles into the Enterprise main computer core, utilizing neural-scan interface.
Enterprise Computer: There is no such device on file.
Barclay: No problem. Here's how you build it.
Prototype to Production – anti-Minecraft
Virtual Labs:
• Separating researchers from software builders
• The cloud is an enabler, but: don't make researchers become full system admins, and save developers from being operational
[Chart: project lifecycle – and preparing for success. Productivity vs perspiration across Proj1 start/end and Proj2–4 start/end.]
Prototype to Production – anti-Minecraft
Development phase in a project:
[Chart: developer and VL-manager "headspace hours" through the development phase for poorly, reasonably and well executed projects; changed scope – adopted broadly.]
Virtual Laboratory driven software patterns
[Diagram: layers of basic OS functions, common modules, bespoke services and special config choices compose a "super software stack". Example stacks: NCI Stack 1, NCI Env Stack, WorkflowX, Analytics Stack, 2 x Stack1, modified Stack1, modified Stack2, P2P, Vis Stack, GridFTP. Take stacks from upstream and use them as bundles.]
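The layering idea can be sketched as ordered composition: each bundle contributes packages or configuration, and a stack is the ordered union of the bundles it pulls in, with shared layers installed only once. All bundle and package names below are hypothetical illustrations.

```python
# Sketch: software stacks as ordered compositions of reusable bundles.
# Bundle and package names are hypothetical.
BUNDLES = {
    "core":      ["os-base", "ldap", "logging"],
    "common":    ["python", "netcdf", "gdal"],
    "analytics": ["numpy", "scipy", "matplotlib"],
    "vis":       ["paraview"],
}

def compose_stack(*bundle_names):
    """Compose a software stack from bundles, preserving order and
    dropping duplicates so shared layers are installed exactly once."""
    seen, stack = set(), []
    for name in bundle_names:
        for pkg in BUNDLES[name]:
            if pkg not in seen:
                seen.add(pkg)
                stack.append(pkg)
    return stack

analytics_stack = compose_stack("core", "common", "analytics")
vis_stack = compose_stack("core", "common", "vis")
print(analytics_stack)
```

Two stacks built this way share their "core" and "common" layers by construction, which is what makes the bundles reusable across virtual laboratories.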
Transition from developer, to prototype, to DevOps
Step 1: Development
• Get a template for development
• Separate out what is special from what is common
• Reuse other software stacks where possible
Step 2: Prototype
• Deploy in an isolated tenant of a cloud
• Determine dependencies
• Write test cases to demonstrate correct functioning
Step 3: Sustainability
• Pull the repo into an operational tenant
• Prepare the bundle for integration with the rest of the framework
• Hand back a cleaned bundle
• Establish a DevOps process
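Step 3's pull model can be sketched as follows: the operational tenant pulls a tagged release of the bundle repository and refuses to deploy unless the bundle's own test suite passes. This is a hypothetical illustration of the workflow, not NCI's actual tooling; the repo layout and `pytest` test location are assumptions.

```python
import subprocess
import sys

def pull_and_test(repo_dir, tag, run=subprocess.run):
    """Pull model sketch: fetch tags, check out a tagged bundle release,
    and run its test suite. Returns True only if every step succeeds,
    so a failing bundle is never deployed. `run` is injectable for testing."""
    for cmd in (["git", "fetch", "--tags"],
                ["git", "checkout", tag],
                [sys.executable, "-m", "pytest", "tests/"]):
        result = run(cmd, cwd=repo_dir)
        if result.returncode != 0:
            return False  # refuse to deploy a failing bundle
    return True
```

In a continuous-integration setting this would run on every push to the community repo, so the operational bundle only ever advances to tags that have passed testing.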
DevOps approach to building and operating environments
[Diagram: NCI core bundles and community repos (Community1, Community2) feed the Virtual Laboratory operational bundle – Git controlled, pull model, continuous-integration testing.]
Advantages
• Separates roles and responsibilities: the specialist on a package, VL managers, system admins
• Anti-architecture: from "architecture" to "framework" – flexible with technology change, and makes handover easier
• Test/Dev/Ops and patches/rollback both become business as usual
• Sharable bundles; releases of software stacks can be tagged
• A precondition for trusted software stacks
• Provenance – scientific / government policy scrutiny
Advantages cont…
Transforms the system admins:
• Role change from gatekeeper to DevOps management – new skills, a new way of thinking
• Separates out root trust for global storage: dev teams are limited to test areas; root access for ops, but it can be a limited group
• Only the operating system is provided to boot from: removes old-style golden (fragile) images, and makes security patching easier
• Bundles are glued together into different software stacks: this addresses the bloated-node problem, and scale-out is generally easier
• Standard system configs go into the "core" bundle (LDAP, logs, easter eggs); project-specific bundles are recast to common, or core
• Performance issues can be addressed across the Virtual Labs / in the core
A snapshot of layered bundles
[Image: snapshot of layered bundles]
Climate and Weather Science Lab
Collaboration: Bureau of Meteorology, CSIRO, NCI, ARCCSS
VDI – Virtualised Desktop Infrastructure
Timetable:
• Early access started on 2 September; general release to the CWSlab in late September
• Incorporate into all VLs (e.g. the current AGDC Datacube to be upgraded)
VDI – cont…
[Screenshots of the VDI environment]
Progress toward Major Milestones
• Trans-disciplinary science: publish, catalogue and access self-documented data and software to enhance trans-disciplinary, big-data science within interoperable data services and protocols
• Integrity of science: managed services to capture a workflow's process as a comparable, traceable output; ease of access to data and software for enhanced workflow development and repeatable science, conducted with less effort or with accelerated outputs
• Integrity of data: data repository services to ensure data integrity, provenance records, universal identifiers, and repeatable data discovery and access from workflows or interactive users
New Challenges
• Auth: authentication and authorisation – the path forward is an OAuth2-style model. How do we enable it at all service-provider points? Attributes, not virtual organisations
• Trusted software – related to citation, but with the same issues as data
• Provenance – we need well-thought-out complex graphs, not just pre-canned stacks
• Effectively using new data technology – it's no longer just POSIX. Do we have to copy the same data into different forms? Libraries increasingly have a new role to play in hiding complexity
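As a sketch of the OAuth2-style model mentioned above: a service (rather than an end user) authenticates with a client-credentials grant and receives a token it can present to any service-provider point. The token endpoint and credentials below are hypothetical illustrations; the request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical token endpoint and credentials, for illustration only.
TOKEN_URL = "https://auth.example.nci.org.au/oauth2/token"

def token_request(client_id, client_secret, scope):
    """Build (but do not send) an OAuth2 client-credentials grant request,
    per the standard application/x-www-form-urlencoded token request shape."""
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    }).encode()
    return Request(TOKEN_URL, data=body,
                   headers={"Content-Type": "application/x-www-form-urlencoded"})

req = token_request("vl-portal", "s3cret", "data.read")
print(req.get_method(), req.full_url)
```

The `scope` field is where attribute-based access ("attributes, not virtual organisations") would be expressed: the token carries what the client may do, not which organisation it belongs to.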