The TeraGyroid Project - Aims and Achievements

Richard Blake
Computational Science and Engineering Department
CCLRC Daresbury Laboratory

Royal Society - June 2004
ANL

This ambitious project was the result of an international collaboration linking the USA’s TeraGrid and the UK’s e-Science Grid, jointly funded by NSF and EPSRC. Trans-Atlantic optical bandwidth is supported by British Telecommunications.



Overview

• Project Objectives
• The TeraGyroid scientific experiment
• Testbed and Partners
• Applications Porting and RealityGrid Environment
• Grid Software Infrastructure
• Visualization
• Networking
• What was done
• Project Objectives - How well did we do?
• Lessons Learned


UK-TeraGrid HPC Project Objectives

Joint experiment combining high-end computational facilities in the UK e-Science Grid (HPCx and CSAR) and the TeraGrid sites:

– world class computational science experiment
– enhanced expertise/experience to benefit UK and USA
– inform construction/operation of national/international grids
– stimulate long-term strategic technical collaboration
– support long-term scientific collaborations
– experiments with clear scientific deliverables
– choice of applications to be based on community codes
– inform future programme of complementary experiments


The TeraGyroid Scientific Experiment

High-density isosurface of the late-time configuration in a ternary amphiphilic fluid as simulated on a 64^3 lattice by LB3D.

Gyroid ordering coexists with defect-rich, sponge-like regions.

The dynamical behaviour of such defect-rich systems can only be studied with very large scale simulations, in conjunction with high-performance visualisation and computational steering.


The RealityGrid project

Mission: “Using Grid technology to closely couple high performance computing, high throughput experiment and visualization, RealityGrid will move the bottleneck out of the hardware and back into the human mind.”

• to predict the realistic behavior of matter using diverse simulation methods

• LB3D - highly scalable grid based code to model dynamics and hydrodynamics of complex multiphase fluids

• mesoscale simulations enable access to larger physical length scales and longer timescales

• RealityGrid environment enables multiple steered and spawned simulations, the visualised output being streamed to a distributed set of collaborators located at AG nodes across the USA and UK.


Testbed and Project Partners

RealityGrid partners:
– University College London (Application, Visualisation, Networking)
– University of Manchester (Application, Visualisation, Networking)
– Edinburgh Parallel Computing Centre (Application)
– Tufts University (Application)

TeraGrid sites at:
– Argonne National Laboratory (Visualization, Networking)
– National Center for Supercomputing Applications (Compute)
– Pittsburgh Supercomputing Center (Compute, Visualisation)
– San Diego Supercomputer Center (Compute)

UK High-End Computing Services:
– HPCx, run by the University of Edinburgh and CCLRC Daresbury Laboratory (Compute, Networking, Coordination)
– CSAR, run by the University of Manchester and CSC (Compute and Visualisation)


Computer Servers

Site | System | Procs | TF (Peak) | Memory (TB)
HPCx (Daresbury Laboratory) | IBM Power 4 Regatta | 1024 | 6.6 | 1.024
Computer Services for Academic Research (CSAR) | SGI Origin 3800 | 512 | 0.8 | 0.512 (shared)
Pittsburgh Supercomputing Centre (PSC) | HP-Compaq Alpha EV68 | 3000 | 6 | 3.0
(site not given) | Itanium 2 | 256 | 1.3 | 0.512
National Centre for Supercomputing Applications (NCSA) | Itanium 2 | 256 | 1.3 | 1.536
San Diego Supercomputing Centre (SDSC) | Itanium 2 | 256 | 1.3 | 0.512

~ 7 TB memory - 5K processors in integrated resource

The TeraGyroid project has access to a substantial fraction of the world's largest supercomputing resources, including the whole of the UK's supercomputing facilities and the USA's TeraGrid machines. The largest simulations are in excess of one billion lattice sites.
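The "~7 TB memory, 5K processors" summary can be checked by summing the rows of the table. The short Python sketch below does that with the figures transcribed from the slide. It assumes the six rows are all distinct systems and counts the CSAR memory in full even though the slide marks it as shared. The combined peak figure is computed here and is not quoted in the talk.

```python
# Rows transcribed from the "Computer Servers" table:
# (system label, processors, peak TFlop/s, memory in TB).
# Assumption: all six rows are distinct machines; the "(shared)" CSAR memory is counted in full.
SYSTEMS = [
    ("IBM Power 4 Regatta (HPCx)", 1024, 6.6, 1.024),
    ("SGI Origin 3800 (CSAR)", 512, 0.8, 0.512),
    ("HP-Compaq Alpha EV68 (PSC)", 3000, 6.0, 3.0),
    ("Itanium 2", 256, 1.3, 0.512),
    ("Itanium 2", 256, 1.3, 1.536),
    ("Itanium 2", 256, 1.3, 0.512),
]


def totals(systems):
    """Return (total processors, total peak TFlop/s, total memory in TB)."""
    procs = sum(row[1] for row in systems)
    peak_tf = sum(row[2] for row in systems)
    memory_tb = sum(row[3] for row in systems)
    return procs, peak_tf, memory_tb


if __name__ == "__main__":
    procs, peak_tf, memory_tb = totals(SYSTEMS)
    print(f"{procs} processors, {peak_tf:.1f} TF peak, {memory_tb:.3f} TB memory")
```

The sums come to a little over 5,000 processors and just over 7 TB, which is consistent with the rounded figures on the slide.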


Networking

[Figure: network map. UK sites shown: Cambridge, Newcastle, Edinburgh, Oxford, Glasgow, Manchester, Cardiff, Southampton, London, Belfast, DL, RAL. Other labels: TeraGrid, UK, Amsterdam, Netherlight, BT provision.]


Applications Porting

• LB3D written in Fortran90
• Order 128 variables per grid point: 1 Gpoint = 1 TB
• Various compiler issues to be overcome at different sites
• Site configuration issues important, e.g. I/O access to high-speed global file systems for checkpoint files
• Connectivity of high-speed file systems to network
• Multi-homing required on several systems to separate control network from data network
• Port forwarding required for compute nodes on private network
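The "1 Gpoint = 1 TB" rule of thumb above follows from simple arithmetic. The sketch below reproduces it in Python, assuming each of the 128 variables is stored as an 8-byte double-precision value. The slides do not state the word size, so `bytes_per_variable` is an assumption made for illustration.

```python
def lattice_memory_bytes(side, variables_per_site=128, bytes_per_variable=8):
    """Memory needed for a cubic lattice with side**3 sites.

    bytes_per_variable=8 is an assumption (double precision), not a figure from the slides.
    """
    return side ** 3 * variables_per_site * bytes_per_variable


def as_tib(n_bytes):
    """Convert a byte count to binary terabytes (TiB)."""
    return n_bytes / 2 ** 40


if __name__ == "__main__":
    for side in (64, 128, 256, 512, 1024):
        print(f"{side}^3 lattice: {as_tib(lattice_memory_bytes(side)):.6f} TiB")
```

Under that assumption a 1024^3 lattice needs exactly 2^40 bytes, which is why checkpoint I/O and file-system connectivity matter so much at this scale.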


Exploring parameter space through computational steering

Initial condition: Random water/ surfactant mixture.

Self-assembly starts.

Rewind and restart from checkpoint.

Lamellar phase: surfactant bilayers between water layers.

Cubic micellar phase, low surfactant density gradient.

Cubic micellar phase, high surfactant density gradient.
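The rewind-and-restart pattern in the captions above can be illustrated with a toy model. The Python sketch below is not LB3D and does not use the RealityGrid steering API. `ToySimulation`, its `coupling` parameter and the `explore` helper are hypothetical names invented for this example. It only shows the control flow: run to a checkpoint, then repeatedly rewind, change a parameter and continue.

```python
import copy


class ToySimulation:
    """Hypothetical stand-in for a steerable simulation (NOT LB3D); state is one number."""

    def __init__(self, coupling):
        self.coupling = coupling  # hypothetical steerable parameter
        self.step_count = 0
        self.state = 0.0

    def step(self):
        # Advance one time step; the "physics" is deliberately trivial.
        self.state += self.coupling
        self.step_count += 1

    def checkpoint(self):
        # Capture everything needed to resume from this point.
        return copy.deepcopy(self.__dict__)

    def restore(self, snapshot):
        # Rewind to a previously captured checkpoint.
        self.__dict__.update(copy.deepcopy(snapshot))


def explore(couplings, warmup_steps, branch_steps):
    """Run once to a checkpoint, then branch from it for each parameter value."""
    sim = ToySimulation(coupling=1.0)
    for _ in range(warmup_steps):
        sim.step()
    snapshot = sim.checkpoint()

    results = {}
    for value in couplings:
        sim.restore(snapshot)   # rewind to the checkpoint
        sim.coupling = value    # steer: change the parameter
        for _ in range(branch_steps):
            sim.step()
        results[value] = sim.state
    return results


if __name__ == "__main__":
    print(explore([0.5, 2.0], warmup_steps=10, branch_steps=4))
```

Branching from a shared checkpoint avoids repeating the warm-up for every parameter value, which is the saving that steering is meant to deliver.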


Reality Grid - Environment

• Computations run at HPCx, CSAR, SDSC, PSC and NCSA

• Visualisation run at Manchester, UCL, Argonne, NCSA, Phoenix

• Scientists steering calculations from UCL and Boston over Access Grid

• Visualisation output and collaborations multicast to Phoenix and visualised on the show floor in the University of Manchester booth


Visualisation servers

• Amphiphilic fluids produce exotic mesophases with a range of complex morphologies - need visualisation

• The complexity of these data sets (128 variables) makes visualisation a challenge

• Using the VTK library, with patches refreshing each time new data available

• Video stream multicast to Access Grid using FLXmitter library

• SGI OpenGL Vizserver used to allow remote control of visualisation

• Visualisation of billion node models requires 64-bit hardware and multiple rendering units

• Achieved visualisation of 1024^3 lattice using ray-tracing algorithm developed at University of Utah on 100 proc Altix on showroom floor at SC’03


Grid Software Infrastructure

• Various versions of Globus Toolkit: 2.2.3, 2.2.4, 2.4.3 and 3.1 (including GT2 compatibility bundles)
• Used GRAM, GridFTP and Globus-I/O - no incompatibilities
• Did not use MDS - robustness/utility of data
• 64-bit version of GT2 required for AIX (HPCx) system - some grief due to tendency to require custom-patched versions of third-party libraries
• Lot of system management effort required to work with/around toolkit
• Need a more scalable CA system that avoids every system administrator having to study everyone else’s certificates


TeraGyroid Network

[Figure: TeraGyroid network diagram. Sites: PSC, ANL, NCSA, Phoenix, Caltech, SDSC, UCL, Daresbury, Manchester. Exchange points: Starlight (Chicago), Netherlight (Amsterdam), BT provision. Networks: SJ4, MB-NG, production network. Legend: Visualization, Computation, Network PoP, Access Grid node, Service Registry, Dual-homed system. Link speeds: 10 Gbps, 2 x 1 Gbps.]


Networking

[Figure: data-flow diagram. Components: SimEng1 (UK), SimEng2 (PSC), VizEng2 (Phoenix), Disk1 (UK). Flows: steering, checkpoint files, vis data, storage. Transports: UDP (realtime), TCP (near-realtime), TCP (non-realtime).]


Networking

• On-line visualization requires O(1 Gbps) bandwidth for larger problem sizes

• Steering requires 100% reliable near-real time data transport across the Grid to visualization engines.

• Reliable transfer is achieved using TCP/IP, which acknowledges transferred packets in order to detect and repair loss. This slows transport, which limits data transfer rates and in turn limits LB3D steering of larger systems.

• Point-to-n-point transport for visualization, storage and job migration uses n times more bandwidth since unicast is used.
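To give a rough sense of scale for the bandwidth points above, the Python sketch below estimates how long one volume frame takes to cross a link and how unicast fan-out multiplies the load. It assumes a single scalar field stored as 4-byte values and an ideal link with no protocol overhead. Neither assumption comes from the slides, and the function names are illustrative only.

```python
def frame_bytes(side, bytes_per_value=4):
    """Size of one scalar field on a side**3 lattice.

    bytes_per_value=4 (single precision) is an assumption for illustration.
    """
    return side ** 3 * bytes_per_value


def transfer_seconds(n_bytes, link_gbps):
    """Ideal transfer time over a link of link_gbps gigabits per second (no overhead)."""
    return n_bytes * 8 / (link_gbps * 1e9)


def unicast_load_gbps(stream_gbps, receivers):
    """Sender-side bandwidth when the same stream is sent separately to each receiver."""
    return stream_gbps * receivers


if __name__ == "__main__":
    for side in (128, 256, 512, 1024):
        seconds = transfer_seconds(frame_bytes(side), link_gbps=1.0)
        print(f"{side}^3 frame at 1 Gbps: {seconds:.2f} s")
    print(f"1 Gbps stream to 3 receivers: {unicast_load_gbps(1.0, 3):.1f} Gbps")
```

Even under these optimistic assumptions a single 1024^3 frame takes tens of seconds at 1 Gbps, so interactive visualisation of the largest runs depends on sustained, dedicated bandwidth.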


What Was Done?

The TeraGyroid experiment represents the first use of collaborative, steerable, spawned and migrated processes based on capability computing.

– generated 2TB of data

– exploration of the multi-dimensional fluid coupling parameter space with 64^3 simulations accelerated through steering

– study of finite size periodic boundary condition effects, exploring the stability of the density of defects in the 64^3 simulations as they are scaled up to 128^3, 256^3, 512^3, 1024^3

– 100K to 1,000K time steps

– exploring the stability of the crystalline phases to perturbations and variations in effective surfactant temperature

• 128^3 and 256^3 simulations - clear of finite size effects

• Perfect crystal not formed in 128^3 systems - 600K steps

• Statistics of number of defects, velocity and lifetimes require large systems as these have sufficient defects


World’s Largest Lattice Boltzmann Simulation?

• 1024^3 lattice sites
• scale up 128^3 simulations with periodic tiling and perturbations for initial state
• Finite-size effect free dynamics
• 2048 processors
• 1.5 TB of memory
• 1 minute per time step on 2048 processors
• 3000 time steps
• 1.2 TB of visualisation data

Run on LeMieux at Pittsburgh SC
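The figures on this slide imply a few derived quantities worth spelling out. The Python sketch below works them out, taking "1 minute per time step" as an exact, constant rate and reading 1.5 TB as decimal terabytes. Both are simplifying assumptions, so the results are indicative only.

```python
# Figures quoted on the slide.
SITES = 1024 ** 3
PROCESSORS = 2048
MEMORY_TB = 1.5
SECONDS_PER_STEP = 60   # "1 minute per time step", treated as exact (assumption)
TIME_STEPS = 3000


def run_summary():
    """Derive wall-clock time and per-processor load from the quoted figures."""
    wall_hours = TIME_STEPS * SECONDS_PER_STEP / 3600
    return {
        "wall_hours": wall_hours,
        "processor_hours": wall_hours * PROCESSORS,
        "sites_per_processor": SITES // PROCESSORS,
        "site_updates_per_second": SITES / SECONDS_PER_STEP,
        # Decimal units assumed: 1 TB = 1000 GB.
        "memory_gb_per_processor": MEMORY_TB * 1000 / PROCESSORS,
    }


if __name__ == "__main__":
    for name, value in run_summary().items():
        print(f"{name}: {value:,.2f}")
```

On these assumptions the 3000-step run occupies the 2048 processors for about 50 hours of wall-clock time, with roughly half a million lattice sites per processor.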


[Figure: Access Grid screen at SC’03 during the SC Global session on Application Steering]


Measured Transatlantic Bandwidths during SC’03


Demonstrations/ Presentations

Demonstrations of the TeraGyroid experiment at SC’03:

– TeraGyroid on the PSC Booth: Tue 18, 10:00-11:00; Thu 20, 10:00-11:00
– RealityGrid and TeraGyroid on UK e-Science Booth: Tue 18, 16:00-16:30; Wed 19, 15:30-16:00
– RealityGrid during the SC'03 poster session: Tue 18, 17:00-19:00
– HPC-Challenge presentations: Wed 19, 10:30-12:00
– SC Global session on steering: Thu 20, 10:30-12:00

Demonstrations and real-time output at the University of Manchester and HPCx booths.


Most Innovative Data Intensive Application - SC 03


Project Objectives - How Well Did We Do? - 1

• world class computational science experiment
  – science analysis is ongoing - leading to new insights into properties of complex fluids at unprecedented scales
  – SC’03 award - ‘Most Innovative Data Intensive App’
• enhanced expertise/experience to benefit UK and USA
  – first transatlantic federation of major HEC facilities
  – applications need to be adaptable to different architectures
• inform construction/operation of national/international grids
  – most insight gained into end-to-end network integration, performance and dual-homed systems
  – remote visualisation, steering and checkpointing require high bandwidth which is dedicated and reservable
  – results fed directly into ESLEA proposal to exploit UKLight optical switched network infrastructure
• stimulate long-term strategic technical collaboration
  – strengthened relationships between Globus, networking and visualisation groups


Project Objectives - How Well Did We Do? - 2

• support long-term scientific collaborations
  – built on strong and fruitful existing scientific collaborations between researchers in UK and USA
• experiments with clear scientific deliverables
  – an explicit science plan was published, approved and then executed. Data analysis is ongoing.
• choice of applications to be based on community codes
  – experiences will be of benefit to other grid-based applications, in particular in the computational engineering community
• inform future programme of complementary experiments
  – Report to be made available on RG Website
  – EPSRC initiating another Call for Proposals - not targeting SC’04.


Lessons Learned

• How to support such projects - full peer review?
• Timescales were very tight - September - November
• Resource estimates need to be flexible
• Need complementary experiments for US and UK to reciprocate benefits
• HPC centres/e-science and networking groups can work very effectively together on challenging common goals
• Site configuration issues very important - network access
• Visualisation capabilities in UK need upgrading
• Scalable CA, dual address systems
• Network QoS very important for checkpointing, remote steering and visualisation

• Do it again?