HEA 11th November 1 Irish Centre for High-End Computing Bringing capability computing to the Irish research community Andy Shearer Director ICHEC



HEA 11th November 1

Irish Centre for High-End Computing

Bringing capability computing to the Irish research community

Andy Shearer
Director
ICHEC


Why ICHEC?

• Ireland lags well behind the rest of Europe in terms of installed HEC capacity
– See www.arcade-eu.info/academicsupercomputing/comparison.html
– Ireland now has an installed capacity of about 6000 Gflop/s
– Ireland has about 15-20 systems above 64 Gflop/s sustained


Why ICHEC?

• Ireland has fewer than 2500 processors
• But … now about 1500 kflop/s per capita
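As a rough check on the per-capita figure, the installed capacity quoted earlier (about 6000 Gflop/s) can be divided by the population. The sketch below assumes a population of roughly 4 million; that number and the function name are illustrative assumptions, not taken from the slides.

```python
# Rough per-capita check. The capacity is the slide's figure;
# the population (~4 million) is an assumed round number.
INSTALLED_GFLOPS = 6000      # installed capacity in Gflop/s (from the slides)
POPULATION = 4_000_000       # assumption, not from the slides


def per_capita_kflops(gflops: float, population: int) -> float:
    """Return installed capacity per person in kflop/s."""
    flops = gflops * 1e9               # Gflop/s -> flop/s
    return flops / population / 1e3    # flop/s per person -> kflop/s


print(f"{per_capita_kflops(INSTALLED_GFLOPS, POPULATION):.0f} kflop/s per capita")
```

With these inputs the result is consistent with the "about 1500" quoted on the slide.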


High-End Computing in Ireland

DFT+U spin density. Michael Nolan (1), Dean C. Sayle (2), Stephen C. Parker (3) and Graeme W. Watson (1)
(1) TCD, (2) Cranfield University, (3) University of Bath

Marine Modelling Centre, NUI, Galway

Lloyd D., et al., Dept. of Biochemistry, TCD

G. Murphy, et al., DIAS/CosmoGrid


Context and Motivation

• Ireland’s ability to perform internationally competitive research and to attract the best computational scientists is currently hindered by a lack of high-end computational resources

• Ireland has the research demands to justify a large-scale national facility

• The Irish Centre for High-End Computing (ICHEC)
– SFI-funded (€2.6M), in collaboration with the PRTLI project CosmoGrid (€700k), plus links to the TCD/IITAC cluster. CosmoGrid: Grid-Enabled Computational Physics of Natural Phenomena.
– Initially funded for one year, with a proposal to extend to a minimum of three years


Participating Institutions

National University of Ireland, Galway

Dublin City University

Dublin Institute for Advanced Studies

National University of Ireland, Maynooth

Trinity College Dublin

Tyndall Institute

University College Cork

University College Dublin

HEAnet


ICHEC’s objectives

• Support world-class research programmes within Ireland
• Undertake collaborative research programmes
• Provide access to HEC facilities to students and researchers in Ireland
• Provide HPC training
• Increase the application of HEC and GRID technologies
• Encourage and publicise activities conducive to establishing and sustaining a world-class HEC facility in Ireland
• Foster collaboration with similar internationally recognised organisations


ICHEC’s Roadmap

Phase I (2005) – examine the challenges associated with the management and operation of a distributed national HEC centre
→ Hire key personnel, set up infrastructure and policies, gather user requirements, …

Phase II (2006) – bring in the personnel required to support such a system and develop large-scale new science
→ Fully utilise resources, deploy points of presence, participate in community-led initiatives, explore prospects for technology transfer activities, …

Phase III (2007/8) – ICHEC fully established as a national large-scale facility


ICHEC Hardware Resources

We have a diverse user group so we require heterogeneous resources.

Our tender looked for

– Shared Memory System (20%)
– Cluster (60%)
– Shared File System (20%)

We were also open to imaginative solutions - as long as they were within our budget

Acceptable responses from Bull, IBM, Clustervision, HP, Sun.

Other vendors did not read the tender documents and/or deliver on time!

And of course, if we did this again we would do it differently, and I would hope the manufacturers would too.


Shared memory system

• Bull NovaScale NS6320
– 32 Itanium 2 @ 1.5 GHz
– 256 GB RAM
– 2.1 TB storage
– 192 Gflop/s peak
– 166 Gflop/s LINPACK

• Bull NovaScale NS4040
– front-end functionalities
– 4 Itanium 2 @ 1.5 GHz
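The peak figure for the NS6320 follows from processors × clock rate × floating-point operations per cycle. The sketch below reproduces it, assuming each Itanium 2 completes 4 floating-point operations per cycle (two fused multiply-add units); that per-cycle figure and the function names are assumptions, not stated on the slides.

```python
def peak_gflops(processors: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in Gflop/s: processors x GHz x flops per cycle."""
    return processors * clock_ghz * flops_per_cycle


def efficiency(measured_gflops: float, peak: float) -> float:
    """Fraction of theoretical peak achieved by a measured benchmark result."""
    return measured_gflops / peak


# 32 processors at 1.5 GHz; 4 flops/cycle is an assumption about the Itanium 2.
bull_peak = peak_gflops(32, 1.5, 4)
# 166 Gflop/s is the LINPACK figure quoted on the slide.
bull_efficiency = efficiency(166, bull_peak)

print(f"peak = {bull_peak:.0f} Gflop/s, LINPACK efficiency = {bull_efficiency:.1%}")
```

Under that assumption the computed peak matches the quoted 192 Gflop/s, and the LINPACK result corresponds to roughly 86% of peak.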


Distributed memory cluster and filesystem

• IBM cluster 1350
– 474 e326 servers
  • Dual Opterons @ 2.4 GHz each
– 2.2 TB distributed memory
  • Mix of (410) 4 GB and (64) 8 GB nodes
– 20 TB storage
  • SAN IBM DS4500 using GPFS
– 4.55 Tflop/s peak, 2.8 Tflop/s
  • Will feature in the upper end of the Top500
– 15 additional nodes
  • front-end functionalities and storage/cluster management
– Interconnect initially Gigabit Ethernet
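The cluster totals can be cross-checked from the node counts. This is a sketch under two assumptions that are not on the slides: each Opteron completes 2 floating-point operations per cycle, and the unlabelled 2.8 figure is a measured benchmark result.

```python
# Node mix from the slide: 410 nodes with 4 GB and 64 nodes with 8 GB.
NODE_MIX = {4: 410, 8: 64}           # memory per node (GB) -> node count
CPUS_PER_NODE = 2                    # "dual Opterons"
CLOCK_GHZ = 2.4
FLOPS_PER_CYCLE = 2                  # assumption about this Opteron generation

nodes = sum(NODE_MIX.values())
memory_gb = sum(gb * count for gb, count in NODE_MIX.items())
peak_gflops = nodes * CPUS_PER_NODE * CLOCK_GHZ * FLOPS_PER_CYCLE

# Assumes the slide's 2.8 Tflop/s is a measured (benchmark) figure.
measured_fraction = 2800 / peak_gflops

print(f"{nodes} nodes, {memory_gb} GB memory, "
      f"{peak_gflops / 1000:.2f} Tflop/s peak, "
      f"{measured_fraction:.0%} of peak measured")
```

The node mix gives 474 nodes and about 2.15 TB of memory, in line with the rounded 2.2 TB quoted, and the peak works out to about 4.55 Tflop/s.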


ICHEC Machine Room

• Installation on schedule
– Acceptance tests mainly completed…
– Transitional Service opened 1st September


Software

• Establishing user requirements – discussions with user groups

• Limited software budget
– Prices for National Centres higher than for academic research groups
– NB… not possible to purchase all packages on people’s wish list

• We had pressure to purchase many packages with overlapping functionalities – need to rationalise
– Preference given to packages with best scalability/performance
– Will work with research communities to reduce this list


Software (cont.)

• Currently shortlisted ~30 packages

• Current list
– Operating systems
  • SUSE Linux Enterprise 9, Bull Linux (based on Red Hat Enterprise)
– Programming environments
  • Intel compilers, Portland Group Cluster Development Kit, DDT
– Pre-/post-processing, data visualisation
  • IDL, Matlab, OpenDX, VTK
– Libraries and utilities
  • FFTW, ScaLAPACK, PETSc, (Par)METIS …


Software (cont.)

– Engineering software
  • CFX, Abaqus, Marc, Patran
– Computational chemistry / materials sciences / life sciences
  • Amber, (MPI-)BLAST, Charmm, CPMD, Crystal, DLPoly, Gamess UK, Gaussian03, Gromacs, MOE, Molpro, NAMD, NWChem, Siesta, Turbomole, VASP
– Environmental sciences
  • POM

• Expensive packages (Accelrys, etc.) not initially available
– defer decision until Phase 2 budget approved


Scheduling Policies

• Transitional Period, cluster
– Day regime: week days, 10am – 6pm
– Night regime: week days, 6pm – 10am
– Week-end regime: Friday 6pm – Monday 10am

• Tuning pending…
– Scalability studies of key applications
– User feedback

• Gaussian users expected to use the Production region / HiMem
– Allow acceptable turn-around time on the Bull for FEM users

Region                    Size  Job size   Mode        Max. runtime
Production region LoMem   640   16 to 512  Continuous  120 hours
Production region HiMem   128   8, 16      Continuous  120 hours
Development region        160   4, 8       Day         1 hour
                                           Night       8 hours
                                           Week-end    32 hours
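The day/night/week-end regimes above can be expressed as a small lookup. This is only a sketch of the stated time windows, not the scheduler's actual configuration; the function names are hypothetical, and treating each window as start-inclusive and end-exclusive is an assumption.

```python
from datetime import datetime

# Development-region limits on the cluster, in hours, from the table above.
DEV_MAX_RUNTIME_HOURS = {"day": 1, "night": 8, "weekend": 32}


def regime(when: datetime) -> str:
    """Classify a timestamp as 'day', 'night' or 'weekend'."""
    weekday = when.weekday()   # Monday == 0 ... Sunday == 6
    hour = when.hour
    # Week-end regime: Friday 6pm until Monday 10am.
    if (weekday in (5, 6)
            or (weekday == 4 and hour >= 18)
            or (weekday == 0 and hour < 10)):
        return "weekend"
    # Day regime: week days, 10am until 6pm.
    if 10 <= hour < 18:
        return "day"
    # Everything else on a week day falls in the night regime.
    return "night"


def dev_max_runtime_hours(when: datetime) -> int:
    """Maximum runtime of a development-region job submitted at `when`."""
    return DEV_MAX_RUNTIME_HOURS[regime(when)]


# 11 November 2005 was a Friday.
print(regime(datetime(2005, 11, 11, 12, 0)), dev_max_runtime_hours(datetime(2005, 11, 11, 19, 0)))
```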


Scheduling Policies

• Transitional Period, shared-memory system

– Day/night/week-end regimes: same as cluster

• Feedback from user community is important to help us tune policies for the start of the full National Service

• Need for checkpoint/re-start capability for long runs

• Fair share mechanism enforced by Maui

• Mechanism to “jump the queue” for important deadlines

Region               Job size  Mode        Max. runtime
Production region    4, 8, 16  Continuous  72 hours
Development region   4, 8      Day         1 hour
                               Night       8 hours
                               Week-end    24 hours
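Since runs longer than the maximum runtime must be split across several jobs, the checkpoint/re-start requirement can be illustrated with a toy example. This is a minimal sketch, not an ICHEC-provided facility: the file format (pickle), the function names and the step budget standing in for the wall-clock limit are all assumptions.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write the state atomically so an interrupted write cannot corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as handle:
        pickle.dump(state, handle)
    os.replace(tmp_path, path)   # atomic rename over the old checkpoint


def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists."""
    if os.path.exists(path):
        with open(path, "rb") as handle:
            return pickle.load(handle)
    return {"step": 0, "total": 0}


def run(path, n_steps, budget, every=10):
    """Advance a toy computation by at most `budget` steps, then stop.

    `budget` stands in for the queue's runtime limit; resubmitting the
    job with the same `path` resumes from the last saved step.
    """
    state = load_checkpoint(path)
    done = 0
    while state["step"] < n_steps and done < budget:
        state["total"] += state["step"]   # the "work" of one step
        state["step"] += 1
        done += 1
        if state["step"] % every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)          # final save before exiting
    return state
```

Calling `run` twice with the same checkpoint path, first with a small budget and then with a large one, gives the same final result as a single uninterrupted run.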


Supporting the research community

• ICHEC will be set up as a distributed national centre

• The Centre will be an equally important mix of Hardware and People– Large groups in Dublin, Galway and Cork

– Points of presence in each of the participating institutions (Phase 2)

– Support will be essential

• Plans are to recruit applications scientists in scientific areas:– engineering (FEM/CFD/LB)

– life sciences (bioinformatics, drug design)

– physics /chemistry (soft and hard condensed matter)

– environmental sciences (geology/geophysics, oceanography, meteorology/climatology)

– applied maths

– HPC


Supporting the research community (cont.)

• The face of the Centre will be the support scientists
– Serve their research community, rather than providing “free staff” to individual groups
  • Code development (community-led initiative)
  • Make the best use of the infrastructure (parallelise/optimise)
  • Development of specialised material (courses, documentation)
  • Technical support for “Grand Challenge” projects
  • Persistent communication channel with ICHEC

• Key to our success will be a close and sustained collaboration with the research community


Training activities

• Training is seen as a way of increasing performance and efficiency of the machines - our programme reflects the relative youth of Ireland’s HEC activity.

• ICHEC has developed courses - the first was delivered to UCD in October.
– An introduction to High-Performance Computing (HPC)
  • What is HPC, HPC as a tool for furthering research, etc.
  • Overview of current hardware architectures
  • Overview of common programming paradigms
  • Decomposition strategies
  • An introduction to the ICHEC national service
– Parallel programming with MPI: an introduction
– Parallel programming with OpenMP: an introduction

• Specialised material to be developed in 2006, e.g.
– HPC for engineers, HPC in bioinformatics, parallel linear algebra, etc.
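As one illustration of the kind of decomposition strategy an introductory course might cover, the sketch below splits a one-dimensional index range into contiguous blocks, one per process. The function name and the choice of a block (rather than cyclic) distribution are assumptions for illustration, not taken from the ICHEC course material.

```python
def block_range(n_items: int, n_ranks: int, rank: int) -> tuple[int, int]:
    """Return the half-open index range [start, stop) owned by `rank`.

    Items are split into contiguous blocks whose sizes differ by at
    most one; the first `n_items % n_ranks` ranks get the extra item.
    """
    base, extra = divmod(n_items, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


# Ten items over four processes: block sizes 3, 3, 2, 2.
for rank in range(4):
    print(rank, block_range(10, 4, rank))
```

In an MPI program each process would call such a function with its own rank to decide which part of the data it works on.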


Now …

• Service started on the 1st September (Bull)
– 21st September (IBM)

• Have 60 approved projects plus 17 from CosmoGrid
– 38% Physical Sciences
– 22% Engineering
– 25% Life Sciences
– 7% Environmental Sciences
– 7% Other (Maths, Computer Sciences, Humanities)

• At least three times as many applications will be submitted in 3-6 months

• Running at full capacity by the end of the year


Problems

• Getting the community up to speed on
– Batch submission
– Moving from serial to parallel computing

• Licences?
– National software agreements

• Firewalls and security
– Having a myriad of security and firewall standards for HEAnet is insanity
  • DCU and TCD do not accept zip files but have different ways around this
  • Proxy servers for ssh are different on each site
  • Grid access is a nightmare …
– A common approach via HEAnet would help

• The last mile problem
– Our access is fine, redundant Gigabit; but the universities throttle this back


The future … enhanced hardware

• Technology/Architecture?
– Massive clusters bring their own problems of reliability
– Large-scale SMP vs tightly bound clusters of ‘fat’ nodes
– Novel architectures - IBM’s BlueGene, FPGAs or others

• Software?
– Massive clusters bring massive problems
  • Scalability
  • Fault tolerance - how to do a checkpoint restart on 10000 nodes?

• Licences?
– How best to license software on a 10000-node cluster?

• How to benchmark?

In 2006/7 we will be looking for original solutions to these problems
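To see why reliability and checkpointing dominate at this scale: if node failures are independent with a constant rate, the mean time between failures (MTBF) of the whole machine is the per-node MTBF divided by the node count. The sketch below uses an assumed per-node MTBF of five years and an assumed six-minute checkpoint cost, neither of which comes from the slides, together with Young's well-known approximation for the checkpoint interval, sqrt(2 × checkpoint cost × MTBF).

```python
import math


def system_mtbf_hours(node_mtbf_hours: float, n_nodes: int) -> float:
    """MTBF of the whole machine, assuming independent node failures."""
    return node_mtbf_hours / n_nodes


def young_interval_hours(checkpoint_cost_hours: float, mtbf_hours: float) -> float:
    """Young's approximation for the time between checkpoints."""
    return math.sqrt(2 * checkpoint_cost_hours * mtbf_hours)


NODE_MTBF_HOURS = 5 * 365 * 24      # assumed: one failure per node every 5 years
CHECKPOINT_COST_HOURS = 0.1         # assumed: six minutes to write a checkpoint

mtbf = system_mtbf_hours(NODE_MTBF_HOURS, 10_000)
interval = young_interval_hours(CHECKPOINT_COST_HOURS, mtbf)

print(f"system MTBF = {mtbf:.2f} h, checkpoint every {interval:.2f} h")
```

Under these assumptions a 10000-node machine would see a failure every few hours, so a long run would spend a noticeable share of its time writing checkpoints.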