Post on 19-Dec-2015

HELICS

Petteri Johansson & Ilkka Uuhiniemi

HELICS

• COW (cluster of workstations)
– AMD Athlon MP 1.4 GHz
– 512 processors (2 in the same computing node)
– No. 35 at top500.org
– Linpack benchmark: 825 Gflops
– COTS -> 1.3M euros

HELICS

• 256 GBytes ECC RAM

• 10 TB local disks

• Myrinet 2000 (fiber)

• 6 switches (128 port)

• Ethernet

• Peak performance 512*2.8GFlops = 1.43TFlops
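The peak figure, and how it compares with the Linpack result quoted on the first slide, can be checked with a few lines of arithmetic. This is a minimal sketch: the slides only give 2.8 GFlops per processor, so reading it as 2 floating-point operations per cycle at 1.4 GHz is an assumption, and all variable names are illustrative.

```python
# Figures taken from the slides; variable names are illustrative.
processors = 512
clock_ghz = 1.4
flops_per_cycle = 2        # assumption: 2 FP operations per cycle per CPU
linpack_gflops = 825.0     # measured Linpack result from the slides

peak_per_cpu_gflops = clock_ghz * flops_per_cycle   # 2.8 GFlops per CPU
peak_gflops = processors * peak_per_cpu_gflops      # whole-cluster peak
efficiency = linpack_gflops / peak_gflops           # fraction of peak reached

print(f"Peak: {peak_gflops / 1000:.2f} TFlops")
print(f"Linpack efficiency: {efficiency:.1%}")
```

Under these assumptions the cluster reaches a little under 58% of its theoretical peak on Linpack.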

Interconnections

– Myrinet 2000
– 10 ns latency (one way)
– 2+2 Gbit/s full-duplex bandwidth
– bisection bandwidth: 128 x (2+2) Gbit/s
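The bisection figure can be worked out explicitly. The sketch below assumes that the 128 in the slide counts the links crossing the bisection and that each link carries 2 Gbit/s in each direction; the variable names are illustrative.

```python
# Values from the slide; the interpretation of "128" is an assumption.
links_across_bisection = 128
gbit_per_direction = 2                      # Myrinet 2000, each direction

full_duplex_gbit = 2 * gbit_per_direction   # 2+2 Gbit/s per link
bisection_gbit = links_across_bisection * full_duplex_gbit
bisection_gbyte = bisection_gbit / 8        # 8 bits per byte

print(f"Bisection bandwidth: {bisection_gbit} Gbit/s "
      f"({bisection_gbyte:.0f} GByte/s)")
```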

Additional equipment

• 32 Double node Myrinet cluster for interactive development

• 2 front-end PCs as access, compilation, and job distribution hosts

• 1 administration server

• 1 file server (Sun Fire 880) + 2 TByte RAID 5 disk array

• 10 TByte tape backup

• Remote power control device

Problems

• Hardware errors: 3 power supplies, 3 hard disks, 2 motherboards, 8 Myrinet network cards

• Software: kernel 2.4.18 (stable); 2 nodes crashed because of daemon crashes

Clustering

• What is needed?
– Booting concept
  • Network boot (DHCP)
– Cluster installation
  • Installation via network
– Power control
  • Remote access to power supplies, sequential power off/on, reset
– BIOS control
  • Update and settings via network, direct access via serial link
– Health control of nodes
  • Fan speed, CPU temperature, and disk status gathering via network
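As an illustration of the health-control item, the sketch below shows how readings gathered from the nodes might be screened for problems. It is not the tooling used on HELICS: the `NodeReading` record, the threshold values, and the node names are all hypothetical, and the step that actually collects the readings over the network is left out.

```python
from dataclasses import dataclass

@dataclass
class NodeReading:
    """One health sample from a node (hypothetical record layout)."""
    node: str
    fan_rpm: int
    cpu_temp_c: float
    disk_ok: bool

# Hypothetical limits; real values depend on the hardware.
MIN_FAN_RPM = 2000
MAX_CPU_TEMP_C = 70.0

def find_problems(reading):
    """Return a list of problem descriptions for one reading."""
    problems = []
    if reading.fan_rpm < MIN_FAN_RPM:
        problems.append("fan too slow")
    if reading.cpu_temp_c > MAX_CPU_TEMP_C:
        problems.append("CPU too hot")
    if not reading.disk_ok:
        problems.append("disk fault")
    return problems

def unhealthy_nodes(readings):
    """Map node name -> problems, listing only nodes that have any."""
    report = {}
    for reading in readings:
        problems = find_problems(reading)
        if problems:
            report[reading.node] = problems
    return report

if __name__ == "__main__":
    sample = [
        NodeReading("node001", 4200, 48.0, True),
        NodeReading("node002", 900, 81.5, True),
        NodeReading("node003", 4100, 50.0, False),
    ]
    for node, problems in unhealthy_nodes(sample).items():
        print(node, ", ".join(problems))
```

Keeping the screening separate from the data gathering means the same check can be fed from any source, for example a serial link or a network daemon.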

Clustering

• Reliability of resources
– spare hosts, redundant servers

• Availability

• Monitoring & accounting
– gathering system and job status and accounting information via network

• Batching concepts
– SCore cluster software

Clustering

• Application optimization
– tracing and profiling tools (Vampir, Paraver)

• Debugging of parallel applications
– debuggers: TotalView, P2D2, PGI

Software

• SCore Cluster System Software is a high-performance parallel programming environment for workstation and PC clusters

SCORE

• Heterogeneous Programming Language

• Multiple Programming Paradigms

• Parallel Programming Support
– Real-time process activity monitor
– Deadlock detection
– Automatic debugger attachment

SCORE

• Fault tolerance
– Preemptive checkpoint
– Parallel process migration

• Flexible Job Scheduling
– Gang scheduling
– Batch scheduling
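Gang scheduling means that all processes of a parallel job run in the same time slice across the nodes, so that they can communicate without waiting for a descheduled peer. The sketch below illustrates the general idea only and is not SCore's implementation: jobs are packed first-fit into time slots, and the slots are then run round-robin. The function names and the job list are hypothetical.

```python
def build_slots(jobs, total_nodes):
    """Pack (name, node_count) jobs into time slots, first-fit.

    Jobs sharing a slot run side by side on disjoint nodes.
    """
    slots = []
    for name, width in jobs:
        if width > total_nodes:
            raise ValueError(f"job {name} needs more nodes than exist")
        for slot in slots:
            used = sum(w for _, w in slot)
            if used + width <= total_nodes:
                slot.append((name, width))
                break
        else:
            # No existing slot had room: open a new time slot.
            slots.append([(name, width)])
    return slots

def timeline(slots, ticks):
    """Round-robin over the slots; each tick runs one whole slot."""
    return [[name for name, _ in slots[t % len(slots)]]
            for t in range(ticks)]

if __name__ == "__main__":
    jobs = [("A", 6), ("B", 4), ("C", 2)]   # hypothetical job mix
    slots = build_slots(jobs, total_nodes=8)
    for tick, running in enumerate(timeline(slots, 4)):
        print(f"tick {tick}: {running}")
```

With 8 nodes, jobs A and C share one slot and job B gets the other, so the timeline alternates between the two slots.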

USAGE

• Reactive flows

• Optimization problems

• Technical simulations

• Image processing

• Bio-computing/Bioinformatics
