HELICS Petteri Johansson & Ilkka Uuhiniemi

Post on 19-Dec-2015

Page 1: HELICS Petteri Johansson & Ilkka Uuhiniemi. HELICS COW –AMD Athlon MP 1.4Ghz –512 (2 in same computing node) –35 at top500.org –Linpack Benchmark 825

HELICS

Petteri Johansson & Ilkka Uuhiniemi

HELICS

• COW (cluster of workstations)
– AMD Athlon MP 1.4 GHz
– 512 processors (2 per computing node)
– Rank 35 at top500.org
– Linpack benchmark: 825 GFlops
– COTS (commodity off-the-shelf) hardware -> 1.3M euros

HELICS

• 256 GBytes ECC RAM

• 10 TB local disks

• Myrinet 2000 (fiber)

• 6 switches (128 port)

• Ethernet

• Peak performance 512*2.8GFlops = 1.43TFlops
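
As a quick check of the arithmetic above, the following sketch recomputes the peak figure and compares it with the 825 GFlops Linpack result quoted on the earlier slide. The factor of 2 floating-point operations per clock cycle is an assumption, chosen so that 1.4 GHz yields the slide's 2.8 GFlops per processor; the function and constant names are illustrative only.

```python
# Peak-performance check for the figures quoted on the slides.
CPUS = 512               # processors in the cluster (from the slides)
CLOCK_GHZ = 1.4          # AMD Athlon MP clock rate (from the slides)
FLOPS_PER_CYCLE = 2      # assumption: gives the slide's 2.8 GFlops per CPU
LINPACK_GFLOPS = 825.0   # measured Linpack result (from the slides)


def peak_gflops(cpus: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in GFlops: CPUs x clock x flops per cycle."""
    return cpus * clock_ghz * flops_per_cycle


def linpack_efficiency(measured_gflops: float, peak: float) -> float:
    """Fraction of the theoretical peak reached by the benchmark."""
    return measured_gflops / peak


if __name__ == "__main__":
    peak = peak_gflops(CPUS, CLOCK_GHZ, FLOPS_PER_CYCLE)
    eff = linpack_efficiency(LINPACK_GFLOPS, peak)
    print(f"Peak: {peak / 1000:.2f} TFlops")
    print(f"Linpack efficiency: {eff:.1%}")
```

Under these assumptions the Linpack run reaches a little over half of the theoretical peak.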

Interconnections

• Myrinet 2000
– 10 µs latency (one way)
– 2+2 Gbit/s full-duplex bandwidth
– Bisection bandwidth: 128 x (2+2) Gbit/s
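
The bisection figure can be multiplied out with a small sketch. It assumes that "2+2" means 2 Gbit/s in each direction of a full-duplex link and that 128 such links cross the bisection; the names are illustrative.

```python
# Aggregate bisection bandwidth from the per-link figures on the slide.
LINKS_ACROSS_BISECTION = 128   # from the slide
GBIT_PER_DIRECTION = 2         # assumption: "2+2" = 2 Gbit/s each way


def bisection_bandwidth_gbit(links: int, gbit_per_direction: int) -> int:
    """Total Gbit/s across the bisection, counting both directions."""
    return links * 2 * gbit_per_direction


def gbit_to_gbyte(gbit: float) -> float:
    """Convert Gbit/s to GByte/s (8 bits per byte)."""
    return gbit / 8


if __name__ == "__main__":
    total = bisection_bandwidth_gbit(LINKS_ACROSS_BISECTION, GBIT_PER_DIRECTION)
    print(f"Bisection bandwidth: {total} Gbit/s = {gbit_to_gbyte(total):.0f} GByte/s")
```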

Additional equipment

• 32 Double node Myrinet cluster for interactive development

• 2 front-end PCs as access, compilation, and job distribution hosts

• 1 administration server

• 1 file server (Sun Fire 880) + 2 TByte RAID 5 disk array

• 10 TByte tape backup

• Remote power control device

Problems

• Hardware errors: 3 power supplies, 3 hard disks, 2 motherboards, 8 Myrinet network cards

• Software: kernel 2.4.18 (stable); 2 node crashes caused by daemon crashes

Clustering

• What is needed?
– Booting concept
  • Network boot (DHCP)
– Cluster installation
  • Installation via network
– Power control
  • Remote access to power supplies, sequential power off/on, reset
– BIOS control
  • Updates and settings via network, direct access via serial link
– Health monitoring of nodes
  • Fan speed, CPU temperature, and disk status gathered via network
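
To make the health-monitoring idea concrete, here is a minimal sketch of how readings gathered over the network might be checked against limits. Everything in it is hypothetical: the `NodeReading` fields, the host names, and the threshold values are assumptions for illustration and do not describe the tooling actually used on HELICS.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical limits; real values depend on the hardware.
MIN_FAN_RPM = 2000
MAX_CPU_TEMP_C = 70.0


@dataclass
class NodeReading:
    """One health sample collected from a node (illustrative fields)."""
    host: str
    fan_rpm: int
    cpu_temp_c: float
    disk_ok: bool


def problems(reading: NodeReading) -> List[str]:
    """Return the list of limit violations for a single reading."""
    issues = []
    if reading.fan_rpm < MIN_FAN_RPM:
        issues.append("fan too slow")
    if reading.cpu_temp_c > MAX_CPU_TEMP_C:
        issues.append("CPU too hot")
    if not reading.disk_ok:
        issues.append("disk failure reported")
    return issues


def unhealthy_nodes(readings: List[NodeReading]) -> Dict[str, List[str]]:
    """Map each host with at least one violation to its issues."""
    report = {}
    for reading in readings:
        issues = problems(reading)
        if issues:
            report[reading.host] = issues
    return report


if __name__ == "__main__":
    samples = [
        NodeReading("node001", 4200, 55.0, True),
        NodeReading("node002", 0, 82.5, True),
    ]
    for host, issues in unhealthy_nodes(samples).items():
        print(host, "->", ", ".join(issues))
```

A report like this could feed the remote power control listed above, for example to power off a node whose fan has stopped.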

Clustering

• Reliability of resources
– Spare hosts, redundant servers

• Availability

• Monitoring & accounting
– Gathering system and job status and accounting information via network

• Batching concepts
– SCore cluster software

Clustering

• Application optimization
– Tracing and profiling tools (Vampir, Paraver)

• Debugging of parallel applications
– Debuggers: TotalView, P2D2, PGI

Software

• SCore Cluster System Software is a high-performance parallel programming environment for workstation and PC clusters

SCORE

• Heterogeneous Programming Language

• Multiple Programming Paradigms

• Parallel Programming Support
– Real-time process activity monitor
– Deadlock detection
– Automatic debugger attachment

SCORE

• Fault tolerance
– Preemptive checkpoint
– Parallel process migration

• Flexible Job Scheduling
– Gang scheduling
– Batch scheduling
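
Gang scheduling means that all processes of a parallel job are run in the same time slice across the nodes, so communicating processes are not left waiting for a descheduled peer. The sketch below illustrates the idea with a simple round-robin over jobs. It is a deliberately simplified model, not SCore's scheduler: it assumes every job spans all nodes, and the function and job names are hypothetical.

```python
from typing import Dict, List


def gang_schedule(jobs: List[str], nodes: List[str],
                  slices: int) -> List[Dict[str, str]]:
    """Return, for each time slice, the job that every node runs.

    All nodes run the same job within a slice (the "gang"), and the
    jobs take turns in round-robin order from one slice to the next.
    """
    if not jobs:
        return []
    timeline = []
    for t in range(slices):
        job = jobs[t % len(jobs)]            # job that owns this slice
        timeline.append({node: job for node in nodes})
    return timeline


if __name__ == "__main__":
    plan = gang_schedule(["jobA", "jobB"], ["n0", "n1", "n2"], slices=4)
    for t, assignment in enumerate(plan):
        print(f"slice {t}: {assignment}")
```

Batch scheduling, by contrast, would run each job to completion before starting the next one on the same nodes.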

USAGE

• Reactive flows

• Optimization problems

• Technical simulations

• Image processing

• Bio-computing/Bioinformatics