Rocks Clusters SUN HPC Consortium November 2004 Federico D. Sacerdoti Advanced CyberInfrastructure Group San Diego Supercomputer Center


Rocks Clusters

SUN HPC Consortium

November 2004

Federico D. Sacerdoti

Advanced CyberInfrastructure Group

San Diego Supercomputer Center


Copyright © 2004 F. Sacerdoti, M. Katz, G. Bruno, P. Papadopoulos, UC Regents

Outline

• Rocks Identity
• Rocks Mission
• Why Rocks
• Rocks Design
• Rocks Technologies, Services, Capabilities
• Rockstar


Rocks Identity

• System to build and manage Linux Clusters
  General Linux maintenance system for N nodes
  Desktops too
  Happens to be good for clusters

• Free

• Mature

• High Performance
  Designed for scientific workloads


Rocks Mission

• Make Clusters Easy (Papadopoulos, 00)

• Most cluster projects assume a sysadmin will help build the cluster.

• Build a cluster without assuming CS knowledge

  Simple idea, complex ramifications
  Automatic configuration of all components and services
  ~30 services on frontend, ~10 services on compute nodes
  Clusters for Scientists

• Results in a very robust system that is insulated from human mistakes


Why Rocks

• Easiest way to build a Rockstar-class machine with SGE ready out of the box

• More supported architectures
  Pentium, Athlon, Opteron, Nocona, Itanium

• More happy users
  280 registered clusters, 700 member support list
  HPCwire Readers Choice Awards 2004

• More configured HPC software: 15 optional extensions (rolls) and counting.

• Unmatched Release Quality.


Why Rocks

• Big projects use Rocks
  BIRN (20 clusters)
  GEON (20 clusters)
  NBCR (6 clusters)

• Supports different clustering toolkits
  Rocks Standard (RedHat HPC)
  SCE
  SCore (Single Process Space)
  OpenMosix (Single Process Space: on the way)


Rocks Design

• Uses RedHat’s intelligent installer
  Leverages RedHat’s ability to discover & configure hardware
  Everyone tries System Imaging at first
  Who has homogeneous hardware? If so, whose cluster stays that way?

• Description Based install: Kickstart
  Like Jumpstart

• Contains a viable Operating System
  No need to “pre-configure” an OS


Rocks Design

• No special “Rocksified” package structure. Can install any RPM.

• Where Linux core packages come from:
  RedHat Advanced Workstation (from SRPMS)
  Enterprise Linux 3


Rocks Leap of Faith

• Install is the primitive operation for Upgrade and Patch
  Seems wrong at first
  Why must you reinstall the whole thing?
  Actually right: debugging a Linux system is fruitless at this scale. Reinstall enforces stability.
  Primary user has no sysadmin to help troubleshoot

• Rocks install is scalable and fast: 15 min for entire cluster
  Post script work done in parallel by compute nodes.

• Power Admins may use up2date or yum for patches.
  Patches reach compute nodes by reinstall.


Rocks Technology


Cluster Integration with Rocks

1. Build a frontend node
   1. Insert CDs: Base, HPC, Kernel, optional Rolls
   2. Answer install screens: network, timezone, password

2. Build compute nodes
   1. Run insert-ethers on frontend (dhcpd listener)
   2. PXE boot compute nodes in name order

3. Start Computing
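
As a rough illustration of why step 2 asks for nodes to be booted "in name order", the sketch below hands out names in the order new MAC addresses are first seen by a DHCP listener. The compute-<rack>-<rank> naming pattern, the function name assign_names, and the MAC addresses are assumptions made for this example; it is not the insert-ethers source.

```python
def assign_names(macs_in_boot_order, rack=0, basename="compute"):
    """Map each newly seen MAC address to the next free name in the rack.

    Hypothetical helper: names follow an assumed <basename>-<rack>-<rank>
    pattern, with rank counting nodes in the order they first boot.
    """
    names = {}  # dicts keep insertion order, so this also records boot order
    for mac in macs_in_boot_order:
        mac = mac.lower()
        if mac in names:
            # A repeated DHCP request from a node that already has a name.
            continue
        names[mac] = f"{basename}-{rack}-{len(names)}"
    return names


if __name__ == "__main__":
    # Made-up MAC addresses; the second node retries its DHCP request.
    seen = ["00:11:22:33:44:01", "00:11:22:33:44:02", "00:11:22:33:44:02"]
    for mac, name in assign_names(seen).items():
        print(mac, name)
```

Because the rank depends only on arrival order, powering nodes on from the top of the rack down would give names that match physical position under these assumptions.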


Rocks Tech: Dynamic Kickstart File

On node install


Rocks Roll Architecture

• Rolls are Rocks Modules
  Think Apache

• Software for cluster
  Packaged 3rd party tarballs
  Tested
  Automatically configured services

• RPMS plus Kickstart graph in ISO form.


Rocks Tech: Dynamic Kickstart File

With Roll (HPC)

HPCbase
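
The idea of a kickstart file generated from a graph, and of a Roll extending that graph, can be sketched as below. This is a minimal model under stated assumptions, not the Rocks generator: the graph is a plain dictionary, and the node names ("compute", "base", "hpc-base"), the package names, and the functions reachable, packages_for and add_roll are all hypothetical.

```python
def reachable(graph, start):
    """Depth-first walk returning node names in first-visit order."""
    seen, order, stack = set(), [], [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push children reversed so they are visited in their listed order.
        stack.extend(reversed(graph.get(node, [])))
    return order


def packages_for(appliance, graph, node_packages):
    """Collect the packages of every graph node reachable from an appliance."""
    packages = []
    for node in reachable(graph, appliance):
        for package in node_packages.get(node, []):
            if package not in packages:
                packages.append(package)
    return packages


def add_roll(graph, roll_edges):
    """Return a copy of the graph with a Roll's (parent, child) edges added."""
    merged = {node: list(children) for node, children in graph.items()}
    for parent, child in roll_edges:
        merged.setdefault(parent, []).append(child)
        merged.setdefault(child, [])
    return merged


if __name__ == "__main__":
    # Illustrative data only; real graphs and package sets differ.
    graph = {"compute": ["base"], "base": []}
    node_packages = {"base": ["openssh-server"], "hpc-base": ["mpich"]}
    print(packages_for("compute", graph, node_packages))
    with_hpc = add_roll(graph, [("compute", "hpc-base")])
    print(packages_for("compute", with_hpc, node_packages))
```

In this model a Roll never edits existing nodes; it only attaches new nodes and edges, so the same traversal yields a larger package set once the Roll is present.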


Rocks Tech: Wide Area Net Install

Install a frontend without CDs

Benefits

• Can install from minimal boot image

• Rolls downloaded dynamically

• Community can build specific extensions


Rocks Tech: Security & Encryption

To protect the kickstart file


Rocks Tech: 411 Information Service

• 411 does NIS
  Distribute passwords

• File based, simple
  HTTP transport
  Multicast

• Scalable

• Secure


Rocks Services


Rocks Cluster Homepage


Rocks Services: Ganglia Monitoring


Rocks Services: Job Monitoring

SGE Batch System


Rocks Services: Job Monitoring

How a job affects resources on this node


Rocks Services: Configured, Ready

• Grid (Globus, from NMI)

• Condor (NMI)
  Globus GRAM

• SGE
  Globus GRAM

• MPD parallel job launcher (Argonne)
  MPICH 1, 2

• Intel Compiler set

• PVFS


Rocks Capabilities


High Performance Interconnect Support

• Myrinet
  All major versions, GM2
  Automatic configuration and support in Rocks since first release

• Infiniband
  Via Collaboration with AMD & Infinicon
  IB
  IPoIB


Rocks Visualization “Viz” Wall

• Enables LCD Clusters
  One PC / tile
  Gigabit Ethernet
  Tile Frame

• Applications
  Large remote sensing
  Volume Rendering
  Seismic Interpretation

• Electronic Visualization Lab
  Bio-Informatics
  Bio-Imaging (NCMIR BioWall)


Rockstar


Rockstar Cluster

• Collaboration between SDSC and SUN

• 129 Nodes: Sun V60x (Dual P4 Xeon)
  Gigabit Ethernet Networking (copper)
  Top500 list positions: 201, 433

• Built on showroom floor of Supercomputing Conference 2003
  Racked, Wired, Installed: 2 hrs total
  Running apps through SGE


Building of Rockstar



Rockstar Topology

• 24-port switches

• Not a symmetric network
  Best case - 4:1 bisection bandwidth
  Worst case - 8:1
  Average - 5.3:1

• Linpack achieved 49% of peak

• Very close to percentage peak of 1st generation DataStar at SDSC


Rocks Future Work

• High Availability: N Frontend nodes.
  Not that far off (supplemental install server design)
  Limited by Batch System
  Frontends are long lived in practice: Keck 2 Cluster (UCSD) uptime: 249 days, 2:56

• Extreme install scaling

• More Rolls!

• Refinements


www.rocksclusters.org

• Rocks mailing List https://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion

• Rocks Cluster Register http://www.rocksclusters.org/rocks-register

• Core: {fds,bruno,mjk,phil}@sdsc.edu