Research Computing @ University of Colorado Boulder
Supercomputing at CU Boulder with Janus
Peter Ruprecht
Senior HPC Analyst
CU Research Computing Group
www.rc.colorado.edu
1 03/12/13 24x7 event
How Janus came to be
• National Science Foundation grant to CU/NCAR
group.
• CU provided matching funds and a data center.
• June 2010 – assembled and tested by Dell in
Austin. Number 31 on the Top-500 list.
• Installed in Boulder in Fall 2010.
• Currently operated by CU-Boulder Research
Computing Group.
• Now 164th-fastest in the world.
Vital statistics
• 152 teraflops (trillion floating-point operations per second) sustained.
• 1368 nodes in 17 racks. 12 cores per node => 16416 cores.
• 40 Gb/s Infiniband network interconnect.
• 800 TB of high-speed scratch disk space.
CPU performance details
• Dell C6100 chassis holds 4 nodes.
• 2 CPU sockets per node; Intel “Westmere”
processors at 2.8 GHz; 6 cores per socket.
• 24 GB RAM per node; 2 GB per core.
• Diskless – OS image loaded into RAM.
• RedHat Enterprise Linux.
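The node and core counts on these slides can be cross-checked with a little arithmetic. The sketch below uses only the figures quoted in the talk, plus one assumption that is not on the slides: that each Westmere core retires 4 double-precision floating-point operations per clock cycle. Under that assumption the theoretical peak comes out near 184 teraflops, so the 152 teraflops sustained figure would be roughly 83% of peak.

```python
# Figures quoted on the slides.
NODES = 1368
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 6
CLOCK_HZ = 2.8e9
RAM_PER_NODE_GB = 24
NODES_PER_CHASSIS = 4
SUSTAINED_TFLOPS = 152

# Assumption (not stated on the slides): 4 double-precision
# floating-point operations per core per clock cycle.
FLOPS_PER_CYCLE = 4


def cluster_summary():
    """Derive per-node and whole-machine figures from the slide numbers."""
    cores_per_node = SOCKETS_PER_NODE * CORES_PER_SOCKET
    total_cores = NODES * cores_per_node
    peak_tflops = total_cores * CLOCK_HZ * FLOPS_PER_CYCLE / 1e12
    return {
        "cores_per_node": cores_per_node,
        "total_cores": total_cores,
        "ram_per_core_gb": RAM_PER_NODE_GB / cores_per_node,
        "chassis": NODES // NODES_PER_CHASSIS,
        "peak_tflops": peak_tflops,
        "sustained_fraction": SUSTAINED_TFLOPS / peak_tflops,
    }


if __name__ == "__main__":
    for name, value in cluster_summary().items():
        print(f"{name}: {value}")
```

The same numbers confirm the slide figures of 12 cores per node, 16416 cores in total and 2 GB of RAM per core.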
Network interconnect details
• Key to parallel performance!
• Quad-Data-Rate Infiniband (“40” Gb/s).
• Node-to-node latency of about 1-3 microseconds.
• Nonblocking – Fat Tree topology.
• Message Passing Interface (MPI) with Remote
Direct Memory Access (RDMA).
• Bulky copper cabling to each node is a challenge
in terms of space and airflow.
• 3 racks of core Infiniband switches, plus 4-5U of
distribution switches per compute rack.
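The quotation marks around “40” hint that the headline rate is a signaling rate rather than usable bandwidth. The sketch below is a rough latency-plus-bandwidth model, not a measurement of Janus. It assumes the 8b/10b line encoding used by QDR Infiniband (not stated on the slide), takes 2 microseconds as a mid-range latency from the 1-3 microsecond figure, and ignores protocol overhead. The function names are illustrative.

```python
SIGNALING_GBPS = 40.0          # QDR 4x signaling rate, from the slide
ENCODING_EFFICIENCY = 8 / 10   # assumption: 8b/10b line encoding
DEFAULT_LATENCY_S = 2e-6       # assumption: middle of the 1-3 microsecond range


def data_rate_gbps():
    """Usable data rate after line-encoding overhead, in Gb/s."""
    return SIGNALING_GBPS * ENCODING_EFFICIENCY


def transfer_time_s(message_bytes, latency_s=DEFAULT_LATENCY_S):
    """Simple model: fixed latency plus message size over bandwidth."""
    bandwidth_bytes_per_s = data_rate_gbps() * 1e9 / 8
    return latency_s + message_bytes / bandwidth_bytes_per_s


if __name__ == "__main__":
    print(f"data rate: {data_rate_gbps():.1f} Gb/s")
    for size in (8, 1024, 1024 * 1024):
        micros = transfer_time_s(size) * 1e6
        print(f"{size:>8} bytes: {micros:.3f} microseconds")
```

In this model an 8-byte message costs almost exactly the latency, while a 1 MiB message is dominated by bandwidth, which is one reason low latency matters so much for tightly coupled parallel codes.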
Storage details
• Lustre parallel filesystem.
• Main storage is about 600 x 2 TB SATA drives; 800 TB total usable.
• 12-15 GB/s total throughput.
• Connected to nodes via Infiniband network.
• 2 racks DDN ExaScaler.
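The storage figures above imply a few derived numbers that are easy to work out. The sketch below uses only the slide values and assumes decimal units (1 TB = 1000 GB). The drive count is approximate on the slide, and the slides do not say how the gap between raw and usable capacity is spent, so the results are ballpark figures only.

```python
# Figures quoted on the slide (the drive count is approximate).
DRIVES = 600
DRIVE_TB = 2
USABLE_TB = 800
THROUGHPUT_GB_PER_S = (12, 15)   # low and high ends of the quoted range


def storage_summary():
    """Derive raw capacity, usable fraction and per-drive throughput."""
    raw_tb = DRIVES * DRIVE_TB
    low, high = THROUGHPUT_GB_PER_S
    return {
        "raw_tb": raw_tb,
        "usable_fraction": USABLE_TB / raw_tb,
        # Aggregate throughput spread evenly over all drives, in MB/s.
        "per_drive_mb_per_s": (low * 1000 / DRIVES, high * 1000 / DRIVES),
        # Hours needed to stream the whole usable capacity once.
        "full_read_hours": (
            USABLE_TB * 1000 / low / 3600,
            USABLE_TB * 1000 / high / 3600,
        ),
    }


if __name__ == "__main__":
    for name, value in storage_summary().items():
        print(f"{name}: {value}")
```

On these numbers about two thirds of the raw capacity is usable, and streaming the entire scratch filesystem once would take on the order of 15 to 19 hours.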
Network infrastructure
[Diagram: Research Computing / JANUS SuperComputer network infrastructure. Created by Conan Moore, Senior Network Engineer, Office of Information Technology & Research Computing, University of Colorado at Boulder.]
• Link types: campus network (1g), research network backbone (10g), Infiniband fabric (QDR), management/IPMI (1g).
• Devices shown: campus switch, campus router, RC switch/router (Arista), four MDS 9124 units, management stack (Dell), IB leaf switches (Mellanox), IB core switches (Mellanox), monitoring, compute nodes.
• Compute node groups: nodes100-180 through nodes1700-1780, in steps of 100.
• Other labels: Internet, MDF, CINC, CONTAINER, TR193.
Jobs and scheduling
• 1150 total users; 200 active users.
• 20,000 – 50,000 jobs per month.
• 10M – 11M core hours delivered per month.
• Prefer “short/wide” jobs (short run time, many cores).
• Resource manager: Torque.
• Batch scheduler: Moab.
• Normally achieve 80-94% efficiency.
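The core-hour and efficiency figures can be related to the machine size. The sketch below assumes a 30-day month, the 16416 cores quoted earlier, and that "efficiency" means delivered core hours divided by available core hours; the slide does not define the term, so this is one plausible reading rather than the group's actual metric.

```python
TOTAL_CORES = 16416   # from the vital statistics slide
HOURS_PER_DAY = 24


def available_core_hours(days=30):
    """Core hours the whole machine could deliver in the given period."""
    return TOTAL_CORES * HOURS_PER_DAY * days


def utilization(delivered_core_hours, days=30):
    """Delivered core hours as a fraction of what was available."""
    return delivered_core_hours / available_core_hours(days)


def mean_core_hours_per_job(delivered_core_hours, jobs):
    """Average job size in core hours."""
    return delivered_core_hours / jobs


if __name__ == "__main__":
    print(f"available per 30-day month: {available_core_hours():,} core hours")
    for delivered in (10e6, 11e6):
        print(f"{delivered:,.0f} delivered: {utilization(delivered):.1%}")
    print(f"smallest average job: {mean_core_hours_per_job(10e6, 50000):.0f} core hours")
    print(f"largest average job: {mean_core_hours_per_job(11e6, 20000):.0f} core hours")
```

Under these assumptions 10M to 11M delivered core hours corresponds to roughly 85% to 93% of the machine, which sits inside the quoted 80-94% range.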
Research on Janus
• Molecular dynamics and molecular biology
• High energy physics
• Genetics/Genomics
• Materials science and engineering
• Astrophysics
• Architecture and design
• Environmental science
• Atmospheric physics and weather
• Economics
• Computer science
• Etc, etc, etc!
Research: Fluid dynamics
Computation and animation by: Greg Salvesen, JILA
Research: Astrophysics
Computation and animation by: Sam Skillman, CASA
Thank you!!
Peter Ruprecht
CU-Boulder Research Computing Group
www.rc.colorado.edu