Research Computing @ University of Colorado Boulder

Supercomputing at CU Boulder with Janus

Peter Ruprecht

Senior HPC Analyst, CU Research Computing Group

www.rc.colorado.edu

03/12/13 24x7 event


How Janus came to be

• National Science Foundation grant to CU/NCAR group.

• CU provided matching funds and a data center.

• June 2010 – assembled and tested by Dell in Austin. Number 31 on the Top-500 list.

• Installed in Boulder in Fall 2010.

• Currently operated by CU-Boulder Research Computing Group.

• Now 164th-fastest in the world.


Vital statistics

• 152 teraflops (trillion floating-point operations per second) sustained.

• 1368 nodes in 17 racks. 12 cores per node => 16416 cores.

• 40 Gb/s Infiniband network interconnect.

• 800 TB of high-speed scratch disk space.
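The core count above is simple arithmetic, and the sustained figure can be put next to a rough theoretical peak. The sketch below does both in Python. The 2.8 GHz clock comes from the CPU details slide; the value of 4 double-precision floating-point operations per core per cycle is an assumption about this processor generation, not something stated in the slides.

```python
# Figures taken from the slides.
NODES = 1368
CORES_PER_NODE = 12
CLOCK_GHZ = 2.8            # from the "CPU performance details" slide
SUSTAINED_TFLOPS = 152.0

# Assumption (not in the slides): each core can retire 4
# double-precision floating-point operations per clock cycle.
FLOPS_PER_CYCLE = 4

total_cores = NODES * CORES_PER_NODE

# cores * GHz * FLOPs/cycle gives GFLOPS; divide by 1000 for TFLOPS.
peak_tflops = total_cores * CLOCK_GHZ * FLOPS_PER_CYCLE / 1000.0

# Fraction of the assumed theoretical peak that is actually sustained.
efficiency = SUSTAINED_TFLOPS / peak_tflops

print(f"total cores:      {total_cores}")
print(f"theoretical peak: {peak_tflops:.1f} TFLOPS")
print(f"sustained / peak: {efficiency:.1%}")
```

Under that assumption the sustained number works out to a little over 80% of peak; a different FLOPs-per-cycle value would change the ratio but not the core count.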


CPU performance details

• Dell C6100 chassis holds 4 nodes.

• 2 CPU sockets per node; Intel “Westmere” processors at 2.8 GHz; 6 cores per socket.

• 24 GB RAM per node; 2 GB per core.

• Diskless – OS image loaded into RAM.

• Red Hat Enterprise Linux.
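As a quick cross-check of the per-node figures, the following Python sketch derives the chassis count, cores per node, memory per core, and aggregate memory. The node count of 1368 is taken from the vital statistics slide; using decimal terabytes (1 TB = 1000 GB) is an assumption made here for simplicity.

```python
# Figures taken from the slides.
NODES = 1368               # from the "Vital statistics" slide
NODES_PER_CHASSIS = 4
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 6
RAM_GB_PER_NODE = 24

# How many C6100 chassis are needed, and whether any are partly filled.
chassis, leftover_nodes = divmod(NODES, NODES_PER_CHASSIS)

cores_per_node = SOCKETS_PER_NODE * CORES_PER_SOCKET
ram_per_core_gb = RAM_GB_PER_NODE / cores_per_node

# Assumption: decimal units, 1 TB = 1000 GB.
total_ram_tb = NODES * RAM_GB_PER_NODE / 1000

print(f"chassis: {chassis} (leftover nodes: {leftover_nodes})")
print(f"cores per node: {cores_per_node}")
print(f"RAM per core: {ram_per_core_gb} GB")
print(f"aggregate RAM: {total_ram_tb:.1f} TB")
```

Because the nodes are diskless and the OS image lives in RAM, somewhat less than the full 24 GB per node is available to jobs; the slides do not say how much the image occupies.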


Network interconnect details

• Key to parallel performance!

• Quad-Data-Rate Infiniband (“40” Gb/s).

• Node-to-node latency of about 1-3 microseconds.

• Nonblocking – Fat Tree topology.

• Message passing interface (MPI) with Remote Direct Memory Access (RDMA).

• Bulky copper cabling to each node is a challenge in terms of space and airflow.

• 3 racks of core Infiniband switches, plus 4-5U of distribution switches per compute rack.
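To illustrate why both latency and bandwidth matter for parallel performance, here is a minimal latency-plus-bandwidth cost model in Python. It is a sketch, not a measurement: the 2 microsecond latency is just the midpoint of the 1-3 microsecond range above, the 8b/10b encoding overhead (which turns 40 Gb/s of signaling into 32 Gb/s of data) is an assumption about QDR links, and `transfer_time` is a hypothetical helper name.

```python
LATENCY_S = 2e-6               # midpoint of the 1-3 microsecond range
SIGNAL_RATE_GBPS = 40.0        # nominal QDR signaling rate from the slide
ENCODING_EFFICIENCY = 8 / 10   # assumption: 8b/10b line encoding


def transfer_time(message_bytes, latency_s=LATENCY_S):
    """Estimated seconds to send one message between two nodes.

    Simple model: fixed latency plus size divided by usable bandwidth.
    """
    usable_bytes_per_s = SIGNAL_RATE_GBPS * 1e9 * ENCODING_EFFICIENCY / 8
    return latency_s + message_bytes / usable_bytes_per_s


for size in (8, 1024, 1024 * 1024):
    t = transfer_time(size)
    print(f"{size:>8} bytes: {t * 1e6:8.3f} microseconds")
```

In this model small messages are dominated by latency and large ones by bandwidth, which is one reason tightly coupled MPI codes that exchange many small messages benefit from a low-latency interconnect.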


Storage details

• Lustre parallel filesystem.

• Main storage is about 600x 2 TB SATA drives. 800 TB total usable.

• 12-15 GB/s total throughput.

• Connected to nodes via Infiniband network.

• 2 racks DDN ExaScaler.
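The storage numbers can be related to each other with a little arithmetic, sketched below in Python. The drive count is approximate in the slide ("about 600"), the node count comes from the vital statistics slide, and the even split of throughput across all nodes is a simplifying assumption, not a description of how Lustre actually schedules I/O.

```python
# Figures taken from the slides (the drive count is approximate).
DRIVES = 600
DRIVE_TB = 2
USABLE_TB = 800
THROUGHPUT_GB_S = (12, 15)   # low and high ends of total throughput
NODES = 1368                 # from the "Vital statistics" slide

raw_tb = DRIVES * DRIVE_TB
usable_fraction = USABLE_TB / raw_tb

# Assumption: every node does I/O at once and shares throughput evenly.
per_node_mb_s = [gb_s * 1000 / NODES for gb_s in THROUGHPUT_GB_S]

print(f"raw capacity: {raw_tb} TB")
print(f"usable / raw: {usable_fraction:.0%}")
print(f"per-node share: {per_node_mb_s[0]:.1f} to {per_node_mb_s[1]:.1f} MB/s")
```

The gap between raw and usable capacity is where redundancy, spares, and filesystem overhead would go; the slides do not break that down.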


[Diagram: "Research Computing / JANUS SuperComputer Network Infrastructure". Created by Conan Moore, Senior Network Engineer, Office of Information Technology & Research Computing, University of Colorado at Boulder.]

[Components shown: Internet, campus router and campus switch, RC switch/router (Arista), MDS 9124 switches, management stack (Dell), IB core switches (Mellanox), IB leaf switches (Mellanox), monitoring, and compute node groups nodes100-180 through nodes1700-1780.]

[Link legend: Campus Network (1g); Research Network backbone (10g); Infiniband Fabric (QDR); Management/IPMI (1g).]


Jobs and scheduling

• 1150 total users; 200 active users.

• 20,000 – 50,000 jobs per month.

• 10M – 11M core hours delivered per month.

• Prefer “short/wide” jobs (short run time, many cores).

• Resource manager: Torque.

• Batch scheduler: Moab.

• Normally achieve 80-94% efficiency.
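The delivered core hours can be checked against the machine's capacity. The Python sketch below assumes a 30-day month and reads "efficiency" as delivered core hours divided by available core hours; both are assumptions, since the slides do not define the term. The core count is from the vital statistics slide.

```python
CORES = 16416                 # from the "Vital statistics" slide
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30           # assumption: a 30-day month

# Core hours the machine could deliver if every core were busy all month.
available_core_hours = CORES * HOURS_PER_DAY * DAYS_PER_MONTH

# Low and high ends of the delivered range quoted on the slide.
delivered_range = (10_000_000, 11_000_000)

utilization = [d / available_core_hours for d in delivered_range]

print(f"available core hours per month: {available_core_hours:,}")
for delivered, u in zip(delivered_range, utilization):
    print(f"{delivered:,} delivered -> {u:.1%} of capacity")
```

Under these assumptions, 10M to 11M core hours per month corresponds to roughly 85% to 93% of capacity, which sits inside the 80-94% range quoted above.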


Research on Janus

• Molecular dynamics and molecular biology

• High energy physics

• Genetics/Genomics

• Materials science and engineering

• Astrophysics

• Architecture and design

• Environmental science

• Atmospheric physics and weather

• Economics

• Computer science

• Etc, etc, etc!


Research: Fluid dynamics


Computation and animation by: Greg Salvesen, JILA


Research: Astrophysics


Computation and animation by: Sam Skillman, CASA


Thank you!!

Peter Ruprecht

[email protected]

CU-Boulder Research Computing Group

www.rc.colorado.edu
