SC19: The Most Significant Bits
TRANSCRIPT
Rank System Site Launch Cores Rmax (PF/s)
1 Summit ORNL 2018 2,397,824 143.50
2 Sierra LLNL 2018 1,572,480 94.64
3 Sunway TaihuLight NSC Wuxi 2016 10,649,600 93.02
4 Tianhe-2A NSC Guangzhou 2013 4,981,760 61.44
5 Piz Daint CSCS 2012 387,872 21.23
6 Trinity LANL 2015 979,072 20.16
7 ABCI AIST 2018 391,680 19.88
8 SuperMUC-NG LRZ 2018 305,856 19.48
9 Titan ORNL 2012 560,640 17.59
10 Sequoia LLNL 2012 1,572,864 17.17
Top 500 – November 2018
HPL: Solve dense system of linear equations with LU decomposition
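The core operation HPL measures can be sketched with a toy solver: a pure-Python LU-style elimination with partial pivoting. This is purely illustrative; the real benchmark is a heavily optimized distributed-memory implementation run at enormous problem sizes.

```python
# Toy illustration of what HPL times: solving A x = b by Gaussian
# elimination (LU decomposition) with partial pivoting.

def lu_solve(A, b):
    """Solve A x = b via elimination with partial pivoting (on copies)."""
    n = len(A)
    A = [row[:] for row in A]
    b = b[:]
    for k in range(n):
        # Pivot: bring the largest |A[i][k]| up to row k for stability.
        p = max(range(k, n), key=lambda i: abs(A[i][k]))
        A[k], A[p] = A[p], A[k]
        b[k], b[p] = b[p], b[k]
        # Eliminate entries below the pivot.
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= m * A[k][j]
            b[i] -= m * b[k]
    # Back-substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x

print(lu_solve([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0]))  # x with A x = b
```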
Rank System Site Launch Cores Rmax (PF/s)
1 Summit ORNL 2018 2,414,592 148.60
2 Sierra LLNL 2018 1,572,480 94.64
3 Sunway TaihuLight NSC Wuxi 2016 10,649,600 93.02
4 Tianhe-2A NSC Guangzhou 2013 4,981,760 61.44
5 Frontera TACC 2019 448,448 23.52
6 Piz Daint CSCS 2012 387,872 21.23
7 Trinity LANL 2015 979,072 20.16
8 ABCI AIST 2018 391,680 19.88
9 SuperMUC-NG LRZ 2018 305,856 19.48
10 Lassen LLNL 2018 288,288 18.20
Top 500 – June 2019
Titan (ORNL) and Sequoia (LLNL) drop out of the top 10
Rank System Site Launch Cores Rmax (PF/s)
1 Summit ORNL 2018 2,414,592 148.60
2 Sierra LLNL 2018 1,572,480 94.64
3 Sunway TaihuLight NSC Wuxi 2016 10,649,600 93.02
4 Tianhe-2A NSC Guangzhou 2013 4,981,760 61.44
5 Frontera TACC 2019 448,448 23.52
6 Piz Daint CSCS 2012 387,872 21.23
7 Trinity LANL 2015 979,072 20.16
8 ABCI AIST 2018 391,680 19.88
9 SuperMUC-NG LRZ 2018 305,856 19.48
10 Lassen LLNL 2018 288,288 18.20
Top 500 – November 2019
Titan (ORNL) and K Computer (RIKEN) are decommissioned
Rank System Site Launch Cores Rmax (PF/s)
1 Summit ORNL 2018 2,414,592 148.60
2 Sierra LLNL 2018 1,572,480 94.64
3 Sunway TaihuLight NSC Wuxi 2016 10,649,600 93.02
4 Tianhe-2A NSC Guangzhou 2013 4,981,760 61.44
5 Frontera TACC 2019 448,448 23.52
6 Piz Daint CSCS 2012 387,872 21.23
7 Trinity LANL 2015 979,072 20.16
8 ABCI AIST 2018 391,680 19.88
9 SuperMUC-NG LRZ 2018 305,856 19.48
10 Lassen LLNL 2018 288,288 18.20
47 Gadi (Phase 1) NCI 2019 75,576 4.41
239 Raijin NCI 2013 87,224 1.68
Top 500 – November 2019
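The two NCI entries give a feel for six years of per-core progress; a quick back-of-the-envelope calculation from the table figures above:

```python
# Rmax per core for NCI's old (Raijin, 2013) and new (Gadi, 2019)
# systems, using the Top 500 figures quoted above.

def gflops_per_core(rmax_pflops, cores):
    return rmax_pflops * 1e6 / cores   # PFLOPS -> GFLOPS, divided per core

raijin = gflops_per_core(1.68, 87_224)
gadi = gflops_per_core(4.41, 75_576)
print(round(raijin, 1), round(gadi, 1))  # roughly 19 vs 58 GFLOPS per core
```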
Exascale?
• Aurora (ANL) 2021
• Cray/Intel
• Sapphire Rapids + Ponte Vecchio
• Frontier (ORNL) 2021
• Cray/AMD
• EPYC + Radeon
• El Capitan (LLNL) 2022
• Cray
• Tianhe-3 (NSC Tianjin) 2020
• NUDT
• ARM-based? + MT-3000 + 400Gb/s
• Shuguang (NSC Shanghai) 2021?
• Sugon
• Licensed AMD EPYC clone
• Liquid immersion
• Sunway? (NSC Jinan) 2021?
• ShenWei (256C)
• No accelerator
• Fugaku (RIKEN) 2021
• Fujitsu
• A64FX (ARM)
• LUMI (CSC Finland) 2020
• Leonardo (CINECA Italy) 2020
• MareNostrum 5 (BSC Spain) 2020
• 1st Gen ARM/RISC-V processors 2021
• 3 exascale systems with 2nd Gen 2023
Fujitsu FX1000
• 4 shelves
• 24 blades per shelf
• 2 x 1S nodes per blade
• A64FX 48C CPU
• 32GB HBM2 memory
• No accelerators
• 384 nodes per rack
• 1 PF per rack
CMU: CPU Memory Unit
A64FX CPU x2 (Two independent nodes)
QSFP28 x3 for Active Optical Cables
Single-side blind-mate connectors for signals & water
~100% direct water cooling
[Diagram: water-cooling loops and electrical signal paths within the CMU; three AOC links leave the blade via QSFP28 connectors for the X, Y and Z dimensions]
Source: Fujitsu slide, SCAsia 2019, March 12
Rank System Site Launch Cores GFLOPS/watt
1 Micro-Fugaku Fujitsu 2019 36,864 16.88
2 NA-1 PEZY 2019 1,271,040 16.26
3 AiMOS RPI 2019 130,000 15.77
4 Satori MIT 2019 23,040 15.57
5 Summit ORNL 2018 2,397,824 14.67
6 ABCI AIST 2018 391,680 14.42
7 MareNostrum P9 CTE BSC 2018 18,360 14.13
8 TSUBAME3.0 GSIC 2017 135,828 13.70
9 PANGEA III Total 2019 291,024 13.07
10 Sierra LLNL 2018 1,572,480 12.72
Green 500 – November 2019
Prototype Fugaku ARM-based system goes straight into the #1 spot
IBM Power9 systems well represented this year
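The Green 500 metric is simply Rmax divided by power draw. A quick sketch of the arithmetic; the ~10.1 MW power figure for Summit is my approximation for illustration, not from the slides:

```python
# Green 500 ranks by energy efficiency rather than raw speed:
# GFLOPS/watt = Rmax (in GFLOPS) / power draw (in watts).

def gflops_per_watt(rmax_pflops, power_mw):
    gflops = rmax_pflops * 1e6      # PFLOPS -> GFLOPS
    watts = power_mw * 1e6          # MW -> W
    return gflops / watts

# Summit: 148.6 PF at roughly 10.1 MW (assumed) -> ~14.7 GFLOPS/W,
# consistent with its Green 500 entry above.
print(round(gflops_per_watt(148.6, 10.1), 2))
```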
• Build computational cluster with 3kW power budget
• Benchmarks: HPL, HPCG and IO-500
• Applications: 3 known apps; 1 mystery app
• 16 teams of undergraduate or high school students
• China, Estonia, Germany, Poland, Singapore, Switzerland, Taiwan, USA
• Tsinghua University (China) 9th Student Cluster Competition win!
• 2 AMD and 14 Intel systems
• 14 systems used NVIDIA V100s (avg 3 per node)
Student Cluster Competition
Tutorials
• Containers
• OpenMP
• Quantum Computing
• Parallel Computing 101
• Better Scientific Software
• Data Compression
• I/O Frameworks
• GPU Programming
• Deep Learning
• MPI
• Performance Analysis
• Performance Tuning
• High Speed Networks
• HPC Procurements
• Managing S/W Complexity
• Secure Coding
Intel CPU
• Cooper Lake
• 14nm; 56 cores; Q2 2020
• 8-channel DDR4
• bfloat16
• Ice Lake
• 10nm; 38 cores; Q3 2020
• 2nd Gen Optane
• PCIe Gen 4
• Sapphire Rapids
• 10nm; Late 2021
• PCIe Gen 5
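The bfloat16 format mentioned under Cooper Lake is float32 with the mantissa truncated to 7 bits while keeping the full 8-bit exponent: same dynamic range, much less precision. A minimal sketch of the conversion (simple truncation without rounding, for illustration):

```python
# bfloat16 keeps the top 16 bits of a float32: 1 sign bit,
# 8 exponent bits, 7 mantissa bits.
import struct

def to_bfloat16_bits(x):
    """Return the 16-bit bfloat16 pattern for float x (truncation, no rounding)."""
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    return bits32 >> 16

def from_bfloat16_bits(b16):
    """Expand a bfloat16 bit pattern back to a Python float."""
    (x,) = struct.unpack("<f", struct.pack("<I", b16 << 16))
    return x

# Only ~2-3 decimal digits survive the round trip:
print(from_bfloat16_bits(to_bfloat16_bits(3.14159)))  # → 3.140625
```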
Intel GPU
• Xe architecture
• Xe LP; 20W
• Xe HP; 250W
• Xe HPC; 500W
• 1,000s of execution units (8 threads each)
• Focus on FP64 performance
• Ponte Vecchio
• 7nm; 2021
Intel oneAPI
• Compute Taxonomy
• Scalar – CPU
• Vector – GPU
• Matrix – AI
• Spatial – FPGA
• Aim is to write code once and cross-compile for each target
• Software Stack
• Data Parallel C++ (DPC++)
• C++ and SYCL
• Domain specific libraries
• MKL, TBB, DNN,…
• Migration tools
• CUDA → oneAPI
• Analysis and Debug tools
• VTune profiler
• Trace analyzer
Dell DSS8440 + Graphcore C2
• 1,216 IPU cores
• 300MB memory
• 45TB/s mem b/w
• 8TB/s internal comms
• 320GB/s inter-chip
• 2 x IPU per C2 card
• 8 x C2 cards per server
• Microsoft Azure
Cerebras
• 21.5cm x 21.5cm wafer
• 400,000 SLAC (Sparse Linear Algebra Compute) cores
• 18GB memory
• 57x larger than V100
• 3,000x more memory
• 10,000x more mem b/w
• Award for innovative AI hardware
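The "57x larger than V100" figure is roughly consistent with the stated wafer dimensions, assuming a V100 die area of about 815 mm² (my assumed figure, not from the slides):

```python
# Area comparison: Cerebras wafer (21.5 cm x 21.5 cm) vs an
# NVIDIA V100 die (~815 mm^2, assumed for illustration).

wafer_mm2 = 215 * 215   # 21.5 cm per side, in mm^2
v100_mm2 = 815          # approximate V100 die area
print(round(wafer_mm2 / v100_mm2, 1))  # ~56.7x, matching the ~57x claim
```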