![Page 1: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/1.jpg)
China’s HPC development: a brief review and perspectives
Depei Qian Beihang University/Sun Yat-sen University
International Symposium on Impact of extreme scale computing Tokyo, Japan Nov. 2, 2017
![Page 2: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/2.jpg)
Outline
• A brief review
• The new HPC key project in China
• Issues in exascale system development
![Page 4: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/4.jpg)
Three 863 key projects on HPC
• 2002-2005: High Performance Computer and Core Software
  – Research on resource sharing and collaborative work
  – Grid-enabled applications in multiple areas
  – TFlops computers and China National Grid (CNGrid) testbed
• 2006-2010: High Productivity Computer and Grid Service Environment
  – High productivity
    • Application performance
    • Efficiency in program development
    • Portability of programs
    • Robustness of the system
  – Emphasizing service features of the HPC environment
  – Developing peta-scale computers
• 2010-2016: High Productivity Computer and Application Service Environment
  – Developing 100PF computers
  – Developing large scale HPC applications
  – Upgrading of CNGrid
![Page 5: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/5.jpg)
High performance computers
• 2013: Tianhe-2
  – CPU+MIC heterogeneous accelerated architecture
  – 54.9 PF peak, 33.9 PF Linpack, No. 1 in Top500 six times from 2013 to 2015
  – Installed at the National Supercomputing Center in Guangzhou
  – Will be upgraded to 100PF this year
• 2016: Sunway TaihuLight
  – Implemented with home-grown Shenwei many-core processors, 10 million cores in total
  – 125 PF peak, 93 PF Linpack, No. 1 in Top500 in June and Nov. of 2016
  – Installed at the National Supercomputing Center in Wuxi
[Photos: Sunway Bluelight, Tianhe-2]
![Page 6: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/6.jpg)
Tianhe-2 upgrade

| Items | Tianhe-2 | Tianhe-2A |
| --- | --- | --- |
| Nodes & performance | 16000 nodes with Intel CPU + KNC, 54.9 Pflops | 17792 nodes with Intel CPU + Matrix-2000, 94.97 Pflops |
| Interconnection | 10 Gbps, 1.57 us | 14 Gbps, 1 us |
| Memory | 1.4 PB | 3.4 PB |
| Storage | 12.4 PB, 512 GB/s | 19 PB, 1 TB/s |
| Energy efficiency | 17.8 MW, 1.9 Gflops/W | About 18 MW, >5 Gflops/W |
| Heterogeneous software | MPSS for Intel KNC | OpenMP/OpenCL for Matrix-2000 |
![Page 7: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/7.jpg)
Matrix-2000 accelerator
Chip specification
– 4 super-nodes (SN)
– 8 clusters per SN
– 4 cores per cluster
– Core
  • Self-defined 256-bit vector ISA
  • 16 DP flops/cycle per core
– Peak performance: 2.4576 Tflops @ 1.2 GHz
– Peak power dissipation: ~240 W
– Interface
  • 8 DDR4-2400 channels
  • x16 PCIe 3.0 EP port
4 SNs x 8 clusters x 4 cores x 16 flops x 1.2 GHz = 2.4576 Tflops
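The peak-performance product on this slide can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function name `peak_tflops` and its parameter names are illustrative and do not come from the slides, while the numeric inputs are the ones listed in the chip specification.

```python
def peak_tflops(super_nodes, clusters_per_sn, cores_per_cluster,
                flops_per_cycle, clock_ghz):
    """Theoretical peak in Tflops: cores x flops per cycle x clock (GHz)."""
    cores = super_nodes * clusters_per_sn * cores_per_cluster
    gflops = cores * flops_per_cycle * clock_ghz  # GHz x flops/cycle gives Gflops
    return gflops / 1000.0


# Inputs taken from the chip specification above.
matrix2000_cores = 4 * 8 * 4
matrix2000_peak = peak_tflops(4, 8, 4, 16, 1.2)

print(f"cores per chip: {matrix2000_cores}")
print(f"peak: {matrix2000_peak:.4f} Tflops")
```

The same function can be reused to see how the peak would scale with a different clock or core count.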
[Figure: Matrix-2000 block diagram. Four super-nodes (SN0 to SN3), each with eight clusters (Cluster 0 to Cluster 7) of four cores (C), joined by an on-chip interconnection with PCIE and DDR4 interfaces]
![Page 8: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/8.jpg)
Compute Nodes
Heterogeneous Compute Nodes
[Figure: compute node block diagram. Two CPUs linked by QPI; two MT-2000 accelerators, each with DDR4, attached over 16X PCIE; NIC on 16X PCIE; plus PCH, DMI, IPMB, CPLD, GE, Gb LAN and comm. port]
– Intel Xeon CPU x2
– Matrix-2000 x2
– Memory:192GB
– Interconnection:14G proprietary network
– Peak performance: 5.34Tflops
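As a rough cross-check, the node figure can be related to the chip and system figures quoted on the earlier slides. The sketch below assumes that whatever remains of the 5.34 Tflops after the two Matrix-2000 chips comes from the two Xeon CPUs; that split is an inference, not something stated on the slides. The node count (17792) and the 94.97 Pflops system peak are taken from the Tianhe-2 upgrade table.

```python
MATRIX2000_PEAK_TF = 2.4576   # per accelerator, from the chip slide
NODE_PEAK_TF = 5.34           # per node, from this slide (rounded)
ACCELERATORS_PER_NODE = 2
NODES = 17792                 # Tianhe-2A node count from the upgrade table

accelerator_share_tf = ACCELERATORS_PER_NODE * MATRIX2000_PEAK_TF
# Assumption: the remainder is the contribution of the two Xeon CPUs.
cpu_share_tf = NODE_PEAK_TF - accelerator_share_tf
system_peak_pf = NODES * NODE_PEAK_TF / 1000.0

print(f"accelerators: {accelerator_share_tf:.4f} Tflops per node")
print(f"CPUs (inferred): {cpu_share_tf:.4f} Tflops per node")
print(f"system: {system_peak_pf:.2f} Pflops")
```

The product lands within about 0.05 Pflops of the 94.97 Pflops in the table; the small gap is consistent with the node peak having been rounded to two decimals.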
![Page 9: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/9.jpg)
HPC environment
• 2016
  – China National Grid, composed of 17 national supercomputing centers and HPC centers, with world-leading computing resources
![Page 10: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/10.jpg)
HPC applications
• 2016
  – HPC applications in many domains
  – 10-million core parallelism reached, Gordon Bell Prize in 2016
  – Developed a number of application software packages, adopted by production systems
    • aircraft design
    • high speed train design
    • oil & gas exploration
    • new drug discovery
    • ensemble weather forecasting
    • bio-information
    • car development
    • design optimization of large fluid machinery
    • electromagnetic computation
    • …
![Page 11: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/11.jpg)
Problems identified
• Lack of a long-term national program for high performance computing
• Weakness in core HPC technologies
  – processor/accelerator
  – novel devices (new memory, storage, and network)
  – large scale parallel algorithms and program implementation
• Application software is the bottleneck
  – applications rely on imported commercial software
    • expensive
    • small scale parallelism
    • restricted by export regulation
• Shortage of cross-disciplinary talent
  – Not enough people with both domain and IT knowledge
• Lack of multi-disciplinary collaboration
![Page 13: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/13.jpg)
Reform of research system in China
• The national research and development system is undergoing a reform
  – 100+ different national R&D programs/initiatives are merged into 5 tracks of national programs
    • Basic research program (NSFC)
    • Mega-science and technology programs
    • Key R&D program (former 863, 973, enabling programs)
    • Enterprise innovation program
    • Facility/talent program
![Page 14: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/14.jpg)
A New key project on HPC
• High performance computing has been identified as a priority subject under the key R&D program (track 3)
• Strategic studies and planning have been conducted since 2013
• A proposal on HPC in the 13th five-year plan was submitted in early 2015
• The key R&D project was approved in Oct. 2015 by a multi-government agency committee led by the MOST
![Page 15: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/15.jpg)
Motivations
• The key value of exascale computers identified
  – Addressing the grand challenge problems
    • energy shortage, pollution, climate change…
  – Enabling industry transformation
    • supporting development of important products
      – high speed train, commercial aircraft, automobile…
    • promoting economy transformation
  – For social development and people’s benefit
    • new drug discovery, precision medicine, digital media…
  – Enabling scientific discovery
    • high energy physics, computational chemistry, new material, astrophysics…
• Promote computer industry by technology transfer
• Developing HPC systems by self-controllable technologies
  – a lesson learnt from the recent embargo regulation
![Page 16: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/16.jpg)
Major tasks
• Exa-scale computer development
  – R&D on novel architectures and key technologies of the exa-scale computer
  – Developing the exa-scale computer based on home-grown processors
  – Technology transfer to promote development of high-end servers
• HPC applications development
  – Basic research on exa-scale modeling methods and parallel algorithms
  – Developing high performance application software
  – Establishing the HPC application eco-system
• HPC environment development
  – Developing software and platform for national HPC environment
  – Upgrading of the national HPC environment CNGrid
  – Developing service systems on the national HPC environment
• Each task will cover basic research, key technology development, and application demonstration
![Page 17: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/17.jpg)
Task 1: Exa-scale computer development
• Basic research
  – Novel high performance interconnect
    • Theoretical work on the novel interconnect
      – based on the enabling technologies of 3D chips, silicon photonics and on-chip networks
  – Programming & execution models for exa-scale systems
    • new programming models for heterogeneous systems
    • improving programming efficiency
![Page 18: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/18.jpg)
Task 1: Exa-scale computer development
• Key technology
  – prototype systems for verifying the exa-scale system technologies
    • 3 typical applications to verify the design
  – exa-scale computer technologies
    • architecture optimized for multi-objectives
    • highly efficient computing node
    • high performance processor/accelerator design
    • exa-scale system software
    • scalable interconnect
    • parallel I/O
    • exa-scale infrastructure
    • energy efficiency
    • exa-scale system reliability
![Page 19: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/19.jpg)
Task 1: Exa-scale computer development
• Exa-scale computer development
  – exaflops in peak
  – Linpack efficiency >60%
  – 10PB memory
  – EB storage
  – 30GF/W energy efficiency
  – interconnect >500Gbps
  – large scale system management and resource scheduling
  – easy-to-use parallel programming environment
  – system monitoring and fault tolerance
  – support large scale applications
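To see what these targets imply when taken together, a short calculation helps. The sketch below is only back-of-envelope arithmetic under stated assumptions: "exaflops" is read as exactly 1e18 flop/s, PB as 1e15 bytes, and the 30 GF/W target is applied to peak performance. The variable names are illustrative.

```python
PEAK_FLOPS = 1e18                 # assumption: "exaflops in peak" read as 1e18 flop/s
LINPACK_EFFICIENCY = 0.60         # target: >60%
MEMORY_BYTES = 10e15              # target: 10 PB, decimal units assumed
EFFICIENCY_FLOPS_PER_WATT = 30e9  # target: 30 GF/W, applied to peak here

power_mw = PEAK_FLOPS / EFFICIENCY_FLOPS_PER_WATT / 1e6
linpack_pflops = LINPACK_EFFICIENCY * PEAK_FLOPS / 1e15
memory_bytes_per_flops = MEMORY_BYTES / PEAK_FLOPS

print(f"implied power at peak: {power_mw:.1f} MW")
print(f"minimum Linpack result: {linpack_pflops:.0f} Pflops")
print(f"memory per unit of peak: {memory_bytes_per_flops:.2f} bytes per flop/s")
```

Under these assumptions the efficiency target corresponds to a power envelope of a little over 33 MW, and the memory target to roughly 0.01 bytes per flop/s of peak.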
![Page 20: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/20.jpg)
Task 2: HPC application development
• Basic research
  – computable modeling and computational methods for exa-scale systems
  – scalable highly efficient parallel algorithms and parallel libraries for exa-scale systems
• Key technology
  – programming framework for exa-scale software development
![Page 21: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/21.jpg)
Task 2: HPC application development
• Application software
  – Numerical devices
    • numerical nuclear reactor
    • numerical aircraft
    • numerical earth system
    • numerical engine
  – high performance domain application software
    • complex engineering project and critical equipment
    • numerical simulation of ocean
    • design of energy-efficient large fluid machineries
    • drug discovery
    • electromagnetic environment simulation
    • ship design
    • oil exploration
    • digital media rendering
  – high performance application software for research
    • material science
    • high energy physics
    • astrophysics
    • life science
![Page 22: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/22.jpg)
Task 2: HPC application development
• HPC application software development
  – establishing a national-level R&D center for HPC application software
  – building a platform for HPC software development and optimization
  – tools for performance/energy efficiency and pre-/post-processing
  – building a software resource repository
  – developing typical domain application software
  – a joint effort involving national supercomputing centers, universities, and institutes
![Page 23: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/23.jpg)
Task 3: HPC environment development
• Basic research
  – models and architecture for computational services
  – virtual data space
• Key technology
  – mechanism and platform for the national HPC environment, providing technical support for service-mode operation
  – upgrading the national HPC environment (CNGrid)
![Page 24: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/24.jpg)
Task 3: HPC environment development
• Services
  – integrated business platform, e.g.
    • complex product design
    • HPC-enabled EDA platform
  – application villages
    • innovation and optimization of industrial products
    • drug discovery
    • SME computing and simulation platform
  – platform for HPC education
    • provide computing resources and services to undergraduate and graduate students
![Page 25: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/25.jpg)
Projects supported
• The first call for proposals was issued in Feb. 2016; 19 projects were supported
• The second call was issued in Oct. 2016; 18 projects were supported, mainly application software
• The third call was issued in Oct. 2017; the review process will begin soon
![Page 26: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/26.jpg)
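Sugon exa-prototype: specification

| Category | Metrics | Prototype | Exascale | Ratio |
| --- | --- | --- | --- | --- |
| Computing | Node peak (TF) | 10 | 32 | 3.2 |
| Computing | No. of nodes | 512 | ≤ 32768 | 64 |
| Computing | No. of silicon units | 6 | ≤ 384 | 64 |
| Computing | System peak (PF) | 5.12 | ≥ 1024 | 200 |
| Storage | Memory (PB) | 0.065 | ≥ 10 | 153.8 |
| Storage | Storage (PB) | 10 | ≥ 100 | 10 |
| Network | Silicon switches | 6 | ≤ 384 | 64 |
| Network | Dim. of global net | 2*1*3 | ≤ 8*8*6 | 4*8*2 |
| Network | Dim. of local net | 2*3*2 | 2*3*2 | 1 |
| Power consumption | Power consumption (MW) | 0.5 | ≤ 30 | 60 |
| Power consumption | Energy efficiency (GF/W) | 10.24 | ≤ 34.13 | 3.33 |
| Size | W*D*H (m) | 6*6*6 | ≤ 24*24*6 | 16 |
| Size | Total cabinets | 27 | ≤ 400 | 25 |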
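Several rows of this specification are derived from others, which can be checked with a short script. The sketch below assumes the power-consumption row is in megawatts; the slide gives no unit, but MW is the reading that reproduces the listed 10.24 GF/W. The function names are illustrative.

```python
def system_peak_pf(nodes, node_peak_tf):
    """System peak in Pflops from node count and per-node peak in Tflops."""
    return nodes * node_peak_tf / 1000.0


def efficiency_gf_per_w(peak_pf, power_mw):
    """Energy efficiency in Gflops/W (1 PF = 1e6 GF, 1 MW = 1e6 W)."""
    return (peak_pf * 1e6) / (power_mw * 1e6)


# Prototype column of the specification.
proto_peak = system_peak_pf(512, 10)
proto_eff = efficiency_gf_per_w(proto_peak, 0.5)   # power assumed to be in MW

# Exascale column, using the bounds as listed.
exa_peak_upper = system_peak_pf(32768, 32)
exa_eff = efficiency_gf_per_w(1024, 30)

print(f"prototype: {proto_peak:.2f} PF, {proto_eff:.2f} GF/W")
print(f"exascale: up to {exa_peak_upper:.3f} PF, {exa_eff:.2f} GF/W at 1024 PF and 30 MW")
```

With these readings the prototype rows agree exactly, and the exascale node count and node peak leave a small margin above the 1024 PF target.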
![Page 27: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/27.jpg)
Sugon exa-prototype: general design
• Computing sub-system
  – home-grown X86 processor + DCU accelerator in 2019
  – CPU > 1TF, DCU > 15TF
• Network sub-system
  – 400Gbps 6D-torus, 384 routers
• Storage sub-system
  – Distributed storage architecture, extensible to EB
• Infrastructure sub-system
  – Immersive phase-change cooling
  – High voltage DC power supply
  – Hierarchical 3D assembly
• Software sub-system
  – Mature and complete libs and programming tools
  – Light-weight virtualization and software-defined architecture
![Page 28: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/28.jpg)
Sugon exa-prototype: hierarchical 3D structure
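| Level | Nodes per unit | Units in prototype | Units in exascale system |
| --- | --- | --- | --- |
| Node pair | 2 | 256 | 16384 |
| Super node | 16 | 32 | 2048 |
| Silicon block | 96 | 6 | 384 |
| Silicon cubic | 512 | 1 | 1 |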
![Page 29: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/29.jpg)
Sugon exa-prototype: computing node
• Node: 2 CPUs and 2 DCUs
• CPU and DCU interconnected by GOP high speed bus
• Memory bandwidth: 2667 Mbps, DDR4
• Memory capacity ≥128G DDR4
• Interconnect: 200Gbps fast fabric
[Figure: board-level diagram showing CPU0 to CPU3 and DCU0 to DCU3 with 16x GOP links, U/R/LR DDR4 DIMMs, PCIe 16x, XGKR, SATA/PCIe 4x M.2, BIOS, midplane, 2X200G NIC and AIU]
![Page 30: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/30.jpg)
Tianhe exa-prototype: flexible architecture
• Reconfigurable flexible architecture, meeting the requirements of different applications
• Virtualized OS, providing a configurable computing environment
• Software-defined interconnect, guaranteeing bandwidth and fault isolation
• Hierarchical storage QoS guarantee technology, providing stable and independent storage bandwidth
• Dynamic optimization, providing architecture-aware optimization
[Figure: layered diagram with application, compiler, runtime and OS on top of the computing sub-system (computing nodes) and the IO storage sub-system]
![Page 31: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/31.jpg)
Tianhe exa-prototype: technical route
[Figure: design options (many-core, special purpose accelerator, customized) compared on performance, energy efficiency, and ease of use]
General purpose many-core is adopted by the prototype
![Page 32: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/32.jpg)
Tianhe exa-prototype: technical features
• Flexible architecture to meet the requirements of different applications
• New generation many-core processor, pursuing balanced computing and memory access
• Optoelectronic integrated high speed interconnect, greatly improving performance and energy efficiency
• Fault-tolerance based on new storage medium
• Accurate heat dissipation, a tradeoff between the manufacturing cost and the operational cost
![Page 33: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/33.jpg)
Tianhe exa-prototype: interconnect
• High-radix router for low power consumption, low cost and high density
• Exascale communication need: single node > 400Gbps
• Chip power budget <200W, at most 12 ports of 400 Gbps
• Co-design of ultra short distance SerDes PHY, PHY coding, and link layer
• Optoelectronic integration for interconnect
![Page 34: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/34.jpg)
Sunway exa-prototype: hardware system
[Figure labels: DC power supply system; water-cooling unit; two-level fat-tree interconnect; node assembled with enhanced heat-transfer cold plates; new-generation many-core processor; computing cabinet]
• System composed of computing, interconnect, storage, power supply and cooling
• New generation many-core based system, 512 nodes, performance >4 PFlops
• Self-developed network chip, fat-tree interconnect, point-to-point bandwidth >200Gbps
• Storage subsystem based on Shenwei storage server
• Self-developed high voltage (300V) DC power supply
• Highly efficient water-cooling, enhanced heat transfer copper cold plate
![Page 35: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/35.jpg)
Sunway exa-prototype: computing node
[Figure: computing node board. Processor with core groups 0 to 3 and surrounding memory (MEM) banks labelled DDR3 and DDR4; PCI-E; Ethernet; high-speed compute network interface; Ethernet management network interface; BMC handling clock management, processor management, power management and node monitoring]
• Connection to the interconnect: 2 x 25Gbps x 4
• Point-to-point one-way bandwidth: 200Gbps
• Peak performance: >8 TFlops
• Memory: >64GB
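The node figures are consistent with the system-level numbers on the hardware slide, as a short calculation shows. In the sketch below the node count (512) comes from the hardware-system slide; reading "2 x 25Gbps x 4" as two ports of four 25 Gbps lanes each is an assumption about the notation.

```python
LANE_GBPS = 25
LANES_PER_PORT = 4    # assumption: "x4" means four lanes per port
PORTS = 2             # assumption: "2 x" means two ports
NODES = 512           # from the hardware system slide
NODE_PEAK_TF = 8      # lower bound quoted for one node

link_gbps = PORTS * LANE_GBPS * LANES_PER_PORT
system_peak_pf = NODES * NODE_PEAK_TF / 1000.0

print(f"node link bandwidth: {link_gbps} Gbps")
print(f"system peak lower bound: {system_peak_pf:.3f} PFlops")
```

Both results match the slides: the link product equals the quoted 200Gbps, and 512 nodes at more than 8 TFlops each give more than 4 PFlops.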
![Page 36: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/36.jpg)
Sunway exa-prototype: software system
• Basic software for home-grown many-core processor
  – parallel OS
  – high performance storage management system
  – parallel compiler
  – parallel program development environment
• Highly efficient compiler for heterogeneous many-core
• SIMD auto-vectorization
• High performance basic math libs
• Integrated multi-domain OS for heterogeneous many-core
• Dynamic storage management
• Supporting MPI-1, MPI-2, MPI-3, OpenMP 3.0, compatible with OpenACC 2.0
• Debugger for heterogeneous many-core
![Page 37: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/37.jpg)
Sunway exa-prototype: demo applications
[Figure panels: ocean model; aircraft design; seismic; floating platform design]
• Applications are being ported from TaihuLight; performance optimization is in progress
![Page 38: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/38.jpg)
Sunway exa-prototype: applications
10-million core applications on TaihuLight
• 2016
  – Fully Implicit Solver for Atmospheric Dynamics
  – Surface Wave Modeling
  – Phase Field Simulations of Coarsening Dynamics
  – Atomistic Simulation of Silicon Nanowires
  – Run-away Electron Trajectory Simulation
  – Genome Functional Annotation and Homeotic Gene Building
  – Spacecraft CFD Numerical Simulation
• 2017
  – Extreme-scale Graph Processing Framework
  – Simulation of Planetary Rings
  – Simulations of Quantum Spin Liquid States via PEPS++
  – Molecular Dynamics Simulation of Condensed Covalent Materials
  – cryo-EM Macromolecule Structure Determination
  – Redesigning CAM-SE
  – Nonlinear Earthquake Simulation
![Page 40: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/40.jpg)
Major Challenges to exa-scale systems
• Power consumption
• Performance obtained by applications
• Programmability
• Resilience

• How to make tradeoffs between performance, power consumption, and programmability?
• How to achieve continuous non-stop operation?
• How to adapt to a wide range of applications with reasonable efficiency?
![Page 41: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/41.jpg)
Architecture
• Novel architectures beyond the current heterogeneous accelerated/manycore-based designs are expected
• Co-processor or partitioned heterogeneous architecture?
  – Low utilization of the co-processor in some applications, which use the CPU only
  – Bottleneck in moving data between CPU and co-processor
• Application-aware architecture
  – on-chip integration of special purpose units (idea from Prof. Andrew Chien)
  – using the right tool to do the right things
  – dynamically reconfigurable? how to program?
![Page 42: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/42.jpg)
Memory system
• Pursuing large capacity, low latency, high bandwidth
• Increase capacity and lower power consumption by using DRAM/NVM together
  – Data placement issue
• Improving bandwidth and latency by using 3D stacking technology
• Reduce data movement by placing the data closer to processing
  – HBM/HMC near processor
  – On-chip DRAM
  – Simple functions in memory
• Reduce data copy cost by using unified memory space in heterogeneous architecture
![Page 43: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/43.jpg)
Interconnect
• Pursuing low latency, high bandwidth and low energy consumption
• Adopt new technologies
  – silicon photonics communication between components
  – optical interconnect / communication
  – miniature optical devices
• High scalability adapting to exa-scale system interconnect requirement
  – Connecting 10,000+ nodes
  – Low-hop, low-latency topology
  – Reliable and intelligent routing
![Page 44: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/44.jpg)
Programming the heterogeneous systems
• Addressing the issues in programming the heterogeneous parallel systems
  – efficient expression of the parallelism, dependence, data sharing, execution semantics
  – problem decomposition appropriate for heterogeneous systems
• Improving programming by means of a holistic approach
  – new programming models
  – programming language extensions and compilers
  – parallel debugging
  – runtime support and optimization
  – architectural support
![Page 45: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/45.jpg)
Computational models and algorithms
• Full-chain innovation
  – mathematical methods
  – computer algorithms
  – algorithm implementation and optimization
• A good mathematical method is often more effective than hardware improvement and algorithm optimization
• Architecture-aware algorithm implementation and optimization is necessary for heterogeneous systems
• Domain-specific libraries for improving software productivity and performance
![Page 46: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/46.jpg)
Resilience
• Resilience is one of the key issues of the exa-scale system
  – Large scale of the system
    • 50K to 100K nodes
    • Huge number of components
    • Very short MTBF
    • Long non-stop operation required for solving large scale problems
• Reliability measures at different levels required, including device, node, and system levels
• Software / hardware coordination is necessary
  – fast context saving and recovery for checkpointing in case of short MTBF
  – fault-tolerance at the algorithm and application software level
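To illustrate why a short MTBF puts pressure on checkpointing, the sketch below combines two textbook approximations that are not taken from the slides: the system MTBF is the node MTBF divided by the node count (assuming independent, exponentially distributed node failures), and the checkpoint interval follows Young's first-order formula, sqrt(2 x checkpoint cost x MTBF). The 5-year node MTBF and the 60-second checkpoint cost are hypothetical values chosen only for illustration; the node counts are the 50K and 100K quoted on the slide.

```python
import math


def system_mtbf_hours(node_mtbf_hours, n_nodes):
    """System MTBF assuming independent, exponentially distributed node failures."""
    return node_mtbf_hours / n_nodes


def young_interval_seconds(checkpoint_cost_s, mtbf_s):
    """Young's first-order approximation of the optimal checkpoint interval."""
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


NODE_MTBF_HOURS = 5 * 365 * 24   # hypothetical: 5 years per node
CHECKPOINT_COST_S = 60.0         # hypothetical: 60 s to write one checkpoint

for n_nodes in (50_000, 100_000):   # node counts quoted on the slide
    mtbf_h = system_mtbf_hours(NODE_MTBF_HOURS, n_nodes)
    interval_s = young_interval_seconds(CHECKPOINT_COST_S, mtbf_h * 3600.0)
    print(f"{n_nodes} nodes: system MTBF {mtbf_h:.3f} h, "
          f"checkpoint every {interval_s:.0f} s")
```

Under these assumptions the system fails more often than once an hour, so the checkpoint cost becomes a sizeable fraction of the interval, which is the motivation for fast context saving and for fault tolerance at the algorithm level.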
![Page 47: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/47.jpg)
Importance of the tools
• Development and optimization of large scale parallel software require scalable tools
• Particularly important for systems implemented with home-grown processors
  – current commercial and research tools do not support them
• Three kinds of default tools required
  – Parallel debugger for correctness
  – Performance tuner for performance
  – Energy optimizer for energy efficiency
![Page 48: China’s HPC development: a brief review and … · Depei Qian Beihang University ... Three 863 key projects on HPC • 2002-2005:High Performance Computer and Core Software](https://reader035.vdocuments.site/reader035/viewer/2022062909/5b8094c27f8b9a35788d9a26/html5/thumbnails/48.jpg)
Urgent need for eco-system
• The eco-system for exa-scale systems based on home-grown processors is urgently needed
  – languages, compilers, OS, runtime
  – tools
  – application development support
  – application software
• Need to attract the hardware manufacturers and third-party software developers
  – product family instead of a single machine
• Collaboration between industry, academia and end-users required