hiroshi nakashima academic center for computing and media studies kyoto university

Combining the Power of Computer and Computational Sciences to Fly to Peta-Scale — a Case Study — Hiroshi Nakashima Academic Center for Computing and Media Studies Kyoto University special thanks to: Y. Omura & H. Usui (RISH, Kyoto U.)


Post on 13-Jan-2016


TRANSCRIPT

Page 1: Hiroshi Nakashima Academic Center for Computing and Media Studies Kyoto University

Combining the Power of Computer and Computational Sciences to Fly to Peta-Scale

— a Case Study —

Hiroshi Nakashima

Academic Center for Computing and Media Studies

Kyoto University

special thanks to: Y. Omura & H. Usui (RISH, Kyoto U.)

Page 2

Contents

  Introduction: Combining CS2 Power
    Why Need to Fly to Peta-Scale?
    What Kind of Power to Be Combined?

  Case Study: Plasma Simulation on DM (Distributed-Memory) Systems
    Why Plasma Simulation?
    Why for DM Systems?
    How for DM Systems?
    How Efficient?

  Fly from Case Study
    Took off Successfully?
    How Can We Fly Higher?

  Conclusions

Page 4

Why Need to Combine CS2 Power?

Fly to Peta: How High? (1/2)

T2K Open Supercomputer in Kyoto: Rpeak/Rmax = 61.2/50.5 TFLOPS (#34)

core: (mul+add) x 2 + (L1+L2)

socket: core x 4 + L3

node: (socket + mem. bank) x 4 + IB x 4

node group: node x 6 + 24p-sw x 2

system: node-group x 70 + 288p-sw x 6 + ...

already large enough (16 x 416 nodes = 6656 cores)

already layered deeply and intricately enough (core → socket → node → node-group → system)
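The hierarchy above can be multiplied out as a quick sanity check. This is a sketch under stated assumptions. Reading "(mul+add) x 2" as 4 double-precision flops per core per cycle is an interpretation, and the 2.3 GHz core clock is an assumed value that does not appear on the slide. The node count (416), Rpeak/Rmax, and the "6656 x 150" peta-scale factor are taken from the slides.

```python
# Sanity check of the T2K Open Supercomputer (Kyoto) figures on the slide.
# ASSUMPTIONS (not on the slide): 4 DP flops per core per cycle, 2.3 GHz clock.
FLOPS_PER_CYCLE = 2 * 2      # "(mul + add) x 2", read as 4 flops per cycle
CORES_PER_SOCKET = 4         # socket: core x 4
SOCKETS_PER_NODE = 4         # node: socket x 4
NODES = 416                  # from the slide
CLOCK_HZ = 2.3e9             # assumed core clock
RPEAK_TFLOPS_SLIDE = 61.2
RMAX_TFLOPS_SLIDE = 50.5

cores_per_node = CORES_PER_SOCKET * SOCKETS_PER_NODE
total_cores = cores_per_node * NODES
rpeak_tflops = total_cores * FLOPS_PER_CYCLE * CLOCK_HZ / 1e12
rmax_over_rpeak = RMAX_TFLOPS_SLIDE / RPEAK_TFLOPS_SLIDE
peta_cores = total_cores * 150   # "1,000,000 cores ~ 6656 x 150"

if __name__ == "__main__":
    print(f"cores per node : {cores_per_node}")
    print(f"total cores    : {total_cores}")
    print(f"Rpeak estimate : {rpeak_tflops:.1f} TFLOPS")
    print(f"Rmax / Rpeak   : {rmax_over_rpeak:.1%}")
    print(f"x150 cores     : {peta_cores}")
```

Under these assumptions, the estimate reproduces the slide's 61.2 TFLOPS Rpeak and its 6656-core total.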

Page 6

Why Need to Combine CS2 Power?

Fly to Peta: How High? (2/2)

BTW, how large is Peta?

1 Peta meter > 1/10 light-year

1 Peta second > 30 million years

1 Peta kg > 1/2 x Deimos

1 Peta Hz > violet
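The orders of magnitude in this aside can be checked with a few lines of arithmetic. This is a sketch. The physical constants are approximate reference values supplied here and do not come from the slide. "Violet" is taken as the ~380 nm short-wavelength edge of visible light.

```python
# Order-of-magnitude check of "how large is Peta?".
# Constants are approximate reference values, not taken from the slide.
PETA = 1e15
LIGHT_YEAR_M = 9.4607e15          # metres per light-year
YEAR_S = 365.25 * 24 * 3600       # seconds per Julian year
DEIMOS_MASS_KG = 1.48e15          # approximate mass of Deimos
VIOLET_EDGE_HZ = 7.9e14           # ~380 nm, short-wavelength edge of visible light

peta_m_in_ly = PETA / LIGHT_YEAR_M
peta_s_in_years = PETA / YEAR_S
peta_kg_in_deimos = PETA / DEIMOS_MASS_KG

if __name__ == "__main__":
    print(f"1 Pm  = {peta_m_in_ly:.3f} light-years")
    print(f"1 Ps  = {peta_s_in_years / 1e6:.1f} million years")
    print(f"1 Pkg = {peta_kg_in_deimos:.2f} x Deimos")
    print(f"1 PHz above violet edge: {PETA > VIOLET_EDGE_HZ}")
```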

A peta-scale system should be: much larger (1,000,000 cores ≈ 6656 x 150), and much more deeply and intricately layered

(core → core-group → socket → socket-group → node → node-group → node-supergroup → system)

Page 7

Why Need to Combine CS2 Power?

What Are Combined to Fly?

Computational scientists have deep knowledge of: physics, chemistry, biology, ...; their own problems, algorithms, programs, ...; (sometimes) their own supercomputers; and (often?) have Nature/Science papers.

Computer scientists have deep knowledge of: a wide variety of computers, software, tools, ...; a wide variety of algorithms, techniques, tricks, ...; (sometimes) a few scientific problems; but never dream of authoring a Nature/Science paper.

Combined, they find a much more efficient way to fully exploit peta-scale computing power, which brings:

more Nature/Science papers and a chance to win a Nobel Prize (for computational scientists);

a chance to co-author a Nature/Science paper and to attend the Nobel Prize ceremony (for computer scientists).

Page 9

Case Study: Plasma Simulation on DM

Why Plasma Simulation?

Power/money-hungry, large-scale (128 cores, 1 TB, 1.28 TFLOPS) shared-memory nodes

A big user group of plasma simulation insisted that our new system should include this power/money-hungry subsystem for their memory-hungry SM-parallel application.

I failed to persuade them to build an Open-Supercomputer-only system. So I swore revenge on them by coding a much more efficient DM-parallel program to run on the Open Supercomputer.

Page 10

Case Study: Plasma Simulation on DM

Why for DM Systems? (1/2)

A large number (e.g. > 1 billion) of charged particles

in a large-scale (e.g. 2000 x 2000 x 2000 grid) electromagnetic field (e.g. the magnetosphere)

simulate particle movement by

Page 11

Case Study: Plasma Simulation on DM

Why for DM Systems? (2/2)

Particle parallelization (only):

very simple, especially on SM;

but memory runs short on SM as #particles grows, and even on DM as #grid-points grows, because every node holds the whole grid.
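Rough numbers make this memory argument concrete. This is a sketch under assumptions. The per-particle and per-grid-point sizes (six 8-byte doubles each) are assumed, and the particle and grid counts are the slide's "e.g." values. The code contrasts two cases. In particle-only parallelization, every node keeps the whole field. In a domain decomposition, both particles and grid are split; halo cells are ignored here.

```python
# Rough per-node memory for particle-only vs. domain-decomposed parallelization.
# ASSUMPTIONS: 6 doubles per particle (position + velocity) and 6 doubles per
# grid point (E and B components); real simulators store more (e.g. currents).
BYTES_PER_PARTICLE = 6 * 8
BYTES_PER_GRID_POINT = 6 * 8

N_PARTICLES = 10**9        # "e.g. > 1 billion" charged particles
N_GRID = 2000**3           # "e.g. 2000x2000x2000 grid"


def per_node_bytes(n_particles, n_grid, n_nodes, domain_decomposed):
    """Bytes per node when particles are split evenly over n_nodes."""
    particle_bytes = n_particles * BYTES_PER_PARTICLE / n_nodes
    grid_bytes = n_grid * BYTES_PER_GRID_POINT
    if domain_decomposed:
        grid_bytes /= n_nodes   # each node keeps only its own subspace
    # otherwise every node replicates the whole field
    return particle_bytes + grid_bytes


if __name__ == "__main__":
    for nodes in (1, 16, 416):
        p_only = per_node_bytes(N_PARTICLES, N_GRID, nodes, False) / 1e9
        dd = per_node_bytes(N_PARTICLES, N_GRID, nodes, True) / 1e9
        print(f"{nodes:4d} nodes: particle-only {p_only:7.1f} GB, "
              f"domain-decomposed {dd:7.1f} GB")
```

Under particle-only parallelization, per-node memory never falls below the full grid (about 384 GB here), however many nodes are added. The even particle split assumed in the domain-decomposed case is exactly what a non-uniform plasma breaks, and that is the problem the following slides address.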

Page 12

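Case Study: Plasma Simulation on DM

How for DM Systems? (1/3)

[Figure: the simulation space split by uniform block decomposition into 4 x 4 subspaces labeled 00 to 33; each node is assigned one primary subspace and, when needed, one secondary subspace.]

Uniform block decomposition into primary subspaces. If it is well-balanced, i.e. $\#p_{\text{subspace}} \le \frac{\#p}{\#\text{nodes}}(1+\alpha)$ for every subspace ($\alpha$ being a small tolerance), each node simulates its primary particles with neighboring communication only.

Otherwise, each node helps another node that has a dense subspace, by taking that subspace as its secondary subspace.

Result: balanced #particles, balanced subspace size, simple boundary comp/comm, and a well-balanced, stable secondary subspace assignment.

OhHelp: One-handed Help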
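The well-balancing condition for the uniform block decomposition can be sketched as a small check. This is a minimal sketch assuming a 2-D unit square split into nx x ny blocks and a user-chosen tolerance `alpha`. The function names `block_counts` and `is_well_balanced` are hypothetical and are not taken from OhHelp's code.

```python
def block_counts(particles, nx, ny, width=1.0, height=1.0):
    """Count particles in each subspace of an nx x ny uniform block decomposition.

    counts[i][j] corresponds to subspace "ij" (i along x, j along y).
    """
    counts = [[0] * ny for _ in range(nx)]
    for x, y in particles:
        i = min(int(x / width * nx), nx - 1)    # clamp x == width into the last block
        j = min(int(y / height * ny), ny - 1)
        counts[i][j] += 1
    return counts


def is_well_balanced(counts, alpha):
    """True if no subspace holds more than (1 + alpha) x the average #particles."""
    flat = [c for row in counts for c in row]
    limit = (1 + alpha) * sum(flat) / len(flat)
    return max(flat) <= limit


if __name__ == "__main__":
    uniform = [((i + 0.5) / 4, (j + 0.5) / 4) for i in range(4) for j in range(4)]
    clustered = [(0.1, 0.1)] * 16
    print(is_well_balanced(block_counts(uniform, 4, 4), alpha=0.1))    # True
    print(is_well_balanced(block_counts(clustered, 4, 4), alpha=0.1))  # False
```

When the check fails, plain domain decomposition leaves some node overloaded. That is the case where secondary subspaces come in.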

Page 13

Case Study: Plasma Simulation on DM

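How for DM Systems? (2/3)

Secondary Subspace Assignment

[Figure: #particles per subspace for subspaces 33 00 32 01 30 10 13 03 23 20 31 02 11 21 12 22, with the average #p marked.]

Move particles from the heaviest node to the lightest node so that the lightest one has the average #p.

The heaviest node gives particles even if it becomes lighter than average; it gets particles from somebody afterward.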
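The greedy rule on this slide can be sketched in a few lines. This is a simplified sketch, not OhHelp's actual implementation, and the name `assign_secondaries` is hypothetical. It assumes one primary subspace per node and lets each node help at most one other node ("one-handed"). It uses exact `Fraction` arithmetic so that the average need not be an integer. A node that drops below average after giving particles simply becomes a helper candidate later, which is what "get from somebody afterward" describes.

```python
from fractions import Fraction


def assign_secondaries(primary_load):
    """Greedy secondary-subspace assignment (simplified sketch).

    primary_load[i]: #particles in node i's primary subspace.
    Returns (secondary, load): secondary[j] = i means node j helps node i by
    taking particles of subspace i as its secondary; load is the final
    #particles per node.
    """
    n = len(primary_load)
    avg = Fraction(sum(primary_load), n)
    load = [Fraction(x) for x in primary_load]
    secondary = {}
    remaining = set(range(n))           # nodes that have not helped anyone yet
    while True:
        heavy = [i for i in range(n) if load[i] > avg]
        if not heavy:
            break
        h = max(heavy, key=lambda i: load[i])
        # a helper candidate always exists: remaining nodes sum to |remaining| * avg
        light = [j for j in sorted(remaining) if load[j] < avg]
        l = min(light, key=lambda j: load[j])
        moved = avg - load[l]           # fill the lightest node up to the average
        load[l] += moved
        load[h] -= moved                # h may fall below average; it can help later
        secondary[l] = h
        remaining.discard(l)
    return secondary, load


if __name__ == "__main__":
    sec, load = assign_secondaries([8, 7, 0, 9])
    print(sec)                          # {2: 3, 3: 0, 0: 1}
    print([int(x) for x in load])       # [6, 6, 6, 6]
```

In the example, node 3 gives 6 particles to node 2 and drops to 3, below the average of 6. Node 3 then takes 3 from node 0, which in turn takes 1 from node 1.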

Page 14

Case Study: Plasma Simulation on DM

How for DM Systems? (3/3)

Well-Balancing Check with Primary/Secondary Tree

[Figure: primary/secondary tree over nodes 33, 00, 32, 01, 30, 10, 13, 03, 23, 20, 31, 02, 11, 21, 12, 22.]

Leaf node: must have all of its primaries, and covers secondaries up to the well-balancing limit.

Inner node: must have all primaries not covered by its children, and covers secondaries up to the well-balancing limit.

Check recursively from the leaves to the root; OK if no overflow is detected.
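The leaf-to-root check can be sketched as a recursive pass over the tree. This is a minimal sketch of one reading of the slide, and the name `check_well_balanced` is hypothetical. A node's children are the nodes whose secondary subspace is that node's primary. The well-balancing `limit` would be (1 + α) times the average #particles. The sketch also assumes the secondary relation forms a forest (no cycles). Each child covers as much of its parent's primaries as its spare capacity allows, and a node overflows if the primaries left uncovered exceed the limit.

```python
def check_well_balanced(primary_load, secondary, limit):
    """Bottom-up well-balancing check on the primary/secondary tree (sketch).

    primary_load[v]: #particles in node v's primary subspace.
    secondary: {helper: helped}; helper takes particles of helped's subspace.
    limit: well-balancing limit on #particles per node.
    Returns True if no node overflows.
    """
    n = len(primary_load)
    children = [[] for _ in range(n)]
    for helper, helped in secondary.items():
        children[helped].append(helper)

    def own_load(v):
        # Primaries v must keep itself, i.e. those its children cannot cover.
        # Returns None if an overflow is detected in v's subtree.
        covered = 0
        for c in children[v]:
            kept = own_load(c)
            if kept is None:
                return None
            covered += limit - kept     # child c covers up to its spare capacity
        kept = max(primary_load[v] - covered, 0)
        return kept if kept <= limit else None

    roots = [v for v in range(n) if v not in secondary]   # nodes helping nobody
    return all(own_load(r) is not None for r in roots)


if __name__ == "__main__":
    primary = [8, 7, 0, 9]
    tree = {2: 3, 3: 0, 0: 1}           # 2 helps 3, 3 helps 0, 0 helps 1
    print(check_well_balanced(primary, tree, limit=6))   # True
    print(check_well_balanced(primary, tree, limit=5))   # False
    print(check_well_balanced(primary, {}, limit=6))     # False
```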

Page 15

Case Study: Plasma Simulation on DM

How Efficient? Performance @ 16-128 proc on HPC2500

[Figure: performance (10^6 particles/sec, 0 to 120) vs. # of processes (0 to 128) for the original, unbalanced, and balanced codes; speedup annotations x3.20, x8.76, x11.71, x10.7. Inset: T2K Open Supercomputer, 4 nodes (64 cores), annotations x1.66 and x4.02.]

Page 17

Fly from Case Study

Took off Successfully?

Plasma simulation group now:

appreciates OhHelp and the Open Supercomputer (but has not published Nature/Science papers yet);

is planning to port codes to the Open Supercomputer;

hopes for our help in recoding a variety of simulators.

We supercomputer guys now:

are happy with having accomplished the revenge;

are generously pursuing cooperative research with them (hoping at least to have an SC paper);

but cannot find time to do everything they want.

Page 18

Fly from Case Study

How Can We Fly Higher?

Plasma guys have a wide variety of simulators. Other guys have other varieties of other simulators.

We supercomputer guys have OhHelp, which needs to be adapted to each simulator by modifying not only OhHelp itself but also the simulator.

Parallelization Method Library: generated from a method skeleton and an application (AP) specific stub, and linked to simulators.

We expect to find other computer-scientific tricks for other types of simulators.
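The skeleton-plus-stub split behind a parallelization method library can be sketched as an interface. This is a hedged illustration only. `SimulatorStub`, `balance`, and `ToyStub` are hypothetical names. "Generation" of the library is approximated here by plain dependency injection. The balancing rule (move half the difference from the heaviest to the lightest node) is a simple stand-in, not OhHelp. The point is that the generic skeleton touches the simulator only through the stub's hooks.

```python
from typing import List, Protocol


class SimulatorStub(Protocol):
    """Hypothetical AP-specific stub: the only simulator-dependent hooks."""

    def particle_counts(self) -> List[int]: ...

    def move_particles(self, src: int, dst: int, n: int) -> None: ...


def balance(stub: SimulatorStub, tolerance: float, max_rounds: int = 1000) -> int:
    """Hypothetical method skeleton: generic balancing that only talks to the stub.

    Repeatedly moves half of the difference from the heaviest to the lightest
    node until the heaviest is within (1 + tolerance) x average.
    Returns the number of moves performed.
    """
    moves = 0
    for _ in range(max_rounds):
        counts = stub.particle_counts()
        avg = sum(counts) / len(counts)
        heavy = max(range(len(counts)), key=counts.__getitem__)
        light = min(range(len(counts)), key=counts.__getitem__)
        n = (counts[heavy] - counts[light]) // 2
        if counts[heavy] <= (1 + tolerance) * avg or n == 0:
            break
        stub.move_particles(heavy, light, n)
        moves += 1
    return moves


class ToyStub:
    """Toy stand-in for a simulator: particles are just per-node counts."""

    def __init__(self, counts: List[int]) -> None:
        self.counts = list(counts)

    def particle_counts(self) -> List[int]:
        return list(self.counts)

    def move_particles(self, src: int, dst: int, n: int) -> None:
        self.counts[src] -= n
        self.counts[dst] += n


if __name__ == "__main__":
    stub = ToyStub([40, 0, 0, 0])
    print(balance(stub, tolerance=0.1), stub.counts)   # 3 [10, 10, 10, 10]
```

Porting to another simulator would then mean writing a new stub rather than editing the balancing logic. That separation is the motivation the slide gives for a method library.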

Page 19

Conclusions

Flying to peta-scale needs CS2 collaboration:

various (non-numerical) tricks offered by computer guys;

the opportunity, offered by computational guys, to play in a larger, real-world application field.

Took off from OhHelp:

simple but efficient load balancing for plasma simulations;

(non-numerical) computer-scientific tricks can greatly improve numerical simulations;

fly higher by parallelization method libraries.

Other ways to elevate:

adaptation of linear equation solvers to applications w.r.t. memory layout;

a parallel script programming language for large parameter-space exploration.