
The Gravitational Billion Body Project

We report on cosmological N-body simulations that run over up to 4 supercomputers across the globe. We ran simulations on 60 to 750 cores distributed over a variety of supercomputers. Despite a network latency of 0.32 s and communication over 30,000 km of optical network cable, we achieve up to 92% of the performance obtained with an equal number of cores on a single supercomputer.
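As a back-of-the-envelope check (our own arithmetic, not part of the poster): if the 30,000 km refers to the one-way cable path, then the round-trip time for a signal in optical fibre (which travels at roughly two thirds of the speed of light) comes out close to the quoted 0.32 s latency.

# Latency sanity check (our own estimate; the fibre path length is taken from the text above)
c = 3.0e8                 # speed of light in vacuum [m/s]
v_fibre = 2.0 * c / 3.0   # approximate signal speed in optical fibre [m/s]
path_m = 30_000e3         # quoted cable length, assumed to be one-way [m]

one_way_s = path_m / v_fibre      # ~0.15 s
round_trip_s = 2.0 * one_way_s    # ~0.30 s, close to the quoted 0.32 s latency
print(f"one-way ~ {one_way_s:.2f} s, round trip ~ {round_trip_s:.2f} s")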

Introduction: The Application

Our cosmological code simulates structure formation in the universe by integrating the gravitational forces between dark matter particles over time. This code, which is based on GreeM [1], uses Barnes-Hut tree integration [2] to resolve force interactions over short distances, and particle-mesh integration [3] to resolve interactions over long distances. We have coupled our code with MPWide [4] to enable simulations across supercomputers. Our code is named SUSHI (I).

I) SUSHI stands for Simulating Universe Structure formation on Heterogeneous Infrastructures.
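To illustrate the idea behind the TreePM method (a minimal sketch under our own simplifications, not the GreeM/SUSHI implementation): the Newtonian force is split into a short-range part, which decays quickly beyond a splitting scale and is handled by the Barnes-Hut tree, and a smooth long-range part, which is handled by the particle mesh. The sketch below only writes down the radial splitting kernels and verifies that the two parts add up to the full Newtonian force; the splitting scale r_s and the code units are placeholders.

import numpy as np
from scipy.special import erf, erfc

# TreePM-style force split (illustration only): the short-range kernel is
# evaluated by the tree, the long-range kernel by the mesh in a real code.
G, m = 1.0, 1.0   # gravitational constant and particle mass (code units)
r_s = 1.0         # splitting scale (placeholder value)

def f_newton(r):
    return G * m / r**2

def f_short(r):
    # short-range part: falls off rapidly beyond a few r_s
    x = r / (2.0 * r_s)
    return G * m / r**2 * (erfc(x) + 2.0 * x * np.exp(-x**2) / np.sqrt(np.pi))

def f_long(r):
    # long-range part: smooth near r = 0, suitable for a mesh
    x = r / (2.0 * r_s)
    return G * m / r**2 * (erf(x) - 2.0 * x * np.exp(-x**2) / np.sqrt(np.pi))

r = np.linspace(0.1, 10.0, 50)
print(np.allclose(f_short(r) + f_long(r), f_newton(r)))   # True: the split is exact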

Authors: Derek Groen (a), Steven Rieder (a), Simon Portegies Zwart (a), Tomoaki Ishiyama (b), Jun Makino (b)

Network Setup
All supercomputers have been connected by optical networks. The DEISA network is shared with other users. Communication nodes are shown as the green boxes in the network diagram.
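As a rough illustration of what a communication node does (our own sketch with placeholder hostnames and ports, not MPWide itself): it accepts traffic from the compute nodes of its own machine and forwards the byte stream across the wide-area link to its counterpart at the remote site.

import socket
import threading

# Minimal relay sketch: forward bytes between a local connection and a remote site.
def pipe(src, dst):
    while True:
        data = src.recv(65536)
        if not data:
            break
        dst.sendall(data)

def relay(listen_port, remote_host, remote_port):
    server = socket.socket()
    server.bind(("", listen_port))
    server.listen(1)
    local, _ = server.accept()
    remote = socket.create_connection((remote_host, remote_port))
    # shuttle bytes in both directions until either side closes
    threading.Thread(target=pipe, args=(remote, local), daemon=True).start()
    pipe(local, remote)

# relay(5000, "comm-node.remote-site.example", 5000)   # placeholder endpoint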

Results

particles   procs   sites   time/step [s]   comm. time [s]
512³        120     1        71.04            1.05
512³        120     2        61.62            4.59
512³        120     3        56.31            4.78
512³        120     4        70.1            19.26
1024³       240     1       272               3.4
1024³       240     2       252              21.98
1024³       240     3       294.7            31.28
2048³       750     2       483.4            46.5

Timing results of our simulations per step, averaged over 10 steps, are listed in the table. When performed across 2 or 3 sites, the simulations spend about 90% of the total time on calculations.
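A quick check of the ~90% figure, using our own arithmetic on the table above:

# Fraction of time spent on calculation = 1 - comm. time / time per step
runs = {
    "512^3 over 2 sites":  (61.62, 4.59),
    "512^3 over 3 sites":  (56.31, 4.78),
    "1024^3 over 2 sites": (252.0, 21.98),
    "1024^3 over 3 sites": (294.7, 31.28),
}
for label, (total, comm) in runs.items():
    calc_frac = 1.0 - comm / total
    print(f"{label}: {calc_frac:.0%} of the time spent on calculation")
# roughly 89-93% for the 2- and 3-site runs, consistent with "about 90%"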

For our experiments we used one IBM Power6 supercomputer and three Cray XT4 machines. The IBM resides at SARA in Amsterdam (NL); the Cray machines reside at EPCC in Edinburgh (UK), CSC in Espoo (FI) and CFCA in Tokyo (JP).

Runs over 1 site were performed at SARA only, runs over 2 sites also used EPCC, and runs over 3 sites also used CSC. The 2048³ run was performed at SARA and CFCA. Figure: timings per step of a 512³ run over 3 sites (120 cores in total).

Conclusion
We have run cosmological simulations efficiently across multiple supercomputers. The scale of our experiments is constrained by the political overhead of scheduling the application across supercomputers. A meta-scheduler and reservation system that works across sites will enable us to perform long-lasting and large production runs on a grid of supercomputers.

Affiliations
A. Leiden Observatory, Leiden University, Leiden, the Netherlands.

B. Center for Computational Astrophysics, Mitaka, Tokyo, Japan.

Acknowledgements

This research is supported by the Netherlands Organisation for Scientific Research (NWO) grants #639.073.803, #643.200.503 and #643.000.803, the Stichting Nationale Computerfaciliteiten (project #SH-095-08), NAOJ, SURFnet (GigaPort project), the Netherlands Advanced School for Astronomy (NOVA) and the Leids Kerkhoven-Bosscha Fonds (LKBF). We thank the DEISA Consortium (www.deisa.eu), co-funded through the EU FP6 project RI-031513 and the FP7 project RI-222919, for support within the DEISA Extreme Computing Initiative.

We thank the network facilities of SURFnet, IEEAF, WIDE, Northwest Gigapop and the Global Lambda Integrated Facility (GLIF) GOLE of TransLight/Cisco on National LambdaRail, TransLight, StarLight, NetherLight, T-LEX, Pacific and Atlantic Wave.

The picture in the background is a colorized density plot of the simulation data at redshift z=5.65. We performed this run over 4 sites; the slices are colored according to the site (CFCA, CSC, EPCC or SARA) where each volume resides.

Higher-accuracy tree integration, using an opening angle of 0.3.

Lower-accuracy tree integration, using an opening angle of 0.5.
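For context (a generic Barnes-Hut sketch, not the GreeM/SUSHI code): the opening angle theta controls when a tree node may be treated as a single pseudo-particle. A node of size s at distance d is accepted when s/d < theta, so the smaller angle of 0.3 opens more nodes than 0.5, giving higher accuracy at a higher computational cost.

import math

# Barnes-Hut multipole acceptance test (generic sketch).
def accept(node_size, node_com, particle_pos, theta):
    """Treat the node as one pseudo-particle if it subtends a small enough angle."""
    d = math.dist(node_com, particle_pos)   # distance from particle to node centre of mass
    return node_size < theta * d            # i.e. s / d < theta

# A node of size 1 seen from distance 10 is accepted even at theta = 0.3:
print(accept(1.0, (10.0, 0.0, 0.0), (0.0, 0.0, 0.0), 0.3))   # True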

References
1. T. Ishiyama, T. Fukushige, and J. Makino, "GreeM: Massively Parallel TreePM Code for Large Cosmological N-body Simulations," accepted by PASJ.

2. J. Barnes and P. Hut, "A Hierarchical O(N log N) Force-Calculation Algorithm," Nature, vol. 324, pp. 446-449, Dec. 1986.

3. R. Hockney and J. Eastwood, “Computer Simulation Using Particles”, New York: McGraw-Hill, 1981.

4. D. Groen, S. Rieder, P. Grosso, C. de Laat and S. Portegies Zwart, “A light-weight communication library for distributed computing,” (submitted to CSD).
