www.vacet.org
Brad Whitlock
October 14, 2009
Porting VisIt to BG/P
Overview
• Objectives
• Building 3rd party libraries
• Building VisIt
• Running VisIt on BG/P
• Improvements
• Impact
• Future work
Objectives
• Port VisIt to IBM’s BlueGene/P platform so VisIt can run on LLNL’s Dawn and eventually Sequoia
– Dawn is a 500-teraflop IBM BG/P system with 36,864 nodes and 147,456 CPU cores
– Four 850 MHz PowerPC cores per node, 4 GB memory per node
– Compute nodes run the CNK OS
– Code must be cross-compiled for CNK
• Identify weaknesses in VisIt that prevent it from scaling to tens or hundreds of thousands of processors
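The Dawn figures above are mutually consistent, which a few lines of C++ can confirm. This is only an illustrative check: the constant and function names are made up here, 4 GB is taken as 4096 MB, and the per-core memory figure assumes one process per core with memory divided evenly, which the slides do not state.

```cpp
#include <cstdint>

// Figures quoted on the slide (names are illustrative, not from VisIt).
constexpr std::uint64_t kNodes        = 36864;
constexpr std::uint64_t kCoresPerNode = 4;
constexpr std::uint64_t kNodeMemoryMB = 4096;  // 4 GB per node, taken as 4096 MB

// Total cores implied by the node count.
constexpr std::uint64_t totalCores() { return kNodes * kCoresPerNode; }

// Assumption: one process per core, node memory split evenly across cores.
constexpr std::uint64_t memoryPerCoreMB() { return kNodeMemoryMB / kCoresPerNode; }

static_assert(totalCores() == 147456, "matches the quoted CPU count");
static_assert(memoryPerCoreMB() == 1024, "nominal 1 GB per core");
```

Under that assumption each core starts from a nominal 1 GB, which is an upper bound before the operating system and libraries take their share.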
Building 3rd party libraries
• Built all libraries on login nodes for regular Linux PowerPC version of VisIt
– Ran into runtime problems using xlC compiler so reverted to g++ for the time being
• Cross-compiled all libraries for CNK
• No support for this platform in VisIt’s 3rd party libraries so special builds were required
• Mesa built unmangled and no X11
• VTK tricky to build
– No OpenGL so VTK built with Mesa as its OpenGL
– No X11 so created custom render window
– Used CMake toolchain file
Building VisIt
• No X11 so graphical components can’t be built for CNK (don’t build gui)
• Added new --enable-engine-only build mode to VisIt’s build system that only builds the compute engine and its plugins
• VisIt always used to require mangled Mesa
– This support had to become conditional on VTK having mangled Mesa support
Running VisIt on Dawn
• Dawn uses mpirun to start VisIt on compute nodes
– Minor differences required environment variables to be exported via the mpirun command, which could be handled via a host profile in VisIt
• VisIt ran at 1k, 2k, 4k, 8k, and 16k nodes
• VisIt ran with 1 and 4 trillion zone datasets (June 2009)
• Encountered scaling problems early
– Launch time was slow because each processor read the plugin directory to obtain plugin information
– VisIt commands were sent from rank 0 to other ranks 1 KB at a time until the whole message was sent
– Non-spinning bcast substitute used for sending commands relied on point-to-point communication that performed poorly at scale
– Certain metadata consumed too much memory (each processor has only ~700 MB)
– Synchronization step for scalable rendering (SR) mode used slow point-to-point communication
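Why point-to-point distribution from rank 0 hurts at these scales can be seen by counting communication steps. The sketch below compares a naive scheme, in which rank 0 sends to every other rank in turn, with a binomial-tree broadcast, a common way to implement collectives. It is an illustrative model only: the function names are hypothetical, and the slides do not say how VisIt's substitute or the MPI library on Dawn was actually implemented.

```cpp
#include <cassert>
#include <cstdint>

// Sequential sends needed when rank 0 sends to each other rank in turn.
std::uint64_t linearSendSteps(std::uint64_t ranks) {
    assert(ranks >= 1);
    return ranks - 1;
}

// Rounds needed by a binomial-tree broadcast. Every rank that already holds
// the data forwards it to one new rank per round, so coverage doubles each
// round and ceil(log2(ranks)) rounds suffice. That value equals the bit
// length of (ranks - 1), which is what the loop counts.
std::uint64_t treeBroadcastRounds(std::uint64_t ranks) {
    assert(ranks >= 1);
    std::uint64_t rounds = 0;
    for (std::uint64_t n = ranks - 1; n > 0; n >>= 1) {
        ++rounds;
    }
    return rounds;
}
```

As an example figure, with 16,384 ranks the linear scheme needs 16,383 sequential sends from rank 0, while the tree needs 14 rounds.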
Improvements
• Broadcast plugin information from rank 0 to other ranks, improving plugin loading time 9x
• Broadcast VisIt commands from rank 0 in a single chunk instead of 1 KB at a time
• Use standard bcast in engine main loop instead of the poorly performing non-spinning substitute geared towards shared nodes
• Switched to an alternate metadata representation to free up most available memory for calculations
• Mark Miller replaced the SR mode synchronization step with a much faster version that reduced the time from 20 minutes to 2 seconds
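The first two improvements share a pattern: rank 0 ships a variable-length buffer with a small, fixed number of collectives instead of many small transfers. A minimal sketch follows, assuming a hypothetical `Bcast` callable that stands in for a real root-to-all broadcast such as `MPI_Bcast`. The function names and the chunked variant are illustrative and are not VisIt's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>

// Bcast is any callable of the form bcast(void* buf, std::size_t nbytes) that
// copies nbytes from the root's buffer into every other rank's buffer.
// With MPI it could wrap MPI_Bcast(buf, nbytes, MPI_BYTE, 0, comm).

// Old pattern (illustrative): one collective per 1 KB piece.
template <class Bcast>
std::size_t broadcastChunked(std::string& msg, bool isRoot, Bcast bcast) {
    const std::size_t kChunk = 1024;
    std::uint64_t total = isRoot ? msg.size() : 0;
    bcast(&total, sizeof total);      // every rank learns the length
    if (!isRoot) msg.resize(total);
    std::size_t calls = 1;
    for (std::size_t off = 0; off < total; off += kChunk, ++calls)
        bcast(&msg[off], std::min<std::size_t>(kChunk, total - off));
    return calls;                     // number of collectives issued
}

// New pattern: length first, then the whole payload in one collective.
template <class Bcast>
std::size_t broadcastSingle(std::string& msg, bool isRoot, Bcast bcast) {
    std::uint64_t total = isRoot ? msg.size() : 0;
    bcast(&total, sizeof total);
    // MPI_Bcast takes an int count, so a real wrapper needs this bound.
    assert(total <= static_cast<std::uint64_t>(std::numeric_limits<int>::max()));
    if (!isRoot) msg.resize(total);
    if (total == 0) return 1;
    bcast(&msg[0], static_cast<std::size_t>(total));
    return 2;
}
```

For a 3,000-byte command the chunked version issues four collectives (one length plus three pieces), while the single-chunk version never issues more than two. The same length-then-payload pattern would suit plugin information that rank 0 has read once and serialized into one buffer, sparing the other ranks a directory scan.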
Impact
• So far this project’s impact has been small for customers
– They do not yet run on Dawn
– They might not notice small improvements at today’s everyday processor counts (<2k)
• At higher processor counts (>4k) optimizations added by this work prevent bottlenecks in compute engine, improving scalability
Future work
• Resolve load problems with xlC compiler so we can use the best optimizations, including using BG/P’s dual FPUs
• Improve 3rd party library build process for BG/P by adding support in build_visit script
• Continue profiling plots and improving performance
• Reduce memory usage where possible
• Investigate I/O patterns and attempt optimizations