CCNI HPC2 Activities
NYS High Performance Computation Consortium (HPC2), funded by NYSTAR at $1M/year for 3 years
Goal: provide NY State users with support in applying HPC technologies to:
• Research and discovery
• Product development
• Improved engineering and manufacturing processes
HPC2 is a distributed activity. Participants: Rensselaer, Stony Brook/Brookhaven, SUNY Buffalo, NYSERNET
HPC2 Activities
NY State Industrial Partners
• Xerox
• Corning
• ITT Fluid Technologies: Goulds Pumps
• Global Foundries
Modeling Two-phase Flows
Objectives
• Demonstrate end-to-end solution of two-phase flow problems.
• Couple with structural mechanics boundary condition.
• Provide an interfaced, efficient, and reliable software suite for guiding design.
Tools
• Simmetrix SimAppS Graphical Interface: mesh generation and problem definition
• PHASTA: two-phase level set flow solver
• PhParAdapt: solution transfer and mesh adaptation driver
• Kitware Paraview: visualization
Systems
• CCNI BG/L, CCNI Opterons Cluster
Modeling Two-phase Flows: 3D Example Simulation
Fluid ejected into air. Ran on 4000 CCNI BG/L cores.
Two-phase Automated Mesh Adaptation
Six iterations of mesh adaptation on two-phase simulation. Autonomously ran on 128 cores of CCNI Opterons for approximately 4 hours
Modeling Two-phase Flows: Software Support for Fluid Structure Interactions
Initial work interfaces simulations through serial file formats for displacement and pressure data.
Structural mechanics simulation runs in serial. PHASTA simulation runs in parallel.
Distribute serial displacement data to partitioned PHASTA mesh.
Aggregate partitioned PHASTA nodal pressure data to serial input file.
Modifications to automated mesh adaptation Perl script.
Structural Mechanics Mesh of Input Face
PHASTA Partitioned Mesh of Input Face
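The distribute and aggregate steps listed for the fluid-structure interface can be sketched as follows. This is a minimal illustration, assuming each PHASTA part knows the global node IDs of the input-face nodes it holds; the function names, the dictionary layout, and the sample values are hypothetical and do not reflect the actual serial file formats.

```python
from typing import Dict, List


def distribute(serial_values: Dict[int, float],
               part_nodes: List[List[int]]) -> List[Dict[int, float]]:
    """Give each part the serial (global) values for the nodes it holds."""
    return [{gid: serial_values[gid] for gid in nodes} for nodes in part_nodes]


def aggregate(part_values: List[Dict[int, float]]) -> Dict[int, float]:
    """Merge per-part nodal values into one global table.

    Nodes on part boundaries appear in several parts. This sketch assumes
    the copies agree and keeps the first one seen.
    """
    merged: Dict[int, float] = {}
    for values in part_values:
        for gid, value in values.items():
            merged.setdefault(gid, value)
    return merged


# Hypothetical input face with 5 nodes split over 2 parts; node 2 is shared.
displacement = {0: 0.00, 1: 0.01, 2: 0.02, 3: 0.01, 4: 0.00}
parts = [[0, 1, 2], [2, 3, 4]]

local_displacement = distribute(displacement, parts)
# Stand-in for the flow solve: pressure is a made-up function of displacement.
local_pressure = [{g: 100.0 * d for g, d in p.items()} for p in local_displacement]
pressure = aggregate(local_pressure)
```

Keying both directions on global node IDs keeps the serial structural code and the partitioned flow code independent of each other's node ordering.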
Modeling Free Surface Flows
Objectives
• Demonstrate the capability of available computational tools/resources for parallel simulation of highly viscous sheet flows.
• Solve a model sheet flow problem relevant to the actual process/geometry.
• Develop and define processes for high-fidelity twin screw extruder parallel CFD simulation.
Investigated Tools (to date)
• ACUSIM AcuConsole and AcuSolve, Simmetrix MeshSim, Kitware Paraview
Systems
• CCNI Opterons Cluster
High Aspect Ratio Sheet
• Aspect ratio: 500:1
• Element count: 1.85 million
• 7 min on 512 cores
• 300 min on 8 cores
Parallel 3D Sheet Flow Simulation
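The two timings quoted for the high aspect ratio sheet imply a speedup and a parallel efficiency, which the short calculation below works out. The function name is illustrative, and efficiency is taken here as speedup divided by the increase in core count.

```python
def speedup_and_efficiency(t_base, cores_base, t_scaled, cores_scaled):
    """Return (speedup, parallel efficiency) relative to the base run."""
    speedup = t_base / t_scaled
    efficiency = speedup / (cores_scaled / cores_base)
    return speedup, efficiency


# 300 min on 8 cores versus 7 min on 512 cores (values from the slide).
speedup, efficiency = speedup_and_efficiency(300.0, 8, 7.0, 512)
print(f"speedup {speedup:.1f}x on {512 // 8}x the cores, efficiency {efficiency:.2f}")
# prints: speedup 42.9x on 64x the cores, efficiency 0.67
```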
Mesh generation in Simmetrix SimAppS graphical interface.
Gaps that are ~1/180 of large feature dimension.
Single Screw Extruder CAD**
Conceptual Rendering of Single Screw Extruder Assembly*
* http://en.wikipedia.org/wiki/Plastics_extrusion
** https://sites.google.com/site/oscarsalazarcespedescaddesign/project03
Screw Extruder: Simulation Based Design Tools
Modeling Pump Flows
Objectives
• Apply HPC systems and software to set up and run 3D pump flow simulations in hours instead of days.
• Provide automated mesh generation for fluid geometries with rotating components.
Tools
• ACUSIM Suite, PHASTA, ANSYS CFX, FMDB, Simmetrix MeshSim, Kitware Paraview
Systems
• CCNI Opterons Cluster
Modeling Pump Flows: Graphical Interfaces
AcuConsole Interface
• Problem definition, mesh generation, runtime monitor, and data visualization
Modeling Pump Flows Critical Mesh Regions
Mesh Generation Tools
• Simmetrix provided a customized mesh generation and problem definition GUI after iterating with the industrial partner.
• Supports automated identification of pump geometric model features and application of attributes.
• Problem definition with support for exporting data for multiple CFD analysis tools.
• Reduced mesh generation time frees engineers to focus on simulation and design optimization, leading to improved products.
Scientific Computation Research Center
Goal: Develop simulation technologies that allow practitioners to evaluate systems of interest. To meet this goal we
• Develop adaptive methods for reliable simulations
• Develop methods to do all computation on massively parallel computers
• Develop multiscale computational methods
• Develop interoperable technologies that speed simulation system development
• Partner on the construction of simulation systems for specific applications in multiple areas
SCOREC Software Components
Software available (http://www.scorec.rpi.edu/software.php)
• Some tools not yet linked: email [email protected] with any questions
Simulation Model and Data Management
• Geometric model interface to interrogate CAD models
• Parallel mesh topological representation
• Representation of tensor fields
• Relationship manager
Parallel Control
• Neighborhood-aware message packing
• Iterative mesh partition improvement with multiple criteria
• Processor mesh entity reordering to improve cache performance
SCOREC Software Components (Continued)
Adaptive Meshing
• Adaptive mesh modification
• Mesh curving
Adaptive Control
• Support for executing parallel adaptive unstructured mesh flow simulations with PHASTA
• Adaptive multimodel simulation infrastructure
Analysis
• Parallel Hierarchic Adaptive Stabilized Transient Analysis (PHASTA) software for compressible or incompressible, laminar or turbulent, steady or unsteady flows on 3D unstructured meshes (with U. Colorado)
• Parallel hierarchic multiscale modeling of soft tissues
Interoperable Technologies for Advanced Petascale Simulations (ITAPS)
[Diagram: Petascale Integrated Tools (AMR Front tracking, Shape Optimization, Solution Adaptive Loop, Solution Transfer, Petascale Mesh Generation) build on Component Tools (Mesh Adapt, Interpolation Kernels, Swapping, Smoothing, Front tracking, Dynamic Services, Geom/Mesh Services), which are unified by Common Interfaces (Mesh, Geometry, Relations, Field).]
PHASTA Scalability (Jansen, Shephard, Sahni, Zhou)
• Excellent strong scaling
• Implicit time integration
• Employs the partitioned mesh for system formulation and solution
• Specific number of ALL-REDUCE communications also required
105M vertex mesh (CCNI Blue Gene/L)
#Proc.    El./core    t(sec)    scale
512       204,800     2120      1
1,024     102,400     1052      1.01
2,048     51,200      529       1.00
4,096     25,600      267       0.99
8,192     12,800      131       1.02
16,384    6,400       64.5      1.03
32,768    3,200       35.6      0.93
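The "scale" column can be recomputed from the process counts and times in the table above. The sketch below assumes the usual strong-scaling definition, (t_base × p_base) / (t × p), with the 512-process run as the base; because the slide's times are rounded, a recomputed value may differ from the reported one in the last digit.

```python
# (processes, wall-clock seconds, reported scale factor) from the slide's table
runs = [
    (512, 2120.0, 1.00),
    (1024, 1052.0, 1.01),
    (2048, 529.0, 1.00),
    (4096, 267.0, 0.99),
    (8192, 131.0, 1.02),
    (16384, 64.5, 1.03),
    (32768, 35.6, 0.93),
]


def scaling_factor(p_base, t_base, p, t):
    """Strong-scaling factor: 1.0 means time halves when processes double."""
    return (t_base * p_base) / (t * p)


p0, t0, _ = runs[0]
computed = [scaling_factor(p0, t0, p, t) for p, t, _ in runs]

for (p, t, reported), s in zip(runs, computed):
    print(f"{p:>6} procs  {t:>7.1f} s  computed {s:.2f}  reported {reported:.2f}")
```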
1 billion element anisotropic mesh on Intrepid Blue Gene/P
# of cores    Rgn imb    Vtx imb    Time (s)    Scaling
16k           2.03%      7.13%      222.03      1
32k           1.72%      8.11%      112.43      0.987
64k           1.6%       11.18%     57.09       0.972
128k          5.49%      17.85%     31.35       0.885
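The "Rgn imb" and "Vtx imb" columns report how unevenly mesh regions and vertices are spread over the parts. The slide does not define the metric; the sketch below assumes the common peak-over-average definition, and the per-part counts are made up for illustration.

```python
from statistics import mean


def imbalance(counts):
    """Peak-over-average imbalance (assumed definition): 0.0 is perfectly balanced."""
    return max(counts) / mean(counts) - 1.0


# Hypothetical region counts on four parts.
regions_per_part = [1000, 1010, 990, 1020]
region_imbalance = imbalance(regions_per_part)
print(f"region imbalance: {region_imbalance:.2%}")
```

Under this definition the most heavily loaded part sets the time per step, which is why the scaling factor in the table drops as the vertex imbalance grows.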
Strong Scaling – 5B Mesh up to 288k Cores
Without ParMA partition improvement, the strong scaling factor is 0.88 (time is 70.5 s). Can yield 43 CPU-years of savings for production runs!
AAA 5B elements: full-system scale on Jugene (IBM BG/P system)
Requires functional support for
• Mesh distribution
• Mesh-level inter-processor communications
• Parallel mesh modification
• Dynamic load balancing
Parallel implementations exist for each; the current focus is on increasing scalability
Parallel Adaptive Analysis
• Initial mesh: uniform, 17 million mesh regions
• Adapted mesh: 160 air bubbles, 2.2 billion mesh regions
• Multiple predictive load balance steps were used to make the adaptation possible
• Larger meshes are possible (this run did not run out of memory)
Parallel Mesh Adaptation to 2.2 Billion Elements
Initial and adapted mesh (zoom of a bubble), colored by magnitude of mesh size field
Mesh size field of air bubbles distributing in a tube (segment of the model – 64 bubbles total)
Initial Scaling Studies of Parallel MeshAdapt
Strong scaling test, uniform refinement on Ranger, 4.3M to 2.2B elements
# of Parts    Time (s)    Scaling
2048          21.5        1.0
4096          11.2        0.96
8192          5.67        0.95
16384         2.73        0.99
Nonuniform field driven refinement (with mesh optimization) on Ranger, 4.2M to 730M elements (time for dynamic load balancing not included)
# of Parts    Time (s)    Scaling
2048          110.6       1.0
4096          57.4        0.96
8192          35.4        0.79
Nonuniform field driven refinement (with mesh optimization operations) on Blue Gene/P, 4.2M to 730M elements (time for dynamic load balancing not included)
# of Parts    Time (s)    Scaling
4096          173         1.0
8192          105         0.82
16384         66.1        0.65
32768         36.1        0.60
Tightly coupled
• Adv: Computationally efficient
• Disadv: More complex code development
• Example: Explicit solution of cannon blasts
Loosely coupled
• Adv: Ability to use existing analysis codes
• Disadv: Overhead of multiple structures and data conversion
• Example: Implicit high-order active flow control modeling
[Figure: solution snapshots at t=0.0, t=2e-4, t=5e-4]
Adaptive Loop Construction
• Adaptive Loop Driver (C++): coordinates API calls to execute the solve-adapt loop
• phSolver (Fortran 90): flow solver scalable to 288k cores of BG/P, Field API
• phParAdapt (C++): invokes parallel mesh adaptation
▪ SCOREC FMDB and MeshAdapt, Simmetrix MeshSim and MeshSimAdapt
File Free Parallel-Adaptive Loop
[Diagram: the Adaptive Loop Driver exchanges control with phSolver and phParAdapt. phSolver holds compact mesh and solution data; phParAdapt holds the mesh database and solution fields. Field data passes between the two through the Field API.]
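The structure of a solve-adapt loop that passes data in memory rather than through files can be sketched on a toy 1D problem. Everything here is a stand-in and an assumption: `solve` samples a known function instead of running a flow solver, `adapt` bisects intervals using the exact midpoint error instead of an error estimate, and none of the names correspond to the phSolver or phParAdapt APIs.

```python
import math


def solve(mesh, f):
    """Stand-in for the flow solver: sample f at the mesh nodes."""
    return [f(x) for x in mesh]


def adapt(mesh, field, f, tol):
    """Stand-in for mesh adaptation: bisect intervals with large midpoint error."""
    new_mesh = [mesh[0]]
    refined = 0
    for i in range(len(mesh) - 1):
        x0, x1 = mesh[i], mesh[i + 1]
        xm = 0.5 * (x0 + x1)
        # Error of linear interpolation at the interval midpoint.
        if abs(f(xm) - 0.5 * (field[i] + field[i + 1])) > tol:
            new_mesh.append(xm)
            refined += 1
        new_mesh.append(x1)
    return new_mesh, refined


def adaptive_loop(f, mesh, tol, max_iters=20):
    """Driver: alternate solve and adapt, handing mesh and field over in memory."""
    for _ in range(max_iters):
        field = solve(mesh, f)
        mesh, refined = adapt(mesh, field, f, tol)
        if refined == 0:
            break
    return mesh, solve(mesh, f)


def front(x):
    """Test function with a sharp front at x = 0.5."""
    return math.tanh(40.0 * (x - 0.5))


mesh, field = adaptive_loop(front, [i / 4 for i in range(5)], tol=1e-2)
print(f"{len(mesh)} nodes, smallest interval "
      f"{min(b - a for a, b in zip(mesh, mesh[1:])):.5f}")
```

The driver only sees two callables and the data they exchange, which mirrors the idea of coordinating solver and adaptation components through an API rather than through intermediate files.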
Mesh Curving for COMPASS Analyses
Mesh close-up before and after correcting invalid mesh regions (marked in yellow)
• Mesh curving applied to 8-cavity cryomodule simulations
• 2.97 million curved regions
• 1,583 invalid elements corrected, which leads to a stable simulation that executes 30% faster
Moving Mesh Adaptation
• FETD for short-range wakefield calculations
▪ Adaptively refined meshes have 1~1.5 million curved regions
▪ Uniform refined mesh using small mesh size has 6 million curved regions
Electric fields on the three refined curved meshes
Patient Specific Vascular Surgical Planning
• Initial mesh has 7.1 million regions
• Initial mesh is isotropic outside the boundary layer
• The adapted mesh: 42.8 million regions
• 7.1M -> 10.8M -> 21.2M -> 33.0M -> 42.8M
• Boundary layer based mesh adaptation
• Mesh is anisotropic
Multiscale Simulations for Collagen Indentation
• Multiscale simulation linking microscale network model to a macroscale finite element continuum model.
• Collaborating with experimentalists at the University of Minnesota
Macroscale Model Microscale Model
Concurrent Multiscale: Atomistic-to-Continuum
Nano-indentation of a thin film. Concurrent model configuration at 60th load step (3 Å indentation displacement). Colors represent the sub-domains in which various models are used.
Nano-void subjected to hydrostatic tension. Finite element discretization of the problem domain and dislocation structures.
Fab-Aware High-Performance Chip Design
[Diagram: simulation automation components arranged by size scale (atoms/carriers, devices, circuits) against life-cycle stage (design, manufacture, use/performance): 1st principles CMOS modeling, device simulation, super-resolution lithography tools, reactive ion etching, variation-aware circuit design, and mechanics of damage nucleation in devices. Supporting labels: parallel computing methods, modeling/simulation development, technology development.]
First-Principles Modeling for Nanoelectronic CMOS (Nayak)
[Diagram labels: E, Fermi level; N → U (Poisson), U → N (Schrödinger), U → I]
Input to circuit level from atomic level physics
• As Si CMOS devices shrink, nanoelectronic effects emerge.
• Fermi-function based analysis gives way to quantum energy-level analysis.
• Poisson and Schrödinger equations are reconciled iteratively, allowing for current predictions.
• Carrier dynamics respond to strain in increasingly complex ways, from mobility changes to tunneling effects.
• New functionalities might be exploited
▪ Single-electron transistors
▪ Graphene semiconductors
▪ Carbon nanotube conductors
▪ Spintronics: encoding information into charge carrier’s spin
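The iterative reconciliation of an electrostatic step and a quantum step can be illustrated with a toy self-consistent loop. This is only a sketch under strong assumptions, not the method used in the work described: the "Schrödinger" step is replaced by the Fermi occupation of a single spin-degenerate level, the "Poisson" step by a linear charging energy, and all parameter values are hypothetical.

```python
import math

KT = 0.025    # thermal energy in eV (about room temperature)
MU = 0.0      # Fermi level in eV
EPS = -0.05   # bare level energy in eV (hypothetical)
U0 = 0.25     # charging energy per electron in eV (hypothetical)
N0 = 0.0      # reference electron number


def electrons(u):
    """'Schrödinger' stand-in: occupation of one level shifted by potential u."""
    return 2.0 / (1.0 + math.exp((EPS + u - MU) / KT))


def potential(n):
    """'Poisson' stand-in: potential energy shift caused by the extra charge."""
    return U0 * (n - N0)


def self_consistent(mix=0.1, tol=1e-10, max_iters=10_000):
    """Iterate N -> U -> N with a damped update until U stops changing."""
    u = 0.0
    for iteration in range(max_iters):
        n = electrons(u)
        u_new = potential(n)
        if abs(u_new - u) < tol:
            return u, n, iteration
        u += mix * (u_new - u)  # damping: take only a fraction of the step
    raise RuntimeError("self-consistent loop did not converge")


u_sc, n_sc, iterations = self_consistent()
print(f"U = {u_sc:.4f} eV, N = {n_sc:.4f}, {iterations} iterations")
```

With these parameters the undamped map is too steep near the solution to converge on its own, so the mixing factor is what makes the loop stable; real solvers face the same issue and use more elaborate mixing schemes.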
Super-Resolution Lithography Analysis (Oberai)
Motivation:
• Reducing feature size has made the modeling of underlying physics critical.
• In projective lithography, simple biases are not adequate.
• In holographic lithography, near-field phenomena are predominant.
• Modeling approach must be based on Maxwell’s equations.
Goal:
• Develop unified computational algorithms for the design and analysis of super-resolution lithographic processes that model the underlying physics with high fidelity.
Projective Lithography
Holographic Lithography
Virtual Nanofabrication: Reactive-Ion Etching Simulation (Bloomfield)
• To handle SRAM-scale systems, we expect much larger computational systems, e.g., 10^5 - 10^6 surface elements.
• Transport tracking scales as O(n^2) with the number of surface elements n.
▪ Parallelizes well: every view factor can be computed completely independently of every other view factor, giving almost linear speedup.
• Computational complexity of the chemistry solver depends upon the particular chemical mechanisms associated with the etch recipe. Tends to be O(n^2).
Cutaway view of reactive ion etch simulation of an aspect ratio 1.4 via into a dielectric substrate with 7% porosity, and complete selectivity with respect to the underlying etch stop. A generic ion-radical etch model was used. ~10^3 surface elements. [Bloomfield et al., SISPAD 2003, IEEE.]
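The quadratic growth in transport-tracking work between the ~10^3-element example and the expected 10^5 to 10^6 elements can be made concrete with a short calculation. Counting one view factor per ordered pair of distinct surface elements is an assumption made here for illustration; exploiting reciprocity would halve the count without changing the O(n^2) scaling. The core count is hypothetical.

```python
def view_factor_pairs(n):
    """Ordered pairs of distinct surface elements (assumed unit of work)."""
    return n * (n - 1)


def pairs_per_core(n, cores):
    """Ideal share per core, since each view factor is independent."""
    return view_factor_pairs(n) / cores


baseline = view_factor_pairs(10**3)
growth = {n: view_factor_pairs(n) / baseline for n in (10**5, 10**6)}

for n, g in growth.items():
    print(f"n = {n:>7}: {g:.3g} times the work of n = 1000, "
          f"{pairs_per_core(n, 4096):.3g} pairs per core on 4096 cores")
```

Going from 10^3 to 10^6 elements multiplies the pair count by roughly a million, which is why the independence of the view-factor computations matters so much for this workload.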
Stress-induced Dislocation Formation in Silicon Devices (Picu)
• At 90 nm and below, devices have come to rely on increased carrier mobility produced by strained silicon.
• As devices scale down, the relative importance of scattering centers increases.
• Can we have our cake and eat it too? How much strain can be built into a given device before processing variations and thermo-mechanical load during use cause critical dislocation shedding?
• Continuum FEM calculations automatically identify critical high-stress regions.
• A local atomistic problem is constructed and an MD simulation is run, looking for criticality. Results feed back to continuum.
Advanced Meshing Tools for Nanoelectronic Design (Shephard)
• Advanced meshing tools and expertise exist at RPI and an associated spin-off.
• Leverage these tools to support CCNI projects such as advanced device modeling.
• Local refinement and adaptivity can help carry the computational resources further: “More bang for the buck.”