simulating materials with atomic detail at ibm: from biophysical to high-tech applications...

Post on 27-Dec-2015

217 Views

Category:

Documents

1 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Simulating materials with atomic detail at IBM: From biophysical to high-tech

applications

Application TeamGlenn J. Martyna, Physical Sciences Division, IBM ResearchDennis M. Newns, Physical Sciences Division, IBM ResearchJason Crain, School of Physics, Edinburgh UniversityAndrew Jones, School of Physics, Edinburgh UniversityRazvan Nistor, Physical Sciences Division, IBM ResearchAhmed Maarouf, Physical Sciences Division, IBM Research and EGNCMarcelo Kuroda, Physical Sciences Division, IBM and CS, UIUC

Methods/Software Development TeamGlenn J. Martyna, Physical Sciences Division, IBM ResearchMark E. Tuckerman, Department of Chemistry, NYU.Laxmikant Kale, Computer Science Department, UIUCRamkumar Vadali, Computer Science Department, UIUCSameer Kumar, Computer Science, IBM ResearchEric Bohm, Computer Science Department, UIUCAbhinav Bhatele, Computer Science Department, UIUCRamprasad Venkataraman, Computer Science Department, UIUCAnshu Arya, Computer Science Department, UIUC

Funding : NSF, IBM Research, ONRL, …

“The quote” (Lincoln on Parallel Computing)

One score years ago, our fore-PPLers brought forth upon this continent a new programming language, conceived in data driven execution and dedicated to the proposition that all chares are instantiated equal ...

On the field of applications research, we are testing whether such a programming language or any language so conceived and so dedicated, can long endure or shall perish from the earth.

IBM : A history of multidisciplinary research

2009 Nano MRI

2008 World’s First Petaflop Supercomputer

2005 Cell

2004 Blue Gene/L (National Medal of Technology)

2003 Carbon Nanotubes

1997 Copper Interconnect Wiring

1997 Deep Blue

1994 Silicon Germanium (SiGe)

1987 High-Temperature Superconductivity (Noble)

1986 Scanning Tunneling Microscope (Noble)

1980 RISC

1971 Speech Recognition

1970 Relational Database

1967 Fractals

1966 One-Device Memory Cell

1957 FORTRAN (A mixed blessing)

1956 RAMAC

• Materials Science– Exploratory Materials Synthesis and Processing– Advanced Instrumentation for Materials

Characterization– Theory and Computational Modeling

• Chemistry– New Methods for Synthesis of Electronic Materials

• Physics – Nanoscale Device and Materials Physics– Thermal Physics– Statistical Physics – Biophysics : Theory and Simulation– Physics of Information; Quantum Information Theory

• Nanoscale Engineering– Exploratory Process Integration, – Exploratory Device Test and Characterization

4

IBM’s Physical Sciences: We’re not dead yet!

IBM’s Blue Gene/L network torus supercomputer

Our fancy apparatus: New models on the way or here. BG/Q, BlueWaters, 2-steps to Exascale ...

Goal : The accurate treatment of complex heterogeneous systems to gain physical insight.

Characteristics of current models

Empirical Models: Fixed charge, non-polarizable, pair dispersion.

Ab Initio Models: GGA-DFT, Self interaction present, Dispersion absent.

Problems with current models (empirical)

Dipole Polarizability : Including dipole polarizability changes solvation shells of ions and drives them to the surface.

Higher Polarizabilities: Quadrupolar and octapolar polarizabilities are NOT SMALL.

All Manybody Dispersion terms: Surface tensions and bulk properties determined using accurate pair potentials are incorrect. Surface tensions and bulk properties are both recovered using manybody dispersion and an accurate pair potential. An effective pair potential destroys surface properties but reproduces the bulk.Force fields cannot treat chemical reactions:

Problems with current models (DFT)• Incorrect treatment of self-interaction/exchange: Errors in electron affinity, band gaps …

• Incorrect treatment of correlation: Problematic treatment of spin states. The ground state of transition metals (Ti, V, Co) and spin splitting in Ni are in error. Ni oxide incorrectly predicted to be metallic when magnetic long-range order is absent.

• Incorrect treatment of dispersion : Both exchange and correlation contribute.

• KS states are NOT physical objects : The bands of the exact DFT are problematic. TDDFT with a frequency dependent functional (exact) is required to treat excitations even within the Born-Oppenheimer approximation.

Conclusion : Current Models

• Simulations are likely to provide semi-quantitative accuracy/agreement with experiment.

• Simulations are best used to obtain insight and examine physics .e.g. to promote understanding.

Nonetheless, in order to provide truthful solutions of themodels, simulations must be performed to long time scales!

Limitations of Limitations of ab initioab initio MDMD ((despite our despite our efforts/improvements!)efforts/improvements!)Limited to small systems (100-1000 Limited to small systems (100-1000 atoms)atoms)**..

Limited to short time dynamics and/or Limited to short time dynamics and/or sampling times.sampling times.

Parallel scaling only achieved for Parallel scaling only achieved for # processors <= # electronic # processors <= # electronic statesstates

until recent efforts by ourselves and until recent efforts by ourselves and others.others.

Improving this will allow us to sample Improving this will allow us to sample longer and learn new physics.longer and learn new physics.

*The methodology employed herein scales as O(N3) with system size due to the orthogonality constraint, only.

Solution: Fine grained parallelization of Solution: Fine grained parallelization of CPAIMD.CPAIMD. Scale small systems to 10Scale small systems to 1055 processors!!processors!! Study long time scale Study long time scale phenomena!!phenomena!!

The charm++ QM/MM application is work in progress with Chris Harrison

IBM’s Blue Gene/L network torus supercomputer

The worlds fastest supercomputer!Its low power architecture requires fine grain parallel algorithms/software to achieve optimal performance.

Density Functional Theory : Density Functional Theory : DFTDFT

Electronic states/orbitals of water

Removed by introducing a non-local electron-ion interaction.

Plane Wave Basis Plane Wave Basis Set:Set:

The # of states/orbital ~ N where N is # of atoms. The # of pts in g-space ~N.

Plane Wave Basis Set: Plane Wave Basis Set: Two Spherical cutoffs in G-Two Spherical cutoffs in G-spacespace

gxgy

gz (g)

(g) : radius gcut

gxgy

n(g)

n(g) : radius 2gcut

g-space is a discrete regular grid due to finite size of system!!

gz

Plane Wave Basis Set: Plane Wave Basis Set: The dense discrete real space The dense discrete real space mesh.mesh.

xy

z (r)

(r) = 3D-FFT{ (g)}

xy

n(r)

n(r) = k|k(r)|2

n(g) = 3D-IFFT{n(r)} exactly!

z

Although r-space is a discrete dense mesh, n(g) is generated exactly!

Simple Flow Chart : Scalar Simple Flow Chart : Scalar OpsOps

Object : Comp : MemStates : N2 log N : N2

Density : N log N : NOrthonormality : N3 : N2.33

Memory penalty

Flow Chart : Data Flow Chart : Data StructuresStructures

Parallelization under Parallelization under charm++charm++

Transpose

Transpose

Transpose Transpose

RhoR

Transpose

Effective Parallel Strategy:Effective Parallel Strategy:

The problem must be finely discretized.The problem must be finely discretized.The discretizations must be deftly chosen toThe discretizations must be deftly chosen to

–Minimize the communication between Minimize the communication between processors.processors.

–Maximize the computational load on the Maximize the computational load on the processors.processors.

NOTE , PROCESSOR AND DISCRETIZATION NOTE , PROCESSOR AND DISCRETIZATION ARE ARE

SEPARATE CONCEPTS!!!!SEPARATE CONCEPTS!!!!

Ineffective Parallel StrategyIneffective Parallel Strategy

The discretization size is controlled by the The discretization size is controlled by the number of physical processors.number of physical processors.

The size of data to be communicated at a The size of data to be communicated at a given step is controlled by the number of given step is controlled by the number of physical processors.physical processors.

For the above paradigm : For the above paradigm : –Parallel scaling is limited to # Parallel scaling is limited to # processors=coarse grained parameter in processors=coarse grained parameter in the model.the model.

THIS APPROACH IS TOO LIMITED TO THIS APPROACH IS TOO LIMITED TO ACHIEVE FINE GRAINED PARALLEL ACHIEVE FINE GRAINED PARALLEL SCALING.SCALING.

Virtualization and Charm++Virtualization and Charm++Discretize the problem into a large number Discretize the problem into a large number of very fine grained parts.of very fine grained parts.

Each discretization is associated with some Each discretization is associated with some amount of computational work and amount of computational work and communication.communication.

Each discretization is assigned to a light Each discretization is assigned to a light weight thread or a ``virtual processor'' weight thread or a ``virtual processor'' (VP).(VP).

VPs are rolled into and out of physical VPs are rolled into and out of physical processors as physical processors become processors as physical processors become available (Interleaving!)available (Interleaving!)

Mapping of VPs to processors can be Mapping of VPs to processors can be changed easily.changed easily.

The Charm++ middleware The Charm++ middleware provides the provides the data structures and controls required to data structures and controls required to choreograph this complex dance.choreograph this complex dance.

Parallelization by over partitioning of parallel objects : The charm++ middleware choreography!

Decomposition of work

Available Processors

On a torus architecture, load balance is not enough!The choreography must be ``topologically aware’’.

Charm++ middleware maps work to processors dynamically

Challenges to scaling:Challenges to scaling:Multiple concurrent 3D-FFTs to generate the states in real space require AllToAll communication patterns. Communicate N2 data pts. Reduction of states (~N2 data pts) to the density (~N data pts) in real space.

Multicast of the KS potential computed from the density (~N pts) back to the states in real space (~N copies to make N2 data).

Applying the orthogonality constraint requires N3 operations.

Mapping the chare arrays/VPs to BG/L processors in a topologically aware fashion.

Scaling bottlenecks due to non-local and local electron-ion interactions removed by the introduction of new methods!

Topologically aware mapping for CPAIMD

•The states are confined to rectangular prisms cut from the torus to minimize 3D-FFT communication. •The density placement is optimized to reduced its 3D-FFT communication and the multicast/reduction operations.

~N1/2

(~N1/3)

States ~N1/2

Gspace

Density N1/12

Topologically aware mapping for CPAIMD : Details

Distinguished Paper Award at Euro-Par 2009

Improvements wrought by topological aware mappingon the network torus architecture

Density (R) reduction and multicast to State (R) improved.State (G) communication to/from orthogonality chares improved.‘’Operator calculus for parallel computing”, Martyna and Deng (2011) in preparation.

Parallel scaling of liquid water* as a function of system size on the Blue Gene/L installation at YKT:

•Weak scaling is observed!•Strong scaling on processor numbers up to ~60x the number of states!•IBM J. Res. Dev. (2009).

*Liquid water has 4 states per molecule.

Scaling Water on Blue Gene/L

Topo-aware Spanning tree

Results from 32 water molecule simulation

Software : SummarySoftware : Summary

Fine grained parallelizationFine grained parallelization of the Car-of the Car-Parrinello Parrinello ab initioab initio MD method demonstrated MD method demonstrated on thousands of processors : on thousands of processors :

# processors >> # electronic states# processors >> # electronic states..

Long time simulations of small systems are Long time simulations of small systems are now possible on large massively parallel now possible on large massively parallel supercomputers.supercomputers.

Instance parallelization • Many simulation types require fairly uncoupled instances

of existing chare arrays.

• Simulation types is this class include: 1) Path Integral MD (PIMD) for nuclear quantum effects. 2) k-point sampling for metallic systems. 3) Spin DFT for magnetic systems. 4) Replica exchange for improved atomic phase space

sampling.

• A full combination of all 4 simulation is both physical and interesting

Replica Exchange : M classical subsystems each at a different temperature acting indepently

Replica exchange uber index active for all chares. Nearest neighbor communication required to exchange temperatures and energies

PIMD : P classical subsystems connect by harmonic bonds

Classical particle

Quantum particlePIMD uber index active for all chares. Uber communication required to compute harmonic interactions

K-points : N-states are replicated and given a different phase.

The k-point uber index is not active for atoms and electron density.Uber reduction communication require to form the e-density and atom forces.

k0

k1

Atoms are assumed to be part of a periodic structure andare shared between the k-points (crystal momenta).

Spin DFT : States and electron density are given a

spin-up and spin-down index.

The spin uber index is not active for atoms.Uber reduction communication require to form the atom forces

Spin up

Spin dn

``Uber’’ charm++ indices

• Chare arrays in OpenAtom now posses 4 uber ``instance’’ indices.

• Appropriate section reductions and broadcasts across the ‘’Ubers’’ have been enabled.

• All physics routines are working.

• We expect full results this month.

Goal : The accurate treatment of complex heterogeneous systems to gain physical insight.

Clock speed of Microprocessors (Haensch, 2008)

1.0E+02

1.0E+03

1.0E+04

1990 1995 2000 2005 2010

Clo

ck S

pee

d (

MH

z)

103

102

104

2004 Frequency Extrapolation

CMOS clock speed no longer scales!! Oh my!

p substrate, doping *NA

Scaled Device

L/xd/

GATE

n+ source

n+ drain

WIRINGVoltage, V /

W/tox/

CMOS Constant Field Scaling Rules

SCALING:Voltage: V/Oxide: tox /Wire width: W/Gate length: L/Diffusion: xd /Substrate: NA

RESULTS:Higher Density: ~2

Higher Speed: ~Power/ckt: ~1/2

Active Power Density: ~Constant!

R. H. Dennard et al.,

IEEE J. Solid State Circuits, (1974).

New device concepts are required for truly

low-voltage or low-power digital logic.

The energy dissipated in each switching operation is ½CV2 –To reduce C: Make the device smaller.

–To reduce V: We must now change the device physics! Why???

New device concepts are required for truly

low-voltage or low-power digital logic.

For transport controlled by thermal emission over a barrier,-- subthreshold current is proportionaltoexpse/kT

-- ds / d(log10 I) = 60 mV/decade at room temp

Oh no! I am stuck at ~1V or I leak like crazy!!!

Log Current

Voltage

Sub-Threshold

slope

Ion

Ioff

Vdd0

How low can the supply voltage be? Supply Voltage Based on Voltage Noise

Constraints

10-3

10-2

10-1

100

8 kT/CSRAM

8 kT/CLOGIC

SRAMPot

entia

l (V

)

Technology Node (nm)

45 32 22 16 11 8

kT/e

kT/C Ave. Logic.

Min. Pwr. Suppl. CMOS (~5kT/e)

Estimated average capacitance for logic includes both device and wiring capacitance, and for static memory,

the internal capacitance of a 6-device cell.

CkT /8minimum 8 Johnson-Nyquist noise criterion ( ) for static memory logic

Theis and Solomon, Proceedings of the IEEE, Dec. 2010.

Proposed Paths to Low-Voltage SwitchingTheis and Solomon, Proceedings of the IEEE, Dec. 2010

Energy Filtering– The Tunnel-FET cuts off the high-energy tail of the

Boltzmann distribution.

Internal Potential Step-up – The Ferroelectric-Gate (or Negative Capacitance)

FET allows a small gate voltage swing to produce a

larger swing in s.

Internal Transduction– The Spin-FET transduces a voltage to a spin state

which is gated and then transduced back to a voltage state. The NEMS switch transduces a voltage to a mechanical state and back.

These concepts are more general than the “exemplar” devices!

Wow! A green light to explore new physics from the CMOS guys!

Novel Switches by Phase Change Materials

The phases are interconverted by heat!

Novel Switches by Phase Change Materials

Ge0.15Te0.85

or Eutectic GT

Novel Switches by Phase Change Materials

GeSb2Te4 or GST-124

Piezoelectrically driven fast, cool, memory??

A pressure driven mechanism would be fast, cool, and scalable! Can we find suitable materials using a combined exp/theor approach?

GST-225 undergoes pressure induced “amorphization” both experimentally and theoretically ….

but the process is notreversible!

crystallineamorphous

Eutectic GS undergoes anamorphous to crystalline transformation under pressure, experimentally!

Is the process amenable to reversible switching as in the thermal approach???

Utilize tensile load to approach the spinodal and cause pressure induced amorphization!

Schematic of a potential device based on pressure switching

P~1.5 GPa

P~ -1.5 GPa

CPAIMD spinodal line!

Spinodal decomposition under tensile load

Solid is stable at ambient pressure

Crystalline

Amorphous

Why? Gap driven amorphization!

IBM’s Piezoelectric Memory

We are investigating other materials and better device designs! Patent filed. Scientific work has appeared in PNAS.

Mechanics of C-Clamp

c

e

F F

f

uL

YAAF

L

YA

u

Fk

fcee uuuLEd 333

f

c

e

f

fe

c

ce

c

e

fcec

e

cc

AA

L

L

YLL

YAA

Y

Ed

kkkA

LEd

A

F

111)(333

111333

Maximize stress in PCM : f stiff,large , PCM small, Piezo large.

PCM

Piezo

stiffener

buffer

piezoelectric

phase change or

piezoresistive

metal contact

Device Structure

C-Clamp Piezoelectric Device Structure

60

2040

40

SiN

SiO

2

Piezotronic Mesoscale Transduction Devices: a fast, low power switching technology

Sense

Common

SensePR

CommonPiezoelectric

DriveDrive

PiezoResist

Team

An industry-academic collaborative to discover new physical principles and perform novel Materials Science & Engineering

to revolutionize computingFunded under DARPA MESO Program : Kickoff June 2011

top related