CMSI Computational Science and Technology Special Lecture B (9): Order-N Methods 2

DFTN N Krylov

CMSI Special Lecture B, Lecture 9, 12 June 2014

N2

8

N N O(N)

9

5 June 2014

12 June 2014

8NKohn-Sham

Wavelet (Sinc)

Wavelet

KS

:

O(N)

O(N)

LCPAO (Linear Combination of Pseudo-Atomic Orbitals) method

1. KS

2.

3. l step2

s-orbital of oxygen

PRB 67, 155108 (2003); PRB 69, 195113 (2004)

(LDA)DNP DNP=+

LCPAO

Ozaki, PRB 67, 155108 (2003)

1KS

ac

cacKS

Ozaki, PRB 67, 155108 (2003)

a:

caRccRa

1 aKS 2 Ea a

12GDIIS

Primitive vs. Optimized

p p

Wish

Wien2k0.005

Wien2k3GPa

Wien2k

0.5 kcal/mol

LCAO (1)

NLCAO: LCAO NPW:

NLCAO

1. 2. 3. 4.

LCAO (2)

LCAO:

LCAO

Gaussian basis sets for accurate calculations on molecular systems in gas and condensed phases, J. VandeVondele and J. Hutter, JCP 127, 114105 (2007). CP2K code

Ab initio molecular simulations with numeric atom-centered orbitals, V. Blum et al., Comp. Phys. Comm. 180, 2175 (2009). FHI-AIMS code

LCAO

1.

2.

3.

: P

BCC

Full space Mn 3s, 3p, 3d, 4s 3s3p3s3p 3s (3p)4s(4p)

(1) s5>3 5s 3s

(2) AB

(3) B - C

Δ-factor

Reference: FLAPW+LO (Wien2k). Δ-factor evaluated over V0 ± 6% using V0, B0, B1 (Birch-Murnaghan)

Lejaeghere et al., Critical Reviews in Solid State and Materials Sciences 39, 1-24 (2014).
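As an illustration of how a Δ-factor of this kind can be evaluated, the following minimal sketch compares two equation-of-state curves. It assumes the third-order Birch-Murnaghan form, aligns both curves at their energy minima, and takes the root-mean-square energy difference over ±6% around the average V0. The function names and the parameter values are hypothetical and are not taken from the cited paper or from any code's published data.

```python
import numpy as np

def birch_murnaghan(V, V0, B0, B1):
    """Third-order Birch-Murnaghan energy, measured from the minimum (E0 = 0)."""
    eta = (V0 / V) ** (2.0 / 3.0)
    return 9.0 * V0 * B0 / 16.0 * ((eta - 1.0) ** 3 * B1
                                   + (eta - 1.0) ** 2 * (6.0 - 4.0 * eta))

def delta_factor(params_a, params_b, n_points=2001):
    """RMS energy difference of two EOS curves over V0 +/- 6 % (assumed window)."""
    V0_avg = 0.5 * (params_a[0] + params_b[0])
    V = np.linspace(0.94 * V0_avg, 1.06 * V0_avg, n_points)
    diff = birch_murnaghan(V, *params_a) - birch_murnaghan(V, *params_b)
    # Mean over a uniform grid approximates the integral divided by the window.
    return float(np.sqrt(np.mean(diff ** 2)))

# Hypothetical parameters (V0 in A^3/atom, B0 in eV/A^3, B1 dimensionless).
code_a = (20.0, 0.50, 4.5)
code_b = (20.4, 0.48, 4.8)
print("Delta (eV/atom):", delta_factor(code_a, code_b))
```

With V0 in Å^3/atom and B0 in eV/Å^3 the result is in eV/atom, so multiplying by 1000 gives the meV/atom scale used on the slides.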

GGA-PBE: Δ-factor

GGA-PBE58-factor23.5meV/atom1/102meV/atom

Δ-factor

http://molmod.ugent.be/deltacodesdft

N

3

FFT

References:

T.V.T. Duy and T. Ozaki, "A three-dimensional domain decomposition method for large-scale DFT electronic structure calculations", Comput. Phys. Commun. 185, 777-789 (2014).

T.V.T. Duy and T. Ozaki, "A decomposition method with minimum communication amount for parallelization of multi-dimensional FFTs", Comput. Phys. Commun. 185, 153-164 (2014).

,

4

O(N) MPI

(1) MPI

(2)

MPI19

2

16384 atoms, 19 processes

:

CNT, 16 processes

Poisson1 A,B,C,D

3D-FFT2D

Row-wise

Pencil

Duy and Ozaki, CPC 185, 153 (2014).

3D-FFT2D-(row-wise)

1D- N N22

Compared with 1D parallelization, the MPI communication does not increase up to N processes. Even with N^2 processes, the communication only doubles.
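The idea behind a 2D (pencil) decomposition can be sketched serially: a 3D FFT is three passes of 1D FFTs, and in each pass every process only needs the complete lines along the transformed axis. The sketch below emulates a p1 x p2 process grid with NumPy slices. It is not OpenFFT code, the function name is hypothetical, and the all-to-all data exchange between passes is only indicated by a comment.

```python
import numpy as np

def fft3d_by_pencils(data, p1=2, p2=2):
    """Serial emulation of a 3D FFT with a pencil decomposition on a p1 x p2 grid."""
    out = data.astype(complex)
    for axis in range(3):
        # The two axes that are distributed over the process grid in this pass.
        a, b = [ax for ax in range(3) if ax != axis]
        for block_a in np.array_split(np.arange(out.shape[a]), p1):
            for block_b in np.array_split(np.arange(out.shape[b]), p2):
                idx = [slice(None)] * 3
                idx[a] = slice(block_a[0], block_a[-1] + 1)
                idx[b] = slice(block_b[0], block_b[-1] + 1)
                # Each emulated process transforms only the full lines it owns.
                out[tuple(idx)] = np.fft.fft(out[tuple(idx)], axis=axis)
        # A real MPI code would redistribute (transpose) the data here.
    return out

rng = np.random.default_rng(0)
grid = rng.standard_normal((8, 6, 4))
print(np.allclose(fft3d_by_pencils(grid), np.fft.fftn(grid)))
```

The communication cost of the real algorithm lies entirely in the redistribution step between passes, which is what the decomposition schemes on these slides aim to minimize.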

128^3 data points

OpenFFT

FFT

OpenFFT is released under the GNU GPL and can be downloaded from the web: http://www.openmx-square.org/openfft/

O(N)

13107268

: 131072

N

bcc-

Li

Reference: T. Ohwaki, M. Otani, T. Ikeshoji, and T. Ozaki, J. Chem. Phys. 136, 134101 (2012).

H. Sawada, S. Taniguchi, K. Kawakami, and T. Ozaki, Modelling Simul. Mater. Sci. Eng. 21, 045012 (2013).

bcc-

: TiC, VC, NbC

By

HRTEM image

[Figure: interface energy (J/m^2); vertical axis from 0.0 to 5.0, horizontal axis from 24.0 to 29.0]

c0

Fe7/(NbC)7

Fe-Nb

Fe-C

semi-coherent

Fe-C Fe-Nb FeNbCFeNb

[Figure: interface structure models with atoms numbered 1 to 21 (Fe, Nb, and C), and atom-resolved DOS (states/eV/atom) versus energy (eV, from -8 to 4) for Fe4 and Fe7, Nb8 and Nb11, C15 and C18, Nb15 and Nb18, and C8 and C11]

: TSUBAME

H. Sawada et al., Modelling Simul. Mater. Sci. Eng. 21, 045012 (2013).

2.2nm

Reference: T. Ohwaki et al., J. Chem. Phys. 136, 134101 (2012).

Li By

(~400) H-Si(111) (PC) + Li+

(ESM)

O(N)Krylov

Velocity scaling(600K)

Li+

Li+

T. Ozaki, PRB 75, 035123 (2007). T. Ozaki, PRB 82, 075131 (2010).

Density functionals as a functional of ρ

Density functionals can be rewritten in terms of the first-order reduced density matrix ρ:

where the electron density is given by

All we need to calculate:

1. Non-zero matrix elements of H and S

2. Density matrix corresponding to the non-zero matrix elements
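A small numerical check makes this point concrete: traces such as Tr(ρS) and Tr(ρH) only touch the elements of ρ where S and H are non-zero, even though ρ itself is dense. The sketch below assumes a hypothetical 1D model with nearest-neighbour overlap and hopping, and uses full diagonalization only to have a reference density matrix.

```python
import numpy as np

n, n_occ = 12, 5
# Hypothetical 1D model: nearest-neighbour overlap and hopping only.
S = np.eye(n) + 0.2 * (np.eye(n, k=1) + np.eye(n, k=-1))
H = -1.0 * (np.eye(n, k=1) + np.eye(n, k=-1)) + np.diag(np.linspace(-0.5, 0.5, n))

# Solve H c = e S c through Loewdin orthogonalization, S^{-1/2} H S^{-1/2}.
s_val, s_vec = np.linalg.eigh(S)
S_inv_half = s_vec @ np.diag(s_val ** -0.5) @ s_vec.T
eps, u = np.linalg.eigh(S_inv_half @ H @ S_inv_half)
c = S_inv_half @ u

# Zero-temperature density matrix with one electron per occupied state (dense).
rho = c[:, :n_occ] @ c[:, :n_occ].T

# Keep only the elements on the sparsity pattern of S (which contains that of H).
mask = S != 0.0
rho_selected = np.where(mask, rho, 0.0)

n_electrons = np.sum(rho_selected * S)   # Tr(rho S)
band_energy = np.sum(rho_selected * H)   # Tr(rho H)
print(n_electrons, band_energy, eps[:n_occ].sum())
```

The discarded elements of ρ never enter either sum, which is why an O(N) or low-order method only has to produce the selected elements.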

Main difficulty: diagonalization

O(N^3) methods (numerically exact diagonalization): Householder + QR method, conjugate gradient (CG) method, Davidson method.

Even if the basis functions are localized in real space, a Gram-Schmidt (GS) type procedure is needed to keep the eigenstates orthonormal, which makes the computational time scale as O(N^3).

O(N) methods: linear scaling can be achieved in exchange for accuracy. Examples are the O(N) Krylov subspace method and the DC, DM, and OM methods.

O(N^2~) methods: is it possible to develop O(N^2~) methods without introducing approximations? Such a method must avoid the GS process.

:

1. Green

2.

3.

LNV, PRB 47, 10891 (1993)

1.

PRB 75, 035123 (2007). PRB 82, 075131 (2010).

2. GreenLDLT

: : : & : 1D, 2D, 3D : : (PAO, FEM, FD)

Fermi PRB 75, 035123 (2007).

Hu et al., JCP 133, 101106 (2010) Karrasch et al., PRB 82, 125114 (2010). Lin Lin et al., Chinese Annals of Mathematics (CAM), Ser. B.

Convergence of the density matrix w.r.t. poles

The calculation of the density matrix can be expressed by a contour integration:

The analysis shows that the number of poles needed per eigenstate for sufficient convergence does not depend on the system size, provided the spectral radius does not change. The scaling of the method is therefore governed by the calculation of G.

Lehmann rep.

Convergence property of the contour integration

Total energy of aluminum as a function of the number of poles by a recursion method at 600 K.

With only 80 poles, the energy converges to within double precision.
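The rapid convergence with the number of poles can be reproduced for the Fermi function itself. The sketch below builds a pole expansion from Lambert's continued fraction for tanh, truncated and diagonalized as a symmetric tridiagonal matrix. This follows the spirit of a continued-fraction representation, but the matrix, the pole positions xi, and the residues eta are derived here for illustration and are not copied from the cited papers. The function names are hypothetical.

```python
import numpy as np

def fermi_poles(n_poles):
    """Poles xi_p and residues eta_p such that
    f(x) ~ 1/2 - sum_p 2 * eta_p * x / (x**2 + xi_p**2)."""
    m = 2 * n_poles                      # an even size avoids a zero eigenvalue
    k = np.arange(1, m)
    off = 1.0 / np.sqrt((2.0 * k - 1.0) * (2.0 * k + 1.0))
    B = np.diag(off, 1) + np.diag(off, -1)
    lam, vec = np.linalg.eigh(B)
    pos = lam > 0.0                      # eigenvalues come in +/- pairs
    xi = 2.0 / lam[pos]
    eta = vec[0, pos] ** 2 / lam[pos] ** 2
    return xi, eta

def fermi_from_poles(x, xi, eta):
    """Evaluate the truncated pole expansion at x = (E - mu) / kT."""
    x = np.asarray(x, dtype=float)[..., None]
    return 0.5 - np.sum(2.0 * eta * x / (x ** 2 + xi ** 2), axis=-1)

x = np.linspace(-10.0, 10.0, 201)
exact = 1.0 / (1.0 + np.exp(x))
for n_poles in (5, 10, 20, 40):
    xi, eta = fermi_poles(n_poles)
    err = np.max(np.abs(fermi_from_poles(x, xi, eta) - exact))
    print(n_poles, err)
```

Each term 2x/(x^2 + xi^2) equals 1/(x - i xi) + 1/(x + i xi), so every pole corresponds to one evaluation of the Green's function at a complex energy.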

Nicholson et al., PRB 50, 14686 (1994).

How can the Green's function be evaluated?

The Green's function is the inverse of a sparse matrix (ZS-H).

Selected elements of G(Z), which correspond to non-zero elements of the overlap matrix S, are needed to calculate physical properties.

Our idea:

1. Nested dissection of (ZS-H)

2. LDL^T decomposition for the structured matrix

Together these give a set of recurrence relations.

T. Ozaki, PRB 82, 075131 (2010)

(nested dissection) George, SIAM J. Numer. Anal. 10, 345 (1973).

Nested dissection of a sparse matrix

(i) :

(iv) :

(ii) :

(iii) :

|N0-N1| + Ns

(i)-(iv) |N0-N1| + Nsdissection

N0: 0 N1: 1 Ns:

(v) Dissection:

(i)-(v)nested dissection

Square lattice for the nested dissection
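A nested dissection ordering for a square lattice can be sketched as a short recursion: split the region into two domains and a separator line, order the two domains recursively, and put the separator last. The code below is an illustration under simple assumptions (nearest-neighbour coupling, splitting at the midpoint of the longer side). It does not reproduce the size-balancing criterion of the slides, and the function names are hypothetical.

```python
import numpy as np

def nested_dissection(rows, cols, min_size=2):
    """Order the sites of a rows x cols sub-lattice: domain 0, domain 1, separator."""
    rows, cols = list(rows), list(cols)
    if not rows or not cols:
        return []
    if max(len(rows), len(cols)) <= min_size:
        return [(r, c) for r in rows for c in cols]
    if len(rows) >= len(cols):
        mid = len(rows) // 2
        dom0 = nested_dissection(rows[:mid], cols, min_size)
        dom1 = nested_dissection(rows[mid + 1:], cols, min_size)
        sep = [(rows[mid], c) for c in cols]
    else:
        mid = len(cols) // 2
        dom0 = nested_dissection(rows, cols[:mid], min_size)
        dom1 = nested_dissection(rows, cols[mid + 1:], min_size)
        sep = [(r, cols[mid]) for r in rows]
    return dom0 + dom1 + sep

def grid_adjacency(order):
    """Nearest-neighbour coupling matrix of the lattice in the given site order."""
    pos = {site: k for k, site in enumerate(order)}
    A = np.zeros((len(order), len(order)))
    for (r, c), k in pos.items():
        for nb in ((r + 1, c), (r, c + 1)):
            if nb in pos:
                A[k, pos[nb]] = A[pos[nb], k] = 1.0
    return A

n = 7
order = nested_dissection(range(n), range(n))
A = grid_adjacency(order)
n_dom = 3 * n   # sites in each top-level domain of the 7 x 7 lattice
print("domain-domain coupling present:", bool(A[:n_dom, n_dom:2 * n_dom].any()))
```

In the reordered matrix the two domain blocks are decoupled from each other and interact only through the separator block, which is the structure that a block LDL^T factorization can exploit.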

Schur #1

A matrix X can be factorized into an LDL^T form using a Schur complement.

Then, the inverse of X is given by
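For a symmetric 2 x 2 block matrix this is the standard block identity, which can be checked numerically. The sketch below assumes X = [[A, B], [B^T, C]] with L = A^{-1} B and Schur complement S = C - B^T A^{-1} B, so that X = [[I, 0], [L^T, I]] [[A, 0], [0, S]] [[I, L], [0, I]]. The test matrix is a hypothetical (ZS - H) for a short chain, and the function name is an assumption for illustration.

```python
import numpy as np

def block_ldlt_inverse(X, n_a):
    """Invert a symmetric matrix X through one level of block LDL^T factorization."""
    A, B, C = X[:n_a, :n_a], X[:n_a, n_a:], X[n_a:, n_a:]
    A_inv = np.linalg.inv(A)
    L = A_inv @ B                 # off-diagonal factor A^{-1} B
    S = C - B.T @ L               # Schur complement of A in X
    S_inv = np.linalg.inv(S)
    top_left = A_inv + L @ S_inv @ L.T
    top_right = -L @ S_inv
    # X is symmetric (not Hermitian), so plain transposes are used throughout.
    return np.block([[top_left, top_right], [top_right.T, S_inv]])

n = 8
S_ovl = np.eye(n) + 0.2 * (np.eye(n, k=1) + np.eye(n, k=-1))
H = -1.0 * (np.eye(n, k=1) + np.eye(n, k=-1))
Z = 0.3 + 0.5j                    # hypothetical complex energy off the real axis
X = Z * S_ovl - H
G = block_ldlt_inverse(X, n_a=5)
print(np.allclose(G, np.linalg.inv(X)))
```

Applying the same step recursively to A and S, following a nested dissection ordering, is what turns the identity into a set of recurrence relations.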

Schur #2

L

L

L

L

Schur #3

LDLT

1D: O(N (log2 N)^2), 2D: O(N^2), 3D: O(N^(7/3))

O(N^(7/3))

PRB 82, 075131 (2010)

loop for finding μ {

    loop of poles for (1) contour integration {        [MPI parallelization]

        (2) Inverse calculation by the recurrence relations        [OpenMP parallelization]

    }

    (3) Calculate the total number of electrons

    (4) Correct μ by the Muller method

}
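The outer loop over the chemical potential can be sketched on a toy spectrum. Two simplifications are assumed here and should be read as such: the electron count is obtained directly from eigenvalues instead of from the contour integration, and plain bisection is used in place of the Muller method named on the slide. The function names are hypothetical.

```python
import numpy as np

def electron_count(mu, eigenvalues, kT):
    """Total occupation from Fermi functions; a stand-in for steps (1) to (3)."""
    x = (np.asarray(eigenvalues, dtype=float) - mu) / kT
    # 1 / (1 + exp(x)) written with tanh to avoid overflow for large x.
    return float(np.sum(0.5 * (1.0 - np.tanh(0.5 * x))))

def find_chemical_potential(eigenvalues, n_electrons, kT, tol=1e-12, max_iter=200):
    """Bisection on mu until the electron count matches n_electrons."""
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    lo = eigenvalues.min() - 50.0 * kT   # count is practically 0 here
    hi = eigenvalues.max() + 50.0 * kT   # count is practically the number of states
    for _ in range(max_iter):
        mu = 0.5 * (lo + hi)
        if electron_count(mu, eigenvalues, kT) < n_electrons:
            lo = mu
        else:
            hi = mu
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

levels = np.linspace(-1.0, 1.0, 11)      # hypothetical one-electron levels
mu = find_chemical_potential(levels, n_electrons=4.0, kT=0.05)
print(mu, electron_count(mu, levels, 0.05))
```

Bisection needs more iterations than an interpolating scheme such as Muller's, which matters in the real method because every trial μ costs a full loop over the poles.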

1D 2D 3D

For the inverse calculation of model TB hamiltonians

Total energy (Hartree)

    Conventional    -4130.938589871644
    New method      -4130.938589871899

System: DNA, 650 atoms, 700 K, 80 poles, SZ basis set

1 SCF on Cray-XT5

M. Toyoda and T. Ozaki, PRA 83, 032515 (2011). T. Ozaki and M. Toyoda, Comp. Phys. Comm. 182, 1245-1252 (2011).

O(N)

O(N)

: OEP : : -1/r (): O(N)

(1-RDM)1-RDM

0: 1:

a, b

SCF

SCF

5.3 2.6 1.8

OEP

NKS

N

N
