CMSI Computational Science and Technology Special Lecture B (9): Order-N Methods 2

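CMSI Computational Science and Technology Special Lecture B, Lecture 9 materials, 12 June 2014: Order-N Methods 2. Taisuke Ozaki, Institute for Solid State Physics, The University of Tokyo. Lecture 8 (5 June 2014): importance of low-order scaling methods; what is the order of a computation?; order-N methods in DFT calculations; the order-N Krylov subspace method; extension to first-principles calculations. Lecture 9 (12 June 2014): localized basis methods; massively parallel implementation of order-N methods; applications of order-N methods; numerically exact low-order scaling methods; an O(N) quasi-exact exchange functional.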

Upload: computational-materials-science-initiative

Post on 08-Jul-2015


TRANSCRIPT

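  • CMSI Computational Science and Technology Special Lecture B, Lecture 9 materials, 12 June 2014

    Order-N Methods 2

    Lecture 8 (5 June 2014): importance of low-order scaling methods; what is the order of a computation?; order-N methods in DFT calculations; the order-N Krylov subspace method; extension to first-principles calculations

    Lecture 9 (12 June 2014): localized basis methods; massively parallel implementation of order-N methods; applications of order-N methods; numerically exact low-order scaling methods; an O(N) quasi-exact exchange functional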

  • 8NKohn-Sham

    Wavelet (Sinc)

    Wavelet

  • KS

    :

    O(N)

    O(N)

    LCPAO (Linear-Combination of Pseudo Atomic Orbital Method)

  • 1. KS

    2.

    3. l step2

    s-orbital of oxygen

    PRB 67, 155108 (2003) PRB 69, 195113 (2004)

  • (LDA)DNP DNP=+

  • LCPAO

    Ozaki, PRB 67, 155108 (2003)

  • 1KS

    ac

    cacKS

    Ozaki, PRB 67, 155108 (2003)

    a:

  • caRccRa

    1 aKS 2 Ea a

    12GDIIS

  • Primitive vs. Optimized

    p p

  • Wish

    Wien2k0.005

    Wien2k3GPa

    Wien2k

    0.5 kcal/mol

  • LCAO (1)

    NLCAO: LCAO NPW:

    NLCAO

  • 1. 2. 3. 4.

    LCAO (2)

  • LCAO:

    LCAO

    Gaussian basis sets for accurate calculations on molecular systems in gas and condensed phases, J. VandeVondele and J. Hutter, JCP 127, 114105 (2007). CP2K code

    Ab initio molecular simulations with numeric atom-centered orbitals, V. Blum et al., Comp. Phys. Comm. 180, 2175 (2009). FHI-AIMS code

    LCAO

  • 1.

    2.

    3.

  • 2

    2

  • : P

    BCC

  • Full space Mn 3s, 3p, 3d, 4s 3s3p3s3p 3s (3p)4s(4p)

  • (1) s5>3 5s 3s

    (2) AB

    (3) B - C

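  • Δ-factor

    Reference: FLAPW+LO (Wien2k). The Δ-factor compares energy-volume curves over V0 ± 6%, using V0, B0, and B1 from a Birch-Murnaghan fit.

    Lejaeghere et al., Critical Reviews in Solid State and Materials Sciences 39, 1-24 (2014).

  • GGA-PBE: Δ-factor

    GGA-PBE, 58 elements: Δ-factor of 23.5 meV/atom; one tenth of this is about 2 meV/atom.

  • Δ-factor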

    http://molmod.ugent.be/deltacodesdft
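The Δ-factor described above can be illustrated with a short numerical sketch. It assumes the commonly used definition: the root-mean-square difference between two Birch-Murnaghan energy-volume curves, each measured from its own minimum, over V0 ± 6% around the average equilibrium volume. The function names and the parameter values in the usage note are hypothetical and are not taken from the slides.

```python
import numpy as np

def birch_murnaghan(V, V0, B0, B1, E0=0.0):
    """Third-order Birch-Murnaghan energy E(V).

    Units are assumed consistent, e.g. V in A^3/atom, B0 in eV/A^3, E in eV/atom.
    """
    eta = (V0 / V) ** (2.0 / 3.0)
    return E0 + 9.0 * V0 * B0 / 16.0 * (
        (eta - 1.0) ** 3 * B1 + (eta - 1.0) ** 2 * (6.0 - 4.0 * eta)
    )

def delta_factor(params_a, params_b, n_points=2001):
    """RMS energy difference between two EOS curves over V0 +/- 6%.

    params_a, params_b: (V0, B0, B1) tuples; both curves use E0 = 0.
    """
    V0_avg = 0.5 * (params_a[0] + params_b[0])
    V = np.linspace(0.94 * V0_avg, 1.06 * V0_avg, n_points)
    sq = (birch_murnaghan(V, *params_a) - birch_murnaghan(V, *params_b)) ** 2
    dV = V[1] - V[0]
    integral = dV * (np.sum(sq) - 0.5 * (sq[0] + sq[-1]))  # trapezoidal rule
    return float(np.sqrt(integral / (V[-1] - V[0])))
```

For example, `delta_factor((20.0, 0.60, 4.5), (20.2, 0.58, 4.6))` returns the Δ value in eV/atom for two made-up parameter sets; multiplying by 1000 gives meV/atom.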

  • N

    3

    FFT

    References: T.V.T. Duy and T. Ozaki, "A three-dimensional domain decomposition method for large-scale DFT electronic structure calculations", Comput. Phys. Commun. 185, 777-789 (2014). T.V.T. Duy and T. Ozaki, "A decomposition method with minimum communication amount for parallelization of multi-dimensional FFTs", Comput. Phys. Commun. 185, 153-164 (2014).

    ,

    4

  • O(N) MPI

  • 3

    :

  • (1) MPI

    (2)

  • MPI19

    2

  • MPI 1

  • 16384 atoms, 19 processes

    :

    CNT, 16 processes

  • Poisson1 A,B,C,D

  • 3D-FFT2D

    Row-wise

    Pencil

    Duy and Ozaki, CPC 185, 153 (2014).

  • 3D-FFT2D-(row-wise)

    1D- N N22

  • Compared with 1D parallelization, MPI communication does not increase up to N processes. Even at N^2 processes, the communication only doubles.
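Decomposed 3D FFTs of this kind work because a 3D FFT is a sequence of independent 1D FFTs along each axis, so each process can transform the lines it owns and data only needs to be redistributed between axes. The sketch below mimics this in a single process with NumPy: blocks stand in for the data owned by different processes. It is only an illustration of the separability. It does not reproduce the OpenFFT API or the row-wise decomposition of Duy and Ozaki, and the function names are hypothetical.

```python
import numpy as np

def fft_axis_in_blocks(data, axis, n_blocks):
    """1D FFTs along `axis`, done block by block.

    The array is split along a different axis, so every block holds complete
    lines along `axis` and can be transformed independently (as a process
    owning that block would do).
    """
    split_axis = (axis + 1) % data.ndim
    blocks = np.array_split(data, n_blocks, axis=split_axis)
    return np.concatenate(
        [np.fft.fft(block, axis=axis) for block in blocks], axis=split_axis
    )

def fft3d_by_axes(data, n_blocks=3):
    """3D FFT assembled from blockwise 1D FFTs along x, then y, then z."""
    out = data
    for axis in range(3):
        out = fft_axis_in_blocks(out, axis, n_blocks)
    return out
```

In a real MPI code, the change of `split_axis` between the three passes corresponds to the all-to-all data redistribution whose volume the decomposition method tries to minimize.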

  • 128^3 data points

    OpenFFT

    FFT

    OpenFFT is distributed under the GNU GPL and can be downloaded from the web: http://www.openmx-square.org/openfft/

  • O(N)

    13107268

    : 131072

  • N

    bcc-

    Li

    : T. Ohwaki, M. Otani, T. Ikeshoji, and T. Ozaki, J. Chem. Phys. 136, 134101 (2012).

    H. Sawada, S. Taniguchi, K. Kawakami, and T. Ozaki, Modelling Simul. Mater. Sci. Eng. 21, 045012 (2013).

  • bcc-

    : TiC, VC, NbC

    By

    HRTEM image

  • [Figure: interface energy (J/m^2) plotted against c0 (horizontal axis from 24.0 to 29.0) for Fe7/(NbC)7, with the Fe-Nb, Fe-C, and semi-coherent cases labelled; two structure panels with sites numbered 1 to 21.]

  • [Figure: DOS (states/eV/atom) versus Energy (eV), from -8 to 4 eV. Panels: Fe4 and Fe7 (shown twice); Nb8 and Nb11; c15 and c18; nb15 and nb18; c8 and c11. Structure panels with sites numbered 1 to 21 and element labels Fe, Nb, C.]

  • : TSUBAME

    H. Sawada et al., Modelling Simul. Mater. Sci. Eng. 21, 045012 (2013).

    2.2nm

  • : T. Ohwaki et al., J. Chem. Phys. 136, 134101(2012).

    Li By

    (~400) H-Si(111) (PC) + Li+

    (ESM)

    O(N) Krylov

    Velocity scaling (600 K)

    Li+

    Li+

  • T. Ozaki, PRB 75, 035123 (2007). T. Ozaki, PRB 82, 075131 (2010).

  • Density functionals as a functional of ρ

    Density functionals can be rewritten in terms of the first-order reduced density matrix ρ, where the electron density is given by a sum over products of basis functions weighted by the elements of ρ.

    All we need to calculate:

    1. Non-zero matrix elements of H and S
    2. The density matrix elements corresponding to those non-zero matrix elements
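One common way to write the relations this slide describes is sketched below. It assumes a real localized basis χ_i, a kinetic-energy matrix H_kin, and an external potential v_ext; these symbols are illustrative assumptions rather than the slide's own notation.

```latex
% Total energy expressed through the density matrix \rho and the density n
E_{\mathrm{tot}}[n,\rho]
  = \mathrm{Tr}\left(\rho H_{\mathrm{kin}}\right)
  + \int d\mathbf{r}\, n(\mathbf{r})\, v_{\mathrm{ext}}(\mathbf{r})
  + \frac{1}{2}\iint d\mathbf{r}\, d\mathbf{r}'\,
      \frac{n(\mathbf{r})\, n(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}
  + E_{\mathrm{xc}}[n]

% Electron density built from the density matrix and the basis functions
n(\mathbf{r}) = \sum_{i,j} \rho_{ij}\, \chi_j(\mathbf{r})\, \chi_i(\mathbf{r})
```

Because the product χ_j(r) χ_i(r) vanishes whenever the two localized functions do not overlap, only the elements ρ_ij with non-zero H_ij or S_ij contribute, which is why the two items listed above are sufficient.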

  • Main difficulty: diagonalization

    O(N^3) methods (numerically exact diagonalization): Householder+QR method, conjugate gradient (CG) method, Davidson method.

    Even if the basis functions are localized in real space, a Gram-Schmidt (GS) type procedure is needed to enforce orthonormality among the eigenstates, which makes the computational time scale as O(N^3).

    O(N) methods: can be achieved in exchange for accuracy. Examples are the O(N) Krylov subspace method and the DC, DM, and OM methods.

    O(N^2~) methods: is it possible to develop O(N^2~) methods without introducing approximations, with no GS process?

  • :

    1. Green

    2.

    3.

    LNV, PRB 47, 10891 (1993)

  • 1.

    PRB 75, 035123 (2007). PRB 82, 075131 (2010).

    2. Green's function, LDL^T

    : : : & : 1D, 2D, 3D : : (PAO, FEM, FD)

  • Fermi PRB 75, 035123 (2007).

    Hu et al., JCP 133, 101106 (2010) Karrasch et al., PRB 82, 125114 (2010). Lin Lin et al., Chinese Annals of Mathematics (CAM), Ser. B.
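Assuming this slide refers to continued-fraction and related pole expansions of the Fermi function, the sketch below evaluates the classical continued fraction of tanh, truncated at a finite depth, and uses f(x) = 1/2 - tanh(x/2)/2. It only shows how quickly the truncated fraction approaches the exact Fermi function. It does not compute poles and residues as the cited papers do, and the function name is hypothetical.

```python
import numpy as np

def fermi_continued_fraction(x, depth):
    """Approximate f(x) = 1 / (1 + exp(x)) by a truncated continued fraction.

    Uses tanh(y) = y / (1 + y^2 / (3 + y^2 / (5 + ...))) with y = x / 2,
    cut off at the term (2 * depth - 1), and f(x) = 1/2 - tanh(x/2) / 2.
    """
    y = 0.5 * np.asarray(x, dtype=float)
    c = np.full_like(y, 2.0 * depth - 1.0)
    # Evaluate the fraction from the innermost level outward.
    for k in range(depth - 1, 0, -1):
        c = (2.0 * k - 1.0) + y * y / c
    return 0.5 - 0.5 * y / c
```

Here x stands for (E - μ)/(k_B T). Comparing `fermi_continued_fraction(x, depth)` with `1 / (1 + np.exp(x))` for increasing `depth` shows the error dropping rapidly once the depth exceeds roughly |x|/2.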

  • 2()

  • Convergence of ρ with respect to the number of poles

    The calculation of ρ can be expressed as a contour integration.

    The analysis shows that the number of poles needed per eigenstate for sufficient convergence does not depend on the system size, provided the spectral radius does not change. The scaling is therefore governed by the calculation of G.

    Lehmann representation
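The statement that the density matrix follows from Green's functions evaluated at a modest number of poles can be checked on a small model. The sketch below takes the poles from a truncated continued fraction of tanh (obtained as eigenvalues of a symmetric tridiagonal matrix) and sums Re G(z) over them. It assumes an orthonormal basis (S = I) and a real symmetric H, and it inverts dense matrices for simplicity. All names are hypothetical and the formulation in the cited papers may differ in detail.

```python
import numpy as np

def fermi_poles(n_pairs):
    """Pole data of the truncated continued fraction of tanh.

    Returns positive eigenvalues t_j of a tridiagonal matrix and weights w_j,
    such that tanh(y) ~ sum_j 2 * w_j * y / (1 + t_j**2 * y**2).
    """
    dim = 2 * n_pairs
    n = np.arange(1, dim)
    b = 1.0 / np.sqrt((2.0 * n - 1.0) * (2.0 * n + 1.0))
    t, vecs = np.linalg.eigh(np.diag(b, 1) + np.diag(b, -1))
    pos = t > 0
    return t[pos], vecs[0, pos] ** 2

def density_matrix_from_poles(H, mu, beta, n_pairs=40):
    """rho = f(beta * (H - mu)) from Green's functions at imaginary-shifted poles."""
    t, w = fermi_poles(n_pairs)
    identity = np.eye(H.shape[0])
    rho = 0.5 * identity
    for tj, wj in zip(t, w):
        z = mu + 2.0j / (beta * tj)         # pole position in the energy plane
        G = np.linalg.inv(z * identity - H)  # Green's function (z - H)^(-1), S = I
        rho += (2.0 / beta) * (wj / tj**2) * G.real
    return rho

def density_matrix_exact(H, mu, beta):
    """Reference result by diagonalization, for comparison only."""
    eps, C = np.linalg.eigh(H)
    occ = 0.5 * (1.0 - np.tanh(0.5 * beta * (eps - mu)))
    return (C * occ) @ C.T
```

For a tight-binding chain with nearest-neighbour hopping, the two functions agree closely with a few tens of pole pairs, and the number of poles needed depends on β times the spectral width rather than on the number of sites, which is the point made on the slide.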

  • Convergence property of the contour integration

    Total energy of aluminum as a function of the number of poles, calculated by a recursion method at 600 K.

    The energy converges to within double precision using only 80 poles.

    Nicholson et al., PRB 50, 14686 (1994)

  • How can the Green's function be evaluated?

    The Green's function is the inverse of a sparse matrix (ZS-H).

    Only selected elements of G(Z), those corresponding to the non-zero elements of the overlap matrix S, are needed to calculate physical properties.

    Our idea:

    1. Nested dissection of (ZS-H)
    2. LDL^T decomposition of the structured matrix

    These lead to a set of recurrence relations.

    T. Ozaki, PRB 82, 075131 (2010)

  • (nested dissection) George, SIAM J. Numer. Anal. 10, 345 (1973).

  • Nested dissection of a sparse matrix

    (i) :

    (iv) :

    (ii) :

    (iii) :

    |N0-N1| + Ns

    (i)-(iv) |N0-N1| + Nsdissection

    N0: 0 N1: 1 Ns:

    (v) Dissection:

    (i)-(v)nested dissection

  • Square lattice for the nested dissection
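A minimal sketch of nested dissection on a square lattice is given below. Each region is bisected by a one-site-wide separator line, the two domains are ordered first and the separator last, and the procedure recurses into each domain. The bisecting rule is a simplification: it does not search for the separator that minimizes |N0-N1| + Ns as the preceding slide describes. The function names are hypothetical.

```python
import numpy as np

def nested_dissection(xs, ys, leaf_size=4):
    """Order lattice sites (x, y) as [domain 0, domain 1, separator], recursively."""
    xs, ys = list(xs), list(ys)
    if len(xs) * len(ys) <= leaf_size:
        return [(x, y) for x in xs for y in ys]
    if len(xs) >= len(ys):  # cut across the longer direction
        mid = len(xs) // 2
        dom0 = nested_dissection(xs[:mid], ys, leaf_size)
        dom1 = nested_dissection(xs[mid + 1:], ys, leaf_size)
        sep = [(xs[mid], y) for y in ys]
    else:
        mid = len(ys) // 2
        dom0 = nested_dissection(xs, ys[:mid], leaf_size)
        dom1 = nested_dissection(xs, ys[mid + 1:], leaf_size)
        sep = [(x, ys[mid]) for x in xs]
    return dom0 + dom1 + sep

def lattice_matrix(order, hopping=-1.0):
    """Nearest-neighbour matrix on the lattice, with rows and columns in `order`."""
    index = {site: i for i, site in enumerate(order)}
    A = np.zeros((len(order), len(order)))
    for (x, y), i in index.items():
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index.get(neighbour)
            if j is not None:
                A[i, j] = A[j, i] = hopping
    return A
```

With `order = nested_dissection(range(8), range(8))`, the first 32 entries form domain 0, the next 24 form domain 1, and the last 8 form the separator. In `lattice_matrix(order)` the block coupling domain 0 to domain 1 is zero, which is the structure the LDL^T factorization exploits.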

  • Schur #1

    A matrix X can be factorized into an LDL^T form using a Schur complement.

    Then, the inverse of X is given by
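The block factorization and the resulting inverse can be written out in a few lines. For a symmetric matrix X = [[A, B], [B^T, C]], the Schur complement is S = C - B^T A^(-1) B, and the inverse is assembled from A^(-1) and S^(-1). The sketch below is a dense, single-level illustration with a hypothetical function name. The method on the slides applies the same idea recursively to the sparse nested-dissection structure.

```python
import numpy as np

def inverse_by_schur(X, n_a):
    """Invert a symmetric matrix X through one level of Schur complement.

    X is partitioned as [[A, B], [B^T, C]] with A of size n_a x n_a.
    Works for real symmetric and complex symmetric X (plain transpose is used).
    """
    A = X[:n_a, :n_a]
    B = X[:n_a, n_a:]
    C = X[n_a:, n_a:]
    A_inv = np.linalg.inv(A)
    A_inv_B = A_inv @ B
    S = C - B.T @ A_inv_B            # Schur complement of A in X
    S_inv = np.linalg.inv(S)
    top_left = A_inv + A_inv_B @ S_inv @ A_inv_B.T
    top_right = -A_inv_B @ S_inv
    return np.block([[top_left, top_right], [top_right.T, S_inv]])
```

Applied to X = zI - H with a complex energy z, the returned matrix is the Green's function of a small model with S = I.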

  • Schur #2

    L

    L

  • L

    L

    Schur #3

  • LDL^T

    1D: O(N (log_2 N)^2)
    2D: O(N^2)
    3D: O(N^(7/3))

    O(N^(7/3))

    PRB 82, 075131 (2010)

  • loop for finding μ {

      loop over poles for (1) the contour integration {  [MPI parallelization]

        (2) Inverse calculation by the recurrence relations  [OpenMP parallelization]

      }

      (3) Calculate the total number of electrons

      (4) Correct μ by the Muller method

    }
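The outer loop, which adjusts the chemical potential until the electron count matches, can be sketched as follows. Two simplifications are made and should be read as assumptions: the electron count is computed from eigenvalues instead of from the contour integration, and plain bisection replaces the Muller method named on the slide. Function names are hypothetical.

```python
import numpy as np

def electron_count(eigenvalues, mu, beta, spin_degeneracy=2.0):
    """Total number of electrons for Fermi-Dirac occupations at chemical potential mu."""
    x = beta * (np.asarray(eigenvalues, dtype=float) - mu)
    occupations = 0.5 * (1.0 - np.tanh(0.5 * x))  # overflow-safe Fermi function
    return spin_degeneracy * float(np.sum(occupations))

def find_chemical_potential(eigenvalues, n_electrons, beta, n_iter=200):
    """Bisection on mu so that electron_count(...) equals n_electrons."""
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    lo = eigenvalues.min() - 50.0 / beta  # essentially empty spectrum
    hi = eigenvalues.max() + 50.0 / beta  # essentially full spectrum
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if electron_count(eigenvalues, mid, beta) < n_electrons:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Bisection is used here because the electron count increases monotonically with μ, so it cannot fail once the root is bracketed. A Muller-type update typically needs far fewer evaluations, which matters when each evaluation is a full loop over poles.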

  • 1D 2D 3D

    For the inverse calculation of model TB Hamiltonians

  • Total energy (Hartree)

    Conventional: -4130.938589871644
    New method: -4130.938589871899

    DNA, 650 atoms, 700 K, 80 poles, SZ basis set

  • 1 SCF on Cray-XT5

  • M. Toyoda and T. Ozaki, PRA 83, 032515 (2011). T. Ozaki and M. Toyoda, Comp. Phys. Comm. 182, 1245-1252 (2011).

    O(N)

  • O(N)

    : OEP : : -1/r (): O(N)

    (1-RDM)1-RDM

  • 0: 1:

    a, b

  • SCF

    SCF

    5.3 2.6 1.8

    OEP

  • He

  • NKS

    N

    N
