
Accelerating Biomolecular Simulation using a Scalable Network of Reconfigurable Hardware

Arun Patel¹, Christopher Madill²,³, Manuel Saldaña¹, Christopher Comis¹, Dave Chui¹, Sam Lee¹, Régis Pomès²,³, Paul Chow¹

¹ Department of Electrical and Computer Engineering, University of Toronto. ² Department of Structural Biology and Biochemistry, The Hospital for Sick Children. ³ Department of Biochemistry, University of Toronto.

    Motivation

    • Computer simulations of biomolecules are playing an increasingly important role in medical research.

    • Understanding the balance of physical forces governing atomic-level interactions is a central challenge in modern Biochemistry.

    • Computer simulations have been successfully applied to study biophysical phenomena including: i) Cell membrane transport. ii) Molecular conformational equilibria. iii) Protein / Ligand docking. iv) Time-dependent molecular motion.

    • Current technological limitations restrict the size and length of simulations.

    • The primary objective of this study is to engineer a scalable Molecular Dynamics (MD) simulator capable of outperforming supercomputers and computing clusters.

    • This project is a collaboration between the Department of Electrical and Computer Engineering and the Department of Biochemistry at the University of Toronto, and the Department of Structural Biology and Biochemistry at The Hospital for Sick Children.

    Force Calculations (\(F = -\nabla E\))

    • \(E_{\text{Angle}} = k_\theta\,(\theta - \theta_0)^2\), where \(\theta\) is the bond angle
    • \(E_{\text{Bond}} = k_b\,(l - l_0)^2\), where \(l\) is the bond length
    • \(E_{\text{Torsion}} = A\,[1 + \cos(n\tau + \phi)]\), where \(\tau\) is the torsion angle
    • \(E_{\text{van der Waals}} = 4\varepsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right]\)
    • \(E_{\text{Electrostatic}} = \frac{q_1 q_2}{r}\), for partial charges \(\delta^{+}\), \(\delta^{-}\) separated by distance \(r\)
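
    To make the nonbonded terms concrete, here is a minimal all-pairs sketch of the Lennard-Jones and electrostatic contributions in Python. The function name, the single ε/σ pair shared by all atoms, and the unit-free Coulomb prefactor are illustrative assumptions, not part of the poster's hardware design; in the actual system, loops like these are what the FPGA force engines accelerate.

```python
import numpy as np

def nonbonded_energy_and_forces(pos, q, epsilon, sigma, coulomb_k=1.0):
    """All-pairs Lennard-Jones + Coulomb energy and forces (illustrative).

    pos: (N, 3) positions; q: (N,) partial charges.
    epsilon, sigma: LJ parameters (assumed identical for every pair here).
    coulomb_k: electrostatic prefactor (units-dependent; 1.0 for illustration).
    """
    n = len(pos)
    forces = np.zeros_like(pos)
    energy = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            rij = pos[i] - pos[j]                  # vector from atom j to atom i
            r = np.linalg.norm(rij)
            sr6 = (sigma / r) ** 6
            # Pairwise energies, matching the poster's formulas
            e_lj = 4.0 * epsilon * (sr6 ** 2 - sr6)
            e_coul = coulomb_k * q[i] * q[j] / r
            energy += e_lj + e_coul
            # Force magnitudes from F = -dE/dr, directed along rij
            f_lj = 24.0 * epsilon * (2.0 * sr6 ** 2 - sr6) / r
            f_coul = coulomb_k * q[i] * q[j] / r ** 2
            f_vec = (f_lj + f_coul) * rij / r
            forces[i] += f_vec
            forces[j] -= f_vec
    return energy, forces
```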

    Molecular Dynamics Architecture

    A Coordinate Repository distributes atoms to the compute engines; each timestep then proceeds in four stages:

    1. Compute Force: distribute atoms to compute engines and evaluate forces in parallel.
    2. Reduce Forces: determine the net force acting on every atom.
    3. Integrate Forces: calculate acceleration, velocity and the new position as a function of the computed forces: \(a = F/m\), \(v = \int a\,dt\), \(r = r_0 + \int v\,dt\).
    4. Output Coordinates: echo coordinates to a computer for storage and/or visualization.

    (Legend: a = acceleration, F = force, m = mass, r = position, dt = timestep, v = velocity.)
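
    A minimal software sketch of one pass through this pipeline, assuming the compute engines are modelled as plain Python callables and using explicit Euler updates to mirror the equations above; the real system pipelines these stages across hardware engines rather than running them as a loop on one processor.

```python
import numpy as np

def md_step(pos, vel, mass, compute_engines, dt):
    """One timestep of the four-stage pipeline (illustrative only).

    compute_engines: list of callables, each returning partial forces for
    its assigned subset of atoms (stand-ins for the hardware force engines).
    """
    # 1. Distribute atoms to compute engines and evaluate partial forces
    partial_forces = [engine(pos) for engine in compute_engines]
    # 2. Reduce forces: net force acting on every atom
    forces = np.sum(partial_forces, axis=0)
    # 3. Integrate forces: a = F/m, v += a*dt, r += v*dt (explicit Euler)
    acc = forces / mass[:, None]
    vel = vel + acc * dt
    pos = pos + vel * dt
    # 4. Output coordinates (here: simply return them to the caller)
    return pos, vel
```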

    Conventional vs. Proposed Implementations

    • NAMD, a state-of-the-art MD program, employs many techniques to improve simulation performance, including:
        • Spatial decomposition to enable parallel force evaluations.
        • Particle-Mesh Ewald (PME) algorithm to improve electrostatic calculation efficiency.
        • Nonbonded cutoff to reduce the number of pairwise calculations (a minimal sketch of the cutoff idea appears at the end of this subsection).

    • Despite these enhancements, a protein folding reaction that occurs in 10⁻⁵ seconds would require about 30 years of CPU time to simulate.

    • Consequently, MD simulations are often run on supercomputers or large clusters.

    [Figure: 3000-node Molecular Dynamics cluster at the Pittsburgh Supercomputing Center]
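
    Referring to the nonbonded cutoff mentioned above, here is a minimal sketch of the idea; the brute-force double loop is for illustration only, since NAMD combines the cutoff with spatial decomposition and neighbor lists to avoid scanning all N² pairs.

```python
import numpy as np

def pairs_within_cutoff(pos, cutoff):
    """Return index pairs closer than `cutoff`; pairs beyond it are skipped.

    A brute-force illustration of the nonbonded cutoff: only nearby pairs
    contribute to the short-range force evaluation.
    """
    n = len(pos)
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(pos[i] - pos[j]) < cutoff:
                pairs.append((i, j))
    return pairs
```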

    [Figures: CPU; FPGA with multiple computing cores and transceivers; circuit board of networked FPGAs]

    • Our proposed implementation is not another attempt at building a supercomputer. This is an interdisciplinary effort to design a machine that performs MD simulations roughly 10³× faster than the current supercomputer-based approach.

    • MD is a highly parallel problem with a large ratio of computation to data transfer.

    • Computation throughput is improved vastly by performing time-consuming computation kernels with hardware accelerators.

    • Using a network of interconnected FPGAs, we achieve greater parallelization and higher integration than typical supercomputers.

    • The fully-interconnected network topology, developed by profiling NAMD network activity, is ideal for transferring small data quickly.

    • A single rack of FPGA-based hardware can outperform a supercomputer in MD simulation!

    [Figure: FPGA block diagram — Fast Simplex Link (FSL) interfaces between on-chip blocks (Block A, Block B), multi-gigabit transceiver TX/RX links between chips (Chip A, Chip B), and embedded CPUs alongside the Lennard-Jones (4ε[(σ/r)¹² − (σ/r)⁶]) and electrostatic (q₁q₂/r) force datapaths.]

    Hardware Components

    • Multi-Gigabit Transceivers allow gigabit-rate communication
    • Serial protocol enables a fully-interconnected network topology

    • FPGAs combine computation and communication hardware on-chip
    • Reconfigurability = Flexibility
    • High integration density possible

    • Programming model allows a mix of hardware and software elements
    • Soft-processors can be used to emulate hardware functionality

    • Ewald Electrostatic Force Engine
        • Computes electrostatic forces in hardware with an N·log(N) algorithm; other engines use an N² algorithm (a rough operation-count comparison appears at the end of this list)

    • Lennard-Jones Potential Engine
        • Computes van der Waals forces
        • Outperforms existing software implementations by up to 88×

    • Fast Simplex Links abstract on-chip hardware communication
    • Standard interface enables rapid integration of system modules
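
    As a rough, illustrative comparison (numbers not taken from the poster): for a system of N = 100,000 atoms, an all-pairs N² evaluation touches on the order of N²/2 ≈ 5×10⁹ pairs per timestep, whereas work proportional to N·log₂(N) is roughly 1.7×10⁶ operations, a gap of more than three orders of magnitude; the exact constants depend on the implementation.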

    [Figures: Prototype system diagrams. A Simulation FPGA (Coords, Reduce, LJ Forces, CPU, MGT interfaces) connects over an MGT link to a Visualization FPGA with an Ethernet interface and an output terminal; in the second-generation system, sub-clusters of FPGAs are joined internally by MGT links and to one another by optical links, with CPUs handling control and an output terminal displaying results.]

    Hardware Prototypes

    First Generation Prototype

    • Goal of the initial prototype is to develop the programming model
    • Uses standardized Message-Passing Interface (MPI) software for communication (see the sketch after this list)
    • Accelerators are emulated using soft-processors
    • Control processor separated from computation units
    • Lennard-Jones forces only

    Second Generation Prototype

    • Second prototype divided into fully-interconnected sub-clusters of five FPGAs
    • Optical links used to connect sub-clusters together
    • Embedded soft-processors handle control flow and perform scheduling duties
    • Each FPGA “node” contains a heterogeneous mixture of processors and accelerators
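
    A host-side analogy of the first-generation programming model, written with mpi4py purely for illustration: the prototype itself runs an embedded MPI implementation on soft-processors, and the rank layout, strided atom partitioning and placeholder force routine below are assumptions, not the actual firmware.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

def lj_partial_forces(my_atoms, positions):
    """Placeholder for the Lennard-Jones engine emulated on a soft-processor."""
    forces = np.zeros_like(positions)
    # ... pairwise Lennard-Jones terms for the atoms in `my_atoms` would go here
    return forces

n_atoms = 1024
positions = np.random.rand(n_atoms, 3) if rank == 0 else None

# Control processor (rank 0) broadcasts coordinates ("distribute atoms")
positions = comm.bcast(positions, root=0)

# Each compute rank takes a strided slice of the atoms
my_atoms = list(range(rank, n_atoms, size))
partial = lj_partial_forces(my_atoms, positions)

# Sum partial forces back onto the control processor ("Reduce Forces")
total = np.zeros_like(positions) if rank == 0 else None
comm.Reduce(partial, total, op=MPI.SUM, root=0)

if rank == 0:
    # The control processor would now integrate and output coordinates
    print("net force on atom 0:", total[0])
```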