An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines
![Page 1: An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines](https://reader035.vdocuments.site/reader035/viewer/2022062309/56814951550346895db6a142/html5/thumbnails/1.jpg)
An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines
By: David Chui
Supervisor: Professor P. Chow
![Page 2: An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines](https://reader035.vdocuments.site/reader035/viewer/2022062309/56814951550346895db6a142/html5/thumbnails/2.jpg)
Overview
Introduction and Motivation
Background and Previous Work
Hardware Compute Engines
Results and Performance
Conclusions and Future Work
![Page 3: An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines](https://reader035.vdocuments.site/reader035/viewer/2022062309/56814951550346895db6a142/html5/thumbnails/3.jpg)
1. Introduction and Motivation
![Page 4: An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines](https://reader035.vdocuments.site/reader035/viewer/2022062309/56814951550346895db6a142/html5/thumbnails/4.jpg)
What is Molecular Dynamics (MD) simulation?
Biomolecular simulations
  Structure and behavior of biological systems
Uses classical mechanics to model a molecular system
  Newtonian equations of motion (F = ma)
  Compute forces and integrate acceleration through time to move atoms
A large scale MD system takes years to simulate
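The "compute forces, then integrate" loop can be sketched in a few lines. The slides do not say which integrator is used; the sketch below uses velocity Verlet, a common choice in MD codes, on a one-dimensional harmonic oscillator. The function name, the spring force, and all numeric values are illustrative assumptions.

```python
def velocity_verlet_step(x, v, force, mass, dt):
    """Advance one particle by one time step dt (1-D for brevity)."""
    a = force(x) / mass                      # F = ma, solved for a
    x_new = x + v * dt + 0.5 * a * dt * dt   # move the atom
    a_new = force(x_new) / mass              # recompute the force at the new position
    v_new = v + 0.5 * (a + a_new) * dt       # update the velocity
    return x_new, v_new


# Hypothetical test system: unit mass on a unit spring, F(x) = -x.
x, v = 1.0, 0.0
for _ in range(1000):
    x, v = velocity_verlet_step(x, v, lambda pos: -pos, 1.0, 0.01)

# Total energy should stay close to its initial value of 0.5.
energy = 0.5 * v * v + 0.5 * x * x
```

In a real MD run the force evaluation dominates the cost of each step, which is the part the compute engines target.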
![Page 5: An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines](https://reader035.vdocuments.site/reader035/viewer/2022062309/56814951550346895db6a142/html5/thumbnails/5.jpg)
Why is this an interesting computational problem?
Physical time for simulation 1e-4 sec
Time-step size 1e-15 sec
Number of time-steps 1e11
Number of atoms in a protein system 32,000
Number of interactions 1e9
Number of instructions/force calculation 1e3
Total number of machine instructions 1e23
Estimated simulation time on a petaflop/sec capacity machine 3 years
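The figures on this slide can be checked with a short calculation. The variable names are illustrative, and the sketch assumes a petaflop/sec machine retires 1e15 of the counted instructions per second.

```python
physical_time_s = 1e-4               # physical time to simulate
time_step_s = 1e-15                  # time-step size
interactions_per_step = 1e9          # number of interactions
instructions_per_interaction = 1e3   # instructions per force calculation
machine_rate = 1e15                  # assumed: 1 petaflop/sec = 1e15 instructions/sec

time_steps = physical_time_s / time_step_s
total_instructions = (time_steps * interactions_per_step
                      * instructions_per_interaction)
seconds = total_instructions / machine_rate
years = seconds / (365.25 * 24 * 3600)

print(f"{time_steps:.0e} steps, {total_instructions:.0e} instructions, "
      f"{years:.1f} years")
```

The result is about 1e8 seconds, which is consistent with the 3-year estimate.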
![Page 6: An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines](https://reader035.vdocuments.site/reader035/viewer/2022062309/56814951550346895db6a142/html5/thumbnails/6.jpg)
Motivation
Special-purpose computers for MD simulation have become an interesting application
FPGA technology
  Reconfigurable
  Low cost for system prototype
  Short turn around time and development cycle
  Latest technology
  Design portability
![Page 7: An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines](https://reader035.vdocuments.site/reader035/viewer/2022062309/56814951550346895db6a142/html5/thumbnails/7.jpg)
Objectives
Implement the compute engines on FPGA
Calculate the non-bonded interactions in an MD simulation (Lennard-Jones and Ewald Direct Space)
Explore the hardware resources
Study the trade-off between hardware resources and computational precision
Analyze the hardware pipeline performance
Become the components of a larger project in the future
![Page 8: An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines](https://reader035.vdocuments.site/reader035/viewer/2022062309/56814951550346895db6a142/html5/thumbnails/8.jpg)
2. Background and Previous Work
![Page 9: An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines](https://reader035.vdocuments.site/reader035/viewer/2022062309/56814951550346895db6a142/html5/thumbnails/9.jpg)
Lennard-Jones Potential
Attraction due to instantaneous dipole of molecules
Pair-wise non-bonded interactions O(N²)
Short range force
Use cut-off radius to reduce computations
Reduced complexity close to O(N)
![Page 10: An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines](https://reader035.vdocuments.site/reader035/viewer/2022062309/56814951550346895db6a142/html5/thumbnails/10.jpg)
Lennard-Jones Potential of Argon gas
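[Figure: Lennard-Jones potential of argon gas. y-axis: v(r)/k_B (K), from -150 to 300; x-axis: r (nm), from 0.3 to 1.5]

$$U_{LJ}(r) = 4\epsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right]$$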
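A minimal sketch of the standard 12-6 Lennard-Jones potential and the corresponding radial force. The argon parameters (well depth of about 120 K in units of k_B, and sigma of about 0.34 nm) are commonly quoted approximate values and are assumptions here, not figures taken from the slides.

```python
def lj_potential(r, epsilon, sigma):
    """12-6 Lennard-Jones potential: 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_force(r, epsilon, sigma):
    """Radial force -dU/dr; positive values are repulsive."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


EPS_OVER_KB = 120.0  # K, assumed approximate well depth for argon
SIGMA_NM = 0.34      # nm, assumed approximate argon diameter

# The potential crosses zero at r = sigma and has its minimum, of depth
# epsilon, at r = 2**(1/6) * sigma, where the force vanishes.
r_min = 2.0 ** (1.0 / 6.0) * SIGMA_NM
well_depth = lj_potential(r_min, EPS_OVER_KB, SIGMA_NM)
```

Both functions depend on r only through even powers of 1/r (apart from the final division in the force), which is why a hardware engine can work from the squared distance.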
![Page 11: An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines](https://reader035.vdocuments.site/reader035/viewer/2022062309/56814951550346895db6a142/html5/thumbnails/11.jpg)
Electrostatic Potential
Attraction and repulsion due to electrostatic charge of particles (long range force)
Reformulate using Ewald Summation
  Decompose to Direct Space and Reciprocal Space
  Direct Space computation similar to Lennard-Jones
  Direct Space complexity close to O(N)
![Page 12: An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines](https://reader035.vdocuments.site/reader035/viewer/2022062309/56814951550346895db6a142/html5/thumbnails/12.jpg)
Ewald Summation - Direct Space
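$$U = \frac{1}{2}\sum_{\mathbf{n}}{}^{'}\sum_{i,j}^{N}\frac{q_i q_j\,\mathrm{erfc}(\alpha\, r_{ij,\mathbf{n}})}{r_{ij,\mathbf{n}}}$$

[Figure: erfc(x) versus x. y-axis: erfc(x), from 0 to 1.2; x-axis: x, from 0 to 7]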
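A minimal sketch of the direct-space term, assuming the usual form q_i q_j erfc(alpha r) / r with a splitting parameter alpha and no unit prefactor. Only the central cell (n = 0) is summed, and a plain cutoff is applied. The function names and parameters are illustrative.

```python
import itertools
import math


def ewald_direct_pair_energy(q_i, q_j, r, alpha):
    """One term q_i * q_j * erfc(alpha * r) / r of the direct-space sum."""
    return q_i * q_j * math.erfc(alpha * r) / r


def ewald_direct_energy(charges, positions, alpha, cutoff):
    """Sum over unique pairs inside the cutoff, central cell only (n = 0).

    Summing each pair once replaces the factor 1/2 in front of the
    double sum over i and j.
    """
    energy = 0.0
    for i, j in itertools.combinations(range(len(charges)), 2):
        r = math.dist(positions[i], positions[j])
        if r < cutoff:
            energy += ewald_direct_pair_energy(charges[i], charges[j], r, alpha)
    return energy
```

Because erfc(x) decays quickly, the pair term becomes negligible beyond a few multiples of 1/alpha, which is what allows a cutoff radius and brings the direct-space cost close to O(N).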
![Page 13: An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines](https://reader035.vdocuments.site/reader035/viewer/2022062309/56814951550346895db6a142/html5/thumbnails/13.jpg)
Previous Hardware Developments
| Project | Technology | Year |
|---|---|---|
| MD-GRAPE | 0.6 µm | 1996 |
| MD-Engine | 0.8 µm | 1997 |
| BlueGene/L | 0.13 µm | 2003 |
| MD-GRAPE3 | 0.13 µm | 2004 |
![Page 14: An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines](https://reader035.vdocuments.site/reader035/viewer/2022062309/56814951550346895db6a142/html5/thumbnails/14.jpg)
Recent work - FPGA based MD simulator
Transmogrifier-3 FPGA system, University of Toronto (2003)
  Estimated speedup of over 20 times over software with better hardware resources
  Fixed-point arithmetic, function table lookup, and interpolation
Xilinx Virtex-II Pro XC2VP70 FPGA, Boston University (2005)
  Achieved a speedup of over 88 times over software
  Fixed-point arithmetic, function table lookup, and interpolation
![Page 15: An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines](https://reader035.vdocuments.site/reader035/viewer/2022062309/56814951550346895db6a142/html5/thumbnails/15.jpg)
MD Simulation software - NAMD
Parallel runtime system (Charm++/Converse)
Highly scalable
Largest system simulated has over 300,000 atoms on 1000 processors
Spatial decomposition
Double precision floating point
![Page 16: An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines](https://reader035.vdocuments.site/reader035/viewer/2022062309/56814951550346895db6a142/html5/thumbnails/16.jpg)
NAMD - Spatial Decomposition
[Figure: spatial decomposition of the simulation box into cells, with the cutoff radius marked. Labels: Simulation Box, Cell, Cutoff Radius]
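The idea behind spatial decomposition can be sketched with a simple cell list. This is an illustration of the general technique, not NAMD's actual implementation. It assumes a non-periodic box and a cell edge equal to the cutoff radius, so that every pair within the cutoff lies in the same cell or in adjacent cells. The function name is hypothetical.

```python
import itertools
import math
from collections import defaultdict


def pairs_within_cutoff(positions, cutoff):
    """Return the set of index pairs (i, j), i < j, closer than cutoff."""
    # Bin every atom into a cubic cell whose edge equals the cutoff.
    cells = defaultdict(list)
    for idx, p in enumerate(positions):
        cells[tuple(math.floor(c / cutoff) for c in p)].append(idx)

    found = set()
    for cell, members in cells.items():
        # Only the cell itself and its 26 neighbours can hold partners.
        for offset in itertools.product((-1, 0, 1), repeat=3):
            neighbour = tuple(c + o for c, o in zip(cell, offset))
            for i in members:
                for j in cells.get(neighbour, ()):
                    if i < j and math.dist(positions[i], positions[j]) < cutoff:
                        found.add((i, j))
    return found
```

With a roughly uniform density, each atom is compared against a bounded number of neighbours, which is how the pair search drops from O(N²) toward O(N).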
![Page 17: An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines](https://reader035.vdocuments.site/reader035/viewer/2022062309/56814951550346895db6a142/html5/thumbnails/17.jpg)
3. Hardware Compute Engines
![Page 18: An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines](https://reader035.vdocuments.site/reader035/viewer/2022062309/56814951550346895db6a142/html5/thumbnails/18.jpg)
Purpose and Design Approach
Implement the functionality of the software compute object
Calculate the non-bonded interactions given the particle information
Fixed-point arithmetic, function table lookup, and interpolation
Pipelined architecture
![Page 19: An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines](https://reader035.vdocuments.site/reader035/viewer/2022062309/56814951550346895db6a142/html5/thumbnails/19.jpg)
Compute Engine Block Diagram
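[Figure: compute engine block diagram. Inputs: particle coordinates i(x, y, z) and j(x, y, z), with ix in {7.25} fixed-point format. Blocks: |Δr|² computation, with |Δr|² = |Δx|² + |Δy|² + |Δz|²; ZBT memory lookup and linear interpolation; constant multiplication and addition. Outputs: F(x, y, z) and E]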
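The first pipeline stage, |Δr|² = |Δx|² + |Δy|² + |Δz|², can be modelled in software with integer arithmetic. The sketch assumes that {7.25} means 7 integer bits and 25 fractional bits, and it ignores overflow and sign handling. The function names are illustrative.

```python
FRAC_BITS = 25  # assumed meaning of the ".25" in the {7.25} format


def to_fixed(value, frac_bits=FRAC_BITS):
    """Quantize a real coordinate to a fixed-point integer."""
    return int(round(value * (1 << frac_bits)))


def squared_distance_fixed(pos_i, pos_j, frac_bits=FRAC_BITS):
    """Return |dr|^2 as an integer with 2 * frac_bits fractional bits."""
    total = 0
    for a, b in zip(pos_i, pos_j):
        delta = to_fixed(a, frac_bits) - to_fixed(b, frac_bits)
        total += delta * delta  # the product doubles the fractional bits
    return total


def fixed_to_float(value, frac_bits):
    """Convert a fixed-point integer back to a float for inspection."""
    return value / (1 << frac_bits)


r2 = squared_distance_fixed((1.0, 2.0, 3.0), (1.5, 2.25, 3.125))
print(fixed_to_float(r2, 2 * FRAC_BITS))
```

Each multiplication doubles the number of fractional bits, so a hardware implementation has to decide how many of them to keep before the table lookup.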
![Page 20: An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines](https://reader035.vdocuments.site/reader035/viewer/2022062309/56814951550346895db6a142/html5/thumbnails/20.jpg)
Function Lookup Table
The function to be looked up is a function of |r|² (the squared separation distance between a pair of atoms)
Block floating point lookup
Partition function based on different precision
![Page 21: An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines](https://reader035.vdocuments.site/reader035/viewer/2022062309/56814951550346895db6a142/html5/thumbnails/21.jpg)
Function Lookup Table
[Figure: function lookup table layout. Labels: r, Partition, Value, Slope, Value and Slope, ZBT Memory Bank]
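A table that stores a value and a slope per entry supports linear interpolation with one multiply and one add. The sketch below models that in floating point over a single uniform partition. It does not model the block floating point partitioning described on the slides. The function names, the table range, and the example function (r²)⁻³ are assumptions.

```python
def build_table(func, x_min, x_max, entries):
    """Tabulate func on [x_min, x_max) as (value, slope) pairs."""
    step = (x_max - x_min) / entries
    values, slopes = [], []
    for k in range(entries):
        x_lo = x_min + k * step
        f_lo, f_hi = func(x_lo), func(x_lo + step)
        values.append(f_lo)
        slopes.append((f_hi - f_lo) / step)  # secant slope across the entry
    return values, slopes, step


def lookup(x, table, x_min):
    """Linear interpolation: value[k] + slope[k] * (x - x_k)."""
    values, slopes, step = table
    k = min(int((x - x_min) / step), len(values) - 1)
    return values[k] + slopes[k] * (x - (x_min + k * step))


# Hypothetical example: the r**-6 term as a function of r2 = |r|^2,
# tabulated with 1K entries.
table = build_table(lambda r2: r2 ** -3, 1.0, 4.0, 1024)
approx = lookup(2.5, table, 1.0)
```

Indexing by |r|² avoids a square root in the pipeline, and a larger table or a higher interpolation order trades memory for accuracy.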
![Page 22: An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines](https://reader035.vdocuments.site/reader035/viewer/2022062309/56814951550346895db6a142/html5/thumbnails/22.jpg)
Hardware Testing Configuration
[Figure: hardware testing configuration. NAMD main( ) calls the compute objects Ewald( ) and Lennard_Jones( ), which connect over a communication bus to the Ewald hardware engine and the Lennard-Jones hardware engine]
![Page 23: An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines](https://reader035.vdocuments.site/reader035/viewer/2022062309/56814951550346895db6a142/html5/thumbnails/23.jpg)
4. Results and Performance
![Page 24: An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines](https://reader035.vdocuments.site/reader035/viewer/2022062309/56814951550346895db6a142/html5/thumbnails/24.jpg)
Simulation Overview
Software model
Different coordinate precisions and lookup table sizes
Obtain the error compared to computation using double precision
![Page 25: An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines](https://reader035.vdocuments.site/reader035/viewer/2022062309/56814951550346895db6a142/html5/thumbnails/25.jpg)
Total Energy Fluctuation
[Figure: Total Energy Fluctuation: Ewald Direct Space. y-axis: log(Relative rms Fluctuation), from -7 to 0; x-axis: Various Precision (FP, 16K, 4K, 1K, 10^1x, 10^2x, 10^3x, 10^4x, 10^5x); series: Time-step 1.0 fs and Time-step 0.1 fs]
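The plotted quantity can be computed from a series of total-energy samples. The sketch assumes the usual definition of relative rms fluctuation, sqrt(<E²> - <E>²) / |<E>|, and a base-10 logarithm. The slides do not spell out either, and the function name is illustrative.

```python
import math
import statistics


def log_relative_rms_fluctuation(energies):
    """log10 of the rms deviation of E divided by |mean E|."""
    mean = statistics.fmean(energies)
    rms = statistics.pstdev(energies)  # sqrt(<E^2> - <E>^2)
    return math.log10(rms / abs(mean))
```

A lower value means the total energy drifts less over the run, which is the usual check that reduced-precision arithmetic has not destabilized the simulation.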
![Page 26: An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines](https://reader035.vdocuments.site/reader035/viewer/2022062309/56814951550346895db6a142/html5/thumbnails/26.jpg)
Average Total Energy
[Figure: Average Total Energy: Ewald Direct Space. y-axis: |<E>|, from 268 to 282; x-axis: Various Precision (FP, 16K, 4K, 1K, 10^1x, 10^2x, 10^3x, 10^4x, 10^5x); series: Time-step 1.0 fs and Time-step 0.1 fs]
![Page 27: An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines](https://reader035.vdocuments.site/reader035/viewer/2022062309/56814951550346895db6a142/html5/thumbnails/27.jpg)
Operating Frequency
| | Compute Engine | Arithmetic Core |
|---|---|---|
| Lennard-Jones | 43.6 MHz | 80.0 MHz |
| Ewald Direct Space | 47.5 MHz | 82.2 MHz |
![Page 28: An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines](https://reader035.vdocuments.site/reader035/viewer/2022062309/56814951550346895db6a142/html5/thumbnails/28.jpg)
Latency and Throughput
| | Latency | Throughput |
|---|---|---|
| Lennard-Jones | 59 clocks | 33.33% |
| Ewald Direct Space | 44 clocks | 100% |
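These figures can be turned into rough pair rates. The sketch assumes that the throughput percentage is the fraction of clock cycles on which the pipeline accepts a new pair (so 33.33% means one pair every three clocks), and it takes the clock frequency as a parameter. The 80.0 MHz and 82.2 MHz values used below are the frequencies quoted elsewhere in the slides, and applying them here is an assumption.

```python
def pairs_per_second(clock_hz, throughput_fraction):
    """Pair interactions completed per second once the pipeline is full."""
    return clock_hz * throughput_fraction


def latency_seconds(latency_clocks, clock_hz):
    """Time for one pair to travel through the whole pipeline."""
    return latency_clocks / clock_hz


lj_rate = pairs_per_second(80.0e6, 1.0 / 3.0)   # about 26.7 million pairs/s
ewald_rate = pairs_per_second(82.2e6, 1.0)      # 82.2 million pairs/s
lj_latency = latency_seconds(59, 80.0e6)        # about 0.74 microseconds
```

For long streams of pairs the sustained rate depends on throughput rather than latency, so the deeper Lennard-Jones pipeline costs little once it is full.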
![Page 29: An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines](https://reader035.vdocuments.site/reader035/viewer/2022062309/56814951550346895db6a142/html5/thumbnails/29.jpg)
Hardware Improvement
Operating frequency:
  Place-and-route constraints
  More pipeline stages
Throughput:
  More hardware resources
  Avoid sharing of multipliers
![Page 30: An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines](https://reader035.vdocuments.site/reader035/viewer/2022062309/56814951550346895db6a142/html5/thumbnails/30.jpg)
Compared with previous work
Pipelined adders and multipliers
Block floating point memory lookup
Support different types of atoms
| Lennard-Jones System | Latency (clocks) | Operating Frequency (MHz) |
|---|---|---|
| Transmogrifier3 | 11 | 26.0 |
| Xilinx Virtex-II | 59 | 80.0 |
![Page 31: An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines](https://reader035.vdocuments.site/reader035/viewer/2022062309/56814951550346895db6a142/html5/thumbnails/31.jpg)
5. Conclusions and Future Work
![Page 32: An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines](https://reader035.vdocuments.site/reader035/viewer/2022062309/56814951550346895db6a142/html5/thumbnails/32.jpg)
Hardware Precision
A combination of fixed-point arithmetic, function table lookup, and interpolation can achieve high precision
Similar result in RMS energy fluctuation and average energy
  Coordinate precision of {7.41}
  Table lookup size of 1K
Block floating memory
  Data precision maximized
  Different types of functions
![Page 33: An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines](https://reader035.vdocuments.site/reader035/viewer/2022062309/56814951550346895db6a142/html5/thumbnails/33.jpg)
Hardware Performance
Compute engines operating frequency:
  Ewald Direct Space 82.2 MHz
  Lennard-Jones 80.0 MHz
Achieving 100 MHz is feasible with newer FPGAs
![Page 34: An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines](https://reader035.vdocuments.site/reader035/viewer/2022062309/56814951550346895db6a142/html5/thumbnails/34.jpg)
Future Work
Study different types of MD systems
Simulate computation error with different table lookup sizes and interpolation orders
Hardware usage: storing data in block RAMs instead of external ZBT memory