Interactive Drug Design Using GPGPUs | GTC 2013
S3333 - Interactive Drug Design Using GPGPUs
Thanasis Anthopoulos¹˒², Dr Andrea Brancale¹, Dr Ian Grimstead²
¹ Cardiff School of Pharmacy and Pharmaceutical Sciences;
² School of Computer Science, Cardiff University, Cardiff UK
Project Background
Computer Aided / Interactive Drug Design
Force feedback (1000 Hz)
Docking Scores
Haptic-controlled ligand
Macromolecule
Visual feedback (33 Hz)
Project Background
At Present: Rigid Macromolecule
Aim → Protein flexibility
– Fast rendering rates
– Frequent force feedback
– Accurate docking scores
Options
✔ CPU ?
✔ Cluster ?
✔ GPGPU !!!
[Diagram: haptic-controlled ligand and macromolecule]
Project Background
$$\sum E_{\text{Bonded}} = \sum EB_{ij} + \sum EA_{ijk} + \sum EBA_{ijk} + \sum EOOP_{ijkl} + \sum ET_{ijkl}$$
$$\sum E_{\text{Non-Bonded}} = \sum EvdW_{ij} + \sum EQ_{ij}$$
Forcefield used → MMFF94s
Project Background
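$$\sum EvdW_{ij} = \varepsilon_{ij} \left( \frac{1.07\, R^{*}_{ij}}{R_{ij} + 0.07\, R^{*}_{ij}} \right)^{7} \left( \frac{1.12\, R^{*7}_{ij}}{R_{ij}^{7} + 0.12\, R^{*7}_{ij}} - 2 \right)$$
$$R^{*}_{ij} = 0.5\, (R^{*}_{i} + R^{*}_{j}) \left( 1 + B \left( 1 - \exp(-12\, \gamma_{ij}^{2}) \right) \right)$$
$$\gamma_{ij} = \frac{R^{*}_{i} - R^{*}_{j}}{R^{*}_{i} + R^{*}_{j}}$$
$$\varepsilon_{ij} = \frac{181.16\, G_{i} G_{j}\, \alpha_{i} \alpha_{j}}{(\alpha_{i}/N_{i})^{1/2} + (\alpha_{j}/N_{j})^{1/2}} \cdot \frac{1}{R^{*6}_{ij}}$$
$$\sum EQ_{ij} = f \cdot \frac{332.0716\, q_{i} q_{j}}{D\, (R_{ij} + \delta)}$$
Here R_ij is the interatomic distance and R*_ij the minimum-energy separation of the pair.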
The MMFF94s Forcefield Attributes
All-atom interactions
Running time complexity O(N²), N → number of atoms
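The two non-bonded terms and the all-pairs loop behind the O(N²) cost can be sketched as follows. This is a minimal illustration, not the presenters' GPU code: the function names are hypothetical, a single R* and ε are reused for every pair to keep the example short, and the defaults D = 1 and δ = 0.05 Å are the values usually quoted for MMFF94, treated here as assumptions.

```python
import math

def vdw_buffered_14_7(r, r_star, eps):
    """Buffered 14-7 van der Waals energy for one atom pair."""
    repulsive = (1.07 * r_star / (r + 0.07 * r_star)) ** 7
    attractive = 1.12 * r_star ** 7 / (r ** 7 + 0.12 * r_star ** 7) - 2.0
    return eps * repulsive * attractive

def coulomb_buffered(r, qi, qj, dielectric=1.0, delta=0.05, scale=1.0):
    """Buffered Coulomb energy; `scale` plays the role of f on the slide."""
    return scale * 332.0716 * qi * qj / (dielectric * (r + delta))

def nonbonded_energy(coords, charges, r_star, eps):
    """All-pairs sum: N(N-1)/2 pair evaluations, hence O(N^2).

    Assumption: one r_star and eps for all pairs (a real force field
    derives them per pair from atom-type parameters).
    """
    total = 0.0
    n = len(coords)
    for i in range(n - 1):
        for j in range(i + 1, n):
            r = math.dist(coords[i], coords[j])
            total += vdw_buffered_14_7(r, r_star, eps)
            total += coulomb_buffered(r, charges[i], charges[j])
    return total
```

At R_ij = R*_ij the van der Waals term evaluates to -ε_ij, which is a convenient sanity check for any implementation of the formula.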
Project Background
Conformations Update
Optimal conformation → lowest free energy
Conjugate gradients / Steepest descents
→ Steepest descents is suitable because it takes small, frequent steps (haptic feedback)
→ Force gradients for each atom
→ Move atoms along the gradient direction with a fixed step
→ Force gradients and energy potentials → most computationally expensive
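A fixed-step steepest descent update can be sketched on a toy energy surface. The quadratic `energy` and its `gradient` below are stand-ins chosen for illustration (they are not the MMFF94s terms), and "fixed step" is interpreted here as a constant multiplier on the gradient, which is an assumption.

```python
def energy(coords):
    """Toy energy: sum of squared coordinates (minimum at the origin)."""
    return sum(x * x for atom in coords for x in atom)

def gradient(coords):
    """Analytic gradient of the toy energy, one 3-vector per atom."""
    return [[2.0 * x for x in atom] for atom in coords]

def steepest_descent(coords, step=0.1, n_steps=50):
    """Move every atom against its gradient with a constant step factor."""
    for _ in range(n_steps):
        grad = gradient(coords)
        coords = [[x - step * g for x, g in zip(atom, g_atom)]
                  for atom, g_atom in zip(coords, grad)]
    return coords

start = [[1.0, -2.0, 0.5], [3.0, 0.0, -1.0]]
final = steepest_descent(start)
```

Each iteration needs one gradient evaluation over all atoms, which is why the gradient and energy kernels dominate the cost.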
Initial Attempts
● Cell-lists for short range non-bonded
• Reduces complexity to O(CN)
• Adequate scaling
• Good approximation for rc → [9-12.5 Å]
• Able to simulate small systems well (<4000 atoms)
Known Issues
• Load imbalance
• Void computations
• No Newton's 3rd law for force calcs
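The cell-list idea can be sketched on the CPU as follows: atoms are binned into cubic cells of edge rc, and each atom is only tested against atoms in its own and the 26 neighbouring cells. The function names are hypothetical and the grid is non-periodic; this is an illustration of the general technique rather than the presenters' kernel.

```python
import math
from collections import defaultdict
from itertools import product

def build_cells(coords, rc):
    """Bin atom indices into cubic cells with edge length rc."""
    cells = defaultdict(list)
    for idx, (x, y, z) in enumerate(coords):
        key = (math.floor(x / rc), math.floor(y / rc), math.floor(z / rc))
        cells[key].append(idx)
    return cells

def pairs_within_cutoff(coords, rc):
    """Return all index pairs (i, j), i < j, closer than rc."""
    cells = build_cells(coords, rc)
    found = set()
    for (cx, cy, cz), members in cells.items():
        # Full-shell traversal: own cell plus 26 neighbours.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            neighbours = cells.get((cx + dx, cy + dy, cz + dz), [])
            for i in members:
                for j in neighbours:
                    if i < j and math.dist(coords[i], coords[j]) <= rc:
                        found.add((i, j))
    return found
```

Cells with few or no atoms still get visited, and every cell pair is traversed from both sides, which mirrors the load imbalance and void computations listed as known issues.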
Initial thoughts - Ideal scenario
● New design → Kepler arch
• 256-thread blocks → 8 warp-groups (WG)
• Good occupancy rates!
• Cell-pair process → 8x8 warp-group pair process
• Fewer distance checks
• Warp intrinsics → Newton's 3rd law
Archetype
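Warp intrinsics let the lanes of a warp exchange register values directly, so per-lane partial forces can be combined without shared memory. The sketch below emulates the common shuffle-down tree reduction in plain Python to show the data movement. It is an illustration of the general pattern, not the presenters' kernel, and `shfl_down` here is a hypothetical stand-in for the CUDA intrinsic.

```python
WARP_SIZE = 32

def shfl_down(values, delta):
    """Emulate shuffle-down: lane i reads the value held by lane i + delta.

    Lanes whose source would fall outside the warp keep their own value.
    """
    return [values[i + delta] if i + delta < WARP_SIZE else values[i]
            for i in range(WARP_SIZE)]

def warp_reduce_sum(values):
    """Tree reduction in log2(32) = 5 steps; lane 0 ends with the total."""
    vals = list(values)
    offset = WARP_SIZE // 2
    while offset > 0:
        shifted = shfl_down(vals, offset)
        vals = [a + b for a, b in zip(vals, shifted)]
        offset //= 2
    return vals[0]

total = warp_reduce_sum(range(WARP_SIZE))
```

With this kind of exchange, the reaction force on the partner atom (Newton's third law) can be accumulated across lanes instead of being recomputed from the other side of the pair.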
Final design
● Load balanced irregular grid
• Cell base → controls warp-group shape
• Atom migration in xy stacks → last cell of the stack load balanced
• Half shell traversal approach
• 64-bit bitmap for warp-group pair processing, held in constant memory
• Faster → able to simulate 12500-atom systems
rb = rc + δ
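Half-shell traversal visits only 13 of the 26 neighbouring cells, plus the cell itself, so every cell pair is processed exactly once and the pair force can be applied to both sides. A minimal sketch on a regular, non-periodic grid follows; the irregular, load-balanced grid of the slides is not modelled, and the function names are hypothetical.

```python
from itertools import product

def half_shell_offsets():
    """The 13 neighbour offsets that are lexicographically above (0, 0, 0)."""
    return [o for o in product((-1, 0, 1), repeat=3) if o > (0, 0, 0)]

def cell_pairs(nx, ny, nz):
    """Enumerate each unordered pair of adjacent cells once (plus self-pairs)."""
    pairs = []
    for cell in product(range(nx), range(ny), range(nz)):
        pairs.append((cell, cell))
        for dx, dy, dz in half_shell_offsets():
            other = (cell[0] + dx, cell[1] + dy, cell[2] + dz)
            if 0 <= other[0] < nx and 0 <= other[1] < ny and 0 <= other[2] < nz:
                pairs.append((cell, other))
    return pairs
```

Because an offset and its negation never both appear in the half shell, no cell pair is visited from both directions.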
Pseudocode
● Warp-Group pair processing
Pseudocode
● Forces calculation kernel
Exclusions Handling
32-bit integer layout:
  2 bits → Donor/acceptor
  2 bits → Bond order
  7 bits → Index difference, bond 1
  7 bits → Index difference, bond 2
  7 bits → Index difference, bond 3
  7 bits → Index difference, bond 4
• If hydrogen bond → 28 bits used for index difference
• Covers 1-2 & 1-3 exclusions
• 1-4 scaled to 75% by subtracting 25% of the total force during bonded force calculations
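Packing and unpacking such a 32-bit exclusion word could look like the sketch below. The field order (donor/acceptor in the top two bits, then bond order, then four 7-bit index differences) and the treatment of index differences as unsigned values in 0..127 are assumptions made for illustration; the slides do not state either.

```python
FIELD_BITS = 7
FIELD_MASK = (1 << FIELD_BITS) - 1  # 0x7F

def pack_exclusions(donor_acceptor, bond_order, index_diffs):
    """Pack 2 + 2 + 4*7 bits into one 32-bit word (assumed field order)."""
    if not 0 <= donor_acceptor < 4 or not 0 <= bond_order < 4:
        raise ValueError("2-bit fields must be in 0..3")
    if len(index_diffs) != 4 or any(not 0 <= d <= FIELD_MASK for d in index_diffs):
        raise ValueError("need four index differences in 0..127")
    word = (donor_acceptor << 30) | (bond_order << 28)
    for slot, diff in enumerate(index_diffs):
        # Bond 1 sits in bits 21-27, bond 4 in bits 0-6.
        word |= diff << (21 - FIELD_BITS * slot)
    return word

def unpack_exclusions(word):
    """Inverse of pack_exclusions."""
    donor_acceptor = (word >> 30) & 0x3
    bond_order = (word >> 28) & 0x3
    diffs = [(word >> (21 - FIELD_BITS * slot)) & FIELD_MASK for slot in range(4)]
    return donor_acceptor, bond_order, diffs

word = pack_exclusions(2, 1, [1, 5, 0, 127])
```

Storing index differences rather than absolute indices is what keeps each bonded neighbour within 7 bits.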
Performance Benchmarks
● Regular vs irregular grid decomposition
• Exclusions hardcoded / optimizations switched off for fair comparison
• Newton's third law with __shfl → 2 orders of magnitude gain
• Load balancing gives up to 100% gain for >5000 atoms
Performance Benchmarks
● WG pair elimination
[Figure: Short range forces benchmark, rb = 13.75, rc = 10.75. x-axis: Atoms (x1000), 0-70; y-axis: Performance (ms), 0-16; series: No BitMap, full]
•Force calculation using our approach
•WG pair bitmap → Up to 25% speed-up
•Good linear scaling
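One way to picture the warp-group pair bitmap: with 8 warp-groups per cell there are 8x8 = 64 possible WG pairs between two cells, so one bit per pair fits in a 64-bit word, and pairs whose bit is clear can be skipped. In the sketch below the elimination criterion (minimum distance between axis-aligned bounding boxes compared with a cutoff) and all names are assumptions for illustration; the slides do not say how the bitmap is built.

```python
WG_PER_CELL = 8

def box_gap(box_a, box_b):
    """Minimum distance between two axis-aligned boxes given as (lo, hi)."""
    (lo_a, hi_a), (lo_b, hi_b) = box_a, box_b
    sq = 0.0
    for axis in range(3):
        gap = max(lo_b[axis] - hi_a[axis], lo_a[axis] - hi_b[axis], 0.0)
        sq += gap * gap
    return sq ** 0.5

def build_pair_bitmap(boxes_a, boxes_b, cutoff):
    """Set bit i*8 + j when warp-group i of cell A can interact with j of B."""
    bitmap = 0
    for i, a in enumerate(boxes_a):
        for j, b in enumerate(boxes_b):
            if box_gap(a, b) <= cutoff:
                bitmap |= 1 << (i * WG_PER_CELL + j)
    return bitmap

def pair_is_active(bitmap, i, j):
    """Test the bit for warp-group pair (i, j)."""
    return ((bitmap >> (i * WG_PER_CELL + j)) & 1) == 1

# Eight unit-thick slabs stacked along z, used for both cells.
slabs = [((0.0, 0.0, float(k)), (1.0, 1.0, float(k + 1))) for k in range(8)]
bitmap = build_pair_bitmap(slabs, slabs, cutoff=2.0)
```

In this toy stack only slabs at most three layers apart remain active, so 44 of the 64 pairs would be processed.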
Performance Benchmarks
● Performance impact of WG aspect ratio
• WG aspect ratio has an impact on performance
• Peak performance close to 13.75 Å
• Beneficial compared to cut-off sized base lengths, especially for small cut-offs
Performance Benchmarks
● 1000 energy minimization steps
[Figure: Performance for 1000 steps at rc = 10.25 Å. x-axis: Molecule size (x1000 atoms), 0-70; y-axis: Performance (s), 0-9]
•Recall our 1000 Hz feedback rate aim
•Can simulate up to 12000 Atom proteins on GTX 680
Line search can take 1 to n steps to find a minimum!
[Diagram: kEnergy kernel timelines between t1 and t2 on GK110 and GK104]
● Steepest Descent Benchmarks
– Accelerated line search using streams (Hyper-Q)
Serial version
    float step, eOld, eNew;

    calculate eOld
    calculate F->
    while finite step
        linesearch(step)
        calculate eNew
        if (eNew < eOld)        // accept step
            update coords (host)
            calculate F->
            step *= FACTOR_ACCEPT
            eOld = eNew
        else                    // reject step
            reset coords (device)
            step *= FACTOR_REJECT
Stream-parallel version
    float *steps, *eNewDev;
    float eOld, step;
    uint nSteps, idx;

    calculate eOld
    calculate F->
    while finite step
        for i = 0 -> nStreams
            linesearch(steps, i)        // stream i
        for i = 0 -> nStreams
            calculate eNew(i)           // stream i
        idx = findFirstLowerEnergyStep(eNew)
        if (lowerExists)        // accept step
            update coords (host, idx)
            calculate F->
            step = steps(idx)
        else                    // reject step
            reset coords (device)
            step *= FACTOR_REJECT
3-6 concurrent streams -> 35% performance speed-up
Limiting factors:
-> number of blocks (60 blocks max at a time)
-> shared memory and register usage
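The multi-candidate line search can be mimicked on the CPU to show the control flow: several trial step sizes are evaluated per iteration and the first one that lowers the energy is accepted. In the sketch below the toy energy, the geometric spacing of candidate steps, and every name are assumptions; on the GPU each candidate's energy would be computed by its own stream rather than in a Python loop.

```python
def energy(x):
    """Toy anisotropic quadratic energy (stand-in for the force field)."""
    return x[0] ** 2 + 10.0 * x[1] ** 2

def forces(x):
    """F = -grad E for the toy energy."""
    return [-2.0 * x[0], -20.0 * x[1]]

def minimise(x, step=1.0, n_candidates=4, shrink=0.5,
             min_step=1e-8, max_iters=200):
    e_old = energy(x)
    f = forces(x)
    iters = 0
    while step > min_step and iters < max_iters:
        iters += 1
        # Candidate step sizes, largest first (one per hypothetical stream).
        steps = [step * shrink ** k for k in range(n_candidates)]
        trials = [[v + s * fv for v, fv in zip(x, f)] for s in steps]
        energies = [energy(t) for t in trials]
        idx = next((k for k, e in enumerate(energies) if e < e_old), None)
        if idx is not None:          # accept the first lower-energy step
            x, e_old = trials[idx], energies[idx]
            f = forces(x)
            step = steps[idx]
        else:                        # reject all candidates, shrink further
            step = steps[-1] * shrink
    return x, e_old

start = [1.0, 1.0]
x_min, e_min = minimise(start)
```

Evaluating several candidates at once trades extra energy evaluations for fewer sequential rejections, which only pays off when the device can actually run the candidate kernels concurrently.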
Steepest Descent Benchmarks
[Figure: 2O0O.pdb -> 7165 atoms. x-axis: Concurrent streams, 1-8; y-axis: Performance (ms), 0-140; legend: 1-5]
3-8 concurrent streams -> 1x performance speed-up
-> Smaller molecule size -> fewer blocks per grid -> enhanced kernel concurrency
Steepest Descent Benchmarks
[Figure: 2YDB.pdb -> 3620 atoms. x-axis: Concurrent streams, 1-8; y-axis: Performance (ms), 0-120; legend: 1-3]
Proof of Concept
● Work in progress:
– Evaluating docking results between flexible and rigid protein HPLD (haptic protein-ligand docking)
– Cyclin-dependent kinase 2 (CDK2) and HIV RT targets are being used for our experiments
● GPGPU accelerated HPLD
● Kepler warp intrinsics and hyper-Q enabled us to design faster code
● Future work → simulating bigger systems always a challenge
● Long range NB forces → PME support (stream processing)
● A new approach, so more ideas for improvements will come from testing
Thank you for your attention.
Questions?
Thanasis Anthopoulos
Introducing Protein Flexibility in Interactive Drug Design