Interactive Drug Design Using GPGPUs | GTC 2013

S3333 - Interactive Drug Design Using GPGPUs

Thanasis Anthopoulos ¹ ², Dr Andrea Brancale ¹, Dr Ian Grimstead ²

¹ Cardiff School of Pharmacy and Pharmaceutical Sciences

² School of Computer Science, Cardiff University, Cardiff, UK

Project Background

Computer Aided / Interactive Drug Design

Force feedback (1000Hz)

Docking Scores

Haptic-controlled ligand

Macromolecule

Visual feedback (33Hz)

Project Background

At Present: Rigid Macromolecule

Aim → Protein flexibility

– Fast rendering rates

– Frequent force feedback

– Accurate docking scores

Options

✔ CPU ?

✔ Cluster ?

✔ GPGPU !!!

Haptic-controlled ligand

Macromolecule

Project Background

$$\sum E_{\text{Bonded}} = \sum EB_{ij} + \sum EA_{ijk} + \sum EBA_{ijk} + \sum EOOP_{ijkl} + \sum ET_{ijkl}$$

$$\sum E_{\text{Non-Bonded}} = \sum EvdW_{ij} + \sum EQ_{ij}$$

Forcefield used → MMFF94s

Project Background

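$$EvdW_{ij} = \varepsilon_{ij} \left( \frac{1.07\, R^{*}_{ij}}{R_{ij} + 0.07\, R^{*}_{ij}} \right)^{7} \left( \frac{1.12\, R^{*7}_{ij}}{R_{ij}^{7} + 0.12\, R^{*7}_{ij}} - 2 \right)$$

$$R^{*}_{ij} = 0.5 \left( R^{*}_{i} + R^{*}_{j} \right) \left( 1 + B \left( 1 - \exp\left( -12\, \gamma_{ij}^{2} \right) \right) \right)$$

$$\gamma_{ij} = \frac{R^{*}_{i} - R^{*}_{j}}{R^{*}_{i} + R^{*}_{j}}$$

$$\varepsilon_{ij} = \frac{181.16\, G_{i} G_{j}\, \alpha_{i} \alpha_{j}}{\left( \alpha_{i} / N_{i} \right)^{1/2} + \left( \alpha_{j} / N_{j} \right)^{1/2}} \cdot \frac{1}{R^{*6}_{ij}}$$

$$EQ_{ij} = f \cdot \frac{332.0716\, q_{i} q_{j}}{D \left( R_{ij} + \delta \right)}$$

where $R_{ij}$ is the interatomic distance and $R^{*}_{ij}$ the minimum-energy separation.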

All atoms interaction

Running time complexity O(N²), N → number of atoms

The MMFF94s Forcefield Attributes
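To illustrate why the all-atoms interaction costs O(N²), the following minimal Python sketch evaluates the buffered 14-7 van der Waals term and the buffered Coulomb term over every atom pair. It is not the presented CUDA code. The function names are hypothetical, and passing a single `eps` and `r_star` for all pairs, as well as the defaults `dielectric=1.0` and `delta=0.05`, are simplifying assumptions; MMFF94s derives these per atom type.

```python
import math
from itertools import combinations

def vdw_14_7(r, eps, r_star):
    """Buffered 14-7 van der Waals energy for one pair at distance r."""
    a = (1.07 * r_star / (r + 0.07 * r_star)) ** 7
    b = 1.12 * r_star ** 7 / (r ** 7 + 0.12 * r_star ** 7) - 2.0
    return eps * a * b

def coulomb(r, qi, qj, dielectric=1.0, delta=0.05, f=1.0):
    """Buffered Coulomb energy for one pair; f is the scaling factor."""
    return f * 332.0716 * qi * qj / (dielectric * (r + delta))

def nonbonded_energy(coords, charges, eps, r_star):
    """Sum both terms over all N*(N-1)/2 pairs; returns (energy, pair count)."""
    energy, pairs = 0.0, 0
    for i, j in combinations(range(len(coords)), 2):
        r = math.dist(coords[i], coords[j])
        energy += vdw_14_7(r, eps, r_star) + coulomb(r, charges[i], charges[j])
        pairs += 1
    return energy, pairs
```

Doubling the number of atoms roughly quadruples the pair count, which is what motivates the cut-off based schemes later in the talk.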

Project Background

Conformations Update

Optimal conformation → lowest free energy

Conjugate gradients

Steepest descents

→ Steepest descent is suitable because it takes small, frequent steps (haptic feedback)

→ Force gradients for each atom

→ Move atoms along the gradient direction with a fixed step

→ Force gradients and energy potentials → most computationally expensive
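The update rule described above can be sketched in a few lines of Python. This is only an illustration under assumed names: `steepest_descent`, the `force` callback and the toy quadratic energy in the usage note are hypothetical, and coordinates are flattened into one list for brevity.

```python
def steepest_descent(x, force, step=0.01, n_steps=1000):
    """Move every coordinate a fixed step along its force (negative energy gradient)."""
    x = list(x)
    for _ in range(n_steps):
        f = force(x)  # the expensive part: one force evaluation per step
        x = [xi + step * fi for xi, fi in zip(x, f)]
    return x
```

For a toy energy E = Σ (x − 1)², the force is −2(x − 1), and `steepest_descent([3.0, -2.0], lambda p: [-2.0 * (v - 1.0) for v in p], step=0.1, n_steps=200)` converges towards 1.0 in every coordinate.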

Initial Attempts

● Cell-lists for short range non-bonded

• Reduces complexity to O(CN)

• Adequate scaling

• Good approximation for rc → [9-12.5 Å]

• Able to simulate small systems well (<4000 atoms)

Known Issues

• Load imbalance

• Void computations

• No Newton's 3rd law for force calcs
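A minimal CPU sketch of the cell-list idea follows, assuming cubic cells with edge length equal to the cut-off `rc`. The function names are hypothetical and this is not the GPU implementation from the talk. Each atom is only tested against atoms in the 27 surrounding cells, which gives the O(CN) behaviour. Distance checks that fail the `< rc` test correspond to the "void computations" listed above.

```python
import math
from collections import defaultdict
from itertools import product

def build_cells(coords, rc):
    """Bin atom indices into cubic cells of edge rc."""
    cells = defaultdict(list)
    for idx, (x, y, z) in enumerate(coords):
        key = (math.floor(x / rc), math.floor(y / rc), math.floor(z / rc))
        cells[key].append(idx)
    return cells

def pairs_within_cutoff(coords, rc):
    """Return all pairs (i, j), i < j, closer than rc, checking only adjacent cells."""
    cells = build_cells(coords, rc)
    found = set()
    for (cx, cy, cz), members in cells.items():
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            # .get() does not create missing keys, so empty cells are skipped safely
            for j in cells.get((cx + dx, cy + dy, cz + dz), ()):
                for i in members:
                    if i < j and math.dist(coords[i], coords[j]) < rc:
                        found.add((i, j))
    return found
```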

Initial thoughts - Ideal scenario

● New design → Kepler arch

•256-thread blocks → 8 warp-groups (WG)

•Good occupancy rates !!

•Cell-pair process → 8x8 warp-group pair process

• Fewer distance checks

• Warp Intrinsics → Newton's 3rd Law

Archetype
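One way warp intrinsics can support Newton's 3rd law is to keep each j atom, together with its partial force, in registers that rotate from lane to lane. The Python sketch below emulates that pattern on the CPU for one 32x32 atom tile. It is an assumption about the general technique, not the code from the talk: a list rotation stands in for the CUDA shuffle, the names are hypothetical, and `pair_force` is a toy 1D spring.

```python
WARP = 32

def pair_force(xi, xj):
    """Toy 1D force on atom i due to atom j (unit spring); for illustration only."""
    return xj - xi

def warp_pair_forces(xs_i, xs_j):
    """Emulate one warp processing a 32x32 tile with rotating j registers."""
    assert len(xs_i) == len(xs_j) == WARP
    f_i = [0.0] * WARP
    xj = list(xs_j)        # per-lane register: position of the j atom currently held
    fj = [0.0] * WARP      # per-lane register: partial force on that j atom
    for _ in range(WARP):
        for lane in range(WARP):      # on a GPU all lanes execute this in lock-step
            f = pair_force(xs_i[lane], xj[lane])
            f_i[lane] += f            # action on atom i
            fj[lane] -= f             # equal and opposite reaction on atom j
        # Emulated shuffle: every lane passes its j registers to the next lane.
        xj = xj[-1:] + xj[:-1]
        fj = fj[-1:] + fj[:-1]
    return f_i, fj         # after 32 rotations the j registers are back in their home lanes
```

Each pair is evaluated once and applied to both atoms, without atomic operations or shared memory traffic.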

Final design

● Load balanced irregular grid

•Cell base // control warp group shape

•Atom migration in xy stacks → last cell of the stack load balanced

•Half shell traversal approach

•64 bit bitmap warp-group pair processing in constant memory

•Faster → able to simulate 12500 atom systems

$r_b = r_c + \delta$
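The half shell traversal can be illustrated on a regular, non-periodic grid. The talk uses an irregular, load-balanced grid, so the regular grid and the function names here are simplifying assumptions. Each cell visits only the 13 neighbour offsets that are "ahead" of it, so every adjacent cell pair is processed exactly once.

```python
from itertools import product

def half_shell_offsets():
    """The 13 of 26 neighbour offsets that compare greater than (0, 0, 0)."""
    return [d for d in product((-1, 0, 1), repeat=3) if d > (0, 0, 0)]

def cell_pairs(nx, ny, nz):
    """Every unordered pair of adjacent cells in an nx*ny*nz grid, visited once."""
    dims = (nx, ny, nz)
    pairs = []
    for cell in product(range(nx), range(ny), range(nz)):
        for d in half_shell_offsets():
            other = tuple(c + o for c, o in zip(cell, d))
            if all(0 <= v < n for v, n in zip(other, dims)):
                pairs.append((cell, other))
    return pairs
```

Pairs of atoms inside the same cell still have to be handled separately.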

Pseudocode

● Warp-Group pair processing
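A 64-bit bitmap is enough to flag each of the 8x8 warp-group pairs of a cell pair. The sketch below shows one plausible way to build and decode such a bitmap. Treating each warp-group as an axis-aligned box and using the minimum box-to-box gap as the elimination test are assumptions made for illustration, and all names are hypothetical.

```python
def box_gap(lo_a, hi_a, lo_b, hi_b):
    """Minimum distance between two axis-aligned boxes."""
    gap2 = 0.0
    for la, ha, lb, hb in zip(lo_a, hi_a, lo_b, hi_b):
        d = max(lb - ha, la - hb, 0.0)
        gap2 += d * d
    return gap2 ** 0.5

def wg_pair_bitmap(boxes_a, boxes_b, rc):
    """Bit (8*a + b) is set when warp-groups a and b may hold atoms within rc."""
    bitmap = 0
    for a, (lo_a, hi_a) in enumerate(boxes_a):
        for b, (lo_b, hi_b) in enumerate(boxes_b):
            if box_gap(lo_a, hi_a, lo_b, hi_b) < rc:
                bitmap |= 1 << (8 * a + b)
    return bitmap

def active_pairs(bitmap):
    """Decode the bitmap into the (a, b) warp-group pairs that still need processing."""
    return [(bit // 8, bit % 8) for bit in range(64) if (bitmap >> bit) & 1]
```

Warp-group pairs whose bit is clear can be skipped without any per-atom distance check.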

Pseudocode

● Forces calculation kernel

Exclusions Handling

32 bit Integer:

| 2 bits | 2 bits | 7 bits | 7 bits | 7 bits | 7 bits |
| Donor/acceptor | Bond order | Index difference Bond 1 | Index difference Bond 2 | | |

•If Hydrogen Bond → 28 bits used for index difference

•Covers 1-2 & 1-3 Exclusions

•1-4 scaled to 75% by subtracting 25% of the total force during bonded force calculations
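The 2 + 2 + 7 + 7 + 7 + 7 bit layout can be packed and unpacked with shifts and masks. In the sketch below the field order within the word and the names `diff3` and `diff4` for the two unlabelled 7-bit fields are assumptions, as are the function names. `net_14_force` only restates the 1-4 arithmetic: the non-bonded pass adds the full force and the bonded pass removes 25% of it.

```python
FIELDS = (("donor_acceptor", 2), ("bond_order", 2),
          ("diff1", 7), ("diff2", 7), ("diff3", 7), ("diff4", 7))

def pack(values):
    """Pack the six fields into one 32-bit word, first field in the top bits."""
    word = 0
    for name, width in FIELDS:
        v = values[name]
        if not 0 <= v < (1 << width):
            raise ValueError(f"{name}={v} does not fit in {width} bits")
        word = (word << width) | v
    return word

def unpack(word):
    """Recover the six fields from a packed 32-bit word."""
    out = {}
    for name, width in reversed(FIELDS):
        out[name] = word & ((1 << width) - 1)
        word >>= width
    return out

def net_14_force(full_force):
    """Full non-bonded force minus the 25% removed during the bonded pass."""
    return full_force - 0.25 * full_force
```

A 7-bit field limits each stored index difference to 0-127, which suggests why atom ordering matters for this scheme.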


Performance Benchmarks

● Regular vs Irregular grid decomp

•Exclusions hardcoded / optimizations switched off for fair comparison

•Newton's third law with __shfl → 2 orders of magnitude gain

•Load balancing gives up to 100% gain for >5000 atoms


● WG pair elimination

[Figure: Short range forces benchmark, rb = 13.75, rc = 10.75. Series: No BitMap, full. x-axis: Atoms (x1000), 0-70; y-axis: Performance (ms), 0-16]

•Force calculation using our approach

•WG pair bitmap → Up to 25% speed-up

•Good linear scaling

Performance Benchmarks

● Performance impact of WG Aspect ratio

•WG aspect ratio has an impact on performance

•Peak performance close to 13.75 Å

•Beneficial compared to cut-off sized base lengths, especially for small cut-offs


● 1000 energy minimization steps

[Figure: Performance for 1000 steps at rc = 10.25 Å. x-axis: Molecule size (x1000 atoms), 0-70; y-axis: Performance (s), 0-9]

•Recall our 1000 Hz feedback rate aim

•Can simulate up to 12000 Atom proteins on GTX 680

Line-search can take 1-n steps to find a minimum..!

[Figure: kEnergy kernel execution timelines (t1, t2, ...) on GK110 and GK104]

● Steepest Descent Benchmarks

– Accelerated line search using streams (Hyper-Q)

Serial version

    float step, eOld, eNew;

    calculate eOld
    calculate F->

    while finite step
        linesearch(step)
        calculate eNew
        if (eNew < eOld)            // accept step
            update coords(host)
            calculate F->
            step *= FACTOR_ACCEPT
            eOld = eNew
        else                        // reject step
            reset coords(device)
            step *= FACTOR_REJECT

Stream-parallel version

    float *steps, *eNewDev;
    float eOld, step;
    uint nSteps, idx;

    calculate eOld
    calculate F->

    while finite step
        for i = 0 -> nStreams
            linesearch(steps, i)        // stream i
        for i = 0 -> nStreams
            calculate eNew(i)           // stream i
        idx = findFirstLowerEnergyStep(eNew)
        if (lowerExists)                // accept step
            update coords(host, idx)
            calculate F->
            step = steps(idx)
            eOld = eNew(idx)
        else                            // reject step
            reset coords(device)
            step *= FACTOR_REJECT

3-6 concurrent streams -> 35% performance speed-up

Limiting factors:

-> number of blocks (60 blocks max at a time)

-> shared memory and register usage
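The stream-parallel line search can be mimicked on the CPU by evaluating several trial steps per iteration and accepting the first one that lowers the energy. The Python sketch below is an illustration only. The geometric sequence of trial steps, the parameter defaults and all function names are assumptions, and on the GPU each trial energy would be computed in its own stream rather than in a list comprehension.

```python
def line_search_step(x, forces, energy, e_old, steps):
    """Try every step size; return the first trial whose energy is below e_old."""
    trials = [[xi + s * fi for xi, fi in zip(x, forces)] for s in steps]
    energies = [energy(t) for t in trials]   # independent, so they could run concurrently
    for idx, e_new in enumerate(energies):
        if e_new < e_old:
            return trials[idx], e_new, steps[idx]
    return x, e_old, None

def minimise(x, energy, force, step=1.0, n_streams=4,
             factor_reject=0.5, max_iter=200, min_step=1e-8):
    """Steepest descent with n_streams candidate steps per iteration."""
    e_old = energy(x)
    f = force(x)
    for _ in range(max_iter):
        if step <= min_step:
            break
        steps = [step * factor_reject ** k for k in range(n_streams)]
        x_new, e_new, accepted = line_search_step(x, f, energy, e_old, steps)
        if accepted is not None:             # accept step
            x, e_old, step = x_new, e_new, accepted
            f = force(x)
        else:                                # reject all candidates, shrink further
            step = steps[-1] * factor_reject
    return x, e_old
```

Testing several step sizes at once removes serial reject-and-retry rounds, which is where the concurrency gain comes from.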

Steepest Descent Benchmarks

[Figure: 2O0O.pdb -> 7165 atoms. x-axis: Concurrent streams, 1-8; y-axis: Performance (ms), 0-140]

3-8 concurrent streams -> 1x performance speed-up

-> Smaller molecule size -> Fewer blocks per grid -> enhanced kernel concurrency

Steepest Descent Benchmarks

[Figure: 2YDB.pdb -> 3620 atoms. x-axis: Concurrent streams, 1-8; y-axis: Performance (ms), 0-120]

Proof of Concept

● Work in progress:

– Evaluating docking results between flexible and rigid protein HPLD (Haptic Protein Ligand Docking)

– Cyclin-dependent kinase 2 (CDK2) and HIV RT targets are being used for our experiments

● GPGPU accelerated HPLD

● Kepler warp intrinsics and hyper-Q enabled us to design faster code

● Future work → simulating bigger systems always a challenge

● Long range NB forces → PME support (stream processing)

● A new approach, so more ideas for improvements will come from testing

Thank you for your attention.

Questions?

Thanasis Anthopoulos

Introducing Protein Flexibility in Interactive Drug Design
