![Page 1: ACCELERATING SANJEEVINI: A DRUG DISCOVERY …on-demand.gputechconf.com/gtc/2018/presentation/s...• Quick Introduction to Computer Aided Drug Discovery software Sanjeevani • Challenges](https://reader034.vdocuments.site/reader034/viewer/2022042805/5f6004658bd1846298212924/html5/thumbnails/1.jpg)
ACCELERATING SANJEEVINI: A DRUG DISCOVERY SOFTWARE SUITE
Abhilash Jayaraj, IIT Delhi
Shashank Shekhar, IIT Delhi
Bharatkumar Sharma, Nvidia
Nagavijayalakshmi, Nvidia
AGENDA
• Quick introduction to the computer-aided drug discovery software Sanjeevini
• Challenges
  • Code documentation is still being improved
  • Code is maintained by non-computer-scientists
  • Code was designed to suit distributed programming
• Constraints
  • Ease of maintenance: code modification should be minimal
  • Portability: the current cluster has a mix of CPU and GPU nodes, and the code should run on both
• Learnings
  • What to expect and what not to
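COMPUTER AIDED DRUG DISCOVERY
Introduction
Figure: drug discovery pipeline, from Target Discovery through Lead Generation, Lead Optimization, Preclinical Development, Phase I, II & III Clinical Trials and FDA Review & Approval to Drug to the Market.
Overall: 14 yrs, $1.4 billion
Durations labeled on the figure: 2.5 yrs, 3.0 yrs, 1.0 yrs, 6.0 yrs, 1.5 yrs
Percentages labeled on the figure: 4%, 15%, 10%, 68%, 3%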
SANJEEVINI FOR COMPUTER AIDED DRUG DESIGN
Overview
Figure: Sanjeevini workflow. Boxes in the diagram:
• Check Lipinski compliance
• Generate rapid binding energy estimates by RASPD protocol
• Predict all possible binding sites and store top ten sites
• Dock and Score
• Optimize geometry / Assign TPACM4 / derive quantum mechanical charges
• Assign force field parameters
• Perform molecular dynamics simulations and post facto free energy component analyses (Optional)
• Generate canonical A/B DNA or MD averaged structure of B DNA
Inputs in the diagram:
• Self drawn ligand molecule
• Protein-ligand Complex / Protein / DNA sequence
• NRDBSM / Million molecule library / Natural products database
SANJEEVINI
GPU acceleration
▪ OpenACC acceleration of ParDOCK module
▪ All atom energy based Monte Carlo docking for protein-ligand complexes
PERFORMANCE OPTIMIZATION
Strategy
Figure: cycle of Analyze, Parallelize, Optimize
SANJEEVINI: PARDOCK
Hotspots
Flat profile:
| % time | Cumulative seconds | Self seconds | Calls | Self s/call | Total s/call | Name |
| --- | --- | --- | --- | --- | --- | --- |
| 69.78 | 557.90 | 557.90 | 1188000 | 0.00 | 0.00 | PDB::EnergyCalculator() |
| 12.92 | 661.19 | 103.29 | 8 | 12.91 | 20.26 | PDB::clashCombination() |
| 7.35 | 719.96 | 58.77 | 26051422500 | 0.00 | 0.00 | getRadius1() |
| 5.49 | 763.85 | 43.89 | 885075 | 0.00 | 0.00 | PDB::energyAtom() |
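The flat profile bounds how much whole-program speedup can come from offloading individual hotspots. A minimal sketch using Amdahl's law, assuming the profiled time fractions carry over to production runs; the function names `amdahl_speedup` and `amdahl_limit` are illustrative and not part of ParDOCK:

```cpp
#include <cassert>
#include <limits>

// Amdahl's law: overall speedup when a fraction `f` of the run time
// (0 <= f <= 1) is accelerated by a factor `s` (s > 0).
double amdahl_speedup(double f, double s) {
    assert(f >= 0.0 && f <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - f) + f / s);
}

// Upper bound on the overall speedup when the accelerated part becomes
// free (s -> infinity).
double amdahl_limit(double f) {
    assert(f >= 0.0 && f <= 1.0);
    if (f == 1.0) return std::numeric_limits<double>::infinity();
    return 1.0 / (1.0 - f);
}
```

With the numbers above, `amdahl_limit(0.6978)` is about 3.3, so accelerating `PDB::EnergyCalculator()` alone cannot exceed roughly 3.3x overall; covering all four listed functions (95.54% of the time) raises the bound to about 22x.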
SANJEEVINI: PARDOCK
CPU code: EnergyCalculator
double PDB::EnergyCalculator(float **&EnergyGrid, const vector <points> &vDrugGrid, points coords[], const unsigned &totalDockAtoms, … ){
  for( int atomcount = 0; atomcount < totalDockAtoms; atomcount++ ){
    for( int counter = 0; counter < vDrugGrid.size(); counter++ ){
      // compute ‘distance’ between coords[atomcount] and vDrugGrid[counter]
      // minDis = minimum of ‘distance’, minCounter = counter corresponding to minDis
    }
    // add the grid energy stored for the nearest grid point of this atom
    ene += EnergyGrid[minCounter][atomcount];
  }
  return ene;
}
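The loop structure above can be written out as a runnable serial reference. This is a minimal sketch, not the ParDOCK implementation: the `Point` struct stands in for the slide's `points` type, the grid is held in nested vectors instead of `float **`, and squared Euclidean distance is assumed for the slide's ‘distance’.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in for the `points` type used on the slide.
struct Point { float x, y, z; };

// Squared Euclidean distance (an assumption); the square root is not needed
// to decide which grid point is nearest.
static float dist2(const Point& a, const Point& b) {
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// For each atom, find the nearest grid point and add the energy stored for
// that (grid point, atom) pair; energyGrid[g][a] mirrors the slide's indexing.
double energy_from_grid(const std::vector<std::vector<float>>& energyGrid,
                        const std::vector<Point>& grid,
                        const std::vector<Point>& atoms) {
    assert(!grid.empty() && energyGrid.size() == grid.size());
    double ene = 0.0;
    for (std::size_t a = 0; a < atoms.size(); ++a) {
        float minDis = std::numeric_limits<float>::max();
        std::size_t minCounter = 0;
        for (std::size_t g = 0; g < grid.size(); ++g) {
            const float d = dist2(atoms[a], grid[g]);
            if (d < minDis) { minDis = d; minCounter = g; }  // first minimum wins
        }
        ene += energyGrid[minCounter][a];
    }
    return ene;
}
```

The inner loop carries a dependency through `minDis` and `minCounter`, which is what the parallel version has to restructure.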
OpenACC
Simple | Powerful | Portable
Fueling the Next Wave of Scientific Discoveries in HPC
• University of Illinois, PowerGrid (MRI Reconstruction): 70x Speed-Up, 2 Days of Effort
• RIKEN Japan, NICAM (Climate Modeling): 7-8x Speed-Up, 5% of Code Modified
main() {
  <serial code>
  #pragma acc kernels // automatically runs on GPU
  {
    <parallel code>
  }
}
http://www.cray.com/sites/default/files/resources/OpenACC_213462.12_OpenACC_Cosmo_CS_FNL.pdf
http://www.hpcwire.com/off-the-wire/first-round-of-2015-hackathons-gets-underway
http://on-demand.gputechconf.com/gtc/2015/presentation/S5297-Hisashi-Yashiro.pdf
http://www.openacc.org/content/experiences-porting-molecular-dynamics-code-gpus-cray-xk7
OPENACC DIRECTIVES
#pragma acc data copyin(x,y) copyout(z)  // Manage Data Movement
{
  ...
  #pragma acc parallel                    // Initiate Parallel Execution
  {
    #pragma acc loop gang vector          // Optimize Loop Mappings
    for (i = 0; i < n; ++i) {
      z[i] = x[i] + y[i];
      ...
    }
  }
  ...
}
• Incremental
• Single source
• Interoperable
• Performance portable
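The three directive roles can be combined in one small function. A minimal sketch, assuming the arrays live in `std::vector` and are passed to the pragmas through raw pointers with explicit ranges; the function name `vector_add` is illustrative. A compiler without OpenACC support ignores the pragmas and runs the loop serially, which is the "single source" property listed above.

```cpp
#include <cassert>
#include <vector>

// Element-wise z = x + y.
std::vector<float> vector_add(const std::vector<float>& x,
                              const std::vector<float>& y) {
    assert(x.size() == y.size());
    const int n = static_cast<int>(x.size());
    std::vector<float> z(x.size());
    const float* xp = x.data();
    const float* yp = y.data();
    float* zp = z.data();

    // Manage data movement: inputs go to the device, the result comes back.
    #pragma acc data copyin(xp[0:n], yp[0:n]) copyout(zp[0:n])
    {
        // Initiate parallel execution and map the loop to gangs and vectors.
        #pragma acc parallel loop gang vector
        for (int i = 0; i < n; ++i) {
            zp[i] = xp[i] + yp[i];
        }
    }
    return z;
}
```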
SANJEEVINI: PARDOCK
OpenACC parallelization: EnergyCalculator (1)
double PDB::EnergyCalculator(float **&EnergyGrid, const vector <points> &vDrugGrid, points coords[], const unsigned &totalDockAtoms, … ){
  // data clause arguments are omitted on the slide
  #pragma acc parallel loop reduction(+:ene) private(minDis,minCounter) present() copyin() firstprivate()
  for( int atomcount = 0; atomcount < totalDockAtoms; atomcount++ ){
    // pass 1: minimum distance for this atom
    #pragma acc loop reduction(min:minDis)
    for( int counter = 0; counter < vDrugGrid.size(); counter++ ){
      // compute ‘distance’ between coords[atomcount] and vDrugGrid[counter]
      minDis = (minDis > distance) ? distance : minDis;
    }
SANJEEVINI: PARDOCK
OpenACC parallelization: EnergyCalculator (2)
    // pass 2: lowest grid index whose distance equals minDis
    #pragma acc loop reduction(min:minCounter)
    for( int counter = 0; counter < vDrugGrid.size(); counter++ ){
      // compute ‘distance’ between coords[atomcount] and vDrugGrid[counter]
      if ( distance == minDis ){
        minCounter = (minCounter > counter) ? counter : minCounter;
      }
    }
    ene += EnergyGrid[minCounter][atomcount];
  }
  return ene;
}
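The inner part of this restructuring, finding the index of the minimum with two `min` reductions, can be isolated and checked on its own. A minimal sketch, assuming the distances are already computed into an array; the function name `argmin_two_pass` and the sentinel value are illustrative choices, not taken from ParDOCK.

```cpp
#include <cassert>
#include <limits>

// Returns the lowest index i with dist[i] == min(dist[0..n-1]).
// Pass 1 reduces the minimum value; pass 2 reduces the smallest index that
// attains it, so each pass is a plain `min` reduction with no loop-carried
// dependency on a (value, index) pair.
int argmin_two_pass(const float* dist, int n) {
    assert(dist != nullptr && n > 0);

    float minDis = std::numeric_limits<float>::max();
    #pragma acc parallel loop reduction(min:minDis) copyin(dist[0:n])
    for (int i = 0; i < n; ++i) {
        minDis = (minDis > dist[i]) ? dist[i] : minDis;
    }

    int minCounter = n;  // sentinel larger than any valid index
    #pragma acc parallel loop reduction(min:minCounter) copyin(dist[0:n])
    for (int i = 0; i < n; ++i) {
        if (dist[i] == minDis) {
            minCounter = (minCounter > i) ? i : minCounter;
        }
    }
    return minCounter;
}
```

The exact comparison `dist[i] == minDis` is safe here because pass 2 sees the same values that pass 1 reduced; on ties the lowest index is returned, matching a serial loop that keeps the first minimum.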
SANJEEVINI: PARDOCK
OpenACC parallelization: EnergyCalculator (3)
const points *vDrugGridData = vDrugGrid.data();
// compute ‘distance’ between coords[atomcount] and vDrugGridData[counter]
▪ Use ‘raw data pointer’ to access vectors
SANJEEVINI: PARDOCK
OpenACC parallelization: EnergyCalculator (4)
unsigned totDockAtoms = totalDockAtoms;
float **eneGrid = EnergyGrid;
#pragma acc parallel loop reduction(+:ene) … copyin(coords[0:totDockAtoms]) present(eneGrid)
ene += eneGrid[minCounter][atomcount];
▪ Use ‘raw data pointer’ to access vectors
▪ Avoid using C++ references in OpenACC pragmas
Compiler output when the reference parameters are used directly in the pragma:
PDB::EnergyCalculator(float **&, const std::vector<points, std::allocator<points>> &, const std::vector<points, std::allocator<points>> &, points *, const unsigned int &, energy &, int):
22, Generating present(vDrugGridData[:])
Generating copyin(coords[:totalDockAtoms->])
Generating present(EnergyGrid[:][:][:])
Result: runtime memory access violation
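Both lessons, raw data pointers for vectors and local copies instead of C++ references in pragmas, fit in one small function. A minimal sketch under those assumptions; `weighted_sum` and its parameters are hypothetical and only mimic the reference-heavy signature of `EnergyCalculator`.

```cpp
#include <cassert>
#include <vector>

// Sums values[i] * weights[i] over the first `count` elements. The
// parameters arrive as C++ references, but the pragma only sees plain
// locals and raw pointers.
double weighted_sum(const std::vector<float>& values,
                    const std::vector<float>& weights,
                    const unsigned& count) {
    assert(count <= values.size() && count <= weights.size());
    const unsigned n = count;        // local copy of the reference
    const float* v = values.data();  // raw pointers into the vectors
    const float* w = weights.data();
    double sum = 0.0;

    #pragma acc parallel loop reduction(+:sum) copyin(v[0:n], w[0:n])
    for (unsigned i = 0; i < n; ++i) {
        sum += static_cast<double>(v[i]) * w[i];
    }
    return sum;
}
```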
OPENACC: 3 LEVELS OF PARALLELISM
• Vector threads work in lockstep (SIMD/SIMT parallelism)
• Workers compute a vector
• Gangs have 1 or more workers and share resources (such as cache, the streaming multiprocessor, etc.)
• Multiple gangs work independently of each other
Figure labels: Gang, Workers, Vector
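The three levels map naturally onto a triple loop nest. A minimal sketch, assuming a dense array in row-major order; the function name `sum3d` and the choice of which loop gets which level are illustrative, and a real code would let the compiler or profiling guide that mapping.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums a dense nx*ny*nz array, mapping the outer loop to gangs, the middle
// loop to workers and the inner loop to vector lanes.
double sum3d(const std::vector<double>& a, int nx, int ny, int nz) {
    assert(nx >= 0 && ny >= 0 && nz >= 0);
    assert(a.size() == static_cast<std::size_t>(nx) * ny * nz);
    const double* p = a.data();
    const int total = nx * ny * nz;
    double sum = 0.0;

    #pragma acc parallel loop gang reduction(+:sum) copyin(p[0:total])
    for (int i = 0; i < nx; ++i) {
        #pragma acc loop worker reduction(+:sum)
        for (int j = 0; j < ny; ++j) {
            #pragma acc loop vector reduction(+:sum)
            for (int k = 0; k < nz; ++k) {
                sum += p[(i * ny + j) * nz + k];
            }
        }
    }
    return sum;
}
```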
SANJEEVINI: PARDOCK
OpenACC compiler output: EnergyCalculator
PDB::EnergyCalculator(float **&, const std::vector<points, std::allocator<points>> &, const std::vector<points, std::allocator<points>> &, points *, const unsigned int &, energy &, int):
22, Generating present(vDrugGridData[:],eneGrid[:][:])
Generating copyin(coords[:totDockAtoms])
22, Accelerator kernel generated
Generating Tesla code
22, Generating reduction(+:ene)
24, #pragma acc loop gang /* blockIdx.x */
31, #pragma acc loop vector(256) /* threadIdx.x */
Generating reduction(min:minDis)
45, #pragma acc loop vector(256) /* threadIdx.x */
Generating reduction(min:minIdx)
31, Loop is parallelizable
45, Loop is parallelizable
MANAGE DATA HIGHER IN THE PROGRAM
Currently, data is moved at the beginning and end of each function, in case the data is needed on the CPU.
We know that the data is only needed on the CPU after convergence.
We should inform the compiler when data movement is really needed, to improve performance.
STRUCTURED DATA REGIONS
The data directive defines a region of code in which GPU arrays remain on the GPU and are shared among all kernels in that region.
#pragma acc data
{
#pragma acc parallel loop
...
#pragma acc parallel loop
...
}
Data Region: arrays used within the data region will remain on the GPU until the end of the data region.
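A filled-in version of the skeleton above shows two kernels sharing one array inside a single data region. A minimal sketch; the function name `scale_then_norm2` and the `copy` clause are illustrative choices.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Scales `a` in place, then returns the sum of squares. Both loops run
// inside one data region, so the array is transferred once in each
// direction rather than around every loop.
double scale_then_norm2(std::vector<double>& a, double factor) {
    assert(a.size() <= static_cast<std::size_t>(std::numeric_limits<int>::max()));
    const int n = static_cast<int>(a.size());
    double* p = a.data();
    double norm2 = 0.0;

    #pragma acc data copy(p[0:n])
    {
        #pragma acc parallel loop
        for (int i = 0; i < n; ++i) {
            p[i] *= factor;
        }

        #pragma acc parallel loop reduction(+:norm2)
        for (int i = 0; i < n; ++i) {
            norm2 += p[i] * p[i];
        }
    }
    return norm2;
}
```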
UNSTRUCTURED DATA DIRECTIVES
Used to define data regions when scoping doesn’t allow the use of normal data regions (e.g. the constructor/destructor of a class).
enter data: defines the start of an unstructured data lifetime
• clauses: copyin(list), create(list)
exit data: defines the end of an unstructured data lifetime
• clauses: copyout(list), delete(list), finalize
#pragma acc enter data copyin(a)
...
#pragma acc exit data delete(a)
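The constructor/destructor case mentioned above can be sketched with a small owning class. This is an illustration under stated assumptions, not code from ParDOCK: `DeviceBuffer` is a hypothetical class, and the member pointer is copied into a local before each pragma so that the directives only see plain variables.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical buffer whose device copy lives exactly as long as the object.
class DeviceBuffer {
public:
    explicit DeviceBuffer(std::size_t n) : n_(n), data_(new float[n]()) {
        assert(n > 0);
        float* data = data_;
        const std::size_t count = n_;
        #pragma acc enter data copyin(data[0:count])
        (void)data; (void)count;  // silence warnings when pragmas are ignored
    }
    ~DeviceBuffer() {
        float* data = data_;
        const std::size_t count = n_;
        #pragma acc exit data delete(data[0:count])
        (void)data; (void)count;
        delete[] data_;
    }
    DeviceBuffer(const DeviceBuffer&) = delete;
    DeviceBuffer& operator=(const DeviceBuffer&) = delete;

    // Sets every element on the device; no transfer is needed.
    void fill(float value) {
        float* data = data_;
        const std::size_t count = n_;
        #pragma acc parallel loop present(data[0:count])
        for (std::size_t i = 0; i < count; ++i) data[i] = value;
    }
    // Reduces on the device; only the scalar result returns to the host.
    float sum() const {
        const float* data = data_;
        const std::size_t count = n_;
        float total = 0.0f;
        #pragma acc parallel loop present(data[0:count]) reduction(+:total)
        for (std::size_t i = 0; i < count; ++i) total += data[i];
        return total;
    }

private:
    std::size_t n_;
    float* data_;
};
```

Because the lifetime starts in one scope and ends in another, a structured `data` region cannot express it; `enter data` and `exit data` can.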
SANJEEVINI: PARDOCK
OpenACC parallelization: EnergyAtom (3)
▪ Use ‘raw data pointer’ to access vectors
▪ How will you access ‘vector of vector (jagged arrays)’?
Creation and copy of jagged arrays
int **vProteinListData = new int *[vProteinList.size()];
n = vProteinList.size();
#pragma acc enter data create(vProteinListData[0:n][0:1])
for( int count = 0; count < n; count++ ){
  int numPro = vProteinList[count].size();
  vProteinListData[count] = vProteinList[count].data();
  #pragma acc enter data copyin(vProteinListData[count:1][0:numPro])
}
SANJEEVINI: PARDOCK
OpenACC parallelization: EnergyAtom (4)
▪ Use ‘raw data pointer’ to access vectors
▪ How will you access ‘vector of vector (jagged arrays)’?
Deletion of jagged arrays
for( int count = 0; count < n; count++ ){
  int numPro = vProteinList[count].size();
  #pragma acc exit data delete(vProteinListData[count:1][0:numPro])
  vProteinListData[count] = NULL;
}
#pragma acc exit data delete(vProteinListData[0:n][0:1])
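A different way to handle a vector of vectors, shown here only for comparison and not the approach taken in the slides, is to flatten it into one contiguous value array plus a row-offset array. A minimal sketch; `FlatJagged`, `flatten` and `sum_all` are hypothetical names. The trade-off is an extra host-side copy in exchange for two plain `copyin` clauses instead of per-row `enter data` / `exit data` pairs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Flattened copy of a jagged vector-of-vectors:
// row r occupies values[offsets[r] .. offsets[r + 1]).
struct FlatJagged {
    std::vector<int> offsets;  // size = number of rows + 1
    std::vector<int> values;   // all rows concatenated
};

FlatJagged flatten(const std::vector<std::vector<int>>& rows) {
    FlatJagged flat;
    flat.offsets.reserve(rows.size() + 1);
    flat.offsets.push_back(0);
    for (const std::vector<int>& row : rows) {
        flat.values.insert(flat.values.end(), row.begin(), row.end());
        flat.offsets.push_back(static_cast<int>(flat.values.size()));
    }
    return flat;
}

// Sums every element, one gang per row and vector lanes within a row.
long sum_all(const FlatJagged& flat) {
    assert(!flat.offsets.empty());
    const int nRows = static_cast<int>(flat.offsets.size()) - 1;
    const int nVals = static_cast<int>(flat.values.size());
    const int* off = flat.offsets.data();
    const int* val = flat.values.data();
    long total = 0;

    #pragma acc parallel loop gang reduction(+:total) copyin(off[0:nRows + 1], val[0:nVals])
    for (int r = 0; r < nRows; ++r) {
        #pragma acc loop vector reduction(+:total)
        for (int c = off[r]; c < off[r + 1]; ++c) {
            total += val[c];
        }
    }
    return total;
}
```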
SANJEEVINI: PARDOCK
OpenACC compiler output: EnergyAtom (1)
PDB::energyAtom(const std::vector<PDB, std::allocator<PDB>> &, PDB, points, const std::vector<Box, std::allocator<Box>>&, const std::vector<int, std::allocator<int>>&, const std::vector<std::vector<int, std::allocator<int>>, std::allocator<std::vector<int, std::allocator<int>>>>&, int **):
79, Generating enter data copyin(boxListData[:boxListNumElements],rec,coord)
85, Generating present(coord,boxListData[:],rec,vProteinListData[:][:],vProData[:])
Accelerator kernel generated
Generating Tesla code
85, Generating reduction(+:electro,vandw,ehyd)
87, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
129, Generating exit data delete(boxListData[:boxListNumElements],rec,coord)
SANJEEVINI: PARDOCK
OpenACC compiler output: EnergyAtom (2)
main:
266, Generating enter data copyin(vProData[:vProNumElements])
Generating enter data create(vProteinListData[:vProteinListNumElements][:1])
275, Generating enter data copyin(vProteinListData[proList][:numElements])
321, Generating exit data delete(vProteinListData[proList][:numElements])
322, Generating exit data delete(vProteinListData[:vProteinListNumElements][:1],vProData[:vProNumElements])
CUDA UNIFIED MEMORY
Simplified Developer Effort
Figure: without Unified Memory, separate System Memory and GPU Memory; with Unified Memory, a single Unified Memory.
Sometimes referred to as “managed memory.”
New “Pascal” GPUs handle Unified Memory in hardware.
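With managed memory the explicit data clauses can be left out of simple kernels. A minimal sketch, assuming the build enables managed memory (for example the PGI option `-ta=tesla:managed`), so that heap allocations such as the vectors' storage are reachable from both CPU and GPU; the function name `saxpy` is illustrative. Without managed memory this loop would need explicit array ranges in data clauses.

```cpp
#include <cassert>
#include <vector>

// y = a * x + y with no data clauses. Assumption: managed memory is
// enabled at build time. Without an OpenACC compiler the pragma is ignored
// and the loop runs on the CPU.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const int n = static_cast<int>(x.size());
    const float* xp = x.data();
    float* yp = y.data();

    #pragma acc parallel loop
    for (int i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```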
SANJEEVINI: PARDOCK
Performance: CPU and GPU (1)
▪ PSG Cluster node, Haswell E5-2698 v3 @ 2.3 GHz, dual socket, 16 core
▪ 256 GB RAM
▪ Tesla P100 GPU
▪ CentOS 7.2
▪ CUDA Toolkit 8.0.61
▪ MPS enabled for GPU access
CPU+GPU is 5.8x/3.3x faster than CPU at 8 MPI procs, for ROTATE=1000/100
16 MPI procs on a single GPU -> GPU is the bottleneck!
SANJEEVINI: PARDOCK
Performance: CPU and GPU (2)
▪ Average ‘time to predict’ over 160 datasets
▪ PSG Cluster node, Haswell E5-2698 v3 @ 2.3 GHz, dual socket, 16 core
▪ 256 GB RAM
▪ Tesla P100 GPU
▪ CentOS 7.2
▪ CUDA Toolkit 8.0.61
▪ MPS enabled for GPU access
CPU+GPU is 5.3x/3.2x faster than CPU at 8 MPI procs, for ROTATE=1000/100
TESLA V100
The Fastest and Most Productive GPU for AI and HPC
• Volta Architecture: Most Productive GPU
• Tensor Core: 125 Programmable TFLOPS Deep Learning
• Improved SIMT Model: New Algorithms
• Volta MPS: Inference Utilization
• Improved NVLink & HBM2: Efficient Bandwidth
MULTI PROCESS SERVICE (MPS) FOR MPI APPLICATIONS
GPU ACCELERATION OF LEGACY MPI APPS
Typical legacy application
MPI parallel
Single or few threads per MPI rank (e.g. OpenMP)
Running with multiple MPI ranks per node
GPU acceleration in phases
Proof of concept prototype, …
Great speedup at kernel level
Application performance misses expectations
4/2/2018
MULTI PROCESS SERVICE (MPS)
For Legacy MPI Applications
Figure: bars for N=1, 2, 4 and 8, each split into serial part, CPU parallel part and GPU parallelizable part. Panels: Multicore CPU only; GPU-accelerated; With Hyper-Q/MPS.
Hyper-Q/MPS is available on Tesla/Quadro with CC 3.5+ (e.g. K20, K40, K80, M40,…)
PROCESSES SHARING GPU WITHOUT MPS
No Overlap
Figure: Process A and Process B each have their own context (Context A, Context B) on the GPU.
PROCESSES SHARING GPU WITHOUT MPS
Context Switch Overhead
Time-sliced use of the GPU, with a context switch between the slices
PROCESSES SHARING GPU WITH MPS
Maximum Overlap
Figure: Process A (Context A) and Process B (Context B) submit work through the MPS Process; kernels from Process A and kernels from Process B run on the GPU together.
PROCESSES SHARING GPU WITH MPS
No Context Switch Overhead
SANJEEVINI: PARDOCK
Pascal vs Volta
▪ Average ‘time to predict’ over 160 datasets, ROTATE=1000
▪ PSG Cluster node, Haswell E5-2698 v3 @ 2.3 GHz, dual socket, 16 core
▪ 256 GB RAM
▪ Tesla P100/V100 GPU
▪ CentOS 7.2
▪ CUDA Toolkit 8.0.61/9.0.176
▪ MPS enabled for GPU access
Volta is 2.1x faster than Pascal
SANJEEVINI: PARDOCK
OpenACC parallelization
▪ Use ‘raw data pointer’ to access vectors
▪ Avoid using C++ references in OpenACC pragmas
▪ Standard classes called from an OpenACC region may result in compilation/linking errors. Use math.h instead of cmath ☺
▪ Unified memory has improved over time, but explicit data clauses may still be needed to minimize data copies
▪ Volta works very well for programs that need MPS functionality
ONGOING WORK
SANJEEVINI: PARDOCK
Pascal vs Volta
▪ Average ‘time to predict’ over 160 datasets, ROTATE=1000
▪ PSG Cluster node, Haswell E5-2698 v3 @ 2.3 GHz, dual socket, 16 core
▪ 256 GB RAM
▪ Tesla P100/V100 GPU
▪ CentOS 7.2
▪ CUDA Toolkit 8.0.61/9.0.176
▪ MPS enabled for GPU access
Volta is 2.1x faster than Pascal due to hardware-accelerated MPS
SANJEEVINI: PARDOCK
Multi-GPU scalability (2)
▪ ‘1qbt’ dataset, ROTATE=1000, 8 MPI procs
▪ PSG Cluster node, Haswell E5-2698 v3 @ 2.3 GHz, dual socket, 16 core
▪ 256 GB RAM
▪ Tesla P100 GPU
▪ CentOS 7.2
▪ CUDA Toolkit 8.0.61
▪ MPS enabled for GPU access
▪ Higher concurrency is possible with more devices -> lower GPU time
▪ Lower latency with more devices/MPS servers -> lower CPU time
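Spreading the MPI processes of one node over several GPUs needs some rank-to-device mapping. The slides do not say which mapping ParDOCK uses; a minimal sketch, assuming a simple round-robin assignment, with `device_for_rank` as a hypothetical helper:

```cpp
#include <cassert>

// Round-robin assignment of the MPI ranks on one node to that node's GPUs.
// With an OpenACC runtime, the result would typically be passed to
// acc_set_device_num(device, acc_device_nvidia) from <openacc.h> before the
// first compute region; that call is omitted here to keep the sketch
// independent of MPI and OpenACC headers.
int device_for_rank(int localRank, int numDevices) {
    assert(localRank >= 0 && numDevices > 0);
    return localRank % numDevices;
}
```

With 8 MPI procs and 4 devices, for example, each GPU then serves two processes instead of eight.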
SANJEEVINI: PARDOCK
Multi-GPU scalability (3)
▪ ‘5cna’ dataset, ROTATE=100, 8 MPI procs, Tesla P100 GPUs, MPS
SANJEEVINI: PARDOCK
Pascal vs Volta (2)
▪ ‘1a4w’ dataset, ROTATE=100, 8 MPI procs, Tesla P100/V100 GPUs, MPS
REFERENCES: PARDOCK
• Gupta, A., et al. "ParDOCK: An all atom energy based Monte Carlo docking protocol for protein-ligand complexes." Protein and Peptide Letters 14.7 (2007): 632-646.
• Nishikawa, Joy L., et al. "Inhibiting fungal multidrug resistance by disrupting an activator–Mediator interaction." Nature 530.7591 (2016): 485.
• Singh, Tanya, D. Biswas, and Bhyravabhotla Jayaram. "AADS-An automated active site identification, docking, and scoring protocol for protein targets based on physicochemical descriptors." Journal of Chemical Information and Modeling 51.10 (2011): 2515-2527.
• Singh, Tanya, Olayiwola Adedotun Adekoya, and B. Jayaram. "Understanding the binding of inhibitors of matrix metalloproteinases by molecular docking, quantum mechanical calculations, molecular dynamics simulations, and a MMGBSA/MMBappl study." Molecular BioSystems 11.4 (2015): 1041-1051.
• Jayaram, Bhyravabhotla, et al. "Sanjeevini: a freely accessible web-server for target directed lead molecule discovery." BMC Bioinformatics. Vol. 13. No. 17. BioMed Central, 2012.
SANJEEVINI: PARDOCK
Steps involved