Page 1: GPU based virtual screening techniques for faster drug ...people.cse.nitc.ac.in/jayaraj/files/thesis_ppt1.pdf · Computer aided Drug Discovery[65] GPU based virtual screening techniques

GPU based virtual screening techniques for faster drug discovery

Research Scholar: Jayaraj P. B (P100016CS)

Guided by Dr. K. Muralikrishnan & Dr. G. Gopakumar

Department of Computer Science & Engineering, NIT Calicut

12/15/2016

Outline

o Introduction to Drug Discovery
  - Computer aided Drug Discovery
  - Virtual screening in Drug Discovery
o Literature Survey
  - Limitation of Virtual Screening
  - Need for Parallelism
o GPU Computing
o Proposed GPU Parallel methods for Virtual Screening
  - Random Forest
  - Self Organizing Map
  - Maximum Common Subgraph
o Results & Comparison
o Conclusion & Future Work
o References
o Publications


Introduction to Drug Discovery

o Drug discovery is the inventive process of developing a new drug that is therapeutically active against a disease-causing target molecule.

Target: a key molecule that is specific to a disease condition.

Ligand: a small molecule that binds tightly to its target. This binding changes the conformation of the target.

o It deals with the design of a molecule that is complementary in charge and shape to the target to which it binds.


Traditional Drug discovery pipeline[66]


Image Courtesy : www.cresset-group.co

o Requires many wet-lab experiments with the target and a large number of molecules

o A large exploration space needs a large amount of execution time and money

Computer aided Drug Discovery[65]


o Breaks the bottleneck by using modern computational techniques

o Data mining, machine learning, artificial intelligence, graph matching.

o It can avoid costly wet-lab experimentation

Image Courtesy : www.cresset-group.co


Virtual Screening[1,3]

Virtual Screening (VS) is a computational technique used in drug discovery for evaluating a large number of molecules to identify lead molecules that can be optimized to give a drug candidate.

Only because of VS can drug discovery consider the enormous chemical space of over 10^60 conceivable compounds for screening[1].

This speeds up the drug discovery process while reducing the need for expensive wet-lab work.

Types of Virtual Screening


Image Courtesy : www.cresset-group.co


VS : related works

S. Ekins et al. [1] showed that the in-silico methods available in pharmacology can also be used for drug discovery. Amanda Schierz [4] explained the shortcomings of virtual screening in her work.

o VS using machine learning

J.C. Gertrudes et al. [2], Patrick Walters et al. [8], Peter Ripphausen et al. [13], A. Srinivas Reddy et al. [14] and Campbell McInnes et al. [15] have reviewed the approaches used in virtual screening methods.

R. Burbidge et al. [11] showed that the support vector machine is well suited to QSAR analysis in virtual screening.

Vladimir Svetnik et al. [12] used random forest for compound classification and QSAR modeling.

o VS using similarity searching

Yiqun Cao et al. [52] proposed and tested the performance of a new backtracking algorithm for finding the Maximum Common Subgraph between two given graphs.

Peter Willet [61], John W. Raymond et al. [62, 70] and Paul J. Durand [71] reviewed and analyzed several fingerprint-based and graph-based similarity methods for clustering chemical structures.

Issues in Virtual screening

Since the input space for virtual screening is too large, serial computing is of little help.

Training the model using millions of molecules can take too much time.

Screening billions of molecules is likewise a daunting task with serial computing.

Works at the Open Source Drug Discovery Consortium of CSIR at the IISc campus.

Need for parallelism.

GPU Computing[17]

[Figure: application code split between CPU and GPU. Compute-intensive functions are parallelized on the GPU; the rest of the sequential code runs on the CPU.]

Image Courtesy : Nvidia CudaZone

CUDA - parallel computing platform & programming model invented by NVIDIA in 2006[9,10].

Problem Statement

To design and develop efficient GPU-based parallel virtual screening algorithms for faster drug discovery.

GPU accelerated Virtual Screening


Proposed Methods

Contributions are made in devising the following data-parallel methods for virtual screening.

Method 1: Random Forest

Method 2: Self Organizing Map

Method 3: Maximum Common Subgraph

Proposed Method-1

GPU Based Random Forest Classifier for Virtual Screening

Given information about the reaction of a set of molecules to a target, predict whether new molecules will be active or inactive when they interact with the same target.

Random Forest[21]

Random Forest (RF) is a committee of weak learners for solving prediction problems.

In RF, the CART (Classification And Regression Tree) decision tree is used as the weak learner.

CART follows a greedy, top-down, binary recursive partitioning that divides the feature space into disjoint rectangular regions.

Each internal node has an associated splitting predicate.
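As a minimal sketch of how a splitting predicate could be chosen for one node, assuming Gini impurity as the split criterion (the slides do not name the criterion) and the hypothetical helper names `gini` and `best_threshold`:

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_threshold(values, labels):
    """Greedy search for the threshold t that minimises the weighted Gini
    impurity of the two children produced by the predicate value >= t."""
    best_t, best_score = None, float("inf")
    for t in sorted(set(values)):
        left = [y for v, y in zip(values, labels) if v < t]
        right = [y for v, y in zip(values, labels) if v >= t]
        if not left or not right:
            continue  # a split must send samples to both children
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score
```

Applying this search recursively to each child gives the top-down partitioning described above; a full CART implementation would repeat it over every candidate feature.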


Binary Decision Trees

[Figure: a binary decision tree with numbered split nodes and leaf nodes. Each split node compares a split function of the feature vector v against a threshold.]

Image Courtesy : www. machinelearningmastery.com/

v : feature vector; f_n(v) : split function at node n; t_n : threshold at node n. A split node tests f_n(v) ≥ t_n.

Random Forest – Serial Algorithm[20]

Input and parameters used:

N - number of training samples
n - number of decision trees in the forest
M - total number of features
m - number of features considered for splitting a node

Algorithm

1. Set the number of trees, n, and the number of features, m, to be used in the creation of the trees.
2. Using bootstrapping, create a training sample for each of the n trees.
3. Grow the trees.

4. In each node, select a random subset of m of the M features; these randomly selected features are the candidates for the split at the current node.
5. Split the node on the one of the m features that best separates the training samples in the node with regard to their output value.
6. For each instance in the test sample, let each tree vote for an output value.
7. The final output of the ensemble is the majority vote.
8. End
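The steps above can be sketched as follows. This is an illustration under stated assumptions, not the implementation from this work: each tree is reduced to a single split (a decision stump) to keep the code short, so the random feature subset is drawn once per tree instead of once per node, and the names `train_stump`, `train_forest` and `predict` are hypothetical.

```python
import random
from collections import Counter


def train_stump(samples, labels, feature_ids):
    """Weak learner: a one-split tree chosen among the given features.
    It stands in for a full CART tree to keep the sketch short."""
    best = None
    for f in feature_ids:
        for t in sorted({s[f] for s in samples}):
            left = [y for s, y in zip(samples, labels) if s[f] < t]
            right = [y for s, y in zip(samples, labels) if s[f] >= t]
            if not left or not right:
                continue
            left_vote = Counter(left).most_common(1)[0][0]
            right_vote = Counter(right).most_common(1)[0][0]
            errors = (sum(y != left_vote for y in left)
                      + sum(y != right_vote for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_vote, right_vote)
    if best is None:  # no valid split: always predict the majority class
        majority = Counter(labels).most_common(1)[0][0]
        return (0, float("-inf"), majority, majority)
    return best[1:]


def train_forest(samples, labels, n_trees, m, seed=0):
    """Steps 1 to 5: bootstrap a sample per tree and split on the best
    of m randomly selected features."""
    rng = random.Random(seed)
    n_samples, n_features = len(samples), len(samples[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n_samples) for _ in range(n_samples)]  # bootstrap
        feats = rng.sample(range(n_features), m)  # random feature subset
        forest.append(train_stump([samples[i] for i in idx],
                                  [labels[i] for i in idx], feats))
    return forest


def predict(forest, x):
    """Steps 6 and 7: every tree votes, the majority vote is the output."""
    votes = [right if x[f] >= t else left for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]
```

The loop over trees in `train_forest` and the loop over trees in `predict` have no dependence between iterations, which is what makes the method a candidate for data-parallel execution.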


Molecule as a vector

• A molecule can be represented as a set of descriptors

• 179 descriptors are considered in the proposed work

• A molecule can be considered as a point in a multidimensional ‘descriptor space’.


Image Courtesy : www.cresset-group.co

RF Classifier for LBVS

Issues: Serial model creation and prediction for millions of molecules take too much time to complete the VS process.

Parallel RF – Related Works

Grahn et al. [27] presented CudaRF, a new parallel version of the Random Forests algorithm implemented using CUDA.

These methods seem to under-utilize the available parallelism of many-core machines.

Essen et al. [28] compared the effectiveness of FPGAs, GPUs and multi-core CPUs for accelerating Compact Random Forest (CRF) classifiers.

Liao et al. [29] introduced CudaTree, a GPU Random Forest implementation which adaptively switches between data and instruction parallelism.

Parallel Decision Tree construction

To grow a decision tree on the GPU, a hybrid method is developed in the proposed work.

The hybrid method uses a combination of depth-first and breadth-first construction.

Depth-first tree construction is used at the top of the tree.

Tree construction switches to breadth-first construction after a threshold value.

This crossover threshold can be set by the number of nodes grown in the tree.
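A minimal sketch of the switching logic only, assuming a toy median split in place of a real split search; the function name `grow_tree_hybrid` and the parameters `crossover` and `leaf_size` are hypothetical and do not come from the slides. On a GPU, each breadth-first level could presumably be processed as one batch, which is an assumption here and not a statement about the proposed implementation.

```python
from collections import deque


def grow_tree_hybrid(values, crossover, leaf_size=2):
    """Return the order in which nodes are expanded.

    Each node is a sorted list of sample values that is split at its median
    (a toy stand-in for a real split search). Nodes are expanded depth first
    until `crossover` nodes have been grown; the remaining frontier is then
    expanded breadth first, one whole level per batch."""
    order = []                    # (mode, node size) for every expanded node
    frontier = [sorted(values)]   # used as a stack in the depth-first phase
    grown = 1

    def split(node):
        mid = len(node) // 2
        return node[:mid], node[mid:]

    # Depth-first phase at the top of the tree.
    while frontier and grown < crossover:
        node = frontier.pop()
        order.append(("depth", len(node)))
        if len(node) > leaf_size:
            left, right = split(node)
            frontier.extend([right, left])  # left child is expanded next
            grown += 2

    # Breadth-first phase: every node of a level belongs to one batch.
    level = deque(frontier)
    while level:
        next_level = deque()
        for node in level:
            order.append(("breadth", len(node)))
            if len(node) > leaf_size:
                next_level.extend(split(node))
        level = next_level
    return order
```

For example, `grow_tree_hybrid(list(range(8)), crossover=3)` expands the root depth first and the remaining six nodes breadth first; the same tree is produced for any value of `crossover`, only the expansion order changes.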


Proposed Parallel Training Algorithm


Proposed Parallel Prediction Algorithm

Data set used

o Input – Bio-assay SDF file from NCBI PubChem[34]

o Feature extraction tool used: POWERMV[32]

- 179 descriptors are generated for each data set[4]

Performance [serial and parallel]

[Figures: performance of the serial implementation and of the parallel implementation.]

Speed up in Training

o For smaller data sets, the speedup is lower due to the overhead of CPU-GPU data transfer.

o For larger data sets, there is a visible computational boost of 10-fold.
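Both observations fit a simple cost model, given here as an illustration and not as a measurement from this work. Assume (all symbols except N are hypothetical) a serial cost t_c per training sample, a fixed transfer and launch overhead t_0, a per-sample transfer cost t_x, and an ideal k-fold acceleration of the compute part. For N training samples the speedup is then:

```latex
S(N) = \frac{N\,t_c}{t_0 + N\,t_x + N\,t_c / k},
\qquad
\lim_{N \to \infty} S(N) = \frac{t_c}{t_x + t_c / k}
```

For small N the fixed term t_0 dominates the denominator and S(N) stays small; as N grows, S(N) approaches the limit, which is below k whenever t_x > 0.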


Speed up in Prediction

o The data set for classification was taken from GDB17[35], a chemical universe database of unknown compounds.

o A speedup of 5 to 60 times is achieved.

Proposed Method-2

GPU based Self Organizing Map for Virtual Screening.


Self Organizing Map[36]

Introduced by Prof. Teuvo Kohonen in 1982.

It is a type of artificial neural network (ANN) that produces a low-dimensional, discretized representation of the input space of the training samples.

It is implemented as an unsupervised system of competitive learners.

This makes SOM useful for visualizing low-dimensional views of high-dimensional data.


Working of SOM

o Input: p vectors X, each of length n

(x1,1, x1,2, ..., x1,i, …, x1,n)
(x2,1, x2,2, ..., x2,i, …, x2,n)
…
(xj,1, xj,2, ..., xj,i, …, xj,n)
…
(xp,1, xp,2, ..., xp,i, …, xp,n)

o Output: a vector, Y, of length m: (y1, y2, ..., yi, …, ym)

o There is one weight vector of length n associated with each output unit.

o Each of the p vectors in the training data is classified as falling in one of m clusters.

Image Courtesy : http://www.csbdu.in/

SOM - Serial Algorithm[36]
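A minimal sketch of classical serial SOM training, assuming a one-dimensional map, a Gaussian neighbourhood, and linearly decaying learning rate and radius. None of these settings, nor the names `train_som` and `winner`, come from the slides.

```python
import math
import random


def winner(weights, x):
    """Index of the unit whose weight vector is closest to x (the winner)."""
    dists = [sum((wi - xi) ** 2 for wi, xi in zip(w, x)) for w in weights]
    return dists.index(min(dists))


def train_som(data, m, epochs=20, lr0=0.5, seed=0):
    """Train a one-dimensional SOM with m output units on `data`.

    Returns the list of m weight vectors. The learning rate and the
    neighbourhood radius decay linearly; both schedules are illustrative."""
    rng = random.Random(seed)
    n = len(data[0])
    weights = [[rng.random() for _ in range(n)] for _ in range(m)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        lr = lr0 * frac
        radius = max(1.0, (m / 2) * frac)
        for x in data:
            win = winner(weights, x)
            for j, w in enumerate(weights):
                # Gaussian neighbourhood around the winner on the map grid.
                h = math.exp(-((j - win) ** 2) / (2 * radius ** 2))
                for i in range(n):
                    w[i] += lr * h * (x[i] - w[i])
    return weights
```

After training, `winner(weights, x)` assigns an input vector to one of the m clusters. The two inner steps, finding the winner and updating the weights, are repeated for every input in every epoch, which is where the serial running time goes.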


SOM: Related Works

SOM software packages

SOM_PAK[38] - developed at Helsinki University of Technology by Kohonen and his team.

Viscovery[39] - for medicinal document classification

kohonen[40] - an R language based SOM package

Paul Elzer et al.[41] used SOM for industrial pharmaceutical research.

Dimitar Hristozov et al.[42] used SOM for fingerprint similarity based virtual screening.

No work related to SOM-based virtual screening could be found in the literature.

Running the classical SOM algorithm serially for virtual screening of millions of molecules, even on a powerful computer, cannot complete execution in a limited time frame[36].

Proposed iterative SOM for VS

An iteration of the proposed algorithm consists of

- a model building phase and
- a prediction phase.

The model building phase of the proposed algorithm combines

- the unsupervised learning capability of the SOM with
- a supervised labeling of the trained SOM neurons.

Each successive iteration builds a better prediction model for test data.


Proposed SOM for Virtual screening

Issues: time consuming due to the compute-intensive winner neuron finding, neuron weight updating steps and iterations.

Neuron labels
a : active
i : inactive
nl : next level
u : undefined
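The slides list the four labels but not the rule for assigning them. The sketch below shows one plausible reading, stated as an assumption: a neuron that wins only active training molecules is labeled a, one that wins only inactive molecules is labeled i, a neuron with a mix is labeled nl and handed to the next iteration, and a neuron that wins nothing is labeled u. The function name `label_neurons` and the input format are hypothetical.

```python
from collections import defaultdict


def label_neurons(num_neurons, hits):
    """Label each neuron from the training molecules mapped to it.

    `hits` is a list of (winner_index, activity) pairs, where activity is
    'active' or 'inactive'. Assumed rule: pure neurons get 'a' or 'i',
    mixed neurons get 'nl', neurons that won nothing get 'u'."""
    classes = defaultdict(set)
    for neuron, activity in hits:
        classes[neuron].add(activity)
    labels = []
    for j in range(num_neurons):
        seen = classes.get(j, set())
        if not seen:
            labels.append("u")
        elif seen == {"active"}:
            labels.append("a")
        elif seen == {"inactive"}:
            labels.append("i")
        else:
            labels.append("nl")
    return labels
```

Under this reading, a test molecule is predicted active or inactive when its winner neuron carries the label a or i, and is deferred otherwise.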


Parallel SOM– Related Works

Raghavendra D Prabhu[43], Alexander Campbell et al.[44] and Peter Wittek et al.[45] have described different parallel SOM implementations for many-core and multi-core architectures.

The work by McConnell et al. [48] compared different parallel SOM implementations using OpenCL, CUDA and MPI.

Gavin Davidson[46] created a parallel version of the SOM algorithm using OpenCL.

Gaute Myklebust et al. [47] put forth ideas on node parallelism and training sample parallelism in SOM.

Node parallelism is used in the proposed work.
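A minimal sketch of what node parallelism means for the winner search, written serially in Python to show the structure only; the function name `node_parallel_winner` and the pairwise reduction scheme are assumptions, not the kernel design of the proposed work. Each neuron's distance is computed independently (one unit of work per neuron), and the minimum is then found by a pairwise reduction.

```python
def node_parallel_winner(weights, x):
    """Find the winner neuron with one unit of work per neuron.

    Step 1 is independent per neuron (one GPU thread per neuron in a
    node-parallel scheme). Step 2 is a pairwise min-reduction, which needs
    about log2(number of neurons) rounds when each round runs in parallel.
    Returns (winner index, number of reduction rounds)."""
    # Step 1: per-neuron squared distance, no dependence between neurons.
    candidates = [
        (sum((wi - xi) ** 2 for wi, xi in zip(w, x)), j)
        for j, w in enumerate(weights)
    ]
    # Step 2: tree reduction over (distance, index) pairs.
    rounds = 0
    while len(candidates) > 1:
        reduced = [min(candidates[k], candidates[k + 1])
                   for k in range(0, len(candidates) - 1, 2)]
        if len(candidates) % 2:  # the odd element passes through unchanged
            reduced.append(candidates[-1])
        candidates = reduced
        rounds += 1
    return candidates[0][1], rounds
```

With five neurons the reduction finishes in three rounds (5, 3, 2, 1 candidates), whereas a serial scan needs one comparison per neuron. The weight update can be distributed the same way, since each neuron updates only its own weight vector.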
