
Page 1:

April 25th 2018

Automatic optimization of predictive Bioactivity models

Chi Chung Lam, Fabian Steinmetz, Paul Czodrowski

Page 2:

Predictive Models in Production

Multiple models are trained for biological targets:

Random Forests

Neural Networks

Gradient Boosted Trees

NNs and GBTs are very sensitive to hyperparameter changes

Automated methods are needed to build models with the right hyperparameters

Page 3:

NN Architectures & Hyperparameters

Millions of unique combinations are possible.

NN architecture:
• Layer type
• Number of layers
• Neurons per layer
• Activation functions

Training parameters:
• Optimizer
• Learning rate
• Weight decay
• Batch size
• Loss function
• …

Guido Bolick: Automatic Generation of Neural Network Architectures Using a Genetic Algorithm | 27.09.2016

Page 4:

Genetic Algorithm for hyperparameter optimization

[Diagram: genetic algorithm cycle]

Guido Bolick: Automatic Generation of Neural Network Architectures Using a Genetic Algorithm | 27.09.2016

Page 5:

Genetic Algorithm Workflow

Guido Bolick: Automatic Generation of Neural Network Architectures Using a Genetic Algorithm | 27.09.2016
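The workflow above can be sketched as a minimal genetic algorithm over hyperparameter dictionaries. This is a toy sketch, not the deck's implementation: the search space and fitness function are simplified stand-ins for real NN training, while the drop-worst-50% survivor strategy and uniform crossover follow the GA settings shown later in the deck.

```python
import random

# Hypothetical, simplified search space (stand-in for the full NN hyperparameters)
SPACE = {
    "layers": [1, 2, 3, 4],
    "neurons": [32, 64, 128, 256, 512],
    "learning_rate": [0.05, 0.1, 0.5, 1.0],
}

def random_entity():
    return {k: random.choice(v) for k, v in SPACE.items()}

def mutate(entity, rate=0.05):
    # With probability `rate`, resample a gene from the search space
    return {k: (random.choice(SPACE[k]) if random.random() < rate else v)
            for k, v in entity.items()}

def crossover(a, b):
    # Uniform crossover: each gene comes from either parent
    return {k: random.choice([a[k], b[k]]) for k in a}

def evolve(fitness, generations=10, pop_size=100):
    population = [random_entity() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        survivors = ranked[: pop_size // 2]          # drop worst 50%
        children = [mutate(crossover(random.choice(survivors),
                                     random.choice(survivors)))
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return max(population, key=fitness)

# Toy fitness; in reality this would be the validation kappa of a trained NN
toy_fitness = lambda e: e["neurons"] / 512 - abs(e["learning_rate"] - 0.1)
best = evolve(toy_fitness, generations=5, pop_size=20)
print(best)
```

In the real setting the fitness evaluation (training one NN) dominates the runtime, which is why the deck distributes it across workers.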

Page 6:

Comparing Global Models

Model          Description
RF             Random Forest with fixed hyperparams
Leiden DNN     DNN with fixed hyperparams
GA DNN         DNN with GA-optimized hyperparams
Random DNN     DNN with grid-search-optimized hyperparams
Feature-Wise   Baseline model that takes the best fingerprint bit as prediction
XGBoost        Gradient Boosted Trees with fixed hyperparams

Page 7:

Feature-Wise Baseline

Assume that each fingerprint bit is a prediction, and select the best bit:

             Bit 0   Bit 1   Bit 2   Bit 3   Activity
Sample 1       1       0       0       1        0
Sample 2       1       0       0       0        0
Sample 3       1       1       1       1        1
Sample 4       1       1       1       0        1
Sample 5       0       0       1       1        0
Kappa score   0.41    1.00    0.67   -0.17
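The baseline can be computed by treating each bit column as a prediction and scoring it against the activity labels, e.g. with scikit-learn's cohen_kappa_score. A sketch on the slide's example data (the exact per-bit values on the slide may stem from a different kappa variant; bit 1, which matches the activity perfectly, scores 1.0 in any case):

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Fingerprint bits (columns) and activity labels from the slide's example
bits = np.array([
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
])
activity = np.array([0, 0, 1, 1, 0])

# Score every bit as if it were a prediction, then keep the best one
scores = [cohen_kappa_score(bits[:, j], activity) for j in range(bits.shape[1])]
best_bit = int(np.argmax(scores))
print(best_bit, scores[best_bit])
```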

Page 8:

Global Model Performance

[Bar chart: Kappa score (0 to 0.8) per target (CACO, CLINT_H, CLINT_M, CLINT_R, HERG, SOL) for RF, Leiden DNN, GA DNN, Random DNN, Feature-Wise, XGBoost, and XGBoost Random]

Page 9:

GA vs Random Search Comparison

The mean kappa score increases as the GA evolves

However, a good solution is found very early (already within the initial 100 architectures)

A random search over the same search space finds a similar or better solution

Page 10:

Fingerprints hash a molecule's substructures into a fixed-length bit vector

A small fingerprint size causes "collisions"

A large fingerprint size causes many redundant bits

Fingerprint Filtering: CLINT_R

FP size                                   1024     4096
Avg substructures per bit                79.84    20.64
Bits removed by 0.01 variance filter         3     2388
Substructures/bit after 0.01 var filter  80.00    21.86
True size after 0.01 var filter           1021     1708
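scikit-learn's VarianceThreshold implements exactly this kind of filter. A sketch on synthetic fingerprint-like data (the 0.01 threshold follows the slide; the data and dimensions here are made up for illustration):

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

rng = np.random.default_rng(0)
# Synthetic 1024-bit fingerprints for 500 compounds:
# each bit fires with its own small probability, so some bits are near-constant
fps = (rng.random((500, 1024)) < rng.random(1024) * 0.1).astype(int)

# Drop bits whose variance is below 0.01 (near-constant, uninformative bits)
selector = VarianceThreshold(threshold=0.01)
filtered = selector.fit_transform(fps)
print(fps.shape[1], "->", filtered.shape[1], "bits after 0.01 variance filter")
```

For a binary bit with "on"-frequency q the variance is q(1-q), so the threshold effectively removes bits that are almost always 0 or almost always 1.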

Page 11:

Fingerprint Filtering: CLINT_R

Feature selection of fingerprint bits by variance

Control: unfiltered FP of the same length as the filtered FP

Problem: the choice of the variance threshold is arbitrary

[Bar charts: mean Kappa score for DNN, RF, and XGB under 0.01-var filtering, its control, 0.0-var filtering, its control, and unfiltered, at 1024 bits (0 to 0.25) and 4096 bits (0 to 0.30)]

Page 12:

Finding the optimal variance: CLINT_R

[Bar chart: mean Kappa score (0 to 0.25) for 0.01 var, 0.0 var, optimal var, and unfiltered; CLINT_R optimal-variance filtering]

Page 13:

Finding the optimal variance: HERG

[Bar chart: mean Kappa score (0 to 0.6) for 0.01 var, 0.0 var, optimal var, and unfiltered; HERG optimal-variance filtering]

Page 14:

Fingerprint Filtering: Problems

The variance of a bit depends strongly on the sample size

Use a threshold relative to the sample size instead of an absolute value

Can we combine this filtering with the "feature-wise baseline" analysis?

Drop fingerprint bits that correlate poorly with the dependent variable?

Page 15:

Nested Cluster Validation

Page 16:

On-line Updating of Models

The final models are used in production and served to chemists and other users

Retraining occurs every 3 months; during these three months the models are "outdated"

Retraining more frequently is impractical time-wise

XGB and DNNs allow "on-line" updating: new data is fitted in an additional training step on the existing models

This can happen in near real-time

Full retraining is only necessary when performance starts declining
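For XGBoost the mechanism is continued training (passing the existing booster via the xgb_model argument of xgb.train), and a Keras DNN can simply be fit again on the new batch. As a library-agnostic sketch of the same idea, scikit-learn's partial_fit performs an incremental update on new data without retraining from scratch; everything here (data, model choice) is an illustrative stand-in, not the deck's production setup:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
# Synthetic stand-ins: the "quarterly" dataset and a small batch of new compounds
X_old = rng.random((200, 16))
y_old = (X_old[:, 0] > 0.5).astype(int)
X_new = rng.random((20, 16))
y_new = (X_new[:, 0] > 0.5).astype(int)

# Initial (quarterly) training run
model = SGDClassifier(random_state=0)
model.partial_fit(X_old, y_old, classes=np.array([0, 1]))

# On-line update: an additional training step on only the new compounds
model.partial_fit(X_new, y_new)
preds = model.predict(X_new)
print((preds == y_new).mean())
```

The update step touches only the new samples, which is why it can run in near real-time compared to a full retrain.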

Page 17:

Our in-house environments: CREAM and MOCCA

CREAM (Classification REgression At Merck)

- Python environment and modelling tool
- Used for the majority of predictive models
- Offers versatile features, such as:
  - Multiple machine learning algorithms
  - Different validation methods
  - Interface to MOCCA

MOCCA is the Merck Online Computational Chemistry Analyzer, our web-based in-house prediction tool

Page 18:

Page 19:

Global vs. local models

Global models:
• Large dataset
• Large applicability domain (AD)
• Endpoints such as:
  • Physico-chemical properties
  • Pharmacokinetics
  • Toxicity
  • General selectivity

Local models:
• Smaller dataset
• Smaller applicability domain
• Endpoints such as:
  • Activity
  • Selectivity
  • Toxicity, pharmacokinetics

Generally, global models are preferable due to greater in-house modelling experience and a larger AD, but we are happy to support projects with local models if needed.

Page 20:

e.g.

Page 21:

Page 22:

Acknowledgement

• Chi Chung Lam
• Wolf-Guido Bolick (Andreas Dominik)
• Fabian Steinmetz
• Kristina Preuer, Günter Klambauer (Sepp Hochreiter)
• Friedrich Rippmann
• Marcel Baltruschat
• Cornelius Kohl
• Samo Turk
• Jan Fiedler
• Christian Röder

Page 23:

back-up

Page 24:

Datasets

Set       Train   Test   Classes
CACO       9637    523      3
CLINT_H   16264    797      3
CLINT_M   18313    981      3
CLINT_R   15910    760      3
HERG       6894    288      2
SOL       19615    667      3

Page 25:

NN Architectures & Hyperparameters

Millions of unique combinations are possible.

NN architecture:
• Layer type
• Number of layers
• Neurons per layer
• Activation functions

Training parameters:
• Optimizer
• Learning rate
• Weight decay
• Batch size
• Loss function
• …

Page 26:

Optimization of Hyperparameters

Expert: hyperparameters derived from literature & experience; hyperparameter search within promising parameter areas

Lucky people: Random search (Bergstra et al. 2012); Grid search (Larochelle et al. 2007)

Everyone: probability-based algorithms (Brochu et al. 2010, Bergstra et al. 2011); directed random search (e.g. genetic algorithms)

Page 27:

What is a Genetic Algorithm?

[Diagram: genetic algorithm cycle]

Page 28:

Validation Strategies

• Use as much data as possible for training
• Get a realistic glimpse of the performance

• 5-fold cross-validation
  • Every compound represented in 4/5 models
  • Hyperparameter optimization to increase performance on the validation sets
  • Resulting performance trustworthy?!

• 5-fold nested cross-validation: 25 models
  • Every compound represented in 16/25 models
  • Increased computational requirements
  • 5x hyperparameter optimizations to increase performance on the validation sets
  • Final performances evaluated using the corresponding outer-loop test sets
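The 16/25 count follows directly from the fold structure: with 5 outer folds a compound sits in 4 outer training sets, and within each of those it sits in 4 of the 5 inner training sets, giving 4 × 4 = 16 of the 25 inner models. A small sketch with scikit-learn's KFold verifying this on synthetic indices:

```python
import numpy as np
from sklearn.model_selection import KFold

n = 25                       # any sample count works; 25 keeps folds even
outer = KFold(n_splits=5)
count = 0                    # inner-training sets that contain sample 0
for train_idx, _ in outer.split(np.arange(n)):
    inner = KFold(n_splits=5)
    for inner_train, _ in inner.split(train_idx):
        if 0 in train_idx[inner_train]:
            count += 1
print(count, "of 25 inner models see sample 0 in training")
```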

Page 29:

Training of a NN

1. Get a job (hyperparameters) from the job server
2. Repeat for all training/test sets:
   2.1 Build a NN based on the hyperparameters
   2.2 Train the NN using a training set
       - A balanced-batch generator maintains the same active/inactive ratio within each batch
       - Early stopping when the mean validation loss of a sliding window (15 epochs) does not improve for 100 epochs
   2.3 Evaluate the best state (center of the best window) using the validation set; metric: Cohen's Kappa
       (agreement of labels vs. prediction, corrected for the agreement of 2 random observers)
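The sliding-window early-stopping rule in step 2.2 can be sketched as a standalone check over a recorded loss curve. The window size (15) and patience (100) follow the slide; the loss curve and the exact tie-breaking behaviour here are illustrative assumptions, not the deck's implementation:

```python
def window_early_stopping(val_losses, window=15, patience=100):
    """Return (stop_epoch, best_window_center): stop when the mean validation
    loss over a sliding window has not improved for `patience` epochs."""
    best_mean, best_epoch = float("inf"), 0
    for epoch in range(window, len(val_losses) + 1):
        mean = sum(val_losses[epoch - window:epoch]) / window
        if mean < best_mean:
            best_mean, best_epoch = mean, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop, report best-window center
            return epoch, best_epoch - window // 2
    return len(val_losses), best_epoch - window // 2

# Toy loss curve: improves for 50 epochs, then plateaus
losses = [1.0 / (e + 1) for e in range(50)] + [0.02] * 200
stop, best_center = window_early_stopping(losses, window=15, patience=100)
print(stop, best_center)
```

Averaging over a window rather than tracking the single best epoch makes the stopping rule robust to noisy per-epoch validation losses, and the window center (step 2.3) picks a representative state from the best-performing stretch.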

Page 30:

So many parameters…

Genetic Algorithm:
• Population size: 100
• Workers: 10
• Fingerprint size: 1024
• SMARTS patterns: 826
• Evolution strategy: drop worst 50%

Mutation settings:
• Default: mutation rate 5%, mutation strength 1, crossing-over rate 30%
• Increased: mutation rate 10%, mutation strength 2, crossing-over rate 30%

Training:
• Optimizer: sgd, rmsprop, adagrad, adadelta, adam, adamax, nadam
• Loss functions: mae, mse, msle
• Learning rate: 0.05, 0.1, 0.5, 1.0
• Weight decay: 0.0, 1E-7, 5E-7
• Momentum: 0.0, 0.1, …, 0.9
• Nesterov: 0, 1
• Batch size: 5%, 6%, …, 20%

Architecture:
• Layers: 1-4
• Layer types: Dense, Dropout
• Neurons: 32, 64, …, 512
• Dropout ratio: 5%, 10%, …, 90%
• Activation functions: linear, sigmoid, hard-sigmoid, softmax, relu, tanh
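The training-parameter options alone already multiply out to tens of thousands of combinations, and including the architecture choices pushes the full grid into the millions claimed earlier. A quick count plus one random-search draw, using only the training parameters listed above (the momentum and batch-size ranges are expanded as the "…" on the slide suggests):

```python
import random

# Training-parameter options as listed on the slide
space = {
    "optimizer": ["sgd", "rmsprop", "adagrad", "adadelta", "adam", "adamax", "nadam"],
    "loss": ["mae", "mse", "msle"],
    "learning_rate": [0.05, 0.1, 0.5, 1.0],
    "weight_decay": [0.0, 1e-7, 5e-7],
    "momentum": [round(0.1 * i, 1) for i in range(10)],  # 0.0 .. 0.9
    "nesterov": [0, 1],
    "batch_size": [f"{p}%" for p in range(5, 21)],       # 5% .. 20%
}

# Size of the full training-parameter grid
n_train = 1
for options in space.values():
    n_train *= len(options)
print("training-parameter combinations:", n_train)

# One random-search draw: sample each option list independently
random.seed(0)
sample = {k: random.choice(v) for k, v in space.items()}
print(sample)
```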

Page 31:

Datasets

Dataset     hERG          Micronucleus-Test
Compounds   6999          798
Actives     3205 (46%)    263 (33%)
Inactives   3794 (54%)    535 (67%)

Binary classification: inactive = 0, active = 1

Page 32:

Found NN-Hyperparameters

Page 33:

Found NN-Hyperparameters

Page 34:

Improvement of NNs while running the GA

The initial population starts with inner-kappa values of ~0.6 in all splits

The GA is able to improve the performance of the best entities even further (red line)

Mutations can lead to badly performing entities (blue line) until the last generation

Page 35:

Novelty of Architectures

The proportion of new entities in the population decreases during the runtime of the GA

A higher mutation rate (red line) increases the searchable space for the GA

Page 36:

Influence of Hyperparameters

Example label: "1_activation (344)" = first hidden layer, the activation function of this layer, and the number of contributing pairs

Contributing pairs differ only by the shown parameter

Boxplots are based on the absolute difference of the two inner-kappa values of all contributing pairs

Page 37:

User-Interface

Page 38:

Conclusion

Implemented an algorithm to create a consensus model using 5-fold nested cross-validation

Each compound is represented in 16 of 25 NNs

The calculation needs 8-14 hours (e.g. overnight) on a GTX cluster

The GA improves the already high kappa values of the NNs even further

Kappa values of the final NN models are mostly larger than 0.5 ("moderate" according to Landis & Koch 1977)

Further steps:

Possibility to use chemical descriptors and multiple fingerprints

Option to create multi-class models (more classes than just 0 and 1) and regression models

(Polishing up and writing a paper)

Page 39:

Implementation of the GA