TRANSCRIPT
pKa Expert Training and Automation
Eduard Kolovanov, CSO, PhysChem and ADME/Tox
Existing Methods for pKa Training
• ACD/Labs methods
  • Compounds in User’s DB (Systematic Training)
  • pKa Accuracy Extender
  • pKa Accuracy Extender PROFESSIONAL
• PA method
• Possible steps for AUTOMATION when using pKa AE PRO
Systematic Training
Advantages
• Fast and simple to use
• One compound is enough to correct an entire compound class
Drawbacks
• No new Hammett equations. Only pK0 is corrected
• Low accuracy
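The "only pK0 is corrected" point can be illustrated with a minimal sketch. It assumes the standard Hammett form pKa = pK0 - rho * sum(sigma) and assumes that Systematic Training amounts to re-solving that equation for pK0 from one measured compound while rho stays fixed. The slides do not describe the internal procedure, so the function names and all numbers here are hypothetical.

```python
def hammett_pka(pk0: float, rho: float, sigmas: list[float]) -> float:
    """Hammett-type estimate: pKa = pK0 - rho * sum(sigma)."""
    return pk0 - rho * sum(sigmas)


def corrected_pk0(pka_exp: float, rho: float, sigmas: list[float]) -> float:
    """Solve the same equation for pK0 from one measured compound (rho unchanged)."""
    return pka_exp + rho * sum(sigmas)


# Hypothetical numbers, for illustration only.
rho = 1.0
old_pk0 = 4.20
new_pk0 = corrected_pk0(pka_exp=3.60, rho=rho, sigmas=[0.78])

# Because only pK0 moves, every compound of the class is shifted by the same amount.
shift = new_pk0 - old_pk0
print(f"new pK0 = {new_pk0:.2f}, class-wide shift = {shift:+.2f}")
```

Since rho is untouched in this sketch, the correction cannot capture a different substituent sensitivity, which is one possible reading of the "low accuracy" drawback.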
Training with pKa Accuracy Extender
Advantages
• High accuracy is achieved
• NEW Hammett Equations may be added
Drawbacks
• NEW Hammett Equation should be obtained by user outside pKa AE
• Requires manual work outside pKa AE
• Low performance because of previous reasons
Training with ADME Boxes / ADME Suite
Advantages
• Automated
• Easy to use
• Calculation of RI
Drawbacks
• “Black Box” for User
• No Hammett Equations: correction for whole classes not very evident
Training with pKa AE PRO
Advantages
• NEW Hammett Equations may be added easily
• High accuracy is achieved
• High performance and speed of data utilization
• ALL experimental information is used for training without accidental losses
Drawbacks
• Hard to use by non-expert users
THREE Successful Examples of pKa AE PRO Application
ACD/Labs Internal algorithm rebuilding
• Fast speed: 2 months for all Hammett equations recalculation/improvement
• Introducing multi-substituted Reaction Centers
Work with GSK (2000 compounds, 1st experience)
• 2 months for data preparation
• 1 month for data analysis
Work with UCB Celltech (2000 compounds, 2nd experience)
• 1 month for data preparation
• 1 week for data analysis (on-site visit)
Results of ACD/Labs Internal Algorithm Rebuilding (Internal Dataset)
ClassName                            Skeleton Count       Subst Count
                                     ver. 12   ver. 10    ver. 12   ver. 10
alpha-amino acids and derivatives         50        16        102        22
CH-acids                                  26        21         37        29
all primary amines                        87        61        151       108
secondary amines                          87        26        167        40
tertiary amines                           26        20         28        20
Imidazoles                                12         5         38        15
Piperazines                               12         3         29         5
Piperidines                               14         1         31         1
Pyrazines                                  4         2         11         3
Pyrimidines                               46        11        105        18
Quinolines                                23         5         49        13
6-member heteroaromatics                 194        45        456        97
TOTAL in DB                             1024       591       2009      1050
Results of ACD/Labs Internal Algorithm Rebuilding (Internal Dataset)
                                              N        R2      MAE
BEFORE algorithm rebuilding, version 10     11,972    0.955    0.491
AFTER algorithm rebuilding, version 12      12,095    0.965    0.416
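The slides report R2 and MAE without defining them. The sketch below shows one common way to compute both from paired experimental and predicted pKa values. It assumes R2 is the squared Pearson correlation and MAE is the mean absolute error; the presentation may use a different R2 convention. The data are hypothetical.

```python
def mae(experimental, predicted):
    """Mean absolute error between experimental and predicted values."""
    return sum(abs(e - p) for e, p in zip(experimental, predicted)) / len(experimental)


def r_squared(experimental, predicted):
    """Squared Pearson correlation (an assumed definition of R2)."""
    n = len(experimental)
    mean_e = sum(experimental) / n
    mean_p = sum(predicted) / n
    cov = sum((e - mean_e) * (p - mean_p) for e, p in zip(experimental, predicted))
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_e * var_p)


# Hypothetical values, for illustration only.
exp_pka = [2.0, 4.0, 6.0, 8.0]
pred_pka = [2.5, 3.5, 6.5, 7.5]
print(f"R2 = {r_squared(exp_pka, pred_pka):.3f}, MAE = {mae(exp_pka, pred_pka):.3f}")
```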
Results of ACD/Labs Internal Algorithm Rebuilding (External Dataset)
Data set of 655 pKa values from “In Silico Prediction of Physicochemical Properties (Report for European Commission, 2007)” by John Dearden and Andrew Worth
                                            R2      MAE
BEFORE algorithm rebuilding, version 8     0.68    1.07
AFTER algorithm rebuilding, version 12     0.92    0.54
Results of Expert training with GSK and UCB Celltech
Statistics are presented for the same ~4000 compounds, some of which were used for training file calculations.
Feedback from GSK and UCB was that new compounds were also much improved.
                     R2       MAE
BEFORE Training     ~0.7     ~1.0
AFTER Training      ~0.9     ~0.5
Training with pKa AE PRO Workflow
Automated
• Generation of ALL prospective Reaction Centers (Step #2)
Semi-Automated
• pKa Assignment (Step #1)
Not Automated
• Hammett Equation Calculation (Step #3)
Training with pKa AE PRO
• Step 1: pKa Assignment
  • Auto pKa assignment with ACD/Labs pKa algorithm
  • Auto pKa assignment with PA pKa algorithm
  • Manual checking (visually or by “double” assignment)
• How to make assignment FULLY automatic?
  • It is possible to create a combined auto-assignment procedure with 1 or 2 assignments of the same pKa value.
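One possible reading of the combined auto-assignment idea is sketched below. Each algorithm assigns the experimental value to the ionization center whose predicted pKa is closest. If both algorithms pick the same center, one assignment is kept; if they disagree, the value is assigned to both centers. This interpretation, the center labels and the predicted values are all assumptions for illustration and do not describe the actual ACD/Labs or PA procedures.

```python
def nearest_center(exp_pka: float, predictions: dict[str, float]) -> str:
    """Center whose predicted pKa is closest to the experimental value."""
    return min(predictions, key=lambda center: abs(predictions[center] - exp_pka))


def combined_assignment(exp_pka: float,
                        preds_a: dict[str, float],
                        preds_b: dict[str, float]) -> list[str]:
    """Return 1 center if both algorithms agree, otherwise the 2 candidate centers."""
    a = nearest_center(exp_pka, preds_a)
    b = nearest_center(exp_pka, preds_b)
    return [a] if a == b else [a, b]


# Hypothetical predictions from two algorithms for a molecule with centers N1 and N4.
agree = combined_assignment(8.1, {"N1": 7.9, "N4": 3.1}, {"N1": 8.4, "N4": 2.5})
differ = combined_assignment(5.8, {"N1": 6.0, "N4": 5.0}, {"N1": 4.8, "N4": 5.9})
print(agree, differ)
```

Keeping both candidates on disagreement defers the decision to the later equation analysis instead of a manual check.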
Training with pKa AE PRO, Step 2: Generation of ALL prospective Reaction Centers
Training with pKa AE PRO, Step 3: Analysis, Equations Calculation
Training with pKa AE PRO Challenges for full Automation
Calculation of Accurate Hammett Equation:
• Selection of appropriate Sigma constants
• Removing wrong experimental values and outliers
• Reasonable detailing of Reaction Centers:
[Figure: chemical structures illustrating Reaction Center detailing; atom and substituent labels include NH+, N, NH2, F, R1, R2, R3]
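The outlier-removal challenge above can be illustrated with a minimal sketch. It assumes a single-sigma Hammett equation, pKa = pK0 - rho * sigma, fitted by ordinary least squares, and a fixed residual cutoff (`max_residual`, a hypothetical parameter) for discarding suspect experimental values. Choosing the right sigma constants is not addressed here, and the data are invented.

```python
from statistics import mean


def fit_hammett(sigmas, pkas):
    """Least-squares fit of pKa = pK0 - rho * sigma; returns (pK0, rho)."""
    s_mean, p_mean = mean(sigmas), mean(pkas)
    sxx = sum((s - s_mean) ** 2 for s in sigmas)
    sxy = sum((s - s_mean) * (p - p_mean) for s, p in zip(sigmas, pkas))
    slope = sxy / sxx
    return p_mean - slope * s_mean, -slope


def fit_without_outliers(sigmas, pkas, max_residual=1.0):
    """Fit, drop points deviating by more than max_residual, then refit once."""
    pk0, rho = fit_hammett(sigmas, pkas)
    kept = [(s, p) for s, p in zip(sigmas, pkas)
            if abs(p - (pk0 - rho * s)) <= max_residual]
    removed = len(pkas) - len(kept)
    pk0, rho = fit_hammett([s for s, _ in kept], [p for _, p in kept])
    return pk0, rho, removed


# Hypothetical data: five points on pKa = 5.0 - 2.0 * sigma plus one bad value (6.6).
sigmas = [-0.2, 0.0, 0.2, 0.4, 0.6, 0.8]
pkas = [5.4, 5.0, 6.6, 4.2, 3.8, 3.4]
pk0, rho, removed = fit_without_outliers(sigmas, pkas)
print(f"pK0 = {pk0:.2f}, rho = {rho:.2f}, removed = {removed}")
```

A fixed cutoff is the simplest choice; a single bad value also distorts the first fit, so the threshold has to be loose enough to keep the good points.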
Training with pKa AE PRO Challenges for full Automation
• Additional Problems:
  • Addition of a Hammett Equation when it cannot be calculated by software or is statistically insignificant (Expert may force the program to use the Equation)
  • After successful training of some pKa (micro), when the apparent pKa is calculated, another micro-pKa may appear as the most important one and determine the apparent pKa; additional training is required
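The second problem can be illustrated with a textbook relation for a base with several competing protonation sites: the macroscopic pKa of the first protonation is log10 of the sum of 10^pKa over the micro-constants, so the site with the highest micro-pKa dominates. This is a simplified model, not necessarily how the software derives the apparent pKa; the center names and values are hypothetical.

```python
from math import log10


def macro_pka(micro_pkas: dict[str, float]) -> float:
    """Macroscopic pKa for competing protonation sites: log10(sum(10**pKa_i))."""
    return log10(sum(10.0 ** value for value in micro_pkas.values()))


def dominant_center(micro_pkas: dict[str, float]) -> str:
    """Site with the highest micro-pKa, which contributes most to the macro value."""
    return max(micro_pkas, key=micro_pkas.get)


# Hypothetical micro-pKa values before and after training the "ring N" center.
before = {"ring N": 6.5, "NMe2": 4.0}
after = {"ring N": 3.0, "NMe2": 4.0}

print(dominant_center(before), round(macro_pka(before), 2))
print(dominant_center(after), round(macro_pka(after), 2))
```

Once the trained center drops below the untrained one, the untrained micro-pKa controls the result, which is why a second training step can be needed.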
Training with pKa AE PRO Challenges for full Automation
[Figure: 2-fluoro-N,N-dimethylpyrimidine-4,6-diamine. Panels: "Form mostly depending on apparent pKa BEFORE Training" and "Form mostly depending on apparent pKa AFTER Training". Reaction Center structures are labelled "First Training step" and "Second (additional) Training step".]
Questions
• Is anyone interested in on-site development of pKa user training files?
• Is anyone interested in development of MORE Automated pKa EXPERT user training?
• Your questions…