How can we get better at this?
Pat Walters – D3R Workshop, February 23, 2018
I’ve Done This a Few Times
2012 2013 2014 2015 2016 2017 2018
How I Spend My Time On Challenges
Confidential | © 2017 Relay Therapeutics
Dealing with poorly formatted submissions
Validating evaluations
Making Slides
The Evaluation Process
Pat Evaluate
Connor and Zied Evaluate
Final Comparisons
The Literature Makes It Look Like Activity Prediction Is a Solved Problem
[Figure: Pearson r values of 0.82, 0.80, 0.66 and 0.65]
Scoring Performance From GC2 and GC3
Guidelines For Reviewing “Scoring Function” Papers
We need to agree on
• What constitutes a reasonable dataset
• How data should be reported
• Evaluation metrics
• Statistics for comparison
• What constitutes a null model
• Format of supporting material
• Criteria for reproducibility
When evaluating a regression model, the dataset should have a dynamic range similar to those observed in drug discovery projects (typically 4-6 logs).
Datasets Should Span a Reasonable Dynamic Range
This dataset (PDBbind v.2016 core set) spans 10 logs and doesn’t provide an appropriate representation of correlation.
Correlations Can Change Dramatically With Dynamic Range
R² = 0.22, MAE = 0.69
R² = 0.76, MAE = 0.55
This is the same dataset. On the left we consider the entire set, which has an unrealistically large (~10 log) dynamic range. On the right we consider a more realistic subset with a 3 log dynamic range. Note the change in correlation.
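The effect of dynamic range on correlation can be illustrated with a small simulation. This is only a sketch on synthetic data, not the dataset from the slide: the value ranges, the 1 log prediction error, and the helper name `r2_and_mae` are all assumptions made for illustration.

```python
import numpy as np

def r2_and_mae(x, y):
    """Return the squared Pearson correlation and the mean absolute error."""
    r = np.corrcoef(x, y)[0, 1]
    return r ** 2, np.mean(np.abs(x - y))

rng = np.random.default_rng(0)

# Hypothetical experimental values spread uniformly over ~10 log units.
experimental = rng.uniform(2.0, 12.0, size=2000)
# Hypothetical predictions with the same error (sd = 1 log) for every compound.
predicted = experimental + rng.normal(0.0, 1.0, size=experimental.size)

# Restrict to a 3 log window, closer to what a single project might see.
mask = (experimental >= 6.0) & (experimental <= 9.0)

r2_full, mae_full = r2_and_mae(experimental, predicted)
r2_sub, mae_sub = r2_and_mae(experimental[mask], predicted[mask])

print(f"full range: R2={r2_full:.2f} MAE={mae_full:.2f}")
print(f"3 log subset: R2={r2_sub:.2f} MAE={mae_sub:.2f}")
```

In this toy setup the prediction error is identical in both cases, so the MAE barely moves, while R² depends strongly on how wide the experimental range is.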
GC3 CatS Dataset Spans a Realistic Dynamic Range
Don’t Cram Multiple Datasets onto the Same Plot
http://pubs.acs.org/doi/abs/10.1021/acs.jpcb.7b07224
http://pubs.acs.org/doi/abs/10.1021/ja512751q
Even My Friends Are Guilty
Mill and Neysa (Yesterday)
Trellising provides a much more effective means of comparing datasets
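One way to trellis is to give each dataset its own panel with shared axes. The sketch below assumes matplotlib is available and uses made-up datasets ("Set A", "Set B", "Set C" are hypothetical names); it is meant only to show the layout, not any plot from the talk.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical datasets: name -> (experimental, predicted)
datasets = {}
for name in ["Set A", "Set B", "Set C"]:
    exp = rng.uniform(5.0, 9.0, size=40)
    datasets[name] = (exp, exp + rng.normal(0.0, 0.6, size=exp.size))

# One panel per dataset; shared axes keep the panels directly comparable.
fig, axes = plt.subplots(1, len(datasets), sharex=True, sharey=True, figsize=(9, 3))
for ax, (name, (exp, pred)) in zip(axes, datasets.items()):
    ax.scatter(exp, pred, s=12)
    ax.plot([5, 9], [5, 9], linestyle="--", color="gray")  # identity line
    ax.set_title(name)
    ax.set_xlabel("Experimental")
axes[0].set_ylabel("Predicted")
fig.tight_layout()
```

Calling `fig.savefig("trellis.png")` afterwards would write the figure to disk.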
Report Pearson, Spearman and Kendall correlations
Favor R² over R when reporting a Pearson correlation coefficient
Report MAE and/or RMSE
Always report correlations appropriately
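A report covering all of these metrics takes only a few lines. The sketch below assumes NumPy and SciPy are installed; the function name `regression_report` and the example values are hypothetical.

```python
import numpy as np
from scipy import stats

def regression_report(experimental, predicted):
    """Collect the correlation and error metrics for one set of predictions."""
    experimental = np.asarray(experimental, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    pearson_r, _ = stats.pearsonr(experimental, predicted)
    spearman_rho, _ = stats.spearmanr(experimental, predicted)
    kendall_tau, _ = stats.kendalltau(experimental, predicted)
    error = predicted - experimental
    return {
        "pearson_r2": float(pearson_r ** 2),  # report R squared rather than R
        "spearman_rho": float(spearman_rho),
        "kendall_tau": float(kendall_tau),
        "mae": float(np.mean(np.abs(error))),
        "rmse": float(np.sqrt(np.mean(error ** 2))),
    }

# Hypothetical experimental and predicted values (log units).
report = regression_report([5.0, 6.0, 7.0, 8.0], [5.5, 5.9, 7.4, 7.8])
for name, value in report.items():
    print(f"{name}: {value:.3f}")
```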
I have no idea what this means
http://pubs.acs.org/doi/abs/10.1021/acs.jpcb.7b07224
Maximum Achievable Correlation
Start with experimental data
Add Gaussian error
§ Mean = 0.0
§ Standard deviation = 0.3 log
Calculate correlation
Repeat 1000 times
Brown, Scott P., Steven W. Muchmore, and Philip J. Hajduk. “Healthy skepticism: assessing realistic model performance.” Drug Discovery Today 14.7 (2009): 420-427.
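The procedure above can be sketched as follows. This is one reading of the steps, in which the noisy copy is correlated against the original experimental values; the function name, the seed handling, and the example pIC50 values are assumptions, and the published method may differ in detail.

```python
import numpy as np

def max_achievable_correlation(experimental, error_sd=0.3, n_trials=1000, seed=0):
    """Pearson r between the data and noisy copies of itself, one r per trial."""
    rng = np.random.default_rng(seed)
    experimental = np.asarray(experimental, dtype=float)
    r_values = np.empty(n_trials)
    for i in range(n_trials):
        # Add Gaussian error with mean 0.0 and the given standard deviation.
        noisy = experimental + rng.normal(0.0, error_sd, size=experimental.size)
        r_values[i] = np.corrcoef(experimental, noisy)[0, 1]
    return r_values

# Hypothetical pIC50 values spanning 3 log units.
pic50 = np.random.default_rng(42).uniform(5.0, 8.0, size=50)
r_values = max_achievable_correlation(pic50)
low, high = np.percentile(r_values, [2.5, 97.5])
print(f"mean r = {r_values.mean():.2f}, 95% of trials in [{low:.2f}, {high:.2f}]")
```

The mean of `r_values` gives a rough ceiling: a model reporting a higher correlation than this on the same data is probably fitting experimental noise.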
Maximum Achievable Correlation - HSP90 D3R1
Open Source Evaluation Code (More to Come)
https://github.com/PatWalters/metk
Ensure That Differences in Correlation Are Significant
In particular, both MM-PB/SA and MM-GB/SA produced better results by using a representative structure (R = 0.72-0.79) rather than averaging over the conformational ensemble of each given complex (R = 0.61-0.74)
[Figure: abs(Pearson r), axis 0.0 to 0.7, for M1_dynamic, M1_static, M2_static, M3_dynamic, M3_static, M4_dynamic and M4_static (Table L2)]
A literature comparison of 7 methods for scoring protein-ligand interactions
Remember that correlations have confidence intervals and report these intervals
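A common way to attach a confidence interval to a Pearson correlation is the Fisher z-transformation. The sketch below is a generic illustration, not necessarily how the intervals on these slides were computed; the sample size of 30 and the two r values are hypothetical.

```python
import math

def pearson_confidence_interval(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval for Pearson r from n data points."""
    if n <= 3:
        raise ValueError("need more than 3 data points")
    z = math.atanh(r)               # Fisher z-transform of r
    se = 1.0 / math.sqrt(n - 3)     # standard error in z space
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# Two hypothetical methods evaluated on the same 30 compounds.
low_a, high_a = pearson_confidence_interval(0.79, 30)
low_b, high_b = pearson_confidence_interval(0.61, 30)
overlap = low_a <= high_b and low_b <= high_a

print(f"r=0.79: [{low_a:.2f}, {high_a:.2f}]")
print(f"r=0.61: [{low_b:.2f}, {high_b:.2f}]")
print("intervals overlap:", overlap)
```

With so few compounds the two intervals overlap heavily, so the apparent difference between the methods should not be over-interpreted. Overlapping intervals are only a rough guide; a formal test for comparing correlations would be needed to claim significance.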
It’s All the Same!
[Figure: abs(Pearson r), axis 0.0 to 0.8, for M1_dynamic, M1_static, M2_static, M3_dynamic, M3_static, M4_dynamic and M4_static (Table L2)]
Molecular weight and calculated LogP are poor null models
Generate RDKit fingerprints for ligands
Train on PDBbind refined set (n=4057)
Test on PDBbind core set (n=290)
Wall clock time < 5 min
Simple QSAR as a Null Model
What Constitutes an Appropriate Null Model
[Figure panels: Molecular Weight, XLogP, Simple QSAR]
A Null Model for RMSE
1. Sample N observed values
2. Calculate RMS
3. Repeat steps 1 and 2 1000 times
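The steps above leave some details open. The sketch below takes one plausible reading: the "predictions" are N values drawn at random (with replacement) from the observed values, and the RMS error is computed against the observed values in their original order. The function name and the example free energies are hypothetical.

```python
import numpy as np

def null_rmse_distribution(observed, n_trials=1000, seed=0):
    """RMSE obtained when predictions are random draws from the observed values."""
    rng = np.random.default_rng(seed)
    observed = np.asarray(observed, dtype=float)
    rmse = np.empty(n_trials)
    for i in range(n_trials):
        # Step 1: sample N observed values to act as random "predictions".
        sampled = rng.choice(observed, size=observed.size, replace=True)
        # Step 2: RMS error of those draws against the observed values.
        rmse[i] = np.sqrt(np.mean((sampled - observed) ** 2))
    return rmse

# Hypothetical binding free energies in kcal/mol.
observed = np.array([-9.1, -8.4, -10.2, -7.9, -8.8, -9.6, -11.0, -8.1])
null = null_rmse_distribution(observed)
print(f"null model median RMSE = {np.median(null):.2f} kcal/mol")
```

A method whose RMSE is not clearly below the bulk of this distribution is doing no better than guessing from the spread of the data.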
Null Model for GC1 HSP90 Free Energy Challenge
[Figure: y-axis shows RMSE (kcal/mol)]
Comparing RMS vs Null for GC1 HSP90 Challenge
Dashed line indicates the null model
Always provide a machine readable table (e.g. csv) of predicted and experimental values
A table in a paper is not sufficient; it is often very difficult to extract tables from pdf files
Chemical structures should be included as SDF or, where appropriate, SMILES to facilitate comparison with other methods
Need to enable readers to evaluate correlations and errors
Include appropriate supporting information
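Producing such a table takes very little code. The sketch below uses only the Python standard library; the column names and the two example records are hypothetical, and the table is written to an in-memory buffer where a real submission would write a file.

```python
import csv
import io

# Hypothetical results: one row per compound, structure included as SMILES.
records = [
    {"name": "cmpd_1", "smiles": "CCO", "experimental": 6.2, "predicted": 5.9},
    {"name": "cmpd_2", "smiles": "c1ccccc1", "experimental": 7.1, "predicted": 7.4},
]

def write_predictions(rows, handle):
    """Write predicted and experimental values as a CSV table with a header."""
    writer = csv.DictWriter(
        handle, fieldnames=["name", "smiles", "experimental", "predicted"]
    )
    writer.writeheader()
    writer.writerows(rows)

buffer = io.StringIO()
write_predictions(records, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Passing a file opened with `open("predictions.csv", "w", newline="")` in place of the buffer would give readers a table they can load directly.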
Can I Reproduce Your Method?
Code!!!
A thorough description of your method
A web implementation
None of the above
What Constitutes Reproducibility?
How Can You Help?
Docking Challenges Have Become More Challenging
Are we spending enough time understanding compounds that docked poorly?
• Insufficient conformational sampling
• Insufficient pose sampling
• Inadequate scoring
• Ligand poses with limited density
Is everyone missing the same compounds?
Can groups work together to improve their methods?
Questions on Docking Challenges
Acknowledgements
D3R Participants
CSAR Participants
TDT Participants
SAMPL Participants
Rommie Amaro, Mike Gilson
Mill Lambert, Neysa Nevins
Connor Parks, Zied Gaieb
Shuai Liu
BACKUP
Looks Like Activity Prediction Is a Solved Problem
[Figure: Pearson r values of 0.82, 0.80, 0.66 and 0.65]
What Constitutes an Appropriate Null Model
[Figure panels: Molecular Weight, XLogP, Simple QSAR]
Evaluate maximum possible correlation for a dataset given experimental error
https://www.sciencedirect.com/science/article/pii/S1359644609000403
Start with experimental data
Add Gaussian error
• Mean = 0.0
• Standard deviation = 0.3 log
Calculate correlation
Repeat 1000 times
Maximum Achievable Correlation