combining simulation and machine learning to recognize ...combining simulation and machine learning...
TRANSCRIPT
Combining simulation andCombining simulation andmachine learning to recognizemachine learning to recognizefunction in 4Dfunction in 4D
Russ B. AltmanRuss B. Altman, MD, PhD, MD, PhD
Departments of Bioengineering,Departments of Bioengineering,Genetics, Medicine, Computer ScienceGenetics, Medicine, Computer ScienceStanford UniversityStanford University
Biological MotivationsBiological Motivations
Structural genomics and predictionStructural genomics and prediction(structures without functions)(structures without functions)
Need to label putative molecularNeed to label putative molecularfunctions of new structuresfunctions of new structures
Need to discover new molecularNeed to discover new molecularfunctionsfunctions
Increasing examples ofIncreasing examples of polyfunctionalpolyfunctionalproteins.proteins.
OutlineOutline
Introduction to FEATURE &Introduction to FEATURE &WebFEATUREWebFEATURE
UsingUsing molecular molecular simulation tosimulation toimprove FEATUREimprove FEATURE
Protein Function Prediction &Protein Function Prediction &AnnotationAnnotation
FunctionPrediction
MethodProtein
Structures
X-Ray
NMR
StructurePredictions
ATP binding site
Calcium binding site
Radial MicroenvironmentRadial Microenvironment
ASP
ASP
FEATURE TrainingFEATURE Training
SitesSites Non-SitesNon-Sites
SignificanceSignificance
ScoreScore
CorrespondingCorrespondingVolumesVolumes
Histogram ofHistogram ofObserved valuesObserved values
FEATURE vectorsFEATURE vectors
4444 vectors of physical/chemical featuresvectors of physical/chemical features Counted in 6 shellsCounted in 6 shells 44 x 6 = 264 feature-shell combinations44 x 6 = 264 feature-shell combinations
that summarize the abundance of athat summarize the abundance of afeature in a shell (e.g. # of feature in a shell (e.g. # of oxygens oxygens ininshell that is 2-3 Angstromsshell that is 2-3 Angstroms from center)from center)
Vector = 264 continuousVector = 264 continuous or discreteor discretevalues describing environment aroundvalues describing environment aroundsite center.site center.
Example: Calcium binding siteExample: Calcium binding site
FEATUREFEATURECalciumCalciumModelModel
Calcium Model
Property Name
Significant presenceSignificant absence
AtomName
ChemicalGroup
AtomicProperties
ResidueName
Abundance ofASP amino acid
…
Challenge:Challenge:Create a library of modelsCreate a library of modelsavailable to FEATUREavailable to FEATURE
Solution:Solution:Use 1D sequence motifs asUse 1D sequence motifs as““seedsseeds”” for 3D motifs for 3D motifs
Shirley Wu
SeqFEATURESeqFEATURE
Build from Sequence Motif DatabasesBuild from Sequence Motif Databases
AutomaticallyAutomatically creates 3D motifs from 1D creates 3D motifs from 1Dsequence motifssequence motifs
Hypothesis: Hypothesis: 3D motifs perform better3D motifs perform betterthan 1D motifs in identifying functionalthan 1D motifs in identifying functionalsitessites
Extracting 3D motif examplesExtracting 3D motif examplesfrom 1D motiffrom 1D motif
Extracting non-sites forExtracting non-sites forbackground statisticsbackground statistics
Random amino acid with similar atomRandom amino acid with similar atomdensity and outside the site areadensity and outside the site area
SeqFEATURESeqFEATURE
Training Set
Sitespdbid (x y z)pdbid (x y z)…
Non-Sitespdbid (x y z)pdbid (x y z)…
FEATURE training
ProteinStructure
Model
FunctionalSites
Hits(x y z) score(x y z) score…
FEATURE scan
SequenceMotifs
SeqFEATURE
Site Selection
Non-site Selection
[LIVM]AS..T..H.[AD]..[RK]
(seq)FEATURE performs better whensequence and structural similarity are low
PROSITE other
1Z84: Galt-like proteinCutoffz-scoreSiteSeqFEATURE Model
3.0712.795CYS 66ZINC_FINGER_C2H2_1.3.CYS.SG3.1774.713CYS 63ZINC_FINGER_C2H2_1.1.CYS.SG
C2H2 type Zinc fingerSeqFEATURENucleotidyltransferase3-D templates
PredictionMethod
ADP-glucose phosphorylaseSSMNonePfamNonePROSITE
2.7954.713
1Z84: Galt-like protein
2GLI(zinc-finger protein)
1Z84(prediction)
back
Beta-lactamase prediction (left)
Go to other slides