combining simulation and machine learning to recognize ...combining simulation and machine learning...

25
Combining simulation and Combining simulation and machine learning to recognize machine learning to recognize function in 4D function in 4D Russ B. Altman Russ B. Altman , MD, PhD , MD, PhD Departments of Bioengineering, Departments of Bioengineering, Genetics, Medicine, Computer Science Genetics, Medicine, Computer Science Stanford University Stanford University

Upload: others

Post on 12-Jun-2020

10 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Combining simulation and machine learning to recognize ...Combining simulation and machine learning to recognize function in 4D Russ B. Altman, MD, PhD Departments of Bioengineering,

Combining simulation andCombining simulation andmachine learning to recognizemachine learning to recognizefunction in 4Dfunction in 4D

Russ B. AltmanRuss B. Altman, MD, PhD, MD, PhD

Departments of Bioengineering,Departments of Bioengineering,Genetics, Medicine, Computer ScienceGenetics, Medicine, Computer ScienceStanford UniversityStanford University

Page 2: Combining simulation and machine learning to recognize ...Combining simulation and machine learning to recognize function in 4D Russ B. Altman, MD, PhD Departments of Bioengineering,

Biological MotivationsBiological Motivations

Structural genomics and predictionStructural genomics and prediction(structures without functions)(structures without functions)

Need to label putative molecularNeed to label putative molecularfunctions of new structuresfunctions of new structures

Need to discover new molecularNeed to discover new molecularfunctionsfunctions

Increasing examples ofIncreasing examples of polyfunctionalpolyfunctionalproteins.proteins.

Page 3: Combining simulation and machine learning to recognize ...Combining simulation and machine learning to recognize function in 4D Russ B. Altman, MD, PhD Departments of Bioengineering,
Page 4: Combining simulation and machine learning to recognize ...Combining simulation and machine learning to recognize function in 4D Russ B. Altman, MD, PhD Departments of Bioengineering,

OutlineOutline

Introduction to FEATURE &Introduction to FEATURE &WebFEATUREWebFEATURE

UsingUsing molecular molecular simulation tosimulation toimprove FEATUREimprove FEATURE

Page 5: Combining simulation and machine learning to recognize ...Combining simulation and machine learning to recognize function in 4D Russ B. Altman, MD, PhD Departments of Bioengineering,

Protein Function Prediction &Protein Function Prediction &AnnotationAnnotation

FunctionPrediction

MethodProtein

Structures

X-Ray

NMR

StructurePredictions

ATP binding site

Calcium binding site

Page 6: Combining simulation and machine learning to recognize ...Combining simulation and machine learning to recognize function in 4D Russ B. Altman, MD, PhD Departments of Bioengineering,

Radial MicroenvironmentRadial Microenvironment

ASP

ASP

Page 7: Combining simulation and machine learning to recognize ...Combining simulation and machine learning to recognize function in 4D Russ B. Altman, MD, PhD Departments of Bioengineering,
Page 8: Combining simulation and machine learning to recognize ...Combining simulation and machine learning to recognize function in 4D Russ B. Altman, MD, PhD Departments of Bioengineering,

FEATURE TrainingFEATURE Training

SitesSites Non-SitesNon-Sites

SignificanceSignificance

ScoreScore

CorrespondingCorrespondingVolumesVolumes

Histogram ofHistogram ofObserved valuesObserved values

Page 9: Combining simulation and machine learning to recognize ...Combining simulation and machine learning to recognize function in 4D Russ B. Altman, MD, PhD Departments of Bioengineering,

FEATURE vectorsFEATURE vectors

4444 vectors of physical/chemical featuresvectors of physical/chemical features Counted in 6 shellsCounted in 6 shells 44 x 6 = 264 feature-shell combinations44 x 6 = 264 feature-shell combinations

that summarize the abundance of athat summarize the abundance of afeature in a shell (e.g. # of feature in a shell (e.g. # of oxygens oxygens ininshell that is 2-3 Angstromsshell that is 2-3 Angstroms from center)from center)

Vector = 264 continuousVector = 264 continuous or discreteor discretevalues describing environment aroundvalues describing environment aroundsite center.site center.

Page 10: Combining simulation and machine learning to recognize ...Combining simulation and machine learning to recognize function in 4D Russ B. Altman, MD, PhD Departments of Bioengineering,

Example: Calcium binding siteExample: Calcium binding site

Page 11: Combining simulation and machine learning to recognize ...Combining simulation and machine learning to recognize function in 4D Russ B. Altman, MD, PhD Departments of Bioengineering,

FEATUREFEATURECalciumCalciumModelModel

Calcium Model

Property Name

Significant presenceSignificant absence

AtomName

ChemicalGroup

AtomicProperties

ResidueName

Abundance ofASP amino acid

Page 12: Combining simulation and machine learning to recognize ...Combining simulation and machine learning to recognize function in 4D Russ B. Altman, MD, PhD Departments of Bioengineering,
Page 13: Combining simulation and machine learning to recognize ...Combining simulation and machine learning to recognize function in 4D Russ B. Altman, MD, PhD Departments of Bioengineering,
Page 14: Combining simulation and machine learning to recognize ...Combining simulation and machine learning to recognize function in 4D Russ B. Altman, MD, PhD Departments of Bioengineering,
Page 15: Combining simulation and machine learning to recognize ...Combining simulation and machine learning to recognize function in 4D Russ B. Altman, MD, PhD Departments of Bioengineering,

Challenge:Challenge:Create a library of modelsCreate a library of modelsavailable to FEATUREavailable to FEATURE

Solution:Solution:Use 1D sequence motifs asUse 1D sequence motifs as““seedsseeds”” for 3D motifs for 3D motifs

Shirley Wu

Page 16: Combining simulation and machine learning to recognize ...Combining simulation and machine learning to recognize function in 4D Russ B. Altman, MD, PhD Departments of Bioengineering,

SeqFEATURESeqFEATURE

Build from Sequence Motif DatabasesBuild from Sequence Motif Databases

AutomaticallyAutomatically creates 3D motifs from 1D creates 3D motifs from 1Dsequence motifssequence motifs

Hypothesis: Hypothesis: 3D motifs perform better3D motifs perform betterthan 1D motifs in identifying functionalthan 1D motifs in identifying functionalsitessites

Page 17: Combining simulation and machine learning to recognize ...Combining simulation and machine learning to recognize function in 4D Russ B. Altman, MD, PhD Departments of Bioengineering,

Extracting 3D motif examplesExtracting 3D motif examplesfrom 1D motiffrom 1D motif

Page 18: Combining simulation and machine learning to recognize ...Combining simulation and machine learning to recognize function in 4D Russ B. Altman, MD, PhD Departments of Bioengineering,

Extracting non-sites forExtracting non-sites forbackground statisticsbackground statistics

Random amino acid with similar atomRandom amino acid with similar atomdensity and outside the site areadensity and outside the site area

Page 19: Combining simulation and machine learning to recognize ...Combining simulation and machine learning to recognize function in 4D Russ B. Altman, MD, PhD Departments of Bioengineering,

SeqFEATURESeqFEATURE

Training Set

Sitespdbid (x y z)pdbid (x y z)…

Non-Sitespdbid (x y z)pdbid (x y z)…

FEATURE training

ProteinStructure

Model

FunctionalSites

Hits(x y z) score(x y z) score…

FEATURE scan

SequenceMotifs

SeqFEATURE

Site Selection

Non-site Selection

[LIVM]AS..T..H.[AD]..[RK]

Page 20: Combining simulation and machine learning to recognize ...Combining simulation and machine learning to recognize function in 4D Russ B. Altman, MD, PhD Departments of Bioengineering,
Page 21: Combining simulation and machine learning to recognize ...Combining simulation and machine learning to recognize function in 4D Russ B. Altman, MD, PhD Departments of Bioengineering,

(seq)FEATURE performs better whensequence and structural similarity are low

PROSITE other

Page 22: Combining simulation and machine learning to recognize ...Combining simulation and machine learning to recognize function in 4D Russ B. Altman, MD, PhD Departments of Bioengineering,

1Z84: Galt-like proteinCutoffz-scoreSiteSeqFEATURE Model

3.0712.795CYS 66ZINC_FINGER_C2H2_1.3.CYS.SG3.1774.713CYS 63ZINC_FINGER_C2H2_1.1.CYS.SG

C2H2 type Zinc fingerSeqFEATURENucleotidyltransferase3-D templates

PredictionMethod

ADP-glucose phosphorylaseSSMNonePfamNonePROSITE

2.7954.713

Page 23: Combining simulation and machine learning to recognize ...Combining simulation and machine learning to recognize function in 4D Russ B. Altman, MD, PhD Departments of Bioengineering,

1Z84: Galt-like protein

2GLI(zinc-finger protein)

1Z84(prediction)

back

Page 24: Combining simulation and machine learning to recognize ...Combining simulation and machine learning to recognize function in 4D Russ B. Altman, MD, PhD Departments of Bioengineering,

Beta-lactamase prediction (left)

Page 25: Combining simulation and machine learning to recognize ...Combining simulation and machine learning to recognize function in 4D Russ B. Altman, MD, PhD Departments of Bioengineering,

Go to other slides