reconstructing biological networks from data: cMonkey & Inferelator
Richard Bonneau
http://www.cs.nyu.edu/~bonneau/
New York University,
Dept. of Biology &
Computer Science Dept.
Wednesday, June 24, 2009
Close agreement between the orientation dependence of hydrogen bonds observed in protein structures and quantum mechanical calculations. Morozov, Kortemme, Baker, PNAS, 2003
Oligo(N-aryl glycines): A New Twist on Structured Peptoids, Shah, Butterfoss, Bonneau, Kirshenbaum, 2008, JACS
justification of functional-from-principles, parameters-from-data by example
[Figure: (Bi)clustering, panels A-D. Labels include imd1, TR(Hrg), asnc, VNG1845C, oxygen, and arsr.]
overview
Data -> Dynamical network model -> Prediction
1. co-regulated modules (integrate data types).
2. Learn topology and dynamics with greedy / local approx. (Inferelator 1.0, 1.1)
3. improving performance over multiple time-scales (Inferelator 2.x)
Main results:
- Surprising predictive performance for prokaryotic networks, T-cell and macrophage differentiation, EE Networks
- Longer time scale stability
- model flexibility
DNA RNA
microarrays
cDNA, ESTs
libraries of functional RNA
phenotype
Automated microscopy, etc.
Gene sequencing, whole genome assembly
ChIP-chip, TF-DNA binding experiments
protein sequence databases,
Protein structures,
Proteomics
Protein-protein interactions
protein
http://www.molbio.uoregon.edu/images/research/spragueg1.jpg
Metabolomics
Mass spectroscopy, NMR, chromatography
Genotype & sequencing
Measuring affinities / binding
Measuring Levels
Assaying functional outcome
algorithms: David J. Reiss (cMonkey), Vesteinn Thorsson (Inferelator), Richard Bonneau
functional genomics: Marc T. Facciotti, Amy Schmid, Kenia Whitehead, Min Pan, Amardeep Kaur, Leroy Hood, Nitin S. Baliga
An example: Halobacterium
why Halobacterium:
• if your friends are working on halo ... (Hood, Baliga)
• not a “model” system (originally)
• high IQ
• diverse environment
• small genome
• good genetics, cultivable, etc.
• a very tough extremophile, bioengineering
Data collection and modeling effort
✴ genome and genome annotation
✴ microarrays
✴ genetic and environmental perturbations
✴ proteomics
✴ ChIP-chip
✴ some protein-protein
[Figure: biclusters and regulators grouped by environmental factor: Oxygen, Light, Iron, Metals, Radiation. Color scale: -2.0 to 2.0. Bicluster labels include ribosome, DNA repair, glycerol kinase, LPS, and oxphos; regulator labels include VNG6288C, VNG1405C, asnC, gvpE2, tbpC, kaiC, cspd1, arsR, tbpE, VNG0751C, tfbF, trh3, VNG2020C, tfbG, bat, VNG2641H, cspd2, VNG2614H, and trh7.]
Halobacterium dataset including
• >800 microarrays
• time series
• knock outs
• ChIP-chip experiments
• proteomics
• phenotype
among the most complete prokaryotic datasets
M. Facciotti, N. Baliga
Min Pan, Kenia Whitehead, Amy Schmid
overview:
cMonkey: integrative biclustering
Inputs: Expression, Networks, Upstream
Output: Biclusters, motifs, subnetworks
Inferelator: inference of dynamic regulatory networks
Output: Regulatory network model
[Figure: inferred regulatory network. Regulators (including cspd2, VNG0040C, rhl, VNG0320H, kaic, trh7, tbpe, tbpd, tbpc, phou, VNG0293H, VNG2641H, gvpe2, VNG1029C, snp, VNG0194H, idr2, pai1, VNG2614H, tfbg, boa3) connect to numbered biclusters, with AND nodes joining pairs of regulators.]
II. The Inferelator: regulatory network inference
Biological motivation: Learn regulatory interactions from data that are predictive of equilibrium and dynamical systems behavior
Challenges:
• Interactions between transcription factors
• Number of interactions much greater than number of observations
• Heterogeneous data (e.g. equilibrium and kinetic measurements)
• Indirect effects and noise (causal symmetry between activators and targets)
• Resultant models are a complex low-level abstraction of the system's behavior
experimental design
$$\tau \frac{dy}{dt} = -y + g(\beta \cdot Z) \quad (1)$$
$$\beta \cdot Z = \beta_1 x_1 + \beta_2 x_2 + \beta_3 \min(x_1, x_2) \quad (2)$$
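Equations (1) and (2) can be made concrete with a small simulation. This is a minimal sketch, not the Inferelator code: the activation `g` is assumed to be a linear function clipped to [0, 1] (the slide does not specify it), the predictor levels are held fixed, and all parameter values are made up for illustration.

```python
def g(u, lo=0.0, hi=1.0):
    """Assumed activation: identity clipped to [lo, hi] (not given on the slide)."""
    return max(lo, min(hi, u))

def design(x1, x2):
    """Predictor vector Z = (x1, x2, min(x1, x2)) from equation (2)."""
    return (x1, x2, min(x1, x2))

def simulate(y0, x1, x2, beta, tau, dt, steps):
    """Forward-Euler integration of tau * dy/dt = -y + g(beta . Z), with x1, x2 held fixed."""
    z = design(x1, x2)
    drive = g(sum(b * zi for b, zi in zip(beta, z)))
    y = y0
    path = [y]
    for _ in range(steps):
        y += dt / tau * (drive - y)
        path.append(y)
    return path

# Hypothetical values: two regulators, one interaction term.
path = simulate(y0=0.0, x1=0.8, x2=0.3, beta=(0.5, 0.2, 1.0),
                tau=10.0, dt=0.1, steps=2000)
print(round(path[-1], 4))  # relaxes toward g(beta . Z)
```

With constant predictors, y relaxes toward the steady state y = g(β · Z) on the time scale τ, which is why the same model can be fit to both equilibrium and time-series measurements.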
[Figure: two experimental designs compared: Optimal (?) :-) vs. the usual :-(]
1. know your model/framework:
2. Time series: sampling with correct rate(s)!
3. sampling in correct regime.
4. Multifactorial on a budget:
Inferelator v 1.0
steady-state kinetic
Bonneau, Reiss, Baliga, Thorsson, 2006
[Figure: predictors z, e.g. z = x1 or z = min(x1, x2), pass through g( ) to give y.]
steady state
time series/ kinetic
core assumption
steady-state
kinetic
Bonneau, Baliga, Thorsson, 2006
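Representing Interactions:
$$y_j = \beta_1 x_{1j} + \beta_2 x_{2j} + \beta_3 \min(x_{1j}, x_{2j}) + 1$$
$$y_j = \beta_1 x_1 \otimes \beta_2 x_2 \otimes (\beta_1 x_1 \oplus \beta_2 x_2)$$
$$(\otimes, \oplus): \quad x \oplus y := \min\{x, y\}, \qquad x \otimes y := x + y$$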
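The interaction encoding can be sketched in a few lines. This is an illustration under stated assumptions: the function names (`oplus`, `otimes`, `interaction_terms`, `predict`, `tropical_form`) are hypothetical, and the constant term of the first formula is left out.

```python
def oplus(x, y):
    """Min-plus 'addition': x (+) y := min{x, y}."""
    return min(x, y)

def otimes(x, y):
    """Min-plus 'multiplication': x (x) y := x + y."""
    return x + y

def interaction_terms(x1, x2):
    """Design row for two regulators: both single terms plus the min term."""
    return [x1, x2, oplus(x1, x2)]

def predict(beta, x1, x2):
    """Linear model over the design row (constant term omitted in this sketch)."""
    return sum(b * z for b, z in zip(beta, interaction_terms(x1, x2)))

def tropical_form(a, b):
    """a (x) b (x) (a (+) b), which expands to a + b + min(a, b)."""
    return otimes(otimes(a, b), oplus(a, b))

# With weight only on the min term, the response tracks the lower regulator,
# which behaves like an AND between the two regulators.
print(predict((0.0, 0.0, 1.0), 0.9, 0.1))
```

Because min(x1, x2) enters as just another column, the interaction can be selected or dropped by the same shrinkage procedure used for single regulators.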
Model selection using L1-shrinkage: avoiding overfitting
$$(\hat{\alpha}, \hat{\beta}) = \arg\min \left\{ \sum_{i=1}^{N} \left( y_i - \alpha - \sum_{j=1}^{p} \beta_j z_{ij} \right)^2 \right\}$$
subject to (model size)
$$\sum_{j=1}^{p} |\hat{\beta}_j| \le t \sum_{j=1}^{p} |\beta^{ols}_j|$$
why L1? beta -> 0, LARS
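The effect of L1 shrinkage ("beta -> 0") can be seen on a toy problem. This sketch swaps in plain coordinate descent on the penalized form of the problem (0.5 × squared error + λ Σ|β_j|) rather than LARS with the bound t shown on the slide; the two forms describe the same family of solutions, but the function names, the data, and the value of `lam` are illustrative assumptions.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; return exactly 0.0 inside [-lam, lam]."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_cd(Z, y, lam, n_iter=100):
    """Coordinate descent for 0.5 * sum_i (y_i - alpha - sum_j beta_j z_ij)^2 + lam * sum_j |beta_j|."""
    n, p = len(Z), len(Z[0])
    beta = [0.0] * p
    alpha = sum(y) / n
    for _ in range(n_iter):
        for j in range(p):
            rho, norm = 0.0, 0.0
            for i in range(n):
                # Prediction without predictor j, then its partial residual.
                partial = alpha + sum(beta[k] * Z[i][k] for k in range(p) if k != j)
                rho += Z[i][j] * (y[i] - partial)
                norm += Z[i][j] ** 2
            beta[j] = soft_threshold(rho, lam) / norm if norm > 0 else 0.0
        alpha = sum(y[i] - sum(beta[k] * Z[i][k] for k in range(p)) for i in range(n)) / n
    return alpha, beta

# Toy data: y = 3 + 2*z0 + 0.1*z1, with centered, orthogonal columns.
Z = [[-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0], [1.0, 1.0]]
y = [0.9, 4.9, 1.1, 5.1]
alpha, beta = lasso_cd(Z, y, lam=1.0)
print(alpha, beta)  # the weak predictor z1 is set exactly to zero
```

The strong predictor is shrunk (from its least-squares value of 2) and the weak one is removed entirely, which is the model-size control the slide refers to.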
Predictive power over 130 new experiments
[Figure, panels A-F. Scatter plots over 300 biclusters: training error vs. new data error (RMS error), and training correlation vs. new data correlation. Histograms of frequency vs. RMSD (%var): RMSD over training, mean = 0.369; RMSD (new conditions), mean = 0.375. Histograms of frequency vs. correlation of true vs. predicted: correlation over training, mean = 0.788; correlation (new conditions), mean = 0.807.]
77. Amino acid uptake
123. Cell motility
150. Ribosome
205. Phosphate uptake
209. Cation / Zn transport
214. Fe transport
217. Fe-S clusters, Heavy metal transport
244. Bop, DMSO respiration
251. DNA repair, nucleotide metabolism
258. Phosphate consumption
273. Pyrimidine biosynthesis
69. K transport
[Figure: inferred regulatory network linking regulators (trh3, trh4, trh5, trh7, tbpd, cspd1, phou, kaic, rhl, imd1, bat, idr2, asnc, VNG0040C, VNG2163H, VNG0293H, VNG6288C, VNG0156C) to numbered biclusters through AND nodes. Bicluster function legend: Fe transport, heme-aerotaxis; DNA repair and mixed nucleotide metabolism; Potassium transport; Pyrimidine biosynthesis; Phototrophy and DMSO metabolism; Cell motility; Unknown / Mixed; Phosphate uptake; Amino acid uptake; Cobalamin biosynthesis; Phosphate consumption; Cation / Zinc transport; Ribosome; Fe-S clusters, Heavy metal transport, molybdenum cofactor biosynthesis.]
inferred trh controlled subnetwork:
Homeostasis is an emergent property of the global network
Bonneau, et al., Genome Biology, 2006; Bonneau, et al., Cell, 2007
edges validated since by ChIP-chip (M. Facciotti, N. Baliga)
Inferelator 1: Limitations
Finite difference approximation is poor
[Figure: time axis with points tk, tk+1, tk+2]
Predictor levels are const.
Error over long time intervals
Predict: error propagation
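Why a finite difference degrades over long intervals can be shown numerically. The sketch below assumes the relaxation model τ dy/dt = −y + target with the target held constant over the interval (the "predictor levels are const." assumption) and compares a single forward-difference step with the exact solution; τ = 10 and the interval lengths are arbitrary illustrative values.

```python
import math

def exact_step(y, target, tau, dt):
    """Exact solution of tau * dy/dt = -y + target after time dt (target constant)."""
    return target + (y - target) * math.exp(-dt / tau)

def euler_step(y, target, tau, dt):
    """Single forward finite-difference step over the whole interval dt."""
    return y + dt / tau * (target - y)

def one_step_error(dt, y=0.0, target=1.0, tau=10.0):
    """Absolute gap between the finite-difference step and the exact solution."""
    return abs(euler_step(y, target, tau, dt) - exact_step(y, target, tau, dt))

errors = {dt: one_step_error(dt) for dt in (1.0, 5.0, 20.0, 60.0)}
for dt, err in errors.items():
    print(f"dt = {dt:5.1f}  error = {err:.4f}")
```

The error is small while the interval is short relative to τ and grows quickly once it is not; for intervals several times τ the single step overshoots the steady state entirely.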
Explicit global solutions using Metropolis-Hastings:
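Calculate gradient of the Energy with respect to β
$$E(\beta) = \frac{1}{2} \sum_{k=1}^{K} \sum_{i=1}^{M} \left( x_i(t_k) - x_i^{obs}(t_k) \right)^2 \quad (1)$$
$$\frac{\partial E(\beta)}{\partial \beta_j} = \sum_{k=1}^{K} \sum_{i=1}^{M} \left( x_i(t_k) - x_i^{obs}(t_k) \right) \frac{\partial x_i(t_k)}{\partial \beta_j} \quad (2)$$
$$\beta_j^{n+1} = \beta_j^n - h \frac{\partial E(\beta^n)}{\partial \beta_j} + \sqrt{2 \sigma h}\, \zeta_j^n \quad (3)$$
Annotations on (3): slope (the gradient term), step (h), temperature (σ), ζ ~ N(0,1)
Aviv Madar on Right
Work done with Eric Vanden-Eijnden, Courant Institute of Mathematical Sciences, NYU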
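The parameter update in equation (3) can be sketched as a gradient step plus Gaussian noise. Several things here are assumptions: the noise scale is read as sqrt(2σh), a toy quadratic energy stands in for the model-versus-data energy of equation (1), and the Metropolis-Hastings accept/reject step named in the slide title is omitted, so this is an unadjusted Langevin-type chain rather than the method used in the talk.

```python
import random

def energy_grad(beta, target):
    """Gradient of the toy energy E = 0.5 * sum_j (beta_j - target_j)^2."""
    return [b - t for b, t in zip(beta, target)]

def langevin_step(beta, grad, h, sigma, rng):
    """beta_j - h * dE/dbeta_j + sqrt(2 * sigma * h) * zeta_j, with zeta_j ~ N(0, 1)."""
    scale = (2.0 * sigma * h) ** 0.5
    return [b - h * g + scale * rng.gauss(0.0, 1.0) for b, g in zip(beta, grad)]

def run_chain(beta0, target, h, sigma, n_steps, seed=0):
    """Iterate the update from beta0; sigma = 0 reduces it to plain gradient descent."""
    rng = random.Random(seed)
    beta = list(beta0)
    for _ in range(n_steps):
        beta = langevin_step(beta, energy_grad(beta, target), h, sigma, rng)
    return beta

cold = run_chain([0.0, 0.0], target=[1.0, -2.0], h=0.1, sigma=0.0, n_steps=500)
warm = run_chain([0.0, 0.0], target=[1.0, -2.0], h=0.1, sigma=0.05, n_steps=500)
print(cold, warm)
```

At zero temperature the chain settles at the energy minimum; at positive temperature it keeps fluctuating around it, which is what lets the sampler explore parameter sets of similar energy.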
Inferelator 2: Concepts
Finite difference approximation is improved
[Figure: time axis with points tk, tk+1]
Predictor levels are not const.
How do we estimate parameters?
Inferelator 2: Mathematical Overview
Minimize Energy (scoring/objective function):
• Error over time series data
• Error over steady state data
• L2 norm constraint/regularizer
Markov Chain Monte Carlo (MCMC) scheme to sample parameters:
• Markov chain
• Importance sampling
• Gaussian noise term
Inferelator 2: Performance
[Figure: Length of Time Interval vs. Relative Error. x-axis: length of time interval, minutes (0 to 60); y-axis: relative error (0.4 to 1.6); series: train set, test set.]
[Figure: inferred regulatory network of regulators, AND nodes, and numbered biclusters.]
II. The Inferelator: Future Directions
Mixed time scales / mixed data-types:
• Learn regulatory interactions from sub-optimal datasets
• Mixed signaling & regulatory nets
• Adding metabolic effects
Inferelator 2, more explicit dynamics:
• New proposal distributions
• New functional forms for interactions
• Testing in a wider variety of systems
Stochastic Bayes / SDE approach:
• Estimate or measure system convergence as well as mean, model error, multiple system paths
Acknowledgments
Bonneau lab: Glenn Butterfoss, Kevin Drew, Aviv Madar, Peter Waltman, Thadeous Kacmarczyk, Shailla Musharof, Devorah Kengmana, Chris Poultny (Shasha), Irina Nudelman, Alex Pearlman (Ostrer), Alex Pine
NYU: Eric Vanden-Eijnden, Harry Ostrer, Mike Purugganan, Patrick Eichenberger, Dennis Shasha
Tacitus: Howard Coale
IBM: Robin Wilner, Bill Boverman, Viktors Berstis, Rick Alther
ETH Zurich: Reudi Aebersold, Lars Malmstroem
Mike Boxem, Marc Vidal
Dave Goodlett
Jochen Supper (Zell Lab)
ISB: Nitin Baliga (& lab), Leroy Hood, Marc Facciotti, David Reiss, Vesteinn Thorsson, Paul Shannon, Iliana Avila-Campillo (MERC)
Alan Aderem
DOD-computing and society, NSF ABI, NSF Plant genome, NSF DBI, DOE GTL
Rosetta Commons: Charlie Strauss (Los Alamos), David Baker (UW Seattle)
Rosetta de novo structure prediction: The Human Proteome Folding Project
Kevin Drew, Lars Malmstroem, Glenn Butterfoss, Richard Bonneau
Rosetta Commons
Motivation: Genome Annotation
Cheaper sequencing technologies
New protein sequences
Proteins w/ unknown function
Shibu Yooseph et al. 2007 PLoS Biology
Background: Quick Example
Bacteriocin AS-48, CASP 4
1E68: MAKEFGIPAAVAGTVLNVVEAGGWVTTIVSILTAVGSGGLSLLAAAGRESIKAYLKKEIKKKGKRAVIAW
1NKL: GYFCESCRKIIQKLEDMVGPQPNEDTVTQAASQVCDKLKILRGLCKKIMRSFLRRISWDILTGKKPQAICVDIKICKE
Sequence: 4%
Structure: =
Function: Cyclic Bacterial Lysin = NK Lysin
Bonneau, R., Tsai, J., Ruczinski, I., Baker, D. Functional Inferences from Blind ab Initio Protein Structure Predictions. J. Structural Biology. (2001)
Plasmodium
SBRI top candidates for vaccine for preventing pregnancy malaria
Arabidopsis example:
RPT3
[Figure: structure with two labeled sites]
1: cofactor
2: point of mutation causing differential response to morphogen
Distant Multi-template fold recognition for Toll receptors
E Andersen-Nissen, R Bonneau, R Strong, A Aderem, Journal of Experimental Medicine, 2007
Rosetta
Local Sequence Bias
Non-local Interactions
Experimental, Evolution
Kevin Drew, Chivian, D., Bonneau, R. Ab initio structure prediction. (In) Bourne, P.E. (2007) Structural Bioinformatics (Methods of Biochemical Analysis, V. 44). New York: John Wiley & Sons; ISBN: 0471201995. Second Edition.
Rosetta Fragment Libraries
• 25-200 fragments for each/every 3 and 9 residue sequence window (overlapping)
• Selected from database of known structures: better than 2.5 Å resolution, < 50% sequence identity
• Ranked by sequence similarity and similarity of predicted and known secondary structure
• Fragments restrict search to protein-like local conformations
Sequence similar or exact sequence BUT not long enough that the similarity is attributable to evolution (not homologous, strictly speaking).
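The overlapping 3- and 9-residue windows can be enumerated as below. This is only a sketch of the windowing step: the function name `windows` is hypothetical, and the example string is just the first 13 residues of the AS-48 sequence shown earlier in the talk. Fragment picking itself (ranking database fragments for each window) is not shown.

```python
def windows(sequence, size):
    """All overlapping windows of the given size, as (start index, subsequence)."""
    return [(i, sequence[i:i + size]) for i in range(len(sequence) - size + 1)]

seq = "MAKEFGIPAAVAG"  # illustrative 13-residue stretch
three_mers = windows(seq, 3)
nine_mers = windows(seq, 9)
print(len(three_mers), three_mers[0])
print(len(nine_mers), nine_mers[-1])
```

A sequence of length L has L − k + 1 windows of size k, so every interior residue is covered by several fragments of each length.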
Low resolution:
Atom Model centroid reduction of side chains
Energy function terms
van der Waals repulsion
“pair” terms (electrostatics)
residue environment (prob of burial)
2º structure pairing terms (H-bonds)
radius of gyration
packing density
Implicit terms
fragments (local interactions)
van der Waals repulsion: CLASH!!
[Figure: E as a function of d]
$$\sum_i \sum_{j<i} \frac{(r_{ij}^2 - d_{ij}^2)^2}{r_{ij}}; \quad d_{ij} < r_{ij}$$
d = distance
r = sum of radii
Evaluate between centroids and backbone atoms
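The repulsive term can be written directly from its formula: each pair closer than the sum of its radii contributes (r² − d²)² / r. In this minimal sketch the coordinates and radii are made-up values and `clash_score` is a hypothetical name, not a Rosetta function.

```python
import math

def clash_score(coords, radii):
    """Sum of (r_ij^2 - d_ij^2)^2 / r_ij over pairs with d_ij < r_ij."""
    total = 0.0
    for i in range(len(coords)):
        for j in range(i):
            d = math.dist(coords[i], coords[j])  # distance between the two sites
            r = radii[i] + radii[j]              # sum of radii
            if d < r:                            # only overlapping pairs are penalized
                total += (r * r - d * d) ** 2 / r
    return total

overlapping = clash_score([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [1.0, 1.0])
separated = clash_score([(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)], [1.0, 1.0])
print(overlapping, separated)
```

The penalty is zero for non-overlapping pairs and rises steeply as two sites interpenetrate, so it removes clashes without rewarding contacts.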
“pair” terms (electrostatics): pair-wise probability based on PDB statistics
$$\sum_i \sum_{j>i} -\ln\left[ \frac{P(aa_i, aa_j \mid s_{ij} d_{ij})}{P(aa_i \mid s_{ij} d_{ij})\, P(aa_j \mid s_{ij} d_{ij})} \right]$$
aa = residue type
d = centroid distance (binned, interpolated)
s = sequence separation (must be > 8 res)
residue environment: probability of burial / exposure (solvation)
$$\sum_i -\ln\left[ P(aa_i \mid neighbors_i) \right]$$
neighbors within 10 Å of Cβ
binned by: 0-3, 4, 5, … , >30
also interpolated
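The neighbor count behind the burial term can be sketched as follows: count Cβ atoms within 10 Å of each residue's Cβ, then map the count to the bins 0-3, 4, 5, …, >30. The function names and coordinates are illustrative, and the probability table P(aa | neighbors), which comes from PDB statistics, is not reproduced here.

```python
import math

def neighbor_counts(cb_coords, cutoff=10.0):
    """For each C-beta position, count the other C-beta positions within the cutoff (in Å)."""
    counts = []
    for i, a in enumerate(cb_coords):
        n = sum(1 for j, b in enumerate(cb_coords)
                if j != i and math.dist(a, b) <= cutoff)
        counts.append(n)
    return counts

def burial_bin(n):
    """Bin index for a neighbor count: 0-3 -> 0, 4 -> 1, 5 -> 2, ..., 30 -> 27, >30 -> 28."""
    if n <= 3:
        return 0
    if n > 30:
        return 28
    return n - 3

# Three positions on a line, 6 Å apart: the middle one sees both ends.
counts = neighbor_counts([(0.0, 0.0, 0.0), (6.0, 0.0, 0.0), (12.0, 0.0, 0.0)])
print(counts, [burial_bin(n) for n in counts])
```

In a full score, each residue would then contribute −ln P(aa | bin), so hydrophobic residues are rewarded for high neighbor counts and polar residues for low ones.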
2º structure pairing terms (H-bonds): optimize 2º orientation
Represent protein as vectors of 2-residue “strands” (sheet vector, helix vector)
[Figure: backbone atoms N, C, O and residues R1, R2 defining the vectors]
Coordinate system: v1, v2, θ, φ, σ, r, hb
Scores selected to discriminate “near native” structures from “non native”:
• Relative direction (φ, θ)
• Relative H-bond orientation (hb)
• Distance (r, rσ)
• Number of sheets given number of strands
• Helix-Strand Packing
radius of gyration, packing density: promote a compact fold
Used in earlier stages and for filtering
$$RG = \sqrt{\langle d_{ij}^2 \rangle}$$
$$Density = \sum_i \sum_{sh} -\ln\left[ \frac{P_{compact}(neighbors_{i,sh})}{P_{random}(neighbors_{i,sh})} \right]$$
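A compactness measure of this kind can be sketched as below, assuming RG is the root-mean-square of the pairwise distances d_ij between residue positions (the square root and the averaging are an interpretation of the garbled slide formula). The coordinates are made up and `radius_of_gyration` is a hypothetical name.

```python
import math

def radius_of_gyration(coords):
    """Root-mean-square pairwise distance over all pairs i > j (assumed definition)."""
    total, pairs = 0.0, 0
    for i in range(len(coords)):
        for j in range(i):
            total += math.dist(coords[i], coords[j]) ** 2
            pairs += 1
    return math.sqrt(total / pairs)

compact = radius_of_gyration([(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)])
extended = radius_of_gyration([(0.0, 0.0, 0.0), (30.0, 0.0, 0.0), (0.0, 40.0, 0.0)])
print(compact, extended)
```

Lower values indicate a more compact chain, which is why the term is useful for filtering out extended decoys early in a search.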
High resolution:
Atom Model: full atom representation
Energy function terms
Rotamer (Dunbrack)
Ramachandran
Solvation (Lazaridis-Karplus)
Hydrogen bonding
Lennard-Jones
Pair (electrostatic)
Reference energies
Process of Obtaining Structures
1. Split proteins into domains (ginzu, Chivian) (chop, Rost)
2. Find domains we can annotate using Rosetta
3. Fold remaining domains using Rosetta on IBM’s World Community Grid
• 180,000 domains folded from 120 genomes
BIG caveat emptor: all results from this point for domains < 170 aa
worldcommunitygrid.org & grid.org
collaborators: Lars Malmstroem, Viktors Berstis, Mike Riffle, Leroy Hood, David Baker
Completed and ongoing projects
Bacterial and Archaea: Bonneau & Baliga (2004) Genome Biology: annotation of Halobacterium NRC-1, identification of transcription factors, role of chemotaxis sensing domains
Yeast: Malmstroem, Baker, Bonneau (2006) PLoS Biology
Human & others: Bonneau, Malmstroem, IBM: Human and others (in progress)
overview of approach
Molecular Function GO Tree
P( MF | Predicted Structure, GO Process, GO Localization, … )
specificity
overview of approach
1. Structure (MCM) Score
2. Training Set (attaching structure to GO)
3. Naïve Bayes
• Naïve Bayes with continuous SF prob
• Naïve Bayes with GO terms
Mammoth Confidence Metric (MCM)
• Compare cluster representatives to PDB structures
• MCM score [0…1] probability
• based on:
– Quality of match
– Rosetta quality
– Length ratio of PDB and cluster rep
– Contact order
Gene Ontology (GO) & Training Data
• Function, Process, Localization terms
• 1.6 million sequences with annotations
• BLAST astral sequences to GO sequences (astral = pdb w/ SCOP SF)
• Cluster using CD-hit to reduce redundancy
• Cluster again using genome of benchmark sequences and remove matches
Naïve Bayes w/ Superfamilies
• How to take continuous probabilities of SF (by way of MCM scores): we weight log-likelihood by the MCM scores:
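The formula itself did not survive in this transcript, so the following is only one plausible reading of "weight log-likelihood by the MCM scores": each predicted superfamily contributes its log-likelihood under a molecular function, scaled by its MCM score. All names, probabilities, and scores below are hypothetical.

```python
import math

def score_function(prior, likelihood, sf_scores):
    """log P(f) + sum over predicted superfamilies of mcm(sf) * log P(sf | f)."""
    return math.log(prior) + sum(
        mcm * math.log(likelihood[sf]) for sf, mcm in sf_scores.items())

def predict(priors, likelihoods, sf_scores):
    """Return the best-scoring molecular function and all scores."""
    scores = {f: score_function(priors[f], likelihoods[f], sf_scores) for f in priors}
    return max(scores, key=scores.get), scores

# Hypothetical training estimates for two functions and two superfamilies.
priors = {"kinase": 0.5, "hydrolase": 0.5}
likelihoods = {
    "kinase": {"sfA": 0.8, "sfB": 0.2},
    "hydrolase": {"sfA": 0.3, "sfB": 0.7},
}
# MCM scores in [0, 1] for the superfamilies matched by the predicted structure.
sf_scores = {"sfA": 0.9, "sfB": 0.2}

best, scores = predict(priors, likelihoods, sf_scores)
print(best, scores)
```

Under this reading, a confident superfamily match (MCM near 1) counts almost fully as evidence, while a weak match (MCM near 0) barely moves the posterior away from the prior.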
Naïve Bayes w/ GO terms
• Problem: GO terms are not independent. If we use all terms annotated to a sequence we end up double counting
• Solution: pick a term that will be predictive
– Mutual information between term and MF
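Picking a single predictive term by mutual information can be sketched as follows. The training set, term names, and function labels are invented for illustration, and using presence/absence of a term as the variable is an assumption about how the term is encoded.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """I(X; Y) in bits for a list of (x, y) observations."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in joint.items())

def pick_term(go_terms, training):
    """Choose the term whose presence/absence shares the most information with MF.

    training: list of (set of GO terms, molecular function) pairs.
    """
    def mi(term):
        return mutual_information([(term in terms, mf) for terms, mf in training])
    return max(go_terms, key=mi)

training = [
    ({"transport", "membrane"}, "transporter"),
    ({"transport"}, "transporter"),
    ({"membrane", "repair"}, "nuclease"),
    ({"repair"}, "nuclease"),
]
chosen = pick_term(["membrane", "transport"], training)
print(chosen)
```

Using only the most informative term for a sequence avoids counting correlated terms as if they were independent pieces of evidence.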
Results: Solved Structures
How accurate are we when we predict SCOP Superfamily for PDB Structures?
Results: Since Solved Structures (2005)
How accurate are we when we predict SCOP Superfamily for Swissprot Proteins?
P(s|rosetta)
Results: Bayes Function Prediction (Swissprot Benchmark)
How accurate are our function predictions using structure only?
77 unique molecular functions
100 unique molecular functions
P(MF|rosetta)
Results: Bayes Function Prediction (Swissprot Benchmark)
How accurate are our function predictions using GO process & structure?
P(MF|rosetta,P)
Results: HPF: vesicle transport
VAM6 / YDL077C: Vacuolar protein that plays a critical role in the tethering steps of vacuolar membrane fusion by facilitating guanine nucleotide exchange on small guanosine triphosphatase Ypt7p. We find the following: (domain 1) unknown; (domain 2) Rosetta hit to Polynucleotide phosphorylase/guanosine pentaphosphate synthase (PNPase/GPSI), domain 3; (domain 3) Clathrin proximal leg; (domain 4) Rosetta de novo hit to Hemerythrin; (domain 5) Rosetta hit to SAM/Pointed domain.
VPS29: Endosomal protein that is a subunit of the membrane-associated retromer complex essential for endosome-to-Golgi retrograde transport; forms a subcomplex with Vps35p and Vps26p that selects cargo proteins for endosome-to-Golgi retrieval. Even with this context so well defined, there is still no molecular function known; that is, no precise mechanistic role is known for this protein. We find a strong hit to Mre11 (a double-stranded mismatch repair protein, metal-dependent phosphatase) for domain 1 and a strong Rosetta hit for domain 2 to the PUA-domain-like fold (implicated in RNA binding OR ATP sulfurylase N-terminal domain). The fold predictions are as confident as we ever see (MCM = 0.95, psiblast evalue to domain 1 hit Z = 13).