A Global View of the Protein Structure Universe and
Protein Evolution
Sung-Hou Kim
University of California, Berkeley, CA
U.S.A.
June 27, 2006
Topics
I. Global view of the protein structure universe
II. Mapping of protein functions on the structural universe
III. Global view of the evolution of proteins
J. Hou
G. Sims
I.-G. Choi
S.-R. Jun
C. Zhang
I. Mapping the Protein Structure Universe: Structural Demography
The Protein Universe
• 500 – 20,000 genes per organism
• >13.6 × 10^6 species
• >10^10 – 10^12 protein sequences
but...
• ~10^5 protein sequence families
• ~10^4 protein structure families
• ~10^3 protein fold domain families
“Mapping” by Metric Matrix Distance Geometry (Classical Multidimensional Scaling)
Pair-wise relational distances with “errors”
Most likely (consistent) global relational “mapping”
[Figure: four points x1, x2, x3, x4 joined by the six pairwise distances d1,2, d1,3, d1,4, d2,3, d2,4, d3,4]
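The slide names the technique without writing it out. In its textbook form (an assumption here; the talk may use a variant), classical multidimensional scaling goes from the pairwise distances d_{i,j} to coordinates as sketched below, where n is the number of points and k the number of dimensions kept:

```latex
\[
D^{(2)}_{ij} = d_{i,j}^{2}, \qquad
J = I_n - \tfrac{1}{n}\,\mathbf{1}\mathbf{1}^{\top}, \qquad
B = -\tfrac{1}{2}\, J\, D^{(2)} J
\]
\[
B = V \Lambda V^{\top}, \qquad
X = V_k\, \Lambda_k^{1/2}
\]
```

Row i of X is the position x_i. V_k and \Lambda_k hold the k largest eigenvalues of B and their eigenvectors. When the input distances carry “errors”, some eigenvalues may come out negative, and those are normally left out.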
Method
• Take all protein structures in PDB (>35,000)
• Construct a non-redundant set at 25% sequence identity (~2000 structures)
• Calculate all-to-all pair-wise structural similarities, then convert to dissimilarity scores
• Apply metric matrix distance geometry to find the global position of each structure in N-dimensional space
• 3-D plot to capture the major features of the protein structure space
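The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the speaker's code: the function names are hypothetical, and the similarity-to-dissimilarity conversion (subtracting each score from the largest score) is an assumption, since the slides do not say which conversion was used.

```python
import numpy as np


def similarity_to_dissimilarity(S):
    """Hypothetical conversion: subtract each score from the largest score."""
    S = np.asarray(S, dtype=float)
    D = S.max() - S
    np.fill_diagonal(D, 0.0)      # a structure is at distance 0 from itself
    return (D + D.T) / 2.0        # enforce symmetry


def classical_mds(D, n_components=3):
    """Classical MDS: embed an n x n distance matrix in n_components dimensions.

    Returns (coords, eigvals), with eigvals sorted from largest to smallest.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered metric matrix
    eigvals, eigvecs = np.linalg.eigh(B)     # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]
    # Clip small negative eigenvalues caused by noisy, non-Euclidean input.
    top = np.clip(eigvals[:n_components], 0.0, None)
    coords = eigvecs[:, :n_components] * np.sqrt(top)
    return coords, eigvals


if __name__ == "__main__":
    # Synthetic stand-in for structural distances: random points in 3-D.
    rng = np.random.default_rng(0)
    points = rng.normal(size=(6, 3))
    D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    coords, eigvals = classical_mds(D, n_components=3)
    print(coords.shape)
```

For exact Euclidean input, the embedding reproduces the original pairwise distances up to rotation and reflection. Real structural dissimilarities are only approximately Euclidean, which is why the small negative eigenvalues are clipped.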
Protein Structure Distance Matrix (~2000 structures with <25% sequence ID)
[Figure: square distance matrix with rows and columns P1, P2, ..., P1898; example entry D3,4]
Eigenvalues
[Figure: plot of the leading eigenvalues; vertical axis 0 to 2,500, horizontal axis eigenvalue index 1 to 16]
Positional coordinates in 1898-dimensional space. Major feature extraction in 3 dimensions.
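One common way to judge how much a 3-dimensional view retains is the share of the positive eigenvalue total carried by the three largest eigenvalues. The sketch below assumes that criterion, which the slides do not state. The function name and the sample values are hypothetical and are not read off the plot.

```python
import numpy as np


def top_k_share(eigvals, k=3):
    """Share of the positive eigenvalue total carried by the k largest."""
    vals = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    positive = np.clip(vals, 0.0, None)   # ignore negative eigenvalues
    return positive[:k].sum() / positive.sum()


if __name__ == "__main__":
    # Made-up eigenvalues for illustration only.
    print(top_k_share([4.0, 3.0, 2.0, 1.0, 0.0, -0.5], k=3))
```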
The Protein Structure Universe (2005)
[Figure: map of the protein structure space with representative structures labeled A1 to A5]
A1: (2ERL:_) MATING PHEROMONE ER-1;
A2: (1ELW:B) TPR1-DOMAIN OF HOP;
A3: (1A6M:_) MYOGLOBIN;
A4: (1E85:A) CYTOCHROME C’;
A5: (1M57:C) CYTOCHROME C OXIDASE;
Four demographic regions of the protein structure universe
Four Protein Fold Classes
α, β, α+β, α/β
Major Features of the Protein Structural Space
1. Protein structural space is sparsely populated
2. Four elongated regions corresponding to four protein “fold” classes
3. Small to large size distribution along three of four “feature axes”
II. Mapping of Functions (1) Enzymatic functions
Molecular functions: Basic chemistry
EC
EC3: Hydrolases
EC6: Ligases
II. Mapping of Functions (2) Metal Binding
[Figure legend: Ca, Co, Cu, Fe, Mn, Mo, Ni, Zn, multi-bound, not bound]
Metal Binding
[Figure panels: Zn, Cu]
Major Features of Functional Mapping
Maximum diversity in architectural preference for a given molecular function:
“scaffold” selection vs. design
III. Evolution of Proteins (a) “Ages” of Protein Families
Method: “Common Structural Ancestor”
The “age” of the “common structural ancestor” of a protein family
“Age” of CSA
Ages of the Common Structural Ancestors
Population-averaged chain length has a similar distribution
III. Evolution of Proteins (b) Protein Fold Classes
ML Relative “age” of common structural ancestors
III. Evolution of Proteins (e) Protein Families
Hypothesis: Multiple Origins of Protein Families
Summary
• Mapping of protein structures: Sparse except for four highly populated demographic regions (structural selection)
• Mapping of molecular functions: Opportunistic use of structural features for molecular function (selection, not design)
• Mapping of CSA ages: (1) Evolution of protein fold classes (2) “Multiple origin model” for the evolution of protein families
Organismic evolution by natural selection for environment
may be founded on
Molecular evolution by structural selection for function