Comparing the Complete S. cerevisiae and C. elegans Proteomes: Orthology and Divergence

Stephen A. Chervitz

Saccharomyces Genome Database

NCBI

Boston University


Goals of this study

Explore protein sequence and domain conservation between S. cerevisiae and C. elegans.
  Unicellular vs. multicellular lifestyles

Classify yeast and worm similarity groups using functional annotation of yeast genes.

Enhance the SGD website and add value to the worm genomic sequence.


Organization of this study

Shared core biology: whole protein sequence comparisons

Divergence: protein domain comparisons

No gene predictions

No mitochondrial sequence


Definitions

Orthologs: Genes from different species that perform the same biological function and are likely to have evolved from a common ancestral gene.

Paralogs: Genes in the same species that perform different biological functions and likely arose by duplication and divergence from a common ancestral gene.


Genome Scorecards

                        Saccharomyces cerevisiae    Caenorhabditis elegans
No. of cells:           1                           ~1000
Size (Mbp):             12                          97
Chromosomes:            16                          6
Predicted ORFs:         6,217                       19,099
Percent coding:         72%                         27%
ORFs with gene names:   3,344 (53%)                 688 (4%)

(Image magnification labels: x200, x20,000)


Core biology is carried out by similar numbers of proteins


Building a Biological Rosetta Stone

P-Value    Yeast ORFs with            Worm orthologs with
           functional description     functional description
1e-10      86%                        64%
1e-20      89%                        69%
1e-40      93%                        61%
1e-60      96%                        74%
1e-80      96%                        74%
1e-100     98%                        77%
1e-200     98%                        88%
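One way percentages like those in the table above might be tabulated is sketched below. This is a minimal illustration, not the method used in the study: it assumes each yeast-worm match is stored with its P-value and a flag per species saying whether a functional description exists, and it assumes each row counts matches at or below the given P-value. The `Hit` record, the function name, and the sample data are all hypothetical.

```python
from typing import NamedTuple, Optional, Tuple


class Hit(NamedTuple):
    """Hypothetical record for one yeast-worm match."""
    p_value: float          # P-value of the match
    yeast_described: bool   # yeast ORF has a functional description
    worm_described: bool    # worm protein has a functional description


def described_fractions(hits, cutoff) -> Optional[Tuple[float, float]]:
    """Return (yeast %, worm %) with a functional description
    among hits whose P-value is at or below the cutoff."""
    kept = [h for h in hits if h.p_value <= cutoff]
    if not kept:
        return None  # no matches survive this cutoff
    n = len(kept)
    yeast_pct = 100.0 * sum(h.yeast_described for h in kept) / n
    worm_pct = 100.0 * sum(h.worm_described for h in kept) / n
    return yeast_pct, worm_pct


# Made-up sample data, for illustration only.
hits = [
    Hit(1e-150, True, True),
    Hit(1e-70, True, False),
    Hit(1e-30, True, True),
    Hit(1e-12, False, False),
]

for cutoff in (1e-10, 1e-20, 1e-100):
    print(cutoff, described_fractions(hits, cutoff))
```

Stricter cutoffs keep fewer, more confident matches, which is why the described fraction can rise as the P-value falls.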


Distribution of core biological functions conserved in both yeast and worm


Core Biological Functions

Signal Transduction: kinases, phosphatases, Ras superfamily and other GTP-binding proteins, GDP/GTP exchange factors, ADP-ribosylation factors, adenylyl/guanylyl cyclases, phosphatidylinositol kinases, EF-hand proteins

DNA/RNA Metabolism: polymerases, helicases, topoisomerases, repair/recombination-related, nucleases, primases, splicing factors, initiation/elongation factors (transcription & translation), tRNA synthetases, histone acetylases/deacetylases

Transport & Secretion: ABC transporters, permeases, vesicle coat & fusion proteins, clathrin-associated, protein targeting, signal recognition particle, nuclear pore-associated

Cytoskeletal: actin, myosin, tubulin, actin-related proteins, actin-interacting proteins, septins, cytokinesis-related proteins


Core Biological Functions (cont’d)

Ribosomal: ribosomal proteins (small & large subunit), ribosome processing proteins

Protein Folding and Degradation: heat shock proteins, chaperonins, proteasome subunits, ubiquitin-related, peptidyl prolyl cis-trans isomerase, protein disulfide isomerases, aminopeptidases, post-translational modifying enzymes (farnesyltransferase, myristoyltransferase, glycosylation, GPI-anchoring)

Intermediary Metabolism: dehydrogenases, reductases, mutases, lyases, isomerases, carboxylases, decarboxylases, nucleotide biosynthetic enzymes, transaminases, deaminases, epimerases, oxygenases, cytochromes, flavoproteins


Constructing Sequence Similarity Groups


Similarity Groups: MCM DNA replication initiator complex


Similarity Groups: Tubulin


Multiple Sequence Alignments


Domain Analysis

122 common eukaryotic protein domains.
  Associated with regulation of gene expression and signal transduction.

Compare occurrence and domain architectures in yeast and worm protein sequences.

Position-dependent weight matrices (profiles) to detect domains (PSI-BLAST).

Classify worm-only, yeast-only, and shared domains.
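The final classification step can be illustrated with plain set operations. This is a minimal sketch, assuming the profile searches have already produced, for each proteome, the set of domain names detected at least once; the function name and input sets are hypothetical, and the domain names are borrowed from the deck's own lists purely as sample values.

```python
def classify_domains(yeast_domains, worm_domains):
    """Split detected domains into yeast-only, worm-only, and shared."""
    yeast, worm = set(yeast_domains), set(worm_domains)
    return {
        "yeast_only": sorted(yeast - worm),   # found only in yeast
        "worm_only": sorted(worm - yeast),    # found only in worm
        "shared": sorted(yeast & worm),       # found in both proteomes
    }


# Illustrative inputs: domains assumed detected in each proteome.
yeast_hits = {"C6", "SH3", "RING-finger"}
worm_hits = {"Cadherin", "SMAD", "SH3", "RING-finger"}

result = classify_domains(yeast_hits, worm_hits)
for category, names in result.items():
    print(category, names)
```

In practice, a detection threshold for the profile search would decide what counts as "present", and that choice affects which category a borderline domain falls into.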


Worm-Only Domains

Nuclear hormone receptors
Epidermal growth factor
Degenerins
FMRFamides (neuropeptides)
Cadherin
PTB (phosphotyrosine binding)
T-box, SMAD (transcription factor domains)
Insulin-like peptides
Laminin NT


Yeast-Only Domains

C6 (Zn-binding cluster)
APSES (DNA-binding)


Shared Domains (Yeast & Worm)

Protein kinase catalytic
C2H2 Finger
AAA ATPase
DAG Kinase
Arrestin
Ankyrin
SWI/SNF helicase
RING-finger
bHLH
RHO GAP/GEF
Pleckstrin homology
SH3
Ubiquitin
SH2
cNMP-signaling domains
CaM EF-hands
Homeodomains
Potassium channels
7TM receptors
HINT
Immunoglobulin
LRR
vWA
MATH
POZ
LIM


Frequency of occurrence of common domains

Domain counts are normalized to the number of proteins with a given domain per 1000 genes.
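The normalization can be written out as a short calculation. This is a sketch under stated assumptions: the denominators are the predicted ORF counts from the Genome Scorecards slide (6,217 for yeast, 19,099 for worm), which may not be the exact totals used for the chart, and the per-domain protein counts below are made up for illustration.

```python
YEAST_ORFS = 6217    # predicted ORFs, from the Genome Scorecards slide
WORM_ORFS = 19099    # predicted ORFs, from the Genome Scorecards slide


def per_thousand_genes(proteins_with_domain, total_genes):
    """Number of proteins carrying a domain, scaled to 1000 genes."""
    return 1000.0 * proteins_with_domain / total_genes


# Hypothetical raw counts of proteins containing some domain.
yeast_count, worm_count = 120, 400

yeast_norm = per_thousand_genes(yeast_count, YEAST_ORFS)
worm_norm = per_thousand_genes(worm_count, WORM_ORFS)
print(round(yeast_norm, 1), round(worm_norm, 1))
```

Scaling by genome size lets the two proteomes be compared directly: with these made-up counts, the worm has over three times as many proteins with the domain, but the normalized frequencies are close.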


Conclusions

Core biological functions are carried out by orthologous proteins occurring in comparable numbers in yeast and worm.
  These represent approx. 40% of the predicted yeast ORFs and 20% of the predicted worm ORFs.

Regulatory and signaling proteins in worm do not have orthologs in yeast but often share domains.

Complete results are available online at SGD at http://genome-www.stanford.edu/Saccharomyces/worm


Future Directions

Incorporate more sensitive sequence search results.

More sophisticated clustering scheme.
  Multi-domain proteins and weak similarities.

Keep up to date with changes in the genomic datasets.
  Add/remove protein coding regions
  Correction of errors in the genomic sequence
  Sequence name changes

Extended annotation support.
  Controlled vocabularies, gene function ontologies.

Comparative genomics framework for additional genomes.

More flexible browsing of genome-wide similarities.
  Prototype yeast genome protein similarity Java viewer


Genome-wide protein similarity view

Explore protein sequence similarities within or between genomes

Graphical user interface

Available at SGD for the yeast genome
  Sequence Resources, Protein Similarity View


Acknowledgements

Saccharomyces Genome Database (Stanford)
  Gavin Sherlock, Cathy Ball, Selina Dwight, Midori Harris, Kara Dolinski, Shuai Weng, Eric Hester, Mike Cherry, David Botstein


Acknowledgements (cont’d)

NCBI (Nat’l Library of Medicine)
  L. Aravind, Eugene Koonin

Boston University
  Scott Mohr, James Freeman, Temple Smith

Neomorphic Software (Berkeley)
  www.neomorphic.com


Extra slides


Single-linkage clustering and multi-domain proteins

“Chaining”
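The chaining effect named on this slide can be reproduced with a small single-linkage grouping routine. This is a minimal sketch using a union-find structure, not the clustering code used in the study; the protein names and similarity pairs are hypothetical. Protein B stands for a multi-domain protein that matches A through one domain and C through another, while A and C share no similarity with each other.

```python
def single_linkage_groups(proteins, similar_pairs):
    """Group proteins so that any two linked by a chain of
    pairwise similarities end up in the same group."""
    parent = {p: p for p in proteins}

    def find(p):
        # Follow parent links to the group representative,
        # shortening the path along the way.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = {}
    for p in proteins:
        groups.setdefault(find(p), []).append(p)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical example: B is similar to both A and C; D matches nothing.
proteins = ["A", "B", "C", "D"]
pairs = [("A", "B"), ("B", "C")]
groups = single_linkage_groups(proteins, pairs)
print(groups)  # [['A', 'B', 'C'], ['D']]
```

A and C land in one group even though they were never directly matched. That is the chaining problem: a single multi-domain protein can bridge otherwise unrelated families.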


Whole genomic DNA microarray

DeRisi et al. (1997) Science 278: 680
