A Tale of Two Worms: Comparing the Genomes of C. elegans & C. briggsae
Lincoln Stein, Cold Spring Harbor Laboratory
My Lab
International HapMap Project
• Find common regions of genetic variation in the human population.
• Reduce the cost of genetic association studies.
• 600,000 SNPs × 270 individuals
Gramene
• Comparative genomics among monocots
• Rice as model system
  • Rice genome, maps, proteins, mutants, QTLs, phenotypes
  • Map alignments to maize, wheat, oats, barley &c.
Genome KnowledgeBase
• Biological pathways in humans
• Curated by experts in the field
• Designed for
  • Education
  • Data mining & discovery
• Open data/Open software
WormBase
• Community database for C. elegans
• C. elegans genome
• C. briggsae genome
• Genetic maps
• Developmental anatomy
• RNAi screens
• Microarray screens
• Evolutionary biology
Generic Model Organism Db
• Reusable software for building model organism databases
• Used by WormBase, FlyBase, Gramene, RatDB, SGD, MGD…
• Genome browsers, genetic maps, curation tools…
[Figure: male (lateral), male (ventral), and hermaphrodite; scale bars 200 µm and 10 µm]
Sequencing C. briggsae
• Isolate DNA, make libraries (2 mo)
• Map libraries (4 mo)
• Shotgun sequence genome (1 wk)
• Assemble genome (6 mo)
• Analyze genome (9 mo)
The Draft
Contig Type     Count   N50 (kb)   Length, Mb (% of genome)
Contigs         5,341   41         105.6
Supercontigs    899     474        107.5
Scaffolds       142     1,450      102.4 (98%)
[Figure: BAC map, sequence contigs, supercontigs, scaffolds]
Jim Mullikin, Sanger Center; LaDeana Hillier, WUSTL
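N50, used in the draft table, is the length L such that contigs of length L or longer together hold at least half of all assembled bases. A minimal sketch of that calculation follows; the contig lengths in the usage line are made up for illustration and are not the briggsae data.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the loop always returns")


# Hypothetical contig lengths in kb (illustration only).
print(n50([80, 70, 50, 40, 30, 20, 10]))  # → 70
```

Here the two longest contigs (80 + 70 kb) already cover half of the 300 kb total, so N50 is 70 kb. Joining contigs into supercontigs and scaffolds raises N50 without changing the bases, which is why the N50 column grows down the table.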
Calling C. briggsae Genes
briggsae genes: “hybrid” strategy
[Figure: elegans predictions; briggsae predictions]
Avril Coghlan, University of Dublin
How accurate is it?
• C. elegans gold standard
  • 2,257 genes entirely confirmed by mRNA data
• Results on C. elegans set
  • 92% of the time, the hybrid method picked the “gold standard” gene correctly
  • 32 genes incorrectly split into 2 or more predictions (1.4%)
  • 49 genes incorrectly merged into 1 prediction (1%)
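Assuming both error counts are taken over the same 2,257 gold-standard genes, the rates can be recomputed directly. The split rate reproduces the 1.4% on the slide; the merge rate comes out nearer 2.2% than 1%, so that figure may use a different denominator (for example, counting merged predictions rather than genes; 49 genes merged pairwise would give roughly 24 predictions, about 1.1%).

```python
GOLD_STANDARD = 2_257  # genes fully confirmed by mRNA data (from the slide)


def percent(count: int, total: int = GOLD_STANDARD) -> float:
    """Express an error count as a percentage of the gold-standard set."""
    return 100.0 * count / total


split_rate = percent(32)  # genes split into 2 or more predictions
merge_rate = percent(49)  # genes merged into a single prediction
correct = round(0.92 * GOLD_STANDARD)  # approximate number called exactly right

print(f"split: {split_rate:.1f}%  merged: {merge_rate:.1f}%  correct: ~{correct}")
# → split: 1.4%  merged: 2.2%  correct: ~2076
```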
Gene Sets Very Similar
Table 1: Comparison of the C. briggsae and C. elegans Protein-Coding Gene Sets
                          C. briggsae        C. elegans WS77*   C. elegans Hybrid
Number of genes           19,507             18,808             20,621
Median gene length        1.90 kbp           1.91 kbp           1.83 kbp
Summed length of genes    55.7 Mbp           52.5 Mbp           55.6 Mbp
Average gene density      5.4 kbp per gene   5.3 kbp per gene   4.9 kbp per gene
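Average gene density is genome length divided by gene count. The sketch below reproduces the three density values, assuming the 105.6 Mb contig total from the draft table for C. briggsae and a C. elegans genome of about 100.3 Mb (an outside figure, not stated on these slides).

```python
def kbp_per_gene(genome_mbp: float, n_genes: int) -> float:
    """Average spacing of genes: genome length in kbp divided by gene count."""
    return genome_mbp * 1_000 / n_genes


# Genome sizes are assumptions: 105.6 Mbp is the C. briggsae contig total
# from the draft table; 100.3 Mbp is an approximate C. elegans genome size.
briggsae = kbp_per_gene(105.6, 19_507)
elegans_18808 = kbp_per_gene(100.3, 18_808)  # the 18,808-gene C. elegans set
elegans_20621 = kbp_per_gene(100.3, 20_621)  # the 20,621-gene C. elegans set

print(round(briggsae, 1), round(elegans_18808, 1), round(elegans_20621, 1))
# → 5.4 5.3 4.9
```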
Identifying Orthologs
[Figure: Ce/Cb ortholog pair linked by best similarity matches; ortholog vs. paralog?]
Use colinearity to resolve ambiguities.
(Todd Harris, CSHL)
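The slide pairs genes by best similarity match and falls back on colinearity when that is ambiguous. A minimal sketch of one way to do this, assuming a reciprocal-best-hit rule plus a neighbour-support tie-break; the score table, gene names, single-chromosome gene-order lists and the `window` parameter are all hypothetical, not the pipeline actually used.

```python
def best_hits(scores):
    """Map each query gene to the set of subject genes with its top score."""
    top, best = {}, {}
    for (query, subject), score in scores.items():
        if query not in top or score > top[query]:
            top[query], best[query] = score, {subject}
        elif score == top[query]:
            best[query].add(subject)  # keep ties: possible paralogs
    return best


def neighbours(order, gene, window):
    """Genes within `window` positions of `gene` in a gene-order list."""
    i = order.index(gene)
    return order[max(0, i - window):i] + order[i + 1:i + 1 + window]


def call_orthologs(scores, ce_order, cb_order, window=2):
    """Reciprocal best hits, with colinearity used to settle ambiguous genes.

    scores maps (elegans_gene, briggsae_gene) -> similarity score.
    """
    fwd = best_hits(scores)                                          # Ce -> Cb
    rev = best_hits({(cb, ce): s for (ce, cb), s in scores.items()})  # Cb -> Ce
    pairs, ambiguous = {}, []
    for ce, hits in fwd.items():
        if len(hits) == 1:
            (cb,) = hits
            if rev.get(cb) == {ce}:
                pairs[ce] = cb  # unambiguous reciprocal best match
                continue
        ambiguous.append(ce)

    # Colinearity: favour the candidate whose neighbours are already
    # paired with this gene's neighbours.
    used = set(pairs.values())
    for ce in ambiguous:
        ce_nbrs = neighbours(ce_order, ce, window)
        support = {}
        for cb in fwd[ce] - used:
            cb_nbrs = set(neighbours(cb_order, cb, window))
            support[cb] = sum(pairs.get(n) in cb_nbrs for n in ce_nbrs)
        ranked = sorted(support.values(), reverse=True)
        if ranked and ranked[0] > 0 and (len(ranked) == 1 or ranked[0] > ranked[1]):
            winner = max(support, key=support.get)
            pairs[ce] = winner
            used.add(winner)
    return pairs


# Hypothetical example: c1 and c2 are equally similar paralogs of C,
# but only c1 sits next to the partners of C's neighbours A and B.
scores = {("A", "a"): 100, ("B", "b"): 90, ("C", "c1"): 80, ("C", "c2"): 80}
ce_order = ["A", "B", "C"]
cb_order = ["c2", "p", "q", "a", "b", "c1"]
result = call_orthologs(scores, ce_order, cb_order)
print(result)  # → {'A': 'a', 'B': 'b', 'C': 'c1'}
```

Genes that stay ambiguous after the colinearity step are simply left unpaired here; a real pipeline would also need strand, multiple chromosomes and unassembled contigs.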
Comparing Orthologs
• 12,155 orthologs
• 807 C. briggsae “orphans”
• 1,061 C. elegans “orphans”
• Divergence date: 80-110 Mya
• All genes under various degrees of purifying selection
(Todd Harris, Jason Stajich)
Orthologs Similar but Differ in Detail
Briggsae has about 1 new intron for every 5 genes.
Comparing Gene Families: TRIBE-MCL
[Figure: Cb/Ce protein clusters (Clusters 1-5)]
(Jason Stajich, rotation student)
• 2,169 clusters of ≥ 2 members
• 24% of elegans single-copy genes
• 28% of briggsae single-copy genes
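TRIBE-MCL groups proteins by running the Markov Cluster (MCL) algorithm on an all-against-all similarity graph. The sketch below shows only the core MCL loop (expansion, inflation, column renormalization) on a toy graph; it assumes similarities have already been converted to non-negative weights, uses an inflation of 2.0 as an illustrative choice, and is not the TRIBE-MCL code itself.

```python
def normalize_columns(m):
    """Scale each column to sum to 1, giving a column-stochastic matrix."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain square-matrix product (keeps the sketch dependency-free)."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(sim, inflation=2.0, max_iter=100, tol=1e-9):
    """Markov Cluster algorithm on a symmetric, non-negative weight matrix."""
    n = len(sim)
    # Self-loops are conventionally added before the first normalization.
    m = normalize_columns([[sim[i][j] + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        expanded = matmul(m, m)        # expansion: flow spreads along paths
        inflated = normalize_columns(  # inflation: strong flows get stronger
            [[v ** inflation for v in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row that keeps mass is an attractor; its non-zero columns form a cluster.
    clusters = []
    for i in range(n):
        members = sorted(j for j in range(n) if m[i][j] > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Toy graph: proteins 0-2 are mutually similar, 3-4 form a separate pair.
toy = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
print(mcl(toy))  # → [[0, 1, 2], [3, 4]]
```

Higher inflation values sharpen the flow faster and tend to give smaller, tighter clusters; real protein graphs need a sparse-matrix implementation rather than these nested lists.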
Protein Family Clusters
Cluster   Elegans   Briggsae   Description
5         112       128        Zn-finger
3         128       150        Protein kinase
1         215       105        7TM receptor, subf 2
6         122       86         7TM receptor, subf 2
2         169       135        7TM receptor, subf 1
8         2         204        Unknown
17        116       22         DUF38
>200 clusters unbalanced by more than 2-fold
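Reading "unbalanced by more than 2-fold" as "the larger species count divided by the smaller exceeds 2", the criterion can be applied to the clusters in the Protein Family Clusters table. This is a sketch under that assumed reading; three of the listed clusters qualify.

```python
# (cluster id, C. elegans members, C. briggsae members, description) from the table
CLUSTERS = [
    (5, 112, 128, "Zn-finger"),
    (3, 128, 150, "Protein kinase"),
    (1, 215, 105, "7TM receptor, subf 2"),
    (6, 122, 86, "7TM receptor, subf 2"),
    (2, 169, 135, "7TM receptor, subf 1"),
    (8, 2, 204, "Unknown"),
    (17, 116, 22, "DUF38"),
]


def fold_imbalance(elegans: int, briggsae: int) -> float:
    """Ratio of the larger member count to the smaller (always >= 1.0)."""
    lo, hi = sorted((elegans, briggsae))
    return float("inf") if lo == 0 else hi / lo


unbalanced = [cid for cid, ce, cb, _ in CLUSTERS if fold_imbalance(ce, cb) > 2.0]
print(unbalanced)  # → [1, 8, 17]
```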
A Rapidly Evolving Family: Olfactory Receptors
PFAM Class   C. elegans   C. briggsae
7tm_4        269          222
7tm_5        322          163
sra          37           18
srb          16           12
sre          55           51
srg          32           30
Total        718          476
Sra Olfactory Receptor Family
[Figure: Sra family tree marking putative ortholog pairs and an elegans-exclusive subtree]
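Per-class elegans:briggsae ratios show where the imbalance sits: 7tm_5 and sra are roughly 2:1, while sre and srg are close to 1:1. Summing the class rows gives 731 and 496, slightly more than the 718 and 476 in the Total row; one possible explanation, not stated on the slide, is that some genes carry domains from more than one PFAM class and are counted once in the total.

```python
# PFAM class -> (C. elegans count, C. briggsae count), from the table
RECEPTORS = {
    "7tm_4": (269, 222),
    "7tm_5": (322, 163),
    "sra": (37, 18),
    "srb": (16, 12),
    "sre": (55, 51),
    "srg": (32, 30),
}

ce_sum = sum(ce for ce, _ in RECEPTORS.values())
cb_sum = sum(cb for _, cb in RECEPTORS.values())
# Ratio > 1 means more members in C. elegans than in C. briggsae.
ratios = {cls: round(ce / cb, 2) for cls, (ce, cb) in RECEPTORS.items()}

print(ce_sum, cb_sum)  # → 731 496
print(ratios)
```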
Synteny: Aligning C.b. to C.e.
Type    Intergenic  Upstream  Downstream  CDS      Intron   5' UTR  3' UTR  Repeat  Total
Strong  61,615      27,512    30,192      49,358   114,323  2,783   7,239   28,313  321,335
Coding  41,817      11,600    15,571      152,086  49,135   855     1,557   12,095  284,716
Weak    115,200     53,189    59,542      188,601  250,603  5,885   11,823  49,624  734,467
TOTAL   218,632     92,301    105,305     390,045  414,061  9,523   20,619  90,032  1,340,518
(Todd Harris & Jason Stajich)
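The row and column totals in this table are internally consistent. The sketch below recomputes them and adds one derived figure the slide does not state: the share of each alignment class that falls in coding sequence. The table gives no units, so the values are treated as plain numbers.

```python
REGIONS = ["Intergenic", "Upstream", "Downstream", "CDS", "Intron",
           "5' UTR", "3' UTR", "Repeat"]
TABLE = {
    "Strong": [61_615, 27_512, 30_192, 49_358, 114_323, 2_783, 7_239, 28_313],
    "Coding": [41_817, 11_600, 15_571, 152_086, 49_135, 855, 1_557, 12_095],
    "Weak": [115_200, 53_189, 59_542, 188_601, 250_603, 5_885, 11_823, 49_624],
}

row_totals = {kind: sum(vals) for kind, vals in TABLE.items()}
col_totals = [sum(col) for col in zip(*TABLE.values())]
grand_total = sum(row_totals.values())

# Percentage of each alignment class that lies in coding sequence (CDS).
cds = REGIONS.index("CDS")
cds_share = {kind: round(100 * vals[cds] / row_totals[kind], 1)
             for kind, vals in TABLE.items()}

print(row_totals, grand_total)
print(cds_share)  # → {'Strong': 15.4, 'Coding': 53.4, 'Weak': 25.7}
```

Only the "Coding" class has a majority of its alignments in CDS.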
Synteny Reconstruction
[Figure: raw aligned segments (WABA) → merge overlaps → merge adjacent → merged segments → reconstruct interrupted segments → reconstructed segments]
(Yours truly)
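The three steps of the reconstruction can be sketched as interval operations on segments ordered along a C. elegans chromosome. This is a simplified sketch under assumed rules: the `Segment` type, the gap thresholds and the definition of an "interruption" (a short segment aligning to a different briggsae contig between two segments aligning to the same one) are illustrative choices, not the parameters actually used. Real WABA output also carries strand and briggsae coordinates, which are ignored here.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    start: int   # start on the C. elegans chromosome (bp, inclusive)
    end: int     # end on the C. elegans chromosome (bp, inclusive)
    target: str  # C. briggsae contig this segment aligns to


def merge(segments, max_gap):
    """Merge segments on the same target whose gap is <= max_gap bp.

    max_gap=0 joins overlapping or directly abutting segments;
    a larger max_gap also joins nearby ("adjacent") ones.
    """
    merged = []
    for seg in sorted(segments):
        last = merged[-1] if merged else None
        if last and last.target == seg.target and seg.start - last.end - 1 <= max_gap:
            merged[-1] = Segment(last.start, max(last.end, seg.end), last.target)
        else:
            merged.append(seg)
    return merged


def bridge_interruptions(segments, max_insert):
    """Join A and C when both align to the same target and are separated
    only by a short segment B that aligns elsewhere (B is absorbed)."""
    out = list(segments)
    i = 1
    while i < len(out) - 1:
        a, b, c = out[i - 1], out[i], out[i + 1]
        if a.target == c.target != b.target and b.end - b.start + 1 <= max_insert:
            out[i - 1:i + 2] = [Segment(a.start, c.end, a.target)]
        else:
            i += 1
    return out


raw = [
    Segment(1, 500, "cb1"), Segment(400, 900, "cb1"),  # overlapping pair
    Segment(950, 2000, "cb1"),                         # 49 bp gap
    Segment(2001, 2300, "cb7"),                        # short interruption
    Segment(2301, 5000, "cb1"),
    Segment(5200, 8000, "cb2"),
]
step1 = merge(raw, max_gap=0)        # merge overlaps
step2 = merge(step1, max_gap=100)    # merge adjacent
result = bridge_interruptions(step2, max_insert=1_000)
print(result)
```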
Reconstructing briggsae
• 4,837 reconstructed segments
• ~85% of genome
• 0.5-0.7 breakpoints/Mb/My
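The slide does not show how the 0.5-0.7 figure was derived. One reading that reproduces it divides the segment count by the covered length and by the 80-110 My divergence date from the ortholog slide. The ~100 Mb genome size and the one-breakpoint-per-segment approximation are assumptions, not statements from the talk.

```python
def breakpoint_rate(segments: int, covered_mb: float, divergence_my: float) -> float:
    """Breakpoints per Mb per My, approximating #breakpoints by #segments."""
    return segments / covered_mb / divergence_my


# Assumptions: ~100 Mb genome with 85% covered by reconstructed segments;
# divergence of 80-110 My as quoted for the orthologs.
covered = 0.85 * 100
low = breakpoint_rate(4_837, covered, 110)   # older divergence -> lower rate
high = breakpoint_rate(4_837, covered, 80)   # younger divergence -> higher rate
print(f"{low:.2f}-{high:.2f} bkpts/Mb/My")  # → 0.52-0.71 bkpts/Mb/My
```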
Rearrangement is Local
      I     II    III   IV    V     X
I     335   17    22    31    32    16
II          256   28    21    30    9
III               289   42    28    6
IV                      314   38    14
V                             272   21
X                                   170
Junctions of elegans chromosomes on briggsae contigs
Rearrangement is Local
            left arm   center   right arm
left arm    494        174      123
center                 592      163
right arm                       445
Junctions of elegans chromosome arms on briggsae contigs
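Reading each junction table as an upper triangle whose diagonal counts junctions between segments from the same C. elegans chromosome (or the same arm), the "local" claim can be put into numbers. Both tables add up to the same 1,991 junctions; about 82% stay on one chromosome and about 77% on one arm. The diagonal interpretation is an assumption about how the tables were built.

```python
# Upper triangle of the chromosome junction table (I, II, III, IV, V, X);
# row k starts at column k, so the first entry of each row is the diagonal.
CHROM_JUNCTIONS = [
    [335, 17, 22, 31, 32, 16],
    [256, 28, 21, 30, 9],
    [289, 42, 28, 6],
    [314, 38, 14],
    [272, 21],
    [170],
]
# Same layout for chromosome arms: left arm, center, right arm.
ARM_JUNCTIONS = [
    [494, 174, 123],
    [592, 163],
    [445],
]


def diagonal_fraction(upper):
    """Return (diagonal count, total count, diagonal fraction)."""
    same = sum(row[0] for row in upper)
    total = sum(sum(row) for row in upper)
    return same, total, same / total


print(diagonal_fraction(CHROM_JUNCTIONS))  # same chromosome
print(diagonal_fraction(ARM_JUNCTIONS))    # same arm
```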
Big map
[Figure: genome-wide map with tracks for syntenic blocks, genes & meiotic map, orthologs, orphans, essential genes, repetitive elements, KA/KS, and KS]
Improving elegans: new gene?
Improving elegans: bad exon?
Corrections to C. elegans
Table 11. Updating the C. elegans Gene Set Using C. briggsae Similarity
Gene Set                              WS77    WS103
New genes                             1,275   985
New exons in existing genes           1,763   1,243
Exon extensions in existing genes     1,115   845
Exon deletions in existing genes      2,093   1,600
Exon truncations in existing genes    1,675   1,114
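Summing Table 11 gives the total number of briggsae-supported changes proposed against each release, and the per-category drop from WS77 to WS103. The slide does not say why the counts fall between releases; the sketch only does the arithmetic.

```python
CATEGORIES = ["New genes", "New exons", "Exon extensions",
              "Exon deletions", "Exon truncations"]
WS77 = [1_275, 1_763, 1_115, 2_093, 1_675]
WS103 = [985, 1_243, 845, 1_600, 1_114]

total77, total103 = sum(WS77), sum(WS103)
# Percentage decrease per category, rounded to whole percent.
drop = {c: round(100 * (a - b) / a) for c, a, b in zip(CATEGORIES, WS77, WS103)}

print(total77, total103)  # → 7921 5787
print(drop)
```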
Recent Work: Chemosensory Receptors
PFAM Class   C. elegans   C. briggsae
7tm_4        269          222
7tm_5        322          163
sra          37           18
srb          16           12
sre          55           51
srg          32           30
• Third largest C. elegans protein family.
• Subclass of GPCR 7TM receptors.
Questions
• Are these differences real?
• Mechanism of the differences?
  • Amplification vs gene loss
• Why are some subfamilies unbalanced and not others?
• Phenotypic consequences of the differences?
Sra Olfactory Receptor Family
[Figure: Sra family tree marking putative ortholog pairs and an elegans-exclusive subtree]
Are the Differences Real?
• Intensive search for missing sra family members.
[Figure: similarity searching of the elegans and briggsae genomes with sra genes; new family members found]
(Jack Chen, Postdoc; Shraddha Pai, URP)
Results
• Family size differences are real (still roughly twice as many elegans sra as briggsae sra genes).
• Differences are due to species-specific tandem duplications, not to conversion into pseudogenes.
• But…
Hitting non-sra elegans genes?
18 non-sra C. elegans genes
17 non-sra C. briggsae genes
A New Nematode Chemosensory (Sub)family?
[Figure: sra family and sra-like genes]
7TM Domain Structure
[Figure: 7TM domain structure of C36C5.7]
• Most candidates showed 7 transmembrane domain signatures characteristic of GPCR membrane receptors.
…Usually
Gene Name (C. elegans)  Number of TMs
C36C5.6                 4
C36C5.7                 7
C36C5.8                 7
C36C5.10                7
C36C5.11                7
C36C5.2                 7
C36C5.1                 7
T20D4.18                6
T20D4.2                 4
T20D4.1                 2
C04F5.4                 7
C04F5.5                 7
C04F5.6                 6
C33G8.5                 6
C47A10.6                7
T21H8.2                 6
T21H8.3                 6
T21H8.4                 7

Gene Name (C. briggsae) Number of TMs
CBG07353                6
CBG07355                7
CBG19062                6
CBG19390                5
CBG19391                7
CBG13454                5
CBG13479                5
CBG06298                5
CBG05677                7
CBG18741                4
CBG08673                6
CBG08675                6
CBG08677                5
CBG21805                7
CBG07352                7
CBG21852                7
Repairing Incomplete Genes
[Figure: gene model with a missed exon]
Before & After
[Figure: before repairing, 6 TMs; after repairing, 7 TMs]
After Repair
Gene Name (C. elegans)  Number of TMs
C36C5.6                 7
C36C5.7                 7
C36C5.8                 7
C36C5.10                7
C36C5.11                7
C36C5.2                 7
C36C5.1                 7
T20D4.18                7
T20D4.2                 7
T20D4.1                 7
C04F5.4                 7
C04F5.5                 7
C04F5.6                 7
C33G8.5                 6
C47A10.6                7
T21H8.2                 6
T21H8.3                 6
T21H8.4                 7

Gene Name (C. briggsae) Number of TMs
CBG07353                7
CBG07355                7
CBG19062                6
CBG19390                5 ψ
CBG19391                7
CBG13454                5 ψ
CBG13479                5
CBG06298                5
CBG05677                7
CBG18741                4 ψ
CBG08673                6
CBG08675                6
CBG08677                6 ψ
CBG21805                7
CBG07352                7
CBG21852                7
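Tallying the C. elegans TM counts before and after repair makes the effect concrete: genes with a full 7-TM signature go from 10 of 18 to 15 of 18, with five gene models changed. The dictionaries below simply transcribe the C. elegans columns of the two tables; the briggsae lists could be handled the same way.

```python
from collections import Counter

# C. elegans TM counts before gene repair, transcribed from the first table.
BEFORE = {"C36C5.6": 4, "C36C5.7": 7, "C36C5.8": 7, "C36C5.10": 7,
          "C36C5.11": 7, "C36C5.2": 7, "C36C5.1": 7, "T20D4.18": 6,
          "T20D4.2": 4, "T20D4.1": 2, "C04F5.4": 7, "C04F5.5": 7,
          "C04F5.6": 6, "C33G8.5": 6, "C47A10.6": 7, "T21H8.2": 6,
          "T21H8.3": 6, "T21H8.4": 7}
# After repair, these five models gain the missing TMs (second table).
AFTER = {**BEFORE, **{g: 7 for g in ("C36C5.6", "T20D4.18", "T20D4.2",
                                     "T20D4.1", "C04F5.6")}}


def tm_histogram(tms):
    """Count genes by number of predicted transmembrane segments."""
    return dict(sorted(Counter(tms.values()).items()))


changed = sorted(g for g in BEFORE if BEFORE[g] != AFTER[g])
print(tm_histogram(BEFORE))  # → {2: 1, 4: 2, 6: 5, 7: 10}
print(tm_histogram(AFTER))   # → {6: 3, 7: 15}
print(changed)
```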
Expression Patterns Co-Cluster with sra Family Genes
[Figure: expression clustering of sra genes and sra-like genes]
Kim et al., Science 293:2087-2092 (2001)
Anatomic Expression Pattern
Promoter/GFP Fusion Analysis (Collaboration w/ David Baillie)
Gene Head Tail
T20D4.1 -
T21H8.4 -
C33G8.5
C36C5.6
C47A10.6
C05F5.4 -
T20D4.18 - -
Phasmid Neuron Expression
[Figure: T21H8.4-GFP expression in PHA/PHB phasmid neurons; anus labeled]
Amphid Neuron Expression
[Figure: C05F5.4 expression in an ASx amphid neuron; axon, dendrite, and cell body labeled]
Conclusion
• Likely new olfactory receptor subfamily
• Closely related to sra subfamily
• ~50% more members in elegans than briggsae
• Specific expression in both amphid & phasmid sensory neurons
Next Steps
• Continue refining families in the 2 species
• More novel candidate (sub)families, one with ~100 members
• Characterize expression patterns
Longer Term
• Deconvolute odorant combinatorial code.
[Figure: odorant → chemosensory receptors → sensory neurons → interneurons → chemotaxis or aversion]
Resources to Apply
• Neuronal wiring chart
• Receptor promoter::GFP fusions
• Calcium-flux sensitive GFP constructs (“Cameleons”)
• Phenotypic assays (elegans & briggsae)
• Transgenesis/rescue
And Coming Soon
Who Dunnit
Zhirong Bao, Alan Coulson, Shraddha Pai, Thomas Blumenthal, Richard Durbin, Bob Plumb, Michael Brent, Sam Griffith-Jones, Jane Rogers, Jack Chen, Todd Harris, Mark Sohrmann, Laura Clarke, LaDeana Hillier, Jason Stajich, Chris Clee, Patricia Kuwabara, Robert Waterston, Avril Coghlan, James Mullikin, David Willey
CSHL, WUGSC, Trinity College Dublin, Wellcome Trust/Sanger Institute, NIH, Duke University, UW
Funding from: NIH & Wellcome Trust
David Baillie, University of British Columbia