Genome Biology and Biotechnology
2. The genome structures of invertebrates

Prof. M. Zabeau
Department of Plant Systems Biology
Flanders Interuniversity Institute for Biotechnology (VIB)
University of Gent
International course 2005



Sequenced genomes of invertebrates

¤ Nematodes
  – Caenorhabditis elegans (1998)
  – Caenorhabditis briggsae (2003)
¤ Insects
  – Drosophila melanogaster, fruit fly (2000)
  – Drosophila pseudoobscura, fruit fly (2005)
  – Anopheles gambiae, mosquito (2002)
  – Bombyx mori, silkworm (2004)
¤ Tunicates: ancestral vertebrate genome
  – Ciona intestinalis (2002)

Phylogeny of the invertebrates

[Figure: phylogenetic tree of the invertebrates, with divergence times of 550 MY, ~800 MY and >1000 MY marked]

Genome Sequence of the Nematode C. elegans

¤ Paper presents
  – The first complete genome sequence of a multicellular organism
    • The initial sequence covered 97 Mbp (6 gaps)
    • The complete sequence (June 2003) comprises 100.2 Mbp without gaps

The C. elegans Sequencing Consortium, Science, 282, 2012 (1998)

Protein coding Genes

¤ First large-scale genome sequence annotation
  – The gene structure predictions were based on EST and protein similarities
    • Only 40% of the predicted genes had a confirming EST match
¤ The first annotation predicted 19,099 genes
  – An average density of 1 predicted gene per 5 kb
  – 27% of the genome resides in predicted exons
  – Each gene has an average of five introns
  – WormBase: updated and manually curated gene set
    • Currently contains 18,808 genes

Reprinted from: The C. elegans Sequencing Consortium, Science, 282, 2012 (1998)
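The quoted gene density can be checked against the other figures in this lecture. The following is a minimal sketch in Python, assuming that the 97 Mbp initial sequence from the preceding slide is the right denominator for the 19,099 predicted genes. The function and variable names are illustrative only.

```python
# Figures taken from the slides; pairing them in this way is an assumption.
GENOME_BP = 97_000_000      # initial C. elegans sequence (97 Mbp)
PREDICTED_GENES = 19_099    # genes in the first annotation


def kb_per_gene(genome_bp: int, n_genes: int) -> float:
    """Average genomic span per predicted gene, in kilobases."""
    return genome_bp / n_genes / 1_000


density = kb_per_gene(GENOME_BP, PREDICTED_GENES)
print(f"about 1 gene per {density:.1f} kb")
```

The result is close to the "1 predicted gene per 5 kb" stated on the slide.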

RNA genes and repetitive sequences

¤ RNA genes
  – rRNA genes: occur in long tandem arrays
  – tRNA genes: 659 tRNA genes occur widely dispersed
  – Noncoding RNA genes: in dispersed multigene families
  – Micro RNA genes (miRNA)
    • ~100 identified to date
¤ Repetitive Sequences
  – Dispersed repeat sequences
    • Most of them are associated with C. elegans transposons that are probably no longer active in the genome
  – Local repeat sequences
    • Tandem, inverted, or simple sequence repeats

Reprinted from: The C. elegans Sequencing Consortium, Science, 282, 2012 (1998)

Chromosome Structure and Organization

¤ The genome structure is remarkably uniform
  – Gene density is fairly constant across the chromosomes
  – No localized centromeres
    • The chromosomes are holocentric, in contrast to most other eukaryotes
¤ Differences between the central portion and the arms of the chromosomes
  – The conserved eukaryotic genes are in the central portion
  – Repetitive DNA is more prevalent in the arms
  – Meiotic recombination is much higher on the chromosome arms
  – These differences suggest that DNA in the arms might be evolving more rapidly than in the central regions

Reprinted from: The C. elegans Sequencing Consortium, Science, 282, 2012 (1998)

Distribution of sequence elements on Chromosome I

[Figure: distribution along chromosome I (arm, central part, arm) of predicted genes, EST matches, yeast similarities, inverted repeats, tandem repeats and TTAGGC repeats]

Reprinted from: The C. elegans Sequencing Consortium, Science, 282, 2012 (1998)

Conclusions

¤ The complete sequence of the C. elegans genome has
  – provided a basis for the discovery of all the genes of a multicellular eukaryotic organism
    • First inventory of eukaryotic genes
¤ C. elegans is a very effective model organism for
  – eukaryotic gene analysis: widely used for functional genomics
  – human disease gene research
  – nematode pest control research

Reprinted from: The C. elegans Sequencing Consortium, Science, 282, 2012 (1998)

The Genome Sequence of Caenorhabditis briggsae: A Platform for Comparative Genomics

¤ Paper presents
  – A high-quality draft (> 10-fold coverage) sequence of C. briggsae
  – Comparative genome analysis of C. briggsae and C. elegans
    • The two species diverged ~100 million years ago
    • morphologically indistinguishable
    • same chromosome number (6) and similar genome size (104 and 100 Mb)
  – Comparison of the genomes of related species allows
    • More precise annotation of protein-coding genes
    • Discovery of noncoding genes, regulatory sequences and “unknown” functional elements

Stein et al., PLoS Biol 1: 166-192 (2003)

Collinearity of the C. briggsae and C. elegans Genomes

¤ Alignment of sequences
  – ~80% collinearity
    • inversions and translocations
  – blocks of synteny
    • orthologous genes

Reprinted from: Stein et al., PLoS Biol 1: 166-192 (2003)
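The idea of a synteny block can be illustrated with a small sketch. It assumes that orthologous genes have already been identified and that each gene is represented only by its rank in gene order along the two genomes. The function name, the adjacency rule and the toy data are assumptions made for illustration, not the method used by Stein et al.

```python
from typing import List, Tuple

Pair = Tuple[int, int]  # (gene order index in genome A, index in genome B)


def synteny_blocks(pairs: List[Pair]) -> List[List[Pair]]:
    """Group ortholog pairs into runs that are adjacent in both genomes.

    A run is extended while the next ortholog is the next gene in genome A
    and a direct neighbour in genome B, in a constant direction
    (+1 for collinear order, -1 for an inverted segment).
    """
    blocks: List[List[Pair]] = []
    direction = 0  # 0 means the current block has no direction yet
    for pair in sorted(pairs):
        if blocks:
            prev_a, prev_b = blocks[-1][-1]
            step_a = pair[0] - prev_a
            step_b = pair[1] - prev_b
            if step_a == 1 and step_b in (1, -1) and direction in (0, step_b):
                blocks[-1].append(pair)
                direction = step_b
                continue
        blocks.append([pair])
        direction = 0
    return blocks


# Toy data: genes 0-2 collinear, genes 3-5 inverted, gene 6 translocated.
toy = [(0, 0), (1, 1), (2, 2), (3, 5), (4, 4), (5, 3), (6, 10)]
blocks = synteny_blocks(toy)
print([len(block) for block in blocks])
```

In this toy example the inversion and the translocation each interrupt the run of adjacent orthologs, so the seven genes fall into three blocks.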

Annotation of Protein-Coding Genes

¤ Concordance of gene predictions refines gene models
  – C. elegans gene annotation improvement
    • >6,000 (30%) genes with exon additions, deletions or alterations
    • 1,300 new genes
    • 18,808 protein-coding genes in C. elegans
    • 19,507 protein-coding genes in C. briggsae

[Figure label: Most concordant]

Reprinted from: Stein et al., PLoS Biol 1: 166-192 (2003)

Comparison of Protein-Coding Genes

¤ ~65% are orthologs in C. briggsae / C. elegans
  – gene pairs with a one-to-one correspondence in the two species
    • have a common ancestor
    • have similar gene and coding sequence lengths
    • show ~80% identity at the protein level
¤ ~25% are paralogs in C. briggsae / C. elegans
  – proteins with multiple BLASTP matches in the other species
    • Evolving gene families
¤ ~5% are orphans in C. briggsae / C. elegans
  – proteins that have no BLASTP matches in the other species
    • 807 genes in C. elegans and 1,061 in C. briggsae
    • Novel genes or pseudogenes?

Reprinted from: Stein et al., PLoS Biol 1: 166-192 (2003)
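The three categories above can be expressed as a simple rule on precomputed match lists. The sketch below assumes that BLASTP has already been run in both directions and filtered, and it uses a unique reciprocal match as a stand-in for "one-to-one correspondence". The function name and the gene identifiers are hypothetical, and the real analysis in the paper is more elaborate.

```python
from typing import Dict, List


def classify_genes(hits_ab: Dict[str, List[str]],
                   hits_ba: Dict[str, List[str]]) -> Dict[str, str]:
    """Label each gene of species A as ortholog, paralog or orphan.

    hits_ab maps a gene of species A to its matches in species B;
    hits_ba is the same mapping in the opposite direction.
    """
    labels: Dict[str, str] = {}
    for gene, hits in hits_ab.items():
        if not hits:
            labels[gene] = "orphan"      # no match in the other species
        elif len(hits) == 1 and hits_ba.get(hits[0], []) == [gene]:
            labels[gene] = "ortholog"    # unique match in both directions
        else:
            labels[gene] = "paralog"     # member of a multi-gene family
    return labels


# Hypothetical toy input, not real gene identifiers.
elegans_hits = {"ce1": ["cb1"], "ce2": ["cb2", "cb3"], "ce3": []}
briggsae_hits = {"cb1": ["ce1"], "cb2": ["ce2"], "cb3": ["ce2"]}
labels = classify_genes(elegans_hits, briggsae_hits)
print(labels)
```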

Conservation of Operon Structure

¤ C. elegans is unusual among animals in having operons
  – co-transcribed genes that make a polycistronic pre-mRNA
    • subsequently separated into single-gene mRNAs by trans-splicing
  – ~15% of C. elegans genes are encoded in ~1000 operons
    • each operon contains 2–8 genes
  – 96% of the operons are preserved intact in the C. briggsae genome
¤ C. elegans operons comprise
  – co-regulated genes encoding proteins with related functions
  – specific functional classes of genes
    • Transcription
    • RNA splicing
    • Translation
    • RNA degradation

Reprinted from: Stein et al., PLoS Biol 1: 166-192 (2003)
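As a rough consistency check on these numbers, the average operon size can be estimated from figures quoted in this lecture. The sketch assumes the WormBase count of 18,808 genes from an earlier slide as the total, and treats "~15%" and "~1000" as exact, so the result is only indicative.

```python
# All three inputs are approximate figures quoted on the slides.
TOTAL_GENES = 18_808        # curated C. elegans gene set (WormBase)
FRACTION_IN_OPERONS = 0.15  # "~15%" of genes
N_OPERONS = 1_000           # "~1000" operons

genes_in_operons = TOTAL_GENES * FRACTION_IN_OPERONS
mean_operon_size = genes_in_operons / N_OPERONS

print(f"genes in operons: about {genes_in_operons:.0f}")
print(f"mean genes per operon: about {mean_operon_size:.1f}")
```

The estimated mean lies near the lower end of the stated range of 2–8 genes per operon, which suggests that most operons are short.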

Repetitive sequences

¤ The different genome sizes result from
  – Differences in repeat content
    • 23.3 Mbp of the C. briggsae genome (104 Mbp)
    • 16.5 Mbp of the C. elegans genome (100.3 Mbp)
¤ Repeated DNA families
  – comprise DNA transposons or tandem arrays
  – are not orthologous between the two genomes
    • suggests that most repeat elements in the two genomes postdate the divergence of the two species
  – Accumulation of new repetitive elements is balanced by deletions so that
    • genome sizes remain similar

Reprinted from: Stein et al., PLoS Biol 1: 166-192 (2003)
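The repeat figures on this slide can be turned into fractions to see how much of the size difference they explain. This is a small arithmetic sketch using only the four numbers given above; the variable names are illustrative.

```python
# (repeat content, genome size) in Mbp, as quoted on the slide.
genomes = {
    "C. briggsae": (23.3, 104.0),
    "C. elegans": (16.5, 100.3),
}

repeat_percent = {
    name: round(100 * repeats / size, 1)
    for name, (repeats, size) in genomes.items()
}
for name, percent in repeat_percent.items():
    print(f"{name}: {percent}% repetitive")

repeat_difference = round(23.3 - 16.5, 1)   # Mbp
size_difference = round(104.0 - 100.3, 1)   # Mbp
print(f"repeat difference: {repeat_difference} Mbp")
print(f"genome size difference: {size_difference} Mbp")
```

The difference in repeat content is larger than the difference in genome size, so on these figures the non-repetitive fraction of the C. briggsae genome is slightly smaller than that of C. elegans.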

Chromosome Structure and Organization

¤ The centers contain orthologous (1) and essential genes (2)
  – Very long synteny blocks
¤ The arms contain orphan genes (3) and repetitive elements (4)
  – Short synteny blocks
  – The arms of the chromosomes are evolving more rapidly than the centers

[Figure: chromosome tracks numbered 1 to 4, corresponding to the numbers in the text above]

Reprinted from: Stein et al., PLoS Biol 1: 166-192 (2003)

Conclusions

¤ C. briggsae/C. elegans comparison shows that
  – despite large differences at the genomic level, C. briggsae and C. elegans are morphologically almost indistinguishable
  – Many protein families are very dynamic
    • ~200 families have expanded or contracted by > 2-fold
    • several hundred families are either novel or have diverged extensively
  – the two genomes share only ~50% of the non-coding sequence
¤ Sequencing of additional species is necessary to
  – identify candidate cis-regulatory elements based on sequence conservation
    • the noise level in a two-way comparison is too high

Reprinted from: Stein et al., PLoS Biol 1: 166-192 (2003)

The Genome Sequence of Drosophila melanogaster

¤ Draft sequence (2000)
  – Whole-genome shotgun sequencing
    • Sequence contained 128 physical gaps and 1630 sequence gaps
  – Some regions were of poor sequence quality
  – Demonstrated that whole-genome shotgun sequencing can be used for large eukaryotic genomes
    • Adams et al., Science, 287, 2185 (2000)
¤ Finished sequence (2002)
  – BAC clone sequencing and gap filling
  – Sequence contains 7 physical gaps and 37 sequence gaps
  – Very accurate sequence: error rate of < 1 in 100,000
    • Celniker et al., Genome Biol. 3: research0079.1–0079.14 (2002)

The Drosophila Genome

¤ The (female) Drosophila genome is ~176 Mb in size
  – Euchromatic part: 117 Mb, completely sequenced
  – Heterochromatic part: partly (~20 Mb) sequenced (unassembled)
    • Female: estimated at ~59 Mb
    • Male: the 40 Mb Y chromosome is completely heterochromatic

Reprinted from: Adams et al., Science, 287, 2185 (2000)
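The size figures on this slide add up as follows. The sketch uses only the numbers quoted above for the female genome and treats the approximate values as exact, so the percentages are indicative only.

```python
# Sizes in Mb, as quoted on the slide for the female genome.
EUCHROMATIN = 117                # completely sequenced
HETEROCHROMATIN = 59             # estimated
HETEROCHROMATIN_SEQUENCED = 20   # sequenced but unassembled

total = EUCHROMATIN + HETEROCHROMATIN
heterochromatin_covered = HETEROCHROMATIN_SEQUENCED / HETEROCHROMATIN
genome_covered = (EUCHROMATIN + HETEROCHROMATIN_SEQUENCED) / total

print(f"total genome size: {total} Mb")
print(f"heterochromatin sequenced: {heterochromatin_covered:.0%}")
print(f"whole genome sequenced: {genome_covered:.0%}")
```

About one third of the heterochromatin, and a little over three quarters of the whole female genome, is therefore represented in the sequence.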

Euchromatin and Heterochromatin

¤ Euchromatin
  – Gene-rich portion of the genome
  – Condenses during mitosis and de-condenses thereafter
  – Portion of the genome that can be cloned stably in BACs
¤ Heterochromatin
  – Consists mainly of simple sequence repeats (satellite DNAs), transposable elements, and tandem arrays of rRNA genes
  – Remains condensed after mitosis
  – Gene-poor portion of the genome
  – Contains elements required for centromere function
¤ Euchromatin - heterochromatin transition
  – is gradual at the molecular level

Reprinted from: Adams et al., Science, 287, 2185 (2000)

Euchromatic Genome Sequence

[Figure: euchromatic genome sequence, with transposons and the centromere marked]

Reprinted from: Celniker et al., Genome Biol. 3: research0079.1–0079.14 (2002)

Gene Content of the Drosophila Genome

¤ Annotation of the draft genome sequence
  – Predicted 13,601 genes
    • >10,000 genes (>75%) supported by EST and protein matches
    • This annotation was incomplete
  – Large number of sequence gaps and sequencing errors
¤ Annotation of the finished genome sequence
  – Predicted a similar number of genes: 13,676
    • Majority (85%) of the gene models revised
  – Improved by a collection of 250,000 ESTs and full-length cDNAs
  – Found only 17 pseudogenes (far fewer than in C. elegans)
  – Heterochromatic part may contain ~500 genes
    • The 20 Mb sequenced contains ~300 protein-coding genes
  – Reannotation reveals many complex gene models
    • genes that do not fit the simple 5’UTR – exons – 3’UTR model

Reprinted from: Adams et al., Science, 287, 2185 (2000)

Complex Gene models

¤ Alternative splicing or alternative polyadenylation
  – At least ~20% of genes have >1 predicted transcript
    • 65% encode two or more protein products
    • 35% differ in the UTRs; most have different 5’UTRs: alternative promoters

Reprinted from: Misra et al., Genome Biology, 3: research0083.1-0083.22 (2002)

Complex Gene models

¤ Dicistronic genes: 2 non-overlapping coding regions on one mRNA
  – 31 dicistronic gene pairs were found; this number is an underestimate

Reprinted from: Misra et al., Genome Biology, 3: research0083.1-0083.22 (2002)

Complex Gene models

¤ Overlapping genes
  – overlap of mRNAs on opposite strands: 15% of the genes
¤ Nested genes
  – genes included within introns of other genes: 15% of the genes

Reprinted from: Misra et al., Genome Biology, 3: research0083.1-0083.22 (2002)

Conclusions

¤ The Drosophila genome sequence reveals
  – genes and proteins common to all multicellular organisms
    • proteins involved in transcription control and metabolism are very similar to their human counterparts
¤ Drosophila provides an experimental platform for
  – the study of human disease genes involved in
    • DNA replication and repair
    • Metabolism of drugs and toxins

Reprinted from: Adams et al., Science, 287, 2185 (2000)

Comparative genome sequencing of Drosophila pseudoobscura: Chromosomal, gene, and cis-element evolution

¤ Paper presents
  – High-quality draft genome sequence of a second Drosophila species, Drosophila pseudoobscura
  – Comparison with the genome sequence of D. melanogaster
    • Evolutionary distance is well suited to study
      – Conserved and diverged genes
      – Conserved regulatory elements
      – Mechanisms of genome rearrangement

Richards et al., Genome Res. 15: 1-18 (2005)

The D. pseudoobscura genome

¤ The euchromatic part is estimated at 131 Mb
  – ~17% larger than that of D. melanogaster
  – the additional sequence is
    • primarily found in the intergenic regions
    • only partly caused by expansion of repeated DNA
¤ The two species show a very high gene synteny
  – Synteny blocks were identified
    • on the basis of conservation of protein order
    • ~10,500 of ~14,000 genes are true orthologs
  – All synteny blocks are short and extremely mixed
    • extensive genome rearrangement in the two Drosophila lineages

Reprinted from: Richards et al., Genome Res. 15: 1-18 (2005)

The synteny between D. pseudoobscura and D. melanogaster

¤ The great majority of syntenic blocks are found
  – on the same chromosome arms in the two species
  – Chromosomal rearrangements in the two species
    • Almost exclusively paracentric inversions

Reprinted from: Richards et al., Genome Res. 15: 1-18 (2005)

Intraspecific inversion breakpoints

¤ Repetitive sequences at the inversion breakpoints
  – Frequently comprise a breakpoint motif
  – Only found in D. pseudoobscura

[Figure: inversion breakpoints with the breakpoint motifs marked]

Reprinted from: Richards et al., Genome Res. 15: 1-18 (2005)

Conservation of gene segments

¤ Sequence conservation in noncoding regions
  – Is insufficient for the identification of regulatory sequences
  – Multiple genome sequence alignments will be needed

Reprinted from: Richards et al., Genome Res. 15: 1-18 (2005)

The Genome Sequence of the Malaria Mosquito Anopheles gambiae

¤ The papers present
  – Draft genome sequence of the PEST strain of A. gambiae
  – A comparison of the genomes and proteomes of Anopheles and Drosophila
    • Two very different Diptera that diverged ~250 MY ago

Sequence: Holt et al., Science, 298: 129-149 (2002)

Comparison: Zdobnov et al., Science, 298, 149 (2002)

The Mosquito Genome Sequence

¤ The draft genome spans 278 Mb
  – Covers the entire genome including the heterochromatic DNA
  – Mosquitoes have larger genomes than Drosophila
    • estimates from 250 to 500 Mb
    • Transposable elements constitute ~16% of the genome
  – Drosophila experienced a recent genome size reduction
¤ The predicted number of genes is ~14,000
  – Very similar to Drosophila
¤ The comparison of the Anopheles and Drosophila genomes and proteomes reveals
  – considerable similarities and numerous differences
  – These reflect selection and adaptation to different ecologies and life strategies

Reprinted from: Holt et al., Science, 298: 129-149 (2002)

Similarity at the protein level

¤ Four protein classes were identified
  – True orthologs: ~45% (~6,000)
    • Exhibit 1:1 relationship
    • Genes with conserved function
  – Paralogs: ~12%
    • Duplicated genes
  – Homologs: ~25%
    • Unclear relationship
  – Orphans: 11% to 18%
    • New genes
    • Rapidly evolving genes

Reprinted from: Zdobnov et al., Science, 298, 149 (2002)

The core of conserved proteins

¤ Dynamics of gene structure in a span of 250 MY
  – Exon lengths and intron frequencies are similar
  – Introns in Drosophila have half the length of those in Anopheles
    • systematic reduction of noncoding regions in Drosophila
  – Only 50% of the introns are perfectly conserved
    • one intron gain or loss per gene per 125 MY
  – Intron sequences diverge rapidly
    • sequence similarity in <2% of the equivalent introns

Reprinted from: Zdobnov et al., Science, 298, 149 (2002)

Family expansions and reductions

¤ Increases and decreases in protein families
  – Related to adaptations to life strategies and environment
¤ Expansions or reductions are
  – Uneven: a single gene in one species has many paralogs in the other
  – More frequent in Anopheles
  – Examples:
    • Cuticular proteins
    • Innate immunity genes
      – FBN-like (fibrinogen) proteins massively expanded in Anopheles

Reprinted from: Zdobnov et al., Science, 298, 149 (2002)

Genome Rearrangements

¤ Microsynteny
  – 34% of the orthologs map to ~1000 microsynteny blocks
    • 2-3 genes per block (cf. fish-human)
¤ Macrosynteny
  – Both species have five major chromosomal arms
  – Clear 1:1 homologies between the chromosomal arms
    • Inversions much more frequent than translocations

Reprinted from: Zdobnov et al., Science, 298, 149 (2002)

The Draft Genome of Ciona intestinalis: Insights into chordate and vertebrate origins

¤ Paper presents
  – Draft genome sequence of Ciona intestinalis, an ancestral chordate
  – Chordates appear in the fossil record at the Cambrian explosion
    • ~550 million years ago

[Figure: phylogenetic tree with the tunicates marked, branching at 550 MY]

Dehal et al., Science, 298, 2157-2167 (2002)

Ciona intestinalis

¤ Sessile, hermaphroditic marine invertebrates
¤ Adults are simple filter feeders
  – Encased in a fibrous tunic

[Figure: adult, and juvenile showing the internal structures: ds, digestive system; es, endostyle; ht, heart; os, neuronal complex; pg, pharyngeal gill]

Reprinted from: Dehal et al., Science, 298, 2157-2167 (2002)

Gene content and global comparisons

¤ Predicted ~16,000 gene models
  – 75% of the predicted genes are supported by EST evidence
  – Genes are compact and densely packed: one gene per 7.5 kb
¤ Global comparisons
  – 60% of the genes have a detectable fly or worm homolog
  – 20% of the genes have no clear homolog
    • tunicate-specific genes
  – 17% of the genes have a vertebrate homolog but no detectable fly or worm homolog
    • Many are single-copy genes for the vertebrate gene families
      – signalling and regulatory processes in development
  – The gene content is a reasonable approximation of the ancestral chordate

Reprinted from: Dehal et al., Science, 298, 2157-2167 (2002)

Future Perspectives

¤ Invertebrate genomes are being sequenced at a rapid pace
  – Worms: 10 species of medical and agricultural importance
    • Schistosoma, Ancylostoma, Ascaris, Globodera, Meloidogyne
  – Insects: ~20 species of primarily agricultural importance
    • Mosquitoes, honey bee, Lepidoptera and > 10 Drosophila species
  – Protozoa: several species of medical importance
    • Trypanosoma, Theileria, Plasmodium, Leishmania, …
  – Broad range of species
    • Sponge, sea urchin, Daphnia, Hydra, snail, lamprey, …
¤ Source: GOLD™ Genomes OnLine Database
  – http://www.genomesonline.org/

Recommended reading

¤ The nematode genome sequence
  • The C. elegans Sequencing Consortium, Science, 282, 2012 (1998)
¤ The Drosophila genome sequence
  • Adams et al., Science, 287, 2185 (2000)

Further reading

¤ Nematode genomes
  – C. briggsae:
    • Stein et al., PLoS Biol 1: 166-192 (2003)
¤ Insect genomes
  – Finished Drosophila genome sequence:
    • Celniker et al., Genome Biol. 3: research0079.1–0079.14 (2002)
  – Annotation of the Drosophila genome:
    • Misra et al., Genome Biology, 3: research0083.1-0083.22 (2002)
  – Draft Drosophila pseudoobscura genome sequence:
    • Richards et al., Genome Res. 15: 1-18 (2005)
  – Draft mosquito genome sequence:
    • Holt et al., Science, 298: 129-149 (2002)
    • Zdobnov et al., Science, 298, 149 (2002)
¤ Ciona genome
  • Dehal et al., Science, 298, 2157-2167 (2002)