lectures for 4y03 (a) efficiency in bacterial cells (b)codon usage bias (c) mitochondrial genomes
DESCRIPTION
Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias (c) Mitochondrial Genomes. Paul Higgs Dept. of Physics, McMaster University, Hamilton, Ontario. Efficiency in the Genome Small organisms care about DNA replication time. No wasted space - PowerPoint PPT PresentationTRANSCRIPT
Lectures for 4Y03
(a) Efficiency in Bacterial Cells(b)Codon Usage Bias
(c) Mitochondrial Genomes
Paul Higgs
Dept. of Physics, McMaster University, Hamilton, Ontario.
Hou and Lin – PLoS ONE 2009
Efficiency in the Genome
Small organisms care about DNA replication time.No wasted space
1 gene per 1000 bases in prokaryotes
Haemophilus influenzae1762 genes in 1.8 Mb
Human23000 genes in 3080 Mb
Eukaryotic genomes have lots of transposons and repetitive sequences.
Sharp et al (2005)
Most genes in bacteria are single copy genes,but rRNA and tRNA genes are often duplicated.
rRNA
Duplicate rRNA genes means faster synthesis of rRNA molecules
Ribosomal protein genes are not duplicated.
Many proteins can be synthesized from each mRNA
mRNA
protein
Correlation between time of colony appearance and rRNA operon copy number.
Klappenbach J A et al. Appl. Environ. Microbiol. 2000;66:1328-1333
Rocha (2004) Genome Research
More tRNA genes in rapidly growing bacteria
How many tRNA gene copies are present in genomes?
Organism # gene copies
# different anticodons
minimum doubling time (hrs)
animal mitochondria 22 22 -Buchneria aphidicola 31 29 36
Agrobacterium tumefaciens 53 39 3
Corynebacterium glutamicum 60 42 1.2
Escherichia coli 86 39 0.35Vibrio vulnificus 112 32 0.16Saccharomyces cerevisiae 273 41 -
Caenorhabditis elegans 605 47 -
Human 448 49 -
Evidence for Translational selection #1 More gene copies More tRNA molecules More rapidly multiplying bacteria
Asp TotalAnticodon GGCUGCCGCGUCUUGCUGGGGUGGCGGGGAUGACGA
Epsilon Proteobacteria
Campylobacter jejuni 1 3 0 2 1 0 0 1 0 1 1 0 42Helicobacter pylori J99 1 1 0 1 1 0 1 1 0 1 1 0 36Gamma Proteobacteria
Pseudomonas aeruginosa 2 4 0 4 1 0 1 1 1 1 1 1 63Vibrio parahaemolyticus 1 4 0 6 6 0 0 3 0 1 4 0 126Haemophilus influenzae 1 3 0 3 2 0 0 2 0 1 2 0 56Buchnera aphidicola 1 1 0 1 1 0 0 1 0 1 1 0 32#Blochmannia floridanus 0 1 0 1 1 0 0 1 0 0 1 1 36#Wigglesworthia glossinidia 1 1 0 1 1 0 0 1 0 1 1 0 34#Escherichia coli K12 2 3 0 3 2 2 1 1 1 2 1 1 86Alpha Proteobacteria
Agrobacterium tumefaciens 1 4 0 2 1 1 1 1 1 1 1 1 53Sinorhizobium meliloti 1 3 1 2 1 1 1 1 1 1 1 1 51Rickettsia prowazekii 1 1 0 1 1 0 0 1 0 1 1 0 33#Wolbachia (D. mel) 0 1 0 1 1 0 0 1 0 1 1 0 34#Caulobacter crescentus 1 2 1 2 1 0 1 1 1 1 1 1 51Mitochondria
Reclinomonas americana 0 1 0 1 1 0 0 1 0 0 1 0 26Homo sapiens 0 1 0 1 1 0 0 1 0 0 1 0 22
Pro Ser-UCNAla Gln
Changes in tRNA content of genomes from bacteria to mitochondria
Only one type of tRNA remains for each codon family in human mitochondria. Still need 2 tRNAs for Leu and Ser. Therefore 22 in total.
# denotes intracellular parasite or endosymbiont. Small size genomes in bacteria also have reduced numbers of tRNAs.
More tRNA gene copies for amino acids that are more frequent in protein sequences
Duret (2000)
GAU
GAC
GAA
GAG
AAU
AAC
AAA
AAG
CAU
CAC
CAA
CAG
UAU
UAC
UAA
UAG
Asp
Asp
Glu
Glu
Asn
Asn
Lys
Lys
His
His
Gln
Gln
Tyr
Tyr
*
*
GCU
GCC
GCA
GCG
ACU
ACC
ACA
ACG
CCU
CCC
CCA
CCG
UCU
UCC
UCA
UCG
Ala
Ala
Ala
Ala
Thr
Thr
Thr
Thr
Ser
Ser
Ser
Ser
GUU
GUC
GUA
GUG
AUU
AUC
AUA
AUG
CUU
CUC
CUA
CUG
UUU
UUC
UUA
UUG
Val
Val
Val
Val
Ile
Ile
Ile
Met
Leu
Leu
Leu
Leu
Phe
Phe
Leu
Leu
GGU
GGC
GGA
GGG
Gly
Gly
Gly
Gly
AGU
AGC
AGA
AGG
Ser
Ser
Arg
Arg
CGU
CGC
CGA
CGG
Arg
Arg
Arg
Arg
UGU
UGC
UGA
UGG
Cys
Cys
*
Trp
Standard Genetic Code
Pro
Pro
Pro
Pro
tRNA structure
A C G 3 2 1
Codon-anticodon pairing
5’3’
Wobble position
How many different tRNAs do we need ?
Two codon U+C families
codon anticodonAsp GAUAsp GAC GUC
Two codon A+G families
codon anticodonGlu GAA UUCGlu GAG CUC
Always a wobble-G tRNA only
Always a wobble-U tRNA Sometimes a wobble-C tRNA
GC
U GUb
GCb
Selection-Mutation-Drift Theory
Li (1987), Shields (1990), Bulmer (1991)
CUu
u(1-)
1 1+s
1)exp(
)exp()(
S
SS
nn
n
UC
C
In absence of selection the relative frequency of C is
CU
C
nn
n
In presence of selection the relative frequency of C is
where S = 2Nes.
= when S = 0
1 when S is large
S NGMycoplasma penetrans Asn 0.236 0.403 0.78 1
Asp 0.140 0.201 0.43 1Agrobacterium tumefaciens Asn 0.561 0.841 1.42 1
Asp 0.491 0.764 1.21 2Sinorhizobium meliloti Asn 0.649 0.817 0.88 1
Asp 0.642 0.732 0.42 2Corynebacterium glutamicum Asn 0.666 0.952 2.29 2
Asp 0.444 0.722 1.18 2Escherichia coli Asn 0.550 0.875 1.74 4
Asp 0.372 0.657 1.17 3Bacillus subtilis Asn 0.435 0.775 1.50 4
Asp 0.360 0.470 0.45 4Schizosaccharomyces pombe Asn 0.343 0.718 1.58 6
Asp 0.292 0.438 0.64 8Saccharomyces cerevisiae Asn 0.410 0.860 2.18 10
Asp 0.350 0.578 0.93 16Caenorhabditis elegans Asn 0.378 0.575 0.80 20
Asp 0.324 0.559 0.97 27
low
ClowU
lowC
nn
n
lowC
highU
lowU
highC
nn
nn
S
SS ln
1
)(1
)(ln
Estimate S from sequence data – U+C familiesAssume S is negligible in low expression genes, but significant in high expression genes.
S is positive in all these examples.C codon is always preferred.
GC
U GUb
GCb
1)exp(
)exp()(
S
SS
nn
nhighU
highC
highC
S NU:NCMycoplasma penetrans Gln 0.063 0.017 -1.33 1:0
Glu 0.094 0.033 -1.11 1:0Deinococcus radiodurans Gln 0.828 0.900 0.63 1:1
Glu 0.561 0.376 -0.75 1:0Agrobacterium tumefaciens Gln 0.831 0.982 2.40 1:1
Glu 0.427 0.273 -0.69 2:0Clostridium perfringens Gln 0.138 0.029 -1.67 2:0
Glu 0.230 0.189 -0.25 3:0Lactobacillus plantarum Gln 0.360 0.046 -2.45 2:1
Glu 0.245 0.030 -2.35 2:1Sinorhizobium meliloti Gln 0.818 0.973 2.09 1:1
Glu 0.579 0.384 -0.79 3:1Corynebacterium glutamicum Gln 0.615 0.974 3.16 1:2
Glu 0.437 0.689 1.04 1:3Escherichia coli Gln 0.653 0.813 0.84 2:2
Glu 0.311 0.244 -0.34 4:0Bacillus subtilis Gln 0.488 0.200 -1.34 4:0
Glu 0.320 0.227 -0.47 6:0Schizosaccharomyces pombe Gln 0.285 0.156 -0.77 4:2
Glu 0.322 0.565 1.01 4:6Saccharomyces cerevisiae Gln 0.307 0.009 -3.89 9:1
Glu 0.300 0.028 -2.70 14:2Caenorhabditis elegans Gln 0.343 0.307 -0.16 20:7
Glu 0.375 0.436 0.25 17:24
Estimate S from sequence data – A+G families
lowG
highA
lowA
highG
nn
nnS ln
Positive S means G is preferred.Negative S means A is preferred.
This depends on which tRNA genes are present.
Selection on codon usage is strongest in the organisms where rRNAs and tRNAs are highly duplicated. These are the fastest growing ones. Sharp et al (2005).
growth ratecodon bias between high and low expression genes
tRNA gene copy number
Selection on codon usage is strongest in the organisms where tRNAs are highly duplicated. These are the fastest growing ones. Ran and Higgs (2012).
log10(protein production rate)
Codon frequencies in yeast vary smoothly as a function of protein production rate.
Shah and Gilchrist (PNAS 2011)
Energetic efficiency of amino acid synthesis
Bacteria use cheaper amino acids in highly expressed genes
“Expression Level”
Akashi and Gojobori (2002) PNAS
Mitochondria are organelles inside eukaryotic cells.
They are the site of oxidative phosphorylation and ATP synthesis.
They contain their own genome distinct from the DNA in the nucleus.
Typical animal mitochondrial genomes are short and circular (~16,000 bases).
They usually contain:
2 rRNAs
22 tRNAs
13 proteins
Small Scale Evolution –
Variation in Frequencies of Bases and Amino Acids
G G C A A A A T
C C G T T T T A
The two strands of DNA are complementary.
Freq of A on one strand = Freq of T on the other
Freq of C on one strand = Freq of G on the other
If the two strands are subject to the same mutational processes then the freq of any base should be equal (statistically) on both strands.
This means that A = T and C = G on any one strand.
In this case base frequencies can be described by a single variable: G+C content.
BUT – mitochondrial genomes have an asymmetrical replication process. The two strands are not equivalent.
The frequencies of bases on the two strands are not equal.
On any one strand the frequencies of the four bases may vary independently.
Mitochondrial genome replication
Figure from
Faith & Pollock (2003) Genetics
Rank genes in order of increasing time spent single strandedCOI < COII < ATP8 < ATP6 < COIII < ND3 < ND4L < ND4 < ND1 < ND5 <ND2 < Cytb
ND6 is on the other strand
The Genetic Code maps the 64 DNA codons to the 20 amino acids.
(This version applies to Vertebrate Mitochondria) SECOND POSITION
T C A G THIRDPOSITION
FIRST
POSITION
T TTT F 1 TTC F
TCT STCC S 6TCA STCG S
TAT Y 10TAC Y
TGT C 17TGC C
T C A G
TTA L 2TTG L
TAA StopTAG Stop
TGA W 18TGG W
C CTT LCTC LCTA LCTG L
CCT PCCC P 7CCA PCCG P
CAT H 11CAC H
CGT RCGC R 19CGA RCGG R
T C A G
CAA Q 12CAG Q
A ATT I 3ATC I
ACT TACC T 8ACA TACG T
AAT N 13AAC N
AGT S 20AGC S
T C A G
ATA M 4ATG M
AAA K 14AAG K
AGA StopAGG Stop
G GTT VGTC V 5GTA VGTG V
GCT AGCC A 9GCA AGCG A
GAT D 15GAC D
GGT GGGC G 21GGA GGGG G
T C A G
GAA E 16GAG E
4-codon families where the third position is synonymous
Base frequencies at FFD sites in each gene (averaged over mammals)
Deamination: C to U and A to G on the heavy strand
Base frequencies at FFD sites are controlled by mutation.
Base frequencies at 1st and 2nd positions are influenced by mutation and selection
Model fitting (Data from Fish) – assume a fraction of fixed sites and a fraction of neutral sites.
Selection at 1st position is weaker than at 2nd
Mutation pressure is sufficient to cause change in amino acid frequencies.
Second Position
T C A G Third Pos.
First
Position
T F 1 F
SS 6SS
Y 10Y
C 17C
T C A G
LLL 2LLL
StopStop
W 18W
C PP 7PP
H 11H
RR 19RR
T C A G
Q 12Q
A I 3I
TT 8TT
N 13N
S 20S
T C A G
M 4M
K 14K
StopStop
G VV 5VV
AA 9AA
D 15D
GG 21GG
T C A G
E 16E
GAU
GAC
GAA
GAG
AAU
AAC
AAA
AAG
CAU
CAC
CAA
CAG
UAU
UAC
UAA
UAG
12
51
63
15
29
131
84
9
18
79
82
8
35
89
4
3
D
D
E
E
N
N
K
K
H
H
Q
Q
Y
Y
*
*
39
123
79
5
50
155
132
10
37
119
52
7
29
99
81
7
GCU
GCC
GCA
GCG
ACU
ACC
ACA
ACG
CCU
CCC
CCA
CCG
UCU
UCC
UCA
UCG
22
45
61
8
112
196
165
32
65
167
276
42
69
139
65
11
A
A
A
A
T
T
T
T
P
PPP
S
S
S
S
GUU
GUC
GUA
GUG
AUU
AUC
AUA
AUG
CUU
CUC
CUA
CUG
UUU
UUC
UUA
UUG
V
V
V
V
I
I
M
M
L
L
L
L
F
F
L
L
16
87
61
19
GGU
GGC
GGA
GGG
G
G
G
G
11
37
1
0
AGU
AGC
AGA
AGG
S
S
*
*
6
26
28
0
CGU
CGC
CGA
CGG
R
R
R
R
5
17
90
9
UGU
UGC
UGA
UGG
C
C
W
W
Homo sapiens Strand = + 3624 codons
1.654GG0.552CG1.433UG
1.027GA0.906CA1.136UA
1.005GC1.163CC0.743UC
0.763GU1.101CU0.939UU
Mammals - 23
1.891GG0.554CG1.274UG
1.145GA0.938CA1.030UA
0.878GC1.205CC0.756UC
0.605GU0.939CU1.250UU
Fish - 23
1.369GG1.293AG0.546CG0.856UG
0.776GA0.974AA0.945CA1.206UA
0.873GC0.797AC1.363CC0.994UC
1.115GU0.996AU1.082CU0.855UU
Mammals - 31
1.499GG1.228AG0.609CG1.049UG
0.758GA1.135AA0.849CA1.096UA
0.839GC0.739AC1.371CC0.918UC
0.911GU0.907AU1.162CU0.933UU
Fish - 31
Frequency ratios
))q(Yq(X
)Yp(X=)Yr(X
32
3232
Codon bias seems to be a dinucleotide mutational effect in mitochondria, rather than an effect of translational selection.
CpG effect.... (increased rate of C to U mutations in CG dinucleotides. Expect high UG and CA)
DNA binding proteins....
The OGRe front page: http://ogre.mcmaster.ca
Sequence information for OGRe is taken from GenBank.
LOCUS NC_001922 16646 bp DNA circular VRT 20-SEP-2002DEFINITION Alligator mississippiensis mitochondrion, complete genome.ACCESSION NC_001922VERSION NC_001922.1 GI:5835540KEYWORDS .SOURCE mitochondrion Alligator mississippiensis (American alligator) ORGANISM Alligator mississippiensis Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Archosauria; Crocodylidae; Alligatorinae; Alligator.REFERENCE 1 (bases 1 to 16646) AUTHORS Janke,A. and Arnason,U. TITLE The complete mitochondrial genome of Alligator mississippiensis and the separation between recent archosauria (birds and crocodiles) JOURNAL Mol. Biol. Evol. 14 (12), 1266-1272 (1997) MEDLINE 98066357 PUBMED 9402737FEATURES Location/Qualifiers source 1..16646 /organism="Alligator mississippiensis" /organelle="mitochondrion" /mol_type="genomic DNA" /db_xref="taxon:8496" /tissue_type="liver" /dev_stage="adult" rRNA 1..976 /product="12S ribosomal RNA" tRNA 977..1044 /product="tRNA-Val" /anticodon=(pos:1009..1011,aa:Val) rRNA 1046..2635 /product="16S ribosomal RNA" tRNA 2636..2710 /product="tRNA-Leu" /note="codons recognized: UUR" /anticodon=(pos:2672..2674,aa:Leu) gene 2711..3676 /gene="ND1" CDS 2711..3676 /gene="ND1"
1 caacagactt agtcctggtc ttttcattag ctagtactca acttatacat gcaagcatcc 61 gcgaaccagt gagaacaccc tacaagtctg acagacgaat ggagccggca tcaggcacat 121 caaccgatag cccaaaacgc ctagcccagc cacaccccca agggtctcag cagtgattaa 181 ccttaaacca taagcgaaag cttgatttag ttagagtaga tatagaggcg gtcaactctc 241 gtgccagcaa ccgcggttag acgaaaacct caagttaatt gacaaacggc gtaaattgtg 301 gctagaactc tatctccccc attagtgcag atacggtatc acagtagtga taaacttcat 361 cacaccgcaa acatcaacac aaaactggcc ctaatctcaa agatgtactc gattccacga 421 aagctgagaa acaaactggg attagatacc ccactatgct cagcccttaa cattggtgta 481 gtacacaaca gactaccctc gccagagaat tacgagcccc gcttaaaact caaaggactt 541 gacggcactt taaacccccc tagaggagcc tgtcctataa tcgacagtac acgttacacc 601 cgaccacctt tagcctactc agtctgtata ccgccgtcgc aagcccgtcc catttgaggg 661 aaacaaaacg cgcgcaacag ctcaaccgag ctaacacgtc aggtcaaggt gcagccaaca 721 aggtggaaga gatgggctac attttctcaa catgtagaaa tattcaacgg agagccctat 781 gaaatacagg actgtcaaag ccggatttag cagtaaactg ggaaagaata cctagttgaa 841 gtcggtaacg aagtgcgtac acaccgcccg tcaccctcct cgaacccaac aaaatgccca 901 aacaacaggc acaatgttgg gcaagatggg gaaagtcgta acaaggtaag cgtaccggaa 961 ggtgcacttg gaacatcaaa atgtagctta aatttaaagc attcagttta cacctgaaaa 1021 agtcccacca tcggaccatt ttgaaaccca tatctagccc tacctccttt caacatgctt
An example of a GenBank file
Complete mitochondrial genome of the Alligator
OGRe (= Organellar Genome Retrieval) is a relational database.
available at http://ogre.mcmaster.ca
More than 1000 complete animal mitochondrial genomes.
Efficient means of storage and retrieval of information. Uses PostgreSQL
Schema defines relationships between different types of information.
genome_codecodonstrandusage
codon_usage
group_nameparentalternative_parenthas_children
classification
genome_codeoffstfilename
fileindex
genome_codemedline_code
citations
species_codeclassificationgroup_namelatin_namecommon_name
species
feature_idstartstopstrand
feature_location
feature_idamino_acidanticodoncodon
trna
feature_namedescription
feature_descriptions
feature_idgenome_codetypefeature_namenotesdescriptionalignment_file
feature
genomegenome_codespecies_codegenome_typencbiacdescriptionncbidateaccession_datelastmodifiedlastmodifiedbynotesgenome_lengthgenetic_codea_contentc_contentg_contentt_contentfinalgene_ordergene_order_notrnarna_a_contentrna_c_contentrna_g_contentrna_t_content
Species may be selected individually from an alphabetical list
Or taxa may be selected from a hierarchy. Here the Arthropods have been expanded and the Myriapods and Crustaceans have been selected
On the ogre web site, a visual comparison can be made of any two selected species. Colour is used to indicate conserved blocks of genes.
Alligator and Bird genomes differ by interchange of two tRNA genes (red and yellow)… …and by translocation of the two
genes in the blue block.
Large Scale – Evolution of Gene Order in Whole Genomes
Genome reshuffling mechanisms
Inversions:
A B C D A D
-C -B
A
BC
D
Translocations:
A (B C) D A D B C
Duplications and deletions
A B C B C DA B C D A C B D/ /
Example of an inversion
Example of a translocation
The T and –F genes are duplicated in Cordylus warreni.
If the first T and the second –P were deleted, the relative position of T and –P would change.
Sometimes things go crazy ….
Drosophila and Thrips are both insects
yet there are 30 breakpoints for only 37 genes
i.e. almost nothing in common.
OGRe contains gene orders as strings. This allows searching and comparison.
231 unique gene orders have been found in 858 species.
The standard vertebrate order is shared by 398 species (including humans). There are many other species with unique gene orders.
Some species conserve gene order over 100s of millions of years. Others get scrambled in a few million.
Still to do (new project) :
- estimate relative rates of different rearrangement processes
- predict most likely ancestral gene orders
- use gene order evidence in phylogenetics
0.1
TerebratulinaKatharina
LimulusHeptathela
OrnithoctonusHabronattus
VarroaCarios
Ornithodoros moubataOrnithodoros porcinus
RhipicephalusAmblyomma
HaemaphysalisIxodes holocyclus
Ixodes hexagonusIxodes persulcatus
ScutigeraLithobius
ThyropygusNarceus
SpeleonectesVargula
HutchinsoniellaTigriopus
ArmilliferArgulus
TetraclitaPollicipes
PenaeusCherax
PortunusPanulirus
PagurusArtemia
TriopsDaphnia
TetrodontophoraGomphiocephalus
TricholepidionLocusta
AleurodicusTriatoma
PhilaenusThrips
LepidopsocidHeterodoxus
PyrocoeliaTriboliumCrioceris
ApisMelipona
OstriniaAntheraeaBombyx
AnophelesDrosophilaChrysomya
Images coutesy of University of Nebraska, Dept.of Entomology.
http://entomology.unl.edu/images/
Arthropod phylogenetics
Very difficult due to strong variation in rates of evolution between species.
tRNA tree – branch lengths optimized on fixed consensus topology
Long branch species are problematic if tree is not fixed.
Images coutesy of University of Nebraska, Dept.of Entomology.
http://entomology.unl.edu/images/
protein tree – branch lengths optimized on fixed consensus topology
0.1
TerebratulinaKatharina
LimulusHeptathela
OrnithoctonusHabronattusVarroa
CariosOrnithodoros moubata
Ornithodoros porcinusRhipicephalus
AmblyommaHaemaphysalis
Ixodes holocyclusIxodes hexagonus
Ixodes persulcatusScutigera
LithobiusThyropygus
NarceusSpeleonectes
VargulaHutchinsoniella
TigriopusArmillifer
ArgulusTetraclitaPollicipes
PenaeusCherax
PortunusPanulirus
PagurusArtemia
TriopsDaphnia
TetrodontophoraGomphiocephalus
TricholepidionLocusta
AleurodicusTriatoma
PhilaenusThrips
LepidopsocidHeterodoxus
PyrocoeliaTribolium
CriocerisApis
MeliponaOstrinia
AntheraeaBombyx
AnophelesDrosophilaChrysomya
Same species are on long branches in proteins as in RNAs
Relative rate test for sequence evolution - Templeton
0 1 2
Three aligned sequences with 0 known to be the outgroup. Test whether rates of evolution in branch 1 and branch 2 are equal.
m1 = number of sites where 0 and 2 are the same and 1 is different.m2 = number of sites where 0 and 1 are the same and 2 is different.
)(
)(
21
2212
mm
mmm
Calculate:
Should follow a chi squared distribution with one degree of freedom.
Many pairs of related species found to have different rates in the mitochondrial sequences.
The gene order of the ancestral arthropod is thought to be the same as that of the horseshoe crab Limulus.Image courtesy of Marine Biology Lab, Woods Hole. www.mbl.edu/animals/Limulus
Gene Order sometimes gives evidence of phylogenetic relationships
The same translocation of tRNA-Leu is found in insects and crustaceans but not myriapods and chelicerates. Strong argument for the group Pancrustacea (= insects plus crustaceans)
Moderately rearranged
Completely scrambled
Breakpoints Inversions Dup/Del tRNA ProteinTigriopus japonicus 35 32 0 2.15 1.34Heterodoxus macropus 35 32 0 1.39 1.83Thrips imaginis 32 29 1 1.34 1.32Pollicipes polymerus 22 16 2 0.69 0.59Cherax destructor 20 16 0 0.54 0.57Tetraclita japonica 20 16 0 0.66 0.57Argulus americanus 20 18 0 0.72 1.12Speleonectes tulumensis 19 16 1 0.83 0.93Apis mellifera 19 16 0 0.84 1.50Hutchinsoniella macracantha 18 16 0 0.86 0.87Pagurus longicarpus 18 12 0 0.65 0.45Vargula hilgendorfii 17 15 0 0.79 1.41Lepidopsocid RS-2001 17 16 0 0.60 0.59Habronattus oregonensis 16 14 0 1.48 1.09Ornithoctonus huwena 15 13 0 1.95 1.23Scutigera coleoptrata 15 15 0 0.48 0.44Melipona bicolor 14 8 2 0.93 1.66Varroa destructor 14 12 0 0.83 1.09Armillifer armillatus 13 12 0 0.85 1.73Narceus annularus 9 9 0 0.63 0.58Thyropygus sp. 9 9 0 0.49 0.46Aleurodicus dugesii 8 5 1 1.04 1.54Anopheles gambiae 8 6 0 0.41 0.47Tetrodontophora bielanensis 8 6 0 0.77 0.70Artemia franciscana 7 5 0 0.63 0.64Rhipicephalus sanguineus 7 6 0 0.82 0.96Amblyomma triguttatum 7 6 0 0.88 1.00Haemaphysalis flava 7 6 0 0.82 0.96Locusta migratoria 6 5 0 0.38 0.52Bombyx mori 6 5 0 0.51 0.54Portunus trituberculatus 6 5 0 0.51 0.44Ostrinia furnacalis 6 5 0 0.49 0.48Tribolium castaneum 6 5 0 0.55 0.53Antheraea pernyi 6 5 0 0.50 0.54Chrysomya putoria 4 2 1 0.36 0.42Tricholepidion gertschi 3 2 0 0.44 0.39Daphnia pulex 3 2 0 0.62 0.51Pyrocoelia rufa 3 2 0 0.52 0.77Drosophila melanogaster 3 2 0 0.37 0.42Panulirus japonicus 3 2 0 0.58 0.53Triatoma dimidiata 3 2 0 0.59 0.50Lithobius forficatus 3 3 0 1.13 0.61Philaenus spumarius 3 2 0 0.69 0.58Gomphiocephalus hodgsoni 3 2 0 0.69 0.62Penaeus monodon 3 2 0 0.34 0.32Crioceris duodecimpunctata 3 2 0 0.55 0.58Triops cancriformis 3 2 0 0.42 0.40Limulus polyphemus 0 0 0 0.36 0.40Ixodes persulcatus 0 0 0 0.72 0.82Ixodes holocyclus 0 0 0 0.76 0.83Ixodes hexagonus 0 0 0 0.74 0.90Carios capensis 0 0 0 0.70 0.79Ornithodoros porcinus 0 0 0 0.67 0.86Heptathela hangzhouensis 0 0 0 0.76 0.87Ornithodoros moubata 0 0 0 0.68 0.88
Very High
High
Medium
Low
Species ranked according to breakpoint distance from ancestor.
R =0.99 R =0.59
R =0.53 R =0.69
min mean max min mean maxVery High 1.33 1.62 2.14 1.32 1.50 1.83High 0.48 0.86 1.94 0.44 0.99 1.73Moderate 0.38 0.63 1.04 0.43 0.69 1.54Low 0.34 0.60 1.13 0.32 0.62 0.90
tRNA only High 0.66 1.01 1.94 0.57 1.15 1.73tRNA only Mod/Low 0.34 0.60 1.13 0.32 0.63 1.54
tRNA distance protein distanceBreakpoint category
Highly rearranged genomes have highly divergent sequences.
Rates of sequence evolution and genome rearrangement are correlated.
Both are very non-clocklike.
There are many species where only tRNAs have changed position.
Species with highly reshuffled tRNAs have high rates of sequence evolution in both tRNAs and proteins.
Relative rate of genome rearrangement (Xu et al 2006)
0 1 2
Three gene orders with 0 known to be the outgroup. Test whether rates of rearrangement in branch 1 and branch 2 are equal.
n1 = number of gene couples in 0 and 2 but not in 1 – i.e. New breakpoint in 1n2 = number of gene couples in 0 and 1 but not in 2 – i.e. New breakpoint in 2
)(
)(
21
2212
nn
nnn
Calculate:
Should follow a chi squared distribution with one degree of freedom.
We took pairs where there was a significant difference in rearrangement rates (χn
2 was large) and showed that there was a significant difference in substitution rates too (χm
2 was large).
Good Guys
Credits:
Daniel Jameson/ Bin Tang – Database design and management
Daniel Urbina – Base and Amino Acid Frequencies
Wei Xu – Gene Order Analysis and Arthropod Phylogenies
Bad Guys
Gene order is sometimes a strong phylogenetic markerbut the Bad Guys are problematic in gene order analysis as well as phylogenetics.
Why does the evolutionary rate speed up in these isolated groups of species?Why to tRNA genes move more frequently?What are the relative rates of inversion and translocation?