lectures for 4y03 (a) efficiency in bacterial cells (b)codon usage bias (c) mitochondrial genomes

47
Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias (c) Mitochondrial Genomes Paul Higgs Dept. of Physics, McMaster University, Hamilton, Ontario.

Upload: graham

Post on 02-Feb-2016

34 views

Category:

Documents


0 download

DESCRIPTION

Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias (c) Mitochondrial Genomes. Paul Higgs Dept. of Physics, McMaster University, Hamilton, Ontario. Efficiency in the Genome Small organisms care about DNA replication time. No wasted space - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

Lectures for 4Y03

(a) Efficiency in Bacterial Cells(b)Codon Usage Bias

(c) Mitochondrial Genomes

Paul Higgs

Dept. of Physics, McMaster University, Hamilton, Ontario.

Page 2: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

Hou and Lin – PLoS ONE 2009

Efficiency in the Genome

Small organisms care about DNA replication time.No wasted space

1 gene per 1000 bases in prokaryotes

Haemophilus influenzae1762 genes in 1.8 Mb

Human23000 genes in 3080 Mb

Eukaryotic genomes have lots of transposons and repetitive sequences.

Page 3: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

Sharp et al (2005)

Most genes in bacteria are single copy genes,but rRNA and tRNA genes are often duplicated.

Page 4: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

rRNA

Duplicate rRNA genes means faster synthesis of rRNA molecules

Ribosomal protein genes are not duplicated.

Many proteins can be synthesized from each mRNA

mRNA

protein

Page 5: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

Correlation between time of colony appearance and rRNA operon copy number.

Klappenbach J A et al. Appl. Environ. Microbiol. 2000;66:1328-1333

Rocha (2004) Genome Research

More tRNA genes in rapidly growing bacteria

Page 6: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

How many tRNA gene copies are present in genomes?

Organism # gene copies

# different anticodons

minimum doubling time (hrs)

animal mitochondria 22 22 -Buchneria aphidicola 31 29 36

Agrobacterium tumefaciens 53 39 3

Corynebacterium glutamicum 60 42 1.2

Escherichia coli 86 39 0.35Vibrio vulnificus 112 32 0.16Saccharomyces cerevisiae 273 41 -

Caenorhabditis elegans 605 47 -

Human 448 49 -

Evidence for Translational selection #1 More gene copies More tRNA molecules More rapidly multiplying bacteria

Page 7: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

Asp TotalAnticodon GGCUGCCGCGUCUUGCUGGGGUGGCGGGGAUGACGA

Epsilon Proteobacteria

Campylobacter jejuni 1 3 0 2 1 0 0 1 0 1 1 0 42Helicobacter pylori J99 1 1 0 1 1 0 1 1 0 1 1 0 36Gamma Proteobacteria

Pseudomonas aeruginosa 2 4 0 4 1 0 1 1 1 1 1 1 63Vibrio parahaemolyticus 1 4 0 6 6 0 0 3 0 1 4 0 126Haemophilus influenzae 1 3 0 3 2 0 0 2 0 1 2 0 56Buchnera aphidicola 1 1 0 1 1 0 0 1 0 1 1 0 32#Blochmannia floridanus 0 1 0 1 1 0 0 1 0 0 1 1 36#Wigglesworthia glossinidia 1 1 0 1 1 0 0 1 0 1 1 0 34#Escherichia coli K12 2 3 0 3 2 2 1 1 1 2 1 1 86Alpha Proteobacteria

Agrobacterium tumefaciens 1 4 0 2 1 1 1 1 1 1 1 1 53Sinorhizobium meliloti 1 3 1 2 1 1 1 1 1 1 1 1 51Rickettsia prowazekii 1 1 0 1 1 0 0 1 0 1 1 0 33#Wolbachia (D. mel) 0 1 0 1 1 0 0 1 0 1 1 0 34#Caulobacter crescentus 1 2 1 2 1 0 1 1 1 1 1 1 51Mitochondria

Reclinomonas americana 0 1 0 1 1 0 0 1 0 0 1 0 26Homo sapiens 0 1 0 1 1 0 0 1 0 0 1 0 22

Pro Ser-UCNAla Gln

Changes in tRNA content of genomes from bacteria to mitochondria

Only one type of tRNA remains for each codon family in human mitochondria. Still need 2 tRNAs for Leu and Ser. Therefore 22 in total.

# denotes intracellular parasite or endosymbiont. Small size genomes in bacteria also have reduced numbers of tRNAs.

Page 8: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

More tRNA gene copies for amino acids that are more frequent in protein sequences

Duret (2000)

Page 9: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

GAU

GAC

GAA

GAG

AAU

AAC

AAA

AAG

CAU

CAC

CAA

CAG

UAU

UAC

UAA

UAG

Asp

Asp

Glu

Glu

Asn

Asn

Lys

Lys

His

His

Gln

Gln

Tyr

Tyr

*

*

GCU

GCC

GCA

GCG

ACU

ACC

ACA

ACG

CCU

CCC

CCA

CCG

UCU

UCC

UCA

UCG

Ala

Ala

Ala

Ala

Thr

Thr

Thr

Thr

Ser

Ser

Ser

Ser

GUU

GUC

GUA

GUG

AUU

AUC

AUA

AUG

CUU

CUC

CUA

CUG

UUU

UUC

UUA

UUG

Val

Val

Val

Val

Ile

Ile

Ile

Met

Leu

Leu

Leu

Leu

Phe

Phe

Leu

Leu

GGU

GGC

GGA

GGG

Gly

Gly

Gly

Gly

AGU

AGC

AGA

AGG

Ser

Ser

Arg

Arg

CGU

CGC

CGA

CGG

Arg

Arg

Arg

Arg

UGU

UGC

UGA

UGG

Cys

Cys

*

Trp

Standard Genetic Code

Pro

Pro

Pro

Pro

Page 10: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

tRNA structure

A C G 3 2 1

Codon-anticodon pairing

5’3’

Wobble position

Page 11: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

How many different tRNAs do we need ?

Two codon U+C families

codon anticodonAsp GAUAsp GAC GUC

Two codon A+G families

codon anticodonGlu GAA UUCGlu GAG CUC

Always a wobble-G tRNA only

Always a wobble-U tRNA Sometimes a wobble-C tRNA

GC

U GUb

GCb

Page 12: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

Selection-Mutation-Drift Theory

Li (1987), Shields (1990), Bulmer (1991)

CUu

u(1-)

1 1+s

1)exp(

)exp()(

S

SS

nn

n

UC

C

In absence of selection the relative frequency of C is

CU

C

nn

n

In presence of selection the relative frequency of C is

where S = 2Nes.

= when S = 0

1 when S is large

Page 13: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

S NGMycoplasma penetrans Asn 0.236 0.403 0.78 1

Asp 0.140 0.201 0.43 1Agrobacterium tumefaciens Asn 0.561 0.841 1.42 1

Asp 0.491 0.764 1.21 2Sinorhizobium meliloti Asn 0.649 0.817 0.88 1

Asp 0.642 0.732 0.42 2Corynebacterium glutamicum Asn 0.666 0.952 2.29 2

Asp 0.444 0.722 1.18 2Escherichia coli Asn 0.550 0.875 1.74 4

Asp 0.372 0.657 1.17 3Bacillus subtilis Asn 0.435 0.775 1.50 4

Asp 0.360 0.470 0.45 4Schizosaccharomyces pombe Asn 0.343 0.718 1.58 6

Asp 0.292 0.438 0.64 8Saccharomyces cerevisiae Asn 0.410 0.860 2.18 10

Asp 0.350 0.578 0.93 16Caenorhabditis elegans Asn 0.378 0.575 0.80 20

Asp 0.324 0.559 0.97 27

low

ClowU

lowC

nn

n

lowC

highU

lowU

highC

nn

nn

S

SS ln

1

)(1

)(ln

Estimate S from sequence data – U+C familiesAssume S is negligible in low expression genes, but significant in high expression genes.

S is positive in all these examples.C codon is always preferred.

GC

U GUb

GCb

1)exp(

)exp()(

S

SS

nn

nhighU

highC

highC

Page 14: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

S NU:NCMycoplasma penetrans Gln 0.063 0.017 -1.33 1:0

Glu 0.094 0.033 -1.11 1:0Deinococcus radiodurans Gln 0.828 0.900 0.63 1:1

Glu 0.561 0.376 -0.75 1:0Agrobacterium tumefaciens Gln 0.831 0.982 2.40 1:1

Glu 0.427 0.273 -0.69 2:0Clostridium perfringens Gln 0.138 0.029 -1.67 2:0

Glu 0.230 0.189 -0.25 3:0Lactobacillus plantarum Gln 0.360 0.046 -2.45 2:1

Glu 0.245 0.030 -2.35 2:1Sinorhizobium meliloti Gln 0.818 0.973 2.09 1:1

Glu 0.579 0.384 -0.79 3:1Corynebacterium glutamicum Gln 0.615 0.974 3.16 1:2

Glu 0.437 0.689 1.04 1:3Escherichia coli Gln 0.653 0.813 0.84 2:2

Glu 0.311 0.244 -0.34 4:0Bacillus subtilis Gln 0.488 0.200 -1.34 4:0

Glu 0.320 0.227 -0.47 6:0Schizosaccharomyces pombe Gln 0.285 0.156 -0.77 4:2

Glu 0.322 0.565 1.01 4:6Saccharomyces cerevisiae Gln 0.307 0.009 -3.89 9:1

Glu 0.300 0.028 -2.70 14:2Caenorhabditis elegans Gln 0.343 0.307 -0.16 20:7

Glu 0.375 0.436 0.25 17:24

Estimate S from sequence data – A+G families

lowG

highA

lowA

highG

nn

nnS ln

Positive S means G is preferred.Negative S means A is preferred.

This depends on which tRNA genes are present.

Page 15: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

Selection on codon usage is strongest in the organisms where rRNAs and tRNAs are highly duplicated. These are the fastest growing ones. Sharp et al (2005).

Page 16: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

growth ratecodon bias between high and low expression genes

tRNA gene copy number

Selection on codon usage is strongest in the organisms where tRNAs are highly duplicated. These are the fastest growing ones. Ran and Higgs (2012).

Page 17: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

log10(protein production rate)

Codon frequencies in yeast vary smoothly as a function of protein production rate.

Shah and Gilchrist (PNAS 2011)

Page 18: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

Energetic efficiency of amino acid synthesis

Bacteria use cheaper amino acids in highly expressed genes

“Expression Level”

Akashi and Gojobori (2002) PNAS

Page 19: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

Mitochondria are organelles inside eukaryotic cells.

They are the site of oxidative phosphorylation and ATP synthesis.

They contain their own genome distinct from the DNA in the nucleus.

Typical animal mitochondrial genomes are short and circular (~16,000 bases).

They usually contain:

2 rRNAs

22 tRNAs

13 proteins

Page 20: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

Small Scale Evolution –

Variation in Frequencies of Bases and Amino Acids

G G C A A A A T

C C G T T T T A

The two strands of DNA are complementary.

Freq of A on one strand = Freq of T on the other

Freq of C on one strand = Freq of G on the other

If the two strands are subject to the same mutational processes then the freq of any base should be equal (statistically) on both strands.

This means that A = T and C = G on any one strand.

In this case base frequencies can be described by a single variable: G+C content.

BUT – mitochondrial genomes have an asymmetrical replication process. The two strands are not equivalent.

The frequencies of bases on the two strands are not equal.

On any one strand the frequencies of the four bases may vary independently.

Page 21: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

Mitochondrial genome replication

Figure from

Faith & Pollock (2003) Genetics

Rank genes in order of increasing time spent single strandedCOI < COII < ATP8 < ATP6 < COIII < ND3 < ND4L < ND4 < ND1 < ND5 <ND2 < Cytb

ND6 is on the other strand

Page 22: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

The Genetic Code maps the 64 DNA codons to the 20 amino acids.

(This version applies to Vertebrate Mitochondria) SECOND POSITION

T C A G THIRDPOSITION

FIRST

POSITION

T TTT F 1 TTC F

TCT STCC S 6TCA STCG S

TAT Y 10TAC Y

TGT C 17TGC C

T C A G

TTA L 2TTG L

TAA StopTAG Stop

TGA W 18TGG W

C CTT LCTC LCTA LCTG L

CCT PCCC P 7CCA PCCG P

CAT H 11CAC H

CGT RCGC R 19CGA RCGG R

T C A G

CAA Q 12CAG Q

A ATT I 3ATC I

ACT TACC T 8ACA TACG T

AAT N 13AAC N

AGT S 20AGC S

T C A G

ATA M 4ATG M

AAA K 14AAG K

AGA StopAGG Stop

G GTT VGTC V 5GTA VGTG V

GCT AGCC A 9GCA AGCG A

GAT D 15GAC D

GGT GGGC G 21GGA GGGG G

T C A G

GAA E 16GAG E

4-codon families where the third position is synonymous

Page 23: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

Base frequencies at FFD sites in each gene (averaged over mammals)

Deamination: C to U and A to G on the heavy strand

Page 24: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

Base frequencies at FFD sites are controlled by mutation.

Base frequencies at 1st and 2nd positions are influenced by mutation and selection

Model fitting (Data from Fish) – assume a fraction of fixed sites and a fraction of neutral sites.

Selection at 1st position is weaker than at 2nd

Page 25: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

Mutation pressure is sufficient to cause change in amino acid frequencies.

Second Position

T C A G Third Pos.

First

Position

T F 1 F

SS 6SS

Y 10Y

C 17C

T C A G

LLL 2LLL

StopStop

W 18W

C PP 7PP

H 11H

RR 19RR

T C A G

Q 12Q

A I 3I

TT 8TT

N 13N

S 20S

T C A G

M 4M

K 14K

StopStop

G VV 5VV

AA 9AA

D 15D

GG 21GG

T C A G

E 16E

Page 26: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

GAU

GAC

GAA

GAG

AAU

AAC

AAA

AAG

CAU

CAC

CAA

CAG

UAU

UAC

UAA

UAG

12

51

63

15

29

131

84

9

18

79

82

8

35

89

4

3

D

D

E

E

N

N

K

K

H

H

Q

Q

Y

Y

*

*

39

123

79

5

50

155

132

10

37

119

52

7

29

99

81

7

GCU

GCC

GCA

GCG

ACU

ACC

ACA

ACG

CCU

CCC

CCA

CCG

UCU

UCC

UCA

UCG

22

45

61

8

112

196

165

32

65

167

276

42

69

139

65

11

A

A

A

A

T

T

T

T

P

PPP

S

S

S

S

GUU

GUC

GUA

GUG

AUU

AUC

AUA

AUG

CUU

CUC

CUA

CUG

UUU

UUC

UUA

UUG

V

V

V

V

I

I

M

M

L

L

L

L

F

F

L

L

16

87

61

19

GGU

GGC

GGA

GGG

G

G

G

G

11

37

1

0

AGU

AGC

AGA

AGG

S

S

*

*

6

26

28

0

CGU

CGC

CGA

CGG

R

R

R

R

5

17

90

9

UGU

UGC

UGA

UGG

C

C

W

W

Homo sapiens Strand = + 3624 codons

Page 27: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

1.654GG0.552CG1.433UG

1.027GA0.906CA1.136UA

1.005GC1.163CC0.743UC

0.763GU1.101CU0.939UU

Mammals - 23

1.891GG0.554CG1.274UG

1.145GA0.938CA1.030UA

0.878GC1.205CC0.756UC

0.605GU0.939CU1.250UU

Fish - 23

1.369GG1.293AG0.546CG0.856UG

0.776GA0.974AA0.945CA1.206UA

0.873GC0.797AC1.363CC0.994UC

1.115GU0.996AU1.082CU0.855UU

Mammals - 31

1.499GG1.228AG0.609CG1.049UG

0.758GA1.135AA0.849CA1.096UA

0.839GC0.739AC1.371CC0.918UC

0.911GU0.907AU1.162CU0.933UU

Fish - 31

Frequency ratios

))q(Yq(X

)Yp(X=)Yr(X

32

3232

Codon bias seems to be a dinucleotide mutational effect in mitochondria, rather than an effect of translational selection.

CpG effect.... (increased rate of C to U mutations in CG dinucleotides. Expect high UG and CA)

DNA binding proteins....

Page 28: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

The OGRe front page: http://ogre.mcmaster.ca

Sequence information for OGRe is taken from GenBank.

Page 29: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

LOCUS NC_001922 16646 bp DNA circular VRT 20-SEP-2002DEFINITION Alligator mississippiensis mitochondrion, complete genome.ACCESSION NC_001922VERSION NC_001922.1 GI:5835540KEYWORDS .SOURCE mitochondrion Alligator mississippiensis (American alligator) ORGANISM Alligator mississippiensis Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Archosauria; Crocodylidae; Alligatorinae; Alligator.REFERENCE 1 (bases 1 to 16646) AUTHORS Janke,A. and Arnason,U. TITLE The complete mitochondrial genome of Alligator mississippiensis and the separation between recent archosauria (birds and crocodiles) JOURNAL Mol. Biol. Evol. 14 (12), 1266-1272 (1997) MEDLINE 98066357 PUBMED 9402737FEATURES Location/Qualifiers source 1..16646 /organism="Alligator mississippiensis" /organelle="mitochondrion" /mol_type="genomic DNA" /db_xref="taxon:8496" /tissue_type="liver" /dev_stage="adult" rRNA 1..976 /product="12S ribosomal RNA" tRNA 977..1044 /product="tRNA-Val" /anticodon=(pos:1009..1011,aa:Val) rRNA 1046..2635 /product="16S ribosomal RNA" tRNA 2636..2710 /product="tRNA-Leu" /note="codons recognized: UUR" /anticodon=(pos:2672..2674,aa:Leu) gene 2711..3676 /gene="ND1" CDS 2711..3676 /gene="ND1"

1 caacagactt agtcctggtc ttttcattag ctagtactca acttatacat gcaagcatcc 61 gcgaaccagt gagaacaccc tacaagtctg acagacgaat ggagccggca tcaggcacat 121 caaccgatag cccaaaacgc ctagcccagc cacaccccca agggtctcag cagtgattaa 181 ccttaaacca taagcgaaag cttgatttag ttagagtaga tatagaggcg gtcaactctc 241 gtgccagcaa ccgcggttag acgaaaacct caagttaatt gacaaacggc gtaaattgtg 301 gctagaactc tatctccccc attagtgcag atacggtatc acagtagtga taaacttcat 361 cacaccgcaa acatcaacac aaaactggcc ctaatctcaa agatgtactc gattccacga 421 aagctgagaa acaaactggg attagatacc ccactatgct cagcccttaa cattggtgta 481 gtacacaaca gactaccctc gccagagaat tacgagcccc gcttaaaact caaaggactt 541 gacggcactt taaacccccc tagaggagcc tgtcctataa tcgacagtac acgttacacc 601 cgaccacctt tagcctactc agtctgtata ccgccgtcgc aagcccgtcc catttgaggg 661 aaacaaaacg cgcgcaacag ctcaaccgag ctaacacgtc aggtcaaggt gcagccaaca 721 aggtggaaga gatgggctac attttctcaa catgtagaaa tattcaacgg agagccctat 781 gaaatacagg actgtcaaag ccggatttag cagtaaactg ggaaagaata cctagttgaa 841 gtcggtaacg aagtgcgtac acaccgcccg tcaccctcct cgaacccaac aaaatgccca 901 aacaacaggc acaatgttgg gcaagatggg gaaagtcgta acaaggtaag cgtaccggaa 961 ggtgcacttg gaacatcaaa atgtagctta aatttaaagc attcagttta cacctgaaaa 1021 agtcccacca tcggaccatt ttgaaaccca tatctagccc tacctccttt caacatgctt

An example of a GenBank file

Complete mitochondrial genome of the Alligator

Page 30: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

OGRe (= Organellar Genome Retrieval) is a relational database.

available at http://ogre.mcmaster.ca

More than 1000 complete animal mitochondrial genomes.

Efficient means of storage and retrieval of information. Uses PostgreSQL

Schema defines relationships between different types of information.

genome_codecodonstrandusage

codon_usage

group_nameparentalternative_parenthas_children

classification

genome_codeoffstfilename

fileindex

genome_codemedline_code

citations

species_codeclassificationgroup_namelatin_namecommon_name

species

feature_idstartstopstrand

feature_location

feature_idamino_acidanticodoncodon

trna

feature_namedescription

feature_descriptions

feature_idgenome_codetypefeature_namenotesdescriptionalignment_file

feature

genomegenome_codespecies_codegenome_typencbiacdescriptionncbidateaccession_datelastmodifiedlastmodifiedbynotesgenome_lengthgenetic_codea_contentc_contentg_contentt_contentfinalgene_ordergene_order_notrnarna_a_contentrna_c_contentrna_g_contentrna_t_content

Page 31: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

Species may be selected individually from an alphabetical list

Or taxa may be selected from a hierarchy. Here the Arthropods have been expanded and the Myriapods and Crustaceans have been selected

Page 32: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

On the ogre web site, a visual comparison can be made of any two selected species. Colour is used to indicate conserved blocks of genes.

Alligator and Bird genomes differ by interchange of two tRNA genes (red and yellow)… …and by translocation of the two

genes in the blue block.

Large Scale – Evolution of Gene Order in Whole Genomes

Page 33: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

Genome reshuffling mechanisms

Inversions:

A B C D A D

-C -B

A

BC

D

Translocations:

A (B C) D A D B C

Duplications and deletions

A B C B C DA B C D A C B D/ /

Page 34: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

Example of an inversion

Example of a translocation

Page 35: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

The T and –F genes are duplicated in Cordylus warreni.

If the first T and the second –P were deleted, the relative position of T and –P would change.

Page 36: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

Sometimes things go crazy ….

Drosophila and Thrips are both insects

yet there are 30 breakpoints for only 37 genes

i.e. almost nothing in common.

Page 37: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

OGRe contains gene orders as strings. This allows searching and comparison.

231 unique gene orders have been found in 858 species.

The standard vertebrate order is shared by 398 species (including humans). There are many other species with unique gene orders.

Some species conserve gene order over 100s of millions of years. Others get scrambled in a few million.

Still to do (new project) :

- estimate relative rates of different rearrangement processes

- predict most likely ancestral gene orders

- use gene order evidence in phylogenetics

Page 38: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

0.1

TerebratulinaKatharina

LimulusHeptathela

OrnithoctonusHabronattus

VarroaCarios

Ornithodoros moubataOrnithodoros porcinus

RhipicephalusAmblyomma

HaemaphysalisIxodes holocyclus

Ixodes hexagonusIxodes persulcatus

ScutigeraLithobius

ThyropygusNarceus

SpeleonectesVargula

HutchinsoniellaTigriopus

ArmilliferArgulus

TetraclitaPollicipes

PenaeusCherax

PortunusPanulirus

PagurusArtemia

TriopsDaphnia

TetrodontophoraGomphiocephalus

TricholepidionLocusta

AleurodicusTriatoma

PhilaenusThrips

LepidopsocidHeterodoxus

PyrocoeliaTriboliumCrioceris

ApisMelipona

OstriniaAntheraeaBombyx

AnophelesDrosophilaChrysomya

Images coutesy of University of Nebraska, Dept.of Entomology.

http://entomology.unl.edu/images/

Arthropod phylogenetics

Very difficult due to strong variation in rates of evolution between species.

tRNA tree – branch lengths optimized on fixed consensus topology

Long branch species are problematic if tree is not fixed.

Page 39: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

Images coutesy of University of Nebraska, Dept.of Entomology.

http://entomology.unl.edu/images/

protein tree – branch lengths optimized on fixed consensus topology

0.1

TerebratulinaKatharina

LimulusHeptathela

OrnithoctonusHabronattusVarroa

CariosOrnithodoros moubata

Ornithodoros porcinusRhipicephalus

AmblyommaHaemaphysalis

Ixodes holocyclusIxodes hexagonus

Ixodes persulcatusScutigera

LithobiusThyropygus

NarceusSpeleonectes

VargulaHutchinsoniella

TigriopusArmillifer

ArgulusTetraclitaPollicipes

PenaeusCherax

PortunusPanulirus

PagurusArtemia

TriopsDaphnia

TetrodontophoraGomphiocephalus

TricholepidionLocusta

AleurodicusTriatoma

PhilaenusThrips

LepidopsocidHeterodoxus

PyrocoeliaTribolium

CriocerisApis

MeliponaOstrinia

AntheraeaBombyx

AnophelesDrosophilaChrysomya

Same species are on long branches in proteins as in RNAs

Page 40: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

Relative rate test for sequence evolution - Templeton

0 1 2

Three aligned sequences with 0 known to be the outgroup. Test whether rates of evolution in branch 1 and branch 2 are equal.

m1 = number of sites where 0 and 2 are the same and 1 is different.m2 = number of sites where 0 and 1 are the same and 2 is different.

)(

)(

21

2212

mm

mmm

Calculate:

Should follow a chi squared distribution with one degree of freedom.

Many pairs of related species found to have different rates in the mitochondrial sequences.

Page 41: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

The gene order of the ancestral arthropod is thought to be the same as that of the horseshoe crab Limulus.Image courtesy of Marine Biology Lab, Woods Hole. www.mbl.edu/animals/Limulus

Gene Order sometimes gives evidence of phylogenetic relationships

The same translocation of tRNA-Leu is found in insects and crustaceans but not myriapods and chelicerates. Strong argument for the group Pancrustacea (= insects plus crustaceans)

Page 42: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

Moderately rearranged

Completely scrambled

Page 43: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

Breakpoints Inversions Dup/Del tRNA ProteinTigriopus japonicus 35 32 0 2.15 1.34Heterodoxus macropus 35 32 0 1.39 1.83Thrips imaginis 32 29 1 1.34 1.32Pollicipes polymerus 22 16 2 0.69 0.59Cherax destructor 20 16 0 0.54 0.57Tetraclita japonica 20 16 0 0.66 0.57Argulus americanus 20 18 0 0.72 1.12Speleonectes tulumensis 19 16 1 0.83 0.93Apis mellifera 19 16 0 0.84 1.50Hutchinsoniella macracantha 18 16 0 0.86 0.87Pagurus longicarpus 18 12 0 0.65 0.45Vargula hilgendorfii 17 15 0 0.79 1.41Lepidopsocid RS-2001 17 16 0 0.60 0.59Habronattus oregonensis 16 14 0 1.48 1.09Ornithoctonus huwena 15 13 0 1.95 1.23Scutigera coleoptrata 15 15 0 0.48 0.44Melipona bicolor 14 8 2 0.93 1.66Varroa destructor 14 12 0 0.83 1.09Armillifer armillatus 13 12 0 0.85 1.73Narceus annularus 9 9 0 0.63 0.58Thyropygus sp. 9 9 0 0.49 0.46Aleurodicus dugesii 8 5 1 1.04 1.54Anopheles gambiae 8 6 0 0.41 0.47Tetrodontophora bielanensis 8 6 0 0.77 0.70Artemia franciscana 7 5 0 0.63 0.64Rhipicephalus sanguineus 7 6 0 0.82 0.96Amblyomma triguttatum 7 6 0 0.88 1.00Haemaphysalis flava 7 6 0 0.82 0.96Locusta migratoria 6 5 0 0.38 0.52Bombyx mori 6 5 0 0.51 0.54Portunus trituberculatus 6 5 0 0.51 0.44Ostrinia furnacalis 6 5 0 0.49 0.48Tribolium castaneum 6 5 0 0.55 0.53Antheraea pernyi 6 5 0 0.50 0.54Chrysomya putoria 4 2 1 0.36 0.42Tricholepidion gertschi 3 2 0 0.44 0.39Daphnia pulex 3 2 0 0.62 0.51Pyrocoelia rufa 3 2 0 0.52 0.77Drosophila melanogaster 3 2 0 0.37 0.42Panulirus japonicus 3 2 0 0.58 0.53Triatoma dimidiata 3 2 0 0.59 0.50Lithobius forficatus 3 3 0 1.13 0.61Philaenus spumarius 3 2 0 0.69 0.58Gomphiocephalus hodgsoni 3 2 0 0.69 0.62Penaeus monodon 3 2 0 0.34 0.32Crioceris duodecimpunctata 3 2 0 0.55 0.58Triops cancriformis 3 2 0 0.42 0.40Limulus polyphemus 0 0 0 0.36 0.40Ixodes persulcatus 0 0 0 0.72 0.82Ixodes holocyclus 0 0 0 0.76 0.83Ixodes hexagonus 0 0 0 0.74 0.90Carios capensis 0 0 0 0.70 0.79Ornithodoros porcinus 0 0 0 0.67 0.86Heptathela hangzhouensis 0 0 0 0.76 0.87Ornithodoros moubata 0 0 0 0.68 0.88

Very High

High

Medium

Low

Species ranked according to breakpoint distance from ancestor.

Page 44: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

R =0.99 R =0.59

R =0.53 R =0.69

Page 45: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

min mean max min mean maxVery High 1.33 1.62 2.14 1.32 1.50 1.83High 0.48 0.86 1.94 0.44 0.99 1.73Moderate 0.38 0.63 1.04 0.43 0.69 1.54Low 0.34 0.60 1.13 0.32 0.62 0.90

tRNA only High 0.66 1.01 1.94 0.57 1.15 1.73tRNA only Mod/Low 0.34 0.60 1.13 0.32 0.63 1.54

tRNA distance protein distanceBreakpoint category

Highly rearranged genomes have highly divergent sequences.

Rates of sequence evolution and genome rearrangement are correlated.

Both are very non-clocklike.

There are many species where only tRNAs have changed position.

Species with highly reshuffled tRNAs have high rates of sequence evolution in both tRNAs and proteins.

Page 46: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

Relative rate of genome rearrangement (Xu et al 2006)

0 1 2

Three gene orders with 0 known to be the outgroup. Test whether rates of rearrangement in branch 1 and branch 2 are equal.

n1 = number of gene couples in 0 and 2 but not in 1 – i.e. New breakpoint in 1n2 = number of gene couples in 0 and 1 but not in 2 – i.e. New breakpoint in 2

)(

)(

21

2212

nn

nnn

Calculate:

Should follow a chi squared distribution with one degree of freedom.

We took pairs where there was a significant difference in rearrangement rates (χn

2 was large) and showed that there was a significant difference in substitution rates too (χm

2 was large).

Page 47: Lectures for 4Y03 (a) Efficiency in Bacterial Cells (b)Codon Usage Bias  (c) Mitochondrial Genomes

Good Guys

Credits:

Daniel Jameson/ Bin Tang – Database design and management

Daniel Urbina – Base and Amino Acid Frequencies

Wei Xu – Gene Order Analysis and Arthropod Phylogenies

Bad Guys

Gene order is sometimes a strong phylogenetic markerbut the Bad Guys are problematic in gene order analysis as well as phylogenetics.

Why does the evolutionary rate speed up in these isolated groups of species?Why to tRNA genes move more frequently?What are the relative rates of inversion and translocation?