Evidence of Selection on Genomic GC Content in Bacteria · 2011. 4. 11.

Evidence of Selection on Genomic GC Content in Bacteria Falk Hildebrand Axel Meyer Adam Eyre-Walker


  • Evidence of Selection on Genomic GC Content in Bacteria

    Falk Hildebrand, Axel Meyer, Adam Eyre-Walker

  • Genomic G+C content

  • Genomic GC content

  • Codons

    Codon positions: 1 2 3

    ATA CCC
    CTA CCT

    Non-synonymous (ATA vs CTA)

    Synonymous (CCC vs CCT)

    2-fold: TTT TTC

    4-fold: CCT CCC CCA CCG
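The codon-position quantities used in the following slides (GC12, GC3, and GC at four-fold sites, GC4) can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the list of four-fold degenerate codon families assumes the standard genetic code.

```python
# First two bases of the codon families that are four-fold degenerate
# in the standard genetic code (Leu CTN, Val GTN, Ser TCN, Pro CCN,
# Thr ACN, Ala GCN, Arg CGN, Gly GGN). Assumption: standard code.
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def gc_fraction(bases):
    """Fraction of G or C in a list of bases (None if the list is empty)."""
    if not bases:
        return None
    return sum(b in "GC" for b in bases) / len(bases)


def codon_gc_stats(cds):
    """Return (GC12, GC3, GC4) for an in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    pos12 = [b for c in codons for b in c[:2]]
    pos3 = [c[2] for c in codons]
    pos4 = [c[2] for c in codons if c[:2] in FOURFOLD_PREFIXES]
    return gc_fraction(pos12), gc_fraction(pos3), gc_fraction(pos4)


# The four codons shown on the slide, joined into one sequence.
gc12, gc3, gc4 = codon_gc_stats("ATACCCCTACCT")
```

For the slide's four codons, only CCC, CTA and CCT belong to four-fold families, so GC4 is computed from three third positions.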

  • Variation

    [Figure: two panels, labelled GC3 and GC12, each with a horizontal axis running from 0 to 100]

  • Correlations

    [Figure: plot with axes GC12 and GC3, both running from 0.0 to 1.0]

  • Explanations

    •  Mutation bias
       •  Sueoka (1961) & Freese (1962)
       •  Intrinsic and/or extrinsic

    •  Selection
       •  Many authors

    •  Biased gene conversion
       •  Anonymous referees

  • Evidence of selection

    •  Escherichia coli
       •  Mutation pattern
          •  273 GC→AT versus 131 AT→GC
          •  Predicted GC content = 0.32
          •  Observed GC content = 0.50
          •  Observed GC at neutral sites = 0.58

    Lynch (2007) Origins of genome architecture
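The predicted value on this slide can be recovered from the two mutation counts. A minimal sketch, assuming the counts are proportional to the per-site GC→AT and AT→GC mutation rates (reasonable if GC and AT sites are about equally common, as the observed GC content of 0.50 suggests); the variable names are illustrative.

```python
# Mutation counts quoted on the slide for Escherichia coli.
gc_to_at = 273
at_to_gc = 131

# GC content expected at equilibrium under mutation alone:
# the share of mutations that create G or C.
predicted_gc = at_to_gc / (gc_to_at + at_to_gc)

print(round(predicted_gc, 2))  # 0.32
```

The gap between 0.32 and the observed 0.50 to 0.58 is the discrepancy the slide presents as evidence against mutation bias alone.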

  • Test of mutation bias

    •  If GC content is
       •  Due to mutation bias alone
       •  Stationary
       •  And the infinite sites assumption holds

    •  Then
       •  # GC→AT mutations = # AT→GC mutations
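The logic of the test can be checked numerically: if base composition is governed by mutation alone and has stopped changing, the flux of GC→AT changes must equal the flux of AT→GC changes. The sketch below uses hypothetical per-site rates `u` and `v`, chosen only for illustration.

```python
def iterate_gc(u, v, gc=0.5, steps=20000):
    """Iterate GC content under per-site rates u (GC->AT) and v (AT->GC)."""
    for _ in range(steps):
        gc = gc - u * gc + v * (1 - gc)
    return gc


u, v = 0.002, 0.001            # hypothetical rates, not from the slides
gc_eq = iterate_gc(u, v)       # converges to v / (u + v)

# Number of mutations per site per step in each direction at stationarity.
flux_gc_to_at = u * gc_eq
flux_at_to_gc = v * (1 - gc_eq)
```

At stationarity the two fluxes coincide even though `u` is twice `v`, which is why an excess of one class of polymorphism is informative.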

  • Orienting mutations

    Strain 1  ACT GCT TTG GCT TTA TGG
    Strain 2  ACT GCT TTG GCT TTA TGA
    Strain 3  ACT GCT TTG GCT TTA TGG
    Strain 4  ACT GCT TTC GCT TTA TGA
    Strain 5  ACC GCT TTC GCT TTA TGG
    Strain 6  ACT GCT TTG GCT TTA TGG

    T→C  G→C  G→A

    GC→AT = 1, AT→GC = 1
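The example alignment can be worked through in code. This sketch assumes that the more common base at each polymorphic site is ancestral and the rarer one derived, which is how the slide's example appears to be read; the function names are illustrative.

```python
from collections import Counter

# The six strains from the slide, with codon spacing removed.
STRAINS = [s.replace(" ", "") for s in (
    "ACT GCT TTG GCT TTA TGG",
    "ACT GCT TTG GCT TTA TGA",
    "ACT GCT TTG GCT TTA TGG",
    "ACT GCT TTC GCT TTA TGA",
    "ACC GCT TTC GCT TTA TGG",
    "ACT GCT TTG GCT TTA TGG",
)]


def orient_mutations(seqs):
    """Return (ancestral, derived) pairs, taking the major allele as ancestral."""
    changes = []
    for column in zip(*seqs):
        counts = Counter(column).most_common()
        # Skip invariant sites, sites with more than two alleles, and ties.
        if len(counts) != 2 or counts[0][1] == counts[1][1]:
            continue
        changes.append((counts[0][0], counts[1][0]))
    return changes


def classify(changes):
    """Tally changes by their effect on GC content."""
    tally = {"GC->AT": 0, "AT->GC": 0, "other": 0}
    for ancestral, derived in changes:
        if ancestral in "GC" and derived in "AT":
            tally["GC->AT"] += 1
        elif ancestral in "AT" and derived in "GC":
            tally["AT->GC"] += 1
        else:
            tally["other"] += 1
    return tally


changes = orient_mutations(STRAINS)
tally = classify(changes)
```

The G→C change does not alter GC content, so it falls outside both counted classes.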

  • Four-fold synonymous sites

    [Figure: Genomic GC plotted against GC4]

  • Data

    •  Popset
       •  Keyword “bacteria”
       •  8 or more sequences from same species
       •  149 bacterial species
          •  8 phyla, 15 classes and 77 genera
       •  1 or more genes
       •  10 or more synonymous polymorphisms
       •  4-fold diversity < 0.1

  • Overall result

             No. of SNPs
    GC→AT    11045
    AT→GC    8309
    P

  • Bias versus GC4

    Z = GC→AT / (GC→AT + AT→GC)

               No. species   Z > 0.5   P-value
    GC-rich    82            69
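As a worked example, Z can be computed for the pooled SNP counts from the overall result (11045 GC→AT, 8309 AT→GC). The sketch assumes Z is the proportion of GC→AT changes among all GC→AT and AT→GC changes; the function name is illustrative.

```python
def z_statistic(gc_to_at, at_to_gc):
    """Proportion of GC->AT changes among changes that alter GC content."""
    return gc_to_at / (gc_to_at + at_to_gc)


z_pooled = z_statistic(11045, 8309)
print(round(z_pooled, 3))  # 0.571
```

Under mutation bias alone with stationary composition, Z would be expected to sit at 0.5, so values above 0.5 indicate an excess of GC→AT polymorphisms.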

  • Phylogenetic distribution

    Phylum           Class                  No. of species   GC4 range     Mean Z (GC4 < 0.34)   Mean Z (GC4 > 0.34)
    Actinobacteria   Actinobacteria         3                0.64-0.93     no species            0.64
    Bacteroidetes    Bacteroidetes          3                0.12-0.46     0.43                  0.36
    Chlamydiae+      Chlamydiae             2                0.21-0.30     0.45                  no species
    Cyanobacteria    Chroococcales          2                0.38-0.51     no species            0.53
    Cyanobacteria    Nostocales             3                0.26-0.31     0.45                  no species
    Cyanobacteria    Oscillatoriales        2                0.41          no species            0.38
    Cyanobacteria    Stigonemales           1                0.40          no species            0.59
    Firmicutes       Bacilli                27               0.085-0.68    0.44                  0.58
    Firmicutes       Clostridia             5                0.050-0.28    0.34                  no species
    Proteobacteria   Alphaproteobacteria    16               0.099-0.94    0.43                  0.65
    Proteobacteria   Betaproteobacteria     6                0.66-0.96     no species            0.67
    Proteobacteria   delta/epsilon          6                0.15-0.99     0.49                  0.78
    Proteobacteria   Gammaproteobacteria    62               0.095-0.95    0.50                  0.66
    Spirochaetes     Spirochaetes           7                0.12-0.60     0.45                  0.54
    Tenericutes      Mollicutes             4                0.023-0.24    0.33                  no species

  • Potential problems

    •  Infinite sites assumption

    •  Sequencing error

  • Zpred

  • Z-Zpred

               No. of species   Z-Zpred > 0   P-value
    GC-rich    82               61

  • Potential problems

    •  Infinite sites assumption

    •  Sequencing error

  • Sequencing error

               No. of species   Z > 0.5   P-value
    GC-rich    82               60

  • Explanations

    •  Non-stationary base composition
    •  Selection for translational efficiency
    •  Biased gene conversion
    •  Selection upon base composition


  • Non-stationary base composition

  • Explanations

    •  Non-stationary base composition
    •  Selection for translational efficiency
    •  Biased gene conversion
    •  Selection upon base composition

  • Selection on codon usage

    Amino acid      Codon   High usage   Low usage
    Phenylalanine   UUU     0.22         0.71
                    UUC     0.78         0.29
    Valine          GUU     0.46         0.36
                    GUC     0.09         0.19
                    GUA     0.24         0.23
                    GUG     0.21         0.23

  • Translational efficiency

               No. of species   Z > 0.5   P-value
    GC-rich    31               29

  • Explanations

    •  Non-stationary base composition
    •  Selection for translational efficiency
    •  Biased gene conversion
    •  Selection upon base composition

  • Biased gene conversion

    [Diagram, three stages of base pairs: A T and C G; then A G and C T; then C G and C G]

  • Four gamete test

    No recombination    Recombination
    G A                 G A
    G T                 G T
    C A                 C A
                        C T
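The four gamete test lends itself to a short sketch: for each pair of biallelic sites, check whether all four allele combinations are present among the sampled haplotypes. Under the infinite sites assumption, finding all four implies recombination between the two sites. The function name is illustrative.

```python
from itertools import combinations


def four_gamete_violations(haplotypes):
    """Return pairs of site indices at which all four gametes are observed."""
    n_sites = len(haplotypes[0])
    pairs = []
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        alleles_i = {g[0] for g in gametes}
        alleles_j = {g[1] for g in gametes}
        # Only biallelic site pairs are informative for this test.
        if len(alleles_i) == 2 and len(alleles_j) == 2 and len(gametes) == 4:
            pairs.append((i, j))
    return pairs


# The two haplotype sets shown on the slide.
no_recombination = ["GA", "GT", "CA"]
recombination = ["GA", "GT", "CA", "CT"]

print(four_gamete_violations(no_recombination))  # []
print(four_gamete_violations(recombination))     # [(0, 1)]
```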

  • Biased gene conversion

               No. species   Z > 0.5   P-value
    GC-rich    28            19        0.087

                   GC→AT   AT→GC   P-value
    No. of SNPs    1079    844
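The slide does not name the test behind the species-level P-value. As a sketch under that caveat, an exact two-sided sign test (a binomial test against 0.5) on 19 of 28 species reproduces the reported 0.087; the function name is illustrative.

```python
from math import comb


def sign_test_two_sided(successes, n):
    """Exact two-sided binomial test of `successes` out of `n` against p = 0.5."""
    k = max(successes, n - successes)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


p_value = sign_test_two_sided(19, 28)
print(round(p_value, 3))  # 0.087
```

The same function could be applied to the other species-level tables in the talk, whose P-values did not survive in this transcript.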

  • Biased gene conversion

    [Diagram: GC and AT, labelled -w and w]

    If N_e w >> 1, BGC is effective; if N_e w …

  • Biased gene conversion

               r / m     p-value
    GC4        -0.076    0.67
    GC4pred    -0.115    0.52

    34 species with estimate of r / m: Vos & Didelot (2009) ISME J.

  • Biased gene conversion

               θ r / m   p-value
    GC4        0.039     0.83
    GC4pred    -0.031    0.86

    $\pi_s \cdot r/m = 2N_e u \cdot r/m = 2N_e r k$

  • Explanations

    •  Non-stationary base composition
    •  Selection for translational efficiency
    •  Biased gene conversion
    •  Selection upon base composition

  • Selection on GC content

    $H(x) = \dfrac{J(x)}{\int_0^1 J(x)\,dx}$

    $J(x) = e^{Sx}\, x^{V-1} (1-x)^{U-1}$

    $S = 2N_e s$, $U = 2N_e \mu (1-f)$, $V = 2N_e \mu f$, $f = v/(u+v)$

    [Diagram: GC → AT at rate $u\mu$; AT → GC at rate $v\mu$; GC labelled $+s$, AT labelled $-s$]
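The density H(x) can be explored numerically. The sketch below normalises J(x) with a simple midpoint rule and computes the mean of x, reading x as the frequency of the GC allele at a site (an interpretation, not stated on the slide). The parameter values are hypothetical and deliberately use U, V ≥ 1; realistic values of U and V below 1 give integrable singularities at 0 and 1 that need a more careful quadrature than this.

```python
from math import exp


def j_unnormalised(x, S, U, V):
    """J(x) = exp(S x) * x^(V-1) * (1-x)^(U-1)."""
    return exp(S * x) * x ** (V - 1) * (1 - x) ** (U - 1)


def h_density(S, U, V, n=20000):
    """Midpoint grid on (0, 1) and the normalised density H at each point."""
    xs = [(i + 0.5) / n for i in range(n)]
    js = [j_unnormalised(x, S, U, V) for x in xs]
    total = sum(js) / n                      # approximates the integral of J
    return xs, [j / total for j in js]


def mean_gc_frequency(S, U, V, n=20000):
    """Mean of x under H(x)."""
    xs, hs = h_density(S, U, V, n)
    return sum(x * h for x, h in zip(xs, hs)) / n


# Hypothetical parameters: symmetric mutation, with and without selection.
neutral_mean = mean_gc_frequency(S=0.0, U=2.0, V=2.0)
selected_mean = mean_gc_frequency(S=2.0, U=2.0, V=2.0)
```

With symmetric mutation and S = 0 the mean is 0.5; a positive S shifts the distribution towards higher GC frequency.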

  • Selection on GC content

    [Figure: Zpred (0.6 to 1.0) plotted against S (2 to 10)]

  • Selection on GC4

    [Figure: Zobs and Zpred plotted on axes running from 0.2 to 1.0]

  • Selection on GC4

    [Figure: Zobs and Zpred plotted on axes running from 0.2 to 1.0]

    f = α + β GC4

    f = 0.2 + 0.35 GC4

  • Selection on GC4

    f = α + β GC4

    f = 0.2 + 0.35 GC4

    [Figure: f (0.25 to 0.55) plotted against GC4 (0.2 to 1.0)]
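The fitted line can be evaluated directly. Earlier in the talk f is defined as v/(u+v), the GC content expected from mutation alone, so comparing f with GC4 shows where composition exceeds the mutational expectation. The names below are illustrative; only the two coefficients come from the slide.

```python
ALPHA, BETA = 0.2, 0.35   # coefficients from the slide: f = 0.2 + 0.35 GC4


def mutational_f(gc4):
    """Fitted f for a given GC4."""
    return ALPHA + BETA * gc4


# GC4 value at which the fitted f equals GC4 itself.
crossover = ALPHA / (1 - BETA)

for gc4 in (0.2, 0.4, 0.6, 0.8, 1.0):
    print(gc4, round(mutational_f(gc4), 2))
```

Above the crossover (about 0.31) the fitted f lies below GC4, so under this fit mutation pushes towards lower GC than is observed at four-fold sites.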

  • Selection on genomic GC

    [Figure: Genomic GC plotted against GC4]

  • Correlates

    •  Genome size
       •  positive correlation

    •  Lifestyle
       •  higher GC in free living

    •  Aerobiosis
       •  higher GC in aerobic

    •  Nitrogen utilization
       •  higher GC amongst N fixers

    •  Temperature
       •  higher amongst thermophiles?

  • Environmental meta-genomics

    Foerstner et al. (2005) EMBO Reports

  • Environmental meta-genomics

  • Thanks

    Falk Hildebrand, Axel Meyer

  • Mitochondrial DNA

    •  488 animal datasets

    Group        Percentage
    Mammals      30
    Birds        12
    Fish         23
    Insects      16
    Crustacea    7
    Molluscs     12

  • Mitochondrial DNA

    [Figure: horizontal axis GC content (0.0 to 1.0); vertical axis 1 to 4]

  • Z versus GC4

    [Figure: Z (0.2 to 1.0) plotted against GC content (0.2 to 1.0)]

  • Z-Zpred versus GC4

    [Figure: Z-Zpred plotted against GC4 (0.1 to 0.6)]

    r=0.11, p=0.02

  • Evidence of selection II

    •  Phylogenetic analyses
       •  Mycobacterium leprae (Lynch 2007)
       •  Escherichia coli (Balbi et al. 2009)
       •  5 pathogenic bacteria (Hershberg and Petrov 2010)
       •  Excess of GC→AT