lecture24 bacterialenviro algenomics ericalm

Upload: nickinheaven

Post on 06-Apr-2018


  • 8/3/2019 Lecture24 Bacterialenviro Algenomics Ericalm


    Genome Dynamics and

    Environmental Adaptation in

    Bacteria

    Eric Alm

    Depts. of Biological Eng. and Civil and Environmental Eng., MIT

    Broad Institute of MIT and Harvard

    The Modern View of Bacterial

    Genome Dynamics

    Horizontal Gene Transfer is rampant

    Closely related strains harbor lots of newly acquired DNA

    HGT is a key mechanism for niche adaptation

    Native genes are insulated from dynamics at the periphery of networks

    Uptake of Foreign DNA

    Transformation

    Phage

    Conjugation

    Genomic islands as reservoirs of new

    DNA

    Coleman et al., Science 2006

    How Common Is It?

    Marine isolates of co-existing microdiversity

    Large variation in genome size among closely related strains

    Thompson et al., Science 2005

    From: Lerat et al. (2005) PLoS Biol 3(5): e130

    Genome Dynamics (HGT) at the

    Periphery

    Pal, Papp & Lercher, Nature Genetics, 2005.

    *Horizontal gene transfer into the E. coli lineage since its split from the Vibrio lineage.


    Evolutionary Distance = ?

    $r\,t = \text{(gene family)} \times \text{(genome)} \times \text{(gene, genome)} \times t$

    (gene family) - Principles #2 & #3: More important proteins evolve slowly (e.g., the ribosome).

    (genome) - Principle #1: Molecular clock. Rate of change depends on mutation rate, population size, etc.

    (gene, genome) - Principle #5: Positive or negative selection?

    Overview of the Method

    Seed possible orthologs: single-copy, ubiquitous COGs

    Align and build trees

    Compare to species phylogeny (KH-test)

    Reject outliers

    Normalize against family rate and molecular clock

    Read out terminal branch lengths

    ~1000 gene families

    744 gene families
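The normalization step in the overview above can be sketched in a few lines. This is a minimal illustration, not the lecture's actual procedure: it assumes a complete matrix of terminal branch lengths and estimates the gene-family and genome effects as row and column means in log2 space. The function name `selective_residuals` and the input layout are hypothetical.

```python
import math

def selective_residuals(dist):
    """Return log2 residuals after removing row (gene family) and column (genome) effects.

    dist[i][j] is the terminal branch length of gene family i in genome j.
    Assumption: every entry is present and strictly positive.
    """
    logd = [[math.log2(x) for x in row] for row in dist]
    n_fam, n_gen = len(logd), len(logd[0])
    row_mean = [sum(row) / n_gen for row in logd]
    col_mean = [sum(logd[i][j] for i in range(n_fam)) / n_fam for j in range(n_gen)]
    grand = sum(row_mean) / n_fam
    # Additive fit in log space: family effect + genome effect - grand mean.
    return [
        [logd[i][j] - (row_mean[i] + col_mean[j] - grand) for j in range(n_gen)]
        for i in range(n_fam)
    ]

# Made-up data: a perfectly multiplicative matrix leaves no residual.
family_rates = [1.0, 2.0, 4.0]
genome_terms = [0.5, 1.0, 2.0]
clean = [[f * g for g in genome_terms] for f in family_rates]
residuals = selective_residuals(clean)
```

Under this sketch, a gene that evolves faster than its family and genome predict shows up as a positive residual, and a slower one as a negative residual.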

    Evolutionary Distance =

    $r\,t = \text{(gene family)} \times \text{(genome)} \times \text{(gene, genome)} \times t$

    Clock Explains Most Distance Variation

    [Figure: observed branch length, log2(r t), plotted against predicted branch length, log2(t); both axes run from -10 to 5]

    Residual variation is an estimate of the (gene, genome) term

    What Can We Learn From

    Residual Variation?

    Noise?

    Environment-specific selective pressures

    Positive selection

    Negative selection

    Relaxed negative selection

    Similar patterns in similar genes?

    FAST

    Lost ?

    Fisher's exact test:

    Odds Ratio = 3.1, P = 2.4e-7

    Odds Ratio = 0.55, P = 0.01

    Fast: > 4.0

    Slow: < 0.25
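As an illustration of the test named on this slide, here is a small sketch of a two-sided Fisher's exact test on a 2x2 table, together with a classifier that uses the Fast/Slow cutoffs shown above. The counts are made up, the helper names are hypothetical, and the reading of the cutoffs as applying to a rate ratio is an assumption.

```python
from math import comb

def rate_class(ratio, fast=4.0, slow=0.25):
    """Label a rate ratio using the slide's cutoffs (assumed to apply to a ratio)."""
    if ratio > fast:
        return "fast"
    if ratio < slow:
        return "slow"
    return "typical"

def fisher_exact_2x2(a, b, c, d):
    """Return (sample odds ratio, two-sided p-value) for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    low, high = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Two-sided p-value: sum over tables no more likely than the observed one.
    p_value = sum(
        prob(x) for x in range(low, high + 1) if prob(x) <= p_observed * (1 + 1e-9)
    )
    odds = (a * d) / (b * c) if b * c else float("inf")
    return odds, min(1.0, p_value)

# Made-up counts, e.g. rows = fast / not fast, columns = lost / retained.
odds, p = fisher_exact_2x2(3, 1, 1, 3)
```

An odds ratio above 1 means the two properties co-occur more often than expected by chance, and below 1 less often.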

    Outgroup

    Negative Selection


    Selective Sweeps

    P=0.05

    Positive Selection?

    Hypergeometric test for enrichment of COG functions in fast/slow (top 10% of genes)
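A hypergeometric enrichment test asks how likely it is to see at least k genes of a functional category among the top-ranked genes by chance alone. The sketch below uses only the standard library. The function and parameter names are hypothetical and the numbers are made up.

```python
from math import comb

def hypergeom_enrichment_p(k, n_top, n_category, n_total):
    """P(X >= k): chance of at least k category members among n_top genes
    drawn without replacement from n_total genes, n_category of which
    belong to the category."""
    upper = min(n_top, n_category)
    hits = sum(
        comb(n_category, x) * comb(n_total - n_category, n_top - x)
        for x in range(k, upper + 1)
    )
    return hits / comb(n_total, n_top)

# Made-up example: 10 genes, 4 in the category, top 3 contain 2 of them.
p_enriched = hypergeom_enrichment_p(k=2, n_top=3, n_category=4, n_total=10)
```

When many COG categories are tested at once, the resulting p-values would normally need a multiple-testing correction.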


    COG    Name   E. coli / Photor. / Yersinia   Description
    1989   PulO   *                              Type II secretory pathway, prepilin signal peptidase PulO and related peptidases
    4969   PilA   *                              Tfp pilus assembly protein, major pilin
    2805   PilT   **                             Tfp pilus assembly protein, pilus retraction ATPase
    1677   FliE   *                              Flagellar hook basal body protein
    1706   FlgI   *                              Flagellar basal body P-ring protein
    4786   FlgG   *                              Flagellar basal body rod protein
    1516   FliS   **                             Flagellin-specific chaperone
    1345   FliD   *                              Flagellar capping protein
    1815   FlgB   *                              Flagellar basal body protein
    4967   PilV   **                             Tfp pilus assembly protein
    3190   FliO   *                              Flagellar biosynthesis protein
    1261   FlgA   **                             Flagellar basal body P-ring biosynthesis protein
    4787   FlgF   **                             Flagellar basal body rod protein
    3418   FlgN   ***                            Flagellar biosynthesis/type III secretory pathway chaperone
    1684   FliR   **                             Flagellar biosynthesis pathway
    1377   FlhB   **                             Flagellar biosynthesis pathway

    Analysis of Patterns of Selection

    [Figure: matrix with genes as rows and genomes/experiments as columns]

    Do correlations between rows (genes) indicate similar functional roles?

    Selection Acts Coherently Across Pathways/Functions

    Analysis of Patterns of Selection

    [Figure: matrix with genes as rows and genomes/experiments as columns]

    Do correlations between columns (genomes) indicate similar ecology?

    Evolution of Evolutionary Rates

    Correlation of selective patterns across all genes (orthologs) for each pair of genomes

    Deep-branching clades show significant correlation in genome-wide selective patterns
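The genome-by-genome comparison described here amounts to correlating two columns of a genes-by-genomes matrix. A minimal sketch follows, assuming each genome has one value per ortholog, listed in the same gene order. The Pearson coefficient is used as an example measure; the slides do not say which coefficient was used, and the data below are made up.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def genome_pair_correlations(values_by_genome):
    """Correlate per-gene values for every pair of genomes.

    values_by_genome maps a genome name to a list of per-ortholog values,
    with all lists in the same gene order (an assumption of this sketch).
    """
    names = sorted(values_by_genome)
    return {
        (g1, g2): pearson(values_by_genome[g1], values_by_genome[g2])
        for g1, g2 in combinations(names, 2)
    }

# Made-up values for three hypothetical genomes and four orthologs.
example = {"A": [1.0, 2.0, 3.0, 4.0], "B": [2.0, 4.0, 6.0, 8.0], "C": [4.0, 3.0, 2.0, 1.0]}
pair_r = genome_pair_correlations(example)
```

The same `pearson` helper applied to rows instead of columns would compare genes rather than genomes.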

    No Correlation With Phylogeny

    Over Shorter Timespans


    Responses to Natural Selection

    [Diagram labels: Environment; HGT; novel genes retained in genome; native genes; gene evolutionary rate variation]

    A critique of the adaptationist programme

    Gould & Lewontin, 1979

    "Front legs a puzzle: how Tyrannosaurus used its tiny front legs is a scientific puzzle; they were too short even to reach the mouth. They may have been used to help the animal rise from a lying position."

    - Explanatory information, Museum of Science, Boston, c. 1979

    Direct vs. Indirect Selection

    [Diagram labels: Environment; HGT; novel genes retained in genome; gene evolutionary rate variation; direct selection; indirect selection]


    Gene Content Influences Selection on

    Genes?

    (v X time | g-c)    0.11    0.02

    (v X dist | g-c)    0.09    ns


    Background: The Downpass Algorithm in Phylogenetic Inference

    What information is passed from leaves to parents?

    Sequence and score
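A classic example of a downpass that hands a state and a score from leaves to parents is Fitch parsimony for a single character. The sketch below is offered as background and is not claimed to be the exact variant shown in the lecture. Trees are nested 2-tuples with leaf names as strings, a representation chosen only for this illustration.

```python
def fitch_downpass(tree, leaf_state):
    """Return (state_set, score) for the subtree rooted at `tree`.

    `tree` is either a leaf name or a (left, right) tuple.
    `leaf_state` maps each leaf name to its observed character state.
    """
    if not isinstance(tree, tuple):
        return {leaf_state[tree]}, 0
    left_set, left_score = fitch_downpass(tree[0], leaf_state)
    right_set, right_score = fitch_downpass(tree[1], leaf_state)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed.
        return shared, left_score + right_score
    # Children disagree: keep both options and count one change.
    return left_set | right_set, left_score + right_score + 1

# Made-up example with four leaves and one nucleotide site.
example_tree = (("a", "b"), ("c", "d"))
root_states, changes = fitch_downpass(example_tree, {"a": "A", "b": "A", "c": "G", "d": "G"})
```

Each parent only needs the set and score returned by its children, which is why one pass from leaves to root is enough to get the minimum number of changes.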

    A Downpass Algorithm for Reconciliation

    Reconciliation proceeds by labeling each node in the gene tree as HGT, Dup, or Speciation (loss is implied)

    Pass LCA (and score) of each subtree from leaves to root

    [Figure: species tree and gene tree with numbered nodes]
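To make "pass the LCA of each subtree from leaves to root" concrete, here is a sketch of the simpler, classical LCA mapping, which labels gene-tree nodes only as duplication or speciation. It deliberately leaves out HGT, so it is a reduced stand-in for the method on the slide. The species tree ((1,2),(3,4)) with internal nodes 5, 6, 7 and the gene tree below are hypothetical.

```python
def lca(parent, a, b):
    """Lowest common ancestor of species nodes a and b; parent[root] is None."""
    seen = set()
    while a is not None:
        seen.add(a)
        a = parent[a]
    while b not in seen:
        b = parent[b]
    return b

def reconcile(gene_tree, parent, events):
    """Map a gene tree onto the species tree and record one event per internal node.

    gene_tree is a species label (leaf) or a (left, right) tuple.
    Appends (species_node, label) to `events` and returns the mapped node.
    """
    if not isinstance(gene_tree, tuple):
        return gene_tree
    left = reconcile(gene_tree[0], parent, events)
    right = reconcile(gene_tree[1], parent, events)
    here = lca(parent, left, right)
    # A node that maps to the same species node as one of its children
    # is a duplication under LCA mapping; otherwise it is a speciation.
    events.append((here, "duplication" if here in (left, right) else "speciation"))
    return here

# Hypothetical species tree ((1,2),(3,4)): 5 = (1,2), 6 = (3,4), 7 = root.
species_parent = {1: 5, 2: 5, 3: 6, 4: 6, 5: 7, 6: 7, 7: None}
events = []
root_map = reconcile(((1, (1, 2)), (4, 4)), species_parent, events)
```

Adding HGT as a third label requires keeping several candidate mappings per node rather than a single LCA.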

    The Algorithm

    Calculate optimal scenario resulting in each possible LCA

    [Figure: downpass over the species tree (leaves 1 2 3 4) and the gene tree (leaves 1 1 2 4 4)]

    For all LCAs at parent:
        For all LCAs at left child:
            For all LCAs at right child:

    $O(n_g n_s^3)$

    Real Data

    COG100: 30S ribosomal subunit protein S11


    [Figure: species tree and gene tree]

    32 transfers!!

    Uncertainty in Gene Trees

    Even with a good species phylogeny, gene trees may have significant uncertainty

    Bootstrap trees are a convenient but very limited sample of different topologies

    Consensus trees discard information

    Love the Bootstrap

    Don't fear the bootstrap - embrace it!

    Reconcile ALL bootstraps: For each subtree reconciliation, check other bootstraps for more efficient reconciliation

    The Idea


    Reconciliation as a tool for tree construction

    Incorporation of bootstrap subtrees explores a very large region of plausible tree space

    Constructed tree is the most parsimonious, plausible gene tree

    Reconciliation meets construction

    Each internal node of each bootstrap has three potential parents

    For each node, three tables of potential LCAs must be maintained

    The Algorithm

    Bootstrap trees

    1. Reconcile children

    2. Reconcile same node in bootstrap trees

    3. Return best answer and merge tables

    4. Return table to parent

    Link subtrees across bootstraps

    Find path through all bootstrap trees optimizing reconciliation

    After all subtrees are reconciled, select the best reconciliation to represent linked subtrees.
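One way to picture linking subtrees across bootstraps is to index every subtree by its set of leaves, so that alternative topologies for the same genes can be compared and the best-scoring one kept. The slides do not state how the lecture's method links subtrees, so the leaf-set key, the helper names, and the toy score below are all assumptions of this sketch.

```python
def canonical(tree):
    """Order children consistently so mirrored subtrees compare equal."""
    if not isinstance(tree, tuple):
        return tree
    return tuple(sorted((canonical(child) for child in tree), key=repr))

def leaf_set(tree):
    """Frozen set of leaf names under `tree`."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return leaf_set(tree[0]) | leaf_set(tree[1])

def index_subtrees(bootstraps):
    """Map each leaf set to the distinct subtree topologies seen across bootstraps."""
    index = {}

    def visit(tree):
        index.setdefault(leaf_set(tree), set()).add(tree)
        if isinstance(tree, tuple):
            for child in tree:
                visit(child)

    for tree in bootstraps:
        visit(canonical(tree))
    return index

def best_linked_subtree(index, leaves, score):
    """Pick the lowest-scoring topology among subtrees sharing `leaves`.

    `score` stands in for a reconciliation cost and is supplied by the caller.
    """
    return min(index[frozenset(leaves)], key=score)

# Two made-up bootstrap trees over genes a, b, c, d.
bootstraps = [((("a", "b"), "c"), "d"), (("a", ("b", "c")), "d")]
index = index_subtrees(bootstraps)
chosen = best_linked_subtree(index, "abc", score=lambda t: 0 if ("a", "b") in t else 1)
```

In a fuller version the `score` callback would be the reconciliation cost of each candidate subtree against the species tree.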

    It Gets Messy

    Different entries in the same table can have different subtree topologies!

    Rooting trees is easy!

    Iterate through all branches

    Root at branch with best reconciliation

    Real Data Revisited!

    COG100: 30S ribosomal subunit protein S11

    [Figure: species tree and reconciliation, with reconciliation events marked]

    7 transfers


    Summary

    Possible to reconcile gene and species trees efficiently

    Uncertainty in gene trees can hamper reconciliation

    Use bootstraps to sample reasonable subsets of tree space

    Are there 7 transfers for COG100?

        Wrong species phylogeny

        Need more bootstraps

        Gold-standard?

    Next steps?

        All metabolic genes

        Co-evolution among genes with similar function?

    Acknowledgements

    Jesse Shapiro (Evolutionary rates)

    Lawrence David (Reconciliation)

    Sonia Timberlake (Evolution of regulation)

    Sean Clarke (HGT in the laboratory)

    Arne Materna (Experimental evolution)