DNA Sequencing and the Modern Revolution in Studies of Microbial Diversity
DESCRIPTION
Talk by Jonathan Eisen at the California Academy of Sciences, December 16, 2010
TRANSCRIPT
DNA Sequencing and the Modern Revolution in Studies of Microbial Diversity
Jonathan A. Eisen, UC Davis
Talk at Calacademy, December 17, 2010
Monday, November 26, 12
Social Networking in Science
Bacteria evolve
Outline
• Introduction: Diversity of microbes
• I: The Tree of Life
• II: Genome Sequencing
• III: Microbes in the Field
• IV: Metagenomics
Introduction
Diversity of Microbes
Diversity of function
D. Diversity of form
Many major pathogens are bacteria
Bacteria and archaea are key commensals of many eukaryotes
Extreme conditions are dominated by bacteria and archaea
Microbes run global cycles
The first photosynthetic cells were similar to cyanobacteria.
Photosynthetic Organisms Changed Earth’s Atmosphere
6.15 Metabolic Pathways
Diversity of form: prokaryotes
D. Diversity of form
More shape diversity
Diversity of form II: complexity and size
Fruiting bodies
Photo 26.24 Fruiting body of gliding bacterium Stigmatella aurantiaca. SEM.
Diversity of form III: biofilms
[Figure: stages of biofilm formation: free-swimming prokaryotes; binding to surface; irreversible attachment; growth and division, formation of matrix; mature single-species biofilm; signal molecules; attraction of other organisms]
Diversity of form: microbial eukaryotes
D. Diversity of form
Part I:
The Tree of Life
Darwin and a Single Tree of Life
George Richmond. Darwin Heirlooms Trust
Darwin Origin of Species 1859
Set stage for “tree thinking”
Ernst Haeckel 1866
www.mblwhoilibrary.org
Plantae, Protista, Animalia
Monera, Protista, Plantae, Fungi, Animalia
Whittaker – Five Kingdoms 1969
The Microbe Problem
Most trees of life did not deal with microbes very well
Trees were not based on comparing homologous traits between all organisms
http://mcb.illinois.edu/faculty/profile/1204
Carl Woese
12.3 From Gene to Protein
The Ribosome
rRNA Systematics
• All cellular organisms have ribosomes
• All have homologous subunits of the ribosomes including specific ribosomal proteins and ribosomal RNAs (i.e., these are universally homologous genes)
• Woese determined the sequences of ribosomal RNAs from different species
• The sequences are highly similar but have some variation
• Each position in an rRNA can be considered a distinct character trait
• Each position has multiple possible character states (A, C, U, G)
Alignments
• Method of assigning homology to individual residues in different sequences
• Allows one to have multiple traits within individual genes
• Each column in alignment = a different character
• Each residue (ACTG) = state
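The alignment idea can be made concrete with a short Python sketch (illustrative only, not code from the talk). It turns already aligned sequences into a character matrix, one character per column and one state per residue, and flags which columns vary. The four-taxon alignment and all function names are hypothetical; gaps, if present, are simply treated as another state here, whereas real phylogenetic software handles them more carefully.

```python
from collections import Counter


def character_matrix(alignment: dict[str, str]) -> list[dict[str, str]]:
    """Turn {taxon: aligned sequence} into one {taxon: state} dict per column.

    Each alignment column is one character; each taxon's residue there is its state.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    (length,) = lengths
    return [{taxon: seq[i] for taxon, seq in alignment.items()} for i in range(length)]


def variable_characters(matrix: list[dict[str, str]]) -> list[int]:
    """Indices of columns in which more than one state is observed."""
    return [i for i, column in enumerate(matrix) if len(set(column.values())) > 1]


def parsimony_informative(matrix: list[dict[str, str]]) -> list[int]:
    """Columns with at least two states that each occur in at least two taxa."""
    informative = []
    for i, column in enumerate(matrix):
        counts = Counter(column.values())
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative.append(i)
    return informative


# Hypothetical four-taxon rRNA fragment, already aligned.
alignment = {
    "taxon_A": "ACGUAC",
    "taxon_B": "ACGUGC",
    "taxon_C": "AUGUGC",
    "taxon_D": "AUGCAC",
}
matrix = character_matrix(alignment)
print("variable characters:", variable_characters(matrix))
print("parsimony-informative characters:", parsimony_informative(matrix))
```

Only variable columns carry information about relationships, and in a parsimony analysis only the informative subset (at least two states, each shared by two or more taxa) can favor one tree over another.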
Alignments
• Similar in concept to lining up bones from different species
Woese 1987 - rRNA
Microbiological Reviews 51:221
4.7 Eukaryotic Cells (Part 1)
4.4 A Prokaryotic Cell
26.23 Some Would Call It Hell; These Archaea Call It Home
The Tree of Life 2006
adapted from Baldauf et al., in Assembling the Tree of Life, 2004
Why is the tree useful?
• Reclassification of many organisms, including the diversity of pathogens, which changes how treatments are designed
• Interpretation of comparative data: distinguishing convergence from homology
Part II:
Genome Sequencing
Fleischmann et al. 1995
Whole Genome Shotgun Sequencing
shotgun
sequence
Warner Brothers, Inc.
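Shotgun sequencing samples reads from random positions, so some bases are read many times and others not at all. The Python sketch below is a hypothetical simulation (random genome, error-free reads of a single fixed length, one strand) that compares the fraction of the genome covered with the Lander-Waterman expectation of 1 - e^(-c) at average coverage c. The function names and parameter values are illustrative assumptions, not part of the talk.

```python
import math
import random


def shotgun_reads(genome: str, read_length: int, n_reads: int, seed: int = 0):
    """Sample fixed-length, error-free reads from uniformly random start positions."""
    rng = random.Random(seed)
    last_start = len(genome) - read_length
    starts = [rng.randint(0, last_start) for _ in range(n_reads)]
    return [(start, genome[start:start + read_length]) for start in starts]


def fraction_covered(genome_length: int, reads, read_length: int) -> float:
    """Fraction of genome positions hit by at least one read."""
    covered = bytearray(genome_length)
    for start, _ in reads:
        covered[start:start + read_length] = b"\x01" * read_length
    return sum(covered) / genome_length


genome_length, read_length = 100_000, 500
rng = random.Random(42)
genome = "".join(rng.choice("ACGT") for _ in range(genome_length))
for coverage in (1, 3, 8):
    n_reads = coverage * genome_length // read_length
    reads = shotgun_reads(genome, read_length, n_reads, seed=coverage)
    observed = fraction_covered(genome_length, reads, read_length)
    predicted = 1 - math.exp(-coverage)  # Lander-Waterman: P(base unsequenced) ~ e^-c
    print(f"{coverage}x: observed {observed:.4f}, predicted {predicted:.4f}")
```

Even at 8x average coverage a few bases stay unsequenced, which is one reason a closure step is needed after assembly.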
Assemble Fragments
sequencer output
assemble fragments
Closure & Annotation
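Assembly reverses the shotgun step: overlapping reads are stitched back into longer contigs. A toy greedy overlap assembler is sketched below in Python. It is one simple strategy, not the method used in any particular genome project, and it assumes error-free reads and no repeats longer than the minimum overlap. Real assemblers use overlap or de Bruijn graphs plus mate-pair information to cope with sequencing errors and repeats; closure and annotation are separate steps not modeled here. Function names are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Longest suffix of `a` that equals a prefix of `b`, if >= min_len; else 0."""
    for length in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the longest suffix-prefix overlap.

    Assumes error-free reads and no repeats longer than min_overlap.
    Returns the resulting contigs.
    """
    # Drop duplicate reads and reads wholly contained in a longer read.
    contigs: list[str] = []
    for read in sorted(set(reads), key=len, reverse=True):
        if not any(read in c for c in contigs):
            contigs.append(read)
    while True:
        best_len, best_pair = 0, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_overlap)
                    if olen > best_len:
                        best_len, best_pair = olen, (i, j)
        if best_pair is None:  # no overlaps left: assembly is finished
            return contigs
        i, j = best_pair
        merged = contigs[i] + contigs[j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


reads = ["TTACGGAT", "CGGATCCA", "ATCCAGTA", "CAGTAGC", "GGATC"]
print(greedy_assemble(reads))
```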
Microbial genomes
From http://genomesonline.org
General Steps in Analysis of Complete Genomes
• Identification/prediction of genes
• Characterization of gene features
• Characterization of genome features
• Prediction of gene function
• Prediction of pathways
• Integration with known biological data
• Comparative genomics
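The first step listed, gene identification, can be illustrated with the simplest possible approach: scanning all six reading frames for open reading frames (ORFs) that run from an ATG start codon to a stop codon. This is a hypothetical Python sketch; production gene finders such as Glimmer or GeneMark use statistical models of coding sequence, alternative start codons and ribosome binding sites, and the 100-codon default minimum here is an arbitrary assumption.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[str, int, int]]:
    """ATG-to-stop open reading frames in all six frames.

    Returns (strand, start, end), 0-based and end-exclusive, on the strand that
    was scanned ("-" coordinates refer to the reverse complement). Only the first
    ATG after a stop is used, so nested start codons are not reported separately.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:  # length includes the stop codon
                        orfs.append((strand, start, i + 3))
                    start = None
    return orfs


print(find_orfs("CCATGAAATTTGGGTAACC", min_codons=5))  # [('+', 2, 17)]
```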
Vibrio cholerae Metabolism
Genome Sequences Have Revolutionized Microbiology
• Predictions of metabolic processes
• Better vaccine and drug design
• New insights into mechanisms of evolution
• Genomes serve as template for functional studies
• New enzymes and materials for engineering and synthetic biology
Genome Size
Genome Structure: More Variable than Once Thought
Figure 7.6 - Gene content
Figure 7.7 - Gene content E. coli
Figure 7.10 - K-12 vs. O157:H7
Lateral Transfer
from Doolittle, 1999
from Lerat et al.
Part III:
Microbes in the field
A. Studying microbes
How to study microbes
• Key questions about microbes in the environment:
• Who are they? (i.e., what kinds of microbes are they?)
• What are they doing? (i.e., what functions and processes do they possess?)
Figure 26.24 Extreme Halophiles
Deep Sea Ecosystems
• For any particular environment, there are many different ways one could go about characterizing the microbes there
• 1. Observe directly in the field
• 2. Grow in the laboratory
• 3. CSI Microbiology (collect & analyze DNA from field)
Method 1: Observe in the field
A. Method 1
Field Observations: An Important Tool
Method 2: Culturing
B. Method 2
Method 2: Culturing
Examples of Benefits of Culturing:
• Allows one to connect processes and properties to single types of organisms
• Enables experiments ranging from genetics to physiology to genomics
• Makes it possible to obtain large volumes of uniform material for study
• Can supplement appearance-based classification with other types of data. Many types are useful, though the standard is analysis of rRNA sequences.
Optimal salt concentration for different species
Halophile adaptations
• Some stresses of high salt: osmotic pressure on cells, desiccation
• Halophile adaptations: increased osmolarity inside the cell (via proteins, carbohydrates, or salts; the salt strategy is used mainly by extremely halophilic archaea), membrane pumps, desiccation resistance
High internal salt requires ALL cellular components to be adapted to salt and charge. For example, all proteins must change their surface charge and other properties.
Extreme halophiles are a monophyletic group
Uses of extremophiles
Type of environment | Examples | Example of mechanism of survival | Practical uses
High temperature (thermophiles) | Deep-sea vents, hot springs | Amino acid changes | Heat-stable enzymes
Low temperature (psychrophiles) | Antarctic Ocean, glaciers | Antifreeze proteins | Enhancing cold tolerance of crops
High pressure (barophiles) | Deep-sea vents, hot springs | Solute changes | Industrial processes
High salt (halophiles) | Evaporating pools | Increased internal osmolarity | Soy sauce production
High pH (alkaliphiles) | Soda lakes | Transporters | Detergents
Low pH (acidophiles) | Mine tailings | Transporters | Bioremediation
Desiccation (xerophiles) | Deserts | Spore formation | Freeze-drying additives
High radiation (radiophiles) | Nuclear reactor waste sites | Absorption, damage repair | Bioremediation, space travel
Method 3: CSI Microbiology
Great Plate Count Anomaly
Culturing count <<<< Microscopy count
This is a problem because appearance is not effective for answering “who is out there?” or “what are they doing?”
Solution? DNA
Collect from environment
Analysis of uncultured microbes
81Monday, November 26, 12
Collect from environment
Analysis of uncultured microbes
81Monday, November 26, 12
Polymerase Chain Reaction (PCR)
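Each PCR cycle can at most double the number of copies of the targeted region, which is why a few template molecules become enough DNA to sequence. A minimal Python model of that idealized exponential growth, with a hypothetical per-cycle efficiency parameter, is sketched below; real reactions slow down and plateau as primers and nucleotides are used up, which this model ignores.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Expected copies after `cycles` rounds of idealized PCR.

    Each cycle multiplies the template by (1 + efficiency); efficiency = 1.0
    means perfect doubling. Plateau effects are not modeled.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


for eff in (1.0, 0.9):
    print(f"efficiency {eff:.0%}: {pcr_copies(10, 30, eff):.3g} copies after 30 cycles")
```

Thirty cycles of perfect doubling multiply the template about a billion-fold (2^30 ≈ 1.07 × 10^9).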
PCR and phylogenetic analysis of rRNA genes
[Figure: DNA extraction → PCR (makes lots of copies of the rRNA genes in the sample) → sequencing of rRNA genes → sequence alignment = data matrix → phylogenetic tree placing environmental sequence rRNA1 relative to E. coli, humans, and yeast]
rRNA1 5’ ...TACAGTATAGGTGGAGCTAGCGATCGATCGA... 3’
Deep Sea Ecosystems
Chemosymbionts
Analysis of uncultured microbes
[Unrooted 16S rRNA tree (FIG. 4); taxa include the A. pisum P and S symbionts, N. gonorrhoeae, A. tumefaciens, R. rickettsii and several chemoautotrophic symbionts (sym); scale bar, 5%]
FIG. 4. Unrooted phylogenetic tree showing the position of the S. velum symbionts in relation to that of other Proteobacteria species on the basis of 16S rRNA gene sequences. The tree was constructed from evolutionary distances in Table 1. Members of the alpha and beta subclasses of the Proteobacteria are bracketed; all others are of the gamma subclass. Chemoautotrophic symbionts (sym) are listed in boldface type. Full species names listed in Table 1. Scale bar represents percent similarity.
…predicted size bands for S. velum genomic DNA (Fig. 3A): AvaI and BclI, 1,080 bp; EcoNI, 1,109 bp; and Nco and StuI, 998 bp (data not shown). We suggest that this technique is generally useful for the confirmation of the presence of PCR-generated sequences in cells with multiple types of DNA.
The restriction patterns of 16S rRNA coding regions for
DNA extracted from S. velum gills were identical for all nine clams examined; representative results are shown in Fig. 3. This, along with the lack of variability in the partial sequence of 16S rDNA for three individuals, suggests that there is a single dominant bacterial species within S. velum and that the host-symbiont association is species specific. This result is in agreement with the findings of Distel et al. (12) for lamellibranch bivalve and tubeworm chemoautotrophic symbionts.
Single bands were evident for all enzymes predicted to cut outside or near the ends of the gene such as AvaI, BclI, EcoNI, PvuII, XhoI (Fig. 3), and NcoI (band size, 9,600 bp; data not shown). Some of these enzymes generated restriction fragments larger than that of a typical bacterial ribosomal operon (which includes the 5S, 16S, and 23S rRNA genes [~5 kb]), indicating that the single bands observed were not generated by double cuts within multiple operons. Furthermore, only two bands were observed for enzymes predicted to cut near the middle of the 16S rRNA gene such as EcoRI (Fig. 3B) and StuI (bands of 4,400 and 19,500 bp; data not shown). Thus, all enzymes in all animals generated patterns consistent with the presence of only one copy of the 16S rRNA gene in the symbiont genome (Fig. 3). However,
it should be noted that a large duplication of the region containing the rRNA operon with no subsequent changes at any of the nine restriction sites could escape detection by this analysis.
These results suggest that the symbiont genome contains
but a single rRNA operon. Bacterial rRNA operons (rrn), which include the 5S, 16S, and 23S rRNA genes, vary considerably in number among bacteria. In contrast to free-living species of Proteobacteria, which have 4 to 7 rrn loci (18), only one copy has been detected in other endosymbionts, including both the primary (P) and secondary (S) symbionts of the pea aphid, Acyrthosiphon pisum (33) (included in Fig. 4). Multiple rRNA operons have generally been thought necessary to support a high rate of rRNA synthesis in rapidly dividing cells (3, 22). Unterman and Baumann (32) suggested that the aphid symbionts therefore grow slowly, with doubling times of 2 days to parallel the growth rate of the aphid host. They further speculated that the single rRNA operon in the aphid symbiont genome is a consequence of the adaptation to a symbiotic existence, which necessitates a slow growth rate. Although the division rate of S. velum symbionts is not known, it is unlikely that they grow slowly, since they must produce all of the biomass for their invertebrate host. Studies of rrn copy number and growth rates of endosymbionts and their free-living relatives from a variety of phylogenetic groups may help resolve the significance of rRNA operon redundancy.
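The restriction-mapping argument in this excerpt can be mimicked with a toy in silico digest: when a gene occurs once, enzymes that cut only outside it give one probe-detected band, while an enzyme that cuts once inside it gives two. The Python sketch below is hypothetical (an invented 66-bp sequence, complete digestion of linear DNA, a probe spanning the whole gene); the recognition sites and cut positions are the standard ones for these enzymes, but the function names are illustrative.

```python
import re

# Recognition site and cut position within the site (top strand) for a few
# enzymes named in the excerpt, e.g. EcoRI cuts G^AATTC and StuI cuts AGG^CCT.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),
    "XhoI": ("CTCGAG", 1),
    "PvuII": ("CAGCTG", 3),
    "NcoI": ("CCATGG", 1),
    "StuI": ("AGGCCT", 3),
}


def fragments(seq: str, enzyme: str) -> list[tuple[int, int]]:
    """(start, end) of each fragment after a complete digest of a linear sequence."""
    site, offset = ENZYMES[enzyme]
    # Lookahead so that overlapping occurrences of the site are all found.
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return list(zip(bounds, bounds[1:]))


def probed_band_sizes(seq: str, enzyme: str, gene_start: int, gene_end: int) -> list[int]:
    """Sizes of fragments overlapping [gene_start, gene_end): the bands a gene probe detects."""
    return [end - start for start, end in fragments(seq, enzyme)
            if start < gene_end and end > gene_start]


# Hypothetical "chromosome": one gene copy with an internal EcoRI site, flanked by XhoI sites.
gene = "C" * 10 + "GAATTC" + "C" * 10
chromosome = "A" * 5 + "CTCGAG" + "A" * 9 + gene + "T" * 9 + "CTCGAG" + "T" * 5
for enzyme in ("XhoI", "EcoRI"):
    print(enzyme, probed_band_sizes(chromosome, enzyme, 20, 20 + len(gene)))
```

A second copy of the gene elsewhere in the genome would generally add extra bands, which is the pattern the authors did not observe.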
Phylogenetic analysis of the S. velum symbionts. Phylogenetic analysis was conducted using the Genetic Data Environment program (Steve Smith, Harvard Genome Laboratory…
JOURNAL OF BACTERIOLOGY, May 1992, p. 3416–3421. Vol. 174, No. 10. 0021-9193/92/103416-06$02.00/0. Copyright © 1992, American Society for Microbiology
Phylogenetic Relationships of Chemoautotrophic Bacterial Symbionts of Solemya velum Say (Mollusca: Bivalvia) Determined by 16S rRNA Gene Sequence Analysis
JONATHAN A. EISEN,1† STEVEN W. SMITH,2 AND COLLEEN M. CAVANAUGH1*
Department of Organismic and Evolutionary Biology,1 and Harvard Genome Laboratory,2 Biological Laboratories, Harvard University, Cambridge, Massachusetts 02138
Received 4 November 1991/Accepted 9 March 1992
The protobranch bivalve Solemya velum Say (Mollusca: Bivalvia) houses chemoautotrophic symbionts intracellularly within its gills. These symbionts were characterized through sequencing of polymerase chain reaction-amplified 16S rRNA coding regions and hybridization of an Escherichia coli gene probe to S. velum genomic DNA restriction fragments. The symbionts appeared to have only one copy of the 16S rRNA gene. The lack of variability in the 16S sequence and hybridization patterns within and between individual S. velum organisms suggested that one species of symbiont is dominant within and specific for this host species. Phylogenetic analysis of the 16S sequences of the symbionts indicates that they lie within the chemoautotrophic cluster of the gamma subdivision of the eubacterial group Proteobacteria.
Procaryote-eucaryote associations in which marine invertebrates harbor chemoautotrophic bacteria as endosymbionts appear to be widespread in marine habitats such as deep-sea hydrothermal vents and coastal sediments (8, 15). In such symbioses, the procaryotes utilize the energy released by the oxidation of reduced inorganic substrates, such as hydrogen sulfide, to fix carbon dioxide via the Calvin-Benson cycle (7, 13). The hosts appear to derive nutrition from their endosymbionts and in turn provide the symbionts simultaneous access to the substrates from anoxic and oxic environments which are necessary for energy generation. Maintenance of such intracellular symbionts presents a novel metazoan acquisition of procaryotic energy generation and autotrophic carbon fixation.
While the existence of chemoautotroph-invertebrate symbioses is now generally accepted, little is actually known about the symbionts observed in the tissues of any of the hosts because none have been cultured. Comparison of rRNA sequences has greatly facilitated the identification of bacteria, including unculturable microorganisms, and the elucidation of their natural relationships (38). Phylogenetic analysis of 16S rRNA sequences enabled Distel et al. (12) to establish that the chemoautotrophic symbionts of the hydrothermal vent tubeworm and five species of bivalves of the subclass Lamellibranchia are related and cluster in the gamma subdivision of the Proteobacteria (formerly purple photosynthetic bacteria), one of the 11 major groups of eubacteria (30).
In this investigation we sought to establish the phylogenetic relationships and the species specificities of the symbionts of the protobranch bivalve Solemya velum Say, an Atlantic coast clam which has been studied as a shallow-water model of invertebrate-chemoautotroph associations (7, 9, 10). The phylogenetic placement of the S. velum symbionts, to date limited to sequence analysis of the 5S rRNA, indicates that these symbionts also fall in the Proteobacteria gamma subdivision (31). However, the small size of the 5S rRNA molecule (~120 bp) precludes resolution that can be attained with larger molecules such as 16S rRNA (~1,550 bp) (16). Species of the genus Solemya are, to date, the only bivalves of the subclass Protobranchia in which chemoautotrophic symbiosis has been documented. The protobranchs represent an important component of studies of chemoautotrophic symbioses, since they may be the closest living group to the ancestral bivalve condition, because they dominate the deep sea and are present along a gradient from the deep sea bottom to the shore (1).
PCR amplification. We used the polymerase chain reaction (PCR) (28) to amplify 16S rRNA coding regions from a mixture of procaryotic and eucaryotic DNA extracted from the symbiont-containing gills of S. velum. S. velum were collected from eelgrass beds near Woods Hole, Mass., and placed in filtered (passed through filters with a pore size of 0.2 µm) seawater to cleanse body surfaces prior to dissection. The gills, which contain ~10⁹ bacterial symbionts per g (wet weight), and feet, in which symbionts have not been observed (7), were dissected, frozen in liquid nitrogen, and stored at -85°C. Frozen tissue was homogenized in lysis buffer, and DNA was isolated by using hexadecyltrimethylammonium bromide (4). DNA from Escherichia coli JM109, prepared by the miniprep method (4), was used as a positive control.
* Corresponding author.
† Present address: Department of Biological Sciences, Stanford University, Stanford, CA 94305.
Amplification of 16S rRNA genes by PCR was carried out essentially by the method of Weisburg et al. (34) using eubacterial universal primers and 200 ng of template DNA. DNA products (Fig. 1) amplified from S. velum gill tissue (lane 1) and from the positive-control E. coli (lane 4) were prominent single bands of approximately 1,500 bp. Amplification was not detected when DNA template was not added (lane 2), nor when DNA from S. velum foot tissue was used as the template (lane 3).
The strong amplification from gill tissue DNA and lack of amplification from foot tissue DNA (Fig. 1) supports the conclusions from studies of enzyme activity, electron microscopy (9), and 5S rRNA sequences (31) that the bacteria are abundant within, and specific to, the gill tissue. This conclusion was further supported by lack of hybridization of
Collect from environment
Analysis of uncultured microbes
87Monday, November 26, 12
Collect from environment
Analysis of uncultured microbes
87Monday, November 26, 12
PCR and phylogenetic analysis of rRNA genes
[Figure: DNA extraction → PCR (makes lots of copies of the rRNA genes in the sample) → sequencing of rRNA genes → sequence alignment = data matrix → phylogenetic tree placing environmental sequences rRNA1 and rRNA2 relative to E. coli, humans, and yeast]
rRNA1 5’ ...ACACACATAGGTGGAGCTAGCGATCGATCGA... 3’
rRNA2 5’ ...TACAGTATAGGTGGAGCTAGCGATCGATCGA... 3’
Yeast (aligned characters): T A C A G T
PCR and phylogenetic analysis of rRNA genes
[Figure: DNA extraction → PCR (makes lots of copies of the rRNA genes in the sample) → sequencing of rRNA genes → sequence alignment = data matrix → phylogenetic tree placing environmental sequences rRNA1 to rRNA4 relative to E. coli, humans, and yeast]
rRNA1 5’ ...ACACACATAGGTGGAGCTAGCGATCGATCGA... 3’
rRNA2 5’ ...TACAGTATAGGTGGAGCTAGCGACGATCGA... 3’
rRNA3 5’ ...ACGGCAAAATAGGTGGATTCTAGCGATATAGA... 3’
rRNA4 5’ ...ACGGCCCGATAGGTGGATTCTAGCGCCATAGA... 3’
Aligned characters:
rRNA3  C A C T G T
rRNA4  C A C A G T
Yeast  T A C A G T
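Once environmental rRNA sequences are aligned, a simple way to start asking “who is out there?” is to measure how different each pair of sequences is. The Python sketch below uses the uncorrected p-distance (the proportion of aligned positions that differ) on the six aligned characters shown on this slide for rRNA3, rRNA4 and yeast. Real analyses use on the order of 1,500 16S positions, correct distances for multiple substitutions, and build trees with methods such as neighbor joining or maximum likelihood; the function names here are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of compared aligned positions at which two sequences differ.

    Positions with a gap ("-") in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """p-distance for every unordered pair of named sequences."""
    return {(m, n): p_distance(aligned[m], aligned[n]) for m, n in combinations(aligned, 2)}


def closest_reference(query: str, references: dict[str, str]) -> str:
    """Name of the reference sequence with the smallest p-distance to the query."""
    return min(references, key=lambda name: p_distance(query, references[name]))


# The six aligned characters shown on the slide.
aligned = {"Yeast": "TACAGT", "rRNA3": "CACTGT", "rRNA4": "CACAGT"}
for (m, n), d in distance_matrix(aligned).items():
    print(f"{m} vs {n}: {d:.3f}")
```

On these six characters, rRNA4 differs from yeast at one position and rRNA3 at two, so rRNA4 is the closer of the two to yeast.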
Major phyla of bacteria and archaea (as of 2002)
No cultures
Some cultures
Uses of rDNA PCR
Bohannan and Hughes 2003
Hugenholtz 2002
Censored
Censored
Part IV:
Metagenomics
Microbes in the world I: rRNA PCR
Perna et al. 2003
Metagenomics
shotgun
clone
Novel Form of Phototrophy
Beja et al. 2000
Acid Mine Drainage 2004
…environmental sample, however, variation within each species population might complicate assembly. If intraspecies variation is dominated by limited local polymorphism or homologous recombination, it should be possible to define a composite genome for each species population. Conversely, if the genomic heterogeneity within a species is dominated by large rearrangements, deletions, or insertions, it may be impossible to define composite genomes for species populations from natural communities.
A small-insert plasmid library (average insert size 3.2 kilobases
(kb)) was constructed from the biofilm DNA for random shotgun sequencing (see Supplementary Information). A total of 76.2 million base pairs (bp) of DNA sequence was generated from 103,462 high-quality reads (averaging 737 bp per read). Analysis of raw shotgun data (Supplementary Figs S1–5) indicated the presence of both bacterial and archaeal genomes at sequence coverages of up to 10×, which would be sufficient to produce a high-quality assembly from a conventional microbial genome project [20,21]. The shotgun data set was assembled with JAZZ, a whole-genome shotgun assembler [22]. Anticipating polymorphisms, we permitted alignment discrepancies beyond those expected from sequencing error if they were consistent with end-pairing constraints. Over 85% of the shotgun reads were assembled into scaffolds longer than 2 kb (a scaffold is a reconstructed genomic region that may contain gaps of a known size range). The combined length of the 1,183 scaffolds is 10.83 megabases (Mb). The assembly is internally self-consistent, with 97.2% of end pairs from the same clone assembled with the appropriate orientation and separation, as expected for a low rate of mispairing error (tracking and chimaeric clones).
The first step in assignment of scaffolds to organism types was to
separate the scaffolds by average G+C content. These were subsequently subdivided using read depth (coverage). Dinucleotide frequencies did not allow for further subdivision. Notably, separation of scaffolds into low G+C (<43.5%; Supplementary Fig. S3a) and high G+C (≥43.5%) content ‘bins’ was not significantly compromised by local heterogeneities in G+C content because the scaffolds were binned after assembly. As the scaffolds are typically tens of kilobases long, local fluctuations in G+C content are averaged over the length of each scaffold, allowing, in most cases (>99%), clear assignment to bins of high or low G+C content.
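The two-step binning described in this excerpt (first by average G+C content with a 43.5% cutoff, then by read depth) can be sketched in Python as follows. The scaffolds are hypothetical and far shorter than real ones, and the 6× depth cutoff is an assumption chosen to fall between the roughly 3× and 10× coverage classes reported; the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Scaffold:
    name: str
    sequence: str
    coverage: float  # mean read depth across the scaffold


def gc_content(seq: str) -> float:
    """G+C fraction among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def bin_scaffolds(scaffolds, gc_cutoff: float = 0.435, depth_cutoff: float = 6.0):
    """Group scaffold names first by G+C content, then by read depth."""
    bins: dict[tuple[str, str], list[str]] = {}
    for s in scaffolds:
        gc_bin = "high G+C" if gc_content(s.sequence) >= gc_cutoff else "low G+C"
        depth_bin = "high coverage" if s.coverage >= depth_cutoff else "low coverage"
        bins.setdefault((gc_bin, depth_bin), []).append(s.name)
    return bins


# Hypothetical toy scaffolds (real ones are tens of kilobases long).
toy = [
    Scaffold("s1", "GCGCGCATGC", 10.2),
    Scaffold("s2", "ATATATGCAT", 9.7),
    Scaffold("s3", "GGCCAATTGC", 3.1),
    Scaffold("s4", "AATTAATTGC", 2.8),
]
for key, names in bin_scaffolds(toy).items():
    print(key, names)
```

Averaging over long scaffolds is what makes a single G+C threshold workable; on short reads, local G+C fluctuations would blur the boundary between bins.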
The high G+C scaffolds at approximately 10× coverage (70 scaffolds up to 137 kb in length, totalling 2.23 Mb) were identified by the presence of a single 16S rRNA gene as belonging to the genome of a Leptospirillum group II species. The average G+C content (55.8%) is comparable to the G+C content (54.9–58%) of L. ferriphilum [19]. The total high G+C scaffold length is close to the estimated genome size of Leptospirillum ferrooxidans [23] (1.9 Mb). This suggests that essentially the entire Leptospirillum group II genome was recovered from the community DNA.
The low G+C scaffolds at approximately 10× coverage were assembled into 59 scaffolds of up to 138 kb in length, totalling 1.82 Mb. The single 16S rRNA gene identified in these scaffolds was 99% identical to that of the fer1 isolate; however, alignment of the scaffolds to the fer1 genome revealed an average of 22% divergence at the nucleotide level (Supplementary Fig. S6). The total scaffold length is close to the genome size of fer1 (1.9 Mb; Allen et al., unpublished data), and local gene order and content are highly conserved (Supplementary Fig. S7). Therefore, these 59 scaffolds represent a nearly complete genome of a previously unknown, uncultured Ferroplasma species distinct from fer1. We designate this as Ferroplasma type II. The dominance of this organism type was unexpected before the genomic analysis.
We assigned the roughly 3× coverage, high G+C scaffolds to Leptospirillum group III on the basis of rRNA markers (474 scaffolds up to 31 kb, totalling 2.66 Mb). Comparison of these scaffolds with those assigned to Leptospirillum group II indicates significant sequence divergence and only locally conserved gene order, confirming that the scaffolds belong to a relatively distant relative of Leptospirillum group II. A partial 16S rRNA gene sequence from Sulfobacillus thermosulfidooxidans was identified in the unassembled reads, suggesting very low coverage of this organism. If any Sulfobacillus scaffolds >2 kb were assembled, they would be grouped with the Leptospirillum group III scaffolds.
We compared the 3× coverage, low G+C scaffolds (580 scaffolds, 4.12 Mb) to the fer1 genome in order to assign them to organism types (Supplementary Fig. S6). Scaffolds with ≥96% nucleotide identity to fer1 were assigned to an environmental Ferroplasma type I genome (170 scaffolds up to 47 kb in length and comprising 1.48 Mb of sequence). The remaining low-coverage, low G+C scaffolds are tentatively assigned to G-plasma. The largest scaffold in this bin (62 kb) contains the G-plasma 16S rRNA gene. The 410 scaffolds assigned to G-plasma comprise 2.65 Mb of sequence. A partial 16S rRNA gene sequence from A-plasma was identified in the unassembled reads, suggesting low coverage of this organism. Any scaffolds from A-plasma >2 kb would be included in the G-plasma bin. Although eukaryotes are present in the AMD system, they were in low abundance in the biofilm studied. So far, no scaffolds from eukaryotes have been detected.
As independent evidence that the Leptospirillum group II and Ferroplasma type II genomes are nearly complete, we located a full complement of transfer RNA synthetases in each genome data set. An almost complete set of these genes was also recovered from Leptospirillum group III. The G-plasma bin contains more than a full set of tRNA synthetases, consistent with inclusion of some A-plasma scaffolds. In addition, we established that the Leptospirillum group II, Leptospirillum group III, Ferroplasma type I, Ferroplasma type II and G-plasma bins contained only one set of rRNA genes.
Figure 1 The pink biofilm. a, Photograph of the biofilm in the Richmond mine (hand included for scale). b, FISH image of a. Probes targeting bacteria (EUBmix; fluorescein isothiocyanate (green)) and archaea (ARC915; Cy5 (blue)) were used in combination with a probe targeting the Leptospirillum genus (LF655; Cy3 (red)). Overlap of red and green (yellow) indicates Leptospirillum cells and shows the dominance of Leptospirillum. c, Relative microbial abundances determined using quantitative FISH counts.
NATURE | doi:10.1038/nature02340 | www.nature.com/nature | © 2004 Nature Publishing Group
…inputs of fixed carbon or nitrogen from external sources. As with Leptospirillum group I, both Leptospirillum group II and III have the genes needed to fix carbon by means of the Calvin–Benson–Bassham cycle (using type II ribulose 1,5-bisphosphate carboxylase–oxygenase). All genomes recovered from the AMD system contain formate hydrogenlyase complexes. These, in combination with carbon monoxide dehydrogenase, may be used for carbon fixation via the reductive acetyl coenzyme A (acetyl-CoA) pathway by some, or all, organisms. Given the large number of ABC-type sugar and amino acid transporters encoded in the Ferroplasma type
Figure 4 Cell metabolic cartoons constructed from the annotation of 2,180 ORFs identified in the Leptospirillum group II genome (63% with putative assigned function) and 1,931 ORFs in the Ferroplasma type II genome (58% with assigned function). The cell cartoons are shown within a biofilm that is attached to the surface of an acid mine drainage stream (viewed in cross-section). Tight coupling between ferrous iron oxidation, pyrite dissolution and acid generation is indicated. Rubisco, ribulose 1,5-bisphosphate carboxylase–oxygenase. THF, tetrahydrofolate.
Metagenomics Challenge
Who is out there? What are they doing?
Glassy-Winged Sharpshooter
• Feeds on xylem sap
• Vector for Pierce’s disease
• Potential bioterror agent
• Collaboration with Nancy Moran to sequence symbiont genomes
• Funded by NSF
• Published in PLoS Biology 2006
Wu et al. 2006 PLoS Biology 4: e188.
Sharpshooter Shotgun Sequencing
shotgun
Wu et al. 2006 PLoS Biology 4: e188.
Collaboration with Nancy Moran’s lab
Binning challenge
No reference genome? What do you do?
Phylogeny ...
CFB Phyla
Wu et al. 2006 PLoS Biology 4: e188.
Baumannia makes vitamins and cofactors
Sulcia makes amino acids
Part V:
Knowing What We Don’t Know