Microbiomes and DNA based studies of microbial diversity - talk by Jonathan Eisen at Singularity...


Upload: jonathan-eisen

Posted on 10-May-2015


Category:

Health & Medicine



DESCRIPTION

Talk by Jonathan Eisen 3/15/13

TRANSCRIPT

Page 1: Microbiomes and DNA based studies of microbial diversity - talk by Jonathan Eisen at Singularity University

Microbiomes and DNA based Studies of Microbial Diversity

Jonathan A. Eisen

University of California, Davis

Friday, March 15, 13

Page 2

Sequencing and Microbes

• Four major “eras” in the use of sequencing for microbial diversity studies

• Each area represented by these eras is being revolutionized by new sequencing methods

Page 3

Era I: rRNA Tree of Life

Page 4

Ernst Haeckel 1866

www.mblwhoilibrary.org

Plantae, Protista, Animalia

Page 6

Woese

Page 7

Woese 1987 - rRNA

Microbiological Reviews 51:221

Page 8

Tree of Life

• Three main kinds of organisms: Bacteria, Archaea, Eukaryotes

• Viruses are not alive, but some call them microbes

• Many misclassifications occurred before the use of molecular methods

Page 9

Era II: rRNA in the Environment

Page 10

Great Plate Count Anomaly

Page 11

Great Plate Count Anomaly

Culturing vs. microscopy

Page 12

Great Plate Count Anomaly

Culturing count vs. microscopy count

Page 13

Great Plate Count Anomaly

Culturing count <<<< microscopy count

Page 14

Great Plate Count Anomaly

Culturing count <<<< microscopy count

Solution?

Page 15

Great Plate Count Anomaly

Culturing count <<<< microscopy count

Solution?

DNA
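The anomaly on these slides is a ratio: colonies counted on plates versus cells counted directly under the microscope. A minimal sketch of that arithmetic, assuming both counts refer to the same sample volume; the function name and the example counts are hypothetical, chosen only for illustration and not taken from the talk.

```python
def culturable_fraction(plate_count: float, microscopic_count: float) -> float:
    """Return plate (colony) count divided by direct microscopic count.

    Both counts must refer to the same sample volume, e.g. cells per mL.
    """
    if microscopic_count <= 0:
        raise ValueError("microscopic_count must be positive")
    return plate_count / microscopic_count


if __name__ == "__main__":
    # Hypothetical counts per mL, for illustration only.
    colonies_per_ml = 1.0e3
    cells_per_ml = 1.0e6
    frac = culturable_fraction(colonies_per_ml, cells_per_ml)
    print(f"{frac:.1%} of microscopically counted cells formed colonies")
```

A small culturable fraction is exactly the gap that motivates switching from culturing to DNA-based surveys.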

Page 16

Collect from environment

Analysis of uncultured microbes

Page 17

PCR and phylogenetic analysis of rRNA genes

[Figure: DNA extraction → PCR (makes lots of copies of the rRNA genes in the sample) → sequence rRNA genes → sequence alignment = data matrix → phylogenetic tree. One environmental sequence, rRNA1 5’...TACAGTATAGGTGGAGCTAGCGATCGATCGA... 3’, is compared with E. coli, Humans, and Yeast.]
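The slide's note that PCR “makes lots of copies of the rRNA genes in sample” can be made concrete with the textbook idealization: each cycle multiplies the template by (1 + E), where E is the per-cycle efficiency (E = 1 means perfect doubling). This is a simplified sketch under that assumption; real reactions plateau in late cycles, and templates that amplify with different efficiencies are one reason PCR-based surveys can be biased. The function name and efficiencies are hypothetical.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized exponential PCR yield: N = N0 * (1 + E) ** cycles."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # Two templates starting at equal abundance but amplifying with different
    # (hypothetical) efficiencies end up at very different abundances.
    a = pcr_copies(100, 30, efficiency=1.0)
    b = pcr_copies(100, 30, efficiency=0.9)
    print(f"template A: {a:.3g} copies, template B: {b:.3g} copies, ratio {a / b:.1f}")
```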

Page 18

PCR and phylogenetic analysis of rRNA genes

[Figure: same workflow with two environmental sequences, compared with E. coli, Humans, and Yeast:
rRNA1 5’...ACACACATAGGTGGAGCTAGCGATCGATCGA... 3’
rRNA2 5’...TACAGTATAGGTGGAGCTAGCGATCGATCGA... 3’
Aligned fragment: Yeast T A C A G T]

Page 19

PCR and phylogenetic analysis of rRNA genes

[Figure: same workflow with four environmental sequences, compared with E. coli, Humans, and Yeast:
rRNA1 5’...ACACACATAGGTGGAGCTAGCGATCGATCGA... 3’
rRNA2 5’...TACAGTATAGGTGGAGCTAGCGACGATCGA... 3’
rRNA3 5’...ACGGCAAAATAGGTGGATTCTAGCGATATAGA... 3’
rRNA4 5’...ACGGCCCGATAGGTGGATTCTAGCGCCATAGA... 3’
Aligned fragment:
rRNA3 C A C T G T
rRNA4 C A C A G T
Yeast T A C A G T]
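The last step on these slides, turning an alignment into a tree, starts from pairwise comparisons. A minimal sketch, assuming the six aligned columns shown on Page 19 (rRNA3, rRNA4, and Yeast) are the whole alignment: it computes the uncorrected p-distance (fraction of mismatched sites) and reports each sequence's nearest relative. Distance-based tree methods such as neighbor-joining or UPGMA take a matrix like this as input; building the tree itself is not shown, and the function names are illustrative.

```python
from itertools import combinations

# The six aligned columns shown on Page 19.
ALIGNMENT = {
    "rRNA3": "CACTGT",
    "rRNA4": "CACAGT",
    "Yeast": "TACAGT",
}


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance: fraction of aligned sites that differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)


def distance_matrix(aln: dict[str, str]) -> dict[tuple[str, str], float]:
    """p-distance for every unordered pair of sequences."""
    return {(i, j): p_distance(aln[i], aln[j]) for i, j in combinations(aln, 2)}


def nearest_relative(name: str, aln: dict[str, str]) -> str:
    """The other sequence with the smallest p-distance (first one wins ties)."""
    others = [n for n in aln if n != name]
    return min(others, key=lambda n: p_distance(aln[name], aln[n]))


if __name__ == "__main__":
    for (i, j), d in distance_matrix(ALIGNMENT).items():
        print(f"{i} vs {j}: {d:.3f}")
    for name in ALIGNMENT:
        print(f"nearest relative of {name}: {nearest_relative(name, ALIGNMENT)}")
```

With only six sites the distances are coarse and ties are common; real analyses use hundreds of aligned positions.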

Page 20

PCR

PCR and phylogenetic analysis of rRNA genes

Page 21

Major phyla of bacteria & archaea (as of 2002)

No cultures

Some cultures

Page 22

The Hidden Majority: Richness estimates

Bohannan and Hughes 2003; Hugenholtz 2002
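Richness estimates like those cited on this slide extrapolate from the taxa actually observed to the number likely present. One widely used nonparametric estimator is Chao1, which uses the numbers of taxa seen exactly once (F1) and exactly twice (F2): S_Chao1 = S_obs + F1²/(2·F2), with the bias-corrected form S_obs + F1(F1 − 1)/2 used when F2 = 0. This is a sketch of that one estimator; the slide does not say which estimators the cited studies used, and the example counts are hypothetical.

```python
from collections import Counter
from collections.abc import Hashable, Iterable


def chao1(abundances: Iterable[int]) -> float:
    """Chao1 richness estimate from per-taxon read (or clone) counts."""
    counts = [a for a in abundances if a > 0]
    s_obs = len(counts)                       # observed taxa
    f1 = sum(1 for a in counts if a == 1)     # singletons
    f2 = sum(1 for a in counts if a == 2)     # doubletons
    if f2 > 0:
        return s_obs + f1 ** 2 / (2 * f2)
    # Bias-corrected form; avoids division by zero when there are no doubletons.
    return s_obs + f1 * (f1 - 1) / 2


def chao1_from_labels(labels: Iterable[Hashable]) -> float:
    """Convenience wrapper: one label (e.g. OTU ID) per sequenced read."""
    return chao1(Counter(labels).values())


if __name__ == "__main__":
    # Hypothetical OTU counts, for illustration only.
    otu_counts = [1, 1, 1, 2, 2, 5]
    print(f"observed: {len(otu_counts)}, Chao1 estimate: {chao1(otu_counts):.2f}")
```

Many singletons relative to doubletons push the estimate well above the observed count, which is the “hidden majority” idea in numerical form.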

Page 23

Censored

Censored

Example: Human biogeography

Page 24

Era III: Genome Sequencing

Page 25

1st Genome Sequence

Fleischmann et al. 1995

Page 26

Genomes Revolutionized Microbiology

• Predictions of metabolic processes

• Better vaccine and drug design

• New insights into mechanisms of evolution

• Genomes serve as template for functional studies

• New enzymes and materials for engineering and synthetic biology

Page 27

Lateral Gene Transfer

Perna et al. 2003

Page 28

Era IV: Genomes in the Environment

Page 29

DeLong Lab

tion with multiple sequence alignments, indicates that the majority of active site residues are well conserved between proteorhodopsin and archaeal bacteriorhodopsins (15).

A phylogenetic comparison with archaeal rhodopsins placed proteorhodopsin on an independent long branch, with moderate statistical support for an affiliation with sensory rhodopsins (16) (Fig. 1B). The finding of archaeal-like rhodopsins in organisms as diverse as marine proteobacteria and eukarya (6) suggests a potential role for lateral gene transfer in their dissemination. Available genome sequence data are insufficient to identify the evolutionary origins of the proteorhodopsin genes. The environments from which the archaeal and bacterial rhodopsins originate are, however, strikingly different. Proteorhodopsin is of marine origin, whereas the archaeal rhodopsins of extreme halophiles experience salinity 4 to 10 times greater than that in the sea (14).

Functional analysis. To determine whether proteorhodopsin binds retinal, we expressed the protein in Escherichia coli (17). After 3 hours of induction in the presence of retinal, cells expressing the protein acquired a reddish pigmentation (Fig. 3A). When retinal was added to the membranes of cells expressing the proteorhodopsin apoprotein, an absorbance peak at 520 nm was observed after 10 min of incubation (Fig. 3B). On further incubation, the peak at 520 nm increased and had a ~100-nm half-bandwidth. The 520-nm pigment was generated only in membranes containing proteorhodopsin apoprotein, and only in the presence of retinal, and its ~100-nm half-bandwidth is typical of retinylidene protein absorption spectra found in other rhodopsins. The red-shifted λmax of retinal (λmax = 370 nm in the free state) is indicative of a protonated Schiff base linkage of the retinal, presumably to the lysine residue in helix G (18).

Light-mediated proton translocation was determined by measuring pH changes in a cell suspension exposed to light. Net outward transport of protons was observed solely in proteorhodopsin-containing E. coli cells and only in the presence of retinal and light (Fig. 4A). Light-induced acidification of the medium was completely abolished by the presence of a 10 µM concentration of the protonophore carbonyl cyanide m-chlorophenylhydrazone (19). Illumination generated a membrane electrical potential in proteorhodopsin-containing right-side-out membrane vesicles, in the presence of retinal, reaching –90 mV 2 min after light onset (20) (Fig. 4B). These data indicate that proteorhodopsin translocates protons and is capable of generating membrane potential in a physiologically relevant range. Because these activities were observed in E. coli membranes containing overexpressed protein, the levels of proteorhodopsin activity in its native state remain to be determined. The ability of proteorhodopsin to generate a physiologically significant membrane potential, however, even when heterologously expressed in nonnative membranes, is consistent with a postulated proton-pumping function for proteorhodopsin.

Archaeal bacteriorhodopsin, and to a lesser extent sensory rhodopsins (21), can both mediate light-driven proton-pumping activity. However, sensory rhodopsins are generally cotranscribed with genes encoding their own transducer of light stimuli [for example, Htr (22, 23)]. Although sequence analysis of proteorhodopsin shows moderate statistical support for a specific relationship with sensory rhodopsins, there is no gene for an Htr-like regulator adjacent to the proteorhodopsin gene. The absence of an Htr-like gene in close proximity to the proteorhodopsin gene suggests that proteorhodopsin may function primarily as a light-driven proton pump. It is possible, however, that such a regulator might be encoded elsewhere in the proteobacterial genome.

To further verify a proton-pumping function for proteorhodopsin, we characterized the kinetics of its photochemical reaction cycle. The transport rhodopsins (bacteriorhodopsins and halorhodopsins) are characterized by cyclic photochemical reaction se-

Fig. 1. (A) Phylogenetic tree of bacterial 16S rRNA gene sequences, including that encoded on the 130-kb bacterioplankton BAC clone (EBAC31A08) (16). (B) Phylogenetic analysis of proteorhodopsin with archaeal (BR, HR, and SR prefixes) and Neurospora crassa (NOP1 prefix) rhodopsins (16). Nomenclature: Name_Species.abbreviation_Genbank.gi (HR, halorhodopsin; SR, sensory rhodopsin; BR, bacteriorhodopsin). Halsod, Halorubrum sodomense; Halhal, Halobacterium salinarum (halobium); Halval, Haloarcula vallismortis; Natpha, Natronomonas pharaonis; Halsp, Halobacterium sp; Neucra, Neurospora crassa.

RESEARCH ARTICLES

www.sciencemag.org SCIENCE VOL 289 15 SEPTEMBER 2000 1903


Page 30

DeLong Lab


Page 31

Page 32

letters to nature

NATURE | VOL 411 | 14 JUNE 2001 | www.nature.com

[Slide image: page from a 2001 Nature letter on proteorhodopsins in marine bacterioplankton. The body text and figure legends were garbled by font encoding during extraction and are not reproduced. Recoverable figure labels: laser-flash traces of untreated, hydroxylamine-treated, and retinal-reconstituted membranes (scale bars 10^-3 AU and 10^-1 s); a phylogenetic tree of proteorhodopsin clones labelled MB, HOT, PAL, and BAC (for example MB 0m2, HOT 75m3, PAL B1, BAC 31A8), grouped as “Monterey Bay and shallow HOT” and “Antarctica and deep HOT” (scale bar 0.01).]

© 2001 Macmillan Magazines Ltd


Page 33

Sargasso Sea

assembly to identify a set of large, deeply assembling nonrepetitive contigs. This was used to set the expected coverage in unique regions (to 23×) for a final run of the assembler. This allowed the deep contigs to be treated as unique sequence when they would otherwise be labeled as repetitive. We evaluated our final assembly results in a tiered fashion, looking at well-sampled genomic regions separately from those barely sampled at our current level of sequencing.

The 1.66 million sequences from the Weatherbird II samples (table S1; samples 1 to 4; stations 3, 11, and 13) were pooled and assembled to provide a single master assembly for comparative purposes. The assembly generated 64,398 scaffolds ranging in size from 826 bp to 2.1 Mbp, containing 256 Mbp of unique sequence and spanning 400 Mbp. After assembly, there remained 217,015 paired-end reads, or “mini-scaffolds,” spanning 820.7 Mbp as well as an additional 215,038 unassembled singleton reads covering 169.9 Mbp (table S2, column 1). The Sorcerer II samples provided almost no assembly, so we consider for these samples only the 153,458 mini-scaffolds, spanning 518.4 Mbp, and the remaining 18,692 singleton reads (table S2, column 2). In total, 1.045 Gbp of nonredundant sequence was generated. The lack of overlapping reads within the unassembled set indicates that lack of additional assembly was not due to algorithmic limitations but to the relatively limited depth of sequencing coverage given the level of diversity within the sample.
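The closing point of this paragraph, that limited sequencing depth relative to diversity left many reads unassembled, follows from simple coverage arithmetic. A sketch under the Lander-Waterman (Poisson) idealization: mean coverage is c = N·L/G for N reads of length L over a genome of length G, the expected uncovered fraction is e^(−c), and in a mixed community a member at relative abundance a receives roughly a fraction a of the reads. The function names and example values are hypothetical, not figures from the paper.

```python
import math


def mean_coverage(n_reads: int, read_length: int, genome_length: int,
                  relative_abundance: float = 1.0) -> float:
    """Expected fold coverage c = N * L * a / G (Lander-Waterman idealization).

    relative_abundance is the fraction of reads expected to come from this
    genome; use 1.0 for a single-genome library.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return n_reads * read_length * relative_abundance / genome_length


def fraction_uncovered(coverage: float) -> float:
    """Poisson expectation of the fraction of bases hit by zero reads: e^-c."""
    return math.exp(-coverage)


if __name__ == "__main__":
    # Hypothetical: 1,000,000 reads of 800 bp, a 2 Mbp genome.
    for abundance in (0.5, 0.01, 0.001):
        c = mean_coverage(1_000_000, 800, 2_000_000, abundance)
        print(f"abundance {abundance:>6}: {c:7.2f}x coverage, "
              f"{fraction_uncovered(c):.1%} of genome expected uncovered")
```

The same sequencing effort that saturates an abundant genome leaves a rare one mostly uncovered, so its reads stay as singletons.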

The whole-genome shotgun (WGS) assembly has been deposited at DDBJ/EMBL/GenBank under the project accession AACY00000000, and all traces have been deposited in a corresponding TraceDB trace archive. The version described in this paper is the first version, AACY01000000. Unlike a conventional WGS entry, we have deposited not just contigs and scaffolds but the unassembled paired singletons and individual singletons in order to accurately reflect the diversity in the sample and allow searches across the entire sample within a single database.

Genomes and large assemblies. Our analysis first focused on the well-sampled genomes by characterizing scaffolds with at least 3× coverage depth. There were 333 scaffolds comprising 2226 contigs and spanning 30.9 Mbp that met this criterion (table S3), accounting for roughly 410,000 reads, or 25% of the pooled assembly data set. From this set of well-sampled material, we were able to cluster and classify assemblies by organism; from the rare species in our sample, we used sequence similarity based methods together with computational gene finding to obtain both qualitative and quantitative estimates of genomic and functional diversity within this particular marine environment.

We employed several criteria to sort the major assembly pieces into tentative organism “bins”; these include depth of coverage, oligonucleotide frequencies (7), and similarity to previously sequenced genomes (5). With these techniques, the majority of sequence assigned to the most abundant species (16.5 Mbp of the 30.9 Mbp in the main scaffolds) could be separated based on several corroborating indicators. In particular, we identified a distinct group of scaffolds representing an abundant population clearly related to Burkholderia (fig. S2) and two groups of scaffolds representing two distinct strains closely related to the published Shewanella oneidensis genome (8) (fig. S3). There is a group of scaffolds assembling at over 6× coverage that appears to represent the genome of a SAR86 (table S3). Scaffold sets representing a conglomerate of Prochlorococcus strains (Fig. 2), as well as an uncultured marine archaeon, were also identified (table S3; Fig. 3). Additionally, 10 putative mega plasmids were found in the main scaffold set, covered at depths ranging from 4× to 36× (indicated with shading in table S3 with nine depicted in
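Oligonucleotide frequencies work as a binning signal because genomes tend to have a characteristic short-word composition. A minimal sketch, assuming tetranucleotides (k = 4), plain Euclidean distance, and bins already summarized by a reference profile; published binning pipelines typically merge reverse-complement k-mers and combine composition with coverage, which this sketch does not do. All names and sequences are hypothetical.

```python
import math
from itertools import product

K = 4
# All 256 tetranucleotides in a fixed order, mapped to vector positions.
KMER_INDEX = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=K))}


def kmer_profile(seq: str) -> list[float]:
    """Relative frequency of each tetranucleotide in seq."""
    counts = [0] * len(KMER_INDEX)
    seq = seq.upper()
    for i in range(len(seq) - K + 1):
        idx = KMER_INDEX.get(seq[i:i + K])
        if idx is not None:  # skip words containing N or other ambiguity codes
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def euclidean(p: list[float], q: list[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def assign_bin(contig: str, bins: dict[str, list[float]]) -> str:
    """Return the name of the bin whose profile is closest to the contig's."""
    profile = kmer_profile(contig)
    return min(bins, key=lambda name: euclidean(profile, bins[name]))


if __name__ == "__main__":
    bins = {
        "bin_A": kmer_profile("ACGT" * 50),  # toy reference compositions
        "bin_B": kmer_profile("AAAT" * 50),
    }
    print(assign_bin("ACGT" * 20, bins))
```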

Fig. 1. MODIS-Aqua satellite image of ocean chlorophyll in the Sargasso Sea grid about the BATS site from 22 February 2003. The station locations are overlain with their respective identifications. Note the elevated levels of chlorophyll (green color shades) around station 3, which are not present around stations 11 and 13.

Fig. 2. Gene conservation among closely related Prochlorococcus. The outermost concentric circle of the diagram depicts the completed genomic sequence of Prochlorococcus marinus MED4 (11). Fragments from environmental sequencing were compared to this completed Prochlorococcus genome and are shown in the inner concentric circles and were given boxed outlines. Genes for the outermost circle have been assigned pseudospectrum colors based on the position of those genes along the chromosome, where genes nearer to the start of the genome are colored in red, and genes nearer to the end of the genome are colored in blue. Fragments from environmental sequencing were subjected to an analysis that identifies conserved gene order between those fragments and the completed Prochlorococcus MED4 genome. Genes on the environmental genome segments that exhibited conserved gene order are colored with the same color assignments as the Prochlorococcus MED4 chromosome. Colored regions on the environmental segments exhibiting color differences from the adjacent outermost concentric circle are the result of conserved gene order with other MED4 regions and probably represent chromosomal rearrangements. Genes that did not exhibit conserved gene order are colored in black.

RESEARCH ARTICLE

www.sciencemag.org SCIENCE VOL 304 2 APRIL 2004 67


Page 34

Functional Diversity of Proteorhodopsins?

Venter et al., 2004

Page 35

using the curated TIGR role categories (5). A breakdown of predicted genes by category is given in Table 1.

The samples analyzed here represent only specific size fractions of the sampled environment, dictated by the pore size of the collection filters. By our selection of filter pore sizes, we deliberately focused this initial study on the identification and analysis of microbial organisms. However, we did examine the data for the presence of eukaryotic content as well. Although the bulk of known protists are 10 µm and larger, there are some known in the range of 1 to 1.5 µm in diameter [for example, Ostreococcus tauri (15) and the Bolidomonas species (16)], and such organisms could potentially work their way through a 0.80 µm prefilter. An initial screening for 18S ribosomal RNA (rRNA), a commonly used eukaryotic marker, identified 69 18S rRNA genes, with 63 of these on singletons and the remaining 6 on very small, low-coverage assemblies. These 18S rRNAs are similar to uncultured marine eukaryotes and are indicative of a eukaryotic presence but inconclusive on their own. Because bacterial DNA contains a much greater density of genes than eukaryotic DNA, the relative proportion of gene content can be used as another indicator to distinguish eukaryotic material in our sample. An inverse relation was observed between the pore size of the prefilters and collection filters and the fraction of sequence coding for genes (table S5). This relation, together with the presence of 18S rRNA genes in the samples, is strong evidence that eukaryotic material was indeed captured.

Diversity and species richness. Most phylogenetic surveys of uncultured organisms have been based on studies of rRNA genes using polymerase chain reaction (PCR) with primers for highly conserved positions in those genes. More than 60,000 small subunit rRNA sequences from a wide diversity of prokaryotic taxa have been reported (17). However, PCR-based studies are inherently biased, because not all rRNA genes amplify with the same “universal” primers. Within our shotgun sequence data and assemblies, we identified 1164 distinct small subunit rRNA genes or fragments of genes in the Weatherbird II assemblies and another 248 within the Sorcerer II reads (5). Using a 97% sequence similarity cutoff to distinguish unique phylotypes, we identified 148 previously unknown phylotypes in our sample when compared against the RDP II database (17). With a 99% similarity cutoff, this number increases to 643. Though sequence similarity is not necessarily an accurate predictor of functional conservation and sequence divergence does not universally correlate with the biological notion of “species,” defining species (also known as phylotypes) by sequence similarity within the rRNA genes is the accepted standard in studies of uncultured microbes. All sampled rRNAs were then assigned to taxonomic groups using an automated rRNA classification program (5). Our samples are dominated by rRNA genes from Proteobacteria (primarily members of the α, β, and γ subgroups) with moderate contributions from Firmicutes (low-GC Gram-positive), Cyanobacteria, and species in the CFB phyla (Cytophaga, Flavobacterium, and Bacteroides) (fig. S4A; Fig. 6). The patterns we see are similar in broad outline to those observed by rRNA PCR studies from the Sargasso Sea (18), but with some quantitative differences that reflect either biases in PCR studies or differences in the species found in our sample versus those in other studies.
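The phylotype counts above depend on the similarity cutoff: a stricter cutoff splits the same sequences into more groups. The following is a minimal sketch of greedy, seed-based clustering at a percent-identity cutoff. It assumes the rRNA sequences are already aligned to equal length, and it illustrates threshold clustering in general. It is not the RDP II comparison or the classification program the authors used, and the function names are hypothetical.

```python
from typing import List


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Columns where both sequences have a gap ('-') are skipped; a gap
    opposite a base counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def greedy_phylotypes(seqs: List[str], cutoff: float = 97.0) -> List[List[int]]:
    """Put each sequence in the first cluster whose seed (first member)
    it matches at >= cutoff percent identity; otherwise start a new cluster."""
    clusters: List[List[int]] = []
    for i, s in enumerate(seqs):
        for members in clusters:
            if percent_identity(seqs[members[0]], s) >= cutoff:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


if __name__ == "__main__":
    # Toy 100-column alignments differing from the first by 1, 2 and 5 positions.
    seqs = ["A" * 100, "C" + "A" * 99, "CC" + "A" * 98, "CCCCC" + "A" * 95]
    print(len(greedy_phylotypes(seqs, 97.0)))  # 2 clusters
    print(len(greedy_phylotypes(seqs, 99.0)))  # 3 clusters
```

With these toy sequences, raising the cutoff from 97% to 99% turns two clusters into three, the same direction of change as the increase from 148 to 643 previously unknown phylotypes reported above. Greedy clustering is order-dependent; tools that use it usually sort sequences (for example by length) first.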

An additional disadvantage associated with relying on rRNA for estimates of species diversity and abundance is the varying number of copies of rRNA genes between taxa (more than an order of magnitude among prokaryotes) (19). Therefore, we constructed phylogenetic trees (fig. S4, B to E) using other phylogenetic markers represented in our data set [RecA/RadA, heat shock protein 70 (HSP70), elongation factor Tu (EF-Tu), and elongation factor G (EF-G)]. Each marker gene interval in our data set (with a minimum length of 75 amino acids) was assigned to a putative taxonomic group using the phylogenetic analysis described for rRNA. For example, our data set contains over 600 recA homologs from throughout the bacterial phylogeny, including representatives of Proteobacteria, low- and high-GC Gram positives, Cyanobacteria, green sulfur and green nonsulfur bacteria, and other groups. Assignment to phylogenetic groups shows a broad consensus among the different phylogenetic markers. For most taxa, the rRNA-based proportion is the highest or lowest in comparison to the other markers. We believe this is due to the large amount of variation in copy number of rRNA genes between species. For example, the rRNA-based estimate of the proportion of γ-Proteobacteria is the highest, while the estimate for cyanobacteria is the lowest, which is consistent with the reports that members of the γ-Proteobacteria frequently have more than five rRNA operon copies, whereas cyanobacteria frequently have fewer than three (19).
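The copy-number argument can be made concrete: if cells of one group carry more rRNA operons, that group's share of rRNA reads overstates its share of cells. The sketch below divides rRNA counts by an assumed per-genome copy number before renormalizing. The counts and copy numbers are invented for illustration, loosely echoing the "more than five" versus "fewer than three" operons cited above, and the function names are hypothetical.

```python
from typing import Dict


def raw_proportions(counts: Dict[str, float]) -> Dict[str, float]:
    """Share of each group in the total count."""
    total = sum(counts.values())
    return {group: c / total for group, c in counts.items()}


def copy_corrected_proportions(
    counts: Dict[str, float], copies: Dict[str, float]
) -> Dict[str, float]:
    """Convert rRNA gene counts to approximate genome (cell) counts by dividing
    by an assumed rRNA operon copy number per genome, then renormalize."""
    genomes = {group: counts[group] / copies[group] for group in counts}
    return raw_proportions(genomes)


if __name__ == "__main__":
    counts = {"gamma-Proteobacteria": 600, "Cyanobacteria": 200}  # hypothetical
    copies = {"gamma-Proteobacteria": 6, "Cyanobacteria": 2}      # hypothetical
    print(raw_proportions(counts))                     # 0.75 vs 0.25
    print(copy_corrected_proportions(counts, copies))  # 0.5 vs 0.5
```

In this toy case, rRNA counts alone make γ-Proteobacteria look three times as abundant as cyanobacteria, while the copy-corrected estimate puts them level. Protein-coding markers such as RecA or EF-Tu, which are usually single-copy, avoid this correction.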

Just as phylogenetic classification is strengthened by a more comprehensive marker set, so too is the estimation of species richness. In this analysis, we define “genomic” species as a clustering of assemblies or unassembled reads more than 94% identical on the nucleotide level. This cutoff, adjusted for the protein-coding marker genes, is roughly comparable to the 97% cutoff traditionally used for rRNA. Thus

Fig. 6. Phylogenetic diversity of Sargasso Sea sequences using multiple phylogenetic markers. The relative contribution of organisms from different major phylogenetic groups (phylotypes) was measured using multiple phylogenetic markers that have been used previously in phylogenetic studies of prokaryotes: 16S rRNA, RecA, EF-Tu, EF-G, HSP70, and RNA polymerase B (RpoB). The relative proportion of different phylotypes for each sequence (weighted by the depth of coverage of the contigs from which those sequences came) is shown. The phylotype distribution was determined as follows: (i) Sequences in the Sargasso data set corresponding to each of these genes were identified using HMM and BLAST searches. (ii) Phylogenetic analysis was performed for each phylogenetic marker identified in the Sargasso data separately compared with all members of that gene family in all complete genome sequences (only complete genomes were used to control for the differential sampling of these markers in GenBank). (iii) The phylogenetic affinity of each sequence was assigned based on the classification of the nearest neighbor in the phylogenetic tree.
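After steps (ii) and (iii), each marker sequence carries a nearest-neighbor group label, and the plotted proportions are weighted by the coverage depth of the contig it came from. Below is a minimal sketch of that final tallying step. It assumes the tree-based labels are already available as input; the tuple layout, function name and example values are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple


def weighted_phylotype_distribution(
    hits: Iterable[Tuple[str, str, float]]
) -> Dict[str, Dict[str, float]]:
    """Per-marker phylotype proportions, weighting each sequence by contig depth.

    Each hit is (marker, nearest-neighbor group, depth of coverage of its contig).
    """
    weights: DefaultDict[str, DefaultDict[str, float]] = defaultdict(
        lambda: defaultdict(float)
    )
    for marker, group, depth in hits:
        weights[marker][group] += depth
    result: Dict[str, Dict[str, float]] = {}
    for marker, by_group in weights.items():
        total = sum(by_group.values())
        result[marker] = {group: w / total for group, w in by_group.items()}
    return result


if __name__ == "__main__":
    hits = [
        ("RecA", "Proteobacteria", 3.0),
        ("RecA", "Cyanobacteria", 1.0),
        ("EF-Tu", "Proteobacteria", 2.0),
        ("EF-Tu", "Proteobacteria", 2.0),
    ]
    print(weighted_phylotype_distribution(hits))
```

Weighting by coverage lets one sequence from a deeply covered contig count for more than one from a singleton. This reflects the abundance of the source organism rather than just the number of distinct sequences.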

RESEARCH ARTICLE

2 APRIL 2004 VOL 304 SCIENCE www.sciencemag.org 70


Page 36: Microbiomes and DNA based studies of microbial diversity - talk by Jonathan Eisen at Singularity University

ARTICLES

A human gut microbial gene catalogue established by metagenomic sequencing

Junjie Qin1*, Ruiqiang Li1*, Jeroen Raes2,3, Manimozhiyan Arumugam2, Kristoffer Solvsten Burgdorf4, Chaysavanh Manichanh5, Trine Nielsen4, Nicolas Pons6, Florence Levenez6, Takuji Yamada2, Daniel R. Mende2, Junhua Li1,7, Junming Xu1, Shaochuan Li1, Dongfang Li1,8, Jianjun Cao1, Bo Wang1, Huiqing Liang1, Huisong Zheng1, Yinlong Xie1,7, Julien Tap6, Patricia Lepage6, Marcelo Bertalan9, Jean-Michel Batto6, Torben Hansen4, Denis Le Paslier10, Allan Linneberg11, H. Bjørn Nielsen9, Eric Pelletier10, Pierre Renault6, Thomas Sicheritz-Ponten9, Keith Turner12, Hongmei Zhu1, Chang Yu1, Shengting Li1, Min Jian1, Yan Zhou1, Yingrui Li1, Xiuqing Zhang1, Songgang Li1, Nan Qin1, Huanming Yang1, Jian Wang1, Søren Brunak9, Joel Dore6, Francisco Guarner5, Karsten Kristiansen13, Oluf Pedersen4,14, Julian Parkhill12, Jean Weissenbach10, MetaHIT Consortium†, Peer Bork2, S. Dusko Ehrlich6 & Jun Wang1,13

To understand the impact of gut microbes on human health and well-being it is crucial to assess their genetic potential. Here we describe the Illumina-based metagenomic sequencing, assembly and characterization of 3.3 million non-redundant microbial genes, derived from 576.7 gigabases of sequence, from faecal samples of 124 European individuals. The gene set, ~150 times larger than the human gene complement, contains an overwhelming majority of the prevalent (more frequent) microbial genes of the cohort and probably includes a large proportion of the prevalent human intestinal microbial genes. The genes are largely shared among individuals of the cohort. Over 99% of the genes are bacterial, indicating that the entire cohort harbours between 1,000 and 1,150 prevalent bacterial species and each individual at least 160 such species, which are also largely shared. We define and describe the minimal gut metagenome and the minimal gut bacterial genome in terms of functions present in all individuals and most bacteria, respectively.

It has been estimated that the microbes in our bodies collectively make up to 100 trillion cells, tenfold the number of human cells, and suggested that they encode 100-fold more unique genes than our own genome1. The majority of microbes reside in the gut, have a profound influence on human physiology and nutrition, and are crucial for human life2,3. Furthermore, the gut microbes contribute to energy harvest from food, and changes of gut microbiome may be associated with bowel diseases or obesity4–8.

To understand and exploit the impact of the gut microbes on human health and well-being it is necessary to decipher the content, diversity and functioning of the microbial gut community. 16S ribosomal RNA gene (rRNA) sequence-based methods9 revealed that two bacterial divisions, the Bacteroidetes and the Firmicutes, constitute over 90% of the known phylogenetic categories and dominate the distal gut microbiota10. Studies also showed substantial diversity of the gut microbiome between healthy individuals4,8,10,11. Although this difference is especially marked among infants12, later in life the gut microbiome converges to more similar phyla.

Metagenomic sequencing represents a powerful alternative to rRNA sequencing for analysing complex microbial communities13–15. Applied to the human gut, such studies have already generated some 3 gigabases (Gb) of microbial sequence from faecal samples of 33 individuals from the United States or Japan8,16,17. To get a broader overview of the human gut microbial genes we used the Illumina Genome Analyser (GA) technology to carry out deep sequencing of total DNA from faecal samples of 124 European adults. We generated 576.7 Gb of sequence, almost 200 times more than in all previous studies, assembled it into contigs and predicted 3.3 million unique open reading frames (ORFs). This gene catalogue contains virtually all of the prevalent gut microbial genes in our cohort, provides a broad view of the functions important for bacterial life in the gut and indicates that many bacterial species are shared by different individuals. Our results also show that short-read metagenomic sequencing can be used for global characterization of the genetic potential of ecologically complex environments.

Metagenomic sequencing of gut microbiomes

As part of the MetaHIT (Metagenomics of the Human Intestinal Tract) project, we collected faecal specimens from 124 healthy, overweight and obese individual human adults, as well as inflammatory bowel disease (IBD) patients, from Denmark and Spain (Supplementary Table 1). Total DNA was extracted from the faecal specimens18 and an average of 4.5 Gb (ranging between 2 and 7.3 Gb) of sequence was generated for each sample, allowing us to capture most of the

*These authors contributed equally to this work. †Lists of authors and affiliations appear at the end of the paper.

1BGI-Shenzhen, Shenzhen 518083, China. 2European Molecular Biology Laboratory, 69117 Heidelberg, Germany. 3VIB—Vrije Universiteit Brussel, 1050 Brussels, Belgium. 4Hagedorn Research Institute, DK 2820 Copenhagen, Denmark. 5Hospital Universitari Val d’Hebron, Ciberehd, 08035 Barcelona, Spain. 6Institut National de la Recherche Agronomique, 78350 Jouy en Josas, France. 7School of Software Engineering, South China University of Technology, Guangzhou 510641, China. 8Genome Research Institute, Shenzhen University Medical School, Shenzhen 518000, China. 9Center for Biological Sequence Analysis, Technical University of Denmark, DK-2800 Kongens Lyngby, Denmark. 10Commissariat à l’Energie Atomique, Genoscope, 91000 Evry, France. 11Research Center for Prevention and Health, DK-2600 Glostrup, Denmark. 12The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK. 13Department of Biology, University of Copenhagen, DK-2200 Copenhagen, Denmark. 14Institute of Biomedical Sciences, University of Copenhagen & Faculty of Health Science, University of Aarhus, 8000 Aarhus, Denmark.

Vol 464 | 4 March 2010 | doi:10.1038/nature08821

59 | ©2010 Macmillan Publishers Limited. All rights reserved


Page 37: Microbiomes and DNA based studies of microbial diversity - talk by Jonathan Eisen at Singularity University

Almost all (99.96%) of the phylogenetically assigned genes belonged to the Bacteria and Archaea, reflecting their predominance in the gut. Genes that were not mapped to orthologous groups were clustered into gene families (see Methods). To investigate the functional content of the prevalent gene set we computed the total number of orthologous groups and/or gene families present in any combination of n individuals (with n = 2–124; see Fig. 2c). This rarefaction analysis shows that the ‘known’ functions (annotated in eggNOG or KEGG) quickly saturate (a value of 5,569 groups was observed): when sampling any subset of 50 individuals, most have been detected. However, three-quarters of the prevalent gut functionalities consist of uncharacterized orthologous groups and/or completely novel gene families (Fig. 2c). When including these groups, the rarefaction curve only starts to plateau at the very end, at a much higher level (19,338 groups were detected), confirming that the extensive sampling of a large number of individuals was necessary to capture this considerable amount of novel/unknown functionality.
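The rarefaction analysis counts how many distinct orthologous groups (or gene families) appear in the union of the profiles of n individuals. The paper describes "any combination of n individuals". Enumerating every combination drawn from 124 people is infeasible, so this sketch averages over a fixed number of random subsets instead. That is an approximation, not necessarily the authors' procedure. The profiles, parameters and function name are hypothetical.

```python
import random
from typing import Dict, List, Set


def rarefaction(
    profiles: List[Set[str]], sizes: List[int], trials: int = 100, seed: int = 0
) -> Dict[int, float]:
    """Mean number of distinct groups in the union of n randomly chosen profiles.

    profiles: one set of orthologous-group / gene-family IDs per individual.
    sizes: the subset sizes n to evaluate (each must be <= len(profiles)).
    """
    rng = random.Random(seed)  # fixed seed so the curve is reproducible
    curve: Dict[int, float] = {}
    for n in sizes:
        totals = []
        for _ in range(trials):
            subset = rng.sample(profiles, n)
            totals.append(len(set().union(*subset)))
        curve[n] = sum(totals) / trials
    return curve


if __name__ == "__main__":
    profiles = [{"a", "b"}, {"b", "c"}, {"c", "d"}]
    print(rarefaction(profiles, [1, 2, 3]))
```

A curve that flattens early (as for the ~5,569 annotated groups) means new individuals add few new functions. A curve still rising at the largest n (as when novel families are included) means the sample has not yet captured the full functional repertoire.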

Bacterial functions important for life in the gut

The extensive non-redundant catalogue of the bacterial genes from the human intestinal tract provides an opportunity to identify bacterial functions important for life in this environment. There are functions necessary for a bacterium to thrive in a gut context (that is, the ‘minimal gut genome’) and those involved in the homeostasis of the whole ecosystem, encoded across many species (the ‘minimal gut metagenome’). The first set of functions is expected to be present in most or all gut bacterial species; the second set in most or all individuals’ gut samples.

To identify the functions encoded by the minimal gut genome we use the fact that they should be present in most or all gut bacterial species and therefore appear in the gene catalogue at a frequency above that of the functions present in only some of the gut bacterial species. The relative frequency of different functions can be deduced from the number of genes recruited to different eggNOG clusters, after normalization for gene length and copy number (Supplementary Fig. 10a, b). We ranked all the clusters by gene frequencies and determined the range that included the clusters specifying well-known essential bacterial functions, such as those determined experimentally for a well-studied firmicute, Bacillus subtilis27, hypothesizing that additional clusters in this range are equally important. As expected, the range that included most of B. subtilis essential clusters (86%) was at the very top of the ranking order (Fig. 5). Some 76% of the clusters with essential genes of Escherichia coli28 were within this range, confirming the validity of our approach. This suggests that 1,244 metagenomic clusters found within the range (Supplementary Table 10; termed ‘range clusters’ hereafter) specify functions important for life in the gut.
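The ranking step can be sketched as follows: normalize each cluster's gene count, sort, and take the shortest top segment of the ranking that covers a target share (here 86%) of known essential clusters. The paper's exact normalization is given in its Supplementary Fig. 10 and is not reproduced here. The formula in `normalized_score` (divide by copy number and by length relative to a 1 kb reference) is a hypothetical stand-in, and so are the cluster names and the "shortest segment" rule for choosing the range.

```python
import math
from typing import List, NamedTuple, Set


class Cluster(NamedTuple):
    name: str
    gene_count: int       # genes recruited to this cluster
    mean_length: float    # average gene length in bp
    copy_number: float    # average copies per genome


def normalized_score(c: Cluster, ref_length: float = 1000.0) -> float:
    """Hypothetical normalization: gene count per copy, per ref_length of gene."""
    return c.gene_count / (c.copy_number * (c.mean_length / ref_length))


def essential_range(
    clusters: List[Cluster], essential: Set[str], target: float = 0.86
) -> List[str]:
    """Names of the top-ranked clusters, down to the first rank at which
    at least `target` of the essential clusters have been seen."""
    if not essential:
        return []
    ranked = sorted(clusters, key=normalized_score, reverse=True)
    needed = math.ceil(target * len(essential))
    found = 0
    for i, c in enumerate(ranked):
        if c.name in essential:
            found += 1
        if found >= needed:
            return [x.name for x in ranked[: i + 1]]
    return [x.name for x in ranked]  # target not reachable: whole ranking


if __name__ == "__main__":
    cs = [
        Cluster("A", 100, 1000.0, 1.0),
        Cluster("B", 90, 1000.0, 1.0),
        Cluster("C", 80, 2000.0, 1.0),  # long genes: score halved by normalization
        Cluster("D", 50, 1000.0, 1.0),
        Cluster("E", 10, 1000.0, 1.0),
    ]
    print(essential_range(cs, {"A", "B", "D"}))  # ['A', 'B', 'D']
```

Any non-essential cluster that falls inside the returned range would be a candidate "range cluster" in the paper's sense. Validating the range against a second organism's essential set (as the authors did with E. coli) guards against tuning the range to one reference.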

We found two types of functions among the range clusters: those required in all bacteria (housekeeping) and those potentially specific for the gut. Among many examples of the first category are the functions that are part of main metabolic pathways (for example, central carbon metabolism, amino acid synthesis), and important protein complexes (RNA and DNA polymerase, ATP synthase, general secretory apparatus). Not surprisingly, projection of the range clusters on the KEGG metabolic pathways gives a highly integrated picture of the global gut cell metabolism (Fig. 6a).

The putative gut-specific functions include those involved in adhesion to the host proteins (collagen, fibrinogen, fibronectin) or in harvesting sugars of the globoseries glycolipids, which are carried on blood and epithelial cells. Furthermore, 15% of range clusters encode functions that are present in <10% of the eggNOG genomes (see Supplementary Fig. 11) and are largely (74.3%) not defined (Fig. 6b). Detailed studies of these should lead to a deeper comprehension of bacterial life in the gut.

To identify the functions encoded by the minimal gut metagenome, we computed the orthologous groups that are shared by individuals of our cohort. This minimal set, of 6,313 functions, is much larger than the one estimated in a previous study8. There are only 2,069 functionally annotated orthologous groups, showing that they gravely underestimate the true size of the common functional complement among individuals (Fig. 6c). The minimal gut metagenome includes a considerable fraction of functions (~45%) that are present in <10% of the sequenced bacterial genomes (Fig. 6c, inset). These otherwise rare functionalities that are found in each of the 124 individuals may be necessary for the gut ecosystem. Eighty per cent of these orthologous groups contain genes with at best poorly characterized function, underscoring our limited knowledge of gut functioning.
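The minimal gut metagenome is the set of functions present in all individuals (as the abstract puts it), which is a set intersection over per-individual profiles. The sketch also computes the share of those shared groups found in fewer than 10% of reference genomes, the quantity reported as ~45% above. The profiles and prevalence values are hypothetical. Treating a group absent from the prevalence table as rare is an assumption.

```python
from typing import Dict, List, Set


def minimal_metagenome(profiles: List[Set[str]]) -> Set[str]:
    """Orthologous groups present in every individual's profile."""
    return set.intersection(*profiles) if profiles else set()


def rare_fraction(
    groups: Set[str], genome_prevalence: Dict[str, float], threshold: float = 0.10
) -> float:
    """Fraction of `groups` found in fewer than `threshold` of reference genomes.

    genome_prevalence maps group ID -> fraction of sequenced genomes containing it;
    groups missing from the map are treated as prevalence 0 (an assumption).
    """
    if not groups:
        return 0.0
    rare = [g for g in groups if genome_prevalence.get(g, 0.0) < threshold]
    return len(rare) / len(groups)


if __name__ == "__main__":
    profiles = [{"OG1", "OG2", "OG3"}, {"OG1", "OG2"}, {"OG1", "OG2", "OG4"}]
    core = minimal_metagenome(profiles)
    print(sorted(core))                                          # ['OG1', 'OG2']
    print(rare_fraction(core, {"OG1": 0.95, "OG2": 0.05}))       # 0.5
```

A strict intersection is sensitive to sequencing depth, since a function missed in one shallowly sequenced sample drops out of the core. Relaxing "all" to "most" individuals is a common variant of this kind of analysis.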

Of the known fraction, about 5% codes for (pro)phage-related proteins, implying a universal presence and possible important ecological role of bacteriophages in gut homeostasis. The most striking secondary metabolism that seems crucial for the minimal metagenome relates, not unexpectedly, to biodegradation of complex sugars and glycans harvested from the host diet and/or intestinal lining. Examples include degradation and uptake pathways for pectin (and its monomer, rhamnose) and sorbitol, sugars which are omnipresent in fruits and vegetables, but which are not or poorly absorbed by humans. As some gut microorganisms were found to degrade both of them29,30, this capacity seems to be selected for by the gut ecosystem as a non-competitive source of energy. Besides these, capacity to ferment, for example, mannose, fructose, cellulose and sucrose is also part of the minimal metagenome. Together, these emphasize the

[Figure 5 plot: Cluster (%) on the y-axis (0–40) against Cluster rank on the x-axis (1–10,001), with the ‘Range’ interval marked.]

Figure 5 | Clusters that contain the B. subtilis essential genes. The clusters were ranked by the number of genes they contain, normalized by average length and copy number (see Supplementary Fig. 10), and the proportion of clusters with the essential B. subtilis genes was determined for successive groups of 100 clusters. Range indicates the part of the cluster distribution that contains 86% of the B. subtilis essential genes.
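The Figure 5 curve is a per-bin percentage over the ranked clusters. Assume the ranking is already done and each cluster is flagged by whether it contains a B. subtilis essential gene (the input list here is hypothetical). Then binning in successive groups of 100 reduces to the short function below; its name is illustrative.

```python
from typing import List


def essential_fraction_per_bin(
    ranked_is_essential: List[bool], bin_size: int = 100
) -> List[float]:
    """Percentage of clusters flagged essential in each successive bin of the
    ranking; the last bin may be shorter than bin_size."""
    fractions: List[float] = []
    for start in range(0, len(ranked_is_essential), bin_size):
        chunk = ranked_is_essential[start:start + bin_size]
        fractions.append(100.0 * sum(chunk) / len(chunk))
    return fractions


if __name__ == "__main__":
    # 250 ranked clusters: 40 essential in the first 100, 5 in the next 100.
    flags = [True] * 40 + [False] * 60 + [True] * 5 + [False] * 95 + [False] * 50
    print(essential_fraction_per_bin(flags))  # [40.0, 5.0, 0.0]
```

Plotting these percentages against bin start rank gives a curve like Fig. 5: high at the top of the ranking and near zero further down.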

[Figure 4 plot: individuals plotted on PC1 and PC2, grouped as Healthy, Crohn’s disease and Ulcerative colitis; P value: 0.031.]

Figure 4 | Bacterial species abundance differentiates IBD patients and healthy individuals. Principal component analysis with health status as instrumental variables, based on the abundance of 155 species with ≥1% genome coverage by the Illumina reads in at least 1 individual of the cohort, was carried out with 14 healthy individuals and 25 IBD patients (21 ulcerative colitis and 4 Crohn’s disease) from Spain (Supplementary Table 1). The first two components (PC1 and PC2) were plotted and together represented 7.3% of the whole inertia. Individuals (represented by points) were clustered and the centre of gravity computed for each class; the P-value of the link between health status and species abundance was assessed using a Monte-Carlo test (999 replicates).
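The caption combines an ordination with a Monte-Carlo test. The authors ran a PCA with health status as instrumental variables. The sketch below is a simplified stand-in, not that method. It uses an unconstrained PCA via SVD for plotting, plus a label-permutation test of the share of total inertia explained by class centroids, with 999 permutations as in the caption. The data are simulated and the function names hypothetical.

```python
from typing import Tuple

import numpy as np


def between_class_ratio(X: np.ndarray, labels: np.ndarray) -> float:
    """Share of total inertia (sum of squared deviations from the grand mean)
    that is explained by the class centroids."""
    Xc = X - X.mean(axis=0)
    total = float((Xc ** 2).sum())
    between = 0.0
    for lab in np.unique(labels):
        rows = Xc[labels == lab]
        between += rows.shape[0] * float((rows.mean(axis=0) ** 2).sum())
    return between / total


def monte_carlo_test(
    X: np.ndarray, labels: np.ndarray, n_perm: int = 999, seed: int = 0
) -> Tuple[float, float]:
    """Observed ratio and permutation p-value: (#perm >= observed + 1) / (n_perm + 1)."""
    rng = np.random.default_rng(seed)
    observed = between_class_ratio(X, labels)
    exceed = sum(
        between_class_ratio(X, rng.permutation(labels)) >= observed
        for _ in range(n_perm)
    )
    return observed, (exceed + 1) / (n_perm + 1)


def pca_scores(X: np.ndarray, k: int = 2) -> Tuple[np.ndarray, np.ndarray]:
    """Scores on the first k principal components and their share of inertia."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = (S ** 2) / (S ** 2).sum()
    return Xc @ Vt[:k].T, explained[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Simulated abundances: two well-separated groups of 10 samples, 5 "species".
    X = np.vstack([rng.normal(0.0, 1.0, (10, 5)), rng.normal(10.0, 1.0, (10, 5))])
    labels = np.array([0] * 10 + [1] * 10)
    ratio, p = monte_carlo_test(X, labels)
    scores, explained = pca_scores(X)
    print(round(ratio, 3), p, explained.round(3))
```

With 999 permutations the smallest attainable p-value is 0.001. A reported value such as 0.031 therefore means roughly 30 of the 999 shuffled labelings separated the groups at least as well as the real health-status labels.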

ARTICLES NATURE | Vol 464 | 4 March 2010 | 62


Page 38: Microbiomes and DNA based studies of microbial diversity - talk by Jonathan Eisen at Singularity University

Woese Tree of Life

adapted from Baldauf, et al., in Assembling the Tree of Life, 2004

??????
