Robustness, Reproducibility & Ecological Consistency in the Demarcation of Operational Taxonomic Units Sebastian Schmidt Institute for Molecular Life Sciences University of Zürich [email protected]

Post on 10-Jul-2015

Page 1: Robustness, Reproducibility & Ecological Consistency in the Demarcation of Operational Taxonomic Units

Robustness, Reproducibility & Ecological Consistency in the Demarcation of Operational Taxonomic Units

Sebastian Schmidt
Institute for Molecular Life Sciences
University of Zürich
[email protected]

Page 2: Robustness, Reproducibility & Ecological Consistency in the Demarcation of Operational Taxonomic Units

ISME15, Seoul, 2014/08/29 [email protected]

A general workflow in (targeted) metagenomics

Page 3: Robustness, Reproducibility & Ecological Consistency in the Demarcation of Operational Taxonomic Units

A general workflow in (targeted) metagenomics

Sampling & Sequencing
“Making OTUs”
Understanding your data (hopefully)

[Image: Jean Tinguely, “Heureka”, Lake Zürich]

Page 4: Robustness, Reproducibility & Ecological Consistency in the Demarcation of Operational Taxonomic Units

Concepts

replicability
robustness
reproducibility
ecological consistency

Page 5: Robustness, Reproducibility & Ecological Consistency in the Demarcation of Operational Taxonomic Units

Concepts

42

Life, the Universe and Everything?

Page 6: Robustness, Reproducibility & Ecological Consistency in the Demarcation of Operational Taxonomic Units

Concepts

42

Life, the Universe and Everything?

Life, Microbial Ecology and Everything?

Page 7: Robustness, Reproducibility & Ecological Consistency in the Demarcation of Operational Taxonomic Units

Concepts

42

Life, the Universe and Everything?

Page 8: Robustness, Reproducibility & Ecological Consistency in the Demarcation of Operational Taxonomic Units

The Human Skin Microbiome (HSM) dataset (Grice et al, Science, 2009):

~115,000 full-length 16S sequences
sampled from 21 distinct body sites
clustered to 97% sequence identity

Page 9: Robustness, Reproducibility & Ecological Consistency in the Demarcation of Operational Taxonomic Units

[Figure: OTU partitions of the HSM dataset compared across clustering methods]

OTU A (5,423 seq.): all methods agree (almost) perfectly
OTU B (8,465 seq.): lumping by SL
OTU C ([…] seq.): splitting by CL
OTU D (2,692 seq.): splitting by UCLUST
Small OTUs ([…] seq. per OTU): methods provide different numbers of “small” OTUs

Number of OTUs per method:
UPARSE: […] OTUs
UCLUST: 3,282 OTUs
CD-HIT: […] OTUs
Single linkage: […] OTUs
Complete linkage: […] OTUs
Average linkage: […] OTUs

Schmidt et al, Environ Microbiol, in press
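The lumping by SL and splitting by CL noted on this slide follow from how each linkage rule decides whether two clusters may merge. The sketch below is a minimal illustration, not the pipeline used in the talk: `cluster`, `linkage_distance` and the toy distances (in percent divergence, so a threshold of 3 stands in for 97% identity) are hypothetical, and the heuristic tools (UCLUST, CD-HIT, UPARSE) use different, greedy strategies.

```python
from itertools import combinations


def linkage_distance(a, b, dist, mode):
    """Distance between clusters a and b (lists of indices) under a linkage rule."""
    pair_d = [dist[i][j] for i in a for j in b]
    if mode == "single":      # closest pair decides
        return min(pair_d)
    if mode == "complete":    # most distant pair decides
        return max(pair_d)
    if mode == "average":     # mean over all pairs decides
        return sum(pair_d) / len(pair_d)
    raise ValueError(f"unknown linkage mode: {mode}")


def cluster(dist, threshold, mode):
    """Naive agglomerative clustering: merge the closest pair of clusters
    until no pair lies within the threshold."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        (x, y), d = min(
            (((i, j), linkage_distance(clusters[i], clusters[j], dist, mode))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return sorted(sorted(c) for c in clusters)


# Toy example (assumed data): four sequences forming a chain,
# each 2% divergent from its neighbour.
positions = [0, 2, 4, 6]
dist = [[abs(p - q) for q in positions] for p in positions]

print(cluster(dist, 3, "single"))    # [[0, 1, 2, 3]]  chain is lumped
print(cluster(dist, 3, "complete"))  # [[0, 1], [2, 3]]  chain is split
```

With the same input and the same nominal threshold, single linkage follows the chain and returns one OTU, while complete linkage refuses any merge that would put sequences more than 3% apart into the same OTU.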

Page 10: Robustness, Reproducibility & Ecological Consistency in the Demarcation of Operational Taxonomic Units

[Figure: pairwise comparison of clustering methods (AL, CL, SL, CD-HIT, UCLUST, UPARSE) on the HSM dataset. For each pair of methods and each diversity index (Chao1, inverse Simpson, Shannon, Sørensen, JABD, Morisita-Horn), the panels report the Pearson correlation and the relative shift (log2). Colour marks the significance of the mean shift: red, shift towards higher values; blue, shift towards lower values.]

Schmidt et al, Environ Microbiol, in press
(data from Grice et al, Science, 2009)
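To make the quantities named on this slide concrete, here is a hedged sketch of three of the alpha diversity indices and of the two comparison statistics. The function names and the toy numbers are assumptions for illustration; in particular, `mean_log2_shift` is only one plausible reading of "relative shift (log2)", and Chao1 is given in its bias-corrected form.

```python
import math


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over OTU relative abundances."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def inverse_simpson(counts):
    """Inverse Simpson index 1 / sum(p_i^2)."""
    n = sum(counts)
    return 1.0 / sum((c / n) ** 2 for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1*(F1-1) / (2*(F2+1)),
    with F1 singletons and F2 doubletons."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def pearson(x, y):
    """Pearson correlation between two equally long lists of index values."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_log2_shift(x, y):
    """Mean per-sample log2 ratio y/x (assumed definition of the relative shift)."""
    return sum(math.log2(b / a) for a, b in zip(x, y)) / len(x)


# Hypothetical per-sample richness values under two clustering methods.
method_a = [10.0, 20.0, 40.0]
method_b = [20.0, 40.0, 80.0]

print(chao1([3, 1, 1, 2]))                  # 4.5
print(pearson(method_a, method_b))          # perfectly correlated
print(mean_log2_shift(method_a, method_b))  # 1.0, i.e. method B gives twice the value
```

The toy comparison shows why both statistics are reported: two methods can rank samples identically (correlation near 1) while one of them systematically yields higher index values.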

Page 11: Robustness, Reproducibility & Ecological Consistency in the Demarcation of Operational Taxonomic Units

[Figure: heatmap of Adjusted Mutual Information (colour scale 0.5 to 1.0) between partitions from average linkage, complete linkage, single linkage, UCLUST, CD-HIT and UPARSE, each at clustering thresholds from 90 to 100 on both axes.]

A ‘global’ 16S dataset
~1.1M full-length sequences
≥30k samples, diverse environments

Adjusted Mutual Information (AMI), a measure of partition similarity

high replicability
…when clustering twice to the exact same threshold

differential robustness
…to slight threshold changes

Schmidt et al, Environ Microbiol, in press
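As a rough illustration of how AMI compares two partitions of the same sequences, the sketch below computes mutual information, subtracts its expectation under random assignment with fixed cluster sizes, and normalises. It is a minimal sketch, not the code behind the figure: the function name is hypothetical, and normalising by the arithmetic mean of the two entropies is an assumption (other normalisations, such as the maximum, are also in use).

```python
import math
from collections import Counter


def _entropy(sizes, n):
    """Entropy of a partition given its cluster sizes."""
    return -sum((s / n) * math.log(s / n) for s in sizes)


def _log_fact(k):
    """ln(k!) via the log-gamma function."""
    return math.lgamma(k + 1)


def adjusted_mutual_information(labels_u, labels_v):
    """AMI = (MI - E[MI]) / (mean(H(U), H(V)) - E[MI])."""
    n = len(labels_u)
    if n != len(labels_v):
        raise ValueError("partitions must label the same items")
    a = Counter(labels_u)                     # cluster sizes in U
    b = Counter(labels_v)                     # cluster sizes in V
    joint = Counter(zip(labels_u, labels_v))  # contingency table

    mi = sum((nij / n) * math.log(n * nij / (a[u] * b[v]))
             for (u, v), nij in joint.items())

    # Expected MI under the hypergeometric model of random overlap.
    emi = 0.0
    for ai in a.values():
        for bj in b.values():
            for nij in range(max(1, ai + bj - n), min(ai, bj) + 1):
                log_p = (_log_fact(ai) + _log_fact(bj)
                         + _log_fact(n - ai) + _log_fact(n - bj)
                         - _log_fact(n) - _log_fact(nij)
                         - _log_fact(ai - nij) - _log_fact(bj - nij)
                         - _log_fact(n - ai - bj + nij))
                emi += (nij / n) * math.log(n * nij / (ai * bj)) * math.exp(log_p)

    norm = 0.5 * (_entropy(a.values(), n) + _entropy(b.values(), n))
    if abs(norm - emi) < 1e-12:  # both partitions trivial
        return 1.0
    return (mi - emi) / (norm - emi)


# Identical partitions (up to relabelling) score 1; partitions that
# overlap less than expected by chance score below 0.
print(adjusted_mutual_information([0, 0, 1, 1], [1, 1, 0, 0]))
print(adjusted_mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))
```

Because the chance-expected overlap is subtracted, AMI stays comparable between partitions with very different numbers of clusters, which matters when comparing methods that produce different OTU counts.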

Page 12: Robustness, Reproducibility & Ecological Consistency in the Demarcation of Operational Taxonomic Units

[Figure: AMI heatmap between clustering methods (average linkage, complete linkage, single linkage, UCLUST, CD-HIT, UPARSE) across thresholds from 90 to 100.]

differential reproducibility
pairwise similarity maxima between methods off-diagonal
comparability of results across studies?

Schmidt et al, Environ Microbiol, in press

Page 13: Robustness, Reproducibility & Ecological Consistency in the Demarcation of Operational Taxonomic Units

[Figure: AMI heatmap between clustering methods (average linkage, complete linkage, single linkage, UCLUST, CD-HIT, UPARSE) across thresholds from 90 to 100.]

“Greengenes 97” vs. “SILVA 99”: AMI ~ 0.65

Schmidt et al, Environ Microbiol, in press

Page 14: Robustness, Reproducibility & Ecological Consistency in the Demarcation of Operational Taxonomic Units

[Figure: AMI heatmap between clustering methods (average linkage, complete linkage, single linkage, UCLUST, CD-HIT, UPARSE) across thresholds from 90 to 100.]

But which method makes the ‘best’ OTUs?

Schmidt et al, Environ Microbiol, in press

Page 15: Robustness, Reproducibility & Ecological Consistency in the Demarcation of Operational Taxonomic Units

‘Good’ OTUs should correspond to ‘true’ bacterial lineages (‘species’)
they should comply with evolutionary theory of bacterial speciation
BUT: no unifying / commonly accepted bacterial species concept

Two main criteria for theory-compliant OTUs
phylogenetic consistency (represent monophyletic lineages)
ecological consistency (represent ecologically homogeneous groups of organisms)

Gevers et al., Nat Rev Microbiol, 2005
Cohan, Philos T R Soc B, 2006
Koeppel et al., PNAS, 2008
Hunt et al., Science, 2008
Fraser et al., Science, 2009
Vos, Trends Microbiol, 2011
Koeppel & Wu, NAR, 2013
Preheim et al., Appl Env Microbiol, 2013
[and many more…]

Page 16: Robustness, Reproducibility & Ecological Consistency in the Demarcation of Operational Taxonomic Units

[Figure: word cloud of ecological terms, including soil, sediment, water, marine, gut, rhizosphere, sludge, wastewater, hydrothermal, freshwater, biofilm, skin and many others.]

Page 17: Robustness, Reproducibility & Ecological Consistency in the Demarcation of Operational Taxonomic Units

[Figure, panel A: Ecological Consistency Score (ECS) plotted against the number of OTUs (100,000 down to 1,000) for complete linkage, UCLUST, CD-HIT, single linkage and average linkage; the 97% nominal similarity level is marked.]

Schmidt et al, PLOS Comp Biol, 2014

Page 18: Robustness, Reproducibility & Ecological Consistency in the Demarcation of Operational Taxonomic Units

[Figure: Ecological Consistency Score (ECS) plotted against the number of OTUs for complete linkage, UCLUST, CD-HIT, single linkage and average linkage, in six panels:
A: as on the previous slide, with the 97% nominal similarity level marked
B: Archaea, ecological terms
C: Eukarya, ecological terms
D: Bacteria, ENVO terms
E: Bacteria, sampling sites
F: Bacteria, host taxonomy]

Schmidt et al, PLOS Comp Biol, 2014

Page 19: Robustness, Reproducibility & Ecological Consistency in the Demarcation of Operational Taxonomic Units

Conclusions

replicability
clustering was generally replicable

robustness
AL, CL & CD-HIT were highly robust to (slightly) changing thresholds; UCLUST, UPARSE & SL were more sensitive
similar trends for robustness to clustering context and choice of subregion (not shown)

reproducibility
surprisingly discordant partitions by different methods
similarity maxima generally off-diagonal
AL and CD-HIT most similar pair
implications for reference-based OTU-binning: choice of reference clustering determines quality

ecological consistency
CL provided most consistent OTU sets
implications for taxonomy and species definitions?