computational metagenomics and the human microbiome

Computational metagenomics and the human microbiome. Curtis Huttenhower, 01-21-11, Harvard School of Public Health, Department of Biostatistics



Page 1: Computational metagenomics and the human microbiome

Computational metagenomics and the human microbiome

Curtis Huttenhower

01-21-11

Harvard School of Public Health

Department of Biostatistics

Page 2: Computational metagenomics and the human microbiome

2

What to do with your metagenome?

(x1010)

• Diagnostic or prognostic biomarker for host disease
• Public health tool monitoring population health and interactions
• Comprehensive snapshot of microbial ecology and evolution
• Reservoir of gene and protein functional information

Who’s there?

What are they doing?

What do functional genomic data tell us about microbiomes?

What can our microbiomes tell us about us?*

*Using terabases of sequence and thousands of experimental results

Page 3: Computational metagenomics and the human microbiome

3

The Human Microbiome Project

2007 - ongoing

• 300 “normal” adults, 18-40
• 16S rDNA + WGS
• 5 sites/18 samples + blood
  • Oral cavity: saliva, tongue, palate, buccal mucosa, gingiva, tonsils, throat, teeth
  • Skin: ears, inner elbows
  • Nasal cavity
  • Gut: stool
  • Vagina: introitus, mid, fornix
• Reference genomes (~200+800)

All healthy subjects; followup projects in psoriasis, Crohn’s, colitis, obesity, acne, cancer, antibiotic resistant infection…

Hamady, 2009

Kolenbrander, 2010

Page 4: Computational metagenomics and the human microbiome

4

HMP Organisms: Everyone and everywhere is different

[Figure: organisms (taxa) vs. body sites + individuals. Site labels: ear, gut, nose, mouth, vagina, arm, mucosa, palate, gingiva, tonsils, saliva, sub. plaq., sup. plaq., throat, tongue]

Every microbiome is surprisingly different

Most organisms are rare in most places

Even common organisms vary tremendously in abundance among individuals

Aerobicity, interaction with the immune system, and extracellular medium appear to be major determinants

There are few organismal biotypes in health

Page 5: Computational metagenomics and the human microbiome

5

HUMAnN: Community metabolic and functional reconstruction

WGS reads

Pathways/modules

Genes (KOs)

Pathways (KEGGs)

Functional seq.: KEGG + MetaCyc; CAZy, TCDB, VFDB, MEROPS…

BLAST → Genes

[Equation relating BLAST hits to genes; not recoverable from this transcript]

Genes → Pathways: MinPath (Ye 2009)

Smoothing: Witten-Bell

$$
c(g) =
\begin{cases}
\dfrac{T\,N}{(V - T)(N + T)} & c(g) = 0 \\[1ex]
\dfrac{c(g)\,N}{N + T} & \text{otherwise}
\end{cases}
$$

Gap filling

c(g) = max( c(g), median )
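The smoothing and gap-filling steps named on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not HUMAnN's actual code: `witten_bell_smooth` uses the textbook count form of Witten-Bell smoothing (N total observations, T observed gene types, V total gene types), and `gap_fill` assumes the median is taken over the genes of a single pathway. Both function names and the input layout (a dict of gene id to count) are hypothetical.

```python
from statistics import median


def witten_bell_smooth(counts):
    """Textbook Witten-Bell smoothing of raw gene counts.

    counts: dict mapping gene id -> non-negative raw count (assumed layout).
    Observed genes are scaled by N / (N + T); the freed mass is shared
    equally among the V - T unobserved genes, so the total stays N.
    """
    n_total = sum(counts.values())                     # N
    n_seen = sum(1 for c in counts.values() if c > 0)  # T
    n_unseen = len(counts) - n_seen                    # V - T
    if n_total == 0 or n_unseen == 0:
        return dict(counts)  # nothing to redistribute
    scale = n_total / (n_total + n_seen)
    unseen_value = n_seen * scale / n_unseen  # T*N / ((V-T)*(N+T))
    return {g: (c * scale if c > 0 else unseen_value)
            for g, c in counts.items()}


def gap_fill(counts, pathway_genes):
    """Apply c(g) = max(c(g), median) within one pathway.

    Assumption: the median is computed over this pathway's own genes.
    """
    med = median(counts.get(g, 0.0) for g in pathway_genes)
    return {g: max(counts.get(g, 0.0), med) for g in pathway_genes}


if __name__ == "__main__":
    raw = {"K1": 6, "K2": 0, "K3": 2, "K4": 0}
    print(witten_bell_smooth(raw))
    print(gap_fill({"K1": 10, "K2": 1, "K3": 8}, ["K1", "K2", "K3"]))
```

Smoothing first and gap filling second mirrors the order on the slide; whether the real pipeline applies them in exactly this way is not stated here.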

300 subjects, 1-3 visits/subject, ~6 body sites/visit

10-200M reads/sample, 100bp reads

BLAST

Taxonomic limitation: Rem. paths in taxa < ave.

Xipe: Distinguish zero/low (Rodriguez-Mueller in review)

HUMAnN: HMP Unified Metabolic Analysis Network

Page 6: Computational metagenomics and the human microbiome

6

HUMAnN: Community metabolic and functional reconstruction

[Figure panels: Pathway coverage; Pathway abundance]
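To make the distinction between the two outputs concrete, here is a minimal Python sketch. It assumes the simplest possible definitions: coverage as the fraction of a pathway's genes that are detected, and abundance as the mean abundance of those genes. HUMAnN's own definitions may well be more robust than this; the function names, the `min_abundance` threshold, and the dict input are all hypothetical.

```python
def pathway_coverage(gene_abundance, pathway_genes, min_abundance=0.0):
    """Fraction of the pathway's genes detected above a threshold (assumed definition)."""
    if not pathway_genes:
        raise ValueError("pathway has no genes")
    present = sum(1 for g in pathway_genes
                  if gene_abundance.get(g, 0.0) > min_abundance)
    return present / len(pathway_genes)


def pathway_abundance(gene_abundance, pathway_genes):
    """Mean abundance over the pathway's genes (assumed definition)."""
    if not pathway_genes:
        raise ValueError("pathway has no genes")
    total = sum(gene_abundance.get(g, 0.0) for g in pathway_genes)
    return total / len(pathway_genes)


if __name__ == "__main__":
    genes = {"K1": 4.0, "K2": 0.0, "K3": 2.0}
    pathway = ["K1", "K2", "K3"]
    print(pathway_coverage(genes, pathway))   # share of genes detected
    print(pathway_abundance(genes, pathway))  # average gene abundance
```

Coverage answers "is the pathway there at all?", while abundance answers "how much of it is there?", which is why the slide shows them side by side.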

Page 7: Computational metagenomics and the human microbiome

7

HUMAnN: Validating gene and pathway abundances on synthetic data

Validated on individual genes, module coverage + abundance

• False negatives: short genes (<100bp), taxonomically rare pathways
• False positives: large and multicopy (not many in bacteria)

Page 8: Computational metagenomics and the human microbiome

8

HUMAnN: The steps that didn’t make the cut

Abundance

Coverage

Page 9: Computational metagenomics and the human microbiome

9

Functional modules in 741 HMP samples

[Figure panels: Coverage; Abundance. Axes: samples, pathways. Site labels: AN, O(BM), PF, O(SP), S, RC, O(TD)]

• Zero microbes (of ~1,000) are core among body sites
• Zero microbes are core among individuals
• 19 (of ~220) pathways are present in every sample
• 53 pathways are present in 90%+ samples
• Only 31 (of 1,110) pathways are present/absent from exactly one body site
• 263 pathways are differentially abundant in exactly one body site
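Counts such as "present in every sample" or "present in 90%+ samples" come down to a prevalence calculation over a presence/absence table. The sketch below shows that calculation in Python on made-up data. The function names and the input layout (pathway id mapped to one boolean per sample) are assumptions for illustration, not the HMP analysis code.

```python
def prevalence(presence):
    """Fraction of samples in which each feature is present.

    presence: dict mapping feature id -> list of booleans, one per sample.
    """
    return {f: sum(flags) / len(flags) for f, flags in presence.items()}


def core_features(presence, min_fraction=1.0):
    """Features present in at least min_fraction of the samples."""
    return sorted(f for f, frac in prevalence(presence).items()
                  if frac >= min_fraction)


if __name__ == "__main__":
    # Toy table: three pathways across ten samples (invented values).
    table = {
        "M1": [True] * 10,
        "M2": [True] * 9 + [False],
        "M3": [True] * 5 + [False] * 5,
    }
    print(core_features(table))                    # present in every sample
    print(core_features(table, min_fraction=0.9))  # present in 90%+ of samples
```

The same two functions apply unchanged to a taxon table, which is how the "zero microbes are core" and "19 pathways are core" statements can be compared like for like.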

Page 10: Computational metagenomics and the human microbiome

10

Microbial environment trumps host environment (in health)

[Figure panels: HMP stool, colored by BMI; MetaHIT stool, colored by IBD. Axes: microbes, pathways. Annotations: aerobic body sites; gastrointestinal body sites; pathways in all body sites (“core”)]

• Human microbiome structure dictated primarily by microbial niche, not host (in health)
• Huge variation in who’s there; small variation in what they’re doing
• Note: definitely variation in how these functions are implemented
• Does not yet speak to environment (diet!), genetics, or disease

Page 11: Computational metagenomics and the human microbiome

11

Gene expression

SNP genotypes

Metagenomic biomarker discovery

Healthy/IBD, BMI, Diet

Taxa & pathways

Batch effects? Population structure?

Niches & Phylogeny

Test for correlates

Multiple hypothesis correction

Feature selection

p >> n

Confounds/stratification/environment

Cross-validate

Biological story?

Independent sample

Intervention/perturbation
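With far more features than samples (p >> n), every taxon or pathway tested for a correlate yields its own p-value, so the multiple hypothesis correction step matters. The slide does not name a method; the sketch below assumes the Benjamini-Hochberg false discovery rate procedure as one common choice, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # One raw p-value per tested feature (invented values).
    raw = [0.01, 0.04, 0.03, 0.5]
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"p={p:.3f}  q={q:.3f}")
```

Features whose adjusted value falls below the chosen false discovery rate would then move on to the feature selection and cross-validation steps listed above.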

Page 12: Computational metagenomics and the human microbiome

12

LEfSe: Metagenomic class comparison and explanation

LEfSe

http://huttenhower.sph.harvard.edu/lefse

Nicola Segata

LDA + Effect Size

Page 13: Computational metagenomics and the human microbiome

13

LEfSe: Evaluation on synthetic data

Page 14: Computational metagenomics and the human microbiome

14

Microbes characteristic of the oral and gut microbiota

Page 15: Computational metagenomics and the human microbiome

Aerobic, microaerobic and anaerobic communities

• High oxygen: skin, nasal
• Mid oxygen: vaginal, oral
• Low oxygen: gut

Page 16: Computational metagenomics and the human microbiome

16

LEfSe: The TRUC murine colitis microbiota

With Wendy Garrett

Page 17: Computational metagenomics and the human microbiome

17

MetaHIT: The gut microbiome and IBD

WGS reads

Pathways/modules

124 subjects: 99 healthy, 21 UC + 4 CD

ReBLASTed against KEGG since published data obfuscates read counts

Taxa: Phymm (Brady 2009)

Genes (KOs)

Pathways (KEGGs)

Qin 2010

With Ramnik Xavier, Joshua Korzenik

Page 18: Computational metagenomics and the human microbiome

18

MetaHIT: Taxonomic CD biomarkers

Firmicutes

Enterobacteriaceae

Up in CD

Down in CD

UC

Page 19: Computational metagenomics and the human microbiome

19

MetaHIT: Functional CD biomarkers

Motility Transporters Sugar metabolism

Down in CD

Up in CD

Subset of enriched modules in CD patients

Subset of enriched pathways in CD patients

Growth/replication

Page 20: Computational metagenomics and the human microbiome

20

Sleipnir: Software for scalable functional genomics

Massive datasets require efficient algorithms and implementations.

• Sleipnir C++ library for computational functional genomics
• Data types for biological entities
  • Microarray data, interaction data, genes and gene sets, functional catalogs, etc. etc.
• Network communication, parallelization
• Efficient machine learning algorithms
  • Generative (Bayesian) and discriminative (SVM)
• And it’s fully documented!

It’s also speedy: microbial data integration computation takes <3hrs.

http://huttenhower.sph.harvard.edu/sleipnir

http://huttenhower.sph.harvard.edu/lefse

http://huttenhower.sph.harvard.edu/humann

Page 21: Computational metagenomics and the human microbiome

21

Thanks!

Jacques Izard, Wendy Garrett

Pinaki Sarder, Nicola Segata

Levi Waldron, Larisa Miropolsky

Interested? We’re recruiting students and postdocs!

Human Microbiome Project

HMP Metabolic Reconstruction

George Weinstock, Jennifer Wortman, Owen White, Makedonka Mitreva, Erica Sodergren, Vivien Bonazzi, Jane Peterson, Lita Proctor

Sahar Abubucker, Yuzhen Ye

Beltran Rodriguez-Mueller, Jeremy Zucker, Qiandong Zeng

Mathangi Thiagarajan, Brandi Cantarel

Maria Rivera, Barbara Methe

Bill Klimke, Daniel Haft

Ramnik Xavier, Dirk Gevers

Bruce Birren, Mark Daly, Doyle Ward, Eric Alm, Ashlee Earl, Lisa Cosimi

Sarah Fortune

http://huttenhower.sph.harvard.edu/

Page 22: Computational metagenomics and the human microbiome
Page 23: Computational metagenomics and the human microbiome

23

The LEfSe algorithm

Statistical consistency

Biological consistency

Overall effect size
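The first of these stages can be illustrated with a rank-based test of one feature across classes. The sketch below assumes that stage is a Kruskal-Wallis test, as in the published description of LEfSe; it is a plain-Python illustration, not LEfSe's code. It omits the tie correction factor, and the p-value helper is only valid for exactly two classes. All function names are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic for a list of groups (no tie correction)."""
    pooled = [x for group in groups for x in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    offset = 0
    for group in groups:
        rank_sum = sum(ranks[offset:offset + len(group)])
        total += rank_sum ** 2 / len(group)
        offset += len(group)
    return 12.0 * total / (n * (n + 1)) - 3 * (n + 1)


def p_value_two_classes(h):
    """Chi-square upper tail with 1 degree of freedom (two classes only)."""
    return math.erfc(math.sqrt(max(h, 0.0) / 2.0))


if __name__ == "__main__":
    # Relative abundance of one feature in two classes (invented values).
    healthy = [1.0, 2.0, 3.0]
    disease = [4.0, 5.0, 6.0]
    h = kruskal_wallis_h([healthy, disease])
    print(h, p_value_two_classes(h))
```

Features that pass this filter would then face the biological consistency check and the effect size estimate, which this sketch does not cover.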

Page 24: Computational metagenomics and the human microbiome

24

HMP: Metabolism, host-microbiome interactions, and microbial taxa

>3200 gene families differential in the mucosa

>1500 upregulated outside the mucosa and not in any Actinobacterial genome

16S

WGS