metagenomics. what is metagenomics cloning genes from the environment, screening for function 16s...

53
Metagenomics

Upload: oswin-neal

Post on 21-Jan-2016

236 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

Metagenomics

Page 2: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

What is metagenomics

Cloning genes from the environment, screening for function

16S sequencing

Random community genomics

Eukaryotic metagenomics

Page 3: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

Screening from the environment

Random fragments of DNA Clone into a vector

Low copy vectors BACs YACs

Page 4: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

BACs

Science Creative Quarterly

Page 5: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

Screening from the environment

Random fragments of DNA Clone into a vector

Low copy vectors BACs YACs

Screen for a phenotype e.g. Diversa patents > 1,000 amylase

genesWhy did Diversa sequence whale-falls?

Page 6: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

Screening from the environment

Expression host? Pathway or single gene? Get what you select

But remember …

A selection is worth a thousand screens

Page 7: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

16S sequencing

Catalogs the bacteria that are present

PCR amplify the 16S gene with standard primers

Sequence the primers

Compare to known databases

Page 8: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

Prokaryotic ribosome:

Large subunit:50S

5S and 23S rRNA

Small subunit:

30S

16S rRNA

Ribosomes

Ribosomes are made of proteins and RNA

Page 9: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

Blue: proteinOrange: rRNA

30S Thermus aquaticus subunit

Page 10: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

E. coli16S rRNA secondary structure

Highly conserved Base pairs = stems No pairing = loops

Page 11: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

Variable regions inthe 16S rRNA. Vn – 9 regionsforward/rev primers

V1

V2

V3

V4

V5V6

V7

V8

V9

E. coli16S rRNA secondary structure

Page 12: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

16S Primers

27F – 1492R full length

967F – 1046R V6 region

1380F – 1510R V9 region

1,465 base pairs

130 base pairs

79 base pairs

Page 13: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

Variable regions = Variable results!

V1-V3V1-V3

V3-V5V3-V5V6-V9V6-V9

Page 14: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

16S databases Greengenes

http://greengenes.lbl.gov/ Gary Andersen, Lawrence Berkeley National Laboratory

SILVA – ARB http://www.arb-silva.de/ Frank Oliver Glöckner, MPI, Bremen, Germany

VAMPS http://vamps.mbl.edu/ Mitch Sogin, Woods Hole, USA

Ribosomal Database Project (RDP) http://rdp.cme.msu.edu/ James Cole, Michigan State University, USA

Page 15: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

16S sequencing

Cheap Easy Portable

PCR bias Variable regions give

variable answers Only tells you which

organisms are present & abundance

Does not explain much of the variance of the data

What does 16S sequencing actually tell you?

Page 16: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

What does 16S sequencing tell you?

Page 17: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

What does 16S sequencing tell you?

Page 18: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

What is metagenomics

Cloning genes from the environment, screening for function

16S sequencing

Random community genomics

Eukaryotic metagenomics

Page 19: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

16S sequencing is not good for functions

Page 20: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

How much of the data?

Findley et al, Nature 2013doi: 10.1038/nature12171

Topography of [fungi and] bacteria on the skin

Study = 5,000 taxa14 skin sites10 people3 skin types

5,000 variables

They don't explain the meaning of j-q

The remainder of the variance (85.1%) is explained by a few taxa

each

Each dimension only adds marginal information

Page 21: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

How much of the data?

Nine biomes paper

Dinsdale et al., Nature 2008doi:10.1038/nature06810

Variance:1,040,665 reads total(from 45 samples)30 subsystems9 biomes

30 variables

Fewer of the variables explainmore of the data

The variables are distinctive for each environment

Page 22: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

Shotgun sequencing (HiSeq)

Movies courtesy Will Trimble, Argonne National Labshttp://www.mcs.anl.gov/~trimble/flowcell/

Page 23: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

16S sequencing (MiSeq)

Movies courtesy Will Trimble, Argonne National Labshttp://www.mcs.anl.gov/~trimble/flowcell/

Page 24: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

Shotgun + 16S (HiSeq)

Movies courtesy Will Trimble, Argonne National Labshttp://www.mcs.anl.gov/~trimble/flowcell/

Page 25: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

There is no 16S for viruses

Rohwer and Edwards, 2002. The phage proteomic tree.

doi: 10.1128/JB.184.16.4529-4535.2002

Page 26: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

200 liters water 5-500 g fresh fecal matter

DNA/RNA LASL

Sequence

Epifluorescent Microscopy

Extract nucleic acids

Concentrate and purify viruses or bacteria

Random community genomics

Page 27: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

Extract DNA Soil extraction kit Water extraction kit

Create library LASLs fosmids

Sequence fragments

How do you sequence the environment?

Page 28: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

Hydroshear

Blunt-ending

Addition of Linkers

Amplification of Fragments

HydroshearBlunt-ending

Addition of LinkersAmplification of Fragments

This method produces high coverage libraries of over 1 million clones from as little as 1 ng DNA

Soil Extraction Kit

David Mead -

Breitbart (2002) PNAS

Linker-Amplified Shotgun Libraries (LASLs)

Page 29: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

• http://phage.sdsu.edu/~rob/cgi-bin/remoteblast.cgi

• Submit BLAST to local and remote databases– Local (as fast as possible)– NCBI (one search every 3 seconds)

• Many concurrent searches– One search versus 1,000 searches

• Parse data into tables for Excel– Access to taxonomy etc

Early Attempts at a Metagenomics Platform

Page 30: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

More bacteria than somatic cells by at least an order of magnitude

More phages than bacteria by an order of magnitude

Sample the bacteria in the intestine by sampling their phage

Human-associated viruses

Page 31: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

Known40%

Unknown60%

Breitbart (2003) J. Bacteriol.

Phages94%

Eukaryotic Viruses 6%

Most Viral DNA Sequences in Adult Human Feces are Unknown Phages

Page 32: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

Abundance of viruses in twins

Reyes et al,Nature 2010

Page 33: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

Microbial samples in gutsdon't change very much

Reyes et al,Nature 2010

Abundance of viruses in twins

Page 34: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

Phage samplesin guts changea lot

Reyes et al,Nature 2010

Abundance of viruses in twins

Page 35: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

Microbial Phage

Reyes et al,Nature 2010

Abundance of viruses in twins

Page 36: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

Known92%

Unknown8%

Pepper Mild

Mottle Virus65%

Other Plant

Viruses9%

Other26%

Zhang (2006) PLoS Biology

Most Human RNA Viruses are Known

Page 37: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

ssRNA virus; ≈6 kb genome Related to Tobacco Mosaic Virus Infects members of Capsicum family Widely distributed – spread through seeds Fruits are small, malformed, mottled Rod-shaped virions

TOBACCO MOSAIC VIRUS http://www.rothamsted.bbsrc.ac.uk/ppi/links/pplinks/virusems/

Viral particles in fecal sample

Pepper Mild Mottle Virus (PMMV)

Page 38: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

S1

S2

S3

S4

S5

S6

S7

S8

S9

PMMV

Fecal samples

Extract total RNA

RT-PCR for PMMV

San Diego : 78% people are positiveSingapore : 67% people are positive

10-50 fold increase in feces compared to food106-109 PMMV copies per gram dry weight of feces

PMMV is common in Human Feces

Page 39: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

India

n

curr

yPork

noodle

red

chili

Chic

ken

rice Chin

ese

fo

od

Hong K

ong c

hili

sauce

Hong K

ong

gre

en c

hili

Vegeta

rian

chili

Chili powder

Chili sauces

NOT FOUND IN FRESH PEPPERS

Which Foods Contain PMMV?

Page 40: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

Rosario et al. AEM (2009)

PMMV is Present at High Concentrations in Raw Sewage and Treated Wastewater

Page 41: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

0.1

Lib3 Contig[0064]

Lib2 Contig[0070]

AB084456.1

AB062049.1

AB062051.1

AF103778

AY632863.1

AB119482.1

AJ429088.1

AB062054.1

CoatProtein

AB069853.1

AB062052.1

AB000709.2

M87827.1

AJ429087.1

AF525080.1Lib2_2217

Lib3_Contig[0494]Lib3_Contig[1213]

Lib2_Contig[0458]

Lib2_Contig[1099]

Lib3_65

Lib3_Contig[0273]Lib3_Contig[0078]

Lib3_Contig[0863]

AJ308228.1

AB062053.1

AJ429089.1

X72587.1

Lib2_1377

Lib2_2914

Lib1_2299

Lib3_928

Lib2_1656

Lib2_2549

Lib3_462

Lib2_492Lib3_Contig[0655]

Lib2_133

Lib1_Contig[0253]

Lib1_Contig[0123]

Lib1_Contig[0279]

Lib1_Contig[0107]Lib1_Contig[0052]Lib1_Contig[0004] Lib2_Contig[0995]

Lib1_Contig[0009]

Lib1_Contig[0166]

Lib1_Contig[0657]

Lib1_1449

Lib1_2211

Lib1_Contig[0029]

Lib1_1733

Lib1_Contig[0076]

Lib1_1168

Lib1_Contig[0261]

Lib1_2361

Lib2 1468

Lib2 Contig[0031]

Lib2 Contig[1202]

Lib1_Contig[0005]

Lib1_Contig[0558]

AF103776.1

AB062050.1

I

II

III

IV

V

0.1

Lib3 Contig[0064]

Lib2 Contig[0070]

AB084456.1

AB062049.1

AB062051.1

AF103778

AY632863.1

AB119482.1

AJ429088.1

AB062054.1

CoatProtein

AB069853.1

AB062052.1

AB000709.2

M87827.1

AJ429087.1

AF525080.1Lib2_2217

Lib3_Contig[0494]Lib3_Contig[1213]

Lib2_Contig[0458]

Lib2_Contig[1099]

Lib3_65

Lib3_Contig[0273]Lib3_Contig[0078]

Lib3_Contig[0863]

AJ308228.1

AB062053.1

AJ429089.1

X72587.1

Lib2_1377

Lib2_2914

Lib1_2299

Lib3_928

Lib2_1656

Lib2_2549

Lib3_462

Lib2_492Lib3_Contig[0655]

Lib2_133

Lib1_Contig[0253]

Lib1_Contig[0123]

Lib1_Contig[0279]

Lib1_Contig[0107]Lib1_Contig[0052]Lib1_Contig[0004] Lib2_Contig[0995]

Lib1_Contig[0009]

Lib1_Contig[0166]

Lib1_Contig[0657]

Lib1_1449

Lib1_2211

Lib1_Contig[0029]

Lib1_1733

Lib1_Contig[0076]

Lib1_1168

Lib1_Contig[0261]

Lib1_2361

Lib2 1468

Lib2 Contig[0031]

Lib2 Contig[1202]

Lib1_Contig[0005]

Lib1_Contig[0558]

AF103776.1

AB062050.1

I

II

III

IV

V

Library 1

Library 2

Library 3

Same person 6 months

apart

• Diverse populations

• Differences between individuals and over time

Different PMMV families

Page 42: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

Infected leaf Control

Fecal sample

Total RNA

PMMV RT-PCR

Viral concentrate

Plant leaf inoculation

• Spread of infection to Hungarian wax pepper evident within 1 week

• Infected leaf was positive by RT-PCR for PMMV

• Animals may serve as vectors for plant viruses

Human-fecal borne PMMV can infect plants

Page 43: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

T

he

su

nm

ac

hin

e.n

et

http://www.sweatnspice.com

Koch’s Postulates

Page 44: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

Random community genomics

Page 45: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

Eukaryotic metagenomics ITS sequences

Internal transcribed spacer regions http://www.ncbi.nlm.nih.gov/pmc/articles/

PMC4113289/

Individual genes Cox1

Exome sequencing Pull out ESTs and sequence

Page 46: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

What is there?

How many are there?

What are they doing?

Experimental manipulations

Diagnostics

Why Metagenomics?

Page 47: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

Sequencing costs decreasing

http://genome.gov/sequencingcosts

Page 48: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

Firstbacterial genome

100bacterial genomes

1,000bacterial genomes

Num

ber

of

know

n s

equence

s

Year

Environmentalsequencing

How much has been sequenced?

Page 49: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

Everybody inSan Diego

Everybody inUSA

AllculturedBacteria

100people

One genome fromevery species

Most majormicrobial environments

Year

How much will be sequenced?

Page 50: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

Most pipelines work the same way!

Page 51: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

Metagenomics ProcessingBinning reads

Contamination removal

Contig Clustering

Functional Assignments

Gene

Pred

ictio

n

Mer

ge p

aire

d-en

d re

ads

Preprocessing

Taxonomic assignments

Page 52: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

Metagenomics Quality control –

Prinseq Deconseq Annotation

FOCUS Real time

metagenomics mg-rast Super FOCUS

Statistics STAMP

Population genomes crAss metabat ContigClustering

Page 53: Metagenomics. What is metagenomics Cloning genes from the environment, screening for function 16S sequencing Random community genomics Eukaryotic metagenomics

Metagenomics Processing

AbundanceBinCompostBinconcoctcrAsstetra

Contig clustering FragGeneScan

GlimmerMGMetaGeneAnnotatorMetaGeneMarkMetaGunOrpheliaProdigal

Gene PredictionFASTQC

FastX ToolkitfitGCPNGS QC ToolkitNon-pareilPrinseqQC-ChainStreaming Trim

Preprocessing

CARMA myTaxaFOCUS PhylopythiaSKRAKEN phymmblLMAT RAIphyMEGAN TACOAMetaplan Taxy

Taxonomic assignment CLAMS Sequedex

DiScRIBinATE SORT-ITEMSgenometa SPANNERGSMer SPHINXPPLACER TaxSOMRTMg Treephyler

Functional assignment