Workshop on Whole Genome Sequencing and Analysis, 19-21 Sep. 2016 Metagenomics


Metagenomics


Metagenomics is the study of genetic material recovered directly from environmental samples.

Ex:

- Soil (rain forest)
- Fecal samples (human gut microbiome)
- Water (deep ocean)

Only a small fraction of bacteria can be easily cultured in the lab (<5%). Metagenomics gives a picture of all species from an environment.


Hasman et al. 2014. J. Clin. Microbiol. 52(1): 139-146.


Traditional analysis

- Species identification

- MIC testing

Bioinformatics analysis

- KmerFinder: Species identification

- MG-RAST: Used to estimate the level of host contamination and the distribution of bacterial species

- Chainmapper (a predecessor of MGmapper): Uses mapping (BWA and Bowtie) to identify species

- MLST/ResFinder

- SNPTree (a predecessor of CSIPhylogeny): Used for constructing phylogenetic trees


Results, species identification


For samples that were unculturable, species could not be determined using conventional or WGS-based identification


Sequencing directly on clinical samples often resulted in identification of multiple species


Results, antimicrobial resistance

Direct sequencing did not miss any resistance genes, rather it led to an overestimation of the occurrence of resistance


Conclusions

Direct sequencing of a (simple) clinical sample can be used for

• Decreasing the overall time spent on analysis
• Detection of pathogens
• Bacterial typing
• Detection of antimicrobial resistance genes

This will be further improved when

• Better methods for extracting DNA from samples with little DNA exist
• Cut-off criteria for detecting pathogens/genes have been determined


MGmapper: A tool to map MetaGenomics data

▪ A tool to map FASTQ files from metagenomic samples against one or more reference sequence databases

▪ Make output that is “fairly” easy to understand

▪ Give an overview of abundance, depth and coverage in relation to refseq

Reference based mapping of fastq reads

name_R1.fastq
@HWUSI-EAS664L:24:64FGCAAXX:4:1:2853:1232 1:N:0:CTTGTA
CCTCGGACGATTGCCGNATAATTTCTGGGTACCACGATGCTTGTTTTCACCACAAGAATGAATGTTTTCGGCACATTTCTCCCCAGAGTGTTATAATTGCG
+
HHHHHGHHHHHHHHHD#BDDABCCAGHHHHHFEHHHHHHHHHHHHHHHHHHHHHGHBFGGGGGGGGHHEGDHCE<EEEBEDDDC7A-@7@?B=A?BEEBAE
@HWUSI-EAS664L:24:64FGCAAXX:4:1:5315:1234 1:N:0:CTTGTA
CAGTGCCATCGTAATANTGAGTGCTGGCTCGAAGATGGAGAGCGTTAAGGCGATCCGATTTTGTTGGAGTGTCTCCTGGTTATCTGCGGCTCTGACCATTA
+
IIIIIIIIIIIIIIIF#FFEFAFFEIIIIIIEIIFCGG?EEGDGEIHHIIGHEEGGIEGIHGGACCEAEBFB@EEBBDE@B??>AB@AAA>>:@:==8=@@

name_R2.fastq
@HWUSI-EAS664L:24:64FGCAAXX:4:1:2853:1232 2:N:0:CTTGTA
TCACTACCGTAATTTGAACCGGCAAGATAATGCCGAAGTTCTGTAAATAAGTAAAGATTTGCGCGCTAAATCGCAACAAACAGGTTCGGCACATTACTCCG
+
IIIIDIIIIIHHIIIIHIIIIFHFHIHIGIGHIII>DGBGGGGDFBCGDDFEDFFFBFFHDICHFBDDEBEFBHEGGGEEAGG<?@BBBB8BBB/?6?;86
@HWUSI-EAS664L:24:64FGCAAXX:4:1:5315:1234 2:N:0:CTTGTA
CACTTTAAGTATTTTGCAATCCAGCGGCGTCCCTCTGCTGGATGGGATGAATTTGTCCACCGAAAGCCTCAACAACCTCGAACTTCGCCAGCGTCTGGCAA
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHIIIGIIGHHIIIIFIHHHIHIFIIHG@F@EFDE@EE8C8A>A;?1>@C8C?>?<?9<>:?<)?

FASTQ files

In a FASTQ file, each sequence read covers four lines.

• Line 1 begins with a ‘@’ and is followed by a sequence identifier

• Line 2 is the raw sequence (A, C, G, T or N)

• Line 3 begins with a ‘+’ and is optionally followed by the same identifier as in Line 1. Otherwise nothing follows the ‘+’

• Line 4 encodes the quality values for the sequence in Line 2. It must have the same number of symbols as Line 2.
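The four-line layout described above can be checked with a small parser. This is a minimal sketch, not a replacement for an established FASTQ library; the function name `parse_fastq` and the toy record are illustrative assumptions and are not part of the workshop material.

```python
from typing import Iterator, List, Tuple


def parse_fastq(lines: List[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (identifier, sequence, quality) for each four-line record."""
    # Drop trailing newlines and skip completely blank lines.
    cleaned = [ln.rstrip("\n") for ln in lines if ln.strip()]
    if len(cleaned) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of four lines")
    for i in range(0, len(cleaned), 4):
        header, seq, plus, qual = cleaned[i:i + 4]
        if not header.startswith("@"):
            raise ValueError(f"record {i // 4}: line 1 must start with '@'")
        if not plus.startswith("+"):
            raise ValueError(f"record {i // 4}: line 3 must start with '+'")
        if len(seq) != len(qual):
            raise ValueError(f"record {i // 4}: sequence and quality lengths differ")
        yield header[1:], seq, qual


# Toy record (hypothetical, much shorter than a real read).
example = [
    "@read1 1:N:0:CTTGTA",
    "ACGTN",
    "+",
    "IIII#",
]
records = list(parse_fastq(example))
print(records)
```

The length check on the last two lines of each record mirrors the rule that the quality line must have as many symbols as the sequence line.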


Pre-processing of reads

- Adaptor removal/trimming
- Identify paired reads
- Keep reads that don’t map to the phiX genome

Bwa mem mapping of all reads against reference databases and remove bad hits

Example hits: AS=28, AS=45, AS=90

Filter hits based on alignment criteria: Alignment Score >= 30

Remaining hits: AS=45, AS=90

Best mode: Re-arrange database hits and keep only the read pairs with the best sum of alignment scores.
Full mode: Keep all hits even if present in several databases.

Bacteria pair1 forward AS=55 reverse AS=60
Bacteria pair2 forward AS=90 reverse AS=100
Human pair1 forward AS=60 reverse AS=60
Fungi pair1 forward AS=50 reverse AS=55

Bacteria pair2 forward AS=90 reverse AS=100
Human pair1 forward AS=60 reverse AS=60
Fungi pair1 forward AS=50 reverse AS=55

Final output: Abundance and read count statistics, fasta contigs, taxonomy annotation, post-processing (confidence)
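The alignment-score filter in this workflow can be illustrated with the three example scores from the slide (28, 45 and 90) and the stated cut-off of 30. This is only a sketch under assumptions: the dictionary layout, the read names and the function name `filter_hits` are hypothetical, and the real tool works on alignment files rather than Python lists.

```python
MIN_ALIGNMENT_SCORE = 30  # cut-off stated on the slide (Alignment Score >= 30)


def filter_hits(hits, min_score=MIN_ALIGNMENT_SCORE):
    """Keep only hits whose alignment score (AS) reaches the cut-off."""
    return [hit for hit in hits if hit["AS"] >= min_score]


# Hypothetical read names; the AS values are the ones shown on the slide.
hits = [
    {"read": "read_a", "AS": 28},
    {"read": "read_b", "AS": 45},
    {"read": "read_c", "AS": 90},
]
kept = filter_hits(hits)
print([hit["AS"] for hit in kept])
```

The hit with AS=28 falls below the cut-off and is dropped, which matches the two scores that remain in the diagram.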


Properly paired reads

YES: Insert size within upper and lower boundaries determined by bwa mem (both reads map to RefSeq1)

5´ |-----------------Insert size------------------| 5´

NO: Paired but mapped to different reference sequence entries (RefSeq1 and RefSeq2)
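The two cases on this slide can be written as a small predicate. The sketch below is an assumption-laden illustration: the `PairAlignment` class, the function name and the numeric insert-size bounds are hypothetical. In practice bwa mem estimates the bounds from the data and reports the result through the SAM flag.

```python
from dataclasses import dataclass


@dataclass
class PairAlignment:
    ref_forward: str   # reference entry hit by the forward read
    ref_reverse: str   # reference entry hit by the reverse read
    insert_size: int   # distance spanned by the pair on the reference


def is_properly_paired(pair: PairAlignment, lower: int, upper: int) -> bool:
    """Return True when both reads hit the same entry within the size bounds."""
    if pair.ref_forward != pair.ref_reverse:
        return False  # the "NO" case: mates on different refseq entries
    return lower <= pair.insert_size <= upper


# Hypothetical bounds; real values come from the mapper's own estimate.
LOWER, UPPER = 100, 600
same_ref = PairAlignment("RefSeq1", "RefSeq1", 350)
split_ref = PairAlignment("RefSeq1", "RefSeq2", 350)
print(is_properly_paired(same_ref, LOWER, UPPER))
print(is_properly_paired(split_ref, LOWER, UPPER))
```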


MGmapper: Finding the best hit (best mode) or finding them all (full mode)

Reference sequence databases: Bacteria, Bacteria-draft, Plasmid, Virus, ResFinder, GreenGenes, Silva, …, nt

Best mode: A read pair can map to only 1 reference sequence in each of the selected reference sequence databases.
Full mode: All read pair hits are reported

Alignment score for a read pair is the sum of the alignment scores for each read.

Database   Forward AS   Reverse AS   Sum
Bacteria   55           60           115
Human      60           60           120
Fungi      50           55           105
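The pair-score arithmetic on this slide is easy to reproduce. The sketch below sums the forward and reverse alignment scores per database and then picks the database with the highest sum. Choosing a single winner across databases is one simplified reading of best mode and is an assumption here; MGmapper's actual selection logic may differ, and the function names are hypothetical.

```python
def pair_score(forward_as: int, reverse_as: int) -> int:
    """Alignment score of a read pair: the sum of the two read scores."""
    return forward_as + reverse_as


def best_database(hits: dict) -> str:
    """Return the database whose hit has the highest pair score."""
    return max(hits, key=lambda db: pair_score(*hits[db]))


# (forward AS, reverse AS) per database, as shown on the slide.
hits = {
    "Bacteria": (55, 60),
    "Human": (60, 60),
    "Fungi": (50, 55),
}
sums = {db: pair_score(*scores) for db, scores in hits.items()}
print(sums)
print(best_database(hits))
```

In full mode no such selection would be made, and all three hits would be reported.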


Fragmentation of DNA from a sample: genome sizes differ

Genome 1: Inserts after fragmentation

Genome 2: Inserts after fragmentation

Many reads (inserts) from bigger DNA pieces, fewer from small genomes or genes


Abundance: How much is there?

If cat is 4 times bigger than ant, what is the most abundant species?

Fastq reads mapped to a ref. sequence

Why normalize read counts with reference sequence size?
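A small worked example may help with the question above. Assume, purely for illustration, that the two organisms are present in equal numbers and that reads are drawn in proportion to the amount of DNA, so the genome that is 4 times bigger yields 4 times as many reads. The genome sizes and read counts below are invented for this sketch.

```python
import math


def reads_per_base(read_count: int, size: int) -> float:
    """Read count normalized by reference sequence size."""
    return read_count / size


# Hypothetical numbers: "cat" has a reference 4 times bigger than "ant".
cat = {"size": 4_000_000, "reads": 4_000}
ant = {"size": 1_000_000, "reads": 1_000}

raw_ratio = cat["reads"] / ant["reads"]
cat_norm = reads_per_base(cat["reads"], cat["size"])
ant_norm = reads_per_base(ant["reads"], ant["size"])
print(raw_ratio)
print(math.isclose(cat_norm, ant_norm))
```

Raw read counts would suggest that cat is 4 times more abundant, while the size-normalized values are equal, which is the reason for dividing by reference size.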


Numbers

Size = number of bp’s in reference sequence (Refseq)

• Strain abundance (paired-end)
  • Abundance (%) = 100 * readCount / size * 2

• Strain abundance (single-end)
  • Abundance (%) = 100 * readCount / size

• Abundance species (%) = Σ Abundance strain

• Covered_positions
  • Number of positions in a refseq that are observed at >= 1X

• Coverage = covered_positions / size

• Depth = nucleotides / size

• ReadCountUniq = reads where AS > XS, where
  • AS is the alignment score and
  • XS is the alignment score of the second best hit
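Coverage, depth and ReadCountUniq from the list above can be computed on a toy reference. This is a sketch under assumptions: the per-position depth list, the (AS, XS) pairs and the function names are invented for illustration, and "nucleotides" is taken to mean the total number of mapped bases, which equals the sum of per-position depths in this simplified setting.

```python
import math


def coverage(per_base_depth):
    """Fraction of reference positions observed at >= 1X."""
    covered_positions = sum(1 for d in per_base_depth if d >= 1)
    return covered_positions / len(per_base_depth)


def depth(per_base_depth):
    """Mapped nucleotides divided by reference size."""
    return sum(per_base_depth) / len(per_base_depth)


def read_count_uniq(scores):
    """Count reads whose AS is strictly greater than their XS."""
    return sum(1 for as_score, xs_score in scores if as_score > xs_score)


# Toy reference of size 10 with a hypothetical depth at each position.
per_base_depth = [0, 2, 3, 0, 1, 4, 0, 0, 2, 0]
# Hypothetical (AS, XS) pairs for three reads.
scores = [(90, 45), (60, 60), (55, 0)]

print(coverage(per_base_depth))
print(depth(per_base_depth))
print(read_count_uniq(scores))
```

The read with AS equal to XS is not counted as unique, because an equally good second hit exists.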


MGmapper settings


MGmapper output, continued


Content of Excel file


Results from mapping to the ResFinder database