
Machine Learning and Metagenome Analysis Chris Fields’s slides presented by Amel Ghouila


Post on 26-May-2020



Overview of analysis workflow

1. FASTQ files: FastQC quality control of reads; trimming and filtering of bad-quality reads
2. Mapping of reads to a reference genome, or de novo assembly (reconstruction of a genome)
3. SAM files, BAM files
4. Read depth, variant calling, structural variations, gene/chromosome CNV
5. VCF files: SNPs, indels

Annotation and visualization (FASTA file, GFF file)

Overview of metagenome analysis

• What is metagenomics?
  – The study of the collective genomic material from environmental samples, for example
    • Environment: soil, water
    • Medical: fecal, skin, kidney stone
    • Industrial: bioreactors, fermenters, enrichments
    • Pretty much anything

Overview of metagenome analysis

• Why?
  – Characterize a sample that may be of “biological interest”, but…
  – The vast majority of microorganisms cannot be cultured
  – Methods used to culture from environmental samples miss these
• Solution: isolate DNA from samples, sequence it, then break down what is there.
  – Yes, it’s as difficult as it sounds

Overview of metagenome analysis

• Solution: isolate DNA from samples, sequence it, then break down what is there.
  – Taxonomic: what is present?
  – Functional: what can be done metabolically (e.g. metabolic potential)?
    • Note, this cannot be done with 16S directly

Overview of metagenome analysis

• Note: depending on the question, there may be complementary (and similarly difficult) data
  – Metatranscriptome: what is being expressed in environmental samples (RNA)
  – Metabolome: metabolites produced
  – Proteome: proteins present in sample

Overview of metagenome analysis

• Two general approaches
  – Targeted sequencing (e.g. 16S variable regions)
  – Shotgun (whole) metagenome sequencing

Targeted analysis

Morgan XC, Huttenhower C (2012) Chapter 12: Human Microbiome Analysis. PLOS Computational Biology 8(12): e1002808.

OTU: Operational Taxonomic Unit (cluster of similar sequence variants) used to categorize bacteria

Targeted analysis

Morgan XC, Huttenhower C (2012) Chapter 12: Human Microbiome Analysis. PLOS Computational Biology 8(12): e1002808.

k-NN, hierarchical clustering, Bayesian clustering, greedy heuristic clustering

Tools: Mothur, USEARCH/UCLUST/UPARSE, CD-HIT
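To make the "greedy heuristic clustering" entry concrete, here is a minimal sketch of centroid-based OTU picking. It is an illustration under stated assumptions, not how any of the listed tools is implemented: it assumes the sequences are already aligned to the same length, uses plain position-wise identity, and takes a 97% identity threshold as a commonly used convention (the slides do not state a threshold). The function names are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def greedy_otu_clusters(seqs, threshold=0.97):
    """Greedy centroid clustering (assumed 97% threshold).

    Each sequence joins the first existing centroid it matches at or above
    the threshold; otherwise it becomes the centroid of a new OTU.
    """
    centroids = []
    clusters = []
    for seq in seqs:
        for i, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                clusters[i].append(seq)
                break
        else:
            # No centroid was close enough: seed a new OTU with this sequence.
            centroids.append(seq)
            clusters.append([seq])
    return clusters
```

Because the result depends on input order, tools of this kind usually sort sequences first (for example by abundance or length) before clustering; that step is left out here.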

Targeted analysis

Morgan XC, Huttenhower C (2012) Chapter 12: Human Microbiome Analysis. PLOS Computational Biology 8(12): e1002808.

Linear model, random forest

Tools: RDP Classifier, 16S Classifier, PhyloSift, PhyloPythia

Shotgun metagenome analysis

• Full sequencing of the genomic content of an environmental sample.
• Two general methods in analysis:
  – Assembly-based: assemble the sequences, then classify the contigs from the assembly into ‘bins’, followed by gene prediction, annotation, and some form of quantifying and normalizing data for comparison across samples
  – Read-based: analyse the unassembled reads directly against a database of interest, then assign taxonomy and function when possible
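The read-based route can be sketched as a k-mer lookup against a reference database. This is only a toy illustration of the idea: the k-mer size, the `min_hits` cutoff, the majority-vote rule, and the function names are all assumptions made for the example, and real read classifiers use far larger databases and more careful scoring.

```python
from collections import Counter


def kmers(seq, k):
    """Set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references, k=8):
    """Map each k-mer to the set of taxa whose reference sequence contains it."""
    index = {}
    for taxon, seq in references.items():
        for km in kmers(seq, k):
            index.setdefault(km, set()).add(taxon)
    return index


def classify_read(read, index, k=8, min_hits=2):
    """Assign a read to the taxon sharing the most k-mers, or None if unsupported."""
    votes = Counter()
    for km in kmers(read, k):
        for taxon in index.get(km, ()):
            votes[taxon] += 1
    if not votes:
        return None  # read has no k-mer in the database
    taxon, hits = votes.most_common(1)[0]
    return taxon if hits >= min_hits else None
```

The "when possible" in the bullet above corresponds to the `None` return: reads with no, or too little, support in the database stay unassigned.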

Shotgun metagenome analysis

Quince C et al. (2017) Shotgun metagenomics, from sampling to analysis. Nature Biotechnology 35: 833–844.

Metagenome analysis - Binning

Sedlar K et al. (2017) Bioinformatics strategies for taxonomy independent binning and visualization of sequences in shotgun metagenomics. Computational and Structural Biotechnology Journal 15: 48–55.

ML Model: linear regression, Int. Markov Model, PCA, SVD

Lots of clustering! k-means, k-medoids, Gaussian mixture model, greedy heuristic, Bayesian clustering, spectral clustering

Tools: CONCOCT, MetaBAT, MaxBin
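As a rough illustration of composition-based binning with one of the clustering methods listed above, the sketch below turns each contig into a tetranucleotide frequency vector and groups the vectors with a plain k-means. Everything here is a simplifying assumption for the example: the choice of 4-mers as features, the deterministic initialization from the first k contigs, and the function names. The listed tools typically combine composition with other signals such as coverage, which this sketch ignores.

```python
from itertools import product

# All 256 tetranucleotides in a fixed order, so vectors are comparable.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def tetra_freq(seq):
    """Normalized tetranucleotide frequency vector; 4-mers with non-ACGT letters are skipped."""
    counts = [0] * len(KMERS)
    for i in range(len(seq) - 3):
        j = INDEX.get(seq[i:i + 4])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points, k, iters=20):
    """Plain k-means; centers start at the first k points (an assumption for reproducibility)."""
    if k > len(points):
        raise ValueError("k cannot exceed the number of points")
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

A call such as `kmeans([tetra_freq(c) for c in contigs], k=2)` returns one bin label per contig; choosing k is itself a hard problem that this sketch does not address.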

Shotgun metagenome analysis

http://armbrustlab.ocean.washington.edu/seastar

Shotgun metagenome analysis

• Let’s say you have a metagenome assembly
• Now you have to annotate it to get functional information

Tools: MetaProdigal, MetaGeneMark, FragGeneScan

ML Model: HMM, neural network, Int. Markov models

Sharpton T (2014) An introduction to the analysis of shotgun metagenomic data. Front. Plant Sci., 16 June 2014.
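To give a feel for what gene prediction starts from, the sketch below scans both strands of a contig for simple open reading frames (ATG to the next in-frame stop codon). This is deliberately not the method used by the tools above, which rely on statistical models (HMMs, neural networks, interpolated Markov models) to score candidate regions and to cope with fragmented genes. The start and stop codons, the `min_codons` cutoff, and the function names are assumptions for the example.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=30):
    """Return (strand, start, end, orf) tuples for ATG...stop ORFs in all six frames.

    Coordinates are 0-based, end-exclusive, and relative to the scanned strand.
    min_codons counts codons from ATG up to, but not including, the stop codon.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens an ORF
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, s[start:i + 3]))
                    start = None  # look for the next ORF in this frame
    return orfs
```

ORFs that run off the end of a contig are dropped here, which is exactly the kind of case the model-based tools are designed to handle.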

What next?

• At the end, you normally end up with quantitative information related to:
  – Taxonomic counts
  – Feature counts (genes, protein families)
• These can go into standard downstream packages for analysis (phyloseq, MEGAN, etc.)
  – Normally involves performing some form of ordination (PCoA, MDS, etc.)
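Ordination methods such as PCoA take a sample-by-sample dissimilarity matrix as input. As a small sketch of that preparatory step, the code below converts raw counts to relative abundances and computes pairwise Bray-Curtis dissimilarities. The choice of Bray-Curtis, the normalization, and the function names are assumptions for the example; the packages named above provide their own, more complete implementations.

```python
def relative_abundance(counts):
    """Scale a count vector so that it sums to 1."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no counts")
    return [c / total for c in counts]


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: 0 for identical profiles, 1 for no shared features."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def distance_matrix(samples):
    """Pairwise dissimilarities keyed by (sample, sample) name pairs."""
    names = list(samples)
    profiles = {name: relative_abundance(samples[name]) for name in names}
    return {
        (a, b): bray_curtis(profiles[a], profiles[b])
        for a in names
        for b in names
    }
```

Here `samples` is a hypothetical mapping from sample name to a vector of taxonomic or feature counts, with features in the same order for every sample.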

ML used for classification

Figure 5: Gut MLGs classify colorectal carcinoma and adenoma samples from healthy controls.

Nice literature overview: https://arxiv.org/pdf/1510.06621.pdf

ML – Overview

ML – OTU Clustering

ML – Binning

ML – Taxonomic Classification

ML – Gene Prediction