BIOINFORMATICS APPROACHES FOR METAGENOMICS DATA ANALYSIS

Adi Doron-Faigenboim

Plant Sciences, Vegetable and Field Crops, ARO, The Volcani Center, Rishon LeZion 7528809, Israel


Metagenomics

o “Metagenomics is the study of the collective genomes of all microorganisms from an environmental sample”

o Community

o Environmental

o Ecological


DNA sequencing & microbial profiling

Traditional microbiology relies on the isolation and culture of bacteria:

o A cumbersome and labour-intensive process

o Fails to account for the diversity of microbial life

o The “great plate count anomaly”: far fewer cells form colonies on plates than are observed by microscopy

Staley, J. T., and A. Konopka. 1985. Measurements of in situ activities of nonphotosynthetic microorganisms in aquatic and terrestrial habitats. Annu. Rev. Microbiol. 39:321-346


Why environmental sequencing?

Estimated 1000 trillion tons of bacterial/archaeal life on Earth

o Only a small proportion of organisms have been grown in culture

o Species do not live in isolation

o Clonal cultures fail to represent the natural environment of a given organism

o Many proteins and protein functions remain undiscovered


Why environmental sequencing?

o Human microbiome

o Rhizobiome

o Pollutant sites

o Non-human microbiomes


The revolution in sequencing technologies

High-throughput technologies have driven the accumulation of enormous volumes of genomic and metagenomic data.

Hodkinson, B. P., and E. A. Grice. 2015. Next-Generation Sequencing: A Review of Technologies and Tools for Wound Microbiome Research. Adv Wound Care (New Rochelle).

[Figure: HiSeq and MiSeq platforms]


Experimental approaches

o Community composition
◦ Microbiome (16S rRNA gene, 18S, ITS, etc.)

o Community composition and functional potential
◦ Metagenomics

o Functional genetic response
◦ Metatranscriptomics


16S vs. shotgun metagenomics

o 16S: targeted sequencing of a single gene
◦ Marker for identification
◦ Well established
◦ Cheap
◦ Amplifies only what you want

o Shotgun sequencing: sequencing all the DNA
◦ No primer bias
◦ Can identify all microbes
◦ Provides functional information


16S rRNA sequencing

Erlandsen SL et al. J Histochem Cytochem 2005;53:917-927

• 16S rRNA is a component of the small (30S) subunit of bacterial ribosomes.

• It contains regions of highly conserved and highly variable sequence.

• The variable regions act as a molecular “fingerprint” that can be used to identify bacterial genera and species.

• Large public databases are available for comparison:
− The Ribosomal Database Project (RDP) currently contains >1.5 million rRNA sequences.

• Conserved regions can be targeted to amplify a broad range of bacteria from environmental samples.

• Not strictly quantitative, because the number of 16S gene copies per genome varies between taxa.
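Because 16S copy number differs between taxa, read proportions over-represent taxa that carry many copies. The sketch below illustrates one common correction: divide each taxon's read count by its 16S copy number and renormalise. It assumes copy numbers are already known (in practice they are taken from a reference such as rrnDB or predicted from relatives); the taxa and copy numbers are hypothetical.

```python
def copy_number_corrected_abundance(read_counts, copy_numbers):
    """Divide each taxon's 16S read count by its 16S copy number, then renormalise.

    read_counts:  dict taxon -> number of 16S reads assigned to that taxon
    copy_numbers: dict taxon -> assumed 16S rRNA gene copies per genome
    Returns a dict taxon -> estimated relative abundance of cells (sums to 1).
    """
    cell_estimates = {
        taxon: count / copy_numbers[taxon] for taxon, count in read_counts.items()
    }
    total = sum(cell_estimates.values())
    return {taxon: value / total for taxon, value in cell_estimates.items()}


# Hypothetical example: equal read counts, but taxon_A carries 4 copies and taxon_B only 1.
print(copy_number_corrected_abundance({"taxon_A": 400, "taxon_B": 400},
                                      {"taxon_A": 4, "taxon_B": 1}))
# {'taxon_A': 0.2, 'taxon_B': 0.8}
```

Equal read counts thus translate into a four-fold difference in estimated cell numbers.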


16S rRNA gene sequencing

o Pros
◦ Well established
◦ Sequencing is relatively cheap (~50,000 reads/sample)
◦ Only amplifies what you want (no host contamination)

o Cons
◦ Primer choice can bias results towards certain organisms
◦ Usually not enough resolution to identify organisms to the strain level
◦ Usually needs different primers for archaea and eukaryotes (18S)
◦ Cannot identify viruses
◦ No direct functional profiling


Binning sequences into OTUs

o Operational Taxonomic Unit (OTU): an arbitrary definition of a taxonomic unit based on sequence divergence

o Composition-based binning
− GC content
− Di-/tri-/tetra-/... nucleotide composition (k-mer frequency comparison)
− Codon usage statistics

o Similarity-based binning
− Direct comparison of OTU sequences to a reference database
− The identity cut-off depends on the resolution required: species 97%, genus 90%, family 80%
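Composition-based binning compares sequences by signatures such as GC content and k-mer frequencies. A minimal sketch, assuming plain A/C/G/T sequences; the function names are illustrative, and unlike many binning tools it does not merge each k-mer with its reverse complement.

```python
from itertools import product


def gc_content(seq):
    """Fraction of G+C among the unambiguous A/C/G/T bases of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def kmer_frequencies(seq, k=4):
    """Relative frequency of every possible k-mer (4**k of them; tetranucleotides by default)."""
    seq = seq.upper()
    counts = dict.fromkeys(("".join(p) for p in product("ACGT", repeat=k)), 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # windows containing N or other ambiguity codes are skipped
            counts[kmer] += 1
    total = sum(counts.values())
    return {kmer: (n / total if total else 0.0) for kmer, n in counts.items()}


def profile_distance(p, q):
    """Euclidean distance between two k-mer frequency profiles built with the same k."""
    return sum((p[kmer] - q[kmer]) ** 2 for kmer in p) ** 0.5
```

Two contigs from the same genome are expected to have a small `profile_distance`; in practice such signatures are only reliable for sequences of a few kb or longer.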


Clustering of OTUs based on sequence similarity

[Figure: MEGAN view of a BLAST search against the NCBI database comparing Sample 1 and Sample 2; e.g., an OTU present 50:50 in both samples]
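Clustering into OTUs is often done greedily: the most abundant sequences become cluster centroids, and every other sequence joins the first centroid it matches at or above the identity cut-off (e.g. 97% for species). The sketch below is a simplified illustration of that idea, assuming the sequences are already aligned to equal length; real tools compute identity from pairwise alignments and use heuristics for speed.

```python
def identity(a, b):
    """Fraction of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otu_clustering(seqs, threshold=0.97):
    """Assign each sequence to the first centroid it matches at >= threshold identity.

    seqs: dict sequence_id -> (aligned_sequence, abundance)
    Returns a dict centroid_id -> list of member ids (centroid included).
    """
    # Most abundant sequences are processed first, so they become the centroids.
    order = sorted(seqs, key=lambda sid: seqs[sid][1], reverse=True)
    otus = {}
    for sid in order:
        seq = seqs[sid][0]
        for centroid in otus:
            if identity(seq, seqs[centroid][0]) >= threshold:
                otus[centroid].append(sid)
                break
        else:
            otus[sid] = [sid]  # no centroid is close enough: start a new OTU
    return otus
```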


Software for binning

o Composition-based binning
◦ TETRA: maximal-order Markov model
◦ PhyloPythia: support vector machine
◦ Seeded Growing Self-Organising Maps (S-GSOM): TETRA + codon usage

o Similarity-based binning
◦ Requires that most sequences in a sample are present in a primary or secondary reference database
◦ QIIME
◦ MEGAN (comparison against NCBI NR with BLAST)
◦ Mothur (RDP)
◦ CARMA (comparison against Pfam)
◦ ARB (linked with the SILVA database)


Sequence databases


Measuring diversity of OTUs

Two primary measures for sequence-based studies:

• Alpha diversity
− What is there? How much is there?
− Diversity within a sample

• Beta diversity
− How similar are two samples?
− Diversity between samples


Alpha diversity – human microbiome

C Huttenhower et al. Nature 486, 207-214 (2012) doi:10.1038/nature11234


Alpha diversity

o Species count in the sample
◦ What is a species?
◦ OTUs
◦ Misses the level of evolutionary diversity

o Phylogenetic diversity (PD)
◦ Sum of the branch lengths covered by a sample
◦ Misses the distribution (relative abundance) of the species
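Phylogenetic diversity (Faith's PD) sums the branch lengths of the part of the tree spanned by the taxa observed in a sample. A minimal sketch, assuming the tree is stored as a hypothetical child-to-(parent, branch length) dictionary and that the path up to the root is counted (conventions differ between implementations):

```python
def faith_pd(tree, observed_tips):
    """Faith's phylogenetic diversity: total length of the branches linking the
    observed tips to the root, each shared branch counted once.

    tree: dict node -> (parent, length of the branch above node); the root's parent is None
    observed_tips: tip names present (abundance > 0) in the sample
    """
    covered = set()
    for tip in observed_tips:
        node = tip
        # Walk towards the root until reaching the root or an already-counted branch.
        while tree[node][0] is not None and node not in covered:
            covered.add(node)
            node = tree[node][0]
    return sum(tree[node][1] for node in covered)


# Hypothetical tree: root -> X (1.0) -> A (2.0), B (3.0); root -> C (4.0)
TREE = {
    "root": (None, 0.0),
    "X": ("root", 1.0),
    "A": ("X", 2.0),
    "B": ("X", 3.0),
    "C": ("root", 4.0),
}
print(faith_pd(TREE, ["A", "B"]))  # 6.0
```

As the slide notes, PD ignores abundance: a tip seen once contributes as much branch length as a dominant one.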


Alpha diversity

o Simpson’s diversity index (also the Shannon and Chao indices)
◦ Gives less weight to the rarest species

$$ D = 1 - \sum_{i=1}^{S} \frac{n_i (n_i - 1)}{N (N - 1)} $$

where S is the number of species, N is the total number of organisms, and n_i is the number of organisms of species i.

Whittaker, R. H. (1972). "Evolution and measurement of species diversity". Taxon (International Association for Plant Taxonomy (IAPT)) 21 (2/3): 213–251
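The indices can be computed directly from per-species counts n_i. This sketch assumes the form D = 1 − Σ n_i(n_i − 1) / (N(N − 1)) for Simpson's diversity, the natural-log Shannon index H = −Σ p_i ln p_i, and the bias-corrected Chao1 estimator S_obs + F1(F1 − 1) / (2(F2 + 1)), where F1 and F2 are the numbers of species seen exactly once and twice; other variants of each index exist.

```python
import math
from collections import Counter


def simpson_diversity(counts):
    """Simpson's diversity D = 1 - sum n_i(n_i - 1) / (N(N - 1))."""
    n_total = sum(counts)
    if n_total < 2:
        return 0.0
    return 1 - sum(n * (n - 1) for n in counts) / (n_total * (n_total - 1))


def shannon_diversity(counts):
    """Shannon index H = -sum p_i ln p_i over species with non-zero counts."""
    n_total = sum(counts)
    return -sum((n / n_total) * math.log(n / n_total) for n in counts if n > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1(F1 - 1) / (2(F2 + 1))."""
    observed = [n for n in counts if n > 0]
    frequency = Counter(observed)        # how many species were seen exactly n times
    f1, f2 = frequency[1], frequency[2]  # singletons and doubletons
    return len(observed) + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical counts n_i for four species in one sample
sample = [5, 1, 1, 2]
print(simpson_diversity(sample), shannon_diversity(sample), chao1(sample))
```

Chao1 exceeds the observed species count whenever singletons are present, reflecting species likely missed by sampling.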


Beta diversity – human microbiome

C Huttenhower et al. Nature 486, 207-214 (2012) doi:10.1038/nature11234


Beta diversity

o Diversity between samples

o UniFrac distance
◦ Phylogeny-based beta diversity
◦ Percentage of the observed branch length that is unique to either sample

Lozupone, C., and R. Knight. 2005. UniFrac: a new phylogenetic method for comparing microbial communities. Appl Environ Microbiol 71:8228
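Unweighted UniFrac can be sketched with a simple tree representation: collect the branches leading from each sample's taxa to the root, then divide the length of the branches found in only one sample by the length of all observed branches. This is a minimal illustration assuming a hypothetical child-to-(parent, branch length) dictionary; it ignores abundances (the weighted variant uses them).

```python
def covered_branches(tree, tips):
    """Nodes whose parent branch lies on a path from one of the tips to the root."""
    covered = set()
    for tip in tips:
        node = tip
        while tree[node][0] is not None and node not in covered:
            covered.add(node)
            node = tree[node][0]
    return covered


def unweighted_unifrac(tree, tips_a, tips_b):
    """Fraction of observed branch length unique to one sample (0 = same lineages, 1 = none shared)."""
    branches_a = covered_branches(tree, tips_a)
    branches_b = covered_branches(tree, tips_b)
    observed = branches_a | branches_b
    unique = branches_a ^ branches_b  # branches seen in exactly one of the two samples
    total = sum(tree[node][1] for node in observed)
    return sum(tree[node][1] for node in unique) / total if total else 0.0


# Hypothetical tree: root -> X (1.0) -> A (2.0), B (3.0); root -> C (4.0)
TREE = {
    "root": (None, 0.0),
    "X": ("root", 1.0),
    "A": ("X", 2.0),
    "B": ("X", 3.0),
    "C": ("root", 4.0),
}
print(unweighted_unifrac(TREE, ["A"], ["B"]))  # 0.8333333333333334
```

Samples A and B share only the branch above X, so 5 of the 6 observed branch-length units are unique.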


Other useful data representations

Simple bar charts: what species are present?


Other useful data representations

Rarefaction curves: how much of a community have we sampled?

[Figure: rarefaction curve; y-axis: number of OTUs, x-axis: number of sequences]

Adapted from Wooley et al. A Primer on Metagenomics, PLoS Computational Biology, Feb 2010, Vol 6(2)
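A rarefaction curve is obtained by repeatedly subsampling reads without replacement at increasing depths and counting the OTUs observed. A minimal sketch, assuming the sample is given as OTU read counts; the number of iterations and the random seed are arbitrary choices.

```python
import random


def rarefaction_curve(otu_counts, depths, iterations=20, seed=0):
    """Mean number of OTUs observed when subsampling reads without replacement.

    otu_counts: dict OTU id -> read count in one sample
    depths: subsampling depths (numbers of sequences), each <= total reads
    Returns a list of (depth, mean number of OTUs observed).
    """
    rng = random.Random(seed)
    # Expand the counts into one OTU label per read so that reads can be subsampled.
    reads = [otu for otu, n in otu_counts.items() for _ in range(n)]
    curve = []
    for depth in depths:
        if depth > len(reads):
            raise ValueError("depth exceeds the number of reads in the sample")
        observed = [len(set(rng.sample(reads, depth))) for _ in range(iterations)]
        curve.append((depth, sum(observed) / iterations))
    return curve


# Hypothetical sample with one dominant and several rarer OTUs
print(rarefaction_curve({"otu1": 50, "otu2": 30, "otu3": 15, "otu4": 5}, [10, 50, 100]))
```

A curve that flattens out suggests that further sequencing would reveal few additional OTUs.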


Shotgun whole metagenome

o Unlike 16S, metagenomic sequencing is not targeted to a specific gene; instead, it takes an unbiased sample of all the genomic DNA.

o Typically, shorter sequence reads are used to obtain >5 Gb of data per sample.

o The HiSeq or NextSeq platforms are typically more cost-effective for metagenomic sequencing.


Shotgun metagenomics

o Pros
◦ No primer bias
◦ Can identify all microbes (e.g. eukaryotes, viruses)
◦ Direct functional profiling

o Cons
◦ More expensive (millions of sequences needed)
◦ Host/site contamination can be significant
◦ May fail to detect “rare” microbes
◦ Required computational resources can be restrictive
◦ More complex bioinformatic analyses required (e.g. chimeras, genes of unknown function)


Sequence coverage

[Figure: community complexity, diversity and coverage]

Rodriguez-R, L. M., and K. T. Konstantinidis. 2014. Estimating coverage in metagenomic data sets and why it matters. ISME J.


Metagenomic assembly


Metagenomic assembly

Metagenomic Assembly: Overview, Challenges and Applications. Yale J Biol Med. 2016 Sep; 89(3): 353–362


Metagenomic assembly

o Greedy assemblers
◦ Reads with maximum overlaps are iteratively merged into contigs

o Overlap-Layout-Consensus (OLC)
◦ A graph is constructed by finding overlaps between all pairs of reads

o de Bruijn graph
◦ Reads are chopped into short overlapping segments (k-mers)
◦ K-mers are organized in a de Bruijn graph, connecting k-mers that overlap by k−1 bases in the reads
◦ The graph is simplified to remove artifacts due to sequencing errors
◦ Branch-less paths are reported as contigs
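A toy illustration of the de Bruijn approach, assuming error-free reads from a single strand: nodes are (k−1)-mers, each distinct k-mer adds an edge, and maximal branch-less paths are spelled out as contigs (unitigs). Real metagenome assemblers such as MEGAHIT additionally handle reverse complements, sequencing errors, coverage and multiple k values; this sketch skips all of that and ignores isolated cycles.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Nodes are (k-1)-mers; each distinct k-mer in the reads adds one edge prefix -> suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def unitigs(graph):
    """Spell out maximal non-branching (branch-less) paths as contigs."""
    indegree = defaultdict(int)
    for successors in graph.values():
        for nxt in successors:
            indegree[nxt] += 1
    nodes = set(graph) | set(indegree)

    def one_in_one_out(node):
        return indegree[node] == 1 and len(graph.get(node, ())) == 1

    contigs = []
    for node in sorted(nodes):
        if one_in_one_out(node):
            continue  # interior node: reached when walking from a path start
        for nxt in sorted(graph.get(node, ())):
            contig = node + nxt[-1]
            while one_in_one_out(nxt):
                nxt = next(iter(graph[nxt]))
                contig += nxt[-1]
            contigs.append(contig)
    return contigs


# Two hypothetical reads that share "ATG" and then diverge: the branch splits the assembly.
print(sorted(unitigs(build_de_bruijn(["ATGCA", "ATGTC"], k=3))))  # ['ATG', 'TGCA', 'TGTC']
```

The branching node "TG" ends one contig and starts two others, which is exactly how strain variation or errors fragment metagenome assemblies.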


de Bruijn graph approach

o Low-abundance genomes may end up fragmented if the overall sequencing depth is insufficient to form connections in the graph
◦ Using a shorter k-mer size helps connect them

o The assembler must strike a balance between recovering low-abundance genomes and obtaining long, accurate contigs for high-abundance genomes

o Computational time and memory may be insufficient to complete such assemblies

o Multiple k-mer approach

o Spread the memory load over a cluster of computers


Metagenome assembly tools

Vollmers, J., S. Wiegand, and A.-K. Kaster. Comparing and Evaluating Metagenome Assembly Tools from a Microbiologist’s Perspective: Not Only Size Matters!


What we do with the assembly

o Characterizing the contigs/scaffolds
◦ Mapping statistics
◦ Composition (%GC, codon usage)
◦ Annotation: taxonomy and function assignments

o Binning

o Comparative genomics

o Metabolic pathways


Binning over read mapping

o Partition the metagenome into species using:
◦ Read coverage (across multiple samples)
◦ Composition

Metagenomic Assembly: Overview, Challenges and Applications. Yale J Biol Med. 2016 Sep; 89(3): 353–362

Scaffold | GC% | sample3 | sample2 | sample1
scaffold1 | 34 | 60 | 7 | 27
scaffold2 | 33 | 61 | 6 | 29
scaffold3 | 51 | 20 | 21 | 5
scaffold4 | 50 | 22 | 20 | 7
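The idea behind coverage-based binning is that scaffolds from the same genome share a similar GC% and rise and fall together in coverage across samples. A minimal single-linkage sketch, assuming illustrative (not measured) values, log-scaled coverage and an arbitrary distance threshold; tools such as CONCOCT and MetaBAT use considerably more elaborate models.

```python
import math


def feature_vector(gc_percent, coverages):
    """Combine GC fraction with log-scaled per-sample coverage (log1p dampens depth differences)."""
    return [gc_percent / 100] + [math.log1p(c) for c in coverages]


def bin_scaffolds(profiles, max_distance):
    """Single-linkage binning: scaffolds within max_distance of each other share a bin.

    profiles: dict scaffold -> (gc_percent, [coverage in sample 1, sample 2, ...])
    Returns a sorted list of bins, each a sorted list of scaffold names.
    """
    names = list(profiles)
    vectors = {n: feature_vector(*profiles[n]) for n in names}
    parent = {n: n for n in names}  # union-find forest

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if math.dist(vectors[a], vectors[b]) <= max_distance:
                parent[find(a)] = find(b)

    bins = {}
    for n in names:
        bins.setdefault(find(n), []).append(n)
    return sorted(sorted(members) for members in bins.values())


# Illustrative profiles: two low-GC scaffolds abundant in sample 3, two high-GC ones in sample 2
profiles = {
    "scaf_a": (35, [30, 5, 60]),
    "scaf_b": (34, [28, 6, 58]),
    "scaf_c": (52, [4, 40, 12]),
    "scaf_d": (50, [5, 42, 10]),
}
print(bin_scaffolds(profiles, max_distance=0.5))  # [['scaf_a', 'scaf_b'], ['scaf_c', 'scaf_d']]
```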


Binning over read mapping

[Figure: bar chart of GC% and coverage in sample1, sample2 and sample3 for scaffold1 to scaffold4]


Binning contigs

o Completely automated approaches
◦ CONCOCT
◦ GroopM
◦ MetaBAT

o Completeness of metagenome-assembled genomes (MAGs)
◦ Assessed with single-copy core genes (tRNA synthetases, ribosomal proteins)
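Completeness and contamination of a MAG are commonly estimated from marker genes expected exactly once per genome: the fraction of markers found estimates completeness, and extra copies suggest contamination. A simplified sketch of that idea (tools such as CheckM refine it with lineage-specific, collocated marker sets); the four-gene marker set in the example is hypothetical and far smaller than real sets.

```python
from collections import Counter


def completeness_contamination(marker_hits, marker_set):
    """Estimate MAG completeness and contamination (%) from single-copy marker genes.

    marker_hits: marker-gene names found in the bin, one entry per copy
    marker_set:  markers expected exactly once in a complete genome
    """
    copies = Counter(hit for hit in marker_hits if hit in marker_set)
    found = len(copies)                            # distinct markers present
    extra = sum(n - 1 for n in copies.values())    # copies beyond the expected single one
    total = len(marker_set)
    return 100 * found / total, 100 * extra / total


# Hypothetical bin: 3 of 4 markers found, one of them in two copies
print(completeness_contamination(["rpsB", "rpsC", "rpsC", "rplB"],
                                 {"rpsB", "rpsC", "rplB", "rplC"}))  # (75.0, 25.0)
```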


Gene annotation

o Find bacterial genes in the contigs/scaffolds
◦ Prodigal
◦ Prokka

o Annotate the genes
◦ By homology searches (DIAMOND)
◦ Domain finding

o Comparisons
◦ Gene families
◦ Distribution among the samples (CD-HIT)

Functional potential: the annotations suggest the functional potential of the community.

They do not prove biological activity: the genes may not be transcribed and translated.
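Homology-based annotation usually keeps the best-scoring hit per gene. A minimal sketch that parses DIAMOND tabular output, assuming it was produced with `--outfmt 6` and its default 12 BLAST-style columns; the e-value cut-off and the subject IDs are illustrative.

```python
import csv
import io

# Default column order of BLAST-style tabular output (DIAMOND --outfmt 6 without extra fields).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle, max_evalue=1e-5):
    """Return {query gene: (subject id, bitscore)} for the highest-bitscore hit per query."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        record = dict(zip(COLUMNS, row))
        evalue, bitscore = float(record["evalue"]), float(record["bitscore"])
        if evalue > max_evalue:
            continue  # hit not significant enough
        query = record["qseqid"]
        if query not in best or bitscore > best[query][1]:
            best[query] = (record["sseqid"], bitscore)
    return best


example = io.StringIO(
    "gene1\tWP_0001\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t200\n"
    "gene1\tWP_0002\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-20\t150\n"
    "gene2\tWP_0003\t40.0\t50\t30\t0\t1\t50\t1\t50\t0.1\t30\n"
)
print(best_hits(example))  # {'gene1': ('WP_0001', 200.0)}
```

In real use the handle would be an open file of DIAMOND output; gene2 drops out here because its only hit fails the e-value cut-off.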


Common functional databases

o NCBI

o COG
◦ Well known, but an older classification (originally released in 2003 and only sporadically updated since)

o Pfam
◦ Focused on protein domains, modelled with hidden Markov models

o KEGG
◦ Very popular; each entry is well annotated and often linked into “Modules” or “Pathways”
◦ Full access now requires a license fee

o MetaCyc
◦ Similar to KEGG, but more microbe-focused

o UniRef
◦ Clustered at different identity levels (e.g. UniRef100, UniRef90, UniRef50)
◦ The most comprehensive, and constantly updated
◦ These gene families are typically less functionally informative


Metagenomic annotation systems

o Web-based
◦ EBI
◦ MG-RAST

o GUI-based
◦ MEGAN

o Local
◦ Kraken
◦ MetAMOS


Post-processing analysis

o Data matrices of samples versus microbial features
◦ Species
◦ Genes
◦ Pathways

o Unsupervised methods
◦ Clustering and correlations
◦ PCA

o Features that differ statistically between sample types
◦ Taxa or functional genes
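PCA on a sample × feature matrix can be sketched without external libraries by power iteration on the feature covariance matrix. This minimal version returns only the PC1 score of each sample and assumes the abundances are already normalised; in practice a library routine (e.g. an SVD) would be used and further components extracted. The sign of a principal component is arbitrary.

```python
import math
import random


def first_principal_component(matrix, iterations=200, seed=0):
    """PC1 score of each sample, via power iteration on the feature covariance matrix.

    matrix: list of rows (samples) x columns (features, e.g. gene or taxon abundances)
    """
    n, p = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in matrix]
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(p)] for a in range(p)]

    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(p)]  # random positive start vector
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]  # converges to the leading eigenvector (PC1 loadings)
    return [sum(r[j] * v[j] for j in range(p)) for r in centred]


# Hypothetical abundances: two samples dominated by feature 1, two by feature 2
scores = first_principal_component([[10, 0, 5], [11, 1, 5], [0, 10, 5], [1, 11, 5]])
print([round(s, 2) for s in scores])
```

Samples with similar profiles receive PC1 scores of the same sign, which is what separates sample groups in a PCA plot.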


A Review of Bioinformatics Tools for Bio-Prospecting from Metagenomic Sequence Data. Front. Genet., 06 March 2017


Case study: the microbiome of fruit peel

Maria Vetcos, Edoardo Piombo, Shlomit Medina, Shiri Freilich, Samir Droby, Michael Wisniewski


Case study: the microbiome of fruit peel


Read length: 150 bp

Total of 472 million quality reads

Sequencing output: files in FASTQ format


Assembly: MEGAHIT

Input (FASTQ): 472 million quality reads, 71 Gbp in total

Output (FASTA), all contigs / contigs > 2 kb:
o Number of contigs: 4,000,000 / 200,000
o Average contig length: 820 / 4,600 bp
o N50: 980 / 5,000 bp
o Total length: 3 Gbp / 1 Gbp


Sample | Raw reads | Clean reads | % clean reads | PE reads | % mapping vs. filtered set
A1 | 26,692,151 | 22,638,404 | 84.81 | 45,276,808 | 75.59
A2 | 32,550,741 | 27,819,952 | 85.47 | 55,639,904 | 69.84
A3 | 24,083,541 | 20,677,583 | 85.86 | 41,355,166 | 82.77
C1W | 29,722,008 | 25,416,861 | 85.52 | 50,833,722 | 78.32
C2W | 24,125,961 | 20,451,024 | 84.77 | 40,902,048 | 76.01
C3W | 24,956,733 | 21,353,952 | 85.56 | 42,707,904 | 87.48
M1 | 26,211,005 | 21,974,866 | 83.84 | 43,949,732 | 66.52
M2 | 5,640,819 | 4,765,939 | 84.49 | 9,531,878 | 62.97
M3 | 6,113,051 | 5,137,683 | 84.04 | 10,275,366 | 57.24
O1S | 23,760,866 | 19,848,045 | 83.53 | 39,696,090 | 57.85
O2S | 28,317,777 | 23,141,736 | 81.72 | 46,283,472 | 57.22
O3S | 28,604,975 | 22,679,029 | 79.28 | 45,358,058 | 64.43
Total | 280,779,628 | 235,905,074 | 84.02 | 471,810,148 |

Statistic | Full contig set | Contigs > 2 kb
Total number of sequences | 3,762,133 | 206,575
Total number of bp | 3,085,995,440 | 945,480,334
Average sequence length (bp) | 820.27 | 4,576.93
N50 (bp) | 979 | 4,926
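Statistics like these can be recomputed from a list of contig lengths. N50 is the length L such that contigs of length ≥ L together contain at least half of all assembled bases. A minimal sketch; the `min_length` filter mirrors the > 2 kb subset, and the example lengths are made up.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def assembly_stats(lengths, min_length=0):
    """Contig count, total bp, mean length and N50 for contigs longer than min_length."""
    kept = [length for length in lengths if length > min_length]
    return {
        "contigs": len(kept),
        "total_bp": sum(kept),
        "mean_length": sum(kept) / len(kept) if kept else 0.0,
        "N50": n50(kept),
    }


# Hypothetical contig lengths (bp)
print(assembly_stats([500, 2500, 3000], min_length=2000))
# {'contigs': 2, 'total_bp': 5500, 'mean_length': 2750.0, 'N50': 3000}
```

Filtering out short contigs raises both the mean length and N50, as in the table, while discarding a large share of the assembled bases.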


Input (FASTA): 200,000 contigs > 2 kb

Gene calling: Prodigal

Output (FASTA): 1,000,000 genes in total


From sequence to gene: summary

o Raw genomic data: 4 treatments × 3 repeats = 12 libraries; ~45 million reads per library; a total of ~472 million quality reads

o Genome/gene assembly (pooled data): ~200,000 contigs with an N50 of ~5,000 bp, with 60% of reads mapped

o Gene calling: ~1,000,000 genes

o Annotations: functional and taxonomic


JGI annotation platform


Annotation in MEGAN based on a DIAMOND similarity search

o Input: 1,000,000 genes

o DIAMOND similarity search against NCBI NR

o Homologs detected for 75% of genes

o Results condensed into the DAA binary format


MEGAN annotation platform

o Input: DAA file

o Taxonomy: output files with TaxonPath, Taxon ID, etc.

o KEGG: output files with KEGGPath, KEGGName, etc.

o SEED: output files with SEEDPath, SEEDName, etc.
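MEGAN assigns each sequence to the lowest common ancestor (LCA) of the taxa of its significant hits. A simplified sketch of that idea, assuming each hit's lineage is already known as a root-to-taxon tuple and keeping only hits within a top-percent window of the best bitscore; the subject IDs, lineages and the 10% window are illustrative, and MEGAN's own implementation applies further filters.

```python
def lowest_common_ancestor(lineages):
    """Longest shared prefix of root-to-taxon lineages (tuples of taxon names)."""
    if not lineages:
        return ()
    shared = []
    for ranks in zip(*lineages):  # walk down the lineages rank by rank
        if all(name == ranks[0] for name in ranks):
            shared.append(ranks[0])
        else:
            break
    return tuple(shared)


def assign_gene(hits, lineage_of, top_percent=10.0):
    """LCA of the hits whose bitscore lies within top_percent of the best bitscore.

    hits: list of (subject_id, bitscore); lineage_of: subject_id -> lineage tuple
    """
    if not hits:
        return ()
    best = max(score for _, score in hits)
    cutoff = best * (1 - top_percent / 100)
    kept = [lineage_of[sid] for sid, score in hits if score >= cutoff]
    return lowest_common_ancestor(kept)


# Hypothetical subjects and lineages
LINEAGES = {
    "subj1": ("Bacteria", "Proteobacteria", "Gammaproteobacteria", "Pseudomonas"),
    "subj2": ("Bacteria", "Proteobacteria", "Gammaproteobacteria", "Erwinia"),
    "subj3": ("Bacteria", "Firmicutes", "Bacilli", "Bacillus"),
}
print(assign_gene([("subj1", 200.0), ("subj2", 190.0), ("subj3", 100.0)], LINEAGES))
# ('Bacteria', 'Proteobacteria', 'Gammaproteobacteria')
```

Widening the window makes assignments more conservative (higher up the tree), which is the usual trade-off between precision and resolution.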


Taxonomic annotations


Krona chart: dynamic representation

Input: MEGAN file (taxonomy IDs)

assigned_Krona_All.html


The annotations of most genes on the same contig are consistent


Functional annotations

[Figure: SEED and KEGG functional annotations]


Annotation statistics

Annotation | Genes | Assigned genes | Fraction assigned | % assigned
Taxa | 759,353 | 570,702 | 0.75 | 75
InterPro2GO | 759,353 | 367,789 | 0.48 | 48
eggNOG | 759,353 | 255,892 | 0.34 | 34
KEGG* | 759,353 | 187,842 | 0.25 | 25

* From the SEED 2015 mapping file


Count data

The count data are presented as a table that reports, for each sample, the number of sequence fragments assigned to each gene.
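A minimal sketch of building such a count table from per-read assignments and converting it to relative abundances, so that libraries of different depth (e.g. M2 versus A2) can be compared; the read-to-gene pairs are hypothetical, and relative abundance is only one of several possible normalisations.

```python
from collections import Counter, defaultdict


def count_table(assignments):
    """Build {sample: Counter({gene: count})} from (sample, gene) pairs, one pair per read."""
    table = defaultdict(Counter)
    for sample, gene in assignments:
        table[sample][gene] += 1
    return table


def relative_abundance(table):
    """Scale each sample's counts to proportions that sum to 1 within the sample."""
    result = {}
    for sample, counts in table.items():
        total = sum(counts.values())
        result[sample] = {gene: n / total for gene, n in counts.items()}
    return result


# Hypothetical read-to-gene assignments
pairs = [("A1", "gene1"), ("A1", "gene1"), ("A1", "gene2"), ("O1S", "gene2")]
print(relative_abundance(count_table(pairs)))
```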


PCA & correlations

[Figure: PCA and correlation plots; sample groups: Israel organic, Israel conventional, US conventional]


Pathway | Compounds (contig, conventional) | Compounds (contig, organic) | Compounds (gene, conventional) | Compounds (gene, organic)
Cutin, suberine and wax biosynthesis | 0 | 5 | 0 | 6
Biosynthesis of alkaloids derived from shikimate pathway | 0 | 5 | 0 | 4
Drug metabolism - cytochrome P450 | 0 | 10 | 0 | 9
Glycerophospholipid metabolism | 5 | 0 | 5 | 0
Tyrosine metabolism | 2 | 6 | 2 | 6
Bisphenol degradation | 0 | 4 | 0 | 4
Penicillin and cephalosporin biosynthesis | 2 | 4 | 2 | 4
Chlorocyclohexane and chlorobenzene degradation | 0 | 6 | 0 | 5
Steroid hormone biosynthesis | 10 | 1 | 10 | 1
Inflammatory mediator regulation of TRP channels | 3 | 1 | 3 | 0
Isoquinoline alkaloid biosynthesis | 0 | 6 | 0 | 6
Arachidonic acid metabolism | 17 | 0 | 17 | 0
Aminobenzoate degradation | 0 | 7 | 0 | 7
Retinol metabolism | 0 | 6 | 0 | 6
Flavonoid biosynthesis | 8 | 0 | 8 | 0
Flavone and flavonol biosynthesis | 7 | 1 | 6 | 1
Fluorobenzoate degradation | 11 | 0 | 11 | 0
Anthocyanin biosynthesis | 12 | 0 | 12 | 0
Betalain biosynthesis | 8 | 0 | 8 | 0
Steroid biosynthesis | 12 | 0 | 12 | 0
Polycyclic aromatic hydrocarbon degradation | 0 | 21 | 0 | 21
Porphyrin and chlorophyll metabolism | 14 | 0 | 14 | 0
Amino sugar and nucleotide sugar metabolism | 0 | 9 | 0 | 9
Biosynthesis of plant secondary metabolites | 4 | 2 | 4 | 1
Biosynthesis of type II polyketide products | 5 | 0 | 5 | 0
Ubiquinone and other terpenoid-quinone biosynthesis | 1 | 10 | 1 | 10
Linoleic acid metabolism | 5 | 0 | 5 | 0
Biosynthesis of 12-, 14- and 16-membered macrolides | 21 | 4 | 21 | 4
Glycine, serine and threonine metabolism | 4 | 1 | 4 | 1


Differential abundance of enzymes in the KEGG metabolic pathway
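A simple way to rank features such as KEGG enzymes between two groups is a log2 fold change of mean normalised abundance, with a pseudocount to avoid taking the log of zero. The sketch below assumes normalised per-sample abundances and a pseudocount of 1; a fold change alone is not a significance test, so a statistical test with multiple-testing correction would normally accompany it.

```python
import math


def log2_fold_changes(group_a, group_b, pseudocount=1.0):
    """log2(mean_b + pc) - log2(mean_a + pc) for every feature seen in either group.

    group_a, group_b: lists of per-sample dicts {feature: normalised abundance};
    a feature missing from a sample counts as 0.
    """
    features = set().union(*group_a, *group_b)

    def mean(group, feature):
        return sum(sample.get(feature, 0.0) for sample in group) / len(group)

    return {
        f: math.log2(mean(group_b, f) + pseudocount) - math.log2(mean(group_a, f) + pseudocount)
        for f in features
    }


# Hypothetical enzyme abundances in conventional (a) and organic (b) samples
conventional = [{"EC1": 1.0}, {"EC1": 1.0}]
organic = [{"EC1": 7.0, "EC2": 3.0}]
print(log2_fold_changes(conventional, organic))
```

Positive values indicate features more abundant in the second group; the pseudocount damps extreme ratios for features that are rare in both groups.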


Thank you