20120907 microbiome-intro

Code sharing for microbiomics
Leo, Wageningen 7.9.2012

Challenges with computer code:
- Analyses not standardized -> confusion & non-optimal choices
- Poor documentation -> poor reproducibility & waste of time
- Reinventing the wheel -> waste of time & resources

Solution:
- Harmonized software libraries (e.g. R packages)
- Easier to share tools (GitHub)

-> more reliable & reproducible
-> more standardized
-> avoid repetitive coding
-> added value for publications
-> distributed version control; all changes automatically tracked
-> facilitates Helsinki-Wageningen collaboration

Wiki: various example analyses already implemented
- retrieve data from MySQL (H/M/PITChip)
- preprocessing (profiling & HITChip Atlas)
- analysis routines (diversities, tables, Wilcoxon tests etc.)
- visualization
-> improving through time
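The Wilcoxon tests listed among the analysis routines compare one phylotype's signal between two groups of samples. Below is a minimal sketch of the two-sided rank-sum (Mann-Whitney U) test with the normal approximation and a tie-corrected variance, in plain Python. The function names and the list-of-values input are illustrative assumptions, not the wiki's or the microbiome package's actual code; for small groups an exact test (as R's `wilcox.test` offers) is preferable.

```python
import math

def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def wilcoxon_rank_sum(x, y):
    """Two-sided rank-sum test (hypothetical helper); returns (U for x, p-value)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Tie correction: subtract sum(t^3 - t) / (n(n-1)) over tie groups of size t.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values()) / (n * (n - 1))
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term)
    if var_u == 0:
        return u, 1.0  # all values tied: no evidence of a difference
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return u, p
```

For example, `wilcoxon_rank_sum([1, 2, 3], [4, 5, 6])` gives U = 0 with p close to 0.05. When many phylotypes are tested, the p-values still need a multiple-testing adjustment, which this sketch does not do.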

Step-by-step examples with source code and simulated data

Common core microbiota: effect of analysis depth and prevalence

"Blanket analysis": github.com/microbiome

Estimate the frequency of belonging to the core for each phylotype; confidence intervals with bootstrap

[Figure: core size as a function of abundance and prevalence thresholds]

Salonen A, et al. (2012) The adult intestinal core microbiota is determined by analysis depth and health status. Clinical Microbiology and Infection 18:16–20.
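The "blanket analysis" above can be sketched as follows: a phylotype belongs to the core when its abundance exceeds a detection threshold in at least a given fraction (prevalence) of samples; the core size is tabulated over a grid of both thresholds, and resampling the samples with replacement gives each phylotype's frequency of core membership plus a percentile interval for core size. This is a minimal sketch under assumptions (a samples-by-phylotypes list of lists, "present" meaning strictly above the threshold, hypothetical function names), not the microbiome package's implementation.

```python
import random

def core_members(samples, detection, prevalence):
    """Indices of phylotypes whose abundance exceeds `detection`
    in at least a fraction `prevalence` of the samples (rows)."""
    n_samples = len(samples)
    core = set()
    for j in range(len(samples[0])):
        present = sum(1 for s in samples if s[j] > detection)
        if present / n_samples >= prevalence:
            core.add(j)
    return core

def core_size_grid(samples, detections, prevalences):
    """Core size for every (detection, prevalence) combination."""
    return {(d, p): len(core_members(samples, d, p))
            for d in detections for p in prevalences}

def bootstrap_core_frequency(samples, detection, prevalence, n_boot=1000, seed=1):
    """Fraction of bootstrap resamples in which each phylotype is core,
    and a 95% percentile interval for the core size."""
    rng = random.Random(seed)
    hits = [0] * len(samples[0])
    sizes = []
    for _ in range(n_boot):
        # Resample whole samples (rows) with replacement.
        resample = [rng.choice(samples) for _ in samples]
        core = core_members(resample, detection, prevalence)
        sizes.append(len(core))
        for j in core:
            hits[j] += 1
    sizes.sort()
    lo = sizes[int(0.025 * (n_boot - 1))]
    hi = sizes[int(0.975 * (n_boot - 1))]
    return [h / n_boot for h in hits], (lo, hi)
```

With `data = [[10, 0, 5], [12, 0, 0], [8, 1, 6]]`, `core_members(data, 2, 0.5)` returns `{0, 2}`: phylotype 0 exceeds the threshold in every sample and phylotype 2 in two of three.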

Compatible with HITChip Atlas of Human Gut Microbiota (>3200 samples)

- 45 studies
- Standardized platform
- >1000 phylotypes
- >3000 samples

-> Compare your own data to HITChip data collections?

Differences from the old profiling script?
-> Separate preprocessing from analysis
-> Support modularity
-> Removed outdated options & outputs from the profiling script

1. Preprocessing: minimal output from the profiling script
- preprocessed data matrices (oligo/L1/L2/species/absolute scale) with NMF/RPA/SUM
- preprocessing log (parameter values etc.)
- quality control plots (heatmap)

2. Analysis & visualization routines
- based on the profiling script output & run afterwards -> modular
- used when needed, not run by default
-> keeps things simple & saves disk space

Summary: code development & sharing through GitHub
In-house sharing infrastructure for code
-> distributed package maintenance
-> avoid bugs; facilitate transparency & reproducibility
-> additional visibility & citations?

Avoid extra work and focus on the essentials
-> check for ready-made examples on the wiki!
-> ask for help
-> let's add examples to the wiki!

Manage and share your own code?
-> GitHub and the microbiome R package
-> Version control

microbiome.github.com

To discuss
Do you have R code which could be useful for others?
-> let's polish, document & add it to the package!

Which tools to include?
- diversity/richness/evenness calculations
- PCA, hierarchical clustering, RDA etc.
- Wilcoxon tests
- association (Spearman) tables: phylotypes vs. phenotypes
- relative contributions from background variables

-> ideally, only standard analyses should be standardized; for rare analyses just use basic R & other packages
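An association table of phylotypes vs. phenotypes, as listed above, is essentially a grid of Spearman correlations: each correlation is the Pearson correlation of the two variables' ranks. A minimal sketch in plain Python follows, assuming phylotypes and phenotypes are given as name-to-values dictionaries over the same sample order; the function names are hypothetical. It reports correlations only; p-values and multiple-testing adjustment are left out.

```python
import math

def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson correlation; NaN if either vector is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0 or sb == 0:
        return float("nan")
    return cov / (sa * sb)

def spearman(a, b):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))

def association_table(phylotypes, phenotypes):
    """Spearman correlation for every (phylotype, phenotype) pair."""
    return {(p, q): spearman(xs, ys)
            for p, xs in phylotypes.items()
            for q, ys in phenotypes.items()}
```

Because only ranks enter, any monotone relationship gives a correlation of ±1, which makes the table robust to the skewed distributions typical of abundance data.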

HITChip preprocessing steps
- Spatial correction
- Between-array normalization
- Background correction
- Oligo summarization

1. Spatial correction
2. Between-array normalization: minmax vs. quantiles?
3. Background correction: skip!
4. Oligo summarization: NMF, RPA, SUM, AVE
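The two between-array normalization options can be contrasted with a short sketch. Minmax is a linear rescaling of each array; as an assumption here, each array's minimum and maximum are mapped to the average minimum and maximum across arrays (the pipeline's actual targets, and whether it works on log-scale signals, are not stated in the slides). Quantile normalization replaces each array's values with a shared reference distribution, which is why it only suits samples with "similar" microbiota: it erases genuine differences in signal distribution. Function names are hypothetical.

```python
def minmax_normalize(arrays):
    """Linearly rescale each array so its min and max map to the
    average min and average max over all arrays (assumed targets)."""
    mins = [min(a) for a in arrays]
    maxs = [max(a) for a in arrays]
    target_min = sum(mins) / len(mins)
    target_max = sum(maxs) / len(maxs)
    out = []
    for a, lo, hi in zip(arrays, mins, maxs):
        if hi == lo:
            raise ValueError("constant array cannot be min-max scaled")
        scale = (target_max - target_min) / (hi - lo)
        out.append([target_min + (v - lo) * scale for v in a])
    return out

def quantile_normalize(arrays):
    """Give every array the same value distribution: the mean of the
    sorted arrays, assigned back by within-array rank (ties by order)."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [sum(s[i] for s in sorted_arrays) / len(arrays) for i in range(n)]
    out = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])
        normed = [0.0] * n
        for rank, idx in enumerate(order):
            normed[idx] = reference[rank]
        out.append(normed)
    return out
```

After quantile normalization, `[[3, 1, 2], [10, 30, 20]]` becomes `[[16.5, 5.5, 11.0], [5.5, 16.5, 11.0]]`: both arrays keep their internal ordering but share identical values.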

Preprocessing: recommendations
* Normalization:
  - minmax: use by default
  - quantile: use if samples have 'similar' microbiota

* Background correction -> ignore

* Oligo summarization
  -> NMF: for L0/L1/L2 levels
  -> RPA: if species level is also included
  -> (SUM: for comparison)
  -> AVE: deprecated

=> These defaults are readily implemented in the pipeline
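Oligo summarization collapses probe-level signals into one value per phylotype. NMF and RPA are model-based and are not reproduced here; the sketch below shows only the simple SUM and AVE baselines, assuming a hypothetical mapping from each oligo to the phylotypes it targets and signals on whatever scale the pipeline uses at that step (an assumption; the slides do not say). Both baselines treat every oligo as equally reliable.

```python
def summarize(oligo_signal, oligo_to_phylotype, method="sum"):
    """Collapse one sample's oligo-level signals to phylotype level.

    oligo_signal: dict oligo_id -> signal (hypothetical input format)
    oligo_to_phylotype: dict oligo_id -> list of phylotypes it targets
    method: "sum" (SUM) or "ave" (AVE)
    """
    groups = {}
    for oligo, value in oligo_signal.items():
        # An oligo may target several phylotypes and then counts for each.
        for phylotype in oligo_to_phylotype.get(oligo, []):
            groups.setdefault(phylotype, []).append(value)
    if method == "sum":
        return {p: sum(v) for p, v in groups.items()}
    if method == "ave":
        return {p: sum(v) / len(v) for p, v in groups.items()}
    raise ValueError(f"unknown method: {method}")
```

With `{"o1": 2.0, "o2": 4.0, "o3": 6.0}` and `o2` shared by phylotypes A and B, SUM gives `{"A": 6.0, "B": 10.0}` and AVE gives `{"A": 3.0, "B": 5.0}`.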

Diversity analysis
Richness, evenness, diversity

Shannon vs. Inverse Simpson?

Detection threshold?

Richness with various indices and thresholds
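The Shannon vs. Inverse Simpson question comes down to how the two indices weight dominant taxa. With relative abundances $p_i$ of $S$ taxa, the standard definitions are below; $e^{H}$ and $1/\sum p_i^2$ are the Hill numbers of order 1 and 2, so both equal $S$ for a perfectly even community, but the inverse Simpson index is pulled down more strongly by a dominant taxon. The two-taxon numbers are worked by hand for illustration.

```latex
p_i = \frac{x_i}{\sum_{j=1}^{S} x_j}, \qquad
H = -\sum_{i=1}^{S} p_i \ln p_i, \qquad
D^{-1} = \frac{1}{\sum_{i=1}^{S} p_i^{2}}, \qquad
J = \frac{H}{\ln S}

% Even community, p = (0.5, 0.5):
%   H = \ln 2 \approx 0.693,\quad e^{H} = 2,\quad D^{-1} = 1/(0.25 + 0.25) = 2
% Dominated community, p = (0.9, 0.1):
%   H = -(0.9 \ln 0.9 + 0.1 \ln 0.1) \approx 0.325,\quad e^{H} \approx 1.38
%   D^{-1} = 1/(0.81 + 0.01) \approx 1.22
```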

Recommendation:
- oligo level
- Shannon diversity
- richness as species count with 80% quantile detection threshold
- evenness with Pielou's index
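The recommendations above translate into a few small functions. This is a sketch under stated assumptions: signals are non-negative oligo-level values, the 80% quantile detection threshold is computed over all samples pooled (the slides do not specify this; a per-sample quantile would make richness nearly constant), and Pielou's S is taken as the number of oligos with positive signal. Function names are hypothetical.

```python
import math

def quantile(values, q):
    """Quantile with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def shannon(abundances):
    """Shannon diversity H = -sum p_i ln p_i (natural log)."""
    total = sum(abundances)
    ps = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in ps)

def richness(abundances, threshold):
    """Number of features above the detection threshold."""
    return sum(1 for a in abundances if a > threshold)

def pielou(abundances):
    """Pielou's evenness J = H / ln(S), S = features with positive signal."""
    s = sum(1 for a in abundances if a > 0)
    return shannon(abundances) / math.log(s) if s > 1 else float("nan")

def diversity_table(samples, q=0.8):
    """Per-sample Shannon, richness and evenness; the detection threshold
    is the q-quantile of all signals pooled (an assumption)."""
    threshold = quantile([v for s in samples for v in s], q)
    return [{"shannon": shannon(s),
             "richness": richness(s, threshold),
             "evenness": pielou(s)} for s in samples]
```

Pooling the threshold across samples keeps richness comparable between samples: every sample is judged against the same detection level.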

Further analysis tools

microbiome.github.com

