20120907 microbiome-intro
Introduction to the microbiome R package
Code sharing for microbiomics
Leo, Wageningen, 7.9.2012
Challenges with computer code:
- Analyses not standardized -> confusion & non-optimal choices
- Poor documentation -> poor reproducibility & waste of time
- Reinventing the wheel -> waste of time & resources
Solution:
- Harmonized software libraries (e.g. R packages)
- Easier to share tools (GitHub)
-> more reliable & reproducible
-> more standardized
-> avoid repetitive coding
-> added value for publications
-> distributed version control; all changes automatically tracked
-> facilitates Helsinki-Wageningen collaboration
Wiki: various example analyses already implemented
- retrieve data from MySQL (H/M/PITChip)
- preprocessing (profiling & HITChip Atlas)
- analysis routines (diversities, tables, Wilcoxon tests etc.)
- visualization
-> improving through time
Step-by-step examples with source code and simulated data
Common core microbiota: effect of analysis depth and prevalence
"Blanket analysis"
github.com/microbiome
Estimate the frequency of belonging to the core for each phylotype; confidence intervals with bootstrap
Figure axes: core size, abundance, prevalence
Salonen A, et al. (2012) The adult intestinal core microbiota is determined by analysis depth and health status, Clinical Microbiology and Infection 18:16–20.
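The core estimation described on this slide can be illustrated with a minimal Python sketch. It is not the microbiome R package API: the function names, the toy data, and the threshold values are all hypothetical. It assumes a phylotype belongs to the core when its signal exceeds a detection threshold (analysis depth) in at least a given fraction of samples (prevalence), and that the bootstrap resamples the samples with replacement.

```python
import random

def core_members(abundance, detection, prevalence):
    """Return phylotypes whose signal exceeds `detection` in at least
    a `prevalence` fraction of samples.

    abundance: dict mapping phylotype name -> list of per-sample values.
    """
    core = set()
    for taxon, values in abundance.items():
        detected = sum(1 for v in values if v > detection)
        if detected / len(values) >= prevalence:
            core.add(taxon)
    return core

def bootstrap_core_frequency(abundance, detection, prevalence,
                             n_boot=200, seed=0):
    """Resample samples with replacement; return the fraction of
    bootstrap rounds in which each phylotype is in the core, plus a
    95% percentile interval for the core size."""
    rng = random.Random(seed)
    taxa = list(abundance)
    n_samples = len(abundance[taxa[0]])
    counts = {t: 0 for t in taxa}
    sizes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n_samples) for _ in range(n_samples)]
        resampled = {t: [abundance[t][i] for i in idx] for t in taxa}
        core = core_members(resampled, detection, prevalence)
        for t in core:
            counts[t] += 1
        sizes.append(len(core))
    freq = {t: counts[t] / n_boot for t in taxa}
    sizes.sort()
    lo = sizes[int(0.025 * (n_boot - 1))]
    hi = sizes[int(0.975 * (n_boot - 1))]
    return freq, (lo, hi)

# Simulated toy data (hypothetical values): 3 phylotypes, 6 samples.
toy = {
    "taxon_A": [5.0, 4.8, 5.2, 4.9, 5.1, 5.3],  # detected in every sample
    "taxon_B": [3.1, 1.0, 3.4, 0.9, 3.2, 1.1],  # detected in half of them
    "taxon_C": [0.5, 0.4, 0.6, 0.5, 0.3, 0.4],  # never detected
}
freq, core_size_interval = bootstrap_core_frequency(toy, detection=2.0,
                                                    prevalence=0.5)
print(freq, core_size_interval)
```

Raising either threshold shrinks the core, which is the dependence on analysis depth and prevalence that the slide refers to.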
Compatible with HITChip Atlas of Human Gut Microbiota (>3200 samples)
45 studies - Standardized Platform
>1000 phylotypes
>3000 samples
-> Compare your own data to HITChip data collections?
Differences to the old profiling script?
-> Separate preprocessing from analysis
-> Support modularity
-> removed outdated options & outputs from profiling script
1. Preprocessing: minimal output from profiling script:
- preprocessed data matrices (oligo/L1/L2/species/absolute scale) with NMF/RPA/SUM
- preprocessing log (parameter values etc.)
- quality control plots (heatmap)
2. Analysis & visualization routines
- based on profiling script output & done afterwards -> modular
- used when needed, not run by default
-> keeping it simple & saving disk space
Summary: code development & sharing through GitHub
In-house sharing infrastructure for code
-> distributed package maintenance
-> avoid bugs; facilitate transparency & reproducibility
-> additional visibility & citations?
Avoid extra work and focus on the essential
-> check for ready-made examples from the wiki!
-> ask for help
-> let's add examples to the wiki!
Manage and share your own code?
-> GitHub and microbiome R package
-> Version control
microbiome.github.com
To discuss
Do you have R code which could be useful for others?
-> let's polish, document & add it to the package!
Which tools to include?
- diversity/richness/evenness calculations
- PCA, hierarchical clusterings, RDA etc.
- Wilcoxon tests
- Association (Spearman) tables phylotypes vs. phenotypes
- Relative contributions from background variables
-> ideally, only standard things should be standardized; for rare analyses just use basic R & other packages
HITChip preprocessing steps
- Spatial correction
- Between-array normalization
- Background correction
- Oligo summarization
1. Spatial correction
2. Between-array normalization: minmax vs. quantiles?
3. Background correction: skip!
4. Oligo summarization
- NMF
- RPA
- SUM
- AVE
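To make the oligo summarization step concrete, here is a minimal Python sketch of the two simplest options. It reads SUM as the per-phylotype sum of oligo signals and AVE as their per-phylotype average, which is an assumption based on the names; the model-based NMF and RPA methods are not sketched. The function name, oligo identifiers, and signal values are hypothetical.

```python
def summarize_oligos(oligo_signal, oligo_to_phylotype, method="sum"):
    """Collapse oligo-level signals into phylotype-level signals.

    oligo_signal: dict oligo id -> list of per-sample signals
    oligo_to_phylotype: dict oligo id -> phylotype name
    method: "sum" (add up oligo signals) or "ave" (average them)
    """
    # Group the per-sample signal rows by target phylotype.
    groups = {}
    for oligo, phylotype in oligo_to_phylotype.items():
        groups.setdefault(phylotype, []).append(oligo_signal[oligo])

    summary = {}
    for phylotype, rows in groups.items():
        # zip(*rows) iterates over samples; each column holds the
        # signals of all oligos of this phylotype in one sample.
        totals = [sum(column) for column in zip(*rows)]
        if method == "sum":
            summary[phylotype] = totals
        elif method == "ave":
            summary[phylotype] = [t / len(rows) for t in totals]
        else:
            raise ValueError("method must be 'sum' or 'ave'")
    return summary

# Hypothetical toy input: three oligos, two samples.
signals = {"o1": [1.0, 2.0], "o2": [3.0, 4.0], "o3": [5.0, 6.0]}
mapping = {"o1": "P1", "o2": "P1", "o3": "P2"}
print(summarize_oligos(signals, mapping, "sum"))
print(summarize_oligos(signals, mapping, "ave"))
```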
Preprocessing: recommendations
* Normalization:
- minmax: use by default
- quantile: use if samples have 'similar' microbiota
* Background correction
-> ignore
* Oligo summarization
-> NMF: for L0/L1/L2 levels
-> RPA: if species level is also included
-> (SUM: for comparison)
-> AVE: deprecated
=> The defaults are already implemented in the pipeline
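The difference between the two normalization options can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the pipeline's code: "minmax" is taken to mean rescaling each array to a common minimum and maximum, and "quantile" to mean standard quantile normalization (each array receives the same value distribution, the rank-wise mean across arrays). Function names and data are hypothetical.

```python
def minmax_normalize(arrays, new_min=0.0, new_max=1.0):
    """Rescale each array linearly so that all arrays share the same
    minimum and maximum. Assumes no array is constant."""
    out = []
    for a in arrays:
        lo, hi = min(a), max(a)
        out.append([new_min + (v - lo) / (hi - lo) * (new_max - new_min)
                    for v in a])
    return out

def quantile_normalize(arrays):
    """Force all arrays to share one distribution: the value at each
    rank is replaced by the mean of the values at that rank across
    arrays. Ties are broken by position (a simplification)."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [sum(s[i] for s in sorted_arrays) / len(arrays)
                  for i in range(n)]
    out = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])
        normalized = [0.0] * n
        for rank, i in enumerate(order):
            normalized[i] = rank_means[rank]
        out.append(normalized)
    return out

# Hypothetical toy data: two arrays (samples), three oligos each.
arrays = [[1.0, 3.0, 2.0], [10.0, 30.0, 50.0]]
print(minmax_normalize(arrays))
print(quantile_normalize(arrays))
```

Min-max only aligns the range and keeps each array's shape, whereas quantile normalization makes the distributions identical, which is why the slide reserves it for samples with 'similar' microbiota.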
Diversity analysis
Richness, evenness, diversity
Shannon vs. Inverse Simpson?
Detection threshold?
Richness with various indices and thresholds
Recommendation:
- oligo level
- Shannon diversity
- richness as species count with 80% quantile detection threshold
- evenness with Pielou's index
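The indices named above follow standard definitions, sketched here in Python rather than with the package's own functions. Shannon diversity is H = -sum(p_i ln p_i), inverse Simpson is 1 / sum(p_i^2), and Pielou's evenness is H / ln(S), where S is the number of taxa with non-zero signal. Reading the "80% quantile detection threshold" as the 0.8 quantile of the signal values is an assumption, and the function names and sample values are hypothetical.

```python
import math

def relative(abundances):
    """Relative abundances of the taxa with non-zero signal."""
    total = sum(abundances)
    return [a / total for a in abundances if a > 0]

def shannon(abundances):
    """Shannon diversity H = -sum(p * ln p)."""
    return -sum(p * math.log(p) for p in relative(abundances))

def inverse_simpson(abundances):
    """Inverse Simpson index 1 / sum(p^2)."""
    return 1.0 / sum(p * p for p in relative(abundances))

def pielou_evenness(abundances):
    """Pielou's evenness H / ln(S); defined as 0 for a single taxon."""
    s = sum(1 for a in abundances if a > 0)
    return shannon(abundances) / math.log(s) if s > 1 else 0.0

def quantile(values, q):
    """Quantile with linear interpolation between order statistics."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lower = int(math.floor(pos))
    upper = min(lower + 1, len(s) - 1)
    return s[lower] + (pos - lower) * (s[upper] - s[lower])

def richness(abundances, threshold):
    """Number of taxa whose signal exceeds the detection threshold."""
    return sum(1 for a in abundances if a > threshold)

# Hypothetical signal values for one sample.
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
threshold = quantile(sample, 0.8)  # assumed reading of "80% quantile"
print(shannon(sample), inverse_simpson(sample),
      pielou_evenness(sample), richness(sample, threshold))
```

Shannon and inverse Simpson weight rare taxa differently: inverse Simpson is dominated by the most abundant taxa, so the two can rank samples differently.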
Further analysis tools
microbiome.github.com