m2ia: a web server for microbiome and metabolome ... · associated with obesity from both...
TRANSCRIPT
M2IA: a Web Server for Microbiome and Metabolome Integrative Analysis
Yan Ni1,#, Gang Yu1, Huan Chen2, Yongqiong Deng3, Philippa M Wells4, ClaireJSteves4,5,Feng Ju6,7, Junfen Fu1,#
1. National Clinical Research Center for Child Health, The Children’s Hospital, Zhejiang
University School of Medicine, Hangzhou 310029, P.R. China.
2. Key laboratory of Microbial technology and Bioinformatics of Zhejiang Province,
NMPA Key laboratory for Testing and Risk Warning of Pharmaceutical Microbiology,
Zhejiang Institute of Microbiology, Hangzhou 310012, P.R. China.
3. Department of Dermatology and STD, the Affiliated Hospital of Southwest Medical
University, Luzhou, Sichuan, P.R. China.
4. The Department of Twin Research, Kings College London, South Wing, St Thomas'
Hospital, Westminster Bridge Road, London SE1 7EH, UK
5. Department of Ageing and Health, St Thomas' Hospital, 9th floor, North Wing,
Westminster Bridge Road, London SE1 7EH, UK
6. School of Engineering, Westlake University, 18 Shilongshan Road, Hangzhou 310024,
P.R. China.
7. Institute of Advanced Technology, Westlake Institute for Advanced Study, 18
Shilongshan Road, Hangzhou 310024, P.R. China.
Correspondence should be addressed to Dr. Yan Ni ([email protected]) and Dr.
Junfen Fu ([email protected]).
.CC-BY-NC-ND 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted October 30, 2019. . https://doi.org/10.1101/678813doi: bioRxiv preprint
Abstract
Background: In the last decade, integrative studies of microbiome and metabolome have
experienced exponential growth in understanding their impact on human health and
diseases. However, analyzing the resulting multi-omics data and their correlations
remains a significant challenge in current studies due to the lack of a comprehensive
computational tool to facilitate data integration and interpretation. In this study, we have
developed a microbiome and metabolome integrative analysis pipeline (M2IA) to meet
the urgent needs for tools that effectively integrate microbiome and metabolome data to
derive biological insights.
Results: M2IA streamlines the integrative data analysis between metabolome and
microbiome, including both univariate analysis and multivariate modeling methods.
KEGG-based functional analysis of microbiome and metabolome are provided to explore
their biological interactions within specific metabolic pathways. The functionality of
M2IA was demonstrated using TwinsUK cohort data with 1116 fecal metabolites and 16s
rRNA microbiome from 786 individuals. As a result, we identified two important
pathways, i.e., the benzoate degradation and phosphotransferase system, were closely
associated with obesity from both metabolome and microbiome.
Conclusions: M2IA is a public available web server (http://m2ia.met-bioinformatics.cn)
with the capability to investigate the complex interplay between microbiome and
metabolome, and help users to develop mechanistic hypothesis for nutritional and
personalized therapies of diseases.
Keywords: M2IA, microbiome and metabolome correlation, data integration, network
analysis, metabolic function
.CC-BY-NC-ND 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted October 30, 2019. . https://doi.org/10.1101/678813doi: bioRxiv preprint
Background
The study of microbiome in human health has experienced exponential growth over the
last decade with the advent of new sequencing technologies for interrogating complex
microbial communities[1]. Meanwhile, metabolomics has been an important tool for
understanding microbial community functions and their links to health and diseases
through the quantitation of dozens to hundreds of small molecules[2]. The gut microbiota
is considered a metabolic ‘organ’ to protect the host against pathogenic microbes,
modulate immunity, and regulate metabolic processes, including short chain fatty acid
production and bile acid biotransformation[3]. Conversely, these metabolites can modulate
gut microbial compositions and functions both directly and indirectly[4]. Thus, the
microbiota-metabolites interactions are important to maintain the host health and well-
being. Integrative data analysis of gut microbiome and metabolome can offer deep
insights on the impact of lifestyle and dietary factors on chronic and acute diseases (e.g.,
autoimmune diseases, inflammatory bowel disease[5], cancers[6], type 2 diabetes and
obesity[7], cardiovascular disorders, and non-alcoholic fatty liver disease[8]), and provide
potential diagnostic and therapeutic targets[9].
In the last decade, metabolomics studies in microbiota-related research have increased in
a wide range of research areas, such as gastroenterology, biochemistry, endocrinology,
microbiology, genetics, to nutrition, food science and pharmacology (Figure S1).
However, both metagenomics/16s rRNA-based high-throughput sequencing technologies
and mass spectrometry/nuclear magnetic resonance-based metabolomics platforms can
produce large and high-dimensional data, posing a major challenge for subsequent data
integration[10, 11]. Current integration analysis methods mainly focus on statistical
correlations between microbiome and metabolome, such as the spearman correlations and
partial least squares discriminant analysis (Table S1). Pedersen et al. recently
summarized a step-by-step computational protocol from their previous study that applied
WGCNA, a dimension reduction method, to measure the correlations among the host
phenotype, gut metagenome, and fasting serum metabolome. However, separate analyses
of -omics data through one or two statistical methods only provide fragmented
information and do not capture the holistic view of disease mechanisms. Moreover,
.CC-BY-NC-ND 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted October 30, 2019. . https://doi.org/10.1101/678813doi: bioRxiv preprint
although much bioinformatics work has been done to process and analyze the individual
omics data, to date, it still lacks a comprehensive strategy or a computational tool to
analyze the correlations between microbiome and metabolome[12]. To rapidly advance
microbiome and metabolome data integration and understand their roles in diverse
diseases, advanced computational methods for multi -omics data integration and
interpretation need to be developed[2].
In this study, we have developed a comprehensive computational tool, a standardized
workflow for microbiome and metabolome integrative analysis (M2IA). M2IA combines
a variety of univariate and multivariate methods for correlation analysis and data
integration. In addition, functional network analysis implemented with KEGG metabolic
pathway database can provide deep insights of their biological correlations, i.e., the
possible participation of identified microbes in a specific metabolic reaction or metabolic
pathway. M2IA accepts the microbiome data from 16S rRNA gene or shotgun
metagenomic sequencing technologies, and metabolomics data from mass spectrometry
or NMR spectroscopy platforms. Other than current bioinformatics tools, M2IA is an
automated data analysis and reporting pipeline that can be applied by users with little
training in bioinformatics. M2IA is a web-based server with user-friendly interface, and
public available to academic users through http://m2ia.met-bioinformatics.cn.
Implementation
M2IA is developed in Java web system, with html, Javascript technology implemented for
interactive interface and JFinal and Mysql databases for data management in the backend.
All the statistical analyses and visualization were implemented in R programming
language. The entire system is deployed on a cloud server with 16 GB of RAM and four
virtual CPUs with 2.6 GHz each. Users can register an academic account to manage their
own research projects. M2IA has been tested with major modern browsers such as Google
Chrome, Safari, Mozilla Firefox and Microsoft Internet.
Result
Workflow
.CC-BY-NC-ND 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted October 30, 2019. . https://doi.org/10.1101/678813doi: bioRxiv preprint
A standard data analysis workflow in M2IA contains six major steps: (1) data upload and
processing, (2) global similarity analysis between microbiome and metabolome data
matrix, (3) pairwise metabolite-microbe correlation analysis, (4) multivariate regression-
based data integration analysis, (5) functional network analysis, and (6) an auto-generated
html report for overview. Figure 1 summarizes the overall design and the flowchart of
M2IA. Each step offers multiple methods helping users to explore complex correlations
between microbiome and metabolome. In this section, each step will be described in
detail.
Data upload and processing
Data upload. The first step is to upload three different types of data: (1) microbiome
datasets: for 16s rRNA gene sequencing data, an OTU abundance table with taxonomic
annotations (.txt format) and a corresponding reference sequence file (.fna format) are
required. Similarly, a unigene table with taxonomic annotations and a table with KEGG
KO function annotations are required for metagenomic shotgun sequencing study. (2)
metabolome dataset: a metabolite abundance table (.csv format) with compound name,
HMDB database ID, and chemical class information, and (3) a metadata table containing
sample IDs and group information. Sample IDs from microbiome and metabolome
datasets may not be exactly the same, but should be paired correctly within the metadata
table. Once uploaded successfully, M2IA system examines the consistency of sample IDs
from two datasets and notifies users whether there exist unmatched samples. After that,
the metadata table is presented and can be modified by users interactively. For example,
instead of uploading a new table, when users want to change grouping information or
remove certain samples, they can modify group IDs or uncheck certain samples directly
on the table for downstream data analysis.
Missing value processing. This step includes data filtering and missing value imputation.
First, unqualified variables can be removed according to the criteria of sample prevalence
and variance across samples. For microbiome data, the minimal count number, the
percentage of samples with non-missing values, and the relative standard deviation (RSD)
are used for screening. As suggested by MicrobiomeAnalyst[10], features with very low
.CC-BY-NC-ND 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted October 30, 2019. . https://doi.org/10.1101/678813doi: bioRxiv preprint
count (e.g., <2) in a few samples (e.g., <20%), or very stable with limited variations
across all the samples (e.g., RSD < 30%), are considered difficult to interpret their
biological significance in the community and can be removed directly. Similarly, our
previous work summarized that those metabolic features with missing values in more
than 80% of samples or RSD values smaller than 30% can be filtered at the beginning.
The sample prevalence is calculated based on all the samples or samples within each
group[13].
Second, both microbiome and metabolome have the characteristics of sparsity seen as the
absence of many taxa or metabolites across samples due to biological and/or technical
reasons[14]. Such missing values may pose general numerical challenges for traditional
statistical analysis, thus M2IA provides a variety of methods for missing value imputation,
aiming to remove zeros or NAs in the data matrix and facilitate subsequent statistical
analysis. There are two different approaches to handle missing values: one is to simply
replace missing values with a certain value (called pseudo count in microbiome),
including half of the minimum or the median value; another is to apply regression models
to impute missing values, such as random forest, k-nearest neighbors, Probabilistic PCA,
Bayesian PCA, singular value decomposition, and the quantile regression imputation of
left-censored data (QRILC)[13]. Users may choose an appropriate method for missing
value imputation.
Data normalization. After missing value processing, users can perform normalization
method in order to make more meaningful comparisons. M2IA provides the total sum
scaling that calculates the relative percentage of features, or the log transformation when
data does not follow normal distribution. To note, scaling or transformation methods are
also provided for specific statistical analysis, such as principal component analysis (PCA)
and PLS-DA analysis.
Global similarity between two datasets
After data processing, the first step is to apply Coinertia analysis (CIA) and Procrustes
analysis (PA) to evaluate the global similarity between metabolome and microbiome
.CC-BY-NC-ND 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted October 30, 2019. . https://doi.org/10.1101/678813doi: bioRxiv preprint
dataset (Figure 1)[15]. CIA can be considered as a PCA model of the joint covariances of
two datasets. Similarly, PA measures the congruence of two-dimensional data
distributions from superimposition and scaling of PCA models of two datasets (Figure
2A). The correlation coefficient R is between 0 and 1, and the closer it is to 1 indicates
the greater similarity between two datasets.
Pairwise metabolite-microbe correlation analysis
Correlation analysis methods. M2IA provides five different types of correlation analysis,
including Pearson, Spearman, SparCC, CCLasso, and Maximal Information Coefficient
(MIC) analysis. Although each method has it pros and cons in different situations, users
can choose an appropriate one. Since Spearman correlation analysis outperforms other
methods due to their overall performances[16], it has been set as a default one in M2IA.
Microbes can be analyzed according to their different taxonomic annotations (i.e.,
phylum, genus, and species) and metabolites at different chemical classifications (e.g.,
amino acids, sugars, free fatty acids etc.). In addition, M2IA allows users to define the
specific criteria of significance between microbe-metabolite pairs for subsequent analysis,
e.g., absolute correlation coefficients > 0.3 and p value <0.05.
Visualization. Three different ways of visualization are provided in order to explore
complex correlations between microbes and metabolites. The first one is to apply circos
plot that can help users to quickly identify those microbes having close correlations with
different classes of metabolites (e.g., amino acids) (Figure 2C). The second one is to
apply a heat map to illustrate their relative positive/negative correlations between each
microbe and metabolite. Meanwhile, hierarchical clustering is used to analyze the
similarities among metabolites or microbes, in which closely correlated
metabolites/microbes are usually clustered together. The third one is to provide a
microbe-metabolite interaction network using cytoscape technique, which can help users
to explore complex relationship between microbes and metabolites (Figure 2D).
Multivariate regression–based data integration
.CC-BY-NC-ND 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted October 30, 2019. . https://doi.org/10.1101/678813doi: bioRxiv preprint
Unsupervised multivariate analysis. M2IA provides canonical correlation analysis
(CCA)[17, 18] and O2PLS[19] in order to evaluate the inherent correlations between two
datasets without considering phenotype information. CCA aims to find two new bases
(canonical variate) in which the correlation between original parameters of two datasets is
maximized. O2PLS is capable of modeling both prediction and systematic variation, and
the joint score plot indicates their relationship between two data matrix.
Metabolites/microbes with the large canonical coefficients from CCA model or loading
values from O2PLS model are considered essential ones for their similarities (Figure 2B).
Supervised multivariate analysis. In comparison, supervised multivariate analysis
methods integrate two data matrix initially and identify differential variables (microbes &
metabolites) that significantly contribute to the discrimination of different groups. Here,
M2IA offers PCA score-based differential analysis, partial least squares discriminant
analysis (PLS-DA), orthogonal partial least squares discriminant analysis (OPLS-DA),
and random forest (RF). PCA is originally an unsupervised data mining method; here we
examine the differences of the first principal component score values from PCA model,
and their correlations with the phenotype information. PLS-DA and OPLS-DA have been
commonly applied in the field of metabolomics for data dimension reduction and feature
selection. OPLS-DA seeks to maximize the explained variance between groups in a
single dimension or the first latent variable (LV), and separate the within group variance
(orthogonal to group difference) into orthogonal LVs (Figure 3A). The variable loadings
and/or coefficient weights from a validated PLS-DA and OPLS-DA model are used to
rank variables with respect to their performance for discriminating between groups
(Figure 3B). Boruta algorithm-based RF classifier can identify important features by
shuffling samples and adding extra randomness to the system (Figure 3C).
Network analysis
WGCNA-based network analysis. Weighted gene co-expression network analysis
(WGCNA) was recently raised to integrate high-throughput ‘-omics’ datasets for
identifying the potential mechanistic links[11]. In M2IA, WGCNA algorithm is used to
collapse co-abundant metabolites into different clusters, and thus metabolites within a
.CC-BY-NC-ND 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted October 30, 2019. . https://doi.org/10.1101/678813doi: bioRxiv preprint
cluster are highly correlated. One attractive feature is that both identified and unknown
metabolite features can be considered. KEGG microbiome functional modules are
annotated using Tax4fun2[20, 21] for 16s RNA sequencing data or HUMAnN pipeline[22]
for shotgun sequencing data, respectively. Finally, the correlations between metabolite
clusters and microbial functional modules with phenotype information are examined
using Spearman correlation analysis and further visualized using heat map and network.
Metabolic function analysis. The final step aims to analyze or predict the biological
function profiles using metabolome and microbiome data, respectively. The metabolic
pathway activity is analyzed using PAPi package in R software [23]. In parallel, function
profiles of microbial communities are predicted using Tax4Fun tool for 16s rRNA data
and HUMAnN for shotgun sequencing data. Linear discriminant analysis is further used
to identify differential KO identifiers from microbiome and metabolome (Figure 4A-C).
Their relationships are further investigated using a microbe-metabolite interaction
network by connecting a specific metabolite, its reactions and related enzymes, and the
genomes of organism (both host and microbial)(Figure 4D-E).
Automated html report
Once users upload the data and optimize parameters of data analysis, M2IA provides a
simplified user experience through a one-click button to submit a job. It may take a few
minutes for M2IA to perform data analysis and generate interpretable results. The exact
processing time depends on the sample size, number of variables, and number of pair-
wise group comparisons. Once the analysis is completed, M2IA will summarize key
results and produce an html report for users. In addition, a zip-compressed package
containing all the supporting data files (tables and figures) can be downloaded for future
use.
Application example
To validate the performance of M2IA, we analyzed the fecal metabolome and
microbiome of 786 individuals from TwinsUK cohort[24]. 16S rRNA was extracted from
fecal samples, PCR amplified, barcoded per sample and sequenced with the Illumina
.CC-BY-NC-ND 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted October 30, 2019. . https://doi.org/10.1101/678813doi: bioRxiv preprint
MiSeq platform, as previously described[25]. A total of 1,116 different metabolites were
measured using UPLC-MS/MS platforms. First, the Procrustes Analysis revealed
moderate similarities between metabolome and microbiome with statistical significance
(Figure 2A, r=0.31, p<0.05). Since most of individuals at the metabolic level surrounded
those at the microbiome level, the variations of metabolic profiles among individuals
were larger in metabolome. The O2PLS analysis indicated microbial and metabolic data
matrixes were highly correlated with correlation coefficient 0.76 (p<0.05, Figure 2B),
and those variables that contributed to their similarities could be identified according to
their weights. Then, Spearman correlation analysis was performed to explore specific
microbe-metabolite correlations. The circos plot showed seven different classes of
metabolites (i.e., nucleotides, carbohydrates, peptides, cofactors and vitamins, amino
acids, xenobiotics, and lipids) were correlated with microbes mainly belonging to
Firmicutes, Actinobacteria, Bacteroidetes, and Protebacteria (Figure 2C). Among which,
four microbes at genus level belonging to the Firmicutes had the most significant
correlations with metabolites, and their specific correlations were further visualized in a
network (Figure 2D).
According to BMI, participants were divided into obese (OB, BMI >28) and healthy
control group (HC, BMI <24). The multivariate OPLS-DA model based on the integrated
data matrix of metabolome and microbiome indicated the obvious difference between OB
and HC group (Figure 3A). Differential metabolites and microbes contributing to their
separations were identified using both VIP values (VIP>1) and correlation coefficients
(p<0.05), as indicated by the V-plot (Figure 3B). Alternative multivariate models e.g., RF
model, could be further used to validate the relative contributions of microbes and
metabolites to the discrimination between two groups (Figure 3C). For example, arginine
was identified as a significant marker from both OPLS-DA and RF model. Then, we
performed functional network analysis to further interpret the biological significance of
their correlations. A total of 39 and 16 significant biological functions (KOs) were
identified in microbiome and metabolome between OB and HC group, respectively (p <
0.05, Figure 4A,B). Two important pathways (i.e., the benzoate degradation and
phosphotransferase system) were overlapped (Figure 4C), among which, the benzoate
.CC-BY-NC-ND 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted October 30, 2019. . https://doi.org/10.1101/678813doi: bioRxiv preprint
degradation was the most significant metabolic pathway in both microbial and metabolic
function analysis. Then, each differential metabolite and relevant microbes that may
participate in its metabolic reactions were further identified through searching the KEGG
database. For example, 3,4-dihydroxybezoate (also called protocatechuic acid) in the
benzoate degradation was a major metabolite of antioxidant polyphenols formed by gut
microbiota, and might have beneficial effects against inflammation and insulin resistance
in obesity[26]. Our results indicated that 3,4-dihydroxybezoate was significantly reduced
in obesity, and eight different gut microbes might participate in its metabolism (Figure
4D). Similarly, a total of 28 different microbes identified in this study were involved in
the glycerol metabolism of phosphotransferase system (Figure 4E).
Discussion
We have implemented M2IA pipeline as a user-friendly web server that can provide
microbiome and metabolome data integration analysis to understand the important roles
of microbial metabolism in diverse disease contexts. This is a bioinformatics workflow
that integrates a wide range of univariate and multivariate methods, including PA/CIA-
based data matrix similarity analysis, univariate-based correlation analysis, multivariate
regression-based analysis, and knowledge-based function analysis. The advantage of
applying these methods simultaneously is to understand the inherent characteristics of the
high-dimensional omics data in different ways, and obtain reliable results from multiple
analyses. M2IA also implements KEGG database in order to link orthologous gene
groups to metabolic reactions and annotated compounds. So that researchers will have
better understanding of their origins of metabolites, from host or microbial, and what
microbes participate in specific metabolic reactions. To make it more convenient, M2IA
automatically produces a comprehensive report that summarizes and interprets the key
results.
Users can apply M2IA to their own microbiome and metabolome data. In terms of
analytical platforms, M2IA accepts the microbiome data from 16S rRNA gene sequencing
or shotgun metagenomic sequencing technologies, and metabolomics data from mass
spectrometry or NMR analytical platforms. For sample types, many human studies on gut
.CC-BY-NC-ND 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted October 30, 2019. . https://doi.org/10.1101/678813doi: bioRxiv preprint
microbiota collect fecal samples, due to its non-invasive characteristics. The fecal
metabolome provides a functional readout of the gut microbiome, and the integrative
analysis can provide better understanding of correlations between gut microbiome and
metabolome. We recommend applying the same fecal samples of human subjects or
fecal/colon contents of animals for analysis. However, samples types can be from any site,
i.e., saliva, buccal mucosa, and colon tissue samples. Simultaneously, metabolomics
studies can analyze feces, plasma/serum, urine, saliva, exhaled breaths, cerebrospinal
fluid, and tissues of target organs. Thus, sample types may differ, but they need to require
originating from a same subject for both microbiome and metabolome analysis.
Finally, the limitations and future directions of this study disserve to be mentioned. M2IA
accepts annotated microbioa and metabolites as input data matrix. However, the raw data
preprocessing steps are not included in this pipeline, e.g., quality control and microbial
annotation for microbiome data, and peak detection and metabolite annotation for
metabolomics data. Currently, many sophisticated software tools or pipelines have been
developed for original data preprocessing, such as Qimme2[27, 28] for 16s RNA
sequencing, and XCMS for MS-based metabolic profiling[29]. M2IA focuses on providing
a standardized pipeline and a comprehensive workflow for the integrative analysis of
metabolome and microbiome data, not only exploring the statistically significant
correlations but also investigating the biological significance of their interaction network.
Moreover, KEGG database based functional network analysis can help to explain their
biological correlations and develop mechanistic hypothesis potentially applicable to the
development of nutritional and personalized therapies. In the future, more advanced
integration methods will be introduced in M2IA and more independent knowledge
databases (e.g., enzyme database and drug metabolism database) will be integrated within
M2IA for deeper biological interpretation.
Conclusions
In summary, M2IA is an effective and efficient computational tool for experimental
biologists to comprehensively analyze and interpret the important interactions between
microbiome and metabolome in the big data era.
.CC-BY-NC-ND 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted October 30, 2019. . https://doi.org/10.1101/678813doi: bioRxiv preprint
Authors’ contributions
YN contributed to develop the methodology, algorithms, and write the manuscript. YG
contributed to the software development and implementation. YD provided valuable
testing datasets during method development of our pipeline. PW and CS contributed to
TwinsUK data preparation and processing for pipeline validation. HC and FJ provided
valuable suggestions on software development and manuscript modification. FF
supported this project and contributed to the review and modification of the manuscript.
Availability of data and materials
Data used in this study can be accessed at http://m2ia.met-bioinformatics.cn.
Funding
This work was fundedby National Natural Science Foundation of China (81900510), the
National Key Research and Development Program of China (No. 2016YFC1305301),
and the startup funding from the Children’s Hospital, Zhejiang University School of
Medicine.
Acknowledgements
Not applicable
Ethics approval and consent to participate
Not applicable
Consent for publication
Not applicable
Competing interests
The authors declare that they have no competing interests.
Reference
.CC-BY-NC-ND 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted October 30, 2019. . https://doi.org/10.1101/678813doi: bioRxiv preprint
1. Cho I, BlaserMJ. APPLICATIONSOFNEXT-GENERATION SEQUENCINGThehuman microbiome: at the interface of health and disease. Nat Rev Genet2012,13(4):260-270.
2. ShafferM,ArmstrongAJS,PhelanVV,ReisdorphN,LozuponeCA.Microbiomeandmetabolome data integration provides insight into health and disease.TranslRes2017,189:51-64.
3. Wang WL, Xu SY, Ren ZG, Tao L, Jiang JW, Zheng SS. Application ofmetagenomics in the human gut microbiome. World J Gastroentero 2015,21(3):803-814.
4. Wahlstrom A, Sayin SI, Marschall HU, Backhed F. Intestinal CrosstalkbetweenBileAcidsandMicrobiotaandIts ImpactonHostMetabolism.CellMetab2016,24(1):41-50.
5. FranzosaEA,Sirota-MadiA,Avila-PachecoJ,FornelosN,HaiserH,ReinkerS,Vatanen T, Hall AB, Mallick H, Mclver LJ, Sauk JS,Wilson RG, Stevens BW,ScottJM,PierceK,DeikAA,BullockK,ImhannF,PorterJA,ZhernakovaA,FuJY,WeersmaRK,WijmengaC,ClishCB,VlamakisH,HuttenhowerC,XavierRJ.Gut microbiome structure and metabolic activity in inflammatory boweldisease.NatMicrobiol2019,4(2):293-305.
6. Louis P, Hold GL, Flint HJ. The gut microbiota, bacterial metabolites andcolorectalcancer.NatRevMicrobiol2014,12(10):661-672.
7. GuYY,WangXK,LiJH,ZhangYF,ZhongHZ,LiuRX,ZhangDY,FengQ,XieXY,HongJ,RenHH,LiuW,MaJ,SuQ,ZhangHM,YangJL,WangXL,ZhaoXJ,GuWQ,BiYF,PengYD,XuXQ,XiaHH,LiF,XuX,YangHM,XuGW,MadsenL,KristiansenK,NingG,WangWQ.Analysesofgutmicrobiotaandplasmabileacids enable stratification of patients for antidiabetic treatment. NatCommun2017,8.
8. CaussyC,HsuC,LoMT,LiuA,BettencourtR,AjmeraVH,BassirianS,HookerJ, Sy E, Richards L, Schork N, Schnabl B, Brenner DA, Sirlin CB, Chen CH,Loomba R, Consortium GNT. Link between gut-microbiome derivedmetabolite and shared gene-effects with hepatic steatosis and fibrosis inNAFLD.Hepatology2018,68(3):918-932.
9. Vernocchi P, Del Chierico F, Putignani L. Gut Microbiota Profiling:Metabolomics Based Approach to Unravel Compounds Affecting HumanHealth.FrontMicrobiol2016,7:1144.
10. DhariwalA,ChongJ,HabibS,KingIL,AgellonLB,XiaJ.MicrobiomeAnalyst:aweb-based tool for comprehensive statistical, visual and meta-analysis ofmicrobiomedata.NucleicAcidsRes2017,45(W1):W180-W188.
11. Pedersen HK, Forslund SK, Gudmundsdottir V, Petersen AO, Hildebrand F,HyotylainenT,NielsenT,HansenT,BorkP,EhrlichSD,BrunakS,OresicM,Pedersen O, Nielsen HB. A computational framework to integrate high-throughput '-omics' datasets for the identification of potential mechanisticlinks.NatProtoc2018,13(12):2781-2800.
12. Chong J, Xia JG. Computational Approaches for Integrative Analysis of theMetabolomeandMicrobiome.Metabolites2017,7(4).
.CC-BY-NC-ND 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted October 30, 2019. . https://doi.org/10.1101/678813doi: bioRxiv preprint
13. WeiR,Wang J, SuM, JiaE, ChenS, ChenT,NiY.MissingValue ImputationApproach for Mass Spectrometry-based Metabolomics Data. Scientificreports2018,8(1):663.
14. TsilimigrasMC, Fodor AA. Compositional data analysis of themicrobiome:fundamentals,tools,andchallenges.AnnEpidemiol2016,26(5):330-335.
15. McHardy IH, Goudarzi M, Tong M, Ruegger PM, Schwager E, Weger JR,Graeber TG, Sonnenburg JL, Horvath S, Huttenhower C, McGovern DP,FornaceAJ, Jr.,Borneman J,Braun J. Integrativeanalysisof themicrobiomeandmetabolomeof the human intestinalmucosal surface reveals exquisiteinter-relationships.Microbiome2013,1(1):17.
16. YouYJ,LiangDD,WeiRM,LiMC,LiYT,WangJY,WangXY,ZhengXJ, JiaW,Chen TL. Evaluation of metabolite-microbe correlation detection methods.AnalBiochem2019,567:106-111.
17. Smolinska A, Tedjo DI, Blanchet L, Bodelier A, Pierik MJ, Masclee AAM,Dallinga J, Savelkoul PHM, Jonkers D, Penders J, van Schooten FJ. VolatilemetabolitesinbreathstronglycorrelatewithgutmicrobiomeinCDpatients.AnalChimActa2018,1025:1-11.
18. KosticAD,GeversD,SiljanderH,VatanenT,HyotylainenT,HamalainenAM,PeetA,TillmannV,PohoP,MattilaI,LahdesmakiH,FranzosaEA,VaaralaO,de Goffau M, Harmsen H, Ilonen J, Virtanen SM, Clish CB, Oresic M,Huttenhower C, KnipM, Group DS, Xavier RJ. The dynamics of the humaninfant gut microbiome in development and in progression toward type 1diabetes.CellHostMicrobe2015,17(2):260-273.
19. BylesjoM,ErikssonD,KusanoM,MoritzT,TryggJ.Dataintegrationinplantbiology: the O2PLS method for combined modeling of transcript andmetabolitedata.PlantJ2007,52(6):1181-1191.
20. Tax4Fun2.https://sourceforgenet/projects/tax4fun2/.21. Asshauer KP, Wemheuer B, Daniel R, Meinicke P. Tax4Fun: predicting
functional profiles frommetagenomic16S rRNAdata.Bioinformatics 2015,31(17):2882-2884.
22. AbubuckerS,SegataN,Goll J,SchubertAM, Izard J,CantarelBL,Rodriguez-MuellerB,ZuckerJ,ThiagarajanM,HenrissatB,WhiteO,KelleyST,MetheB,Schloss PD, GeversD,MitrevaM,Huttenhower C.Metabolic reconstructionfor metagenomic data and its application to the humanmicrobiome. PLoSComputBiol2012,8(6):e1002358.
23. KEGGdatabase.https://www.kegg.jp.24. ZiererJ,JacksonMA,KastenmüllerG,ManginoM,LongT,TelentiA,Mohney
RP, Small KS, Bell JT, Steves CJ, ValdesAM, Spector TD,Menni C. The fecalmetabolomeasafunctionalreadoutofthegutmicrobiome.2018,-50(-6):-795.
25. Goodrich JK, Davenport ER, Beaumont M, Jackson MA, Knight R, Ober C,Spector TD, Bell JT, Clark AG, Ley RE. Genetic Determinants of the GutMicrobiomeinUKTwins.CellHostMicrobe2016,19(5):731-743.
26. OrmazabalP,ScazzocchioB,VariR,SantangeloC,D'ArchivioM,SilecchiaG,IacovelliA,GiovanniniC,MasellaR.Effectofprotocatechuicacidon insulinresponsiveness and inflammation in visceral adipose tissue from obese
.CC-BY-NC-ND 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted October 30, 2019. . https://doi.org/10.1101/678813doi: bioRxiv preprint
individuals:possiblerole forPTP1B. Int JObes(Lond)2018,42(12):2012-2021.
27. Qimme2.https://qiime2.org.28. CaporasoJG,KuczynskiJ,StombaughJ,BittingerK,BushmanFD,CostelloEK,
FiererN,PenaAG,Goodrich JK,GordonJI,HuttleyGA,KelleyST,KnightsD,KoenigJE,LeyRE,LozuponeCA,McDonaldD,MueggeBD,PirrungM,ReederJ, Sevinsky JR, Turnbaugh PJ, Walters WA, Widmann J, Yatsunenko T,Zaneveld J, KnightR.QIIME allows analysis of high-throughput communitysequencingdata.NatMethods2010,7(5):335-336.
29. ForsbergEM,HuanT,RinehartD,BentonHP,WarthB,HilmersB,SiuzdakG.Data processing, multi-omic pathway mapping, and metabolite activityanalysisusingXCMSOnline.NatProtoc2018,13(4):633-651.
Legends
Figure 1. The flowchart of M2IA.
Figure 2. Illustration of similarity analysis and spearman correlation analysis results. (A)
PA. The length of lines connecting two points indicates the agreement of samples
between two datasets. (B) O2PLS model. (C) Circos plot of spearman correlations
between microbes and metabolites. (D) Network of spearman correlations between
differential metabolites and microbes belonging to the Firmicutes.
Figure 3. (A) Score plot of an OPLS-DA model between OB and HC group. (B) Variable
importance plot (V-plot) of an OPLS-DA model between OB and HC group. (C) Feature
selection of Random Forest modeling.
Figure 4. (A-B) Bar plot of differential KO functions from metabolome and microbiome
data. (C) Venn diagram of differential KOs. (D-E) Interaction network of each metabolite
(3,4-dihydroxybenzoate and glycerol) with microbes that may participate its metabolism.
.CC-BY-NC-ND 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted October 30, 2019. . https://doi.org/10.1101/678813doi: bioRxiv preprint
Figure 1
.CC-BY-NC-ND 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted October 30, 2019. . https://doi.org/10.1101/678813doi: bioRxiv preprint
Figure 2
-20
-10
0
10
20
-5.0 -2.5 0.0 2.5Genus Joint PC1
Met
abol
ites J
oint
PC1
L
H
R= 0.76 ,P= 0e+00
-0.05
0.00
0.05
0.10
-0.10 -0.05 0.00 0.05 0.10PC1
PC2 Metabolites
Genus
R= 0.31 ,P= 1e-04A
B
C
D
.CC-BY-NC-ND 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted October 30, 2019. . https://doi.org/10.1101/678813doi: bioRxiv preprint
Figure 3
-20
-10
0
10
20
-5 0 5P1(1.51%)
O1(9.11%)
L H
P=0.
1
P=0.
05
P=0.
1
P=0.
05
+
+
+
+
+
++++
+
++
++
+
++
++
+
+
+
+
+
+
+++
+
+
+
+
+
+
+
+
++
++
+
+
+
+
+
++
+++
+
+
+
++
+
++
+
+
+++
+++
+
+
+
+
+
+
+
+
+
+
+
+
++
+
+
+
++
+
+
+
+
+
++
+
+
+
++
+
+
+
++
+ ++
+
+
+
+++
+
+
+
+
+
++
+
+++
+
++
++
+
++
+
++
+ +
+
+
+
+
+
+
++
+
++
+
+++
+
++
+
+
+
+
++
+
+
++++
+
+
+
++
+++
+
+
++
+
+
+
+
++
+
+++
+
++
++
+++
+
+++
++
+
++
+
+
+
+++
+
+
+
++
+
++
++ +
++
+
+
+
+
+
++
+
+
+
++
++
+
+
++
+
+
+++
+ +
+ +
+++
++
++
+
+
+
+
++
++
+
+++
+
+
+
+
+
++
++
+
++
+
+
+
++
++
+ ++
+
+
+
+
+
++
++
+
+
+
+
+ ++
+ ++++
+
+
+
+ +
+
+
+
+++
+++
+
+
++
+
+
+++
++
+
+
++
+++
+ +
+
+
+
+
+
++
++
+
+ ++
+
+
++
+
+
+
+
+
+
+
++
+
+
++
++++
+
+ ++
++
+
+
++
+
+ ++
+
++
+
+++
+
+
+
+
+
+
++
+
+
++
++
++
+
++
+
+++
+
+++
++
+
++
+
+
+
+
++
+++
++
+
+
+
+ +
+
+
+
+
+
+
+
++
+
++
+
++
+++
+
++
++
+++
+
++
+
+
++
++
++
++
+
++
+
+
+
+
+
+
+ +++ ++
+
++
+
+
+
+
+
++
+
+
+
+
+
++
+
++
+
+
+++
+
+
+
+
++
+
+
+
++
+
+
++
+
+
++
++++
++
++
++
++
+
+
+
+
+
++
++
+
+
+
+ ++
++
++ +
++
+
++
++
+++
++++
++
+
+
++
+
++
+
++
+
+++
+++
+
+
+
++
+
+
++
+
+
+
+
+
+
+
+
+
+
++
+
+
+
+
+
+
+
+
+
+
+++
+
++
+++
+
+ +
+
++
+
+
+
+
+
+
+
++
++
+++
+
++
+
+
+
+
+++
+ ++
++
+
+
+
++
+
+
+
++
+
++
++
+
+
+
+
++
+++
+
+
+
+
++
+
+
+
+
+ +++
+
+
+
++
+
+++
+
++
+
+ +
+
++
+
++
+
+
+
++
++
+
+
++++
+
+
+
+
+
+
+
+
+++
+
+
+
+
+ ++
+ +
+
+
+
+
+
++
+
+
+
+
+
+
++
+
+
++
+
+++
+++
+
++
+++
+
++
++
++
+
+
+
+
++
+
+
+
+
+
+
+
+
++
+
+
+
+
+ ++
+
+
+
++
+
+
+++
+++
+++
+
+
+
++++++
+
+++
++
+
+
+
+
+
+++
+
++
+
+
++
+++ ++
+
++
+
+ +
++
+
+++
+
++
++
+
+
+
++
+
+
+
+
++
+
+
++
++
++
+
+
+
+
+
+
+
++
++
++
+
+
++
++
++
+
+
+
+
+
+
+
++
+
+
+
+
+
+
++
+
+
+
+
+
+
+
+ +
+
Acidaminococcus
Actinomyces
Adlercreutzia
Akkermansia
Anaerofustis
Anaerostipes AnaerotruncusAtopobiumBacteroides
Bifidobacterium
BilophilaBlautia
Bulleidia
Butyricimonas
Campylobacter
Catenibacterium
Christensenella
Clostridium
Collinsella
Coprobacillus
Coprococcus
Dehalobacterium
Desulfovibrio
Dialister
Dorea
EggerthellaEnterococcus
Faecalibacterium
Fusobacterium
Haemophilus
Holdemania
Lachnobacterium
Lachnospira
Lactobacillus
Lactococcus
Megasphaera
MogibacteriumOdoribacter
Oscillospira
Others
Oxalobacter
Parabacteroides
Paraprevotella
Parvimonas
Phascolarctobacterium
Porphyromonas
Prevotella
RalstoniaRoseburia
Ruminococcus
Serratia
Slackia
Streptococcus
Sutterella
Turicibacter
Veillonella
WAL_1855D
[Eubacterium]
[Prevotella]
[Ruminococcus]
cc_115rc4-4
(R)-salsolinol
1-methylguanidine 1-methylhistamine
1-methylhistidine
1-methylimidazoleacetate
2,3-dimethylsuccinate
2-(4-hydroxyphenyl)propionate
2-aminoadipate
2-aminobutyrate
2-aminophenol
2-hydroxy-3-methylvalerate
2-hydroxybutyrate/2-hydroxyisobutyrate
2-methylserine
3-(3-hydroxyphenyl)propionate
3-(4-hydroxyphenyl)lactate (HPLA)
3-(4-hydroxyphenyl)propionate
3-hydroxyphenylacetate
3-methyl-2-oxobutyrate
3-methyl-2-oxovalerate
3-methylglutaconate
3-methylhistidine
3-phenylpropionate (hydrocinnamate)
3-sulfo-L-alanine
4-acetamidobutanoate
4-guanidinobutanoate
4-hydroxycinnamate
4-hydroxyglutamate
4-hydroxyphenylacetate
4-hydroxyphenylpyruvate
4-imidazoleacetate
4-methyl-2-oxopentanoate
4-methylthio-2-oxobutanoate
5-aminovalerate
5-hydroxyindoleacetate
5-hydroxylysine
5-oxoproline
6-oxopiperidine-2-carboxylic acid
acisoga
agmatine
alanine
allo-threonine
alpha-hydroxyisocaproate
alpha-hydroxyisovalerate
anthranilate
arginine
argininosuccinate
asparagine
aspartatebeta-hydroxyisovalerate
betaine
cadaverine
carboxyethyl-GABA
cis-4-hydroxycyclohexylacetic acid
cis-urocanate
citrulline
creatine
creatinine
cystathionine
cysteinecysteine s-sulfate
cysteine sulfinic acid
cysteinylglycine
cystine
dihydoxyphenylalanine (L-DOPA)
dihydrocaffeate
dimethylarginine (ADMA + SDMA)
dimethylglycine
ethylmalonate
formiminoglutamate
gamma-aminobutyrate (GABA)
gentisate
glutamate
glutamate, gamma-methyl ester
glutamine
glutarate (pentanedioate)
glycine
guanidinoacetate
guanidinosuccinate
histamine
histidine
homocitrulline
hydantoin-5-propionic acid
imidazole lactateimidazole propionate
indole
indole-3-carboxylic acid
indoleacetate
indolelactate
indolepropionate
isobutyrylglycine (C4)isoleucine
isovalerate (C5)
isovalerylglycine
kynurenate
leucine
lysine
methionine
methionine sulfone
methionine sulfoxide
methylsuccinate
N-acetyl-1-methylhistidine*
N-acetyl-3-methylhistidine*N-acetyl-aspartyl-glutamate (NAAG)
N-acetyl-cadaverine
N-acetylalanine
N-acetylarginine
N-acetylasparagine
N-acetylaspartate (NAA)
N-acetylcitrulline
N-acetylcysteineN-acetylglutamate
N-acetylglutamine
N-acetylglycine
N-acetylhistamineN-acetylhistidine
N-acetylisoleucine
N-acetylkynurenine (2)
N-acetylleucine
N-acetylmethionine
N-acetylmethionine sulfoxide
N-acetylphenylalanine
N-acetylproline
N-acetylputrescine
N-acetylserine
N-acetylserotonin
N-acetyltaurine
N-acetylthreonine
N-acetyltryptophan
N-acetyltyrosineN-acetylvaline
N-alpha-acetylornithine
N-carbamoylalanine
N-carbamoylsarcosine
N-delta-acetylornithine
N-formylmethionineN-formylphenylalanine
N-methylalanine
N-methylglutamate
N-methylhydantoin
N-methylphenylalanine
N-methylproline
N-methyltaurine
N-methyltryptophan
N-monomethylarginine
N-propionylalanine
N1,N12-diacetylspermine
N2,N5-diacetylornithine
N2,N6-diacetyllysine
N2-acetyllysine
N6,N6,N6-trimethyllysine
N6-acetyllysine
N6-carboxyethyllysine
N6-formyllysine
O-acetylhomoserine
o-Tyrosineornithine
p-cresolp-cresol sulfate
phenethylamine
phenylacetate
phenylacetylglutamate
phenylacetylglutamine
phenylalanine
phenyllactate (PLA)
phenylpropionylglycine
phenylpyruvate
picolinate
pipecolate
proline
putrescine
pyroglutamine*
S-1-pyrroline-5-carboxylate
S-methylcysteine
S-methylmethioninesaccharopine
sarcosine
serine
serotoninskatol
spermidine
taurine
thioproline
threoninehydroxyproline
trans-urocanate
tryptamine
tryptophan tyramine
tyrosine
valerylphenylalanine
valine
xanthurenate
2-deoxyribose
arabinose
arabitol/xylitol
arabonate/xylonate
erythronate*
fructose
fucose
galactonateglucose
glucuronate
glycerate
isomaltose
lactatemaltose
maltotriose
mannitol/sorbitol
mannose
N-acetyl-beta-glucosaminylamine
N-acetylglucosamine/N-acetylgalactosamine
N-acetylglucosaminylasparagine
N-acetylmuramateN-acetylneuraminate
N6-carboxymethyllysine
pyruvate
ribitol
ribonate (ribonolactone)ribose
ribulose/xylulose
sedoheptulose
sucrose
xylose
1-methylnicotinamide
5-(2-Hydroxyethyl)-4-methylthiazole
6-hydroxynicotinate
alpha-tocopherolalpha-tocopherol acetate
alpha-tocotrienol
bilirubin
biliverdin
biocytin
biotin
D-urobilin
delta-tocopherol
FAD
FMN
gamma-CEHC
gamma-tocopherol/beta-tocopherol
gamma-tocotrienol
I-urobilinogen
L-urobilin
maleamate
nicotinamide
nicotinamide riboside
nicotinate
nicotinate ribonucleosidepantothenate (Vitamin B5)
pterin
pyridoxal
pyridoxamine
pyridoxate
pyridoxine (Vitamin B6)
quinolinate
retinol (Vitamin A)
riboflavin (Vitamin B2)
thiamin (Vitamin B1)
threonate
trigonelline (N'-methylnicotinate)
2-methylcitrate/homocitratealpha-ketoglutarate
citrate
fumaratemalate
phosphate
succinate
succinylcarnitine (C4)
tricarballylate
1,11-undecanedicarboxylate
1,2-dilinolenoyl-digalactosylglycerol (18:3/18:3)
1,2-dilinolenoyl-galactosylglycerol (18:3/18:3)*
1,2-dilinoleoyl-digalactosylglycerol (18:2/18:2)*1,2-dilinoleoyl-galactosylglycerol (18:2/18:2)*
1,2-dilinoleoyl-GPC (18:2/18:2)
1,2-dipalmitoyl-GPC (16:0/16:0)
1,2-dipalmitoyl-GPG (16:0/16:0)
1-(1-enyl-oleoyl)-GPE (P-18:1)*
1-(1-enyl-palmitoyl)-GPE (P-16:0)*
1-(1-enyl-stearoyl)-GPE (P-18:0)*
1-linolenoylglycerol (18:3)
1-linoleoyl-2-linolenoyl-digalactosylglycerol (18;2/18:3)*1-linoleoyl-2-linolenoyl-galactosylglycerol (18:2/18:3)*
1-linoleoyl-2-linolenoyl-GPC (18:2/18:3)* 1-linoleoyl-GPC (18:2)
1-linoleoyl-GPE (18:2)*
1-linoleoylglycerol (18:2)
1-margaroylglycerol (17:0)
1-myristoylglycerol (14:0)
1-oleoyl-2-linoleoyl-glycerol (18:1/18:2)
1-oleoyl-2-linoleoyl-GPC (18:1/18:2)*
1-oleoyl-3-linoleoyl-glycerol (18:1/18:2)
1-oleoyl-GPC (18:1)
1-oleoyl-GPE (18:1)
1-oleoyl-GPG (18:1)*
1-oleoylglycerol (18:1)
1-palmitoyl-2-arachidonoyl-GPC (16:0/20:4)
1-palmitoyl-2-linolenoyl-digalactosylglycerol (16:0/18:3)
1-palmitoyl-2-linoleoyl-digalactosylglycerol (16:0/18:2)*
1-palmitoyl-2-linoleoyl-galactosylglycerol (16:0/18:2)*
1-palmitoyl-2-linoleoyl-glycerol (16:0/18:2)*
1-palmitoyl-2-linoleoyl-GPC (16:0/18:2)
1-palmitoyl-2-linoleoyl-GPE (16:0/18:2)
1-palmitoyl-2-linoleoyl-GPI (16:0/18:2)
1-palmitoyl-2-oleoyl-GPC (16:0/18:1)
1-palmitoyl-2-oleoyl-GPE (16:0/18:1)
1-palmitoyl-3-linoleoyl-glycerol (16:0/18:2)*
1-palmitoyl-GPC (16:0)
1-palmitoyl-GPE (16:0)
1-palmitoyl-GPG (16:0)*
1-palmitoyl-GPI* (16:0)*
1-palmitoylglycerol (16:0)
1-pentadecanoylglycerol (15:0)
1-stearoyl-GPC (18:0)
1-stearoyl-GPE (18:0)
1-stearoyl-GPG (18:0)1-stearoyl-GPI (18:0)
1-stearoyl-GPS (18:0)*10-heptadecenoate (17:1n7)
10-hydroxystearate
10-nonadecenoate (19:1n9)
11-ketoetiocholanolone sulfate
12,13-DiHOME
12-dehydrocholate12-hydroxystearate
13-HODE + 9-HODE
13-HOTrE
13-methylmyristate
methylpalmitate (15 or 2)
17-methylstearate
2-aminoheptanoate
2-hydroxydecanoate
2-hydroxyglutarate
2-hydroxymyristate
2-hydroxypalmitate
2-hydroxystearate
2-linoleoylglycerol (18:2)
2-methylmalonyl carnitine
2-oleoylglycerol (18:1)3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF)
3-carboxyadipate
3-dehydrocholate
3-hydroxy-3-methylglutarate
3-hydroxybutyrate (BHBA)
3-hydroxyhexanoate
3-hydroxylaurate
3-hydroxymyristate
3-hydroxypropanoate
3-hydroxysebacate
3-hydroxystearate
3-ketosphinganine
3-methyladipate
3-methylglutarate/2-methylglutarate
3b-hydroxy-5-cholenoic acid
4-androsten-3alpha,17alpha-diol monosulfate (2)4-androsten-3beta,17beta-diol disulfate (1)
4-androsten-3beta,17beta-diol disulfate (2)
4-androsten-3beta,17beta-diol monosulfate (1)
4-androsten-3beta,17beta-diol monosulfate (2)
4-cholesten-3-one
5-dodecenoate (12:1n7)
5-hydroxyhexanoate5alpha-androstan-3beta,17alpha-diol monosulfate (1)
5alpha-androstan-3beta,17beta-diol monosulfate (1)
5alpha-pregnan-3beta,20alpha-diol monosulfate (1)
6-oxolithocholate7-ketodeoxycholate
7-ketolithocholate
9,10-DiHOME
acetylcarnitine (C2)
adipateadrenate (22:4n6)
arachidate (20:0)
arachidonate (20:4n6)
azelate (nonanedioate; C9)
beta-sitosterol
butyrylglycine (C4)
campesterol
caprate (10:0)
caproate (6:0)
caprylate (8:0)
carnitine
chenodeoxycholate
cholate
cholesterol
choline
phosphocholine
coprostanol
cortolone
dehydroisoandrosterone sulfate (DHEA-S)
dehydrolithocholate
deoxycarnitine
deoxycholate
deoxycholic acid sulfate
dihomolinoleate (20:2n6)
dihomolinolenate (20:3n3 or 3n6)
dimethylmalonic acid
docosadienoate (22:2n6)
docosahexaenoate (DHA; 22:6n3)
docosapentaenoate (DPA; 22:5n3)
dodecanedioate (C12)
eicosapentaenoate (EPA; 20:5n3)eicosenoate (20:1n9 or 1n11)
epiandrosterone sulfate
ergosterolerucate (22:1n9)
glycerol
glycerol 3-phosphate
glycerophosphoethanolamine
glycerophosphoglycerol
glycerophosphoinositol*
glycerophosphorylcholine (GPC)glycochenodeoxycholate
glycocholate
glycocholenate sulfate*
glycodeoxycholate
glycodeoxycholate sulfate
glycolithocholate
glycolithocholate sulfate*
glycosyl-N-palmitoyl-sphingosineglycosyl-N-stearoyl-sphingosine
glycoursodeoxycholate
heptanoate (7:0)
hexadecanedioate (C16)hexanoylglycine (C6)
hyocholate
isocaproate
lactosyl-N-palmitoyl-sphingosine
lanosterol
laurate (12:0)
linoleamide (18:2n6)
linoleate (18:2n6)
linolenate (18:3n3 or 3n6)
linoleoyl ethanolamide
lithocholate
maleate
malonate margarate (17:0)methylmalonate (MMA)
mevalonate mevalonolactone
myo-inositol
myristate (14:0)
myristoleate (14:1n5)
N-acetylsphingosine
N-linoleoylglycine
N-palmitoyl-sphinganine (d18:0/16:0)
N-palmitoyl-sphingosine (d18:1/16:0)
N-palmitoylglycine
nonadecanoate (19:0)
octadecanedioate (C18)
oleate/vaccenate (18:1)
oleoyl ethanolamide
oleoylcarnitine (C18)
palmitate (16:0)
palmitoleate (16:1n7)
palmitoyl dihydrosphingomyelin (d18:0/16:0)*
palmitoyl ethanolamidepalmitoyl sphingomyelin (d18:1/16:0)
palmitoylcarnitine (C16)
pentadecanoate (15:0)
phytosphingosine
pimelate (heptanedioate)
pregn steroid monosulfate*
pregnen-diol disulfate*
pristanatepropionylglycine (C3)
sebacate (decanedioate)
sphinganine
sphingosine
stearate (18:0)
stearidonate (18:4n3)
stearoyl ethanolamide
stearoylcarnitine (C18)
stigmasterol
suberate (octanedioate)
taurochenodeoxycholate
taurocholate
taurocholenate sulfate
taurodeoxycholate
taurolithocholate 3-sulfate
tetradecanedioate (C14)
trimethylamine N-oxide
undecanedioate
ursodeoxycholateursodeoxycholate sulfate (1)
valerate (5:0)
1-methyladenine
2'-deoxyadenosine2'-deoxycytidine
dCMP
2'-deoxyguanosine
2'-deoxyinosine
2'-deoxyuridine
3-aminoisobutyrate
3-ureidopropionate
4-ureidobutyrate5,6-dihydrothymine
5-methyluridine (ribothymidine)
7-methylguanine
8-hydroxyguanine
adenine
adenosine
AMPallantoin
beta-alanine
cytidine
cytidine 2',3'-cyclic monophosphate
3'-CMP
CMP
cytosinedihydroorotate
guanine
guanosine
guanosine-2',3'-cyclic monophosphate
hypoxanthineinosine
methylphosphate
N-acetyl-beta-alanineN-carbamoylaspartate
1-methyladenosine
N6,N6-dimethyladenosine
N6-carbamoylthreonyladenosine
N6-dimethylallyladenine
orotate
pseudouridinethymidinethymine
uracil
urateuridine
xanthine
xanthosine
alanyl-glutamyl-meso-diaminopimelate
alanylleucine
anserine
carnosine
gamma-glutamyl-epsilon-lysine
gamma-glutamylalanine
gamma-glutamylglutamate
gamma-glutamylglycine
gamma-glutamylhistidine
gamma-glutamylisoleucine*
gamma-glutamylleucine
gamma-glutamylmethionine
gamma-glutamylphenylalaninegamma-glutamyltyrosinegamma-glutamylvaline
glutaminylleucine
glycylisoleucine
glycylleucine
glycylvaline
histidylalanine
isoleucylglycine
leucylalanine
leucylglutamine*
leucylglycine
lysylleucine
phenylalanylalanine
phenylalanylglycine
prolylglycine
threonylphenylalanine
tryptophylglycine
valylglutamine
valylglycine
valylleucine
X - 02249
X - 09789
X - 10457
X - 11381
X - 11429
X - 11440
X - 11538
X - 11612
X - 11640
X - 12026
X - 12096
X - 12097
X - 12100
X - 12101
X - 12117
X - 12125
X - 12199
X - 12339
X - 12450
X - 12456
X - 12680X - 12681
X - 12688
X - 12879X - 13007
X - 13507
X - 13696
X - 13835
X - 14056
X - 14095
X - 14141
X - 14196
X - 14237
X - 14254
X - 14263
X - 14264
X - 14302
X - 14314
X - 14337
X - 14364
X - 14383X - 14384
X - 14392
X - 14416
X - 14454
X - 14502
X - 14697X - 14838
X - 14900
X - 14904
X - 15136
X - 15150
X - 15245
X - 15461
X - 15497
X - 15608X - 15666
X - 15806
X - 15843
X - 15853
X - 15854
X - 15856
X - 16071
X - 16343
X - 16391
X - 16654
X - 16946
X - 17009
X - 17145
X - 17162
X - 17299
X - 17303
X - 17348
X - 17349X - 17438
X - 17469
X - 17653
X - 17739
X - 17749
X - 17807
X - 17852
X - 17877
X - 17910
X - 17919
X - 17960
X - 17969
X - 17978
X - 17983
X - 18162
X - 18165
X - 18307
X - 18410
X - 18889
X - 19220
X - 19232 X - 19299
X - 19497
X - 19751
X - 19763
X - 19931
X - 19932
X - 20100X - 20172
X - 20185
X - 20197
X - 20747
X - 21283
X - 21327
X - 21353
X - 21365
X - 21410
X - 21442
X - 21448
X - 21470
X - 21666
X - 21729
X - 21788
X - 21796
X - 21959
X - 22030
X - 22062
X - 22142
X - 22145
X - 22146
X - 22835
X - 22836
X - 23115
X - 23159
X - 23160
X - 23173
X - 23193X - 23196
X - 23277
X - 23295
X - 23308
X - 23369
X - 23438
X - 23451
X - 23456
X - 23469
X - 23470X - 23475
X - 23482
X - 23507
X - 23581
X - 23583
X - 23584
X - 23585
X - 23587
X - 23600
X - 23637 X - 23639
X - 23652
X - 23655
X - 23657
X - 23662
X - 23665
X - 23710X - 23728
X - 23729
X - 23732
X - 23733
X - 23734
X - 23736
X - 23737
X - 23738
X - 23747
X - 23748
X - 23749
X - 23752X - 23753
X - 23765
X - 23767
X - 23776
X - 23780
X - 23782X - 23925
X - 23974
X - 24023
X - 24027
X - 24077
X - 24177
X - 24204X - 24207
X - 24209
X - 24211X - 24212
X - 24213
X - 24215
X - 24216
X - 24220
X - 24230
X - 24231
X - 24234
X - 24238
X - 24240
X - 24243
X - 24246
X - 24260
X - 24278
X - 24408
X - 24410
X - 24412
X - 24413
X - 24425
X - 24431
X - 24449
X - 24455
X - 24456
X - 24465
X - 24474
X - 24512
X - 24529
X - 24546
X - 24550
X - 24558
X - 24608
X - 24658
X - 24678
X - 24682
X - 24683
X - 24686
X - 24693
X - 24697
X - 24699X - 24727X - 24729
X - 24736X - 24738
X - 24739
X - 24743
X - 24753
X - 24763
X - 24766
X - 24767X - 24803X - 24804X - 24806
X - 24809
X - 24810
X - 24811X - 24813
X - 24829
X - 24831X - 24832
X - 24853
X - 24854
X - 24855
1,3,7-trimethylurate
1,3-dimethylurate
1,3-propanediol
1,7-dimethylurate
1-(3-aminopropyl)-2-pyrrolidone
1-methyl-beta-carboline-3-carboxylic acid
1-methylurate
1-methylxanthine
1H-quinolin-2-one
2,3-dihydroxyisovalerate
2,4,6-trihydroxybenzoate
2,4-dihydroxyhydrocinnamate
2,8-quinolinediol 2-acetamidobutanoate
2-isopropylmalate 2-keto-3-deoxy-gluconate
2-oxindole-3-acetate
2-oxo-1-pyrrolidinepropionate
2-piperidinone
2-pyrrolidinone
3,4-dihydroxybenzoate
3,5-dihydroxybenzoic acid3,7-dimethylurate
3-(2-hydroxyphenyl)propionate
3-hydroxybenzoate
3-hydroxypyridine
3-methylurate*3-methylxanthine
4-acetamidophenol
4-hydroxybenzoate
5-acetylamino-6-amino-3-methyluracil
5-acetylamino-6-formylamino-3-methyluracil
7-methylurate7-methylxanthine
acesulfame
apigenin
astaxanthin
benzoate
beta-carotene
beta-guanidinopropanoate
bufotenine
caffeate
caffeine
ciliatine (2-aminoethylphosphonate)
cinnamate
daidzein
diaminopimelate
diethanolamine
digalacturonic acid
diglycerol
dihydroferulic acid
dipicolinate
ectoine
enterodiol
enterolactone
epicatechin
erythrose
ferulate
fucitol
galacturonate
gluconateglutamyl-meso-diaminopimelate
harmane
hesperetin
hesperidin
indolin-2-one
levulinate (4-oxovalerate)
luteolinidin
methyl glucopyranoside (alpha + beta)
methyl indole-3-acetate
N-carbamylglutamate
N-methylpipecolate
N-propionylmethionine
naringenin
nicotianamine
O-sulfo-L-tyrosine
p-hydroxybenzaldehyde
paraxanthine
pheophorbide A
phytanate
piperidine
piperine
pyrraline
quinate
saccharin
salicylate
secoisolariciresinol
sitostanol
solanidine
stachydrinesuccinimide
sucralose
sulfate*
syringic acid
tartarate
theanine
theobromine
theophylline
tribuloside tyrosol
vanillate
0
1
2
3
-0.4 -0.2 0.0 0.2 0.4Corr.Coeffs.
Varia
ble
Impo
rtanc
e on
Pro
ject
ion
(VIP
)
A B
C
.CC-BY-NC-ND 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted October 30, 2019. . https://doi.org/10.1101/678813doi: bioRxiv preprint
Figure 4
Benzoate degradation:ko00362
Degradation of aromatic compounds:ko01220
Phosphotransferase system (PTS):ko02060
ABC transporters:ko02010
Streptomycin biosynthesis:ko00521
Various types of N-glycan biosynthesis:ko00513
Glycosphingolipid biosynthesis - ganglio series:ko00604
Oxidative phosphorylation:ko00190
Glycosphingolipid biosynthesis - globo and isoglobo series:ko00603
Carbon metabolism:ko01200
Glycosaminoglycan degradation:ko00531
Lysosome:ko04142
Ribosome:ko03010
Biosynthesis of antibiotics:ko01130
Biosynthesis of secondary metabolites:ko01110
Metabolic pathways:ko01100
0 1 2 3
LDA SCORE(log 10)
L H
Fc epsilon RI signaling pathway:ko04664
Apoptosis:ko04210
Necroptosis:ko04217
Ovarian steroidogenesis:ko04913
Sphingolipid metabolism:ko00600
Caprolactam degradation:ko00930
Steroid hormone biosynthesis:ko00140
Porphyrin and chlorophyll metabolism:ko00860
Indole alkaloid biosynthesis:ko00901
Cholinergic synapse:ko04725
mTOR signaling pathway:ko04150
D-Arginine and D-ornithine metabolism:ko00472
Arginine biosynthesis:ko00220
Aminoacyl-tRNA biosynthesis:ko00970
Thermogenesis:ko04714
Monobactam biosynthesis:ko00261
Toluene degradation:ko00623
Salmonella infection:ko05132
Amoebiasis:ko05146
Chagas disease (American trypanosomiasis):ko05142
Benzoate degradation:ko00362
Tyrosine metabolism:ko00350
Glycerolipid metabolism:ko00561
Biotin metabolism:ko00780
Amyotrophic lateral sclerosis (ALS):ko05014
Clavulanic acid biosynthesis:ko00331
Galactose metabolism:ko00052
Carotenoid biosynthesis:ko00906
Polycyclic aromatic hydrocarbon degradation:ko00624
Aminobenzoate degradation:ko00627
Flavone and flavonol biosynthesis:ko00944
Biosynthesis of secondary metabolites - unclassified:ko00999
Arginine and proline metabolism:ko00330
Phenylpropanoid biosynthesis:ko00940
Phosphotransferase system (PTS):ko02060
Phenylalanine metabolism:ko00360
Ubiquinone and other terpenoid-quinone biosynthesis:ko00130
Starch and sucrose metabolism:ko00500
Biosynthesis of secondary metabolites - other antibiotics:ko00998
0 1 2 3 4LDA SCORE(log 10)
L H
Metabolite 37
Microbiome 14
2
A B C
D E
.CC-BY-NC-ND 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted October 30, 2019. . https://doi.org/10.1101/678813doi: bioRxiv preprint