Min Zhang, MD PhD, Purdue University. Joint work with Yanzhu Lin, Dabao Zhang
![Page 1: Min Zhang, MD PhD Purdue University Joint work with Yanzhu Lin, Dabao Zhang](https://reader035.vdocuments.site/reader035/viewer/2022062217/56649f145503460f94c28235/html5/thumbnails/1.jpg)
Min Zhang, MD PhD
Purdue University
Joint work with Yanzhu Lin, Dabao Zhang
![Page 2: Min Zhang, MD PhD Purdue University Joint work with Yanzhu Lin, Dabao Zhang](https://reader035.vdocuments.site/reader035/viewer/2022062217/56649f145503460f94c28235/html5/thumbnails/2.jpg)
Outline
Data Summary
Methods
Data Analysis Procedure
Preliminary Results
Preprocessing GC x GC-MS Data Methods
![Page 3: Min Zhang, MD PhD Purdue University Joint work with Yanzhu Lin, Dabao Zhang](https://reader035.vdocuments.site/reader035/viewer/2022062217/56649f145503460f94c28235/html5/thumbnails/3.jpg)
CCE Data Summary
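Phenotype summary of the currently available data for the CCE project:

| Data set | Healthy | Colon Cancer | Rectal Cancer | Polyp | NA | Total |
|---|---|---|---|---|---|---|
| Lipidomics (Lipid) | 22 | 10 | 2 | 12 | 0 | 46 |
| GProteomics (GP) | 33 | 9 | 2 | 20 | 1 | 65 |
| NMR | 25 | 2 | 1 | 23 | 2 | 53 |
| Teac | 54 | 17 | 5 | 41 | 2 | 119 |
| Comet | 27 | 12 | 4 | 12 | 0 | 55 |
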
![Page 4: Min Zhang, MD PhD Purdue University Joint work with Yanzhu Lin, Dabao Zhang](https://reader035.vdocuments.site/reader035/viewer/2022062217/56649f145503460f94c28235/html5/thumbnails/4.jpg)
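Summary of Overlap Dataset
Overlap between any 2 data sets:

| | Lipid | GP | NMR | Teac | Comet |
|---|---|---|---|---|---|
| Lipid | 46 | 41 | 0 | 46 | 41 |
| GP | 41 | 65 | 17 | 63 | 43 |
| NMR | 0 | 17 | 53 | 52 | 2 |
| Teac | 46 | 63 | 52 | 119 | 55 |
| Comet | 41 | 43 | 2 | 55 | 55 |

Overlap among any 3 data sets:

| Data sets | Overlap |
|---|---|
| Lipid & GP & Teac | 41 |
| Lipid & GP & Comet | 37 |
| Lipid & Teac & Comet | 41 |
| GP & NMR & Teac | 16 |
| GP & Teac & Comet | 43 |
| NMR & Teac & Comet | 2 |

Overlap among any 4 data sets:

| Data sets | Overlap |
|---|---|
| Lipid & GP & Teac & Comet | 37 |
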
![Page 5: Min Zhang, MD PhD Purdue University Joint work with Yanzhu Lin, Dabao Zhang](https://reader035.vdocuments.site/reader035/viewer/2022062217/56649f145503460f94c28235/html5/thumbnails/5.jpg)
Overlap of Different Omics Data
![Page 6: Min Zhang, MD PhD Purdue University Joint work with Yanzhu Lin, Dabao Zhang](https://reader035.vdocuments.site/reader035/viewer/2022062217/56649f145503460f94c28235/html5/thumbnails/6.jpg)
Methods for Integrating Omics
Common methods:
- Principal Component Analysis (Jolliffe, I., 1986)
- Co-Inertia Analysis (Doledec, S. and Chessel, D., 1994)
- Partial Least Squares (Wold, H., 1966)
- Bayesian Analysis method (Webb-Robertson et al., 2009)
Our methods:
We use the iteratively weighted partial least squares (IWPLS) method to fit a model for each individual data set, then use a Bayesian method to integrate the results from the individual data sets.
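The slides do not specify which Bayesian integration rule is used. As a minimal sketch only, the Python below assumes each data set's model yields a posterior probability of the same class for a subject, and that the data sets are conditionally independent given the class (a naive Bayes combination). The function name `integrate_posteriors`, the `prior` parameter, and the example values are hypothetical and are not taken from the original work.

```python
def integrate_posteriors(posteriors, prior=0.5, eps=1e-12):
    """Combine per-data-set posterior probabilities for one subject.

    Assumption (not from the slides): data sets are conditionally
    independent given the class, so posterior odds multiply after the
    shared prior odds are divided out of each one.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    odds = prior_odds
    for p in posteriors:
        # Clip to avoid division by zero for probabilities of exactly 0 or 1.
        p = min(max(p, eps), 1.0 - eps)
        odds *= (p / (1.0 - p)) / prior_odds
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Hypothetical per-subject probabilities from two data sets.
    nmr_probability = 0.7
    proteomics_probability = 0.8
    combined = integrate_posteriors([nmr_probability, proteomics_probability])
    print(f"integrated probability: {combined:.3f}")
```

Under this assumption, two data sets that agree reinforce each other, while an uninformative data set (probability equal to the prior) leaves the other data set's probability unchanged.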
![Page 7: Min Zhang, MD PhD Purdue University Joint work with Yanzhu Lin, Dabao Zhang](https://reader035.vdocuments.site/reader035/viewer/2022062217/56649f145503460f94c28235/html5/thumbnails/7.jpg)
Overlap Between NMR and G-Proteomics
NMR: 53 samples
Global Proteomics: 65 samples
Overlap: 17 samples
- One sample: without phenotype information
- One sample: from blood draw 2
- 15 samples: all from blood draw 1, with phenotype either “Healthy Control” or “Polyp”
![Page 8: Min Zhang, MD PhD Purdue University Joint work with Yanzhu Lin, Dabao Zhang](https://reader035.vdocuments.site/reader035/viewer/2022062217/56649f145503460f94c28235/html5/thumbnails/8.jpg)
Data Analysis Procedure
Metabolomics (NMR)
- Data preprocessing, ending with 1824 variables
- IWPLS method
Global Proteomics
- Data preprocessing, ending with 5407 variables
- IWPLS method
Integrate Results
![Page 9: Min Zhang, MD PhD Purdue University Joint work with Yanzhu Lin, Dabao Zhang](https://reader035.vdocuments.site/reader035/viewer/2022062217/56649f145503460f94c28235/html5/thumbnails/9.jpg)
Analysis Results
Our method:
[Figure: Probability (y-axis, 0.0 to 1.0) by Subject (x-axis, 1 to 14); legend: True, GProteomics, NMR, Integrate]
![Page 10: Min Zhang, MD PhD Purdue University Joint work with Yanzhu Lin, Dabao Zhang](https://reader035.vdocuments.site/reader035/viewer/2022062217/56649f145503460f94c28235/html5/thumbnails/10.jpg)
Analysis Results (cont.)
Summary:

| Data | Classification Rate |
|---|---|
| GProteomics | 100% |
| NMR | 85.7% |
| Integrated NMR and GProteomics | 100% |

Other methods tried:
- PLS: ending with 0 components;
- Univariate t-test: no variable is significant.
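A classification rate like the ones summarized here can be computed by thresholding each subject's predicted probability and comparing it with the true label. The sketch below assumes a 0.5 threshold and binary labels, neither of which is stated in the slides, and the probabilities are made up for illustration. With 14 subjects, as on the plot's axis, a rate of 85.7% would be consistent with 12 correct classifications (12/14 ≈ 0.857), although the slides do not give the counts.

```python
def classification_rate(probabilities, true_labels, threshold=0.5):
    """Fraction of subjects whose thresholded probability matches the label."""
    if len(probabilities) != len(true_labels):
        raise ValueError("probabilities and true_labels must have equal length")
    if not probabilities:
        raise ValueError("at least one subject is required")
    predicted = [1 if p >= threshold else 0 for p in probabilities]
    correct = sum(1 for p, t in zip(predicted, true_labels) if p == t)
    return correct / len(true_labels)


if __name__ == "__main__":
    # Hypothetical data for 14 subjects: 7 labeled 1, then 7 labeled 0.
    labels = [1] * 7 + [0] * 7
    probs = [0.9, 0.8, 0.7, 0.95, 0.6, 0.85, 0.4,
             0.1, 0.2, 0.3, 0.05, 0.45, 0.15, 0.7]
    rate = classification_rate(probs, labels)
    print(f"{100 * rate:.1f}%")  # 12 of 14 correct: prints 85.7%
```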
![Page 11: Min Zhang, MD PhD Purdue University Joint work with Yanzhu Lin, Dabao Zhang](https://reader035.vdocuments.site/reader035/viewer/2022062217/56649f145503460f94c28235/html5/thumbnails/11.jpg)
Example: Overlap of Three Data Sets
For the overlap among three data sets, we focus on the overlap among Lipidomics, Teac and Comet.
Data summary:
- Phenotype summary:

| Phenotype | Healthy | Polyp | Colon | Rectal | Total |
|---|---|---|---|---|---|
| Sample size | 20 | 10 | 9 | 2 | 41 |

- Variable summary:

| | Lipidomics | Teac | Comet |
|---|---|---|---|
| Number of variables | 52 | 1 | 2 |

Data analysis: we group the colon cancer and rectal cancer patients together as a cancer group, while keeping the other two groups. Then we try the following methods:
- Method 1: POCRE
- Method 2: ANOVA test
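To illustrate the grouping step and the ANOVA test described for this example, the sketch below merges the colon and rectal phenotypes into one cancer group and computes a one-way ANOVA F statistic for a single variable. It is written in plain Python for self-containment; the helper names and the measurement values are hypothetical, and the slides do not say which software or test settings were used. POCRE is not sketched here.

```python
from collections import defaultdict
from statistics import mean

# Colon and rectal cancer are merged into one group, as described in the slides.
MERGE = {"Colon": "Cancer", "Rectal": "Cancer"}


def group_by_phenotype(values, phenotypes):
    """Split one variable's values into phenotype groups after merging."""
    groups = defaultdict(list)
    for value, phenotype in zip(values, phenotypes):
        groups[MERGE.get(phenotype, phenotype)].append(value)
    return dict(groups)


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of value lists."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 groups and more samples than groups")
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        raise ValueError("zero within-group variance; F is undefined")
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical measurements of one variable for nine subjects.
    values = [1, 2, 3, 2, 3, 4, 5, 6, 7]
    phenotypes = ["Healthy"] * 3 + ["Polyp"] * 3 + ["Colon", "Colon", "Rectal"]
    groups = group_by_phenotype(values, phenotypes)
    f_stat, df_b, df_w = one_way_anova_f(list(groups.values()))
    print(sorted(groups), f"F={f_stat:.2f}", df_b, df_w)
```

A p-value would then come from the F distribution with the returned degrees of freedom, for example via a statistics library.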
![Page 12: Min Zhang, MD PhD Purdue University Joint work with Yanzhu Lin, Dabao Zhang](https://reader035.vdocuments.site/reader035/viewer/2022062217/56649f145503460f94c28235/html5/thumbnails/12.jpg)
Results
Misclassification rate:

| POCRE | ANOVA |
|---|---|
| 17% | 39% |

Variables identified:
- POCRE: Lipids: SPC, LPI 20:4, LPE18:1, LPG 18:1; Teac: TEAC_mM
- ANOVA: Lipids: LPE18:1; Teac: TEAC_mM
![Page 13: Min Zhang, MD PhD Purdue University Joint work with Yanzhu Lin, Dabao Zhang](https://reader035.vdocuments.site/reader035/viewer/2022062217/56649f145503460f94c28235/html5/thumbnails/13.jpg)
Preprocessing GC x GC-MS Methods
How to choose the reference sample for alignment?
- Choose the chromatogram in the middle of the run sequence, or the chromatogram containing the highest number of common chemical constituents (i.e. peaks).
- Choose the chromatogram that is most similar to the loading of the first principal component in a PCA model on the unaligned data, or simply to the mean of all chromatograms.
Similarity index method for choosing the reference sample: For a given chromatogram $x_t$, the similarity index is defined as:

$$\text{Similarity Index} = \prod_{i=1}^{I} \left| r(x_t, x_i) \right|$$

where

$$r(x_t, x_i) = \frac{\sum_{j=1}^{J} \left(x_t(j) - \bar{x}_t\right)\left(x_i(j) - \bar{x}_i\right)}{\left[\sum_{j=1}^{J} \left(x_t(j) - \bar{x}_t\right)^2 \sum_{j=1}^{J} \left(x_i(j) - \bar{x}_i\right)^2\right]^{1/2}}$$

The one with the maximum similarity index will be chosen as the reference sample.
Ref: Skov, T. et al., Automated Alignment of Chromatographic Data, Journal of Chemometrics, Vol. 20, Issue 11-12, pages 484-497, 2007.
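The reference-selection rule can be sketched in a few lines of Python. This sketch assumes the similarity index of a candidate chromatogram is the product, over all chromatograms, of the absolute Pearson correlation with that candidate, and that every chromatogram is a list of intensities of equal length. The function names and the toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation r(x, y) between two equal-length signals."""
    if len(x) != len(y) or not x:
        raise ValueError("signals must be non-empty and of equal length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    numerator = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    denominator = math.sqrt(
        sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y)
    )
    if denominator == 0:
        raise ValueError("a constant signal has undefined correlation")
    return numerator / denominator


def similarity_index(t, chromatograms):
    """Product of |r(x_t, x_i)| over all chromatograms x_i."""
    index = 1.0
    for x_i in chromatograms:
        index *= abs(pearson(chromatograms[t], x_i))
    return index


def choose_reference(chromatograms):
    """Return (position of the best reference, list of all similarity indices)."""
    scores = [similarity_index(t, chromatograms)
              for t in range(len(chromatograms))]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


if __name__ == "__main__":
    # Toy chromatograms: two share a peak position, one has a shifted peak.
    chromatograms = [[0, 1, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
    reference, scores = choose_reference(chromatograms)
    print(reference, [round(s, 3) for s in scores])
```

The product includes the candidate's correlation with itself, which equals 1 and so does not change the ranking. Ties are resolved in favor of the earliest chromatogram.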
![Page 14: Min Zhang, MD PhD Purdue University Joint work with Yanzhu Lin, Dabao Zhang](https://reader035.vdocuments.site/reader035/viewer/2022062217/56649f145503460f94c28235/html5/thumbnails/14.jpg)
Results