Yu Shyr (石瑜), Ph.D. May 14, 2008. China Medical University. Yu.Shyr@vanderbilt
The Biostatistical & Bioinformatics Challenges in the High Dimensional Data Derived from High Throughput Assays: Today and Tomorrow. Vanderbilt University (泛德堡大學).
![Page 1: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/1.jpg)
The Biostatistical & Bioinformatics Challenges in the High Dimensional Data Derived from High Throughput Assays: Today and Tomorrow
Yu Shyr (石瑜), Ph.D.
May 14, 2008
China Medical University
![Page 2: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/2.jpg)
Vanderbilt University
泛德堡大學
![Page 3: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/3.jpg)
US News & World Report (America's Best Colleges, 2007)
1. Princeton University (NJ)
2. Harvard University (MA)
3. Yale University (CT)
4. California Institute of Technology (CA)
4. Stanford University (CA)
4. Massachusetts Inst. of Technology (MA)
7. University of Pennsylvania (PA)
8. Duke University (NC)
9. Dartmouth College (NH)
9. Columbia University (NY)
9. University of Chicago (IL)
12. Cornell University (NY)
12. Washington University in St. Louis (MO)
![Page 4: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/4.jpg)
14. Northwestern University (IL)
15. Brown University (RI)
16. Johns Hopkins University (MD)
17. Rice University (TX)
18. Vanderbilt University (TN)
18. Emory University (GA)
20. University of Notre Dame (IN)
21. Carnegie Mellon University (PA)
21. University of California – Berkeley (CA)
23. Georgetown University (DC)
24. University of Virginia (VA)
24. University of Michigan – Ann Arbor (MI)
US News & World Report (America's Best Colleges, 2007)
![Page 5: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/5.jpg)
Tennessee, the “Volunteer State”
![Page 6: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/6.jpg)
Nashville, TN - “Music City, USA!”
![Page 7: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/7.jpg)
Vanderbilt University
![Page 8: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/8.jpg)
Vanderbilt University
A private, nonsectarian, coeducational research university in Nashville, TN.
Established in 1873 by shipping and rail magnate Cornelius Vanderbilt.
Enrolls 11,000 students in ten schools annually.
Ranks 18th in the nation among national research universities.
Also has several research facilities and a world-renowned medical center.
Famous alumni include former vice-president Al Gore.
![Page 9: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/9.jpg)
Vanderbilt University Medical Center
![Page 10: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/10.jpg)
VUMC
A collection of several hospitals and clinics associated with Vanderbilt University in Nashville, Tennessee.
In 2003, it was placed on the Honor Roll of the nation's best hospitals.
The medical school was ranked 17th in the nation among research-oriented medical schools and in the ISI top 5 for research impact in clinical medicine and pharmacology.
![Page 11: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/11.jpg)
Vanderbilt-Ingram Cancer Center
![Page 12: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/12.jpg)
Vanderbilt-Ingram Cancer Center
Only NCI-designated Comprehensive Cancer Center in Tennessee and one of only 39 in the United States
Nearly 300 investigators in seven research programs
More than $190 million in annual research funding
Among the top 10 in competitively awarded NCI grant support
![Page 13: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/13.jpg)
Vanderbilt-Ingram Cancer Center
Ranked 20th in the nation, and consistently ranked among the best places for cancer care, by U.S. News and World Report.
One of a select few centers to hold agreements with the NCI to conduct Phase I and Phase II clinical trials, where innovative therapies are first evaluated in patients.
![Page 14: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/14.jpg)
Department of Biostatistics
Created by the School of Medicine at Vanderbilt University in September 2003.
The Dean and other senior medical school faculty are committed to providing outstanding collaborative support in biostatistics to clinical and basic scientists, and to developing a graduate program in biostatistics that will train outstanding collaborative scientists and focus on the methods of modern applied statistics.
![Page 15: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/15.jpg)
High Dimensional Data
The major challenge in high-throughput experiments (e.g., microarray, MALDI-TOF, SELDI-TOF, or shotgun proteomic data) is that the data are often high dimensional.
When the number of dimensions reaches thousands or more, the computational time for pattern recognition algorithms can become unreasonable. This is especially problematic when some of the features are not discriminatory.
![Page 16: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/16.jpg)
High Dimensional Data
Irrelevant features may reduce the accuracy of some algorithms. For example (Witten 1999), experiments with a decision tree classifier have shown that adding a random binary feature to standard datasets can degrade classification performance by 5-10%.
Furthermore, in many pattern recognition tasks, the number of features represents the dimension of a search space: the larger the number of features, the greater the dimension of the search space, and the harder the problem.
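The effect of irrelevant features can be illustrated with a small synthetic experiment. This is a sketch under stated assumptions, not the experiment cited on the slide: it swaps in a 1-nearest-neighbour classifier for the decision tree, uses Gaussian noise features rather than a random binary feature, and all data, sizes, and function names are hypothetical.

```python
import random

def make_dataset(n, n_noise, rng):
    """n samples with one informative feature plus n_noise pure-noise features."""
    data = []
    for i in range(n):
        label = i % 2
        informative = rng.gauss(3.0 * label, 1.0)  # class means 0 and 3 (assumed)
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        data.append(([informative] + noise, label))
    return data

def one_nn_accuracy(train, test):
    """Accuracy of a 1-nearest-neighbour classifier (squared Euclidean distance)."""
    correct = 0
    for x, y in test:
        nearest = min(train, key=lambda s: sum((a - b) ** 2 for a, b in zip(x, s[0])))
        correct += nearest[1] == y
    return correct / len(test)

def accuracy_with_noise(n_noise, seed=0):
    """Train and test on fresh synthetic data with the given number of noise features."""
    rng = random.Random(seed)
    train = make_dataset(100, n_noise, rng)
    test = make_dataset(100, n_noise, rng)
    return one_nn_accuracy(train, test)

acc_clean = accuracy_with_noise(0)
acc_noisy = accuracy_with_noise(300)
print(f"no noise features: {acc_clean:.2f}; 300 noise features: {acc_noisy:.2f}")
```

The informative feature is identical in both runs; only the number of non-discriminatory dimensions changes, so any drop in accuracy comes from the added noise dimensions dominating the distance.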
![Page 17: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/17.jpg)
Outcome Measurement: MALDI-TOF
![Page 18: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/18.jpg)
Reflex MALDI TOF Mass Spectrometer
[Figure: instrument schematic. Labeled components: nitrogen laser (337 nm), laser optics, MALDI target, ion grid, TOF analyzer, ion mirror, microchannel detector.]
![Page 19: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/19.jpg)
Time-of-Flight Mass Spectrometry (TOF-MS)
[Figure: linear TOF schematic. An ionizing probe marks the start; ions of masses M1, M2, M3 are accelerated by a potential +/- U and reach the ion detector (MCP) at times t1, t2, t3, giving ion signals plotted against time or M.]
$$t = a\sqrt{M} + b$$
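As a worked illustration of the time-to-mass conversion, the sketch below assumes the usual linear-TOF calibration form $t = a\sqrt{M} + b$ and fits the two constants from two calibrant peaks. The calibrant values and function names are hypothetical.

```python
import math

def fit_tof_calibration(t1, m1, t2, m2):
    """Solve t = a*sqrt(M) + b for (a, b) from two calibrant peaks."""
    a = (t2 - t1) / (math.sqrt(m2) - math.sqrt(m1))
    b = t1 - a * math.sqrt(m1)
    return a, b

def time_to_mass(t, a, b):
    """Invert t = a*sqrt(M) + b to recover the mass for a flight time t."""
    return ((t - b) / a) ** 2

# Hypothetical calibrants: (flight time in microseconds, known mass in Da).
a, b = fit_tof_calibration(20.0, 1000.0, 40.0, 4000.0)
print(time_to_mass(30.0, a, b))
```

Because mass grows with the square of flight time, a small timing error produces a larger mass error at high m/z, which is one reason calibration is handled carefully in preprocessing.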
![Page 20: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/20.jpg)
Issues in the Analysis of High-Throughput Experiments
Experiment Design
Measurement
Preprocessing
♦ Baseline Correction, Normalization
♦ Profile Alignment, Feature Selection, Denoising
Classification
Feature Selection
QCA (Quality Control Assessment)
![Page 21: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/21.jpg)
Issues in the Analysis of High-Throughput Experiments
Computational Validation
♦ Estimate the classification error rate
♦ Bootstrapping, k-fold validation, leave-one-out validation
Validation: blind test cohort
Significance Testing of the Achieved Classification Error
Reporting the result: graphic & table
Validation: laboratory technology (e.g., RT-PCR), pathway analysis
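The k-fold and leave-one-out error estimates listed above can be sketched as follows. The nearest-centroid classifier is only a stand-in chosen to keep the example short; the slides do not name a classifier, and all function names here are hypothetical.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def nearest_centroid_fit(X, y):
    """Return {label: mean feature vector} computed from the training data."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}

def nearest_centroid_predict(model, x):
    """Predict the label whose centroid is closest to x."""
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, model[lab])))

def cv_error(X, y, k):
    """k-fold cross-validated error rate; k == len(X) gives leave-one-out."""
    errors = 0
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        train_X = [x for i, x in enumerate(X) if i not in held_out]
        train_y = [t for i, t in enumerate(y) if i not in held_out]
        model = nearest_centroid_fit(train_X, train_y)  # fit only on training folds
        errors += sum(nearest_centroid_predict(model, X[i]) != y[i] for i in fold)
    return errors / len(X)
```

Any data-dependent step, feature selection included, belongs inside the loop on the training folds only; performing it on the full dataset beforehand makes the estimated error optimistic.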
![Page 22: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/22.jpg)
Preprocessing
Mass Spectrometry (MS) can generate high-throughput protein profiles for biomedical applications. A consistent, sensitive, and robust MS data preprocessing method is highly desirable because subsequent analyses are determined by the preprocessing output.
The preprocessing goal is to extract and quantify the common features across the spectra.
We propose a new comprehensive MALDI-TOF MS data preprocessing method using feedback concepts associated with several new algorithms.
This new package successfully resolves many conventional difficulties, such as removing m/z measurement error, objectively setting denoising parameters, and defining common features across spectra.
![Page 23: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/23.jpg)
Math Model for MS Data Preprocessing
From a mathematical point of view, an MS spectrum is a signal function defined on a time or m/z domain. An observed MS signal is often modeled as the superposition of three components:
$$f(x) = B(x) + N \cdot S(x) + e(x),$$
where f(x) is the observed signal, B(x) is a slowly varying “baseline” artifact, S(x) is the “true” signal (peaks) to be extracted, N is the normalization factor, and e(x) represents noise.
Basic Descriptions of the Data Preprocessing
Registration, Denoising, Baseline correction, Normalization, Peak selection, Peak alignment or Binning
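To make the additive model concrete, the following sketch simulates one spectrum as baseline plus scaled true signal plus noise, i.e. f(x) = B(x) + N·S(x) + e(x). The baseline shape, peak positions, heights, widths, and noise level are all hypothetical choices for illustration.

```python
import math
import random

def simulate_spectrum(n=500, norm_factor=2.0, noise_sd=0.05, seed=0):
    """Return (xs, f, B, S, e) with f[i] = B[i] + norm_factor * S[i] + e[i]."""
    rng = random.Random(seed)
    xs = list(range(n))
    peaks = [(120, 1.0, 4.0), (300, 0.6, 6.0)]  # hypothetical (center, height, width)
    B = [0.5 * math.exp(-x / 200.0) for x in xs]  # slowly varying baseline
    S = [sum(h * math.exp(-0.5 * ((x - c) / w) ** 2) for c, h, w in peaks)
         for x in xs]                             # "true" peak signal
    e = [rng.gauss(0.0, noise_sd) for _ in xs]    # additive noise
    f = [B[i] + norm_factor * S[i] + e[i] for i in xs]
    return xs, f, B, S, e

xs, f, B, S, e = simulate_spectrum()
```

Preprocessing can then be read as inverting this model: estimate and subtract B, suppress e, and divide out N so that S is comparable across spectra.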
![Page 24: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/24.jpg)
Math Model for MS Data Preprocessing
The preprocessing goal is to identify, quantify, and match peaks across spectra.
Several modern algorithms, such as wavelets, splines, and the nonparametric local maximum likelihood estimate (NLMLE), are successfully applied to the whole processing system.
Feedback optimizes the calibration and peak-picking procedures automatically.
![Page 25: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/25.jpg)
Raw data
![Page 26: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/26.jpg)
General steps
(1) Calibration: calibration based on multiple identified peaks (linear shifts in the time domain) and the shape of the peak (convolution); in the process, all spectra are aligned.
(2) Quantification: Baseline Correction (splines) => Normalization (TIC) => area-based peak quantification method.
(3) Feature Extraction: Denoising (wavelets) => Peak Selection (local maximum) => common peak finding across spectra (NLMLE).
(4) Feedback: optimally choosing calibration peaks and setting feature extraction parameters.
![Page 27: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/27.jpg)
Flowchart of the Preprocessing Procedure
[Figure: flowchart. Nodes: Raw data, De-noising, Peak Detection, Peak Distribution, Baseline Correction, Normalization, Calibration, Alignment, Common Feature Detection, Results.]
![Page 28: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/28.jpg)
Convolution Based Calibration Algorithm
1. Simulate known peaks (choose, by feedback, peaks with a clear pattern and high prevalence across spectra, 80%).
2. Convolve each spectrum with the simulated known peak (Gaussian or Beta). The maximum occurs where the two peak shapes match best.
3. The linear shift that best matches multiple peaks is the optimal shift.
Note: all processing is done in the time domain.
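A minimal sketch of the shift search described in these steps, assuming a Gaussian template and integer shifts in sample units. For a symmetric template, convolution and cross-correlation coincide, so the code computes a sliding correlation. The function names, template width, and search range are hypothetical, and the feedback-driven choice of reference peaks is not modeled.

```python
import math

def gaussian_template(width, sigma):
    """Symmetric Gaussian peak shape sampled on `width` points."""
    half = width // 2
    return [math.exp(-0.5 * ((i - half) / sigma) ** 2) for i in range(width)]

def match_score(signal, template, center):
    """Correlation of the template with the signal, template centered at `center`."""
    half = len(template) // 2
    total = 0.0
    for j, w in enumerate(template):
        i = center - half + j
        if 0 <= i < len(signal):
            total += w * signal[i]
    return total

def best_linear_shift(signal, expected_centers, template, max_shift):
    """Integer shift that maximizes the summed match over all reference peaks."""
    def total_score(shift):
        return sum(match_score(signal, template, c + shift) for c in expected_centers)
    return max(range(-max_shift, max_shift + 1), key=total_score)
```

Summing the score over several reference peaks is what makes the estimate robust: a single peak can be matched by a neighbouring feature, but one shared shift must fit all of them at once.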
![Page 29: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/29.jpg)
Pre-Calibration
![Page 30: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/30.jpg)
Post-Calibration
1. Accurate m/z peak positions (matching the theoretical values)
2. Less variation in peak positions
3. Large datasets are easy to handle in batch mode
![Page 31: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/31.jpg)
Pre-Calibration
![Page 32: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/32.jpg)
Post-Calibration
![Page 33: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/33.jpg)
Baseline Correction & Normalization
The baseline is generally considered an artificial bias of the signal.
We propose that the baseline might be caused by delayed charge release.
We apply quadratic splines to the local minima, using sliding windows, to obtain a continuous baseline curve.
Trimmed total ion current (TIC) normalization.
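The baseline and normalization steps can be sketched as below. Two simplifications are assumed and should be read as such: the curve through the window minima is piecewise linear rather than a quadratic spline, and the normalization is plain TIC without the trimming, whose exact rule the slides do not give. Function names and the window size are hypothetical.

```python
def window_minima(y, window):
    """(index, value) of the minimum in each consecutive window."""
    anchors = []
    for start in range(0, len(y), window):
        chunk = y[start:start + window]
        j = min(range(len(chunk)), key=chunk.__getitem__)
        anchors.append((start + j, chunk[j]))
    return anchors

def interpolate_baseline(n, anchors):
    """Piecewise-linear curve through the anchors, held flat beyond the ends."""
    baseline = [0.0] * n
    first_i, first_v = anchors[0]
    last_i, last_v = anchors[-1]
    for i in range(n):
        if i <= first_i:
            baseline[i] = first_v
        elif i >= last_i:
            baseline[i] = last_v
    for (i0, v0), (i1, v1) in zip(anchors, anchors[1:]):
        for i in range(i0, i1 + 1):
            baseline[i] = v0 + (v1 - v0) * (i - i0) / (i1 - i0)
    return baseline

def subtract_baseline(y, window=50):
    """Estimate the baseline from window minima and subtract it (clipped at 0)."""
    baseline = interpolate_baseline(len(y), window_minima(y, window))
    return [max(v - b, 0.0) for v, b in zip(y, baseline)]

def tic_normalize(y):
    """Scale intensities so that they sum to 1 (total ion current = 1)."""
    total = sum(y)
    return [v / total for v in y] if total > 0 else list(y)
```

The window must be wider than the broadest peak; otherwise a window can sit entirely on a peak and its minimum pulls the baseline up into the signal.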
![Page 34: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/34.jpg)
Baseline Data Before Correction
![Page 35: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/35.jpg)
Baseline Corrected Data
![Page 36: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/36.jpg)
Wavelets Denoising
Wavelets underlie the FBI's image coding standard for digitized fingerprints and are successful at reproducing the true signal by removing noise at specific energy levels.
Wavelet methods have been used to denoise signals in a wide variety of contexts.
The wavelet method analyzes the data in both the time and frequency domains to extract more useful information.
An adaptive stationary discrete wavelet denoising method, which is shift-invariant and efficient in denoising, is applied in our research.
$$f(t) = \sum_{j \in Z} \sum_{k \in Z} c(j,k)\, \psi_{j,k}(t)$$
$$c(j,k) = \int f(t)\, \psi_{j,k}(t)\, dt$$
![Page 37: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/37.jpg)
Denoising strategy
The stationary discrete wavelet denoising method is shift-invariant and offers both good reconstruction performance and smoothness.
The adaptive denoising method is based on the noise distribution: we set different threshold values at different mass intervals and frequency levels.
Parameters (decomposition and thresholds) are determined by the feedback information.
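The idea of stationary (undecimated) wavelet denoising can be sketched with a one-level Haar transform and soft thresholding. This is a deliberately reduced illustration, not the adaptive method described on the slide: it uses a single decomposition level, one global threshold (the universal threshold with a median-based noise estimate) instead of thresholds per mass interval and level, and a circular boundary. All names are hypothetical.

```python
import math
import statistics

def haar_swt(x):
    """One-level undecimated Haar transform with circular boundary."""
    n = len(x)
    approx = [(x[i] + x[(i + 1) % n]) / 2 for i in range(n)]
    detail = [(x[i] - x[(i + 1) % n]) / 2 for i in range(n)]
    return approx, detail

def haar_iswt(approx, detail):
    """Inverse transform: average the two redundant reconstructions of each sample."""
    n = len(approx)
    return [((approx[i] + detail[i]) + (approx[i - 1] - detail[i - 1])) / 2
            for i in range(n)]

def soft_threshold(values, threshold):
    """Shrink each coefficient toward zero by `threshold`."""
    return [math.copysign(max(abs(v) - threshold, 0.0), v) for v in values]

def universal_threshold(detail):
    """sigma * sqrt(2 ln n), with sigma estimated from the median absolute detail."""
    sigma = statistics.median([abs(d) for d in detail]) / 0.6745
    return sigma * math.sqrt(2 * math.log(len(detail)))

def denoise(x, threshold=None):
    """Threshold the detail coefficients and reconstruct the signal."""
    approx, detail = haar_swt(x)
    if threshold is None:
        threshold = universal_threshold(detail)
    return haar_iswt(approx, soft_threshold(detail, threshold))
```

Because no downsampling takes place, shifting the input by one sample shifts the output by one sample, which is the shift-invariance property the slide refers to.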
![Page 38: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/38.jpg)
DWT Decomposition
![Page 39: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/39.jpg)
Denoised Data
![Page 40: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/40.jpg)
Peak list across spectra
![Page 41: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/41.jpg)
Kernel Density Estimation
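One way kernel density estimation can be used on a pooled peak list is sketched below: peak positions from all spectra are smoothed with a Gaussian kernel, and the modes of the estimated density are taken as candidate common peaks. This is an assumption about how the technique applies here, not the NLMLE procedure itself; the bandwidth, grid step, and function names are hypothetical.

```python
import math

def gaussian_kde(points, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)
    return density

def density_modes(points, bandwidth, grid_step):
    """Grid locations where the estimated density has a local maximum."""
    density = gaussian_kde(points, bandwidth)
    lo = min(points) - 3 * bandwidth
    hi = max(points) + 3 * bandwidth
    n = int((hi - lo) / grid_step) + 1
    grid = [lo + i * grid_step for i in range(n)]
    values = [density(g) for g in grid]
    return [grid[i] for i in range(1, n - 1)
            if values[i] > values[i - 1] and values[i] >= values[i + 1]]
```

A narrow, well-separated set of modes suggests that calibration and alignment worked; broad or merged modes suggest residual m/z error across spectra.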
![Page 42: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/42.jpg)
Peak distribution without high-quality preprocessing
![Page 43: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/43.jpg)
Peak distribution with high-quality preprocessing
![Page 44: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/44.jpg)
Peak Selection
![Page 45: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/45.jpg)
Peak Selection
![Page 46: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/46.jpg)
Preprocessing on one spectrum after calibration
1. Read in the spectrum as two columns: m/z values and corresponding intensities.
2. Apply the Adaptive Stationary Discrete Wavelet Transform for denoising.
3. Estimate the baseline with sliding-window splines and subtract it. Apply Total Ion Current normalization across the whole spectrum.
4. Local maxima contribute to the peak list across spectra.
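Step 4 can be sketched as a simple local-maximum scan followed by pooling the peak positions of all spectra. The `min_height` filter and the function names are hypothetical additions for illustration; the spectra are assumed to be already denoised, baseline-corrected, and normalized.

```python
def local_maxima(mz, intensity, min_height=0.0):
    """Return (m/z, intensity) pairs at interior local maxima above min_height."""
    peaks = []
    for i in range(1, len(intensity) - 1):
        if (intensity[i] > intensity[i - 1]
                and intensity[i] >= intensity[i + 1]
                and intensity[i] >= min_height):
            peaks.append((mz[i], intensity[i]))
    return peaks

def pooled_peak_list(spectra, min_height=0.0):
    """Sorted m/z positions of the peaks found in every (mz, intensity) spectrum."""
    pooled = []
    for mz, intensity in spectra:
        pooled.extend(pos for pos, _ in local_maxima(mz, intensity, min_height))
    return sorted(pooled)
```

The pooled list is the input for finding common peaks across spectra: positions that recur in many spectra cluster tightly, while noise-driven maxima are scattered.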
![Page 47: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/47.jpg)
![Page 48: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/48.jpg)
![Page 49: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/49.jpg)
Expression Profiles
[Figure: expression profiles for day1, day2, day3, and day4.]
![Page 50: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/50.jpg)
The Results from the Cluster Analysis
![Page 51: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/51.jpg)
Why?
[Figure: plot with axes labeled Day and Laser Power.]
![Page 52: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/52.jpg)
Quality Control Assessment - Reproducibility
Intra-class Correlation Coefficient (ICC)
Intra / (Intra + Inter)
Coefficient of Variation (CV)
SD / Mean
Goal: make sure the data are reproducible! An SOP is a necessary component.
Variance Component Analysis
Mixed/Random Effect Model.
The model: investigators, day, spot, machine, lab, etc.
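Both reproducibility measures are short computations. The sketch below assumes the common one-way random-effects ICC estimated from ANOVA mean squares, with equally sized groups of replicate measurements (for example, repeated spots per sample), and the sample standard deviation for the CV. The function names and the grouping are hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)

def icc_oneway(groups):
    """One-way random-effects ICC for equally sized groups of replicates."""
    n = len(groups)        # number of groups (e.g. samples)
    k = len(groups[0])     # replicates per group
    all_values = [v for g in groups for v in g]
    grand = statistics.mean(all_values)
    means = [statistics.mean(g) for g in groups]
    ms_between = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    ms_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

An ICC near 1 means replicate measurements of the same sample agree closely relative to the differences between samples; a full variance component analysis extends the same idea to several factors such as day, spot, and machine.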
![Page 53: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/53.jpg)
Sources of Variability for MALDI-TOF Data
Specimen Collection/Handling Effects
- Tumor: surgical related effects
- Cell Line: culture condition
Biological Heterogeneity in Specimen
Biological Heterogeneity in Population
Laser power variation
![Page 54: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/54.jpg)
Table IV. Power = 80%, Type I error = 5%
Columns are grouped by intra-case variance (1.0, 0.5, 0.2); within each group, sub-columns give the inter-case variance (1.0, 0.5, 0.2).

| Subsample number (m) | Intra 1.0, Inter 1.0 | Intra 1.0, Inter 0.5 | Intra 1.0, Inter 0.2 | Intra 0.5, Inter 1.0 | Intra 0.5, Inter 0.5 | Intra 0.5, Inter 0.2 | Intra 0.2, Inter 1.0 | Intra 0.2, Inter 0.5 | Intra 0.2, Inter 0.2 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 32 | 24 | 19 | 24 | 16 | 11 | 19 | 11 | 6 |
| 5 | 19 | 11 | 7 | 17 | 10 | 5 | 17 | 9 | 4 |
| 20 | 17 | 9 | 4 | 16 | 9 | 4 | 16 | 8 | 4 |
![Page 55: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/55.jpg)
CV in different days
![Page 56: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/56.jpg)
ICC
![Page 57: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/57.jpg)
Variance Components Analysis
![Page 58: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/58.jpg)
Variance Component Analysis
Tumor
![Page 59: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/59.jpg)
Things NOT to Do
Use fold-change for feature selection
Use cluster analysis for class comparison or class prediction
Ignore over-fitting issues
Report only the good news
Use an extremely small sample size for the independent test cohort
![Page 60: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/60.jpg)
Agulnik, M. et al. J Clin Oncol; 25:2184-2190 2007
Multidimensional scaling (MDS)
![Page 61: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/61.jpg)
Multidimensional scaling (MDS)
![Page 62: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/62.jpg)
Acknowledgement
Preprocessing: Dr. Dean Billheimer, Dr. Ming Li, Dr. Dong Hong, Shuo Chen, Huiming Li
Additional Acknowledgements: Bashar Shakhtour, Dr. William Wu, Dr. Bonnie LeFure
Analysis: Jeremy Roberts, Will Gray, Nimish Gautam, Joan Zhang, Haojie Wu
Dr. Heidi Chen, Dr. Jonathan Xu, Dr. Tatsuki Koyama
![Page 63: Yu Shyr ( 石 瑜 ), Ph.D. May 14, 2008 China Medical University Yu.Shyr@vanderbilt](https://reader030.vdocuments.site/reader030/viewer/2022032708/56812b99550346895d8fbcac/html5/thumbnails/63.jpg)
END