analysis of proteomics data using maldiquant of proteomics data using maldiquant sebastian gibb...
TRANSCRIPT
Analysis of Proteomics Data using MALDIquant
Sebastian GibbInstitute for Medical Informatics, Statistics and Epidemiology (IMISE)
University of Leipzig
17. August 2011
Sebastian Gibb, MALDIquant, 2011-08-17 1
Table of Contents
1 IntroductionProteomicsMass Spectrometry
2 Analysis of Proteomics Data using MALDIquantDesignSingle Spectrum WorkflowPreview
Sebastian Gibb, MALDIquant, 2011-08-17 2
Predicting disease
Sebastian Gibb, MALDIquant, 2011-08-17 3
Proteomics
Proteomics
• Study of the entirety of proteins produced by an organism.
• Foci: identification, structure determination, biomarker,pathways, expression.
Sebastian Gibb, MALDIquant, 2011-08-17 4
Mass Spectrometry
Ion Source: MALDI
Matrix-Assisted LaserDesorption/Ionization
Mass Analyzer: TOF
Time Of Flight (t ∝√
mq )
Detector
QuantityMeasurement
Abb. 3.14.; S. 67; “Biochemie & Pathobiochemie”, Loffler G., 8. Auflage (2007), Springer Medizin Verlag
Sebastian Gibb, MALDIquant, 2011-08-17 5
MALDI-TOF Example Spectrum
2000 4000 6000 8000 10000
050
0010
000
1500
020
000
2500
030
000
mass
inte
nsity
Sebastian Gibb, MALDIquant, 2011-08-17 6
MALDIquantMotivation
• Only relatively few open source software solutions availableand very few for the R platform.
• No MALDI-TOF package fitting our needs for clinicaldiagnostics.
• Necessity of handling both technical and biologicalreplicates.
• Unsatisfying quantification of relative intensities(total-ion-current, 0/1)
• Investigation of impact of calibration of spectra on clinicalprognosis.
• Modular and easy to customize analysis routines.
Sebastian Gibb, MALDIquant, 2011-08-17 7
MALDIquantThe UNIX philosophy: do one thing and do it well McIlroy (1978)
MALDIquant
single spectrum multiple spectra
peak alignmentcalibration/
normalization
smoothing
baseline correction
peak detection
raw data
readBrukerFlexData
...
readMzXmlData
classification
sda
randomForest
...
pamr
smoothing
baseline correction
peak detection
Sebastian Gibb, MALDIquant, 2011-08-17 8
MALDIquantThe UNIX philosophy: do one thing and do it well McIlroy (1978)
⇒
f1,1 f1,2 f1,3 . . . f1,n
f2,1 f2,2 f2,3 . . . f2,n
. . . . . . . . . . . . . . .fm,1 fm,2 fm,3 . . . fm,n
Sebastian Gibb, MALDIquant, 2011-08-17 9
MALDIquantStructure
object-oriented, S4
AbstractMassObject+mass: vector+intensity: vector+metaData: list
+as.matrix()+intensity()+intensityMatrix()+isEmpty()+length()+lines()+mass()+metaData()+plot()+points()+transformIntensity()
MassSpectrum
+detectPeaks()+estimateBaseline()+estimateNoise()+findLocalMaxima()+removeBaseline()
MassPeaks
+labelPeaks()
Sebastian Gibb, MALDIquant, 2011-08-17 10
Example Data
Serum Peptidome Profiling Revealed Platelet Factor 4 as aPotential Discriminating Peptide Associated withPancreatic Cancer
G.M. Fiedler, A.B. Leichtle, J. Kase et alClin Cancer Res June 1, 2009 15:3812-3819
“Two significant peaks ( m/z 3884; 5959) achieved asensitivity of 86.3% and a specificity of 97.6% for thediscrimination of patients and healthy controls . . . ”
“MALDI-TOF MS-based serum peptidome profilingallowed the discovery and validation ofplatelet factor 4 [m/z 3884, 7767; S.G.] as a newdiscriminating marker in pancreatic cancer.”
Sebastian Gibb, MALDIquant, 2011-08-17 11
MALDIquanthands-on: File Import
> library("MALDIquant")
> library("readBrukerFlexData")
> spectra <- mqReadBrukerFlex("/data/fiedler2009/")
> length(spectra)
[1] 480
> spectra[[1]]
S4 class type : MassSpectrum
Number of m/z values : 42388
Range of m/z values : 1000.015 - 9999.734
Range of intensity values: 5 - 101840
File : /data/fiedler2009/[...]/Pankreas_HB_L_061019_G10/0_m19/1/1SLin/fid
other possibilities:
library("MALDIquant")
library("readMzXmlData")
s <- mqReadMzXml("/data/exampleMS/spectrum.mzXML")
library("MALDIquant")
s <- createMassSpectrum(mass=1:5,
intensity=runif(5),
metaData=list(name="example"))
Sebastian Gibb, MALDIquant, 2011-08-17 12
MALDIquanthands-on: plot
> plot(spectra[[1]])
> abline(v=3884, col="blue")
2000 4000 6000 8000 10000
050
0010
000
1500
020
000
mass
inte
nsity
Sebastian Gibb, MALDIquant, 2011-08-17 13
MALDIquanthands-on: plot m/z 3884
> plot(spectra[[1]], xlim=c(3800, 4500))
> abline(v=3884, col="blue")
3800 3900 4000 4100 4200 4300 4400 4500
050
0010
000
1500
020
000
mass
inte
nsity
Sebastian Gibb, MALDIquant, 2011-08-17 14
MALDIquanthands-on: Variance Stabilization and Smoothing
> spectra <- lapply(spectra, transformIntensity, fun=sqrt)
> movAvg <- function(y) {return(filter(y, rep(1, 5)/5, sides=2));}
> spectra <- lapply(spectra, transformIntensity, fun=movAvg)
3800 3900 4000 4100 4200 4300 4400 4500
050
100
150
mass
inte
nsity
Sebastian Gibb, MALDIquant, 2011-08-17 15
MALDIquanthands-on: Baseline Correction – don’t harm the data
> bl <- estimateBaseline(spectra[[1]], method="Median")
> plot(spectra[[1]], xlim=c(3800, 4500));
> lines(bl, col="red");
3800 3900 4000 4100 4200 4300 4400 4500
050
100
150
mass
inte
nsity
Sebastian Gibb, MALDIquant, 2011-08-17 16
MALDIquanthands-on: Baseline Correction – SNIP
> bl <- estimateBaseline(spectra[[1]], method="SNIP")
> plot(spectra[[1]], xlim=c(3800, 4500)); lines(bl, col="red");
3800 3900 4000 4100 4200 4300 4400 4500
050
100
150
mass
inte
nsity
C. G. Ryan, E. Clayton, W. L. Griffin, S. H. Sie, and D. R. Cousens. SNIP, a statistics-sensitive background treatment for thequantitative analysis of PIXE spectra in geoscience applications. Nucl. Instrument. Meth. B, 34:396–402, 1988
Sebastian Gibb, MALDIquant, 2011-08-17 17
MALDIquanthands-on: Baseline Correction – SNIP
> spectra <- lapply(spectra, removeBaseline)
> lines(spectra[[1]], col="blue")
3800 3900 4000 4100 4200 4300 4400 4500
050
100
150
mass
inte
nsity
C. G. Ryan, E. Clayton, W. L. Griffin, S. H. Sie, and D. R. Cousens. SNIP, a statistics-sensitive background treatment for thequantitative analysis of PIXE spectra in geoscience applications. Nucl. Instrument. Meth. B, 34:396–402, 1988
Sebastian Gibb, MALDIquant, 2011-08-17 17
MALDIquanthands-on: Baseline Correction – SNIP
2000 4000 6000 8000 10000
050
100
150
mass
inte
nsity
Sebastian Gibb, MALDIquant, 2011-08-17 18
MALDIquanthands-on: Peak Detection
> spectra[[1]]
S4 class type : MassSpectrum
Number of m/z values : 42388
Range of m/z values : 1000.015 - 9999.734
Range of intensity values: 0 - 709.207
File : /data/fiedler2009/[...]/Pankreas_HB_L_061019_G10/0_m19/1/1SLin/fid
> peaks <- lapply(spectra, detectPeaks)
> peaks[[1]]
S4 class type : MassPeaks
Number of m/z values : 198
Range of m/z values : 1011.059 - 9423.422
Range of intensity values: 19.273 - 709.207
File : /data/fiedler2009/[...]/Pankreas_HB_L_061019_G10/0_m19/1/1SLin/fid
Sebastian Gibb, MALDIquant, 2011-08-17 19
MALDIquanthands-on: Peak Detection
> plot(spectra[[1]], xlim=c(3800, 4500)); points(peaks[[1]], col="green")
> top5 <- intensity(p) %in% sort(intensity(p)[mass(p)>3800 & mass(p)<4500], decreasing=TRUE)[1:5]
> labelPeaks(peaks[[1]], index=top5); labelPeaks(peak[[1]], mass=3884, col="blue")
3800 3900 4000 4100 4200 4300 4400 4500
020
4060
8010
0
mass
inte
nsity
●●● ●●●●● ●●
●●● ●●●
●
●●●●●
●●●●● ●●
●
●
●
●● ●●
●●
●
●
● ●
●
●●●● ●
● ●
●
●● ●
●
●● ●●●
●
●●
●
●
●
●● ●●
●●
●
●
● ●
●
●●
●
4052.944090.24193.806
4209.072
4265.5643882.052
accepted maximarejected maximanoise threshold
Sebastian Gibb, MALDIquant, 2011-08-17 20
MALDIquantSingle Spectrum Workflow
> library("MALDIquant")
> library("readBrukerFlexData")
> spectra <- mqReadBrukerFlex("/data/fiedler2009/")
> movAvg <- function(y) {return(filter(y, rep(1, 5)/5, sides=2));}
> spectra <- lapply(spectra, transformIntensity, fun=sqrt)
> spectra <- lapply(spectra, transformIntensity, fun=movAvg)
> spectra <- lapply(spectra, removeBaseline)
> peaks <- lapply(spectra, detectPeaks)
Sebastian Gibb, MALDIquant, 2011-08-17 21
MALDIquantComparison of Multiple Spectra
3850 3900 3950 4000 4050 4100
010
2030
40
mass
inte
nsity 3882.052
4052.94
4069.799
4090.2
3880.425
4050.8854068.133
4088.301
spectrum 1 (control)spectrum 401 (cancer)
Sebastian Gibb, MALDIquant, 2011-08-17 22
MALDIquantConclusion
MALDIquant is free software (GPLv3).
Currently available:
• Easy to use and full established single spectrum workflow.
• Easy to customize.
• Easy data exchange.
Shortly available:
• Peak alignment.
• Calibration/Normalization routines.
Sebastian Gibb, MALDIquant, 2011-08-17 23
MALDIquantThanks
Alexander B. Leichtle, helpful discussions(Institute for Clinical Chemistry, Bern University Hospital)
Korbinian Strimmer, supervision(IMISE, University of Leipzig)
Thanks for your attention!
Download of MALDIquant software:http://strimmerlab.org/software/maldiquant/
Sebastian Gibb, MALDIquant, 2011-08-17 24