Analysis of Proteomics Data using MALDIquant
Sebastian Gibb
Institute for Medical Informatics, Statistics and Epidemiology (IMISE)
University of Leipzig
17 August 2011
Sebastian Gibb, MALDIquant, 2011-08-17 1
Table of Contents
1 Introduction
  • Proteomics
  • Mass Spectrometry
2 Analysis of Proteomics Data using MALDIquant
  • Design
  • Single Spectrum Workflow
  • Preview
Predicting disease
Proteomics
• Study of the entirety of proteins produced by an organism.
• Foci: identification, structure determination, biomarkers, pathways, expression.
Mass Spectrometry
• Ion Source: MALDI (Matrix-Assisted Laser Desorption/Ionization)
• Mass Analyzer: TOF (Time Of Flight, $t \propto \sqrt{m/q}$)
• Detector: Quantity Measurement
Fig. 3.14, p. 67; Löffler G., “Biochemie & Pathobiochemie”, 8th edition (2007), Springer Medizin Verlag
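The time-of-flight proportionality can be made explicit with a short derivation for an idealized linear TOF analyzer. The symbols $U$ (acceleration voltage), $L$ (length of the field-free drift tube) and $v$ (ion velocity) do not appear on the slide and are introduced here only for illustration; real instruments add corrections that this sketch ignores.

```latex
\begin{align}
  qU &= \tfrac{1}{2}\, m v^{2}
     \quad\Rightarrow\quad
     v = \sqrt{\frac{2qU}{m}}, \\
  t  &= \frac{L}{v}
      = L \sqrt{\frac{m}{2qU}}
      \;\propto\; \sqrt{\frac{m}{q}} .
\end{align}
```

Under these assumptions, an ion with four times the mass-to-charge ratio needs twice as long to reach the detector.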
MALDI-TOF Example Spectrum
[Figure: example MALDI-TOF spectrum; x-axis: mass (2000 to 10000), y-axis: intensity (0 to 30000)]
MALDIquant: Motivation
• Only relatively few open source software solutions are available, and very few for the R platform.
• No MALDI-TOF package fits our needs for clinical diagnostics.
• Both technical and biological replicates have to be handled.
• Unsatisfactory quantification of relative intensities (total-ion-current, 0/1).
• Investigation of the impact of spectra calibration on clinical prognosis.
• Modular and easy-to-customize analysis routines.
MALDIquant: The UNIX philosophy: “do one thing and do it well” (McIlroy, 1978)
• raw data: readBrukerFlexData, readMzXmlData, ...
• MALDIquant, single spectrum: smoothing, baseline correction, peak detection
• MALDIquant, multiple spectra: peak alignment, calibration/normalization
• classification: sda, randomForest, pamr, ...
⇒
$$
\begin{pmatrix}
f_{1,1} & f_{1,2} & f_{1,3} & \cdots & f_{1,n} \\
f_{2,1} & f_{2,2} & f_{2,3} & \cdots & f_{2,n} \\
\vdots  & \vdots  & \vdots  & \ddots & \vdots  \\
f_{m,1} & f_{m,2} & f_{m,3} & \cdots & f_{m,n}
\end{pmatrix}
$$
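The matrix above is presumably the hand-over point to the classification packages: one row per spectrum, one column per peak. A minimal base R sketch of how such a matrix could be assembled from peak lists follows; the peak lists, the intensities and the rule of filling missing peaks with 0 are illustrative assumptions, not MALDIquant's own procedure, and the masses are assumed to be aligned already.

```r
# Hypothetical peak lists for three spectra (masses assumed to be aligned).
peakLists <- list(
  s1 = data.frame(mass = c(3884, 4053, 5959), intensity = c(40, 12, 25)),
  s2 = data.frame(mass = c(3884, 5959),       intensity = c(10, 30)),
  s3 = data.frame(mass = c(4053, 5959, 7767), intensity = c(15, 22, 8))
)

# Columns: every mass that occurs in at least one spectrum, sorted.
features <- sort(unique(unlist(lapply(peakLists, function(p) p$mass))))

# Rows: one per spectrum; peaks that are absent are filled with 0
# (an assumption made for this sketch).
featureMatrix <- t(sapply(peakLists, function(p) {
  row <- numeric(length(features))
  row[match(p$mass, features)] <- p$intensity
  row
}))
colnames(featureMatrix) <- features

print(featureMatrix)
```

A matrix of this shape can be passed to most R classifiers together with a vector of class labels, one per row.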
MALDIquant: Structure
object-oriented, S4
• AbstractMassObject
  fields: mass: vector, intensity: vector, metaData: list
  methods: as.matrix(), intensity(), intensityMatrix(), isEmpty(), length(), lines(), mass(), metaData(), plot(), points(), transformIntensity()
• MassSpectrum (derived from AbstractMassObject)
  methods: detectPeaks(), estimateBaseline(), estimateNoise(), findLocalMaxima(), removeBaseline()
• MassPeaks (derived from AbstractMassObject)
  methods: labelPeaks()
Example Data
Serum Peptidome Profiling Revealed Platelet Factor 4 as a Potential Discriminating Peptide Associated with Pancreatic Cancer
G.M. Fiedler, A.B. Leichtle, J. Kase et al., Clin Cancer Res, June 1, 2009, 15:3812-3819
“Two significant peaks (m/z 3884; 5959) achieved a sensitivity of 86.3% and a specificity of 97.6% for the discrimination of patients and healthy controls . . . ”
“MALDI-TOF MS-based serum peptidome profiling allowed the discovery and validation of platelet factor 4 [m/z 3884, 7767; S.G.] as a new discriminating marker in pancreatic cancer.”
MALDIquant hands-on: File Import
> library("MALDIquant")
> library("readBrukerFlexData")
> spectra <- mqReadBrukerFlex("/data/fiedler2009/")
> length(spectra)
[1] 480
> spectra[[1]]
S4 class type : MassSpectrum
Number of m/z values : 42388
Range of m/z values : 1000.015 - 9999.734
Range of intensity values: 5 - 101840
File : /data/fiedler2009/[...]/Pankreas_HB_L_061019_G10/0_m19/1/1SLin/fid
Other possibilities:
> library("MALDIquant")
> library("readMzXmlData")
> s <- mqReadMzXml("/data/exampleMS/spectrum.mzXML")
> library("MALDIquant")
> s <- createMassSpectrum(mass=1:5,
+                         intensity=runif(5),
+                         metaData=list(name="example"))
MALDIquant hands-on: plot
> plot(spectra[[1]])
> abline(v=3884, col="blue")
[Figure: spectrum 1 with a blue vertical line at m/z 3884; x-axis: mass (2000 to 10000), y-axis: intensity (0 to 20000)]
MALDIquant hands-on: plot m/z 3884
> plot(spectra[[1]], xlim=c(3800, 4500))
> abline(v=3884, col="blue")
[Figure: spectrum 1 zoomed in, with a blue vertical line at m/z 3884; x-axis: mass (3800 to 4500), y-axis: intensity (0 to 20000)]
MALDIquant hands-on: Variance Stabilization and Smoothing
> spectra <- lapply(spectra, transformIntensity, fun=sqrt)
> movAvg <- function(y) {return(filter(y, rep(1, 5)/5, sides=2));}
> spectra <- lapply(spectra, transformIntensity, fun=movAvg)
[Figure: spectrum 1 after square root transformation and smoothing; x-axis: mass (3800 to 4500), y-axis: intensity (0 to 150)]
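The two transformations can be tried without any spectra at hand. The following base R sketch applies them to a synthetic signal; the signal itself (a Gaussian peak with Poisson-distributed counts) is an assumption chosen only to have count-like data, and `as.numeric()` is added so the result is a plain vector instead of a time series object.

```r
set.seed(1)

# Synthetic, count-like signal: one Gaussian peak on a constant offset.
x <- 1:200
y <- rpois(length(x), lambda = 50 + 400 * exp(-(x - 100)^2 / 50))

# Variance stabilization: for count-like data the variance grows with the
# mean; taking the square root makes it roughly constant.
ySqrt <- sqrt(y)

# Centred 5-point moving average. stats::filter is named explicitly because
# other packages (for example dplyr) mask filter().
movAvg <- function(y) as.numeric(stats::filter(y, rep(1, 5) / 5, sides = 2))
ySmooth <- movAvg(ySqrt)

# The first and last two points have no complete window and become NA.
print(head(ySmooth, 3))
print(tail(ySmooth, 3))
```

The NA values at both ends are a property of the centred filter; whether they matter depends on how the downstream steps treat missing intensities.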
MALDIquant hands-on: Baseline Correction – don’t harm the data
> bl <- estimateBaseline(spectra[[1]], method="Median")
> plot(spectra[[1]], xlim=c(3800, 4500));
> lines(bl, col="red");
[Figure: spectrum 1 with the median baseline estimate in red; x-axis: mass (3800 to 4500), y-axis: intensity (0 to 150)]
MALDIquant hands-on: Baseline Correction – SNIP
> bl <- estimateBaseline(spectra[[1]], method="SNIP")
> plot(spectra[[1]], xlim=c(3800, 4500)); lines(bl, col="red");
[Figure: spectrum 1 with the SNIP baseline estimate in red; x-axis: mass (3800 to 4500), y-axis: intensity (0 to 150)]
C. G. Ryan, E. Clayton, W. L. Griffin, S. H. Sie, and D. R. Cousens. SNIP, a statistics-sensitive background treatment for the quantitative analysis of PIXE spectra in geoscience applications. Nucl. Instrum. Meth. B, 34:396–402, 1988.
> spectra <- lapply(spectra, removeBaseline)
> lines(spectra[[1]], col="blue")
[Figure: spectrum 1 with the baseline-corrected spectrum overlaid in blue; x-axis: mass (3800 to 4500), y-axis: intensity (0 to 150)]
[Figure: spectrum 1 after baseline correction, full mass range; x-axis: mass (2000 to 10000), y-axis: intensity (0 to 150)]
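To illustrate the idea behind SNIP-style baseline estimation, here is a simplified base R sketch of the iterative clipping step. It is not MALDIquant's implementation and omits refinements found in published variants; the function name `snipBaseline`, the `iterations` parameter and the synthetic test signal are assumptions made for this example.

```r
# Simplified SNIP-style clipping: in iteration p every point is replaced by
# the smaller of itself and the mean of its neighbours p steps to the left
# and to the right. Narrow peaks are clipped away, the broad baseline stays.
snipBaseline <- function(y, iterations = 20) {
  n <- length(y)
  for (p in seq_len(iterations)) {
    if (2 * p >= n) break                 # window no longer fits the data
    i <- (p + 1):(n - p)                  # points that have both neighbours
    clipped <- (y[i - p] + y[i + p]) / 2  # uses values of the previous pass
    y[i] <- pmin(y[i], clipped)
  }
  y
}

# Synthetic test signal: decaying baseline plus one narrow peak of height 50.
x <- 1:200
trueBaseline <- 100 * exp(-x / 80)
y <- trueBaseline + 50 * exp(-(x - 100)^2 / 8)

bl <- snipBaseline(y, iterations = 20)
corrected <- y - bl

print(round(c(raw = y[100], baseline = bl[100], corrected = corrected[100]), 2))
```

The number of iterations controls the widest structure that still counts as a peak: it should exceed the half width of the broadest peak (in data points), otherwise part of that peak ends up in the baseline estimate.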
MALDIquant hands-on: Peak Detection
> spectra[[1]]
S4 class type : MassSpectrum
Number of m/z values : 42388
Range of m/z values : 1000.015 - 9999.734
Range of intensity values: 0 - 709.207
File : /data/fiedler2009/[...]/Pankreas_HB_L_061019_G10/0_m19/1/1SLin/fid
> peaks <- lapply(spectra, detectPeaks)
> peaks[[1]]
S4 class type : MassPeaks
Number of m/z values : 198
Range of m/z values : 1011.059 - 9423.422
Range of intensity values: 19.273 - 709.207
File : /data/fiedler2009/[...]/Pankreas_HB_L_061019_G10/0_m19/1/1SLin/fid
> plot(spectra[[1]], xlim=c(3800, 4500)); points(peaks[[1]], col="green")
> p <- peaks[[1]]  # assumed definition; p is not defined on the slide
> top5 <- intensity(p) %in% sort(intensity(p)[mass(p)>3800 & mass(p)<4500], decreasing=TRUE)[1:5]
> labelPeaks(peaks[[1]], index=top5); labelPeaks(peaks[[1]], mass=3884, col="blue")
[Figure: spectrum 1 with detected peaks; x-axis: mass (3800 to 4500), y-axis: intensity (0 to 100); legend: accepted maxima, rejected maxima, noise threshold; labelled peaks: 3882.052, 4052.94, 4090.2, 4193.806, 4209.072, 4265.564]
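The figure legend (accepted maxima, rejected maxima, noise threshold) suggests a two-step rule: find local maxima, then keep those above a noise-dependent threshold. A minimal base R sketch of such a rule follows. The function `detectPeaksSketch`, its parameters `halfWindowSize` and `snr`, the use of the median absolute deviation as noise estimate and the synthetic spectrum are all assumptions for illustration; the sketch does not claim to reproduce what `detectPeaks` does internally.

```r
detectPeaksSketch <- function(mass, intensity, halfWindowSize = 10, snr = 3) {
  n <- length(intensity)
  # Step 1: a point is a local maximum if it is the largest value within
  # halfWindowSize points to either side.
  isLocalMax <- vapply(seq_len(n), function(i) {
    window <- intensity[max(1, i - halfWindowSize):min(n, i + halfWindowSize)]
    intensity[i] == max(window)
  }, logical(1))
  # Step 2: noise level as median absolute deviation of all intensities;
  # maxima below snr * noise are rejected.
  noise <- stats::mad(intensity)
  accepted <- isLocalMax & intensity > snr * noise
  data.frame(mass = mass[accepted], intensity = intensity[accepted])
}

# Synthetic spectrum: two peaks plus a deterministic stand-in for noise,
# so that the example is reproducible.
mass <- seq(3800, 4500, by = 0.5)
noise <- 1 + sin(seq_along(mass))
intensity <- noise +
  80 * exp(-(mass - 3884)^2 / 8) +
  40 * exp(-(mass - 4090)^2 / 8)

peaks <- detectPeaksSketch(mass, intensity)
print(peaks)
```

With these settings only the two simulated peaks at 3884 and 4090 pass the threshold; the many small local maxima of the noise term are rejected.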
MALDIquant: Single Spectrum Workflow
> library("MALDIquant")
> library("readBrukerFlexData")
> spectra <- mqReadBrukerFlex("/data/fiedler2009/")
> movAvg <- function(y) {return(filter(y, rep(1, 5)/5, sides=2));}
> spectra <- lapply(spectra, transformIntensity, fun=sqrt)
> spectra <- lapply(spectra, transformIntensity, fun=movAvg)
> spectra <- lapply(spectra, removeBaseline)
> peaks <- lapply(spectra, detectPeaks)
MALDIquant: Comparison of Multiple Spectra
[Figure: overlay of spectrum 1 (control) and spectrum 401 (cancer); x-axis: mass (3850 to 4100), y-axis: intensity (0 to 40); labelled peaks of spectrum 1: 3882.052, 4052.94, 4069.799, 4090.2; labelled peaks of spectrum 401: 3880.425, 4050.885, 4068.133, 4088.301]
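The peak labels of the two spectra differ by a similar amount for every peak, which is presumably why peak alignment is needed before spectra can be compared. The short R calculation below quantifies the offset; pairing the peaks by rank order and assigning the first set of labels to spectrum 1 are assumptions (the latter matches the peaks reported for spectrum 1 earlier in the talk).

```r
# Peak labels taken from the comparison figure; the pairing is assumed.
control <- c(3882.052, 4052.940, 4069.799, 4090.200)  # spectrum 1
cancer  <- c(3880.425, 4050.885, 4068.133, 4088.301)  # spectrum 401

shiftDa  <- control - cancer          # absolute offset in Da
shiftPpm <- shiftDa / control * 1e6   # relative offset in parts per million

print(data.frame(control, cancer,
                 shiftDa  = round(shiftDa, 3),
                 shiftPpm = round(shiftPpm)))
```

Under this pairing every peak of spectrum 401 lies roughly 1.6 to 2.1 Da below its counterpart in spectrum 1, so matching peaks by exact mass would fail without a tolerance or a prior alignment step.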
MALDIquant: Conclusion
MALDIquant is free software (GPLv3).
Currently available:
• Easy-to-use, fully established single spectrum workflow.
• Easy to customize.
• Easy data exchange.
Available soon:
• Peak alignment.
• Calibration/normalization routines.
MALDIquant: Thanks
• Alexander B. Leichtle (Institute for Clinical Chemistry, Bern University Hospital): helpful discussions
• Korbinian Strimmer (IMISE, University of Leipzig): supervision
Thanks for your attention!
Download of MALDIquant software: http://strimmerlab.org/software/maldiquant/