analysis of proteomics data using maldiquant of proteomics data using maldiquant sebastian gibb...

Post on 01-May-2018

227 Views

Category:

Documents

4 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Analysis of Proteomics Data using MALDIquant

Sebastian GibbInstitute for Medical Informatics, Statistics and Epidemiology (IMISE)

University of Leipzig

17. August 2011

Sebastian Gibb, MALDIquant, 2011-08-17 1

Table of Contents

1 IntroductionProteomicsMass Spectrometry

2 Analysis of Proteomics Data using MALDIquantDesignSingle Spectrum WorkflowPreview

Sebastian Gibb, MALDIquant, 2011-08-17 2

Predicting disease

Sebastian Gibb, MALDIquant, 2011-08-17 3

Proteomics

Proteomics

• Study of the entirety of proteins produced by an organism.

• Foci: identification, structure determination, biomarker,pathways, expression.

Sebastian Gibb, MALDIquant, 2011-08-17 4

Mass Spectrometry

Ion Source: MALDI

Matrix-Assisted LaserDesorption/Ionization

Mass Analyzer: TOF

Time Of Flight (t ∝√

mq )

Detector

QuantityMeasurement

Abb. 3.14.; S. 67; “Biochemie & Pathobiochemie”, Loffler G., 8. Auflage (2007), Springer Medizin Verlag

Sebastian Gibb, MALDIquant, 2011-08-17 5

MALDI-TOF Example Spectrum

2000 4000 6000 8000 10000

050

0010

000

1500

020

000

2500

030

000

mass

inte

nsity

Sebastian Gibb, MALDIquant, 2011-08-17 6

MALDIquantMotivation

• Only relatively few open source software solutions availableand very few for the R platform.

• No MALDI-TOF package fitting our needs for clinicaldiagnostics.

• Necessity of handling both technical and biologicalreplicates.

• Unsatisfying quantification of relative intensities(total-ion-current, 0/1)

• Investigation of impact of calibration of spectra on clinicalprognosis.

• Modular and easy to customize analysis routines.

Sebastian Gibb, MALDIquant, 2011-08-17 7

MALDIquantThe UNIX philosophy: do one thing and do it well McIlroy (1978)

MALDIquant

single spectrum multiple spectra

peak alignmentcalibration/

normalization

smoothing

baseline correction

peak detection

raw data

readBrukerFlexData

...

readMzXmlData

classification

sda

randomForest

...

pamr

smoothing

baseline correction

peak detection

Sebastian Gibb, MALDIquant, 2011-08-17 8

MALDIquantThe UNIX philosophy: do one thing and do it well McIlroy (1978)

f1,1 f1,2 f1,3 . . . f1,n

f2,1 f2,2 f2,3 . . . f2,n

. . . . . . . . . . . . . . .fm,1 fm,2 fm,3 . . . fm,n

Sebastian Gibb, MALDIquant, 2011-08-17 9

MALDIquantStructure

object-oriented, S4

AbstractMassObject+mass: vector+intensity: vector+metaData: list

+as.matrix()+intensity()+intensityMatrix()+isEmpty()+length()+lines()+mass()+metaData()+plot()+points()+transformIntensity()

MassSpectrum

+detectPeaks()+estimateBaseline()+estimateNoise()+findLocalMaxima()+removeBaseline()

MassPeaks

+labelPeaks()

Sebastian Gibb, MALDIquant, 2011-08-17 10

Example Data

Serum Peptidome Profiling Revealed Platelet Factor 4 as aPotential Discriminating Peptide Associated withPancreatic Cancer

G.M. Fiedler, A.B. Leichtle, J. Kase et alClin Cancer Res June 1, 2009 15:3812-3819

“Two significant peaks ( m/z 3884; 5959) achieved asensitivity of 86.3% and a specificity of 97.6% for thediscrimination of patients and healthy controls . . . ”

“MALDI-TOF MS-based serum peptidome profilingallowed the discovery and validation ofplatelet factor 4 [m/z 3884, 7767; S.G.] as a newdiscriminating marker in pancreatic cancer.”

Sebastian Gibb, MALDIquant, 2011-08-17 11

MALDIquanthands-on: File Import

> library("MALDIquant")

> library("readBrukerFlexData")

> spectra <- mqReadBrukerFlex("/data/fiedler2009/")

> length(spectra)

[1] 480

> spectra[[1]]

S4 class type : MassSpectrum

Number of m/z values : 42388

Range of m/z values : 1000.015 - 9999.734

Range of intensity values: 5 - 101840

File : /data/fiedler2009/[...]/Pankreas_HB_L_061019_G10/0_m19/1/1SLin/fid

other possibilities:

library("MALDIquant")

library("readMzXmlData")

s <- mqReadMzXml("/data/exampleMS/spectrum.mzXML")

library("MALDIquant")

s <- createMassSpectrum(mass=1:5,

intensity=runif(5),

metaData=list(name="example"))

Sebastian Gibb, MALDIquant, 2011-08-17 12

MALDIquanthands-on: plot

> plot(spectra[[1]])

> abline(v=3884, col="blue")

2000 4000 6000 8000 10000

050

0010

000

1500

020

000

mass

inte

nsity

Sebastian Gibb, MALDIquant, 2011-08-17 13

MALDIquanthands-on: plot m/z 3884

> plot(spectra[[1]], xlim=c(3800, 4500))

> abline(v=3884, col="blue")

3800 3900 4000 4100 4200 4300 4400 4500

050

0010

000

1500

020

000

mass

inte

nsity

Sebastian Gibb, MALDIquant, 2011-08-17 14

MALDIquanthands-on: Variance Stabilization and Smoothing

> spectra <- lapply(spectra, transformIntensity, fun=sqrt)

> movAvg <- function(y) {return(filter(y, rep(1, 5)/5, sides=2));}

> spectra <- lapply(spectra, transformIntensity, fun=movAvg)

3800 3900 4000 4100 4200 4300 4400 4500

050

100

150

mass

inte

nsity

Sebastian Gibb, MALDIquant, 2011-08-17 15

MALDIquanthands-on: Baseline Correction – don’t harm the data

> bl <- estimateBaseline(spectra[[1]], method="Median")

> plot(spectra[[1]], xlim=c(3800, 4500));

> lines(bl, col="red");

3800 3900 4000 4100 4200 4300 4400 4500

050

100

150

mass

inte

nsity

Sebastian Gibb, MALDIquant, 2011-08-17 16

MALDIquanthands-on: Baseline Correction – SNIP

> bl <- estimateBaseline(spectra[[1]], method="SNIP")

> plot(spectra[[1]], xlim=c(3800, 4500)); lines(bl, col="red");

3800 3900 4000 4100 4200 4300 4400 4500

050

100

150

mass

inte

nsity

C. G. Ryan, E. Clayton, W. L. Griffin, S. H. Sie, and D. R. Cousens. SNIP, a statistics-sensitive background treatment for thequantitative analysis of PIXE spectra in geoscience applications. Nucl. Instrument. Meth. B, 34:396–402, 1988

Sebastian Gibb, MALDIquant, 2011-08-17 17

MALDIquanthands-on: Baseline Correction – SNIP

> spectra <- lapply(spectra, removeBaseline)

> lines(spectra[[1]], col="blue")

3800 3900 4000 4100 4200 4300 4400 4500

050

100

150

mass

inte

nsity

C. G. Ryan, E. Clayton, W. L. Griffin, S. H. Sie, and D. R. Cousens. SNIP, a statistics-sensitive background treatment for thequantitative analysis of PIXE spectra in geoscience applications. Nucl. Instrument. Meth. B, 34:396–402, 1988

Sebastian Gibb, MALDIquant, 2011-08-17 17

MALDIquanthands-on: Baseline Correction – SNIP

2000 4000 6000 8000 10000

050

100

150

mass

inte

nsity

Sebastian Gibb, MALDIquant, 2011-08-17 18

MALDIquanthands-on: Peak Detection

> spectra[[1]]

S4 class type : MassSpectrum

Number of m/z values : 42388

Range of m/z values : 1000.015 - 9999.734

Range of intensity values: 0 - 709.207

File : /data/fiedler2009/[...]/Pankreas_HB_L_061019_G10/0_m19/1/1SLin/fid

> peaks <- lapply(spectra, detectPeaks)

> peaks[[1]]

S4 class type : MassPeaks

Number of m/z values : 198

Range of m/z values : 1011.059 - 9423.422

Range of intensity values: 19.273 - 709.207

File : /data/fiedler2009/[...]/Pankreas_HB_L_061019_G10/0_m19/1/1SLin/fid

Sebastian Gibb, MALDIquant, 2011-08-17 19

MALDIquanthands-on: Peak Detection

> plot(spectra[[1]], xlim=c(3800, 4500)); points(peaks[[1]], col="green")

> top5 <- intensity(p) %in% sort(intensity(p)[mass(p)>3800 & mass(p)<4500], decreasing=TRUE)[1:5]

> labelPeaks(peaks[[1]], index=top5); labelPeaks(peak[[1]], mass=3884, col="blue")

3800 3900 4000 4100 4200 4300 4400 4500

020

4060

8010

0

mass

inte

nsity

●●● ●●●●● ●●

●●● ●●●

●●●●●

●●●●● ●●

●● ●●

●●

● ●

●●●● ●

● ●

●● ●

●● ●●●

●●

●● ●●

●●

● ●

●●

4052.944090.24193.806

4209.072

4265.5643882.052

accepted maximarejected maximanoise threshold

Sebastian Gibb, MALDIquant, 2011-08-17 20

MALDIquantSingle Spectrum Workflow

> library("MALDIquant")

> library("readBrukerFlexData")

> spectra <- mqReadBrukerFlex("/data/fiedler2009/")

> movAvg <- function(y) {return(filter(y, rep(1, 5)/5, sides=2));}

> spectra <- lapply(spectra, transformIntensity, fun=sqrt)

> spectra <- lapply(spectra, transformIntensity, fun=movAvg)

> spectra <- lapply(spectra, removeBaseline)

> peaks <- lapply(spectra, detectPeaks)

Sebastian Gibb, MALDIquant, 2011-08-17 21

MALDIquantComparison of Multiple Spectra

3850 3900 3950 4000 4050 4100

010

2030

40

mass

inte

nsity 3882.052

4052.94

4069.799

4090.2

3880.425

4050.8854068.133

4088.301

spectrum 1 (control)spectrum 401 (cancer)

Sebastian Gibb, MALDIquant, 2011-08-17 22

MALDIquantConclusion

MALDIquant is free software (GPLv3).

Currently available:

• Easy to use and full established single spectrum workflow.

• Easy to customize.

• Easy data exchange.

Shortly available:

• Peak alignment.

• Calibration/Normalization routines.

Sebastian Gibb, MALDIquant, 2011-08-17 23

MALDIquantThanks

Alexander B. Leichtle, helpful discussions(Institute for Clinical Chemistry, Bern University Hospital)

Korbinian Strimmer, supervision(IMISE, University of Leipzig)

Thanks for your attention!

Download of MALDIquant software:http://strimmerlab.org/software/maldiquant/

Sebastian Gibb, MALDIquant, 2011-08-17 24

top related