analysis of proteomics data using maldiquant of proteomics data using maldiquant sebastian gibb...

25
Analysis of Proteomics Data using MALDIquant Sebastian Gibb Institute for Medical Informatics, Statistics and Epidemiology (IMISE) University of Leipzig 17. August 2011 Sebastian Gibb, MALDIquant, 2011-08-17 1

Upload: phungnhi

Post on 01-May-2018

227 views

Category:

Documents


4 download

TRANSCRIPT

Page 1: Analysis of Proteomics Data using MALDIquant of Proteomics Data using MALDIquant Sebastian Gibb Institute for Medical Informatics, Statistics and Epidemiology (IMISE) University of

Analysis of Proteomics Data using MALDIquant

Sebastian GibbInstitute for Medical Informatics, Statistics and Epidemiology (IMISE)

University of Leipzig

17. August 2011

Sebastian Gibb, MALDIquant, 2011-08-17 1

Page 2: Analysis of Proteomics Data using MALDIquant of Proteomics Data using MALDIquant Sebastian Gibb Institute for Medical Informatics, Statistics and Epidemiology (IMISE) University of

Table of Contents

1 IntroductionProteomicsMass Spectrometry

2 Analysis of Proteomics Data using MALDIquantDesignSingle Spectrum WorkflowPreview

Sebastian Gibb, MALDIquant, 2011-08-17 2

Page 3: Analysis of Proteomics Data using MALDIquant of Proteomics Data using MALDIquant Sebastian Gibb Institute for Medical Informatics, Statistics and Epidemiology (IMISE) University of

Predicting disease

Sebastian Gibb, MALDIquant, 2011-08-17 3

Page 4: Analysis of Proteomics Data using MALDIquant of Proteomics Data using MALDIquant Sebastian Gibb Institute for Medical Informatics, Statistics and Epidemiology (IMISE) University of

Proteomics

Proteomics

• Study of the entirety of proteins produced by an organism.

• Foci: identification, structure determination, biomarker,pathways, expression.

Sebastian Gibb, MALDIquant, 2011-08-17 4

Page 5: Analysis of Proteomics Data using MALDIquant of Proteomics Data using MALDIquant Sebastian Gibb Institute for Medical Informatics, Statistics and Epidemiology (IMISE) University of

Mass Spectrometry

Ion Source: MALDI

Matrix-Assisted LaserDesorption/Ionization

Mass Analyzer: TOF

Time Of Flight (t ∝√

mq )

Detector

QuantityMeasurement

Abb. 3.14.; S. 67; “Biochemie & Pathobiochemie”, Loffler G., 8. Auflage (2007), Springer Medizin Verlag

Sebastian Gibb, MALDIquant, 2011-08-17 5

Page 6: Analysis of Proteomics Data using MALDIquant of Proteomics Data using MALDIquant Sebastian Gibb Institute for Medical Informatics, Statistics and Epidemiology (IMISE) University of

MALDI-TOF Example Spectrum

2000 4000 6000 8000 10000

050

0010

000

1500

020

000

2500

030

000

mass

inte

nsity

Sebastian Gibb, MALDIquant, 2011-08-17 6

Page 7: Analysis of Proteomics Data using MALDIquant of Proteomics Data using MALDIquant Sebastian Gibb Institute for Medical Informatics, Statistics and Epidemiology (IMISE) University of

MALDIquantMotivation

• Only relatively few open source software solutions availableand very few for the R platform.

• No MALDI-TOF package fitting our needs for clinicaldiagnostics.

• Necessity of handling both technical and biologicalreplicates.

• Unsatisfying quantification of relative intensities(total-ion-current, 0/1)

• Investigation of impact of calibration of spectra on clinicalprognosis.

• Modular and easy to customize analysis routines.

Sebastian Gibb, MALDIquant, 2011-08-17 7

Page 8: Analysis of Proteomics Data using MALDIquant of Proteomics Data using MALDIquant Sebastian Gibb Institute for Medical Informatics, Statistics and Epidemiology (IMISE) University of

MALDIquantThe UNIX philosophy: do one thing and do it well McIlroy (1978)

MALDIquant

single spectrum multiple spectra

peak alignmentcalibration/

normalization

smoothing

baseline correction

peak detection

raw data

readBrukerFlexData

...

readMzXmlData

classification

sda

randomForest

...

pamr

smoothing

baseline correction

peak detection

Sebastian Gibb, MALDIquant, 2011-08-17 8

Page 9: Analysis of Proteomics Data using MALDIquant of Proteomics Data using MALDIquant Sebastian Gibb Institute for Medical Informatics, Statistics and Epidemiology (IMISE) University of

MALDIquantThe UNIX philosophy: do one thing and do it well McIlroy (1978)

f1,1 f1,2 f1,3 . . . f1,n

f2,1 f2,2 f2,3 . . . f2,n

. . . . . . . . . . . . . . .fm,1 fm,2 fm,3 . . . fm,n

Sebastian Gibb, MALDIquant, 2011-08-17 9

Page 10: Analysis of Proteomics Data using MALDIquant of Proteomics Data using MALDIquant Sebastian Gibb Institute for Medical Informatics, Statistics and Epidemiology (IMISE) University of

MALDIquantStructure

object-oriented, S4

AbstractMassObject+mass: vector+intensity: vector+metaData: list

+as.matrix()+intensity()+intensityMatrix()+isEmpty()+length()+lines()+mass()+metaData()+plot()+points()+transformIntensity()

MassSpectrum

+detectPeaks()+estimateBaseline()+estimateNoise()+findLocalMaxima()+removeBaseline()

MassPeaks

+labelPeaks()

Sebastian Gibb, MALDIquant, 2011-08-17 10

Page 11: Analysis of Proteomics Data using MALDIquant of Proteomics Data using MALDIquant Sebastian Gibb Institute for Medical Informatics, Statistics and Epidemiology (IMISE) University of

Example Data

Serum Peptidome Profiling Revealed Platelet Factor 4 as aPotential Discriminating Peptide Associated withPancreatic Cancer

G.M. Fiedler, A.B. Leichtle, J. Kase et alClin Cancer Res June 1, 2009 15:3812-3819

“Two significant peaks ( m/z 3884; 5959) achieved asensitivity of 86.3% and a specificity of 97.6% for thediscrimination of patients and healthy controls . . . ”

“MALDI-TOF MS-based serum peptidome profilingallowed the discovery and validation ofplatelet factor 4 [m/z 3884, 7767; S.G.] as a newdiscriminating marker in pancreatic cancer.”

Sebastian Gibb, MALDIquant, 2011-08-17 11

Page 12: Analysis of Proteomics Data using MALDIquant of Proteomics Data using MALDIquant Sebastian Gibb Institute for Medical Informatics, Statistics and Epidemiology (IMISE) University of

MALDIquanthands-on: File Import

> library("MALDIquant")

> library("readBrukerFlexData")

> spectra <- mqReadBrukerFlex("/data/fiedler2009/")

> length(spectra)

[1] 480

> spectra[[1]]

S4 class type : MassSpectrum

Number of m/z values : 42388

Range of m/z values : 1000.015 - 9999.734

Range of intensity values: 5 - 101840

File : /data/fiedler2009/[...]/Pankreas_HB_L_061019_G10/0_m19/1/1SLin/fid

other possibilities:

library("MALDIquant")

library("readMzXmlData")

s <- mqReadMzXml("/data/exampleMS/spectrum.mzXML")

library("MALDIquant")

s <- createMassSpectrum(mass=1:5,

intensity=runif(5),

metaData=list(name="example"))

Sebastian Gibb, MALDIquant, 2011-08-17 12

Page 13: Analysis of Proteomics Data using MALDIquant of Proteomics Data using MALDIquant Sebastian Gibb Institute for Medical Informatics, Statistics and Epidemiology (IMISE) University of

MALDIquanthands-on: plot

> plot(spectra[[1]])

> abline(v=3884, col="blue")

2000 4000 6000 8000 10000

050

0010

000

1500

020

000

mass

inte

nsity

Sebastian Gibb, MALDIquant, 2011-08-17 13

Page 14: Analysis of Proteomics Data using MALDIquant of Proteomics Data using MALDIquant Sebastian Gibb Institute for Medical Informatics, Statistics and Epidemiology (IMISE) University of

MALDIquanthands-on: plot m/z 3884

> plot(spectra[[1]], xlim=c(3800, 4500))

> abline(v=3884, col="blue")

3800 3900 4000 4100 4200 4300 4400 4500

050

0010

000

1500

020

000

mass

inte

nsity

Sebastian Gibb, MALDIquant, 2011-08-17 14

Page 15: Analysis of Proteomics Data using MALDIquant of Proteomics Data using MALDIquant Sebastian Gibb Institute for Medical Informatics, Statistics and Epidemiology (IMISE) University of

MALDIquanthands-on: Variance Stabilization and Smoothing

> spectra <- lapply(spectra, transformIntensity, fun=sqrt)

> movAvg <- function(y) {return(filter(y, rep(1, 5)/5, sides=2));}

> spectra <- lapply(spectra, transformIntensity, fun=movAvg)

3800 3900 4000 4100 4200 4300 4400 4500

050

100

150

mass

inte

nsity

Sebastian Gibb, MALDIquant, 2011-08-17 15

Page 16: Analysis of Proteomics Data using MALDIquant of Proteomics Data using MALDIquant Sebastian Gibb Institute for Medical Informatics, Statistics and Epidemiology (IMISE) University of

MALDIquanthands-on: Baseline Correction – don’t harm the data

> bl <- estimateBaseline(spectra[[1]], method="Median")

> plot(spectra[[1]], xlim=c(3800, 4500));

> lines(bl, col="red");

3800 3900 4000 4100 4200 4300 4400 4500

050

100

150

mass

inte

nsity

Sebastian Gibb, MALDIquant, 2011-08-17 16

Page 17: Analysis of Proteomics Data using MALDIquant of Proteomics Data using MALDIquant Sebastian Gibb Institute for Medical Informatics, Statistics and Epidemiology (IMISE) University of

MALDIquanthands-on: Baseline Correction – SNIP

> bl <- estimateBaseline(spectra[[1]], method="SNIP")

> plot(spectra[[1]], xlim=c(3800, 4500)); lines(bl, col="red");

3800 3900 4000 4100 4200 4300 4400 4500

050

100

150

mass

inte

nsity

C. G. Ryan, E. Clayton, W. L. Griffin, S. H. Sie, and D. R. Cousens. SNIP, a statistics-sensitive background treatment for thequantitative analysis of PIXE spectra in geoscience applications. Nucl. Instrument. Meth. B, 34:396–402, 1988

Sebastian Gibb, MALDIquant, 2011-08-17 17

Page 18: Analysis of Proteomics Data using MALDIquant of Proteomics Data using MALDIquant Sebastian Gibb Institute for Medical Informatics, Statistics and Epidemiology (IMISE) University of

MALDIquanthands-on: Baseline Correction – SNIP

> spectra <- lapply(spectra, removeBaseline)

> lines(spectra[[1]], col="blue")

3800 3900 4000 4100 4200 4300 4400 4500

050

100

150

mass

inte

nsity

C. G. Ryan, E. Clayton, W. L. Griffin, S. H. Sie, and D. R. Cousens. SNIP, a statistics-sensitive background treatment for thequantitative analysis of PIXE spectra in geoscience applications. Nucl. Instrument. Meth. B, 34:396–402, 1988

Sebastian Gibb, MALDIquant, 2011-08-17 17

Page 19: Analysis of Proteomics Data using MALDIquant of Proteomics Data using MALDIquant Sebastian Gibb Institute for Medical Informatics, Statistics and Epidemiology (IMISE) University of

MALDIquanthands-on: Baseline Correction – SNIP

2000 4000 6000 8000 10000

050

100

150

mass

inte

nsity

Sebastian Gibb, MALDIquant, 2011-08-17 18

Page 20: Analysis of Proteomics Data using MALDIquant of Proteomics Data using MALDIquant Sebastian Gibb Institute for Medical Informatics, Statistics and Epidemiology (IMISE) University of

MALDIquanthands-on: Peak Detection

> spectra[[1]]

S4 class type : MassSpectrum

Number of m/z values : 42388

Range of m/z values : 1000.015 - 9999.734

Range of intensity values: 0 - 709.207

File : /data/fiedler2009/[...]/Pankreas_HB_L_061019_G10/0_m19/1/1SLin/fid

> peaks <- lapply(spectra, detectPeaks)

> peaks[[1]]

S4 class type : MassPeaks

Number of m/z values : 198

Range of m/z values : 1011.059 - 9423.422

Range of intensity values: 19.273 - 709.207

File : /data/fiedler2009/[...]/Pankreas_HB_L_061019_G10/0_m19/1/1SLin/fid

Sebastian Gibb, MALDIquant, 2011-08-17 19

Page 21: Analysis of Proteomics Data using MALDIquant of Proteomics Data using MALDIquant Sebastian Gibb Institute for Medical Informatics, Statistics and Epidemiology (IMISE) University of

MALDIquanthands-on: Peak Detection

> plot(spectra[[1]], xlim=c(3800, 4500)); points(peaks[[1]], col="green")

> top5 <- intensity(p) %in% sort(intensity(p)[mass(p)>3800 & mass(p)<4500], decreasing=TRUE)[1:5]

> labelPeaks(peaks[[1]], index=top5); labelPeaks(peak[[1]], mass=3884, col="blue")

3800 3900 4000 4100 4200 4300 4400 4500

020

4060

8010

0

mass

inte

nsity

●●● ●●●●● ●●

●●● ●●●

●●●●●

●●●●● ●●

●● ●●

●●

● ●

●●●● ●

● ●

●● ●

●● ●●●

●●

●● ●●

●●

● ●

●●

4052.944090.24193.806

4209.072

4265.5643882.052

accepted maximarejected maximanoise threshold

Sebastian Gibb, MALDIquant, 2011-08-17 20

Page 22: Analysis of Proteomics Data using MALDIquant of Proteomics Data using MALDIquant Sebastian Gibb Institute for Medical Informatics, Statistics and Epidemiology (IMISE) University of

MALDIquantSingle Spectrum Workflow

> library("MALDIquant")

> library("readBrukerFlexData")

> spectra <- mqReadBrukerFlex("/data/fiedler2009/")

> movAvg <- function(y) {return(filter(y, rep(1, 5)/5, sides=2));}

> spectra <- lapply(spectra, transformIntensity, fun=sqrt)

> spectra <- lapply(spectra, transformIntensity, fun=movAvg)

> spectra <- lapply(spectra, removeBaseline)

> peaks <- lapply(spectra, detectPeaks)

Sebastian Gibb, MALDIquant, 2011-08-17 21

Page 23: Analysis of Proteomics Data using MALDIquant of Proteomics Data using MALDIquant Sebastian Gibb Institute for Medical Informatics, Statistics and Epidemiology (IMISE) University of

MALDIquantComparison of Multiple Spectra

3850 3900 3950 4000 4050 4100

010

2030

40

mass

inte

nsity 3882.052

4052.94

4069.799

4090.2

3880.425

4050.8854068.133

4088.301

spectrum 1 (control)spectrum 401 (cancer)

Sebastian Gibb, MALDIquant, 2011-08-17 22

Page 24: Analysis of Proteomics Data using MALDIquant of Proteomics Data using MALDIquant Sebastian Gibb Institute for Medical Informatics, Statistics and Epidemiology (IMISE) University of

MALDIquantConclusion

MALDIquant is free software (GPLv3).

Currently available:

• Easy to use and full established single spectrum workflow.

• Easy to customize.

• Easy data exchange.

Shortly available:

• Peak alignment.

• Calibration/Normalization routines.

Sebastian Gibb, MALDIquant, 2011-08-17 23

Page 25: Analysis of Proteomics Data using MALDIquant of Proteomics Data using MALDIquant Sebastian Gibb Institute for Medical Informatics, Statistics and Epidemiology (IMISE) University of

MALDIquantThanks

Alexander B. Leichtle, helpful discussions(Institute for Clinical Chemistry, Bern University Hospital)

Korbinian Strimmer, supervision(IMISE, University of Leipzig)

Thanks for your attention!

Download of MALDIquant software:http://strimmerlab.org/software/maldiquant/

Sebastian Gibb, MALDIquant, 2011-08-17 24