Daehee Hwang Leroy Hood Institute for Systems Biology

Post on 15-Jan-2016

Page 1: Daehee Hwang Leroy Hood Institute for Systems Biology

Daehee Hwang, Leroy Hood

Institute for Systems Biology

Page 2: Why Prequips for Systems Biology with proteomic data?

• Need for visualization, analysis, and integration of multiple proteomic datasets: raw data level, peptide level, protein level, multi-sample analysis

• Need for an interface between proteomic data and systems biology analytical tools such as network/pathway analyses

Page 3: Integration of proteomic data at various levels

[Diagram: the Trans-Proteomic Pipeline stack, shown three times, each with the levels Raw Data (MS, MS/MS), Peptide Id + Quantitation, Protein Id + Quantitation, and "?"; annotation: "Communication not possible!"]

Page 4: Pep3D: Quality Assessment

[Diagram: Prequips (Multi Sample) and Pep3D next to the Trans-Proteomic Pipeline levels: Raw Data (MS, MS/MS), Peptide Id + Quantitation, Protein Id + Quantitation, "?"]

Pep3D Properties

- quality assessment
- 2D gel-like visualization

Gaggle

- Network Analysis: Cytoscape
- Interaction Database: STRING
- Pathway Database: KEGG
- Microarray Data Analysis: Mayday, TIGR

Page 5: Pep3D: Quality Assessment

[Diagram: Pep3D Instance 1 and Pep3D Instance 2; annotation: "Communication not possible!"]

Page 6: Interface to Systems Biology

Gaggle

- Network Analysis: Cytoscape
- Interaction Database: STRING
- Pathway Database: KEGG
- Microarray Data Analysis: Mayday, TIGR

[Diagram: Trans-Proteomic Pipeline levels (Raw Data (MS, MS/MS); Peptide Id + Quantitation; Protein Id + Quantitation; "?"); annotation: "Communication not possible!"]

Page 7: Prequips Overview

Key Properties

- handles multiple samples at all levels
- integrates high-level analysis tools
- is extensible

Gaggle

- Network Analysis: Cytoscape
- Interaction Database: STRING
- Pathway Database: KEGG
- Microarray Data Analysis: Mayday, TIGR

[Diagram: Prequips (Multi Sample) spanning the Trans-Proteomic Pipeline levels: Raw Data (MS, MS/MS), Peptide Id + Quantitation, Protein Id + Quantitation, "?"]

Page 8: Integration of proteomic datasets at various levels

[Flow diagram: Trans-Proteomic Pipeline]

- Mass Spectrometer: raw data (e.g. mzXML, mzData, ...)
- Database Search, Validation, Peptide Quantification: peptide-level data (e.g. pepXML, AnalysisXML, ...)
- Protein Inference, Protein Quantitation: protein-level data (e.g. protXML, ...)
- annotation, further analysis results

Page 9: Data model

[Diagram labels: Project; Single-Sample Analysis; Multi-Sample Analysis; Data Structures (Raw Data, Peptide Level, Protein Level, each with Core and Meta); Data Providers; Viewers; Perspectives]

Data Providers

- raw data level, e.g. mzXML or mzData files
- peptide-level data source, e.g. pepXML, dta or AnalysisXML files
- protein-level data source, e.g. protXML files
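The three data levels and their example file formats can be illustrated with a minimal Java sketch. Everything here is hypothetical: the `DataLevel` enum, the `fromFileName` method, and the file-extension mapping are assumptions made for illustration and are not taken from the Prequips code base. AnalysisXML is left out because the slides do not give a file extension for it.

```java
import java.util.Locale;
import java.util.Optional;

/** Hypothetical enum for the three data levels named on the slide. */
enum DataLevel {
    RAW, PEPTIDE, PROTEIN;

    /**
     * Guesses the data level from a file name.
     * The extension-to-level mapping is an assumption for illustration only.
     * Returns an empty Optional when the extension is not recognized.
     */
    static Optional<DataLevel> fromFileName(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".mzxml") || lower.endsWith(".mzdata")) {
            return Optional.of(RAW);
        }
        if (lower.endsWith(".pepxml") || lower.endsWith(".dta")) {
            return Optional.of(PEPTIDE);
        }
        if (lower.endsWith(".protxml")) {
            return Optional.of(PROTEIN);
        }
        return Optional.empty();
    }
}
```

A data provider could use such a lookup to decide which level of the data model a newly opened file belongs to before parsing it.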

Page 10: Case Study: Toponomic change in drug treated Mø

[Figure: Calreticulin, BiP, Bcl2, ATPase, and Lamp1 across fractions 2 to 20 (labels 8% and 28%); channels 114, 115, 116, 117; samples Mock1, Mock2, Thapsigargin]

Page 11: Visualization: Single exp.

[Screenshot annotations]

- project manager
- all scans of Mock 1 experiment
- peak map for run 29
- level 1 spectrum & corresponding CID spectra (level 1, level 2, level 2)
- CID spectra that have been selected
- detailed information about one of the level 2 spectra

Page 12: Visualization: Multiple exps.

(polymer?) contamination in all 4 runs (this would be hard to see with Pep3D)

green = 0, red = 1

Page 13: Visualization: assess, quantify, etc.

Mock-up (software is under development):

[Figure: maps 1 to 6, each plotted as m/z (min to max) against retention time (min to max); in a comparison of maps 1 to 4, one map is marked with X: "Doesn’t really match the remaining 3 maps!"]

Page 14: Prequips & the Gaggle

[Diagram: Gaggle Boss connecting Prequips, Mayday, the R statistical environment, Cytoscape, and a browser (KEGG, DAVID)]

Exchange of data structures such as name lists, lists of name-value pairs, matrices and networks.
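The hub-style exchange described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the Gaggle library's API: `NameList`, `Tool`, `Hub`, and `RecordingTool` are hypothetical names, and only one of the listed data structures (a name list) is modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical message type: a list of identifiers such as protein names. */
final class NameList {
    final String organism;
    final List<String> names;

    NameList(String organism, List<String> names) {
        this.organism = organism;
        // Defensive copy so later changes by the sender do not affect receivers.
        this.names = Collections.unmodifiableList(new ArrayList<String>(names));
    }
}

/** Hypothetical interface for any tool attached to the hub. */
interface Tool {
    String name();
    void receive(String sender, NameList list);
}

/** Hypothetical hub: relays a broadcast to every registered tool except the sender. */
final class Hub {
    private final List<Tool> tools = new ArrayList<Tool>();

    void register(Tool tool) {
        tools.add(tool);
    }

    /** Returns the number of tools the list was delivered to. */
    int broadcast(Tool sender, NameList list) {
        int delivered = 0;
        for (Tool tool : tools) {
            if (tool != sender) {
                tool.receive(sender.name(), list);
                delivered++;
            }
        }
        return delivered;
    }
}

/** A tool that simply remembers the last list it received and who sent it. */
final class RecordingTool implements Tool {
    private final String name;
    NameList last;
    String lastSender;

    RecordingTool(String name) {
        this.name = name;
    }

    public String name() {
        return name;
    }

    public void receive(String sender, NameList list) {
        lastSender = sender;
        last = list;
    }
}
```

In this sketch a selection made in one tool (for example a protein name) reaches every other registered tool through the hub, so no tool needs to know about any other tool directly.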

Page 15: Mayday

Page 16: Cytoscape

Overall mouse protein-protein interaction map in Cytoscape

Page 17: Analysis: Feature extraction

[Screenshot annotations]

- Protein table
- Filters
- Gaggle plugin for interaction with other tools

Page 18: Analysis: Feature extraction

[Screenshot annotations]

- Gaggle plugin: selection for broadcast
- calreticulin

Page 19: Analysis: Feature selection

[Figure: samples Mock1, Mock2, Thapsigargin]

Page 20: Broadcast to Gaggle

Page 21: Prequips to Gaggle

[Diagram: Gaggle Boss connecting Prequips, Mayday, the R statistical environment, Cytoscape, and a browser (KEGG, DAVID)]

Page 22: Gaggle Boss

Page 23: Gaggle to Cytoscape

[Diagram: Gaggle Boss connecting Prequips, Mayday, the R statistical environment, Cytoscape, and a browser (KEGG, DAVID)]

Page 24: Integration: Network Analysis

[Network figure: modules labeled proteasome complex, ribosome large subunit, chaperones, and actin filament regulation; values shown: Thapsigargin 114 iTRAQ ratio]

Page 25: Cytoscape to Prequips

[Diagram: Gaggle Boss connecting Prequips, Mayday, the R statistical environment, Cytoscape, and a browser (KEGG, DAVID)]

Page 26: Analysis: Feature extraction - Module selection

[Screenshot annotations]

- the ids sent from Cytoscape through the Gaggle
- proteasome proteins

Page 27: Prequips & the Gaggle

[Diagram: Gaggle Boss connecting Prequips, Mayday, the R statistical environment, Cytoscape, and a browser (KEGG, DAVID)]

Page 28: Analysis: Functional enrichment

The proteasome complex is enriched compared to a mouse genome background.
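The slides do not say which statistic was used to call the proteasome complex enriched. One common way to score enrichment against a background is a hypergeometric upper-tail probability, sketched below as an assumption rather than as the method used in the talk. The class name `Enrichment`, its methods, and any numbers passed to it are hypothetical.

```java
/** Hypothetical helper: hypergeometric upper-tail probability for enrichment. */
final class Enrichment {

    /** log(n!) by summing logs; adequate for small illustrative inputs. */
    static double logFactorial(int n) {
        double sum = 0.0;
        for (int i = 2; i <= n; i++) {
            sum += Math.log(i);
        }
        return sum;
    }

    /** log of the binomial coefficient C(n, k), for 0 <= k <= n. */
    static double logChoose(int n, int k) {
        return logFactorial(n) - logFactorial(k) - logFactorial(n - k);
    }

    /**
     * P(X >= observed), where X counts annotated items in a selection drawn
     * without replacement from the background.
     *
     * @param population size of the background (e.g. all genes considered)
     * @param annotated  background items carrying the annotation
     * @param selected   size of the selected list
     * @param observed   annotated items found in the selected list
     */
    static double upperTail(int population, int annotated, int selected, int observed) {
        int maxHits = Math.min(annotated, selected);
        double logDenominator = logChoose(population, selected);
        double p = 0.0;
        for (int hits = Math.max(observed, 0); hits <= maxHits; hits++) {
            int misses = selected - hits;
            if (misses > population - annotated) {
                continue; // not enough unannotated items for this outcome
            }
            p += Math.exp(logChoose(annotated, hits)
                    + logChoose(population - annotated, misses)
                    - logDenominator);
        }
        return Math.min(1.0, p); // guard against rounding slightly above 1
    }
}
```

A small probability from `upperTail` would indicate that the annotation appears in the selected list more often than expected by chance given the background.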

Page 29: Prequips Summary

Key Properties

- handles multiple samples at all levels
- integrates high-level analysis tools
- is extensible

Gaggle

- Network Analysis: Cytoscape
- Interaction Database: STRING
- Pathway Database: KEGG
- Microarray Data Analysis: Mayday, TIGR

[Diagram: Prequips (Multi Sample) spanning the Trans-Proteomic Pipeline levels: Raw Data (MS, MS/MS), Peptide Id + Quantitation, Protein Id + Quantitation, "?"]

Page 30: Conclusion

• General and extensible software for systems biology research with proteomics mass spectrometry data.

• Capability to integrate data from various sources for visualization and analysis.

• An interactive environment that supports (visual) data exploration.

Page 31: Software details

• implemented in Java

• based on Eclipse Rich Client Platform

• extremely modular architecture

• multiple plugin interfaces
  – e.g. viewers, data providers, algorithms

• meta information framework
  – analysis results, sequence information, annotation, ...
  – data structures as plugins
  – requirement to support future analytical tools and data sources
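The plugin idea listed above can be illustrated with a plain-Java sketch. It does not use Eclipse Rich Client Platform extension points and does not reproduce Prequips interfaces: `DataProvider`, `Viewer`, `Algorithm`, `PluginRegistry`, and `PeakMapViewer` are hypothetical names chosen to mirror the slide's examples.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical plugin kinds named after the slide's examples. */
interface DataProvider {
    boolean canRead(String fileName);
}

interface Viewer {
    String title();
}

interface Algorithm {
    double[] apply(double[] values);
}

/** Hypothetical registry: stores plugins under the interface they were registered for. */
final class PluginRegistry {
    private final Map<Class<?>, List<Object>> plugins = new HashMap<Class<?>, List<Object>>();

    <T> void register(Class<T> kind, T plugin) {
        List<Object> list = plugins.get(kind);
        if (list == null) {
            list = new ArrayList<Object>();
            plugins.put(kind, list);
        }
        list.add(plugin);
    }

    /** Returns all plugins registered for the given kind (possibly empty). */
    <T> List<T> lookup(Class<T> kind) {
        List<T> result = new ArrayList<T>();
        List<Object> list = plugins.get(kind);
        if (list != null) {
            for (Object plugin : list) {
                result.add(kind.cast(plugin));
            }
        }
        return result;
    }
}

/** Example viewer plugin; the title is only a label for illustration. */
final class PeakMapViewer implements Viewer {
    public String title() {
        return "Peak map";
    }
}
```

Keeping each plugin kind behind its own interface is one way to let new viewers, data providers, or algorithms be added without changing the core, which is the property the slide emphasizes.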

Page 32: Acknowledgements

• Special thanks to Nils Gehlenborg

• Hood Lab: Inyoul Lee

• Kay Nieselt

• Aebersold Lab: Nichole King, James Eddes, Eric Deutsch, Ning Zhang, David Shteynberg, Wei Yan, and Andrew Garbutt

• Paul Shannon for help with the Gaggle

Page 33

[Diagram labels: Prequips; Core; Visualization; Database (PostgreSQL database, MySQL database); SBEAMS (SBEAMS installation); R (R environment, Bioconductor); Machine Learning (WEKA Library); Gaggle; Mayday; Excel; anything else]

Page 34: Cytoscape