
Upload: joy-sabina-conley

Post on 13-Jan-2016


Page 1: CaArray: Cancer Array Informatics Open Source Tools for Microarray Data Management, Analysis and Annotation  caArray overview

caArray: Cancer Array Informatics
Open Source Tools for Microarray Data Management, Analysis and Annotation

http://caarray.nci.nih.gov/

caArray overview & demo: Mervi Heiskanen (15 min)

caArray architecture: Scott Gustafson (15 min)

webCGH overview & demo: David Hall (15 min)


caArray Data Portal & Data Analysis Tools

1. Data Portal: Promotes data sharing through submission of original, raw data files with associated experiment and sample information.

2. Data analysis and visualization tools: webCGH (NCICB/RTI), XpressionWay (NCICB/SAIC)

caBIG tools:
1. caWorkbench - Columbia
2. DWD - UNC Lineberger
3. GenePattern - MIT/Broad ?
4. Magellan - UC San Francisco
5. VISDA - Georgetown
6. Cancer Molecular Pages - Burnham
7. Function Express - Wash U Siteman
8. GoMiner - NCI/CCR


caArray version 1.0

Key features:
1. MIAME 1.1 compliant data annotation forms
2. Support for Affymetrix and GenePix native files
3. MAGE-ML import and export
4. Controlled vocabularies (MGED Ontology)
5. Access to data via MAGE-OM API

caArray installations:
1. NCICB caArray instance supports NCI funded programs.
2. Local installations at the cancer centers: caBIG funded caArray adopters (Lombardi, Wistar, NYU)


caArray listservs:

1. caArray developers
2. caArray users
3. caArray team


caArray: Compliance with Standardization Efforts

MIAME: Minimum Information About a Microarray Experiment 1.1 Draft 6 (April 1, 2002)
http://www.mged.org/Workgroups/MIAME/miame_1.1.html

MAGE-ML: MicroArray and GeneExpression Object Model and Markup Language 1.1 (October 2003)
http://www.omg.org/docs/formal/03-10-01.pdf

MGED Ontology: Microarray Gene Expression Data Ontology 1.1.8 (April 2004)
http://mged.sourceforge.net/ontologies/MGEDontology.php

caBIG compatibility guidelines
http://cabig.nci.nih.gov/guidelines_documentation/caBIG_Compatibility_Document


class CellLineDatabase
namespace: http://mged.sourceforge.net/ontologies/MGEDOntology.daml#
documentation: Database of cell line information.
type: primitive
superclasses: Database
used in classes: CellLine
used in individuals: ATCC_Cultures, CABRI_Human_and_Animal_Cell_lines

class TechnologyType
namespace: http://mged.sourceforge.net/ontologies/MGEDOntology.daml#
documentation: The technology type or platform of the reporters on the array.
type: primitive
superclasses: ArrayDesignPackage
used in classes: FeatureGroup
used in individuals: in_situ_oligo_features, spotted_antibody_features, spotted_colony_features, spotted_ds_DNA_features, spotted_protein_features, spotted_ss_oligo_features


caArray Phase 2

caArray 1.2 (June 2005)
• Support for additional file formats via a software toolkit
• Public search without login
• Copy bio sample information

caArray 1.5 (September 2005)
• XpressionWay, pathway visualization tool
• Integration with caDSR 3.0

caArray 1.7 (December 2005)
• Store filtered and normalized data
• User management user interface

caArray 2.0 (March 2006)
• Embedded MAGE-ML validation

All releases: Defect fixes and usability enhancements


Acknowledgements

NCICB

Sue Dubman, Mervi Heiskanen, Xiaopeng Bian, Subha Madhavan, Carl Schaefer, Gilberto Fragoso, Denise Warzel…

and Ken Buetow

NCICB/SAIC

Development team: Hangjiong Chen, Scott Gustafson, Juergen Lorenz, John Moy, Sumeet Muju, Beth Neuberger, Phu Tran, Jim Zhou
QA: Durga Addepalli, Andrew Shinohara, Ye Wu

NCICB/TerpSys
Don Swan, Jamie Keller

Research Triangle Institute
David Hall (webCGH)


caARRAY’s Architecture

Credits to Sumeet Muju, Phu Tran


caArray Architecture

[Architecture diagram. Component labels shown:]
• Browser; FTP applet (native data file); FTP staging area; file share
• MAGE-ML (Experiment and ArrayDesign)
• Tomcat web container; JSP; Struts; servlet
• Data Transfer Object (DTO)
• EJB container; Vocab Mgr EJB; Security Mgr EJB; Protocol Mgr EJB; Experiment Mgr EJB; Other Mgr EJB; MAGE Manager
• MAGE-ML Importer MDB; File Uploader MDB
• MAGE-STK (MAGE objects)
• Vocab interface; security objects
• Object Relational Bridge (OJB); NetCDF API
• caARRAY DB; Security DB
• caCORE: caBIO, caDSR, EVS
• MAGE-OM API JAR; MAGE-OM objects; MAGE-OM RMI Mgr; NetCDF API; MAGE-OM persistence


caArray Interfaces: caArray EJB API

caArray EJB API:
• Provides transaction control, asynchronous processes, service location, common security and distributed capabilities for submission and retrieval of microarray experiments.
• The caArray presentation layer utilizes the above functionality via the caArray EJB API.
• Data Transfer Objects (DTOs) are utilized to transfer data between the calling application and the EJBs.
• APIs can be used for federated access and submission of transaction data.
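The DTO hand-off described on this slide can be sketched in plain Java. This is a minimal illustration only: `ExperimentDto`, `ExperimentManager` and `InMemoryExperimentManager` are hypothetical names invented for the sketch, not classes from the caArray EJB API, and an in-memory map stands in for the EJB container and database.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical DTO: a serializable value carrier, not an actual caArray class. */
final class ExperimentDto implements Serializable {
    private static final long serialVersionUID = 1L;
    final String identifier;
    final String name;
    final String description;

    ExperimentDto(String identifier, String name, String description) {
        this.identifier = identifier;
        this.name = name;
        this.description = description;
    }
}

/** Hypothetical business interface standing in for a manager EJB. */
interface ExperimentManager {
    void submit(ExperimentDto experiment);
    ExperimentDto retrieve(String identifier);
    List<String> listIdentifiers();
}

/** In-memory stand-in so the sketch runs without an EJB container. */
final class InMemoryExperimentManager implements ExperimentManager {
    private final Map<String, ExperimentDto> store = new LinkedHashMap<>();

    @Override
    public void submit(ExperimentDto experiment) {
        if (experiment.identifier == null || experiment.identifier.isEmpty()) {
            throw new IllegalArgumentException("identifier is required");
        }
        // Submitting the same identifier again replaces the stored copy.
        store.put(experiment.identifier, experiment);
    }

    @Override
    public ExperimentDto retrieve(String identifier) {
        return store.get(identifier);
    }

    @Override
    public List<String> listIdentifiers() {
        return new ArrayList<>(store.keySet());
    }
}

class DtoFacadeDemo {
    public static void main(String[] args) {
        ExperimentManager manager = new InMemoryExperimentManager();
        manager.submit(new ExperimentDto(
                "gov.nih.nci.ncicb.caarray:Experiment:789:1",
                "Sample Experiment",
                "This is a sample experiment."));
        ExperimentDto found = manager.retrieve("gov.nih.nci.ncicb.caarray:Experiment:789:1");
        System.out.println(found.name + ": " + found.description);
    }
}
```

The calling code only ever sees the interface and the DTO, which is the property that lets a presentation layer or a remote client share one API.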


caArray Interfaces: MAGE-OM API

MAGE-OM API:
• Provides fine-grained search and retrieval of all caArray data via a caBIO-like, RMI-based API.
• The MAGE-OM API maps the MAGE objects to the new caArray database schema.
• RMI security module incorporated for user/group level data access.
• NetCDF API logic incorporated for faster retrieval of data.
• Built to be grid enabled.
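As a rough illustration of an RMI-style search interface with group-level access filtering, the sketch below declares a remote interface and a local implementation. All names (`ExperimentSearch`, `ExperimentRecord`, `LocalExperimentSearch`) and the prefix-match query are assumptions made for the example; the slides do not show the actual MAGE-OM API signatures. The implementation is called directly here, whereas a real deployment would export it through the RMI runtime.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical remote search interface; every remote method declares RemoteException. */
interface ExperimentSearch extends Remote {
    List<String> findByNamePrefix(String prefix, Set<String> callerGroups) throws RemoteException;
}

/** Hypothetical record: an experiment name plus the group allowed to read it. */
final class ExperimentRecord {
    final String name;
    final String ownerGroup;

    ExperimentRecord(String name, String ownerGroup) {
        this.name = name;
        this.ownerGroup = ownerGroup;
    }
}

/** Local implementation; not exported over RMI in this sketch. */
final class LocalExperimentSearch implements ExperimentSearch {
    private final List<ExperimentRecord> records;

    LocalExperimentSearch(List<ExperimentRecord> records) {
        this.records = new ArrayList<>(records);
    }

    @Override
    public List<String> findByNamePrefix(String prefix, Set<String> callerGroups) {
        List<String> hits = new ArrayList<>();
        for (ExperimentRecord record : records) {
            // Return a record only if it matches and the caller belongs to its group.
            if (record.name.startsWith(prefix) && callerGroups.contains(record.ownerGroup)) {
                hits.add(record.name);
            }
        }
        return hits;
    }
}

class SearchDemo {
    public static void main(String[] args) {
        LocalExperimentSearch search = new LocalExperimentSearch(Arrays.asList(
                new ExperimentRecord("Breast CGH", "lab-a"),
                new ExperimentRecord("Breast expression", "lab-b"),
                new ExperimentRecord("Colon CGH", "lab-a")));
        Set<String> groups = new HashSet<>(Arrays.asList("lab-a"));
        System.out.println(search.findByNamePrefix("Breast", groups));
    }
}
```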


caArray Middleware

Data Representation
• Data Transfer Objects (DTO)
• MicroArray Gene Expression Software Toolkit (MAGE-stk)
• DTO - MAGE-stk Conversion

Data Persistence
• Data Access Layer
• ObJectRelationalBridge (OJB)
• OJB Abstraction Layer and Data Access Objects (DAO)

EJB Layer
• Stateless Session Façade
• Bean-managed Persistence

NETCDF Files
• Large Data Set
• Fast Binary Access

MAGE-ML Import and Export
• Message-Driven Beans


MAGE-ML Import and Export: An Example

<MAGE-ML identifier="gov.nih.nci.ncicb.caarray:MAGEML:123:1">
  <AuditAndSecurity_package>
    <Contact_assnlist>
      <Person identifier="gov.nih.nci.ncicb.caarray:Person:456:1" lastName="Doe" firstName="John">
      </Person>
    </Contact_assnlist>
  </AuditAndSecurity_package>
  <Experiment_package>
    <Experiment_assnlist>
      <Experiment identifier="gov.nih.nci.ncicb.caarray:Experiment:789:1" name="Sample Experiment">
        <Descriptions_assnlist>
          <Description text="This is a sample experiment."></Description>
        </Descriptions_assnlist>
        <Providers_assnreflist>
          <Person_ref identifier="gov.nih.nci.ncicb.caarray:Person:456:1"/>
        </Providers_assnreflist>
      </Experiment>
    </Experiment_assnlist>
  </Experiment_package>
</MAGE-ML>

Identifiable element

Referenced Identifiable element to be resolved
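To make the distinction between identifiable elements and references concrete, here is a small sketch that scans a MAGE-ML fragment with the JDK's SAX parser. It is not the MAGE-stk parser: `IdentifierScan` is a hypothetical class, and treating every element whose name ends in `_ref` as a reference is an assumption based on the `Person_ref` element in the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical scanner: collects defined identifiers and referenced identifiers. */
final class IdentifierScan extends DefaultHandler {
    final List<String> defined = new ArrayList<>();
    final List<String> referenced = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        String identifier = attributes.getValue("identifier");
        if (identifier == null) {
            return; // element carries no identifier, e.g. an association list
        }
        if (qName.endsWith("_ref")) {
            referenced.add(identifier); // must be resolved to a defined object
        } else {
            defined.add(identifier);
        }
    }

    /** References that no element in the same document defines. */
    List<String> unresolvedReferences() {
        List<String> unresolved = new ArrayList<>();
        for (String identifier : referenced) {
            if (!defined.contains(identifier)) {
                unresolved.add(identifier);
            }
        }
        return unresolved;
    }

    static IdentifierScan scan(String xml) {
        try {
            IdentifierScan handler = new IdentifierScan();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse MAGE-ML", e);
        }
    }
}

class IdentifierScanDemo {
    public static void main(String[] args) {
        String xml = "<MAGE-ML identifier=\"gov.nih.nci.ncicb.caarray:MAGEML:123:1\">"
                + "<Person identifier=\"gov.nih.nci.ncicb.caarray:Person:456:1\"/>"
                + "<Person_ref identifier=\"gov.nih.nci.ncicb.caarray:Person:456:1\"/>"
                + "</MAGE-ML>";
        IdentifierScan result = IdentifierScan.scan(xml);
        System.out.println("defined: " + result.defined);
        System.out.println("unresolved: " + result.unresolvedReferences());
    }
}
```

A reference left in the unresolved list is one that the importer would have to look up elsewhere, for instance in the database.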


MAGE-ML Import and Export

Modified from the MAGE-stk’s MAGE-ML SAX-based parser to include a persistence mechanism to insert, update and resolve (look up) parsed objects

Any valid MAGE-ML can be imported. MAGE-ML is assumed valid. Validation is typically done using ArrayExpress’s MAGEValidator

Identifiable objects are first resolved from the database by matching their identifier; if a match is found, the existing object is updated from the incoming one

Identifier represents the globally unique key of a MAGE object across domains for its entire lifecycle

Identifier is separate from persisted MAGE-stk object’s primary key which is only internal to caARRAY
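The resolve-then-update rule and the split between identifier and primary key can be sketched as follows. `StoredObject` and `IdentifiableStore` are hypothetical names, a `HashMap` stands in for the database, and a single `name` field stands in for the full state of a MAGE object.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical persisted object: internal primary key plus global identifier. */
final class StoredObject {
    final long primaryKey;   // internal to the store, never exported
    final String identifier; // globally unique, travels with the MAGE-ML
    String name;

    StoredObject(long primaryKey, String identifier, String name) {
        this.primaryKey = primaryKey;
        this.identifier = identifier;
        this.name = name;
    }
}

/** Hypothetical store that resolves incoming objects by identifier. */
final class IdentifiableStore {
    private final Map<String, StoredObject> byIdentifier = new HashMap<>();
    private long nextPrimaryKey = 1;

    /** Updates the existing object if the identifier resolves, otherwise inserts. */
    StoredObject importObject(String identifier, String name) {
        StoredObject existing = byIdentifier.get(identifier);
        if (existing != null) {
            existing.name = name; // update in place, primary key unchanged
            return existing;
        }
        StoredObject created = new StoredObject(nextPrimaryKey++, identifier, name);
        byIdentifier.put(identifier, created);
        return created;
    }

    int size() {
        return byIdentifier.size();
    }
}

class ImportDemo {
    public static void main(String[] args) {
        IdentifiableStore store = new IdentifiableStore();
        String id = "gov.nih.nci.ncicb.caarray:Experiment:789:1";
        StoredObject first = store.importObject(id, "Sample Experiment");
        StoredObject second = store.importObject(id, "Sample Experiment (revised)");
        System.out.println(first.primaryKey == second.primaryKey); // same row, updated name
        System.out.println(second.name);
    }
}
```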


MAGE-ML Export

The entire object graph of an object, e.g., ArrayDesign, Experiment, is traversed to collect all Identifiable objects

The MAGE-stk’s MAGEJava object is utilized to contain all the Identifiable objects collected

When an Identifiable object is encountered, the appropriate method in the MAGEJava object is discovered and invoked using reflection to store the object into it

Ultimately MAGEJava.writeMAGEML(Writer) is invoked to recursively invoke the same method of all the contained Identifiable objects.

Xerces’s XMLSerializer pretty-formats the XML content as it is being written with appropriate new lines and indentations
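The reflective dispatch step can be illustrated with a stand-in container. Nothing below is the MAGE-stk API: `ExportContainer`, `Person`, `Experiment` and the `add<ClassName>` naming convention are assumptions chosen so the sketch is self-contained, and the object-graph traversal and Xerces pretty-printing are left out.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical identifiable types for the sketch. */
final class Person {
    final String identifier;
    Person(String identifier) { this.identifier = identifier; }
}

final class Experiment {
    final String identifier;
    Experiment(String identifier) { this.identifier = identifier; }
}

/** Hypothetical container with one add method per identifiable type. */
final class ExportContainer {
    private final List<Person> persons = new ArrayList<>();
    private final List<Experiment> experiments = new ArrayList<>();

    public void addPerson(Person person) { persons.add(person); }
    public void addExperiment(Experiment experiment) { experiments.add(experiment); }

    void write(Writer out) throws IOException {
        out.write("<Export>\n");
        for (Person p : persons) {
            out.write("  <Person identifier=\"" + p.identifier + "\"/>\n");
        }
        for (Experiment e : experiments) {
            out.write("  <Experiment identifier=\"" + e.identifier + "\"/>\n");
        }
        out.write("</Export>\n");
    }

    String toXml() {
        StringWriter out = new StringWriter();
        try {
            write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}

final class ReflectiveCollector {
    /** Finds add<SimpleClassName>(type) on the container and invokes it. */
    static void store(ExportContainer container, Object identifiable) {
        String methodName = "add" + identifiable.getClass().getSimpleName();
        try {
            Method method = ExportContainer.class.getMethod(methodName, identifiable.getClass());
            method.invoke(container, identifiable);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("no container method " + methodName, e);
        }
    }
}

class ExportDemo {
    public static void main(String[] args) {
        ExportContainer container = new ExportContainer();
        ReflectiveCollector.store(container, new Person("gov.nih.nci.ncicb.caarray:Person:456:1"));
        ReflectiveCollector.store(container, new Experiment("gov.nih.nci.ncicb.caarray:Experiment:789:1"));
        System.out.print(container.toXml());
    }
}
```

Looking the method up by name keeps the collector free of a type switch, at the cost of failing only at run time when a type has no matching method.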


A caArray Configuration

[Deployment diagram. Labels shown: NCICB; caArray 1; caBIO; caDSR / EVS; Security; NCICB Security; caWorkbench; caArray schema; MAGE-OM API; MAGE-ML; GRID (future); caARRAY EJB; JAVA APP. The labels caBIO, caDSR / EVS, caWorkbench, caArray schema, MAGE-OM API and caARRAY EJB each appear twice.]


webCGH
A web application for the visualization and analysis of array-based CGH and gene expression data

David Hall, Ph.D.
Research Triangle Institute


arrayCGH


webCGH Functions

• Visualization of copy number and gene expression levels
• Interrogation of genome features
• Data normalization and analysis
• Virtual experiments


Whole-genome View


Ideograms


Chromosome 17


Chromosome 17


Zoom


Annotated Genes


Gene List


Gene Watch


Data Flow

[Data flow diagram. Labels shown: Database (twice); Adaptor (twice); Transformer; Cache; Analytical Pipeline (four Op stages); Plot Generator]


Analytical Pipelines


Architecture

[Architecture diagram. Labels shown: Client (HTML, SVG); Web Container (Tomcat); Struts; JSPs; DAO; POJOs; Cache; caArray; Cloudscape]


Key Design Features

«interface» AnalyticOperation
+perform(in data)
+validate(in data)

«interface» FilterOperation

«interface» NormalizationOperation

«interface» SummaryStatisticalOperation
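One way the operation interfaces could chain into an analytical pipeline is sketched below. The interface and method names come from the diagram; everything else is an assumption made for illustration: that the three specialised interfaces extend `AnalyticOperation`, that data is a `double[]`, and the concrete operations (`MissingValueFilter`, `MedianCentering`, `MeanSummary`) and the `AnalyticalPipeline` runner, which are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

interface AnalyticOperation {
    void validate(double[] data);      // throws IllegalArgumentException on bad input
    double[] perform(double[] data);   // returns a new array, input is left untouched
}

// Assumed to be marker sub-interfaces of AnalyticOperation.
interface FilterOperation extends AnalyticOperation { }
interface NormalizationOperation extends AnalyticOperation { }
interface SummaryStatisticalOperation extends AnalyticOperation { }

/** Hypothetical filter: drops missing (NaN) values. */
final class MissingValueFilter implements FilterOperation {
    public void validate(double[] data) {
        if (data == null) throw new IllegalArgumentException("data is null");
    }
    public double[] perform(double[] data) {
        return Arrays.stream(data).filter(v -> !Double.isNaN(v)).toArray();
    }
}

/** Hypothetical normalization: subtracts the median from every value. */
final class MedianCentering implements NormalizationOperation {
    public void validate(double[] data) {
        if (data == null || data.length == 0) throw new IllegalArgumentException("no values to normalize");
    }
    public double[] perform(double[] data) {
        double[] sorted = data.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        double median = sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
        return Arrays.stream(data).map(v -> v - median).toArray();
    }
}

/** Hypothetical summary statistic: reduces the data to its mean. */
final class MeanSummary implements SummaryStatisticalOperation {
    public void validate(double[] data) {
        if (data == null || data.length == 0) throw new IllegalArgumentException("no values to summarize");
    }
    public double[] perform(double[] data) {
        return new double[] { Arrays.stream(data).average().getAsDouble() };
    }
}

/** Runs operations in order, validating the input of each stage first. */
final class AnalyticalPipeline {
    static double[] run(List<? extends AnalyticOperation> operations, double[] data) {
        double[] current = data;
        for (AnalyticOperation operation : operations) {
            operation.validate(current);
            current = operation.perform(current);
        }
        return current;
    }
}

class PipelineDemo {
    public static void main(String[] args) {
        double[] ratios = { 1.0, Double.NaN, 3.0, 2.0 };
        double[] centered = AnalyticalPipeline.run(
                Arrays.asList(new MissingValueFilter(), new MedianCentering()), ratios);
        System.out.println(Arrays.toString(centered));
    }
}
```

Because every stage shares one interface, new operations can be added to a pipeline without changing the runner.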


Key Design Features

«interface» DaoFactory

«interface» Authenticator

«interface» ArrayExperimentDao

«interface» UserProfile

«interface» AnnotationDao

[Relationship labels shown in the diagram: creates (four), uses (two)]
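The diagram reads like an abstract factory, and a minimal sketch under that reading follows. The five interface names are from the slide; the method signatures, the in-memory implementation, and the interpretation that the factory creates the authenticator and both DAOs, the authenticator creates the user profile, and the DAOs use the profile are all assumptions.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

interface UserProfile {
    String userName();
}

interface Authenticator {
    UserProfile authenticate(String userName, String password); // creates a UserProfile
}

interface ArrayExperimentDao {
    List<String> findExperimentNames(UserProfile profile);      // uses a UserProfile
}

interface AnnotationDao {
    String findAnnotation(String featureName, UserProfile profile);
}

interface DaoFactory {
    Authenticator createAuthenticator();
    ArrayExperimentDao createArrayExperimentDao();
    AnnotationDao createAnnotationDao();
}

/** Hypothetical in-memory factory; a real one would wrap a specific data source. */
final class InMemoryDaoFactory implements DaoFactory {
    public Authenticator createAuthenticator() {
        return (userName, password) -> {
            if (userName.isEmpty() || password.isEmpty()) {
                throw new SecurityException("login failed");
            }
            return () -> userName;
        };
    }

    public ArrayExperimentDao createArrayExperimentDao() {
        return profile -> {
            if ("demo".equals(profile.userName())) {
                return Arrays.asList("Example experiment A", "Example experiment B");
            }
            return Collections.<String>emptyList();
        };
    }

    public AnnotationDao createAnnotationDao() {
        return (featureName, profile) ->
                "annotation for " + featureName + " (requested by " + profile.userName() + ")";
    }
}

class DaoFactoryDemo {
    public static void main(String[] args) {
        DaoFactory factory = new InMemoryDaoFactory(); // swap this line to change data source
        UserProfile profile = factory.createAuthenticator().authenticate("demo", "secret");
        System.out.println(factory.createArrayExperimentDao().findExperimentNames(profile));
        System.out.println(factory.createAnnotationDao().findAnnotation("GENE1", profile));
    }
}
```

Client code depends only on the interfaces, so supporting another data source means supplying another `DaoFactory` implementation.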


Past, Present, Future

Dec. 2003 – Version 1.0
• Basic plots, analytics, GEDP

March 2005 – Version 2.0
• More plots, analytics, caArray

Late April 2005 – Version 2.1
• Mouse/human plots
• CGH/gene expression
• SKY/M-FISH&CGH integration


webCGH Team

NCICB: Mervi Heiskanen

RTI: David Hall, Vesselina Bakalov, Ying Chen, Matt Westlake, Bing Liu, Laxminarayana Ganapathi, Sheping Li, Stuart Allen