Post on 13-Jan-2016

caArray: Cancer Array Informatics

Open Source Tools for Microarray Data Management, Analysis and Annotation

http://caarray.nci.nih.gov/

caArray overview & demo: Mervi Heiskanen (15 min)

caArray architecture: Scott Gustafson (15 min)

webCGH overview & demo: David Hall (15 min)

caArray Data Portal & Data Analysis Tools

1. Data Portal: promotes data sharing through submission of original, raw data files with associated experiment and sample information.

2. Data analysis and visualization tools: webCGH (NCICB/RTI), XpressionWay (NCICB/SAIC)

caBIG tools:

1. caWorkbench - Columbia
2. DWD - UNC Lineberger
3. GenePattern - MIT/Broad ?
4. Magellan - UC San Francisco
5. VISDA - Georgetown
6. Cancer Molecular Pages - Burnham
7. Function Express - Wash U Siteman
8. GoMiner - NCI/CCR

caArray version 1.0

Key features:

1. MIAME 1.1 compliant data annotation forms
2. Support for Affymetrix and GenePix native files
3. MAGE-ML import and export
4. Controlled vocabularies (MGED ontology)
5. Access to data via MAGE-OM API

caArray installations:

1. NCICB caArray instance supports NCI funded programs.
2. Local installations at the cancer centers: caBIG funded caArray adopters (Lombardi, Wistar, NYU)

caArray listservs:

1. caArray developers
2. caArray users
3. caArray team

caArray: Compliance with Standardization Efforts

MIAME: Minimum Information About a Microarray Experiment 1.1 Draft 6 (April 1, 2002) http://www.mged.org/Workgroups/MIAME/miame_1.1.html

MAGE-ML: MicroArray and GeneExpression Object Model and Markup Language 1.1 (October 2003) http://www.omg.org/docs/formal/03-10-01.pdf

MGED Ontology: Microarray Gene Expression Data Ontology 1.1.8 (April 2004) http://mged.sourceforge.net/ontologies/MGEDontology.php

caBIG compatibility guidelines: http://cabig.nci.nih.gov/guidelines_documentation/caBIG_Compatibility_Document

class CellLineDatabase

namespace: http://mged.sourceforge.net/ontologies/MGEDOntology.daml#

documentation: Database of cell line information.

type: primitive

superclasses: Database

used in classes: CellLine

used in individuals: ATCC_Cultures, CABRI_Human_and_Animal_Cell_lines

class TechnologyType

namespace: http://mged.sourceforge.net/ontologies/MGEDOntology.daml#

documentation: The technology type or platform of the reporters on the array.

type: primitive

superclasses: ArrayDesignPackage

used in classes: FeatureGroup

used in individuals: in_situ_oligo_features, spotted_antibody_features, spotted_colony_features, spotted_ds_DNA_features, spotted_protein_features, spotted_ss_oligo_features

caArray Phase 2

caArray 1.2 (June 2005)
• Support for additional file formats via a software toolkit
• Public search without login
• Copy bio sample information

caArray 1.5 (September 2005)
• XpressionWay, pathway visualization tool
• Integration with caDSR 3.0

caArray 1.7 (December 2005)
• Store filtered and normalized data
• User management user interface

caArray 2.0 (March 2006)
• Embedded MAGE-ML validation

All releases: Defect fixes and usability enhancements

Acknowledgements

NCICB

Sue Dubman, Mervi Heiskanen, Xiaopeng Bian, Subha Madhavan, Carl Schaefer, Gilberto Fragoso, Denise Warzel…

and Ken Buetow

NCICB/SAIC

Development team: Hangjiong Chen, Scott Gustafson, Juergen Lorenz, John Moy, Sumeet Muju, Beth Neuberger, Phu Tran, Jim Zhou

QA: Durga Addepalli, Andrew Shinohara, Ye Wu

NCICB/TerpSys: Don Swan, Jamie Keller

Research Triangle Institute: David Hall (webCGH)

caArray's Architecture

Credits to Sumeet Muju and Phu Tran

caArray Architecture

[Architecture diagram. Labels: Browser; FTP applet; native data file; MAGE-ML (experiment and array design); FTP staging area; Tomcat web container; servlet; JSP; Struts; data transfer object (DTO); EJB container; VocabMgr EJB; SecurityMgr EJB; ProtocolMgr EJB; ExperimentMgr EJB; other Mgr EJB; MAGE manager; MAGE-ML importer MDB; file uploader MDB; MAGE-STK (MAGE objects); vocab interface; security objects; object-relational bridge (OJB); caArray DB; security DB; NetCDF API; file share; caCORE (caBIO, caDSR, EVS); MAGE-OM API JAR; MAGE-OM objects; MAGE-OM RMI Mgr; MAGE-OM persistence.]

caArray Interfaces: caArray EJB API

caArray EJB API: Provides transaction control, asynchronous processes, service location, common security and distributed capabilities for submission and retrieval of microarray experiments.

The caArray presentation layer uses this functionality via the caArray EJB API.

Data Transfer Objects (DTOs) are used to transfer data between the calling application and the EJBs.

The APIs can be used for federated access and submission of transaction data.
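
The DTO pattern described on this slide can be sketched as follows. This is only an illustration under stated assumptions: `ExperimentDto`, `ExperimentManager` and `InMemoryExperimentManager` are hypothetical names, not classes from the caArray EJB API, and an in-memory map stands in for the EJB container so the sketch runs on its own.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical DTO: a plain serializable value object with no behavior.
final class ExperimentDto implements Serializable {
    private static final long serialVersionUID = 1L;
    final String identifier;
    final String name;
    final String description;

    ExperimentDto(String identifier, String name, String description) {
        this.identifier = identifier;
        this.name = name;
        this.description = description;
    }
}

// Hypothetical coarse-grained facade; callers exchange DTOs with it
// instead of touching persistent objects directly.
interface ExperimentManager {
    void submit(ExperimentDto experiment);

    ExperimentDto findByIdentifier(String identifier);
}

// In-memory stand-in for a session bean (assumption: no container available).
final class InMemoryExperimentManager implements ExperimentManager {
    private final Map<String, ExperimentDto> store = new HashMap<>();

    @Override
    public void submit(ExperimentDto experiment) {
        store.put(experiment.identifier, experiment);
    }

    @Override
    public ExperimentDto findByIdentifier(String identifier) {
        return store.get(identifier);
    }
}
```

Keeping the DTO free of behavior means the same object can be handed to a presentation layer or to a remote caller without exposing the persistence classes behind the facade.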

caArray Interfaces: MAGE-OM API

MAGE-OM API: Provides fine-grained search and retrieval of all caArray data via a caBIO-like, RMI-based API.

The MAGE-OM API maps the MAGE objects to the new caArray database schema.

An RMI security module is incorporated for user/group-level data access.

NetCDF API logic is incorporated for faster retrieval of data.

Built to be grid enabled.

caArray Middleware

Data Representation: Data Transfer Objects (DTO); MicroArray Gene Expression Software Toolkit (MAGE-stk); DTO - MAGE-stk Conversion

Data Persistence: Data Access Layer; ObJectRelationalBridge (OJB); OJB Abstraction Layer and Data Access Objects (DAO)

EJB Layer: Stateless Session Façade; Bean-managed Persistence

NETCDF Files: Large Data Set Fast Binary Access

MAGE-ML Import and Export: Message-Driven Beans

MAGE-ML Import and Export: An Example

    <MAGE-ML identifier="gov.nih.nci.ncicb.caarray:MAGEML:123:1">
      <AuditAndSecurity_package>
        <Contact_assnlist>
          <Person identifier="gov.nih.nci.ncicb.caarray:Person:456:1" lastName="Doe" firstName="John">
          </Person>
        </Contact_assnlist>
      </AuditAndSecurity_package>
      <Experiment_package>
        <Experiment_assnlist>
          <Experiment identifier="gov.nih.nci.ncicb.caarray:Experiment:789:1" name="Sample Experiment">
            <Descriptions_assnlist>
              <Description text="This is a sample experiment."></Description>
            </Descriptions_assnlist>
            <Providers_assnreflist>
              <Person_ref identifier="gov.nih.nci.ncicb.caarray:Person:456:1"/>
            </Providers_assnreflist>
          </Experiment>
        </Experiment_assnlist>
      </Experiment_package>
    </MAGE-ML>

Callouts on the slide:

Identifiable element (an element with an identifier attribute, e.g. Person)

Referenced Identifiable element to be resolved (e.g. Person_ref)
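
To make the distinction between Identifiable elements and references concrete, here is a minimal sketch that scans a MAGE-ML fragment with the standard Java SAX API. It is not the MAGE-stk parser; the class `IdentifierScan` is hypothetical, and it assumes that reference elements can be recognized by a `_ref` suffix on the element name, as in the sample above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects defined and referenced identifiers.
class IdentifierScan extends DefaultHandler {
    final List<String> defined = new ArrayList<>();
    final List<String> referenced = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        String id = attrs.getValue("identifier");
        if (id == null) {
            return; // not an Identifiable element or reference
        }
        if (qName.endsWith("_ref")) {
            referenced.add(id); // reference that must be resolved later
        } else {
            defined.add(id); // full definition of an Identifiable element
        }
    }

    // References with no matching definition in the same document.
    List<String> unresolved() {
        List<String> missing = new ArrayList<>(referenced);
        missing.removeAll(defined);
        return missing;
    }

    static IdentifierScan scan(String xml) {
        IdentifierScan handler = new IdentifierScan();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("MAGE-ML fragment could not be parsed", e);
        }
        return handler;
    }
}
```

A reference left in `unresolved()` is one that would have to be looked up outside the document, for example in the database.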

MAGE-ML Import and Export

The importer is modified from the MAGE-stk's SAX-based MAGE-ML parser to include a persistence mechanism that inserts, updates and resolves (looks up) parsed objects.

Any valid MAGE-ML can be imported. The MAGE-ML is assumed to be valid; validation is typically done using ArrayExpress's MAGEValidator.

Identifiable objects are first resolved from the database by matching their identifier; if one is resolved, the incoming object is used to update the existing one.

The identifier represents the globally unique key of a MAGE object across domains for its entire lifecycle.

The identifier is separate from the persisted MAGE-stk object's primary key, which is internal to caArray.
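
The resolve-then-update rule can be sketched as below. This is a simplified illustration, not caArray's persistence code: `IdentifiableStore` and `StoredObject` are hypothetical, and a map keyed by identifier stands in for the database lookup.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical persisted record: the primary key is internal,
// the identifier is the globally unique MAGE key.
final class StoredObject {
    final long primaryKey;
    final String identifier;
    String name;

    StoredObject(long primaryKey, String identifier, String name) {
        this.primaryKey = primaryKey;
        this.identifier = identifier;
        this.name = name;
    }
}

// In-memory stand-in for the database (assumption for this sketch).
final class IdentifiableStore {
    private final Map<String, StoredObject> byIdentifier = new HashMap<>();
    private long nextKey = 1;

    // Resolve by identifier first; update on a hit, insert on a miss.
    StoredObject importObject(String identifier, String name) {
        StoredObject existing = byIdentifier.get(identifier);
        if (existing != null) {
            existing.name = name; // update keeps the original primary key
            return existing;
        }
        StoredObject created = new StoredObject(nextKey++, identifier, name);
        byIdentifier.put(identifier, created);
        return created;
    }

    int size() {
        return byIdentifier.size();
    }
}
```

Importing the same identifier twice leaves one record whose primary key is unchanged, which is why the identifier and the primary key can evolve independently.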

MAGE-ML Export

The entire object graph of an object, e.g., ArrayDesign, Experiment, is traversed to collect all Identifiable objects

The MAGE-stk’s MAGEJava object is utilized to contain all the Identifiable objects collected

When an Identifiable object is encountered, the appropriate method in the MAGEJava object is discovered and invoked using reflection to store the object into it

Ultimately MAGEJava.writeMAGEML(Writer) is invoked to recursively invoke the same method of all the contained Identifiable objects.

Xerces’s XMLSerializer pretty-formats the XML content as it is being written with appropriate new lines and indentations
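
The reflection step in the export can be illustrated with a small sketch. It does not reproduce the MAGEJava API: `ExportContainer`, its `addPerson`/`addExperiment` methods, and the `Person` and `Experiment` classes are hypothetical stand-ins, and the naming convention ("add" plus the simple class name) is an assumption made for this example.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical Identifiable objects.
class Person {
    final String identifier;

    Person(String identifier) {
        this.identifier = identifier;
    }
}

class Experiment {
    final String identifier;
    final List<Person> providers = new ArrayList<>();

    Experiment(String identifier) {
        this.identifier = identifier;
    }
}

// Hypothetical container playing the role the slide gives to MAGEJava.
class ExportContainer {
    final List<Person> persons = new ArrayList<>();
    final List<Experiment> experiments = new ArrayList<>();

    public void addPerson(Person p) {
        if (!persons.contains(p)) persons.add(p);
    }

    public void addExperiment(Experiment e) {
        if (!experiments.contains(e)) experiments.add(e);
    }

    // Discover the matching add method by name and invoke it reflectively.
    void collect(Object identifiable) {
        Class<?> type = identifiable.getClass();
        try {
            Method adder = ExportContainer.class.getMethod("add" + type.getSimpleName(), type);
            adder.invoke(this, identifiable);
        } catch (ReflectiveOperationException ex) {
            throw new IllegalArgumentException("No container slot for " + type.getSimpleName(), ex);
        }
    }

    // Traverse the (tiny) object graph of an experiment.
    void collectGraph(Experiment experiment) {
        collect(experiment);
        for (Person provider : experiment.providers) {
            collect(provider);
        }
    }
}
```

Reflection keeps the traversal code independent of the number of Identifiable types, at the cost of failing at run time rather than compile time when a type has no matching method.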

A caArray Configuration

[Configuration diagram. Labels: NCICB; caArray 1; caBIO; caDSR / EVS; Security; NCICB Security; caWorkbench; caArray schema; MAGE-OM API; caArray EJB; Java app; MAGE-ML GRID (future).]

webCGH: A web application for the visualization and analysis of array-based CGH and gene expression data

David Hall, Ph.D., Research Triangle Institute

arrayCGH

webCGH Functions

Visualization of copy number and gene expression levels

Interrogation of genome features

Data normalization and analysis

Virtual experiments

[Screenshot slides: Whole-genome View; Ideograms; Chromosome 17 (two slides); Zoom; Annotated Genes; Gene List; Gene Watch]

Data Flow

[Data flow diagram. Labels: Database (two); Adaptor (two); Transformer; Cache; Analytical Pipeline; Op (four); Plot Generator.]

Analytical Pipelines

Architecture

[Architecture diagram. Labels: Client (HTML, SVG); Web Container (Tomcat); Struts; JSPs; DAO; POJOs; Cache; caArray; Cloudscape.]

Key Design Features

«interface» AnalyticOperation: +perform(in data), +validate(in data)

«interface» FilterOperation

«interface» NormalizationOperation

«interface» SummaryStatisticalOperation
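
One way to read this diagram is as a chain of interchangeable operations. The sketch below is an assumption-laden illustration, not webCGH source code: the slide gives only the method names `perform` and `validate`, so the `double[]` data type, the sub-interface relationships, and the classes `MeanCentering`, `AbsoluteThresholdFilter` and `AnalyticPipeline` are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Method names come from the slide; the signatures are assumptions.
interface AnalyticOperation {
    void validate(double[] data);

    double[] perform(double[] data);
}

interface FilterOperation extends AnalyticOperation {}

interface NormalizationOperation extends AnalyticOperation {}

// Hypothetical normalization: subtract the mean from every value.
class MeanCentering implements NormalizationOperation {
    public void validate(double[] data) {
        if (data.length == 0) throw new IllegalArgumentException("no data to normalize");
    }

    public double[] perform(double[] data) {
        double mean = Arrays.stream(data).average().orElse(0.0);
        return Arrays.stream(data).map(v -> v - mean).toArray();
    }
}

// Hypothetical filter: keep values whose magnitude reaches a threshold.
class AbsoluteThresholdFilter implements FilterOperation {
    private final double threshold;

    AbsoluteThresholdFilter(double threshold) {
        this.threshold = threshold;
    }

    public void validate(double[] data) {
        if (threshold < 0) throw new IllegalArgumentException("threshold must be non-negative");
    }

    public double[] perform(double[] data) {
        return Arrays.stream(data).filter(v -> Math.abs(v) >= threshold).toArray();
    }
}

// Runs operations in order, validating the input of each step first.
class AnalyticPipeline {
    static double[] run(List<AnalyticOperation> operations, double[] data) {
        double[] current = data;
        for (AnalyticOperation op : operations) {
            op.validate(current);
            current = op.perform(current);
        }
        return current;
    }
}
```

Because every step shares one interface, a pipeline is just an ordered list, and new operations can be added without changing the code that runs them.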

Key Design Features

«interface» DaoFactory

«interface» Authenticator

«interface» ArrayExperimentDao

«interface» UserProfile

«interface» AnnotationDao

Relationship labels in the diagram: creates (four), uses (two)

Past, Present, Future

Dec. 2003 – Version 1.0: Basic plots, analytics, GEDP

March 2005 – Version 2.0: More plots, analytics, caArray

Late April 2005 – Version 2.1: Mouse/human plots; CGH/gene expression; SKY/M-FISH & CGH integration

webCGH Team

NCICB: Mervi Heiskanen

RTI: David Hall, Vesselina Bakalov, Ying Chen, Matt Westlake, Bing Liu, Laxminarayana Ganapathi, Sheping Li, Stuart Allen