Building and Running caGrid Workflows in Taverna
1 Computation Institute, University of Chicago and Argonne National Laboratory, Chicago, IL, USA
2 Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA
3 School of Computer Science, University of Manchester, Manchester, UK
OVERVIEW
To help users from biological and biomedical domains create and execute workflows efficiently, the caGrid Workflow team, together with the ICR working group, selected the Taverna workbench and built a tool suite that orchestrates caGrid Data and Analytical services for ICR workflows. The tool suite aims to provide an easy-to-use workflow authoring and submission tool that can integrate caGrid services as well as third-party services in scientific workflows. We have also helped the caGrid community build several workflows of real scientific value, and we are committed to supporting caBIG users across workspaces in creating and executing their domain-specific workflows.
Web Resources:
Taverna: http://taverna.sourceforge.net/
caGrid Plug-in download: http://www.mcs.anl.gov/~wtan/t2/
caBIG: http://www.cagrid.org/mwiki/index.php?title=CaGrid
CaGrid Workflow Quick Start Guide: http://www.cagrid.org/display/workflow/Taverna+Quickstart+Guide
End-to-End Solution for caGrid Workflow
Search the caGrid Index Service for registered caGrid services matching various search criteria: service name, inputs, outputs, research center, class names, concept codes, etc.
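The multi-criteria search described above can be pictured as a filter over service metadata. The following is a minimal Python sketch under stated assumptions: the `ServiceEntry` record, the `search_services` helper, the service names, and the placeholder concept codes are all hypothetical and do not reflect the caGrid Index Service API or its metadata schema.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceEntry:
    """Hypothetical metadata record for one registered service."""
    name: str
    research_center: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)
    concept_codes: list = field(default_factory=list)


def _matches(entry, key, wanted):
    """Text fields match on case-insensitive substring, list fields on membership."""
    value = getattr(entry, key)
    if isinstance(value, list):
        return wanted in value
    return wanted.lower() in value.lower()


def search_services(registry, **criteria):
    """Return the entries that satisfy every supplied criterion."""
    return [entry for entry in registry
            if all(_matches(entry, key, wanted) for key, wanted in criteria.items())]


# Made-up registry contents, for illustration only.
registry = [
    ServiceEntry("ArrayDataService", "Center A",
                 ["ExperimentId"], ["MicroarrayData"], ["CODE-1"]),
    ServiceEntry("PreprocessService", "Center B",
                 ["MicroarrayData"], ["MicroarrayData"], ["CODE-1", "CODE-2"]),
    ServiceEntry("ClassifierService", "Center B",
                 ["MicroarrayData"], ["Prediction"], ["CODE-3"]),
]

# Services hosted at "Center B" that produce a prediction.
hits = search_services(registry, research_center="center b", outputs="Prediction")
print([entry.name for entry in hits])
```

Combining criteria with a logical AND keeps the sketch short; a real discovery client could equally offer OR queries or ranked results.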
Application: Lymphoma Prediction Workflow*,[1]
• Scientific value
• Use gene-expression patterns associated with Diffuse large B-cell Lymphoma (DLBCL) and Follicular Lymphoma (FL) to predict the lymphoma type of an unknown sample.
• Use GenePattern services SVM and KNN to build the tumor classification model and predict the tumor types of unknown examples.
• Major steps
• Extract Microarray. Query the training data and the unknown sample from experiments stored in caArray.
• Preprocess Microarray. Preprocess, or normalize, the microarray data for later processing.
• Predict Lymphoma type. Predict the lymphoma type using the SVM and KNN services.
• Extension
• Generalized the lymphoma prediction workflow into a cancer type prediction workflow.
• Applied it to Experiment 236 in the caArray database.[2]
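The three major steps (extract, preprocess, predict) can be sketched locally in Python. This is an illustration only: the poster's workflow calls caArray and the GenePattern SVM and KNN services, whereas the sketch below substitutes synthetic intensity values, a simple floor-and-log2 transform as the preprocessing step, and a plain k-nearest-neighbour vote. All function names and numbers are assumptions made for the example.

```python
import math
from collections import Counter


def extract():
    """Stand-in for the caArray query: synthetic training rows, labels, one unknown."""
    train = [[1200, 80, 300], [1100, 95, 280], [1000, 90, 310],
             [150, 900, 40], [130, 1000, 50], [160, 850, 45]]
    labels = ["DLBCL", "DLBCL", "DLBCL", "FL", "FL", "FL"]
    unknown = [1050, 100, 290]
    return train, labels, unknown


def preprocess(row, floor=20.0):
    """Clamp low intensities to `floor`, then log2-transform each value."""
    return [math.log2(max(value, floor)) for value in row]


def knn_predict(train, labels, query, k=3):
    """Majority vote among the k training rows nearest to `query` (Euclidean distance)."""
    nearest = sorted(zip(train, labels),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Step 1: extract; step 2: preprocess; step 3: predict.
train_raw, labels, unknown_raw = extract()
train = [preprocess(row) for row in train_raw]
prediction = knn_predict(train, labels, preprocess(unknown_raw))
print(prediction)
```

In the actual workflow each function would correspond to a remote service invocation, with Taverna passing the output of one step to the input of the next.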
[Figure: end-to-end caGrid workflow lifecycle. Stages: Discovery, Composition, Execution, Reuse, Community. Additional labels: "reuse", "generate". Annotations in the figure:]
• Service discovery based on caDSR (Cancer Data Standards Repository).
• Data-flow modeling flavor; caGrid activity.
• State management (WSRF); caGrid security.
• Implicit iteration: handle parallel execution; WSRF and security enforcement.
• Workflow as a service.
• A Facebook for workflows.
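The "Implicit iteration: handle parallel execution" label refers, as we understand it, to the workflow engine invoking a service once per element when a list arrives at an input that expects a single value. A minimal Python sketch of that idea follows; the `invoke` helper and the use of a thread pool are assumptions for illustration and are not Taverna's implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def invoke(service, value, max_workers=4):
    """Call `service` on a single value, or once per element when given a list.

    Nested lists are handled recursively, and results keep the input order
    and nesting. Each list level runs its elements in a thread pool.
    """
    if isinstance(value, list):
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            return list(pool.map(lambda item: invoke(service, item, max_workers), value))
    return service(value)


# A single value produces a single result; a list produces a list of results.
print(invoke(str.upper, "dlbcl"))
print(invoke(str.upper, ["dlbcl", "fl"]))
```

The workflow author wires one service to another without writing an explicit loop; the iteration is inferred from the shape of the data.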
[1] M. A. Shipp, et al. Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nature Medicine, vol. 8, 2002.
[2] S. Ramaswamy, et al. Multiclass cancer diagnosis using tumor gene expression signatures. PNAS, vol. 98, p. 15149, 2001.
*Acknowledgement: Juli Klemm, Xiaopeng Bian, Rashmi Srinivasa (NCI); Jared Nedzel (MIT)
Log on to a given Grid and configure each service's security properties with a caGrid credential.
Lymphoma prediction workflow: 1. Extract Microarray; 2. Preprocess Microarray; 3. Predict Lymphoma Type
Semantic search
WSRF Support: invoke stateful Grid services
caGrid Security Support
Available caGrid Workflows
• caDSR data query
• Protein sequence query
• Microarray clustering
• Lymphoma prediction
• Cancer classification
caGrid workflows at myExperiment: http://www.myexperiment.org/workflows/search?query=cabig
“Facebook” for caGrid workflows
Result of the lymphoma prediction workflow
Result of the cancer type prediction over caArray Experiment 236