CIPRes in Kepler: An integrative workflow package for streamlining phylogenetic data analyses
Zhijie Guan 1, Alex Borchers 1, Timothy McPhillips 2, Shirley Cohen 3, Mark A. Miller 1, Ilkay Altintas 1
1 San Diego Supercomputer Center, UCSD
2 University of California, Davis
3 University of Pennsylvania
biology.sdsc.edu
What is a Scientific Workflow?
• A combination of data integration, analysis, and visualization steps into a larger, automated "scientific process"
Mission of scientific workflow systems
• Promote “scientific discovery” by providing tools and methods to generate scientific workflows
• Create an extensible and customizable graphical user interface for scientists from different scientific domains
• Support computational experiment creation, execution, sharing, reuse, and provenance
• Design frameworks that define efficient ways to connect to existing data and integrate heterogeneous data from multiple resources
• Make technology useful through the user’s monitor!
Promoter Identification Workflow
Source: Matt Coleman (LLNL)
A Workflow for Phylogeny Analysis
Kepler is a Scientific Workflow System
… and a cross-project collaboration
June 2, 2006: Beta release
www.kepler-project.org
Ptolemy II: A software system used for prototyping engineering systems
KEPLER: A platform to design and execute Scientific Workflows
KEPLER = “Ptolemy II + X” for Scientific Workflows
Builds upon the open-source Ptolemy II framework
Some Kepler Contributors
Ptolemy II
Resurgence
Griddles
SRB
LOOKING
SKIDL
NLADR
Contributor names and funding info are at the Kepler website!
Other contributors:
- Chesire (UK Text Mining Center)
- DART (Great Barrier Reef, Australia)
- National Digital Archives + UCSD-TV (US)
- …
A co-development in KEPLER: GEON Dataset Generation & Registration
SQL database access (JDBC)
% Makefile
$> ant run
Phylogeny Analysis Workflows
Local Disk
Multiple Sequence Alignment
Phylogeny Analysis
Tree Visualization
Kepler Workflow: Actors
Actor
• Encapsulation of parameterized actions
• Interface defined by ports and parameters
Port
• Communication between input and output data
• The place where data get in/out
Model of computation
• Flow of control
• Sequential / parallel execution
• Implementation is a framework
Actor-Oriented Design
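The actor, port, and model-of-computation ideas on this slide can be illustrated with a minimal Python sketch. It is not the Kepler or Ptolemy II API: the class names `Port` and `Actor`, the `fire` method, and the `run_sequential` helper are all hypothetical names chosen for illustration, and the "model of computation" shown is just the simplest sequential one.

```python
from collections import deque


class Port:
    """A named place where data tokens get in or out of an actor."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()

    def put(self, token):
        self.queue.append(token)

    def get(self):
        return self.queue.popleft()


class Actor:
    """Encapsulates a parameterized action behind ports and parameters."""

    def __init__(self, name, action, **parameters):
        self.name = name
        self.action = action            # hypothetical callable: (token, **parameters) -> token
        self.parameters = parameters    # the actor's configurable parameters
        self.input = Port("input")
        self.output = Port("output")

    def fire(self):
        """Consume one token from the input port and emit one on the output port."""
        token = self.input.get()
        self.output.put(self.action(token, **self.parameters))


def run_sequential(actors, first_token):
    """A trivial sequential model of computation (an assumption of this sketch):
    fire the actors in order, carrying each output token to the next input port."""
    token = first_token
    for actor in actors:
        actor.input.put(token)
        actor.fire()
        token = actor.output.get()
    return token


# Two toy actors whose behavior is controlled only by their parameters.
scale = Actor("scale", lambda x, factor: x * factor, factor=3)
shift = Actor("shift", lambda x, offset: x + offset, offset=1)
result = run_sequential([scale, shift], 2)
print(result)
```

The point of the separation is that the actors know nothing about scheduling: swapping `run_sequential` for a parallel scheduler would leave the actor definitions unchanged.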
CIPRes Workflow: Actors
Input Port: Nexus File Content
Output Ports:
• Data Matrix
• Tree
• Taxa Info
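As a rough illustration of an actor with one input port (Nexus file content) and three output ports (data matrix, tree, taxa info), the sketch below splits NEXUS text into its `BEGIN ...; END;` blocks. It is an assumption-laden sketch, not the CIPRes actor itself: the function names `split_nexus_blocks` and `nexus_actor`, the output key names, and the sample file are hypothetical, and the regular expression handles only simple, well-formed blocks (no comments, no quoted labels).

```python
import re

# Hypothetical sample input; a real NEXUS file may contain more block types.
SAMPLE_NEXUS = """#NEXUS
BEGIN TAXA;
  DIMENSIONS NTAX=3;
  TAXLABELS A B C;
END;
BEGIN CHARACTERS;
  DIMENSIONS NCHAR=4;
  FORMAT DATATYPE=DNA;
  MATRIX
    A ACGT
    B ACGA
    C ACTT
  ;
END;
BEGIN TREES;
  TREE t1 = ((A,B),C);
END;
"""

BLOCK_PATTERN = re.compile(
    r"BEGIN\s+(\w+)\s*;(.*?)\bEND\s*;", re.IGNORECASE | re.DOTALL
)


def split_nexus_blocks(text):
    """Return {BLOCK_NAME: block body} for each BEGIN ...; END; block."""
    return {name.upper(): body.strip() for name, body in BLOCK_PATTERN.findall(text)}


def nexus_actor(text):
    """Map one input (file content) to three outputs, mirroring the slide's ports."""
    blocks = split_nexus_blocks(text)
    return {
        "data_matrix": blocks.get("CHARACTERS", blocks.get("DATA")),
        "tree": blocks.get("TREES"),
        "taxa_info": blocks.get("TAXA"),
    }


outputs = nexus_actor(SAMPLE_NEXUS)
print(outputs["tree"])
```

Each output value is still raw block text; a downstream actor would be responsible for interpreting the matrix rows or the tree string.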
Some actors in place for…
• Generic Web Service Client and Web Service Harvester
• Customizable RDBMS query and update
• Command Line wrapper tools (local, ssh, scp, ftp, etc.)
• Some Grid actors: Globus Job Runner, GridFTP-based file access, Proxy Certificate Generator
• SRB support
• Native R and Matlab support
• Interaction with Nimrod and APST
• Communication with ORBs through actors and services
• Imaging, Gridding, Vis Support
• Textual and Graphical Output
• …more generic and domain-oriented actors…
CIPRes Workflow
Actor:
• Choose the input file
• Run ClustalW
• Get the subset of the aligned sequences
• Run PAUP for Tree Inference
• Read the tree
• Parse the tree
• Display the tree
Channel: Convey the data
GUIGen: Parameter Setting
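To make the "Parse the tree" step concrete, here is a minimal sketch of parsing a tree written in Newick notation (the parenthesized format commonly used inside NEXUS TREES blocks) into nested tuples. The slides do not say how the CIPRes actor parses trees, so this is only an illustration: `parse_newick` and `leaf_names` are hypothetical helpers, and branch lengths, internal node labels, and quoted names are deliberately not handled.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested tuples of leaf names."""
    # Drop the trailing ";" and all whitespace (a simplification of this sketch).
    text = "".join(text.strip().rstrip(";").split())
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else ""

    def parse_node():
        nonlocal pos
        if peek() == "(":
            pos += 1  # consume "("
            children = [parse_node()]
            while peek() == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            if peek() != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
            return tuple(children)
        # Otherwise read a leaf name up to the next structural character.
        start = pos
        while pos < len(text) and text[pos] not in "(),":
            pos += 1
        return text[start:pos]

    tree = parse_node()
    if pos != len(text):
        raise ValueError(f"unexpected text at position {pos}")
    return tree


def leaf_names(tree):
    """Collect leaf names from the nested-tuple tree, left to right."""
    if isinstance(tree, tuple):
        return [name for child in tree for name in leaf_names(child)]
    return [tree]


tree = parse_newick("((A,B),C);")
print(tree, leaf_names(tree))
```

A nested-tuple result like this is enough for a display actor to walk the tree recursively; a fuller parser would attach branch lengths to each node.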
CIPRes Workflows: Demo
• Read Sequences
• Multiple Sequence Alignment
• Display the Alignment
• Matrix Alignment
• Tree Inference
• Consensus Tree
• Tree Visualization
Summary
Kepler is good at:
• Integrating data, programs, and computing resources
• Capturing your ideas and realizing them
• Supporting computational experiment creation, execution, sharing, and reuse
• Quickly prototyping scientific workflows
• Building streamlined applications
Visual programming language
• Don’t write your application, “draw”/compose it
The Cipres-Kepler package can be used to build scientific workflows for phylogenetic data analyses
Future Work
Cipres-Kepler can help you
There is (always) a lot more to work on:
• More actors for phylogeny analyses
• Automatically generating actors based on CORBA services
• Database (TreeBase) support to store large amounts of data
• More computing power for large dataset processing
Need your collaboration:
• Sharing experiences
• Teaching each other the domain knowledge
• Locating a specific problem and solving it
Questions?
Zhijie Guan [email protected]
Cipres-Kepler Release:
ftp://ftp.sdsc.edu/outgoing/borchers/cipresReleases/20060621/cipresKepler_Dist.tgz