vouk sdm ahm
TRANSCRIPT
1/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
Implementing Scientific Process Automation
- from Art to Commodity
Mladen A. Vouk and Terence Critchlow
2/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
Overview
From Art to Commodity Component-based System Engineering Kepler vs. CCA vs ??? Domain Specific Virtualization and Service-based System
Engineering
3/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
Team (the artists?)
Ilkay Altinas Zhengang Cheng Terence Chritchlow Bertram Ludaesche Brent Marinello Pierre Moualem Steve Parker Elliot Peele Mladen A. Vouk Anthony Wilson Others …(including the R&D of
the Kepler and Ptolemy community, and W/F developers)
John Blondin Doug Swesty Scott Klasky …. Numerous Kepler and Ptolemy
users and apps
Development and Support (Active) End-Users
Art Commodity
ScientistW/F DevSupportDeveloper
IT Science
4/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
Cyberinfrastructure
“Cyberinfrastructure makes applications dramatically easier to [use,]
develop and deploy, thus expanding the feasible scope of
applications possible within budget and organizational constraints,
and shifting the [educator’s,] scientist’s and engineer’s effort
away from information technology (development) and
concentrating it on [knowledge transfer, and] scientific and
engineering research. Cyberinfrastructure also increases efficiency,
quality, and reliability by capturing commonalities among application
needs, and facilitates the efficient sharing of equipment and
services.”(from the Appendix of the Report of the National Science Foundation Blue-Ribbon Advisory
Panel on Cyberinfrastructure, Jan 2003)
5/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
Rising Expectations
Delivery of Service with Minimization of Information Technology (IT) Overhead
Move away from the specific resources (e.g., h/w, s/w, net, storage, and app), to the ability to achieve one’s basic mission (learning, teaching, research, outreach, administration, etc.)
Utility-like (appliance-like) on-demand access to needed IT. Use of IT-based solutions moves from a fixed location (such as an specific lab) and fixed resources (e.g., particular operating system) to a (mobile) personal access device a scientist (e.g., laptop or a PDA or a cell phone) and service-based delivery
Business model behind IT virtualization & services needs to conform to the mission of the institution, as well as realistic resource and/or personnel constraints.
6/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
Overview
From Art to Commodity Component-based System Engineering Kepler vs. CCA vs ??? Domain Specific Virtualization and Service-based System
Engineering
7/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
Component-Based System Engineering*
Composition of systems (e.g., practical workflows) from (existing) components
Systems as assemblies of components Development of components as reusable units Facilitation of the maintenance and evolution of systems by
customizing and replacing their components Methods and tools for support of different aspects of
component-based approach Open – source and process issues, organizational and
management issues, coupling, domain, technologies (e.g., component models), component composition issues, tools…
Building Reliable Component-Based Software Systems, Ivica Crnkovic and Magnus Larsson (editors), Artech House Publishers, ISBN 1-58053-327-2,http://www.idt.mdh.se/cbse-book/
8/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
Advantages Business: Shorter time-to-market, lower
development and maintenance costs Technical: Increased understandability of (complex)
systems; Increased usability, interoperability, flexibility, adaptability, dependability…
Strategic: Increasing (software) market share Scope: Web- and internet-based applications,
Desktop and office applications, Mathematical and other libraries, Graphical tools, GUI-based applications, etc.
Practical: De-facto standards, e.g., MS COM, .NET, Sun EJB, J2SEE, CORBA Component Model…
Focus on functionality
9/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
Some Issues Standards Non-functional requirements Performance including timing, Resource management Dependability, fault-tolerance Domain specific requirements? Skill level needed and IT overhead Changing expectations and scalability … Coupling and synchronization (synchronous vs. asynchronous
processing, loose vs. tight coupling, parallelization, data movement, bottlenecks …)
Provisioning / Security challenges Marketing hype/over-expectations Customer confusion / skepticism Service quality/control issues Demonstrating benefits/ROI Software Licensing Vendor Factor Other …
10/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
What is it? (*) The basis is the Component Components can be assembled
according to the rules specified by the component model
Components are assembled through their interfaces
A Component Composition is the process of assembling components to form an assembly, a larger component or an application
Component are performing in the context of a component framework
All parts conform to the component model
A component technology is a concrete implementation of a component model
c1 c2
Middleware
Run-time system
framework
Component Model
(*) From the talk entitled “Component-based development challenges in building reliable systems”, by Crnkovic at IIS 2005, Sept 2005.
11/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
Component Component Framework
Platfo
rmP
latform
Co
mp
on
ents
Co
mp
on
ents
RepositoryRepository
Supporting ToolSupporting Tool
(*) From the talk entitled “Component-based development challenges in building reliable systems”, by Crnkovic at IIS 2005, Sept 2005.
12/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
Component
A unit of composition Contractually specified interfaces Explicit context dependencies (only?). Can be deployed independently Subject to composition by third party. Confirms a component model which defines interaction and
composition standards Composed without modification according to a composition
standard. Specified as a) black-box (signal/response), b) gray-box
(access to internal states), c) white-box (access to all internals), or d) display-box (see internals, but cannot touch internals).
13/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
Principles Reusability (docs, re-use process,
architecture, framework, V&V, …) Substitutability (alternative implementations,
functional equivalence, equivalence on other issues, very precise interfaces and specs, run-time replacement mechanism, V&V, …)
Extensibility (extending system component pool, increasing capabilities of individual components – extensible architecture, resource and new functionality discovery, V&V, …)
Composability (functional, extra-functional, reasoning about compositions, V&V, …)
14/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
InputData
HighlyParallelCompute
Output~500x500files
Aggregate to ~500 files (< 10+GB each)
HPSSarchive
Data Depot
Logistic NetworkL-Bone
Local MassStorage 14+TB)
Aggregate to one file (~1 TB each)
VizWall
Viz Client
Local 44 Proc.Data Cluster- data sits on local nodes for weeks
Viz Software
Scientific Workflow Automation (e.g., Astrophysics, v-Desk)In conjunction with John Blondin, NC State UniversityAutomate data acquisition, transfer and visualization of a large-scale simulation at ORNL
15/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
InputData
HighlyParallelCompute
Output~500x500files
Aggregate to ~500 files (< 10+GB each)
HPSSarchive
Data Depot
Logistic NetworkL-Bone
Local MassStorage 14+TB)
Aggregate to one file (~1 TB each)
VizWall
Viz Client
Local 44 Proc.Data Cluster- data sits on local nodes for weeks
Viz Software
Scientific Workflow Automation (e.g., Astrophysics, v-Desk)In conjunction with John Blondin, NC State UniversityAutomate data acquisition, transfer and visualization of a large-scale simulation at ORNL
16/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
InputData
HighlyParallelCompute
Output~500x500files
Aggregate to ~500 files (< 10+GB each)
HPSSarchive
Data Depot
Logistic NetworkL-Bone
Local MassStorage 14+TB)
Aggregate to one file (~1 TB each)
VizWall
Viz Client
Local 44 Proc.Data Cluster- data sits on local nodes for weeks
Viz Software
Scientific Workflow Automation (e.g., Astrophysics, v-Desk)In conjunction with John Blondin, NC State UniversityAutomate data acquisition, transfer and visualization of a large-scale simulation at ORNL
17/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
Overview
From Art to Commodity Component-based System Engineering Kepler vs. CCA vs ??? Domain Specific Virtualization and Service-based System
Engineering
18/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
CCA and Kepler CCA is probably more suitable for tightly
coupled applications, especially when all the components are ready within a machine or a local cluster. CCA focuses on high performance parallel and distributed processing.
Kepler is probably more suitable for loosely coupled and diverse components. The component can reside on different and widely separated machines. It is great for service and data that resides with its owner, and are exposed as services. Kepler focuses on process orchestration and control. More conducive of IP protection.
19/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
CCA Component Architecture ModelCCA components interact with each
other with a specific CCA framework implementation through standard CCA interfaces. Each component defines its inputs and outputs in Scientific IDL; these definitions are deposited in, and can be retrieved from a repository by using the CCA Repository API. In addition, these definitions serve as input to a proxy generator which generates component stubs: the component-specific parts of GPorts (white box in the picture). The components can also use framework services directly through the CCA Framework Services Interface. The CCA Configuration API ensures that the the components can collaborate with different builders associated with different frameworks.
Chasm - an F90 interoperability library from Los Alamos. Babel/SIDL - an object oriented language interoperabilty interface definition language. CCA Specification - The Common Component Architecture Specification for high performance components. Ccaffeine - a CCA framework compliant with the CCA specification. Ccaffeine GUI - A Graphical User Interface that works with Ccaffeine.
20/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
Kepler
21/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
Option A: CCA-Aware Actor Actor interacts with CCA
components. Kepler director only needs to pass relevant parameters.
If such actor is not available, each CCA component may require a customized stub.
Modification only in Kepler space.
Maintains tight coupling among CCA components, e.g., for performance.
Kepler Director
CCA aware actor
CCA Com.
CCA Com.
CCA Com.
22/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
Option B: CCA Component Service
CCA component exposes an interface (service) that can be directly orchestrated by Kepler.
Requires extra work on all CCA components. This might not be possible on all machines.
Kepler Director
Service actor
CCA Com.
Service
CCA Com.
Service
CCA Com.
Service
23/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
Option C: CCA-Aware Service Proxy Kepler and CCA bridged
through a Service Proxy. It translates service requests into interactions.
Advantage: Decouples CCA and Kepler Service Proxy can be very
flexible No Special CCA execution
module is required on the client.
Disadvantage: Single point vulnerability as
all Kepler users are depends on the Proxy. Extra overhead?
Kepler Director
Service actor
CCA Com.
CCA Com.
CCA Com.
CCA Aware Service Proxy
26/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
Overview
From Art to Commodity Component-based System Engineering Kepler vs. CCA vs ??? Domain Specific Virtualization and Service-based System
Engineering
27/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
Domain Specific
Klasky FSP W/F Swesty TSI W/F Coleman W/F ChemInfo W/F SciRun Blondin TSI W/F Other …
28/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
Fusion Simulation Project Workflow Pilot In conjunction with Scott Klasky, CESP, PPPL, ORNL Automation of
simulation, transfer and analytics of FSP
29/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
TSI Workflow I In conjunction with Doug Swesty and Eric Myra, Stony Brook
Automate the transfer of large-scale simulation data between NERSC and Stony Brook
30/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
Promoter Identification WorkflowIn conjunction with Matt Coleman, LLNL
Automate the analysis of gene expression data using a combination of web services and local analysis programs
31/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
ChemInformatics Workflow In conjunction with Resurgence Project
Automate the management and submission of jobs
32/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
SCIRun and Kepler Dataflow IntegrationIncorporate SCIRun computation and visualization with the SPA workflow
engine
33/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
InputData
HighlyParallelCompute
Output~500x500files
Aggregate to ~500 files (< 10+GB each)
HPSSarchive
Data Depot
Logistic NetworkL-Bone
Local MassStorage 14+TB)
Aggregate to one file (~1 TB each)
VizWall
Viz Client
Local 44 Proc.Data Cluster- data sits on local nodes for weeks
Viz Software
Scientific Workflow Automation (e.g., Astrophysics, v-Desk)In conjunction with John Blondin, NC State UniversityAutomate data acquisition, transfer and visualization of a large-scale simulation at ORNL
34/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
Workflow - Abstraction
Model
SendData
Merge &Backup
To VizWall
Parallel Computation
RecvData
Parallel Visualization
Data Mover Channel(e.g. LORS, BCC, SABUL, FC over SONET
Split & Viz
Web or Client GUIWebServices
Head NodeServices
Head Node
Services
Mass Storage
Fiber C. or Local NFS
ModelMergeBackupMoveSplitViz
ConstructOrchestrate
Monitor/SteerChange
Stop/Start
Control
35/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
Astrophysics Workflow (using Ptolemy II framework)
36/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
Blondin Workflow V2
Submit Job Merge Transfer
Slicing/Dicing
Viz via Ensight
Browser Display
Supercomputer Cray (Phoenix @ ORNL)
Local Cluster (Orbitty @ NCSU)
User Laptop
Web DB
37/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
Basic
A number of processing steps Range of data transfer rates Parallelism Speedup? Implementation (e.g., Scripts? App?) Ease of use (e.g., LORS) Tracking (e.g., DB, Web, provenance) Fault-Tolerance Distributed (for most part)
38/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
Runtime Data Collection Each run of the workflow is associated with a runid. The info is
organized around it.
39/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
Workflow Screenshot (Swesty)
40/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
Screenshot: Log
41/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
Screenshot: Running
42/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
Generic Actors (Swesty W/F)
This workflow uses a small number of actors repeatedly to deliver a complex behavior 160 instances an actors 18 different types of actors ~100 Expression actor instances 13 boolean switch actor instances 13 array manipulation actor instances 8 ssh actor instances < 30 instances of other actors (ex: sleep, file I/O, etc)
43/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
Work in Progress Validation of input data
llsubmit script and config files need to be consistent
Distributed infrastructure Start this from a web page Monitor and control the workflow from a
remote site Incorporation of data analysis
A small workflow driven by the specific simulation results and how Doug wants to visualize the data
44/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
Notes
A complex workflow that is typical of many scientific workflows Follows the run simulation, move data, and
analyze data paradigm Much can be done with a set of generic actors:
Expression actor Ssh2Exec actor BooleanSwitch & Array actors
Configuration parameters allow simple adaptation to different environments
45/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
Notes (Blondin)
W/F support system needs to be1. Flexible - everything changes often! If such
tools are to be used by application scientists they need to be easy to reconfigure.
2. Detachable - for one reason or another, one component may not work (network not usable, disks full on local end), so one would like to fire up individual parts of the workflow as needed.
3. Fault-tolerant - ideally, the software itself can recognize some faults and correct them (eg, re-attempt file upload).
46/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
Key Issue
Very important to distinguish between a custom-made workflow solution and a more cannonical set of operations, methods, and solutions that can be composed into a scientific workflow.
Complexity, skill level needed to implement, usability, maintainability, “standardization” e.g., sort, uniq, grep, ftp, ssh on unix boxes SAS (that can do sorting), home-made sort, LORS, SABUL, bbcp (free, but not standard), etc.
47/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
Overview
From Art to Commodity Component-based System Engineering Kepler vs. CCA vs ??? Domain Specific Virtualization and Service-based System
Engineering
48/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
Make everything into a Service
Network-based and On-demand Complements an access/communication
unit of choice Utility-like: ubiquitous, reliable, available,
maintainable Nearly Device-independent May be application level or smaller
granularity (e.g., functions, web-service, grid-service)
49/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
Virtualization
Services
Middleware
Hardware
Applications
Provisioning
Operating Systems
To effectively deliver on-demand computing services that are maintainable, scalable, and customizable it is essential that the tiers of virtualization are separated but can be coupled
50/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
InputData
HighlyParallelComputer
Output files~500x500
Aggregate to ~500 files (< 10+GB each)
HPSSarchive
Data Depot
Logistic NetworkL-Bone
Local MassStorage 14+TB)
Aggregate to one file (~1 TB each)
VizWall
Viz Client
Local 44 Proc. Data Cluster- data sits on local nodes for weeks
Viz Software
Scientific Workflow Automation (e.g., Astrophysics, v-RP)In conjunction with John Blondin, NC State UniversityAutomate data acquisition, transfer and visualization of a large-scale simulation at ORNL
VCL-based WorkflowOrchestration, StateTracking, Provenance
51/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
A Resource “Utility Wall”
V-Desks
V-Flow
(resource
collections)
1-1-1 to
N-M-K
…
52/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
vcl.ncsu.edu
Ongoing:-Tipping pt.-Usability-Availability-Pedagogy-…
-Currentlyscaling to 8000+ users- N-M-K
53/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
Advantages (2) Easy to use remote access from one's own desktop or
mobile computer in homes, offices, or the local coffee house, bringing the "lab" to you
Full access to a dedicated computing resource (some scheduling choices include monitored root or administrator access). This access is the same or more than what is possible in physical computing laboratory.
Vendor-standard remote access protocols and client software. Eliminates the need for specialized customization of one's own computer and eases updates and maintenance
Platform agnostic (Macs, Win, Linux, …) Extensible to any remotely-accessible desktop systems in
specialized campus labs. Departments can bring their lab to their students.
Protection of Intellectual Property Data Provenance and tracking Fault-Tolerance Higher Security
54/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
Issues (1) Communication Coupling (loose, tight, v. tight, code-level) and
Granularity (fine, medium?, coarse) Communication Methods (e.g., ssh tunnels, xmprpc, snmp,
web/grid services,etc.) – e.g., apparently poor support for Cray Storage issues (e.g., p-netcdf support, bandwidth) Direct and Indirect Data Flows (functionality, throughput,
delays, other QoS parameters) End-to-end performance Level of abstraction Workflow description language(s) and exchange issues –
interoperability “Standard” scientific computing “W/F functions”
55/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation
Issues (2) Problem is currently similar to old-time punched-card job
submissions (long turn-around time, can be expensive due to front end computational resource I/O bottleneck) - need up front verification and validation – things will change
Back-end bottleneck due to hierarchical storage issues (e.g., retrieval from HPSS)
Long term workflow state preservation - needed Recovery (transfers, other failures) – more needed Tracking data and files, provenances Who maintains equipment, storage, data, scripts, workflow
elements? Elegant solutions my not be good solutions from the perspective of autonomy.
EXTREMELY IMPORTANT!!! – We are trying to get out of the business of totally custom-made solutions.
56/AHM/Raleigh/Oct05/v3a
Scientific Data Management Center – Scientific Process Automation