vouk sdm ahm

1/AHM/Raleigh/Oct05/v3a

Scientific Data Management Center – Scientific Process Automation

Implementing Scientific Process Automation

- from Art to Commodity

Mladen A. Vouk and Terence Critchlow



Overview

From Art to Commodity Component-based System Engineering Kepler vs. CCA vs ??? Domain Specific Virtualization and Service-based System

Engineering



Team (the artists?)

Ilkay Altinas Zhengang Cheng Terence Chritchlow Bertram Ludaesche Brent Marinello Pierre Moualem Steve Parker Elliot Peele Mladen A. Vouk Anthony Wilson Others …(including the R&D of

the Kepler and Ptolemy community, and W/F developers)

John Blondin Doug Swesty Scott Klasky …. Numerous Kepler and Ptolemy

users and apps

Development and Support (Active) End-Users

Art Commodity

ScientistW/F DevSupportDeveloper

IT Science



Cyberinfrastructure

“Cyberinfrastructure makes applications dramatically easier to [use,]

develop and deploy, thus expanding the feasible scope of

applications possible within budget and organizational constraints,

and shifting the [educator’s,] scientist’s and engineer’s effort

away from information technology (development) and

concentrating it on [knowledge transfer, and] scientific and

engineering research. Cyberinfrastructure also increases efficiency,

quality, and reliability by capturing commonalities among application

needs, and facilitates the efficient sharing of equipment and

services.”(from the Appendix of the Report of the National Science Foundation Blue-Ribbon Advisory

Panel on Cyberinfrastructure, Jan 2003)



Rising Expectations

Delivery of Service with Minimization of Information Technology (IT) Overhead

Move away from the specific resources (e.g., h/w, s/w, net, storage, and app), to the ability to achieve one’s basic mission (learning, teaching, research, outreach, administration, etc.)

Utility-like (appliance-like) on-demand access to needed IT. Use of IT-based solutions moves from a fixed location (such as an specific lab) and fixed resources (e.g., particular operating system) to a (mobile) personal access device a scientist (e.g., laptop or a PDA or a cell phone) and service-based delivery

Business model behind IT virtualization & services needs to conform to the mission of the institution, as well as realistic resource and/or personnel constraints.



Overview


Engineering



Component-Based System Engineering*

Composition of systems (e.g., practical workflows) from (existing) components

Systems as assemblies of components Development of components as reusable units Facilitation of the maintenance and evolution of systems by

customizing and replacing their components Methods and tools for support of different aspects of

component-based approach Open – source and process issues, organizational and

management issues, coupling, domain, technologies (e.g., component models), component composition issues, tools…

Building Reliable Component-Based Software Systems, Ivica Crnkovic and Magnus Larsson (editors), Artech House Publishers, ISBN 1-58053-327-2,http://www.idt.mdh.se/cbse-book/



Advantages Business: Shorter time-to-market, lower

development and maintenance costs Technical: Increased understandability of (complex)

systems; Increased usability, interoperability, flexibility, adaptability, dependability…

Strategic: Increasing (software) market share Scope: Web- and internet-based applications,

Desktop and office applications, Mathematical and other libraries, Graphical tools, GUI-based applications, etc.

Practical: De-facto standards, e.g., MS COM, .NET, Sun EJB, J2SEE, CORBA Component Model…

Focus on functionality



Some Issues Standards Non-functional requirements Performance including timing, Resource management Dependability, fault-tolerance Domain specific requirements? Skill level needed and IT overhead Changing expectations and scalability … Coupling and synchronization (synchronous vs. asynchronous

processing, loose vs. tight coupling, parallelization, data movement, bottlenecks …)

Provisioning / Security challenges Marketing hype/over-expectations Customer confusion / skepticism Service quality/control issues Demonstrating benefits/ROI Software Licensing Vendor Factor Other …



What is it? (*) The basis is the Component Components can be assembled

according to the rules specified by the component model

Components are assembled through their interfaces

A Component Composition is the process of assembling components to form an assembly, a larger component or an application

Component are performing in the context of a component framework

All parts conform to the component model

A component technology is a concrete implementation of a component model

c1 c2

Middleware

Run-time system

framework

Component Model

(*) From the talk entitled “Component-based development challenges in building reliable systems”, by Crnkovic at IIS 2005, Sept 2005.



Component Component Framework

Platfo

rmP

latform

Co

mp

on

ents

Co

mp

on

ents

RepositoryRepository

Supporting ToolSupporting Tool

(*) From the talk entitled “Component-based development challenges in building reliable systems”, by Crnkovic at IIS 2005, Sept 2005.



Component

A unit of composition Contractually specified interfaces Explicit context dependencies (only?). Can be deployed independently Subject to composition by third party. Confirms a component model which defines interaction and

composition standards Composed without modification according to a composition

standard. Specified as a) black-box (signal/response), b) gray-box

(access to internal states), c) white-box (access to all internals), or d) display-box (see internals, but cannot touch internals).



Principles Reusability (docs, re-use process,

architecture, framework, V&V, …) Substitutability (alternative implementations,

functional equivalence, equivalence on other issues, very precise interfaces and specs, run-time replacement mechanism, V&V, …)

Extensibility (extending system component pool, increasing capabilities of individual components – extensible architecture, resource and new functionality discovery, V&V, …)

Composability (functional, extra-functional, reasoning about compositions, V&V, …)



InputData

HighlyParallelCompute

Output~500x500files

Aggregate to ~500 files (< 10+GB each)

HPSSarchive

Data Depot

Logistic NetworkL-Bone

Local MassStorage 14+TB)

Aggregate to one file (~1 TB each)

VizWall

Viz Client

Local 44 Proc.Data Cluster- data sits on local nodes for weeks

Viz Software

Scientific Workflow Automation (e.g., Astrophysics, v-Desk)In conjunction with John Blondin, NC State UniversityAutomate data acquisition, transfer and visualization of a large-scale simulation at ORNL



InputData


Output~500x500files


HPSSarchive

Data Depot




VizWall

Viz Client


Viz Software




Overview


Engineering



CCA and Kepler CCA is probably more suitable for tightly

coupled applications, especially when all the components are ready within a machine or a local cluster. CCA focuses on high performance parallel and distributed processing.

Kepler is probably more suitable for loosely coupled and diverse components. The component can reside on different and widely separated machines. It is great for service and data that resides with its owner, and are exposed as services. Kepler focuses on process orchestration and control. More conducive of IP protection.



CCA Component Architecture ModelCCA components interact with each

other with a specific CCA framework implementation through standard CCA interfaces. Each component defines its inputs and outputs in Scientific IDL; these definitions are deposited in, and can be retrieved from a repository by using the CCA Repository API. In addition, these definitions serve as input to a proxy generator which generates component stubs: the component-specific parts of GPorts (white box in the picture). The components can also use framework services directly through the CCA Framework Services Interface. The CCA Configuration API ensures that the the components can collaborate with different builders associated with different frameworks.

Chasm - an F90 interoperability library from Los Alamos. Babel/SIDL - an object oriented language interoperabilty interface definition language. CCA Specification - The Common Component Architecture Specification for high performance components. Ccaffeine - a CCA framework compliant with the CCA specification. Ccaffeine GUI - A Graphical User Interface that works with Ccaffeine.



Kepler



Option A: CCA-Aware Actor Actor interacts with CCA

components. Kepler director only needs to pass relevant parameters.

If such actor is not available, each CCA component may require a customized stub.

Modification only in Kepler space.

Maintains tight coupling among CCA components, e.g., for performance.

Kepler Director

CCA aware actor

CCA Com.

CCA Com.

CCA Com.



Option B: CCA Component Service

CCA component exposes an interface (service) that can be directly orchestrated by Kepler.

Requires extra work on all CCA components. This might not be possible on all machines.

Kepler Director

Service actor

CCA Com.

Service

CCA Com.

Service

CCA Com.

Service



Option C: CCA-Aware Service Proxy Kepler and CCA bridged

through a Service Proxy. It translates service requests into interactions.

Advantage: Decouples CCA and Kepler Service Proxy can be very

flexible No Special CCA execution

module is required on the client.

Disadvantage: Single point vulnerability as

all Kepler users are depends on the Proxy. Extra overhead?

Kepler Director

Service actor

CCA Com.

CCA Com.

CCA Com.

CCA Aware Service Proxy



Overview


Engineering



Domain Specific

Klasky FSP W/F Swesty TSI W/F Coleman W/F ChemInfo W/F SciRun Blondin TSI W/F Other …



Fusion Simulation Project Workflow Pilot In conjunction with Scott Klasky, CESP, PPPL, ORNL Automation of

simulation, transfer and analytics of FSP



TSI Workflow I In conjunction with Doug Swesty and Eric Myra, Stony Brook

Automate the transfer of large-scale simulation data between NERSC and Stony Brook



Promoter Identification WorkflowIn conjunction with Matt Coleman, LLNL

Automate the analysis of gene expression data using a combination of web services and local analysis programs



ChemInformatics Workflow In conjunction with Resurgence Project

Automate the management and submission of jobs



SCIRun and Kepler Dataflow IntegrationIncorporate SCIRun computation and visualization with the SPA workflow

engine



InputData


Output~500x500files


HPSSarchive

Data Depot




VizWall

Viz Client


Viz Software




Workflow - Abstraction

Model

SendData

Merge &Backup

To VizWall

Parallel Computation

RecvData

Parallel Visualization

Data Mover Channel(e.g. LORS, BCC, SABUL, FC over SONET

Split & Viz

Web or Client GUIWebServices

Head NodeServices

Head Node

Services

Mass Storage

Fiber C. or Local NFS

ModelMergeBackupMoveSplitViz

ConstructOrchestrate

Monitor/SteerChange

Stop/Start

Control



Astrophysics Workflow (using Ptolemy II framework)



Blondin Workflow V2

Submit Job Merge Transfer

Slicing/Dicing

Viz via Ensight

Browser Display

Supercomputer Cray (Phoenix @ ORNL)

Local Cluster (Orbitty @ NCSU)

User Laptop

Web DB



Basic

A number of processing steps Range of data transfer rates Parallelism Speedup? Implementation (e.g., Scripts? App?) Ease of use (e.g., LORS) Tracking (e.g., DB, Web, provenance) Fault-Tolerance Distributed (for most part)



Runtime Data Collection Each run of the workflow is associated with a runid. The info is

organized around it.



Workflow Screenshot (Swesty)



Screenshot: Log



Screenshot: Running



Generic Actors (Swesty W/F)

This workflow uses a small number of actors repeatedly to deliver a complex behavior 160 instances an actors 18 different types of actors ~100 Expression actor instances 13 boolean switch actor instances 13 array manipulation actor instances 8 ssh actor instances < 30 instances of other actors (ex: sleep, file I/O, etc)



Work in Progress Validation of input data

llsubmit script and config files need to be consistent

Distributed infrastructure Start this from a web page Monitor and control the workflow from a

remote site Incorporation of data analysis

A small workflow driven by the specific simulation results and how Doug wants to visualize the data



Notes

A complex workflow that is typical of many scientific workflows Follows the run simulation, move data, and

analyze data paradigm Much can be done with a set of generic actors:

Expression actor Ssh2Exec actor BooleanSwitch & Array actors

Configuration parameters allow simple adaptation to different environments



Notes (Blondin)

W/F support system needs to be1. Flexible - everything changes often! If such

tools are to be used by application scientists they need to be easy to reconfigure.

2. Detachable - for one reason or another, one component may not work (network not usable, disks full on local end), so one would like to fire up individual parts of the workflow as needed.

3. Fault-tolerant - ideally, the software itself can recognize some faults and correct them (eg, re-attempt file upload).



Key Issue

Very important to distinguish between a custom-made workflow solution and a more cannonical set of operations, methods, and solutions that can be composed into a scientific workflow.

Complexity, skill level needed to implement, usability, maintainability, “standardization” e.g., sort, uniq, grep, ftp, ssh on unix boxes SAS (that can do sorting), home-made sort, LORS, SABUL, bbcp (free, but not standard), etc.



Overview


Engineering



Make everything into a Service

Network-based and On-demand Complements an access/communication

unit of choice Utility-like: ubiquitous, reliable, available,

maintainable Nearly Device-independent May be application level or smaller

granularity (e.g., functions, web-service, grid-service)



Virtualization

Services

Middleware

Hardware

Applications

Provisioning

Operating Systems

To effectively deliver on-demand computing services that are maintainable, scalable, and customizable it is essential that the tiers of virtualization are separated but can be coupled



InputData

HighlyParallelComputer

Output files~500x500


HPSSarchive

Data Depot




VizWall

Viz Client

Local 44 Proc. Data Cluster- data sits on local nodes for weeks

Viz Software

Scientific Workflow Automation (e.g., Astrophysics, v-RP)In conjunction with John Blondin, NC State UniversityAutomate data acquisition, transfer and visualization of a large-scale simulation at ORNL

VCL-based WorkflowOrchestration, StateTracking, Provenance



A Resource “Utility Wall”

V-Desks

V-Flow

(resource

collections)

1-1-1 to

N-M-K

…



vcl.ncsu.edu

Ongoing:-Tipping pt.-Usability-Availability-Pedagogy-…

-Currentlyscaling to 8000+ users- N-M-K



Advantages (2) Easy to use remote access from one's own desktop or

mobile computer in homes, offices, or the local coffee house, bringing the "lab" to you

Full access to a dedicated computing resource (some scheduling choices include monitored root or administrator access). This access is the same or more than what is possible in physical computing laboratory.

Vendor-standard remote access protocols and client software. Eliminates the need for specialized customization of one's own computer and eases updates and maintenance

Platform agnostic (Macs, Win, Linux, …) Extensible to any remotely-accessible desktop systems in

specialized campus labs. Departments can bring their lab to their students.

Protection of Intellectual Property Data Provenance and tracking Fault-Tolerance Higher Security



Issues (1) Communication Coupling (loose, tight, v. tight, code-level) and

Granularity (fine, medium?, coarse) Communication Methods (e.g., ssh tunnels, xmprpc, snmp,

web/grid services,etc.) – e.g., apparently poor support for Cray Storage issues (e.g., p-netcdf support, bandwidth) Direct and Indirect Data Flows (functionality, throughput,

delays, other QoS parameters) End-to-end performance Level of abstraction Workflow description language(s) and exchange issues –

interoperability “Standard” scientific computing “W/F functions”



Issues (2) Problem is currently similar to old-time punched-card job

submissions (long turn-around time, can be expensive due to front end computational resource I/O bottleneck) - need up front verification and validation – things will change

Back-end bottleneck due to hierarchical storage issues (e.g., retrieval from HPSS)

Long term workflow state preservation - needed Recovery (transfers, other failures) – more needed Tracking data and files, provenances Who maintains equipment, storage, data, scripts, workflow

elements? Elegant solutions my not be good solutions from the perspective of autonomy.

EXTREMELY IMPORTANT!!! – We are trying to get out of the business of totally custom-made solutions.