TRANSCRIPT
SDM SPA/Utah AHM/Mar05– NC State 1
On Large Data-Flow Scientific Workflows: An Astrophysics Case Study
Integration of Heterogeneous Datasets Using Scientific Workflow Engineering
Presenter: Mladen A. Vouk
SDM SPA/Utah AHM/Mar05– NC State 2
Team (Scientific Process Automation - SPA)
Sangeeta Bhagwanani (MS student - GUI interfaces)
John Blondin (NCSU Faculty, TSI PI)
Zhengang Cheng (PhD student - services, V&V)
Dan Colonnese (MS student, graduated - workflow grid and reliability issues)
Ruben Lobo (PhD student - packaging)
Pierre Moualem (MS student - fault-tolerance)
Jason Kekas (PhD student - Technical Support)
Phoemphun Oothongsap (NCSU Postdoc - high-throughput flows)
Elliot Peele (NCSU - Technical Support)
Mladen A. Vouk (NCSU faculty - SPA PI)
Brent Marinello (NCSU - workflow extensions)
Others …
SDM SPA/Utah AHM/Mar05– NC State 3
NC State researchers are simulating the death of a massive star leading to a supernova explosion. Of particular interest is the dynamics of the shock wave generated by the initial implosion of the star which ultimately destroys the star as a highly energetic supernova.
SDM SPA/Utah AHM/Mar05– NC State 4
Key Current Task
Emulating “live” workflows
SDM SPA/Utah AHM/Mar05– NC State 5
Key Issue
It is very important to distinguish between a custom-made workflow solution and a more canonical set of operations, methods, and solutions that can be composed into a scientific workflow.
The trade-offs are complexity, the skill level needed to implement, usability, maintainability, and “standardization”: e.g., sort, uniq, grep, ftp, and ssh on Unix boxes vs. SAS (which can do sorting), a home-made sort, SABUL, bbcp (free, but not standard), etc.
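As a minimal, purely illustrative sketch of what composing canonical operations means in practice (the input file and the remote host below are hypothetical), a one-off analyze-and-ship step can be built entirely from standard Unix pieces:

    #!/bin/sh
    # Illustrative only: input file and remote host are hypothetical.
    # Every operation here is a standard, composable Unix tool.
    grep "shock" run01.log | sort | uniq -c | sort -rn > shock_summary.txt
    scp shock_summary.txt analyst@remote.example.edu:/data/summaries/

A site-specific custom tool could do the same job, but it would be harder for the next group to reuse, verify, or maintain.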
SDM SPA/Utah AHM/Mar05– NC State 6
Topic – Computational Astrophysics
Dr. Blondin is carrying out research in the field of circumstellar gas dynamics. The numerical hydrodynamical code VH-1 is used on supercomputers to study a vast array of objects observed by astronomers both from ground-based observatories and from orbiting satellites. The two primary subjects under investigation are interacting binary stars (including normal stars like the Algol binary and compact-object systems like the high-mass X-ray binary SMC X-1) and supernova remnants (from very young, like SNR 1987A, to older remnants like the Cygnus Loop).
Other astrophysical processes of current interest include radiatively driven winds from hot stars, the interaction of stellar winds with the interstellar medium, the stability of radiative shockwaves, the propagation of jets from young stellar objects, and the formation of globular clusters.
SDM SPA/Utah AHM/Mar05– NC State 7
[Diagram: end-to-end data flow]
Input Data -> Highly Parallel Compute -> Output (~500 x 500 files) -> Aggregate to ~500 files (< 10+ GB each) -> HPSS archive / Data Depot / Logistical Network (L-Bone) -> Local Mass Storage (14+ TB) -> Aggregate to one file (~1 TB each) -> Viz Wall / Viz Client (local 44-processor data cluster; data sits on local nodes for weeks) -> Viz Software
SDM SPA/Utah AHM/Mar05– NC State 8
Workflow - Abstraction
[Diagram: workflow abstraction]
Compute side: Model (parallel computation) -> Merge & Backup -> Send Data, through Head Node Services and Mass Storage (Fibre Channel or local NFS)
Data Mover Channel (e.g., LORS, bbcp, SABUL, FC over SONET)
Visualization side: Recv Data -> Split & Viz -> Parallel Visualization -> to Viz Wall, through Head Node Services
Top-level steps: Model, Merge, Backup, Move, Split, Viz
Control (Web or Client GUI / Web Services): Construct, Orchestrate, Monitor/Steer, Change, Stop/Start
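A minimal head-node sketch of the Model -> Merge & Backup -> Move -> Split & Viz chain named in the diagram (the helper script names, hosts, and paths are assumptions for illustration, not the actual SPA implementation; only bsub, bbcp, and ssh are real commands):

    #!/bin/sh
    # Illustrative sketch only: run_vh1.sh, merge_step.sh, backup_step.sh,
    # split_for_viz.sh, start_viz.sh, and viz.example.edu are hypothetical.
    set -e
    RUN=run042
    STEP=$1                                    # time-slice index from the orchestrator

    # Model: run the parallel computation through LSF and wait for it to finish (-K).
    bsub -K -n 140 -J "vh1_${RUN}_${STEP}" ./run_vh1.sh "$RUN" "$STEP"

    # Merge & Backup: combine the per-processor files into one netCDF file, then archive it.
    ./merge_step.sh "$RUN" "$STEP"             # produces ${RUN}_${STEP}.nc
    ./backup_step.sh "${RUN}_${STEP}.nc"       # e.g., copy to mass storage / HPSS

    # Move: ship the merged file to the visualization site over the data mover channel.
    bbcp -s 8 "${RUN}_${STEP}.nc" viz.example.edu:/data/incoming/

    # Split & Viz: split the file for the parallel viz cluster and start visualization.
    ssh viz.example.edu "./split_for_viz.sh ${RUN}_${STEP}.nc && ./start_viz.sh ${RUN}_${STEP}"

In the abstraction above, the Construct/Orchestrate, Monitor/Steer, and Stop/Start controls would wrap around a script like this through the web-service layer rather than live inside it.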
SDM SPA/Utah AHM/Mar05– NC State 12
Current and Future Bottlenecks
[Chart: elapsed hours (0-350) for 10, 100, and 500 time slices, serial vs. parallel, broken down into Job Wait, Run, Merge, MT, Transfer, Viz split, and Viz run]
Computing Resources and Computational Speed (1000+ Cray X1 processors, compute times of 30+ hrs, wait time)
Storage and Disks (14+ TB, reliable and sustainable transfer speeds of 300+ MB/s)
Automation
Reliable and Sustainable Network Transfer Rates (300+ MB/s)
SDM SPA/Utah AHM/Mar05– NC State 13
Bottlenecks (B-specific): Supercomputer, Storage, HPSS, EnSight Memory
Average per-job wait time is 24-48 hrs (it could be longer if more processors are requested or more time slices are calculated).
One run of about 6 hrs (run time) on the Cray X1 currently uses 140 processors and produces 10 time steps. Each time step consists of 140 Fortran binary files (28 GB total), so one 6-hr run currently produces about 280 GB. A full visualization takes about 300 to 500 time slices (30 to 50 runs), i.e., about 28 GB x (300 to 500) = roughly 10 to 14 TB of space.
The 140 files of a time step are merged into one netCDF file (this takes about 10 min). BBCP moves the file to NCSU at about 30 MB/s, or about 15 min per time slice (this can be done in parallel with the next time-slice computation). In the future, network transfer speeds and disk access speeds may become an issue.
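A back-of-the-envelope check of those numbers, using nothing beyond the figures quoted on this slide:

    #!/bin/sh
    # Sanity-check the sizes and times quoted above.
    awk 'BEGIN {
      step_gb  = 28                        # one time step: 140 files, 28 GB total
      run_gb   = 10 * step_gb              # 10 time steps per 6-hr run  -> 280 GB
      full_tb  = 500 * step_gb / 1000      # ~500 slices for a full viz  -> ~14 TB
      xfer_min = step_gb * 1000 / 30 / 60  # 28 GB at ~30 MB/s           -> ~16 min
      printf "run: %d GB, full viz: about %.0f TB, transfer: about %.0f min/slice\n", run_gb, full_tb, xfer_min
    }'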
SDM SPA/Utah AHM/Mar05– NC State 14
B-specific Top-Level W/F Operations
Operators: Create W/F (reserve resources), Run Model, Backup Output, PostProcess Output (e.g., Merge, Split), MoveData, AnalyzeData (Viz, other?), Monitor Progress (state, audit, backtrack, errors, provenance), Modify Parameters
States: Modeling, Backup, Postprocessing (A, ..., Z), MovingData, Analyzing Remotely
Creators: CreateWF, Model?, Expand
Modifiers: Merge, Split, Move, Backup, Start, Stop, ModifyParameters
Behaviors: Monitor, Audit, Visualize, Error/Exception Handling, Data Provenance, …
SDM SPA/Utah AHM/Mar05– NC State 15
Goal: Ubiquitous Canonical Operations for Scientific W/F
Support:
Fast data transfer from A to B (e.g., LORS, SABUL, GridFTP, BBCP?, other …)
Database access
Stream merging and splitting
Flow monitoring
Tracking, auditing, provenance
Verification and Validation
Communication service (web services, grid services, xmlrpc, etc.)
Other …
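One way such canonical operations could eventually be packaged is as a small library of interchangeable wrappers (a hedged sketch; the wrapper names and the MOVER switch are hypothetical, and only bbcp, scp, and date are real commands):

    #!/bin/sh
    # Hypothetical 'canonical operation' wrappers; the point is that the
    # underlying tool is swappable without changing the workflow that calls them.

    move_data() {   # move_data <source> <host:destination>
        case "${MOVER:-bbcp}" in
            bbcp) bbcp -s 8 "$1" "$2" ;;   # multi-stream transfer
            *)    scp "$1" "$2" ;;         # plain ssh copy as a fallback
        esac
        # LORS, SABUL, or GridFTP endpoints could slot in here the same way.
    }

    log_provenance() {  # log_provenance <operation> <status>
        echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) $1 $2" >> workflow_audit.log
    }

    move_data merged_step.nc viz.example.edu:/data/incoming/ && log_provenance move_data ok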
SDM SPA/Utah AHM/Mar05– NC State 16
Issues (1)
Communication Coupling (loose, tight, very tight, code-level) and Granularity (fine, medium?, coarse)
Communication Methods (e.g., ssh tunnels, xmlrpc, snmp, web/grid services, etc.) - e.g., apparently poor support for the Cray
Storage issues (e.g., p-netcdf support, bandwidth)
Direct and Indirect Data Flows (functionality, throughput, delays, other QoS parameters)
End-to-end performance
Level of abstraction
Workflow description language(s) and exchange issues - interoperability
“Standard” scientific computing “W/F functions”
SDM SPA/Utah AHM/Mar05– NC State 17
Issues (2)
The problem is currently similar to old-time punched-card job submissions (long turn-around time, can be expensive due to a front-end computational-resource I/O bottleneck) - up-front verification and validation is needed; things will change
Back-end bottleneck due to hierarchical storage issues (e.g., retrieval from HPSS)
Long-term workflow state preservation - needed
Recovery (transfers, other failures) - more is needed
Tracking of data and files - needed
Who maintains equipment, storage, data, scripts, and workflow elements? Elegant solutions may not be good solutions from the perspective of autonomy.
EXTREMELY IMPORTANT!!! We are trying to get out of the business of totally custom-made solutions.
SDM SPA/Utah AHM/Mar05– NC State 18
Workflow - Abstraction
[Diagram: the same workflow abstraction as above (Model -> Merge & Backup -> Data Mover Channel (e.g., LORS, SABUL, FC over SONET) -> Split & Viz -> Parallel Visualization, with the Web Services control layer), now annotated with end-to-end performance goals.]
Goal: 2-3 Gbps transfer rates end-to-end
Goal: 1 TB per night
SDM SPA/Utah AHM/Mar05– NC State 19
Communications
Web/Java-based GUI
Web Services for orchestration - overall and for less-than-tightly-coupled sub-workflows
LSF and MPI for parallel computation
Scripts (in this example csh/sh based; could be Perl, Python, etc.) on local machines - interpreted language
High-level programming language for simulations, complex data-movement algorithms, and similar - compiled language
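As a small, hedged example of the LSF/MPI layer (queue name, processor count, input deck, and executable name are assumptions; on the actual Cray X1 the parallel launch command would differ):

    #!/bin/sh
    # Hypothetical LSF submission of the MPI simulation from the script layer.
    bsub -q batch -n 140 -i input.deck -o vh1.%J.out -e vh1.%J.err mpirun -np 140 ./vh1

The web-service orchestration sits above a submission like this, while the compiled simulation code (VH-1) and any complex data-movement code sit below it.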
SDM SPA/Utah AHM/Mar05– NC State 21