
  • Shantenu Jha

    http://cct.lsu.edu/~sjha & http://saga.cct.lsu.edu/

    Developing Scientific Applications for Distributed Infrastructures

  • Understanding Distributed Applications. IDEAS: First-Principles Development Objectives

    Interoperability: Ability to work across multiple distributed resources

    Distributed Scale-Out: The ability to utilize multiple distributed resources concurrently

    Extensibility: Support new patterns/abstractions, different programming systems, functionality & Infrastructure

    Adaptivity: Respond to fluctuations in dynamic resources and in the availability of dynamic data

    Simplicity: Accommodate above distributed concerns at different levels easily

    Challenge: How to develop distributed applications (DAs) effectively and efficiently with the above as first-class objectives?

  • SAGA: In a nutshell

    There is a lack of programmatic approaches that provide general-purpose, basic and common grid functionality for applications, and thus hide the underlying complexity and varying semantics.

    These are the building blocks upon which to construct consistent higher levels of functionality and abstraction.

    Meets the needs of a broad spectrum of applications: simple scripts, gateways, smart applications, production-grade tooling, and workflows.

    Simple, integrated, stable, uniform and high-level interface. Simple and Stable: 80:20 restricted scope, and standard. Integrated: similar semantics & style across the interface. Uniform: same interface for different distributed systems.

    SAGA provides application* developers with the units required to compose high-level functionality across (distinct) distributed systems. (*) One person's application is another person's tool.

  • SAGA and Distributed Applications

  • Taxonomy of Distributed Application Development

    Example of Distributed Execution Mode: Implicitly Distributed

    HTC of HPC: 1,000 job submissions of NAMD on the TG/LONI; SAGA shell example (cf. DESHL)

    Example of Explicit Coordination and Distribution Explicitly Distributed

    DAG-based Workflows (example of Higher-level API)

    Examples of SAGA-based frameworks: Pilot-Jobs, fault-tolerant autonomic frameworks, MapReduce, All-Pairs

    Note: An application can be developed differently, and thus be in more than one category (e.g. DAG-based workflow)

  • Generalized Ensemble Methods: Replica and Replica-Exchange

    Sampling is the challenge: Long run vs multiple short runs?

    Task-level parallelism: embarrassingly distributable!

    Create replicas of the initial configuration.

    Spawn N replicas over different machines.

    RE: Run for time t; attempt a configuration swap; run for a further time t; repeat until finished.

    [Figure: replicas R1 ... RN run at a ladder of temperatures from 300K up to "hot"; exchange attempts are made every time interval t]
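The exchange step above can be sketched as follows. This is an illustrative Metropolis-style acceptance rule and driver loop (with k_B = 1, and with `run_md` and the replica structure as stand-ins), not the production NAMD/SAGA implementation:

```python
import math
import random

def attempt_swap(E_i, E_j, T_i, T_j):
    """Metropolis criterion for exchanging configurations between
    replicas at temperatures T_i, T_j with energies E_i, E_j (k_B = 1)."""
    delta = (1.0 / T_i - 1.0 / T_j) * (E_j - E_i)
    return delta <= 0 or random.random() < math.exp(-delta)

def replica_exchange(replicas, run_md, n_rounds):
    """replicas: list of dicts with 'T' (temperature) and 'E' (energy).
    run_md: advances one replica for time t and returns its new energy."""
    for _ in range(n_rounds):
        for r in replicas:                       # run each replica for time t
            r["E"] = run_md(r)
        for i in range(len(replicas) - 1):       # attempt neighbour swaps
            a, b = replicas[i], replicas[i + 1]
            if attempt_swap(a["E"], b["E"], a["T"], b["T"]):
                a["T"], b["T"] = b["T"], a["T"]  # exchange temperatures
    return replicas
```

In a distributed scale-out setting, the inner `run_md` calls are exactly the tasks spawned concurrently over different machines.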

  • Abstractions for Dynamic Execution (1) Container Task

    Adaptive: Type A: Fix number of replicas; vary cores assigned to each replica.

    Type B: Fix the size of replica; vary number of replicas (Cool Walking) -- Same temperature range (adaptive sampling) -- Greater temperature range (enhanced dynamics)

  • Abstractions for Dynamic Execution (2) SAGA Pilot-Job (BigJob)

  • Deployment & Scheduling of Multiple Infrastructure-Independent Pilot-Jobs

  • Distributed Adaptive Replica Exchange (DARE) Multiple Pilot-Jobs on the Distributed TeraGrid

    Ability to dynamically add HPC resources. On TG: each Pilot-Job uses 64 processors; each NAMD instance uses 16 processors.

    Time-to-completion improves; no loss of efficiency.

    Time-per-generation is measure of sampling

    Adaptive Replica-Exchange, Phil. Trans of Royal Society A (2009)

  • IDEAS: Facilitating Novel Execution Modes

    Interoperability and Scale-out enable new ways of resource planning and application execution

    Deadline-driven scheduling: e.g., Need Workload X done before time Y

    Adapt workload distribution and resource utilization to ensure completion

    Accepted for IEEE CCGrid10
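A deadline-driven planner of this kind can be sketched as below. The greedy strategy and the cost model (worst queue wait plus serialized batches of tasks) are illustrative assumptions, not the algorithm from the CCGrid10 paper:

```python
import math

def plan_resources(n_tasks, task_time, deadline, resources):
    """Greedy deadline-driven resource planner (an illustrative sketch).

    resources: list of (name, cores, est_queue_wait) tuples; times in
    the same unit as task_time and deadline. Adds resources, shortest
    estimated queue wait first, until the estimated makespan fits."""
    chosen, total_cores, worst_wait = [], 0, 0.0
    for name, cores, wait in sorted(resources, key=lambda r: r[2]):
        chosen.append(name)
        total_cores += cores
        worst_wait = max(worst_wait, wait)
        # rough makespan: wait for the slowest queue, then run the
        # workload in serialized batches across all acquired cores
        batches = math.ceil(n_tasks / total_cores)
        makespan = worst_wait + batches * task_time
        if makespan <= deadline:
            return chosen, makespan
    return None, None  # deadline not achievable with these resources
```

A real planner would refresh the queue-wait estimates (e.g. from BQP) and re-plan as conditions change.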

  • Characterizing Reservoirs: Permeability and Porosity

    Porosity: Measure of capacity (buckets)

    Permeability: Measure of flow (pipes)

  • Results: Scale-Out Performance

    Using more machines decreases the time-to-completion (TTC) and the variation between experiments

    Using BQP (batch-queue prediction) decreases the TTC & the variation between experiments further

    Lowest time to completion achieved when using BQP and all available resources

    Khamra & Jha, GMAC, ICAC'09; TeraGrid Performance Challenge Award 2008

  • SAGA-based Applications: Examples from a Data-Intensive Computational Biology Research Agenda

    Many More Questions than Answers!

    SAGA NxM Framework (All-Pairs): compute matrix elements, each of which is a task

    All-to-all sequence comparison. Control the distribution of tasks and data; data-locality optimization via an external (runtime) module.

    SAGA MapReduce Framework: Master-Worker: File-Based &/or Stream-Based

    SAGA-based Sphere (Stream based processing)

    SAGA-based DAG execution: extended to support dynamic decisions/placement, load balancing (LB) & scheduling

    Applications ordered from more to less regular: All-Pairs is very structured; DAG-based applications can be very irregular

  • DDIA: Some questions SAGA-based All-Pairs

    We want to understand: performance sensitivity to data decomposition, workload granularity, degree of distribution, and their interplay; novel relative compute-data placement

    Affinity-based, data-access patterns. Advantage of interoperability: which infrastructure to use for a specific problem? Examine sensitivity to placement techniques.

    Performance tradeoffs of a DFS compared to regular distribution. Why a DFS? It is an abstract layer between the application and local file systems. Examples include HDFS, GFS, and CloudStore (an open-source high-performance DFS based on Google's distributed file system, GFS).

    It is common to load a DFS as part of a VM image; multiple open-source DFSs are now available, and they are generally more reliable now.

  • SAGA-based All-Pairs: Performance Determined at Multiple Levels

    We use a SAGA-based All-Pairs abstraction as a representative example: it applies an operation to the input data set such that every possible pair in the set is input to the operation. Examples: genome comparison; image similarity (biometrics) [D. Thain].

    Initial data condition: data may be distributed across resources, possibly localized

    Work Decomposition: Granularity of work-load

    8x8 matrix (= 64 matrix elements): workload units of 4? 16? 64?

    Workload-to-worker mapping: for a fixed data-set size, this is equal to the number of workers

    Worker placement: all local? All distributed? I/O saturation? Compute-bound? Network effects?

    Dynamic & irregular data: the stage at which workload is bound to workers
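The decomposition questions above can be made concrete with a small sketch. Here `chunk` and `workers` stand in for the workload-unit size and worker count being varied; the names and the threading model are illustrative, not the SAGA framework:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def all_pairs(items, op, chunk=16, workers=4):
    """Illustrative All-Pairs sketch: apply op to every (i, j) pair,
    grouping pairs into workload units of `chunk` tasks each.
    Returns the full result matrix."""
    pairs = list(product(range(len(items)), repeat=2))
    units = [pairs[k:k + chunk] for k in range(0, len(pairs), chunk)]

    def run_unit(unit):
        # one workload unit, executed by one worker
        return [(i, j, op(items[i], items[j])) for i, j in unit]

    matrix = [[None] * len(items) for _ in items]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for results in pool.map(run_unit, units):
            for i, j, v in results:
                matrix[i][j] = v
    return matrix
```

Varying `chunk` against `workers` is exactly the granularity/mapping trade-off asked about above: too-small units inflate coordination cost, too-large units under-use workers.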

  • Distributed Data Base Line tests

    Time-to-completion curves downward as the number of workers (Nw) goes up.

    Adding workers eventually becomes ineffective: coordination costs dominate.

    Accessing remote data is expensive.
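The trend can be captured by a toy cost model (illustrative constants, not measured data): parallelized compute shrinks as W/n while per-worker coordination cost grows with n, so TTC has a minimum in the number of workers.

```python
def time_to_completion(n_workers, total_work, coord_cost):
    """Toy model: TTC(n) = W/n + c*n. The W/n term is the parallelized
    compute; the c*n term is the per-worker coordination overhead that
    eventually dominates as workers are added."""
    return total_work / n_workers + coord_cost * n_workers
```

With W = 100 and c = 1 the minimum falls at n = 10 workers; beyond that, adding workers makes TTC worse, matching the observed curve.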

  • Distributing Workload (Intelligent)

    Configurations: Normal & Intelligent

    Very simple heuristic: assign tasks to the worker with the lowest transfer time

    The overhead of implementing this intelligence is negligible: ~1% of the time

    Scales similarly for different file sizes; scales out to > 3 resources
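That heuristic fits in a few lines. Here `transfer_time` is a stand-in for the simple network measures (ping, throughput) mentioned later in the deck, and all names are illustrative:

```python
def assign_tasks(tasks, workers, transfer_time):
    """Greedy placement sketch of the 'very simple heuristic': each
    task goes to the worker with the lowest estimated transfer time
    for that task's data.

    transfer_time(task, worker) -> estimated seconds to move the
    task's input data to that worker (a hypothetical probe)."""
    assignment = {}
    for task in tasks:
        assignment[task] = min(workers, key=lambda w: transfer_time(task, w))
    return assignment
```

Because the probe is cheap relative to the data movement it avoids, the decision overhead stays around the ~1% figure quoted above.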

  • DFS or Simple Intelligence? Scale-out Performance

  • SAGA-MapReduce (GSOC08 Miceli, Jha et al CCGrid09; Merzky, Jha GPC09)

    Interoperability: use multiple infrastructures concurrently; control the placement of workers (Nw)

    Dynamic resource allocation: the Map phase differs from the Reduce phase

    Distribution of data

    Ts: Time-to-solution, including data-staging for SAGA-MapReduce (simple file-based mechanism)
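The phase structure can be sketched as follows. This sequential toy shows only the master-worker decomposition and the map/reduce split; data staging, SAGA job submission, and the file- vs stream-based transport are omitted:

```python
from collections import defaultdict

def map_reduce(records, map_fn, reduce_fn, n_workers=4):
    """Minimal MapReduce sketch. The master partitions records across
    n_workers map partitions; the reduce phase may use a different
    number of workers than the map phase (here it runs per key)."""
    # Map phase: master partitions records across workers
    partitions = [records[i::n_workers] for i in range(n_workers)]
    intermediate = defaultdict(list)
    for part in partitions:          # each partition would be one worker
        for rec in part:
            for key, value in map_fn(rec):
                intermediate[key].append(value)
    # Reduce phase: combine all values collected for each key
    return {k: reduce_fn(k, vs) for k, vs in intermediate.items()}
```

In the file-based variant, `intermediate` would be staged to files between the phases; that staging is what the Ts measurement above includes.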

  • Digedag: SAGA Workflow Package

    Digedag is a prototype implementation of an experimental SAGA-based workflow package, with:

    An API for programmatically expressing workflows

    A parser for (abstract or concrete) workflow descriptions

    An (in-time) workflow planner

    A workflow enactor (using the SAGA engine)

    An integrated API allows node and data dependencies to be specified directly, and removes the need to manually (explicitly) build DAGs.

    Can accept mDAG output, or Pegasus output

    Move back and forth between A & C-DAG;

    Facilitates dynamic execution of DAGs
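A workflow API of this general shape can be illustrated with a toy DAG class. This is a hypothetical API for illustration, not Digedag's actual interface:

```python
class DAG:
    """Toy workflow DAG: declare tasks and their dependencies, then
    enact them in dependency order."""

    def __init__(self):
        self.tasks, self.deps = {}, {}

    def add_task(self, name, fn, after=()):
        """Register callable fn under name, depending on tasks in after."""
        self.tasks[name] = fn
        self.deps[name] = list(after)

    def run(self):
        """Enact tasks whose dependencies are satisfied; return the order."""
        done, order = set(), []
        while len(done) < len(self.tasks):
            ready = [t for t in self.tasks
                     if t not in done and all(d in done for d in self.deps[t])]
            if not ready:
                raise ValueError("cycle in workflow")
            for t in ready:   # a real enactor would dispatch these concurrently
                self.tasks[t]()
                done.add(t)
                order.append(t)
        return order
```

Because the DAG is held as data rather than baked into code, a planner can rewrite it between rounds, which is what makes dynamic execution of DAGs possible.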

  • Application Development Phase

    Generation & Exec. Planning Phase

    Execution Phase

    DAG-based Workflow Applications: Extensibility and a Higher-level API

  • SAGA-based DAG Execution Preserving Performance

  • Dist. Data-Intensive Applications (DDIA) Case for Frameworks IDEAS

    Data is inherently distributed. Distributed DIA is not just the simple sum of DIA concerns.

    Multiple, heterogeneous infrastructures: decouple application development from the underlying infrastructure; interoperation, e.g., concurrently across Grids and Clouds

    Support runtime or application characteristics for multiple applications and different infrastructures

    Support multiple programming models: master-worker, but irregular

    Support application-level patterns: MapReduce, file- vs stream-based

    Support distributed affinities

  • Intelligent Compute-Data Placement

    Objective: Intelligence in Compute-Data placement

    Strategies: assignment of workers (statically) determined by the lowest Ttransfer

    Simple network measures (ping, throughput, etc.) for Ttransfer; Data pre-s