Adaptive Computing on the Grid Using AppLeS
Francine Berman, Richard Wolski, Henri Casanova, Walfredo Cirne, Holly Dail, Marcio Faerman, Silvia Figueira, Jim Hayes, Graziano Obertelli, Jennifer Schopf, Gary Shao, Shava Smallen, Neil Spring, Alan Su, and Dmitrii Zagorodnov
IEEE Transactions on Parallel and Distributed Systems, Vol. 14, No. 5, May 2003
Agenda
• Introduction
• Problems
• AppLeS and its components
• Resulting products
• Related work
• Discussion
• Conclusions
Introduction
• What is a Grid?
  – A collection of resources that can be used as an ensemble
• What are resources?
  – Computational devices, networks, online instruments, storage archives, etc.
Problems
• Heterogeneity
  – Resources differ in performance
• Inconsistency
  – Resources are shared, may fail, and may be upgraded
AppLeS Project
• Application Level Scheduling
• Goals
  – Investigate adaptive scheduling for Grid computing
  – Apply research results to applications, validating the efficacy of the approach and extracting Grid performance for the end user
Steps
(1) Resource Discovery
(2) Resource Selection
(3) Schedule Generation
(4) Schedule Selection
(5) Application Execution
(6) Schedule Adaptation
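The six steps above form a feedback loop: the schedule is re-derived as resource conditions change. A minimal sketch of that loop, where every step function is a hypothetical placeholder (a real AppLeS agent would query Grid information services such as NWS at these points):

```python
# Hypothetical placeholders for the six AppLeS steps; all names and
# numbers are invented for illustration only.
def discover_resources():               # (1) everything the user can reach
    return ["host%d" % i for i in range(8)]

def select_resources(resources):        # (2) keep promising candidates
    return resources[:4]

def generate_schedules(resources):      # (3) build candidate schedules
    return [{"hosts": resources, "chunk": c} for c in (1, 4, 16)]

def predicted_time(schedule):           # toy model: larger chunks, less overhead
    return 100.0 / len(schedule["hosts"]) + 1.0 / schedule["chunk"]

def select_schedule(schedules):         # (4) pick the best under the model
    return min(schedules, key=predicted_time)

def execute(schedule, work_left):       # (5) run one batch of the application
    return max(0, work_left - 25)

work_left = 100
while work_left > 0:                    # (6) adaptation: re-plan between batches
    candidates = select_resources(discover_resources())
    schedule = select_schedule(generate_schedules(candidates))
    work_left = execute(schedule, work_left)
```

The point of the loop structure is that steps (1)-(4) are cheap relative to (5), so re-running them between batches lets the schedule track changing load.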
Resource Discovery
• Depends on the Grid
  – A list of the user's logins
  – The resource discovery services of each Grid
Resource Selection
• Simple SARA
  – Synthetic Aperture Radar Atlas
  – Developed by JPL and SDSC
  – Provides access to satellite images distributed across various repositories
  – End-to-end available bandwidth is predicted using NWS (Network Weather Service)
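Resource selection for Simple SARA reduces to picking the replica with the lowest predicted transfer time. A sketch, with made-up bandwidth figures standing in for real NWS forecasts (the repository names are invented too):

```python
# Hypothetical NWS-style end-to-end bandwidth forecasts (Mbit/s) from
# each candidate repository to the user's site.
predicted_bandwidth = {"repoA": 12.5, "repoB": 88.0, "repoC": 45.2}

image_size_mbit = 800.0   # size of the requested satellite image

def pick_repository(forecasts, size):
    """Choose the replica with the lowest predicted transfer time."""
    best = min(forecasts, key=lambda repo: size / forecasts[repo])
    return best, size / forecasts[best]

repo, seconds = pick_repository(predicted_bandwidth, image_size_mbit)
# repoB wins here: for a fixed image size, the highest-bandwidth
# replica gives the shortest predicted transfer
```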
Performance Modeling
• Jacobi 2D
• Main loop
  – Loop until convergence
  – For all matrix entries A(i,j):
    • A(i,j) = ¼ (A(i+1,j) + A(i-1,j) + A(i,j+1) + A(i,j-1))
  – Compute the local error
• Model
  – T_i = Area_i * Oper_i * AvailCPU_i + C_i, for 1 <= i <= p
[Figure: five-point stencil showing entry (i,j) and its neighbors (i-1,j), (i+1,j), (i,j-1), (i,j+1)]
Area – the size of the strip; Oper – the execution time to compute one entry; AvailCPU – the percentage of available CPU; C – the communication time
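The model can be evaluated directly to compare candidate strip assignments. The sketch below takes the slide's formula verbatim; all resource numbers are invented for illustration:

```python
def strip_time(area, oper, avail_cpu, comm):
    # T_i = Area_i * Oper_i * AvailCPU_i + C_i, exactly as on the slide
    return area * oper * avail_cpu + comm

# Three processors: (entries assigned, time per entry,
#                    CPU availability, communication cost)
strips = [
    (4000, 1.0, 0.9, 50.0),
    (2500, 1.5, 0.8, 50.0),
    (1500, 2.0, 0.7, 80.0),
]

times = [strip_time(*s) for s in strips]
iteration_time = max(times)   # the slowest strip bounds every Jacobi iteration
```

A scheduler would search for Area_i values that make the T_i as equal as possible, since the maximum T_i determines the time of each iteration.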
Schedule Generation
• Complib
  – A computational biology application
  – Compares a library of unknown sequences against a database of “known” sequences using the FASTA scoring method
• Parallelization
  – Master/Worker
  – Work unit size
    • Small unit size (self-scheduling): high overhead
    • Large unit size: load imbalance
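The unit-size tradeoff shows up even in a toy master/worker simulation; the dispatch overhead, worker speeds, and work amounts below are all invented for illustration:

```python
# Toy master/worker simulation of the work-unit-size tradeoff: every
# dispatched unit pays a fixed dispatch overhead, so tiny units inflate
# total overhead while huge units strand work on slow machines.
def makespan(total_work, unit_size, worker_speeds, overhead=10.0):
    n_units = total_work // unit_size
    finish = [0.0] * len(worker_speeds)      # per-worker busy-until time
    for _ in range(n_units):                 # self-scheduling: next unit goes
        w = finish.index(min(finish))        # to the earliest-free worker
        finish[w] += overhead + unit_size / worker_speeds[w]
    return max(finish)

speeds = [1.0, 1.0, 0.25]                    # two fast workers, one slow
small = makespan(1200, 10, speeds)           # 120 units: overhead dominates
medium = makespan(1200, 100, speeds)
large = makespan(1200, 400, speeds)          # 3 units: slow worker drags
```

With these numbers a medium unit size beats both extremes: small units pay the dispatch overhead 120 times, while with only three huge units the slow worker holds the whole run up.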
AppLeS’s ApprochAppLeS’s Approch
Schedule Adaptation
• MCell
  – A computational neuroscience application
  – Studies biochemical interactions within living cells at the molecular level
  – Multiple independent tasks
  – Shared input files
XSufferage
• Based on Sufferage
• Sufferage value = second-best completion time − best completion time
• XSufferage also accounts for data replication time (zero when the input is already available locally)
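A compact sketch of the heuristic: plain Sufferage, with the XSufferage-style addition of input-transfer time to each completion-time estimate (zero where the input is already replicated). The tasks, hosts, and all times below are invented, and the real XSufferage reasons at cluster granularity rather than per host:

```python
# Completion-time estimate = host ready time + input transfer + compute,
# where the transfer term is zero if the host already holds the input.
def sufferage_schedule(tasks, hosts, compute, transfer):
    ready = {h: 0.0 for h in hosts}          # host available-from times
    assignments = []
    pending = set(tasks)
    while pending:
        best_task, best_host, best_suff = None, None, -1.0
        for t in pending:
            cts = sorted((ready[h] + transfer[t][h] + compute[t][h], h)
                         for h in hosts)
            suff = cts[1][0] - cts[0][0]     # second best minus best
            if suff > best_suff:             # schedule the task that would
                best_task, best_host, best_suff = t, cts[0][1], suff
        ready[best_host] += (transfer[best_task][best_host]
                             + compute[best_task][best_host])
        assignments.append((best_task, best_host))
        pending.remove(best_task)
    return assignments

tasks, hosts = ["t1", "t2"], ["h1", "h2"]
compute = {"t1": {"h1": 10, "h2": 10}, "t2": {"h1": 10, "h2": 10}}
transfer = {"t1": {"h1": 0, "h2": 30},   # t1's input replicated only at h1
            "t2": {"h1": 5, "h2": 5}}    # t2's input everywhere
plan = sufferage_schedule(tasks, hosts, compute, transfer)
```

Task t1 "suffers" most if it loses h1 (its data is already there), so it is placed first; the indifferent t2 then takes the remaining host.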
Outcome
• APST - AppLeS Parameter Sweep Template
• AMWAT - AppLeS Master/Worker Application Template
• SA - Supercomputer AppLeS
APST
• Parameter Sweep Applications
  – Mostly independent tasks
• Provides
  – Transparent deployment
  – Automatic scheduling
• Capabilities
  – Launching tasks
  – Moving and storing data
  – Discovering and monitoring resources
AMWAT
• Master/Worker
• Provides
  – APIs for
    • Discovering
    • Scheduling
    • Predicting
• Scheduling strategies:
  – SS – Self-Scheduling
  – FSC – Fixed-Size Chunking
  – GSS – Guided Self-Scheduling
  – TSS – Trapezoidal Self-Scheduling
  – FAC2 – Factoring
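These strategies differ only in how the next chunk size is derived from the remaining iterations. A sketch of three of them, using their standard textbook formulas rather than AMWAT's actual code:

```python
from math import ceil

def fsc(n, k):                    # Fixed-Size Chunking: constant chunks of k
    while n > 0:
        c = min(k, n); n -= c; yield c

def gss(n, p):                    # Guided Self-Scheduling: ceil(remaining / P)
    while n > 0:
        c = min(ceil(n / p), n); n -= c; yield c

def fac2(n, p):                   # Factoring: batches of P chunks, each sized
    while n > 0:                  # ceil(remaining / 2P) at the batch start
        c = max(1, ceil(n / (2 * p)))
        for _ in range(p):
            if n == 0:
                break
            take = min(c, n); n -= take; yield take

print(list(gss(100, 4)))   # chunks shrink: 25, 19, 14, 11, ...
```

GSS and FAC2 hand out large chunks first to amortize dispatch overhead, then progressively smaller ones so the last chunks can even out load imbalance.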
SA
• Space-shared supercomputers
• Moldable jobs
• Reduces response times
Related Work
• Environment
  – MARS and Dome – run-time checkpointing environments
• Structure
  – MARS – SPMD
  – VDCE and SEA – task graph
  – IOS – real-time, fine-grained task graph
  – Dome and SPP – abstract language
    • Dome – SPMD
    • SPP – task graph
• Performance model
  – Depends on program structure
• Objective
  – Minimize execution time
Related Work (cont.)

System | Env    | Struct | Perf       | Approach
AppLeS | Any    | Any    | Provided   | Adaptive
MARS   | ChkPnt | SPMD   | Statistics | Data Dist
Dome   | ChkPnt | SPMD   | Data Dist  | Data Dist
VDCE   |        | TG     | Derived    | List Sched
SPP    |        | TG     | Derived    |
SEA    |        | TG     | Data Flow  | Expert Sys
IOS    |        | TG     | Derived    | GA
GrADS  |        |        |            |
Discussion
• The performance of distributed applications depends on both application-specific and platform-specific information
• Storage and services are usually separated
• Communication must be accounted for in the model
• Multi-application environments have not yet been addressed
Conclusions
• AppLeS
  – An application-level scheduling framework
  – Provides adaptive, flexible, and reusable components
  – Is being integrated into GrADS for building next-generation Grid applications
• Each component has demonstrated its performance improvement