
“Grid-enabling” applications

Copyright B. Wilkinson, 2008. This material is the property of Professor Barry Wilkinson (UNC-Charlotte) and is for the sole and exclusive use of the students enrolled in the Fall 2008 Grid computing course broadcast on the North Carolina Research and Education Network (NCREN) to universities across North Carolina. Oct 29, 2008


Grid-enabling an application

A poorly defined and poorly understood term.

In my opinion, it does NOT mean simply executing a job on a Grid platform!

Almost all computer batch programs can be shipped to a remote Grid site and executed with little more than was possible with a remote ssh connection.

Grid-enabling should include utilizing the unique distributed nature of the Grid platform.


Grid-enabling an application

With that in mind, a simple definition is:

Being able to execute an application on a Grid platform, using the distributed resources available on that platform.

However, even that simple definition is not agreed upon by everyone!


A broad definition that matches our view of Grid enabling applications is:

“Grid Enabling refers to the adaptation or development of a program to provide the capability of interfacing with a grid middleware in order to schedule and utilize resources from a dynamic and distributed pool of ‘grid resources’ in a manner that effectively meets the program’s needs” [2]

[2] Nolan, K., “Approaching the Challenge of Grid-Enabling Applications,” Open Source Grid & Cluster Conf., Oakland, CA, 2008.


How does one do “Grid-enabling”?

• Still an open question and in the research domain, without a standard approach.

Here we will describe various approaches.


We can divide the use of the computing resources in a Grid into two types:

• Using multiple computers separately to solve multiple problems

• Using multiple computers collectively to solve a single problem


Using Multiple Computers Separately

Parameter Sweep Applications

In some domain areas, scientists need to run the same program many times but with different input data.

“Sweep” across the parameter space with different input parameter values in search of a solution.

In many cases there is no readily computed answer, and human intervention is involved in exploring the search or design space.


Parameter Sweep Applications

Examples

• A scientist might wish to search for a new drug and needs to try different formulations that might best fit with a particular protein.

• A design engineer might be studying the effects of different aerodynamic designs on the performance of an aircraft.

• Sometimes it is an aesthetic design process with many possible alternative designs, and a human has to choose.

• Sometimes it is a learning process: the design engineer wishes to understand the effects of changing various parameters.


Parameters in Parameter Sweep

Typically, many parameters can be altered.

There might be a vast number of combinations of parameter values.

Ideally, some automated way of doing a parameter sweep is needed that includes both specifying the parameter sweep and a way of scheduling the individual sweeps across the Grid platform.
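
As a rough illustration of what "specifying a parameter sweep" involves, the sketch below enumerates every combination of parameter values. The parameter names (`angle`, `speed`) and their values are made up for this example, and Python is used only for brevity; it is not tied to any Grid middleware.

```python
from itertools import product

def enumerate_sweep(parameters):
    """Yield one dict per combination of parameter values.

    `parameters` maps a parameter name to the list of values to try.
    """
    names = list(parameters)
    for combo in product(*(parameters[name] for name in names)):
        yield dict(zip(names, combo))

# Hypothetical design parameters: 3 angles x 2 speeds = 6 sweeps.
sweeps = list(enumerate_sweep({"angle": [0, 5, 10], "speed": [100, 200]}))
for number, settings in enumerate(sweeps):
    print(number, settings)
```

Each dictionary would correspond to one job. The number of jobs is the product of the list lengths, which is why the combinations can become vast and why scheduling them matters.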


Implementing Parameter Sweep

Can be simply achieved by submitting multiple job description files, one for each set of parameters, but that is not very efficient.

Parameter sweep applications are so important that research projects have been devoted to making them efficient on a Grid.

Parameter sweep appears explicitly in job description languages.


RSL-2/JDD Example

<count> 5 </count>

causes five instances of the job to be submitted.

This simply causes five identical executables to be submitted.

Four of them would be pointless unless either:

• the code selects different actions for each instance, or

• different input and output files are selected for each instance in the job description file.

Job description elements usually can be specified to change for each instance.
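
One way code could select its own actions is for each instance to take a slice of a shared task list based on its instance number. How an instance learns its number depends on the middleware and is not covered here, so in this sketch `instance` and `count` are simply passed in as arguments, and the file names are hypothetical.

```python
def tasks_for_instance(tasks, instance, count):
    """Return the tasks that instance number `instance` (0-based) of `count` should do."""
    if not 0 <= instance < count:
        raise ValueError("instance must be in range(count)")
    # Instance k takes tasks k, k + count, k + 2*count, ...
    return tasks[instance::count]

# Hypothetical input files shared among the five instances from <count> 5 </count>.
all_tasks = ["input%d.dat" % n for n in range(12)]
assignment = [tasks_for_instance(all_tasks, k, 5) for k in range(5)]
```

Every task is assigned to exactly one instance, so the five identical executables end up doing different work.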


JSDL (version 1)

Originally did not have parameter sweep.

Has been (unofficially) extended to incorporate features for parameter sweep.

Two forms of parameter sweep creation identified:

• Enumeration in a list, and

• Numerically related arguments.


Arguments Enumerated in a List

Two additional elements:

• <Parameter> To specify selection of parameters

• <Value> To list the values

contained within an <Assignment> element for each assignment.

Multiple/nested assignments for various scenarios:

• Single substitution, or

• Multiple simultaneous substitutions in different combinations.


Parameter sweep element selection and substitution


Selecting XML Element

Expression needed that selects an XML element.

XPath expression: provides a way to select an XML element in an XML document.


XPath

Suppose XML document has form:

<a>
  <b>
    <c> </c>
  </b>
</a>

XPath expression to identify element:

<c> ... </c>

would be /a/b/c


XPath allows for much more expressive forms.

For example, suppose multiple tags called <c>:

<a>
  <b>
    <c> </c>
    ...
    <c> </c>
  </b>
</a>

Expression to select 3rd <c> element is /a/b/c[3]
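
The selection can be tried out with Python's standard `xml.etree.ElementTree` module, which supports a limited subset of XPath. In this sketch the element text (`first`, `second`, `third`) is made up, and because ElementTree evaluates paths relative to the root element `<a>`, the expression `/a/b/c[3]` is written as `./b/c[3]`.

```python
import xml.etree.ElementTree as ET

# Made-up document with three <c> elements inside /a/b.
DOC = "<a><b><c>first</c><c>second</c><c>third</c></b></a>"

def select_text(xml_text, path):
    """Return the text of every element matched by `path`, relative to the root."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.findall(path)]

all_c = select_text(DOC, "./b/c")        # every <c> under /a/b
third_c = select_text(DOC, "./b/c[3]")   # XPath positions count from 1
print(all_c, third_c)
```

Note that XPath positions start at 1, not 0, so `c[3]` is the third `<c>` element.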


To take an example for parameter sweep, consider JSDL job:

<jsdl:JobDefinition>
  <jsdl:JobDescription>
    <jsdl:Application>
      <jsdl-posix:POSIXApplication>
        <jsdl-posix:Executable>/bin/echo</jsdl-posix:Executable>
        <jsdl-posix:Argument>Hello</jsdl-posix:Argument>
        <jsdl-posix:Argument>Fred</jsdl-posix:Argument>
      </jsdl-posix:POSIXApplication>
    </jsdl:Application>
  </jsdl:JobDescription>
</jsdl:JobDefinition>


To alter second argument to be Bob, Alice, and Tom (3 sweeps):

<jsdl:JobDefinition>
  <jsdl:JobDescription>
    <jsdl:Application>
      <jsdl-posix:POSIXApplication>
        <jsdl-posix:Executable>/bin/echo</jsdl-posix:Executable>
        <jsdl-posix:Argument>Hello</jsdl-posix:Argument>
        <jsdl-posix:Argument>Fred</jsdl-posix:Argument>
      </jsdl-posix:POSIXApplication>
    </jsdl:Application>
  </jsdl:JobDescription>
  <sweep:Sweep>
    <sweep:Assignment>
      <sweep:Parameter>//jsdl-posix:Argument[2]</sweep:Parameter>
      <sweepfunc:Values>
        <sweepfunc:Value>Bob</sweepfunc:Value>
        <sweepfunc:Value>Alice</sweepfunc:Value>
        <sweepfunc:Value>Tom</sweepfunc:Value>
      </sweepfunc:Values>
    </sweep:Assignment>
  </sweep:Sweep>
</jsdl:JobDefinition>
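
The substitution performed by the sweep can be mimicked in a few lines. This is a simplified model that works on a plain argument list rather than on the JSDL document itself; the function name `expand_sweep` is invented for the illustration and is not part of JSDL or any middleware.

```python
def expand_sweep(base_args, position, values):
    """Return one argument list per sweep value.

    `position` is 1-based, matching the XPath index in Argument[2].
    """
    jobs = []
    for value in values:
        args = list(base_args)        # copy so the base job is left untouched
        args[position - 1] = value    # substitute the selected argument
        jobs.append(args)
    return jobs

jobs = expand_sweep(["Hello", "Fred"], 2, ["Bob", "Alice", "Tom"])
commands = [["/bin/echo"] + args for args in jobs]
for command in commands:
    print(" ".join(command))
```

Each entry in `commands` corresponds to one job generated by the sweep; the original value Fred is replaced in every one of them.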


Question

What is the output from the echo programs?


Using multiple computers collectively to solve a single problem

1. Existing and legacy programs

2. Wrapping program components as services

3. Modifying programs to use Grid API’s

4. Using parallel programming techniques


1. Using existing and legacy programs

Data partitioning

Perhaps easiest way to use multiple computers together.

Divide data into parts.

Each computer works on a different part.
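
A minimal sketch of data partitioning follows. Threads stand in for separate computers, the records are made up, and the per-part work (counting records that contain a query string) is only a placeholder for a real application.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(items, parts):
    """Split `items` into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks

def count_matches(chunk, query):
    """Work done independently on one part: count records containing `query`."""
    return sum(1 for record in chunk if query in record)

# Made-up records standing in for a large data set.
records = ["ACGT", "TTGA", "ACGA", "GGCA", "ACGG", "TACG", "CCCC"]
chunks = partition(records, 3)
with ThreadPoolExecutor(max_workers=3) as pool:
    partial = list(pool.map(count_matches, chunks, ["ACG"] * len(chunks)))
total = sum(partial)   # combine the partial results
print(partial, total)
```

The existing program logic (`count_matches`) is unchanged; only the data it is given and the final combining step are new, which is why this approach suits existing and legacy programs.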


Example

BLAST algorithm used in bioinformatics to find statistical matches between gene sequences.

User might submit sequence query that is compared to a very large database of known sequences in order to discover relationships or to match sequence to a gene family.

Databases extremely large (100’s MBytes).


Partitioning BLAST database


If just one sequence from user, database partitioned into parts and different computers work on different parts.


Alternatively, if user(s) are submitting many queries, submit each query to a different computer having access to the whole database.


Legacy Code


In many cases, Grid users want to re-use their existing programs written in C, C++ or even Fortran if really old.

Documented source code may not be available.

May be pre-packaged by manufacturer so rewriting not an option.


Grid Enabling Legacy Software (GriddLeS)

One project that addresses porting legacy code onto a Grid.

Focuses on file handling

Overloads existing file handling routines and redirects requests to remote locations if required.
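
The redirection idea can be sketched as a wrapper around the file-open call that consults a mapping table. This only illustrates the concept and is not how GriddLeS itself is implemented; the class name `RedirectingOpener`, the path `/data/input.dat`, and the use of a temporary local file as a stand-in for a remote copy are all assumptions of the sketch.

```python
import os
import tempfile

class RedirectingOpener:
    """Open files through a table that maps original paths to replacement paths."""

    def __init__(self, redirects):
        self.redirects = dict(redirects)

    def open(self, path, mode="r"):
        # Fall back to the original path when no redirection is configured.
        target = self.redirects.get(path, path)
        return open(target, mode)

# Demonstration with a temporary file standing in for a remote copy.
with tempfile.TemporaryDirectory() as workdir:
    replica = os.path.join(workdir, "replica.dat")
    with open(replica, "w") as handle:
        handle.write("sequence data")

    opener = RedirectingOpener({"/data/input.dat": replica})
    with opener.open("/data/input.dat") as handle:   # application still uses the old name
        contents = handle.read()
```

The application keeps asking for the file name it always used; only the table decides where the data actually comes from.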


Grid Enabling Legacy Software (GriddLeS)

Derived from: http://www.csse.monash.edu.au/~davida/griddles/


2. Exposing an Application as a Service

• Grid computing has embraced Web service technology so natural to consider its use for accessing applications.

• “Wrap” application code to produce a Web service

• “Wrapping” means application not accessed directly but through service interface
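
The wrapping idea can be shown without any Web service machinery: callers talk to an interface object, and only that object runs the application. This sketch is an in-process stand-in, not a real Web service (which would expose the same operation over the network); `CommandService` and the one-line echo script used as the "application" are invented for the example.

```python
import subprocess
import sys

class CommandService:
    """Service-style interface in front of a command-line application."""

    def __init__(self, command):
        self._command = list(command)   # kept private: callers never run it directly

    def invoke(self, *arguments):
        """Run the wrapped application with `arguments` and return its output text."""
        completed = subprocess.run(
            self._command + [str(a) for a in arguments],
            capture_output=True, text=True, check=True,
        )
        return completed.stdout.strip()

# Stand-in "application": a one-line Python script that echoes its arguments.
ECHO_APP = [sys.executable, "-c", "import sys; print(' '.join(sys.argv[1:]))"]
service = CommandService(ECHO_APP)
reply = service.invoke("Hello", "Grid")
print(reply)
```

Because the application is reached only through `invoke`, it could be replaced or moved to another machine without changing the callers.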


Web Service Wrapper Approach


3. Using Grid Middleware API’s

Incorporate Grid middleware APIs in application code for operations such as:

• File input/output

• Starting and monitoring jobs

• Monitoring and discovery of Grid resources.


Using Globus API’s

Globus provides suite of services that have APIs (C and Java interfaces) that could be called from the application.

Extremely steep learning curve!!

Literally hundreds, if not thousands, of C and Java routines listed at the Globus site.

No tutorial help or sample usage.


Code using Globus APIs to copy a file (C++)

Directly from (van Nieuwpoort). Also in (Kaiser 2004) and (Kaiser 2005).


Using CoG kit API’s

Using CoG kit API’s is at a slightly higher level.

Not too difficult but still requires setting up the Globus context.


CoG Kit program to transfer files


Higher Level Middleware-Independent APIs

Higher level of abstraction than Globus middleware API’s desirable because:

• Complexity of Globus routines

• Grid middleware changes very often

• Globus not only Grid middleware


Other Grid middleware

Includes:

• UNICORE (Uniform Interface to Computing Resources)

• gLite (Lightweight Middleware for Grid computing)

– part of EGEE (Enabling Grids for E-sciencE) collaborative.

To give an indication of the rapid changes that occur:

• gLite 3.0.2 Update 43 released May 22, 2008.

• gLite 3.1 Update 27 released July 3, 2008, 6 weeks later.


Concept of higher-level API’s above Grid middleware

Higher-level API’s should expose simple interface not tied to specific version of Grid middleware or even Grid middleware family at all.
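
A minimal sketch of such a layer, assuming an adapter design: application code calls one `copy` operation, and the adapter registered for the URL scheme does the middleware-specific work. All class names here are hypothetical, and only a local-file adapter is implemented; this is not the API of GAT, SAGA, or any other toolkit.

```python
import os
import shutil
import tempfile

class LocalFileAdapter:
    """Adapter for plain local files (the only one implemented in this sketch)."""

    def copy(self, source, destination):
        shutil.copyfile(source, destination)

class FileAPI:
    """Middleware-independent front end: application code calls only copy()."""

    def __init__(self):
        self._adapters = {}

    def register(self, scheme, adapter):
        self._adapters[scheme] = adapter

    def copy(self, source_url, destination_url):
        scheme, _, source = source_url.partition("://")
        dest_scheme, _, destination = destination_url.partition("://")
        if scheme != dest_scheme:
            raise ValueError("this sketch only copies within one scheme")
        if scheme not in self._adapters:
            raise LookupError("no adapter registered for " + scheme)
        self._adapters[scheme].copy(source, destination)

api = FileAPI()
api.register("file", LocalFileAdapter())   # a Grid adapter would be registered the same way

with tempfile.TemporaryDirectory() as workdir:
    src = os.path.join(workdir, "a.txt")
    dst = os.path.join(workdir, "b.txt")
    with open(src, "w") as handle:
        handle.write("payload")
    api.copy("file://" + src, "file://" + dst)
    with open(dst) as handle:
        copied = handle.read()
```

Supporting a different middleware, or a new version of one, would mean writing another adapter; the application's call to `api.copy` would stay the same.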


Grid Application Toolkit (GAT)

• APIs for developing and executing portable Grid applications that are independent of the underlying Grid infrastructure and available services.

• Developed in 2003-2005 time frame.


Copy a file in GAT/C++ (Kaiser, H. 2005)


SAGA (Simple API for Grid Applications)

A subsequent effort made by the Grid community to standardize higher-level API’s.


SAGA Reading a file (C++) (Kielmann 2006)


4. Using Parallel Programming Techniques

Grid computing offers possibility of using multiple computers on a single problem to decrease execution time.

Potential of using multiple computers collectively well known (at least since the 1950’s) and obvious.

Suppose a problem is divided into p equal parts, with each part executing on a separate computer at the same time. Overall time to execute the problem using p computers simultaneously would be 1/p of the time on a single computer.

This is the ideal situation. Problems cannot usually be divided into equal parts that can be executed simultaneously.
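
The effect of unequal parts can be checked with a small calculation. The sketch assumes that every part runs at once on its own computer and ignores communication and start-up costs; the time values are made up.

```python
def parallel_time(part_times):
    """Time to finish when every part runs at once on its own computer."""
    return max(part_times)

def speedup(part_times):
    """Single-computer time (sum of the parts) divided by the parallel time."""
    return sum(part_times) / parallel_time(part_times)

equal_parts = [25.0, 25.0, 25.0, 25.0]     # 100 time units split evenly, p = 4
unequal_parts = [40.0, 20.0, 20.0, 20.0]   # same total work, uneven split

print(speedup(equal_parts))     # 4.0, the ideal 1/p case
print(speedup(unequal_parts))   # 2.5, limited by the largest part
```

With equal parts the time drops to 1/p of the original; with unequal parts the largest part sets the finishing time, so the improvement is smaller.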


Message Passing Programming

Computers on a Grid connected through Internet or a high performance distributed network.

Suggests programming model similar to that often used in cluster computing.

In cluster computing, computers connected through a local network with computers physically nearby.

For such a structure, a convenient approach is to have processes on different computers communicate through message passing.

Message passing generally done using library routines.

Programmer inserts message-passing routines into their code.


Message passing concept using library routines


History of Message Passing Routines

PVM (Parallel Virtual Machine)

First highly successful message passing suite of libraries, developed at Oak Ridge National Laboratory in the late 1980’s by Sunderam.

Became widely used in the early 1990’s.

Provided a complete fully implemented set of library routines for message passing.


MPI (Message Passing Interface)

Subsequent standard specification for message-passing API’s.

MPI developed by the MPI Forum, a collective with more than 40 industrial and research organizations.

Whereas PVM was an implementation of message-passing libraries, MPI only specified the API.

Implementation left to others.

Now many implementations of MPI, most free.


MPI version 1 finalized in 1994.

Version 2 of MPI introduced in 1997 with a greatly expanded scope.

MPI version 1 has about 126 routines.

MPI version 2 has about 156 routines.

MPI-2 standard daunting.

However, in most applications only a few MPI routines actually needed.

Suggested that many MPI programs can be written using only about six different routines.


Basic MPI Programming Model

Each process executes same code

User writes a single program compiled to suit processors and all processes started together.

That does not mean that all necessarily do exactly the same actions.

Processes have IDs. IF statements direct processes to perform specific actions e.g.:

if (procID == 0) ... /* do this */;
if (procID == 1) ... /* do this */;

Usually computation constructed as master-slave model:

if (procID == 0) ... /* master do this */;
else ... /* all slaves do this */;
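
The same-program, different-role idea can be imitated without MPI. In the sketch below, threads stand in for processes and queues stand in for message passing; the function and variable names are invented for the illustration and none of this is MPI code.

```python
import queue
import threading

def program(proc_id, size, inboxes, results):
    """The single program that every process runs; proc_id decides the role."""
    if proc_id == 0:
        # Master: send a message to every other process.
        for destination in range(1, size):
            inboxes[destination].put("Hello from master")
        results[proc_id] = "sent %d messages" % (size - 1)
    else:
        # Slaves: wait for the message from the master.
        results[proc_id] = inboxes[proc_id].get(timeout=5)

def run(size):
    """Start `size` copies of the same program and collect what each one reports."""
    inboxes = [queue.Queue() for _ in range(size)]
    results = {}
    threads = [threading.Thread(target=program, args=(i, size, inboxes, results))
               for i in range(size)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results

results = run(4)
print(results)
```

Every copy executes `program`; only the test on `proc_id` makes copy 0 behave as the master and the rest as slaves.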


#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "mpi.h"

int main(int argc, char **argv)
{
  char message[20];
  int i, rank, size, type = 99;           /* type is the message tag */
  MPI_Status status;

  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of processes */
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* ID of this process */
  if (rank == 0) {                        /* master sends to all others */
    strcpy(message, "Hello, world");
    for (i = 1; i < size; i++)
      MPI_Send(message, 13, MPI_CHAR, i, type, MPI_COMM_WORLD);
  } else                                  /* others receive from master */
    MPI_Recv(message, 20, MPI_CHAR, 0, type, MPI_COMM_WORLD, &status);
  printf("Message from process =%d : %.13s\n", rank, message);
  MPI_Finalize();
  return 0;
}

Sample MPI “Hello world” program

Will look at routines in more detail later.


Grid-enabling MPI programs

• Globus version of MPI available to run MPI jobs across a grid (MPICH-G2).

http://www.globus.org/grid_software/computation/mpich-g2.php

Message passing can cross sites:


http://www.ngpp.ngp.org.sg/


MPICH-G2 programs

• Ideally one can simply run the MPI job unmodified across the grid.

• However, it is not that simple.


Problems:

• Firewalls: Need to accommodate firewalls by opening up ports

• Job Schedulers: Each site will have a separate, independent local job scheduler, which means one cannot guarantee that all MPI processes at the different sites will be running at the same time in order to communicate.

• Latency: The delays in messages in transit are much larger and variable between sites (Internet)


Questions
