
CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE
Concurrency Computat.: Pract. Exper. 2005; 17:1197–1214
Published online 9 March 2005 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/cpe.901

Distributed computing with Triana on the Grid

Ian Taylor1,∗,†, Ian Wang2, Matthew Shields2 and Shalil Majithia2

1School of Computer Science, Cardiff University, P.O. Box 916, Cardiff CF24 3XF, U.K.
2School of Physics, Astronomy and Computer Science, Queens Buildings, Cardiff University, 5 The Parade, Cardiff CF24 3YB, U.K.

SUMMARY

In this paper, we describe Triana, a distributed problem-solving environment that makes use of the Grid to enable a user to compose applications from a set of components, select resources on which the composed application can be distributed and then execute the application on those resources. We describe Triana’s current pluggable architecture that can support many different modes of operation by the use of flexible writers for many popular Web service choreography languages. We further show that the Triana architecture is middleware-independent through the use of the Grid Application Toolkit (GAT) API and demonstrate this through the use of a GAT binding to JXTA. We describe how other bindings being developed to Grid infrastructures, such as OGSA, can seamlessly be integrated within the current prototype by using the switching capability of the GAT. Finally, we outline an experiment we conducted using this prototype and discuss its current status. Copyright © 2005 John Wiley & Sons, Ltd.

KEY WORDS: Triana; distributed systems; Grid; workflow; peer-to-peer

1. INTRODUCTION

The current emergence of Grid [1,2] and peer-to-peer (P2P) [3] computing is an exciting and intriguing development for utilizing the vast distributed resources within today’s Internet. Current predictions indicate there will be over 765 million Internet users by year-end 2005. Each user in this massive network has the CPU capability of more than 100 times that of an early 1990s supercomputer and, surprisingly, GartnerGroup research reveals that over 95% of today’s PC power is wasted. A potentially massively parallel machine could exist for computing many of the highly CPU-intensive engineering and scientific applications. Further, expanding this spectrum into the mobile arena, there are currently one billion mobile devices worldwide, a figure which is rising rapidly. Tapping into this resource using current technologies is a formidable task that will occupy the research arena for many years to come.

∗Correspondence to: Ian Taylor, School of Computer Science, Cardiff University, P.O. Box 916, Cardiff CF24 3XF, U.K.
†E-mail: [email protected]

Copyright © 2005 John Wiley & Sons, Ltd.
Received 10 September 2003
Revised 15 March 2004
Accepted 15 March 2004

In this paper we describe Triana [4], a distributed computing environment that attempts to make use of the Grid by allowing users to graphically compose applications from a set of components. Such applications can then be distributed by selecting resources for execution. The current model supports both P2P and Grid computing using the dynamic switching capability of the employed Grid Application Toolkit (GAT) [5]. The majority of Grid-enabled applications are still very much at the prototype stage, with initiatives worldwide aiming at different users ranging from the scientific world, e.g. medicine, physics, genetics, etc., to commerce. The differing approaches proposed for the architecture of the Grid, such as the Globus Toolkit, Open Grid Services Architecture (OGSA) and Web services, all contribute to a set of emerging standards that overlap in places and provide different functionality. This mix of technologies often leaves the application programmer confused and in the unstable position of attempting to guess which ‘horse to back’. Consequently, this slows the development of applications for the Grid as the developers wait to see which technology becomes dominant or the most widely accepted. We introduce Triana, an environment that attempts to be middleware technology agnostic by the use of a GAT API that insulates the application programmer from the underlying distribution technology. The application programmer codes to a single API providing seamless access to whichever underlying technology is required.

We demonstrate Triana’s use through one particular implementation of our toolkit using the JXTA platform. JXTA is a set of P2P networking protocols that allow any connected device on a network to discover other devices and communicate. We believe that the P2P architecture has a direct analogy with the Grid: a peer can be thought of as a node in a Grid, and our toolkit uses this fact to hide the particular implementation that a Triana service uses to communicate with other Triana services on different resources. We show how the toolkit can easily be implemented for other Grid infrastructures, such as OGSA, as they become available and outline a method by which Triana is able to produce BPEL4WS [6], a Web services flow language, output to coordinate Web components as Web services. Finally, we present some preliminary results using the distributed toolkit over JXTA.

2. THE TRIANA SOFTWARE ENVIRONMENT

Triana is an open-source, distributed, platform-independent problem-solving environment (PSE) written in the Java programming language. A PSE is a complete, integrated computing environment for composing, compiling, and running applications in a specific area [7]. Triana is intended to be flexible and can be used in many different scenarios and at many different levels. It can be used as a workflow engine for Grid applications, connecting data-driven Grid components and managing the workflow between them, or as a data-analysis system for image-, signal- or text-processing applications, allowing a scientist to quickly apply algorithms to data sets and view results. It can also be used as a high-level graphical script editor for creating a number of task-graph or workflow script formats including, but not limited to, directed acyclic graphs (DAGs) and BPEL4WS (see Figure 1).

Figure 1. The Triana user interface running a simple signal processing network.

Triana was originally developed as a quick-look data analysis system for the GEO600 gravitational wave project [8]. Over the past year it has been completely redesigned in order to modularize the application into a set of flexible interacting components that can be used to create Triana or any subset thereof. The first and most significant difference is that the user interface has been completely disconnected from the underlying subsystem for both the functionality of the main system and for every Triana unit and its associated user interface. Figure 2 outlines this new architecture. We discuss the integration with GAT in Section 5.1. Clients (i.e. those running a graphical user interface (GUI)) can log into a Triana Controlling Service (TCS), remotely build and run a Triana network and visualize the result (using a graphing unit) on their device even though the visualization unit itself is run remotely. Users can also log off without stopping the execution of the network and then log in at a later stage to view the progress (perhaps using a different device, e.g. mobile phone, handheld). In this context, Triana could be used as a visual environment for monitoring the workflow of Grid services or as an interactive portal by running the TCS as a servlet on the Web server and by running the applet version of the Triana GUI (TGUI). Further, since any Triana network can be run with or without using the TGUI, Triana networks can be run as executables in a stand-alone or batch-processing mode, i.e. Triana can be used as an application designer tool that produces an executable application.

Since Triana is written in Java, it is platform independent and, therefore, Triana services can be run on virtually any hardware and operating system platform. It installs easily with a ‘point-and-click’ method and only relies on a Java virtual machine to be present on the target platform.


Figure 2. The distributed architecture of the new Triana system.

3. TRIANA PLUGGABLE ARCHITECTURE

The previous section presented an overview of Triana. In this section, we detail Triana’s pluggable architecture. The pluggable architecture allows programmers to customize the system by being able to plug in their own code at various points within the architecture. This is achieved by the fragmentation of each section of the Triana subsystem and flexibility in the new reader/writer interfaces. The reader/writer interfaces allow the seamless insertion of new readers and writers for representing tools, task-graphs and GUI commands. Programmers can plug in their code at any of the insertion points within the system (indicated in Figure 3).

Figure 3. The Triana pluggable architecture.

A programmer can now use the system in three key ways. First, use the TGUI as a front end to a stand-alone application. This can be accomplished either by using one of the provided task-graph and command writers or by implementing others. We provide two task graph writers and one command writer. Task graph writers include BPEL4WS and a proprietary Triana XML format. The command writer outputs action commands made by the user during an interactive session, e.g. play, stop, log in, log out, visualize unit GUI, etc. The combination of these allows simulation of user commands from the GUI, allowing Triana to run the stand-alone application directly. Second, implement the same interfaces that the included task graph readers use in the TCS, so that another distribution and execution mechanism can be inserted with its own resource management and scheduling while still taking advantage of the remote log-in features and user interface of the distributed GUI and subsystem. Third, implement the desired functionality of a specific application within the Triana framework as a collection of Triana units. This would involve decomposing the original application to build a number of Triana components that could be combined to provide the original functionality. This approach can take advantage of the services that Triana provides including type checking, the use of many pre-written Triana units and the seamless distribution of networks using many distribution policies over various types of middleware, e.g. JXTA and OGSA.
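The writer plug-in points described above could be pictured with the following minimal Java sketch. All names here (TaskGraphWriter, SimpleXmlWriter, WriterRegistry) are hypothetical and are not Triana's actual interfaces; the sketch only assumes that each writer serializes a task graph into one named format and that new writers can be registered without changing the rest of the system.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plug-in point: one writer per task-graph format.
interface TaskGraphWriter {
    String formatName();
    String write(List<String> taskNames);
}

// Illustrative writer producing a tiny XML-like listing (not Triana's real format).
class SimpleXmlWriter implements TaskGraphWriter {
    public String formatName() {
        return "simple-xml";
    }

    public String write(List<String> taskNames) {
        StringBuilder sb = new StringBuilder("<tasks>");
        for (String name : taskNames) {
            sb.append("<task>").append(name).append("</task>");
        }
        return sb.append("</tasks>").toString();
    }
}

// Registry acting as the insertion point: formats are plugged in at run time.
class WriterRegistry {
    private final Map<String, TaskGraphWriter> writers = new LinkedHashMap<>();

    void register(TaskGraphWriter writer) {
        writers.put(writer.formatName(), writer);
    }

    String write(String format, List<String> taskNames) {
        TaskGraphWriter writer = writers.get(format);
        if (writer == null) {
            throw new IllegalArgumentException("No writer registered for " + format);
        }
        return writer.write(taskNames);
    }
}

class WriterRegistryDemo {
    public static void main(String[] args) {
        WriterRegistry registry = new WriterRegistry();
        registry.register(new SimpleXmlWriter());
        System.out.println(registry.write("simple-xml", Arrays.asList("Wave", "Gaussian")));
    }
}
```

Under these assumptions, supporting a further format would only require another TaskGraphWriter implementation and one register call.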

Further, there are many different ways of creating a Triana unit in version 3.0. A Triana unit encapsulates functionality and can be a local or a remote tool or a service. Firstly, and most importantly, users do not need to write any code in order to make their Java or C functions into Triana units. Units in the new version can take any Java object as their input (or use a Triana pre-defined one) and can be automatically wrapped by using the Medli [9] data mediation interface. Medli is a graphical environment for mediating the conversion of data types between the classes of a Java application and the inputs and outputs of third-party Java or C functions. Medli’s wizard guides the user through the mediation (i.e. conversion) of the data types between the Java input classes and the target method and then from the return object of the target method to the Java output class. It also facilitates the mediation between the input and output classes for variables that are not required in the target routine. In this way, native C (or Java) functions can be compiled, wrapped and mediated, allowing the scientist to incorporate their code within a Triana unit within minutes. For simple units, this is often all that is needed. For more flexible unit implementations, there are many Triana interfaces that can support a number of features. A Triana unit and its GUI can be defined and written almost completely by the use of the new ToolMaker application. ToolMaker provides a convenient wizard for creating a new unit within Triana. It prompts the user for every implementable unit property and for the automatic construction of its GUI. ToolMaker writes a Java template according to these user choices. The unit programmer inserts the functionality (in the unit’s process() function) for the specific implementation and the unit is ready to use within the environment.
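To make the template idea concrete, here is a minimal sketch of a unit whose only hand-written part is its process() method. The base class SketchUnit and its accessors are assumptions made for illustration; only the name process() comes from the text, and the real ToolMaker template is not reproduced here.

```java
// Hypothetical base class standing in for the template a wizard might generate.
abstract class SketchUnit {
    private Object input;
    private Object output;

    void setInput(Object in) {
        input = in;
    }

    Object getOutput() {
        return output;
    }

    protected Object getInput() {
        return input;
    }

    protected void output(Object out) {
        output = out;
    }

    // The unit programmer supplies only this method.
    abstract void process();
}

// Example unit: multiplies every sample of a double[] by a fixed factor.
class ScaleUnit extends SketchUnit {
    private final double factor;

    ScaleUnit(double factor) {
        this.factor = factor;
    }

    @Override
    void process() {
        double[] samples = (double[]) getInput();
        double[] scaled = new double[samples.length];
        for (int i = 0; i < samples.length; i++) {
            scaled[i] = samples[i] * factor;
        }
        output(scaled);
    }
}
```

A caller would set the input, invoke process() and read the output, which is the role the surrounding environment plays for a real unit.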

3.1. Triana pluggable writers and readers

Triana allows the user to read and write workflows in any supported language. At the current time, Triana supports a subset of BPEL4WS and a Triana proprietary language. We have implemented BPEL4WS readers and writers which can read/generate BPEL4WS compliant workflows and return an internal implementation. We are currently working on the ‘Translator’ component, which will automatically recognize the format of the supplied workflow and convert this into a PNML format.

4. TRIANA DISTRIBUTION

The basic building block for distributing code within Triana is a service. A Triana service (TS) is a network-enabled entity that provides the capability of running a partial Triana task-graph (in the form of a Triana group). TSs are middleware independent and can discover and communicate with each other using any available GAT binding, making the switching between the underlying middleware (e.g. JXTA and OGSA) transparent (see Section 6). In this section, the method by which this service is represented on the Grid is described.
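The idea of switching middleware behind a single interface can be illustrated with the sketch below. It is not the GAT API: DiscoveryBinding, InMemoryBinding and ServiceLocator are hypothetical names, and the in-memory map merely stands in for whatever advertisement and discovery mechanism a real binding would provide.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical abstraction; this is NOT the GAT API, only an illustration of the idea.
interface DiscoveryBinding {
    void advertise(String serviceName, String address);

    String locate(String serviceName); // returns null when the service is unknown
}

// Stand-in binding that keeps advertisements in memory.
class InMemoryBinding implements DiscoveryBinding {
    private final Map<String, String> registry = new HashMap<>();

    public void advertise(String serviceName, String address) {
        registry.put(serviceName, address);
    }

    public String locate(String serviceName) {
        return registry.get(serviceName);
    }
}

// Service code talks only to the interface, so the binding can be swapped at run time.
class ServiceLocator {
    private DiscoveryBinding binding;

    ServiceLocator(DiscoveryBinding binding) {
        this.binding = binding;
    }

    void switchBinding(DiscoveryBinding newBinding) {
        this.binding = newBinding;
    }

    String find(String serviceName) {
        return binding.locate(serviceName);
    }
}
```

Because ServiceLocator never names a concrete binding, replacing one binding with another changes where lookups go without touching the calling code.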

4.1. Triana distributed services

Each TS runs both a client and a server component, and a user may connect to a TS using a command line or GUI. The client can distribute code to many servers, depending on where execution is required.

As seen in Figure 4, there are three distinct components in the Triana implementation: a TS, a TCS and the TGUI. The TGUI is a user interface to a TCS. The TGUI can be based either on a command line or a GUI, and the TCS may be either local or remote. The TGUI was previously part of the TCS, but has recently been de-coupled as described in Section 3. The TGUI provides access to a network of computers running TS daemons via a gateway TS daemon and allows the user to describe the kind of functionality required of Triana: the module deployment model and data stream pathways. The TCS is a persistent service which controls the execution of the given Triana network. It can choose to run the network itself or distribute it to any available TS. Therefore, a single TGUI can control multiple Triana networks deployed over multiple CPU resources. A TS comprises three components: a client, a server and a command service controller. The client of the TCS in contact with the TGUI creates a distributed task graph that pipes modules, programs and data to the TS daemons. These TSs simply act in server mode to execute the byte code and pipe data to others. Triana services can also pass data to each other.

Figure 4. An outline of the distribution of Triana.

4.2. Distribution policies and mechanisms

In this section we examine the mechanisms used by Triana to distribute tasks across the Grid from the workflow definition perspective; in Section 5 we extend this to look at how Triana maps the specified distribution to an underlying Grid middleware. As Triana is a flexible PSE that can handle diverse application scenarios, the distribution mechanisms within Triana must also be flexible and extensible to diverse application scenarios. However, the users of Triana are generally not interested in the underlying distribution technology, so the distribution mechanism must also hide underlying distribution complexities and be easily applied to user developed scenarios.


Therefore, there are three design constraints we imposed for the distribution mechanism of Triana:

• the visual representation within the TGUI should be able to hide the complexity from the user;
• the implementation should be middleware independent: this is essential for flexibility in the future, e.g. mapping to alternative middleware architectures should be trivial;
• the mechanism should allow easy extension with new user distribution policies.

The current implementation meets these three constraints. From the user perspective there are two relatively simple concepts to grasp concerning Triana distribution: group tasks and control tasks. Group tasks are composite tasks that form the unit of distribution, while control tasks are special tasks applied to a group that specify the policy that is used to distribute that group. For example, a user wishing to distribute some image-processing tasks so that they operate in parallel on different frames from a movie would simply form a group of the tasks they wished to distribute and attach an HTCParallel (high-throughput computing) control task to that group; Triana will then handle the farming out of these tasks to remote services and piping data to and from these services, hiding this complexity from the user.

In the next sections we examine further the concepts of group tasks and control tasks, looking into how control tasks annotate the workflow so that it can be connected in a distributed environment and also manage how data is distributed to the remote services.

4.2.1. Group tasks

The concept of group tasks existed in Triana before the current distribution mechanism was developed. Group tasks are composite tasks formed by the user by grouping one or more tasks within a workflow. In many ways group tasks behave just like individual tasks: both receive data from a number of input nodes, perform some form of processing on this data, and output the result from their output nodes. In fact, if a user is not interested in the inner workings of a group then they can treat a group task just as they would an individual task.

Group tasks form the unit of distribution in Triana. If a user wishes to distribute a number of tasks then they create a group from those tasks, and designate that group for distribution through attaching a control task. As the number of tasks in a group is determined by the user, this mechanism allows the user to determine the granularity of distribution (one task or multiple tasks), and as the user also chooses which tasks to place in the group they determine which operations within the workflow are distributed.

4.2.2. Control tasks

As with group tasks, the concept of control tasks existed in Triana before the current distribution mechanism was developed. Control tasks are standard Triana units that are connected into the workflow to receive the input to a group before it is sent to the tasks within that group, and also to receive the output from a group before it is passed back to the main workflow. As a control task handles the input to/output from a group before other tasks, it can redirect/manipulate the data as it chooses; the original use of control tasks was to loop over the group (i.e. redirect the group’s output back to its input nodes until some specified condition is met). Although the control task is connected into the workflow as any standard unit, in the TGUI the control task’s connections are hidden from the user in order to simplify the visual representation of the workflow.
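The looping use of a control task can be sketched as follows. LoopControl is a hypothetical class, not a Triana unit: the group is modelled as a function from data to data, the stopping condition as a predicate, and an iteration cap is added here purely as a safety assumption.

```java
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Hypothetical looping control task: feeds the group's output back to its input
// until the stopping condition holds (or a safety cap on iterations is reached).
class LoopControl<T> {
    private final UnaryOperator<T> group;
    private final Predicate<T> done;
    private final int maxIterations;

    LoopControl(UnaryOperator<T> group, Predicate<T> done, int maxIterations) {
        this.group = group;
        this.done = done;
        this.maxIterations = maxIterations;
    }

    T run(T input) {
        T data = input;
        int iterations = 0;
        while (!done.test(data) && iterations < maxIterations) {
            data = group.apply(data); // output of the group becomes its next input
            iterations++;
        }
        return data;
    }
}
```

For example, a group that halves its input, combined with a condition that stops once the value drops below 1.0, would be expressed as `new LoopControl<Double>(x -> x / 2.0, x -> x < 1.0, 100)`.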


Figure 5. An example of a workflow distributed according to a parallel distribution policy. (Diagram labels: Control Task; Wave and Grapher tasks running on the local service; two Gaussian/FFT sub-groups running on remote services.)

In distributed Triana, control tasks have two roles: firstly, a control task is responsible for specifying the rewiring of the workflow to support some distribution topology (such as parallel distributed or pipeline distributed) and, secondly, as with looping control tasks, to redirect/manipulate the data input to/output from a group. Between the control task specifying the rewiring of the workflow, and the control task receiving the data input to the group, the Triana engine constructs the specified distributed workflow; thus, when the control task redirects the data it is actually sent to distributed tasks running on remote Triana services. In the following two sections, we describe how such control tasks annotate the workflow to specify a distribution topology and how they distribute a group’s data.

4.2.3. Workflow annotation

Each control task is responsible for specifying how the group that it is attached to is distributed. For example, to achieve a parallel distribution the control task would specify that all the tasks within the group are replicated on every service, while to achieve a pipeline distribution the control task would specify that a single task was placed on each service and the data passed from one remote task to another. In Figures 5 and 6 we illustrate workflows distributed according to parallel and pipeline distribution policies. Note that, while parallel and pipeline distributions are the most obvious distribution policies, custom control tasks can easily be implemented to specify application-specific distributions.

To create the distribution specification the control task is passed a copy of its group’s workflow in an internal representation. From this workflow the control task constructs a number of sub-groups that it requires to be distributed by Triana; the control task has the freedom to slice the initial workflow however it chooses and to replicate tasks over multiple sub-groups if required. In Figures 5 and 6 we illustrate the sub-groupings that control tasks implementing parallel and pipeline distribution policies would create.
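The two slicing strategies described above can be sketched in a few lines of Java. This is an illustration under simplifying assumptions: a group is reduced to an ordered list of task names, and DistributionPolicies with its two methods is a hypothetical helper, not part of Triana.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper showing how a control task might slice a group into sub-groups.
class DistributionPolicies {
    // Parallel: every service receives a replica of the whole group.
    static List<List<String>> parallel(List<String> groupTasks, int services) {
        List<List<String>> subGroups = new ArrayList<>();
        for (int i = 0; i < services; i++) {
            subGroups.add(new ArrayList<>(groupTasks));
        }
        return subGroups;
    }

    // Pipeline: each task becomes its own sub-group, kept in workflow order.
    static List<List<String>> pipeline(List<String> groupTasks) {
        List<List<String>> subGroups = new ArrayList<>();
        for (String task : groupTasks) {
            subGroups.add(Collections.singletonList(task));
        }
        return subGroups;
    }
}
```

With a group containing Gaussian and FFT, the parallel policy over two services yields two copies of the pair, while the pipeline policy yields one sub-group per task; a custom policy would simply be another slicing function of the same shape.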


Figure 6. An example of a workflow distributed according to a pipeline distribution policy. (Diagram labels: Control Task; Wave and Grapher tasks running on the local service; Gaussian and FFT sub-groups running on remote services.)

As well as constructing the sub-groups to be distributed, the control task also has to specify how these distributed groups are connected together and back to the local service. To do this the control task creates a unique pipe name for each remote connection, and annotates this name as a parameter in the workflow for both the input and output sub-groups. Thus, when the sub-groups are distributed to remote services the services know the names of both the input and output pipes they are trying to establish. For each input connection a service simply creates and advertises a pipe of the given name, and the service outputting to that connection locates and binds to that pipe; in this way, the distributed workflow is connected together in the topology specified by the control task. Note that the actual mechanism used to create and locate pipes depends on the underlying Grid middleware being used (discussed further in Section 6); however, this is not the concern of control tasks. In Triana, control tasks are only concerned with specifying the distribution topology and are independent of any particular Grid middleware.

The sub-groups created by the control task are passed back to the Triana engine, which is then responsible for locating and setting up a remote Triana service to run each sub-group, and eventually running the distributed workflow. Again, the actual mechanism used to locate Triana services depends on the underlying Grid middleware.
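The pipe annotation step for a pipeline topology could look like the sketch below. The parameter names inNodeTag0 and outNodeTag0 and the PIPE[0]-0 style of name are taken from Figure 9; the numbering rule used here (stage i reads pipe i and writes pipe i+1) and the PipeAnnotator class are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical annotator for a pipeline of single-input, single-output sub-groups.
class PipeAnnotator {
    // Assumed naming rule modelled on Figure 9: PIPE[<group id>]-<connection index>.
    static String pipeName(int groupId, int index) {
        return "PIPE[" + groupId + "]-" + index;
    }

    // Returns, for each pipeline stage, the parameters naming its input and output pipes.
    // The output pipe of one stage carries the same name as the input pipe of the next,
    // which is what lets the two services find each other by name.
    static List<Map<String, String>> annotatePipeline(int groupId, int stages) {
        List<Map<String, String>> annotations = new ArrayList<>();
        for (int stage = 0; stage < stages; stage++) {
            Map<String, String> params = new LinkedHashMap<>();
            params.put("inNodeTag0", pipeName(groupId, stage));
            params.put("outNodeTag0", pipeName(groupId, stage + 1));
            annotations.add(params);
        }
        return annotations;
    }
}
```

The shared name between adjacent stages is the only coupling between sub-groups, so the annotator needs no knowledge of the middleware that later creates and binds the pipes.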

4.2.4. Data distribution

The second role of control tasks is to redirect the data input to/output from a group depending on the distribution policy being implemented. As mentioned in Section 4.2, control tasks receive the data input to/output from a group before it is passed to the tasks within that group. Because the workflow has been rewired according to the control task’s distribution policy, data output from the control task is sent to groups running on remote machines. As can be seen from Figure 6, the control task can manage the allocation of the items of data to be processed by a distributed group.

The strategy a control task uses to determine how the data is distributed depends on its distribution goal. An example of a potential distribution strategy could be a control task that receives fragmented data, and ensures that each distributed group always processes the same section of data. In the parallel distribution policy that we have implemented, the control task performs a basic form of load balancing. When a new piece of data arrives the control task polls each output connection to see which is available for more data (i.e. has finished processing its previous data). As the data is always sent to a distributed group that is ready for more data, the groups running on machines with faster processors will end up handling more items of data; thus the load is balanced according to processing speed.
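The polling scheme described above can be sketched as follows. RemoteConnection and PollingDispatcher are hypothetical names; the sketch assumes that a connection becomes busy when it is sent an item and is marked ready again when the remote group reports that it has finished.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an output connection to a distributed group.
class RemoteConnection {
    final String name;
    final List<String> handled = new ArrayList<>();
    private boolean busy;

    RemoteConnection(String name) {
        this.name = name;
    }

    boolean isReady() {
        return !busy;
    }

    void send(String item) {
        handled.add(item);
        busy = true; // stays busy until the remote group reports completion
    }

    void finished() {
        busy = false;
    }
}

class PollingDispatcher {
    // Polls the connections in order and sends the item to the first ready one.
    // Returns the chosen index, or -1 when every connection is still busy.
    static int dispatch(List<RemoteConnection> connections, String item) {
        for (int i = 0; i < connections.size(); i++) {
            RemoteConnection c = connections.get(i);
            if (c.isReady()) {
                c.send(item);
                return i;
            }
        }
        return -1;
    }
}
```

A connection whose remote group finishes sooner becomes ready sooner and therefore receives more items over time, which is the speed-proportional balancing effect described in the text.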

4.3. An example distribution

To illustrate the Triana distribution mechanism we have described, in this section we give an example of how a user enables the distribution of workflow tasks through the TGUI. In Figure 7, we show an example user-created workflow containing four tasks: Wave, Gaussian, fast Fourier transform (FFT) and SGTGrapher. In this workflow the user has grouped the Gaussian and FFT tasks into a group task called New Group; this is done simply by the user selecting the tasks they wish to group and choosing the group option from the edit menu. In our example, the Triana client is running in distributed mode. This means that by right clicking on the group the user can select a distribution tool to apply to the group, the two available tools being HTCParallel (parallel) distribution and P2P (pipeline) distribution. Once the user has selected the distribution tool for a group then they simply have to run the workflow algorithm and the Triana engine will handle the location of remote Triana services and the distribution of the workflow. This distribution can be achieved using various middleware components such as JXTA or OGSA through the use of the GAT. Such distribution complexities are hidden from the user.

In Figure 8 we show how the workflow from Figure 7 is distributed and reconnected according to the P2P (pipeline) distribution policy. In this illustration the Gaussian task has been distributed to one peer and the FFT to another, and remote pipes have been connected between the client and these peers (shown by the dotted lines). Although this is a simple distribution, the visualization of the distributed connections appears to be complex and is of little interest to scientists developing data-analysis applications using Triana. However, in Triana these connections are hidden visually, as is the process by which the tasks are distributed and the connections created.

4.4. XML distribution

Although in Triana the distribution and reconnection of the workflow is hidden from the user, behind the graphical interface a fully connected distributed task graph is maintained. As discussed in Section 4.2.2, the control task attached to a group is responsible for specifying the rewiring of the workflow to support a distribution topology and to do this it reworks the existing workflow into a number of annotated sub-groups. To actually achieve the distribution each of these annotated sub-groups is serialized by the Triana engine according to the workflow language being supported at the time (see Section 4), and this serialized form is passed to the service that is to run that subset of the workflow.


Figure 7. An example of a user distributing a group of tasks via the TGUI.

Figure 8. An example of a group distributed using a P2P (pipeline) distribution policy.


<?xml version="1.0" encoding="UTF-8"?>
<tool>
  <inportnum>1</inportnum>
  <outportnum>1</outportnum>
  <parameters>
    <param name="inNodeTag0"> <value>PIPE[0]-0</value> </param>
    <param name="outNodeTag0"> <value>PIPE[0]-1</value> </param>
  </parameters>
  <tasks>
    <task>
      <taskname>Gaussian</taskname>
      <package>SignalProc.Algorithms</package>
      <parameters>. . .</parameters>
    </task>
  </tasks>
</tool>

Figure 9. A simplified XML serialization of a distributed workflow sub-group.

In Figure 9, we show an XML serialization of a distributed sub-group within Triana; this group contains a single Gaussian noise task that corresponds to the sub-group that would be passed to peer 1 in Figure 8. Although in Figure 9 the example group only contains a single task, it could contain a much larger subset of the workflow if specified by the control task. As well as distributing subsets of the initial workflow to Triana services, creating a fully connected distributed workflow requires linking data pipes between the nodes of the distributed groups and back to the control task. The technology used to set up these connections depends on the underlying Grid middleware being employed (see Section 5); however, the pipes that each distributed group creates or binds to are specified in the XML. In the example given in Figure 9 the distributed pipes are specified by the parameters inNodeTag0 and outNodeTag0 (which were appended by the control task); these tell the TS on which this group is run to create an input pipe named PIPE[0]-0 and an output pipe named PIPE[0]-1.
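As a minimal sketch of how a receiving service might read the pipe names out of such a serialization, the following Python parses a copy of the simplified XML of Figure 9 with the standard library. It is not Triana's own parser: the function name `read_pipe_tags` and the convention of recognizing pipe parameters by the `inNodeTag` and `outNodeTag` name prefixes are assumptions made for illustration, and the elided task parameters are simply left out.

```python
import xml.etree.ElementTree as ET

# Simplified sub-group serialization, modelled on Figure 9
# (task-level parameters, elided in the figure, are omitted here).
SUBGROUP_XML = """<tool>
  <inportnum>1</inportnum>
  <outportnum>1</outportnum>
  <parameters>
    <param name="inNodeTag0"><value>PIPE[0]-0</value></param>
    <param name="outNodeTag0"><value>PIPE[0]-1</value></param>
  </parameters>
  <tasks>
    <task>
      <taskname>Gaussian</taskname>
      <package>SignalProc.Algorithms</package>
    </task>
  </tasks>
</tool>"""


def read_pipe_tags(xml_text):
    """Return (input_pipes, output_pipes, task_names) for one sub-group."""
    root = ET.fromstring(xml_text)
    inputs, outputs = [], []
    # Only the group's own <parameters> element is inspected,
    # not any <parameters> nested inside a <task>.
    for param in root.findall("./parameters/param"):
        name = param.get("name", "")
        value = param.findtext("value", default="").strip()
        if name.startswith("inNodeTag"):      # assumed naming convention
            inputs.append(value)
        elif name.startswith("outNodeTag"):   # assumed naming convention
            outputs.append(value)
    tasks = [task.findtext("taskname") for task in root.findall("./tasks/task")]
    return inputs, outputs, tasks


inputs, outputs, tasks = read_pipe_tags(SUBGROUP_XML)
print(inputs, outputs, tasks)
```

The pipe names are treated as opaque strings; nothing in the sketch interprets the `PIPE[0]-n` pattern.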

Similarly, the group distributed to peer 2 in Figure 8 will be appended with a parameter specifying that it creates an input pipe named PIPE[0]-1. By using the mechanisms provided by the underlying Grid middleware to advertise and discover each other's named pipe (PIPE[0]-1), the data pipe between the sub-group running on peer 1 and the sub-group running on peer 2 is established without connecting back to a central server. Once all the pipes specified by the control task are linked, the distributed sub-groups of the initial workflow form a fully connected distributed workflow.
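The name-matching idea can be sketched in a few lines of Python. This is an illustration only: an in-memory dictionary stands in for the middleware's advertise and discover steps, the function `link_pipes` is hypothetical, and the pipe name PIPE[0]-2 for the connection back to the client is invented, since the text only names PIPE[0]-0 and PIPE[0]-1.

```python
from collections import defaultdict

# Each sub-group lists the named pipes it reads from and writes to.
# "PIPE[0]-2" and the client's pipe ends are assumptions for illustration.
subgroups = {
    "client": {"in": ["PIPE[0]-2"], "out": ["PIPE[0]-0"]},
    "peer1": {"in": ["PIPE[0]-0"], "out": ["PIPE[0]-1"]},
    "peer2": {"in": ["PIPE[0]-1"], "out": ["PIPE[0]-2"]},
}


def link_pipes(groups):
    """Pair every output pipe with the input pipe advertised under the same name."""
    readers = defaultdict(list)
    for owner, ends in groups.items():
        for pipe in ends["in"]:
            readers[pipe].append(owner)             # stands in for "advertise"
    links = []
    for owner, ends in groups.items():
        for pipe in ends["out"]:
            for reader in readers.get(pipe, []):    # stands in for "discover"
                links.append((owner, pipe, reader))
    return links


links = link_pipes(subgroups)
for source, pipe, target in links:
    print(f"{source} --{pipe}--> {target}")
```

Because each pair is matched purely by pipe name, peer 1 and peer 2 are linked directly; no entry in the result routes their data through the client.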

5. TRIANA PROTOTYPE

The Triana prototype presented here uses a JXTA binding of the GAT interface described in the next section. Triana is open-source and can be downloaded at [4]. Apart from the JXTA binding, the Web services binding has also been implemented recently. Additionally, in order to reduce complexity, we do not address the issues of workflow versioning, consistency checking and security in this version, although these are planned as future directions.

5.1. GridLab and the GAT

GridLab is an EU-funded project that is a collaboration between several different institutions across Europe and America. The goal of this project is to produce a GAT interface, which will have two initial reference implementations. One will be written in C and the other in Java. The GAT interface is a high-level application-driven API that is used to access core Grid services. It provides application developers with a middleware-independent API containing the functionality they need from Grid services. The GAT engine implements the necessary hooks to the underlying middleware. The main focus of the project is to build a set of services that enable applications to easily use the functionality provided by Web services, OGSA and Globus. Additionally, we are also developing a JXTA binding (see the next section). GridLab services include monitoring, adaptive components, resource brokering, scheduling, security and application management. High-level graphical interface portals are also being developed for submitting, monitoring and viewing the progress of the user's application on the Grid.

The GAT interface provides an insulation layer between the application and emerging technologies on the Grid. It has been designed from an application-level perspective and contains only the functionality that an application requires for use in a distributed Grid environment. The GAT implements its functionality as a set of adapters. Such adapters can be dynamically switched at run-time depending on the services that are available. For example, a user wishing to transfer a file from one place to another would issue one GAT file copy command. Within the GAT engine, this command could be carried out by a GridFTP adapter, a JXTA pipe or a local 'cp' command, depending on the current environment. The same mechanism can be used to switch between the various available middleware bindings, e.g. the same Triana implementation can run within virtual organizations using the OGSA binding and also within a P2P environment using the JXTA binding. GridLab uses two principal applications to extract the necessary GAT functionality. These applications are Cactus [10] and Triana. A number of application scenarios have been defined to identify example uses of these applications.
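The adapter-switching idea can be illustrated with a small Python sketch. It does not use or reproduce the GAT API: the class names, the `available`/`copy` methods and the `copy_file` function are all hypothetical, and the remote adapter is a stub that always reports itself unavailable, so the call falls through to a local copy.

```python
import os
import shutil
import tempfile


class RemoteCopyAdapter:
    """Placeholder for a middleware-backed adapter; never available in this sketch."""
    name = "remote-stub"

    def available(self, src, dst):
        return False

    def copy(self, src, dst):
        raise NotImplementedError("no remote middleware in this sketch")


class LocalCopyAdapter:
    """Copies between paths on the local file system."""
    name = "local"

    def available(self, src, dst):
        return os.path.exists(src)

    def copy(self, src, dst):
        shutil.copyfile(src, dst)


def copy_file(src, dst, adapters):
    """Single application-level call; the first usable adapter does the work."""
    for adapter in adapters:
        if adapter.available(src, dst):
            adapter.copy(src, dst)
            return adapter.name
    raise RuntimeError("no adapter can perform this copy")


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "in.dat")
    dst = os.path.join(tmp, "out.dat")
    with open(src, "w") as handle:
        handle.write("segment")
    used = copy_file(src, dst, [RemoteCopyAdapter(), LocalCopyAdapter()])
    with open(dst) as handle:
        copied = handle.read()

print(used, copied)
```

The application code calls `copy_file` once and never names a transport; which adapter runs is decided at call time from what is available.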

5.2. JXTAServe

Project JXTA defines a set of protocols that can be used to construct decentralized P2P applications (JXTA also supports centralized and brokered architectures). A peer in JXTA is any networked device that implements one or more of the JXTA protocols. Peers can be sensors, phones, PDAs, PCs, servers and even supercomputers. The JXTA protocols define the way peers discover and communicate with each other and are specified using XML message formats. Such protocols are therefore programming-language independent, and current bindings include Java, C and Python. Each peer operates independently and asynchronously from all other peers, and is uniquely identified by a peer ID. A peer group is a collection of cooperating peers providing a common set of services. Groups also form a hierarchical parent–child relationship, in which each group has a single parent.

There are six JXTA protocols. The Peer Resolver Protocol (PRP) is the mechanism by which a peer can send a query to one or more peers, and receive a response (or multiple responses) to the query. The Peer Discovery Protocol (PDP) is the mechanism by which a peer can advertise its own resources, and discover the resources from other peers (peer groups, services, pipes and additional peers). The Peer Information Protocol (PIP) is the mechanism by which a peer may obtain status information about other peers, such as state, uptime, traffic load and capabilities. The Pipe Binding Protocol (PBP) is used to connect pipes between peers. The Endpoint Routing Protocol (ERP) is used to route JXTA messages. Finally, the Rendezvous Protocol (RVP) is the mechanism by which peers can subscribe or be a subscriber to a propagation service. Within a peer group, peers can be rendezvous peers, or peers that are listening to rendezvous peers. The RVP allows a peer to send messages to all the listeners of the service. The RVP is used by the PRP and by the PBP in order to propagate messages.

Figure 10. An illustration of how JXTAServe maps a Triana group. There is a one-to-one correspondence between a Triana group unit, a Triana service and a Triana JXTAServe service.

JXTAServe is a GAT for JXTA. It implements the basic functionality that an application needs to use and hides the JXTA-specific details from application developers. Furthermore, since JXTA is an evolving system, JXTAServe provides stability for our implementation in Triana, i.e. even though JXTA may change, the interface to JXTAServe will not. JXTAServe will become the GAT JXTA binding, and our prototype will expedite this development by giving the GridLab project a concrete prototype implementation to work with. JXTAServe implements a service-oriented architecture within JXTA. It is conceptually similar to a Triana group unit, illustrated in Figure 10.

A JXTAServe service can have one or more input nodes (at least one is needed for control) and zero, one or more output nodes. It advertises its input and output nodes as JXTA pipes and connects between pipes using a virtual communication channel that adapts to the particular communication protocol depending on the current operating environment.

6. TRIANA DISTRIBUTED SIMULATION

This section briefly describes a distributed simulation we performed over the Internet between four machines located at Cardiff and London. The purpose of the simulation was to test the proof of concept of our current implementation, the robustness of discovering both local services and those distributed across the Internet, and the ability for Triana services to be executed on a set of heterogeneous resources. Scientific performance tests were not conducted as this would not be appropriate at this scale and, therefore, we felt that a full production run was outside the scope of this paper. However, we are planning a production simulation across the GridLab test-bed for a galaxy formation implementation [11] in Triana within the next few months. Details of these findings will be made available on our Web site [12], as and when they are available. Table I below shows the machines we used within the experiment, along with their operating system, their location and their role in the simulation.

Table I. Experimental configuration.

Machine                           Operating system  Location                                    Role
Apple Mac, PowerPC 933 MHz        Mac OS X 10.2.2   Cardiff University (behind local firewall)  Server
iMac 700 MHz                      Mac OS X 10.2.2   Cardiff University                          Server
Dell laptop, Pentium III 1.1 GHz  Windows XP        Cardiff University                          Client (running Triana GUI) and server
IBM T23, Pentium III 1.1 GHz      Windows XP        Imperial College, London                    Server

The simulation involved distributing data using task-farming, i.e. the parallel distribution mechanism (HTC) described in Section 4.2. The data was taken from the GEO 600 gravitational wave detector (a total of 122 s, sampled at 16 kHz). Thus, for each second of data, 128 kbytes (each sample is a double) are passed between TSs.
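The data volumes implied by these figures can be checked with a few lines of arithmetic. The sketch assumes that 16 kHz means exactly 16 000 samples per second and that a double occupies 8 bytes; the text does not say whether a decimal or binary rate is meant, and with a binary rate of 16 384 samples per second the per-second figure would be 131 072 bytes instead.

```python
SAMPLE_RATE_HZ = 16_000   # assumption: "16 kHz" read as exactly 16 000 samples/s
BYTES_PER_SAMPLE = 8      # one double-precision value per sample
DURATION_S = 122          # length of the full data set, from the text

bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE
total_samples = SAMPLE_RATE_HZ * DURATION_S
total_bytes = total_samples * BYTES_PER_SAMPLE

print(bytes_per_second)   # 128000
print(total_samples)      # 1952000
print(total_bytes)        # 15616000
```

Under these assumptions one second of data is 128 000 bytes, and the full 122 s amounts to just under two million samples, or roughly 15.6 million bytes.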

The data is loaded using one of the generic Triana components, the Importer. This data is passed to a Triana group unit that implements the HTC control unit distribution policy for task farming the input data. The group contains three simple signal-processing units, but is representative of the type of pre-processing that the GEO group needs to perform. The group consisted of an FFT, a unit to transform the representation into a one-sided frequency representation (using the OneSide unit) and a unit to transform this into a spectrum (using Magnitude). The output of the group was passed into a graphical display unit for viewing (using SGTGrapher).
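The three processing steps can be sketched in plain Python to make the data flow concrete. This is not the Triana implementation: the behaviour of `one_side` (keeping bins 0 to N/2) and `magnitude` (taking the absolute value of each bin) is an assumption about what the OneSide and Magnitude units do, and a naive discrete Fourier transform stands in for the FFT unit, producing the same values more slowly.

```python
import cmath
import math


def dft(samples):
    """Naive O(N^2) discrete Fourier transform (an FFT gives the same values faster)."""
    n = len(samples)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(samples))
            for k in range(n)]


def one_side(spectrum):
    """Assumed behaviour: keep bins 0..N/2, the non-negative frequencies."""
    return spectrum[:len(spectrum) // 2 + 1]


def magnitude(spectrum):
    """Assumed behaviour: absolute value of each complex bin."""
    return [abs(c) for c in spectrum]


n = 32
# Test signal: a cosine with exactly 4 cycles in the window.
signal = [math.cos(2 * math.pi * 4 * t / n) for t in range(n)]

result = magnitude(one_side(dft(signal)))
peak_bin = max(range(len(result)), key=result.__getitem__)
print(peak_bin, round(result[peak_bin], 6))
```

For a real-valued input the discarded bins mirror the retained ones, which is why a one-sided representation loses no information about the magnitudes.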

This simulation tested the HTC mechanism using the JXTA binding. The HTC distribution takes the input data one segment at a time and farms it out to the available servers on the Grid (in this case, four nodes) on a first-come, first-served basis. When the data has been processed, it is returned to the client computer and reassembled in the order in which it was originally received (there is an option to specify the ordering of the sequence).
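The farm-and-reassemble pattern can be sketched with Python's standard thread pool. Local threads stand in for the remote Triana services, `process_segment` is a placeholder for the distributed group, and the function names are hypothetical; the sketch only shows how results that complete in arbitrary order can be put back in input order.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_segment(segment):
    """Stand-in for the remote group; here it just squares each sample."""
    return [x * x for x in segment]


def farm(segments, n_workers=4):
    """Hand segments to whichever worker is free and restore the input order."""
    results = [None] * len(segments)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Remember the position of every submitted segment.
        futures = {pool.submit(process_segment, seg): index
                   for index, seg in enumerate(segments)}
        for future in as_completed(futures):      # completion order is arbitrary
            results[futures[future]] = future.result()
    return results


segments = [[i, i + 1] for i in range(10)]        # ten small stand-in packets
output = farm(segments)
print(output[:3])
```

Tagging each segment with its index at submission time is what allows faster workers to take more segments without disturbing the final ordering.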

In all, we performed three different tests. In the first run, we used 10 s of this data and distributed it across all four machines. We found that the two Apple Macs running locally processed three packets each, whereas the local machine running the GUI and the machine running remotely in London processed two each. This is hardly surprising: the local machine running the client was also transmitting the data and therefore had a larger workload, and for the remote machine the data had to travel across the Internet from Cardiff to London.

In the second run, we used the full 122 s of data (roughly two million samples, i.e. about 16 Mbytes) over three servers, running on the three server machines that were not running the client. The data was transmitted to the various hosts and received back at the client within roughly 30 s, with the local Macs again processing more packets overall.

Lastly, we again used the same data set but this time used six servers, three running on the one machine in London and one running on each of the machines at Cardiff. We wanted to test the robustness of the discovery mechanism when a number of services were hosted on a single remote server. The run was a success, and all data was transmitted to the various hosts and received back at the client in roughly the same time as the previous run. This is hardly surprising given the communication overhead of transmitting the data across the Internet.

Overall, we were encouraged by the preliminary results shown in the experiments. It will be interesting to analyze the results when we have more servers at our disposal. We learnt several lessons during the run. Firstly, we outlined the various statistics we would like to collect when we approach our production run, e.g. packets processed by each machine, automatic timing mechanisms for the local processing of the data (so that we can identify the ratio between communication and processing) and identification of the packet sequence for easy graphing of the animation of the packet distribution. Further, we identified some minor problems with the current implementation. For example, although the discovery of the services was extremely reliable, the initial setting up of the JXTA pipes for re-routing the connections between the remote servers was not. This failed in 20% of cases for the remote server running in London, although it was reliable for the machines on a local network in Cardiff. We believe this is due to the remote time to live (TTL) settings for the JXTA advertisements and, therefore, further distributed experiments need to be conducted in order to isolate the problem.

7. CONCLUSIONS AND FUTURE WORK

We have demonstrated a prototype Grid-enabled environment that uses P2P services for its resource discovery, distribution and execution requirements. The environment is insulated from the underlying complexities and functional flux of Grid computing through the use of an interface layer, the GAT. As OGSA-compliant services become available, an implementation of the GAT will allow the environment to make use of these services with no change to the actual code. The P2P framework provides, we would argue, a useful mechanism for prototyping and exploring Grid-service-aware applications that at the moment would only be able to use the pre-release version of OGSA. A preliminary example application run under the environment, with some test-run timing, is included and illustrates some of the potential of the system.

REFERENCES

1. Foster I, Kesselman C, Tuecke S. The anatomy of the Grid: Enabling scalable virtual organizations. International Journal of Supercomputer Applications 2001; 15(3):200–222.

2. Foster I, Kesselman C, Nick J, Tuecke S. The physiology of the Grid: An Open Grid services architecture for distributed systems integration, Open Grid service infrastructure WG, Global Grid Forum, 2002. Available at: http://www.globus.org/research/papers/ogsa.pdf [13 December 2002].

3. Gong L. JXTA: A network programming environment. IEEE Internet Computing 2001; 5:88–95.

4. Triana Software Environment. http://www.trianacode.org [February 2005].

5. Grid Application Toolkit (GAT) (part of the GridLab project). http://www.gridlab.org [February 2005].

6. Thatte S, Curbera F, Goland Y, Klein J, Leymann F, Roller D, Weerawarana S. Business process execution language for Web Services Version 1.0, 2002. Available at: http://dev2dev.bea.com/techtrack/BPEL4WS.jsp [13 December 2002].

7. Gallopoulos E, Houstis EN, Rice JR. Computer as thinker/doer: Problem-solving environments for computational science. IEEE Computational Science and Engineering 1994; 1(2):11–23.

8. GEO 600 Home Page. http://www.geo600.uni-hannover.de/ [February 2005].

9. Taylor I, Davies R, Marzi H. Automatic wrapping of legacy code and the mediation of its data. Proceedings of the UK eScience 'All Hands Meeting', Sheffield, 2–4 September 2002 (CD-ROM).

10. Cactus Computational Toolkit home page. http://www.cactuscode.org [February 2005].

11. Taylor I, Shields M, Philp R. GridOneD: Peer to peer visualization using Triana: A galaxy formation test case. Proceedings of the UK eScience 'All Hands Meeting', Sheffield, 2–4 September 2002 (CD-ROM).

12. GridOneD, The GridOneD project home page. http://www.gridoned.org [February 2005].