HPJava Seminar Report



1. Introduction

The explosion of Java over the last year has been driven largely by its role in bringing a new generation of interactive web pages to the World Wide Web. Undoubtedly various features of the language (compactness, byte-code portability, security, and so on) make it particularly attractive as an implementation language for applets embedded in web pages. But it is clear that the ambitions of the Java development team go well beyond enhancing the functionality of HTML documents.

Java is designed to meet the challenges of application development in the context of heterogeneous, network-wide distributed environments. Paramount among these challenges is the secure delivery of applications that consume a minimum of system resources, can run on any hardware and software platform, and can be extended dynamically.

Several of these concerns are mirrored in developments in the High Performance Computing world over a number of years. A decade ago the focus of interest in the parallel computing community was on parallel hardware. A parallel computer was typically built from specialized processors connected through a proprietary high-performance communication switch. If the machine also had to be programmed in a proprietary language, that was an acceptable price for the benefits of using a supercomputer. This attitude was not sustainable as one parallel architecture gave way to another, and the cost of porting software became exorbitant. For several years now, portability across platforms has been a central concern in parallel computing.


HPJava is a programming language extended from Java to support parallel programming, especially (but not exclusively) data parallel programming on message-passing, distributed-memory systems, from multi-processor machines to workstation clusters.

Although it has a close relationship with HPF, the design of HPJava does not inherit the HPF programming model. Instead the language introduces a high-level structured SPMD programming style, the HPspmd model. A program written in this kind of language explicitly coordinates well-defined process groups. These cooperate in a loosely synchronous manner, sharing logical threads of control. As in a conventional distributed-memory SPMD program, only a process owning a data item, such as an array element, is allowed to access the item directly. The language provides special constructs that allow programmers to meet this constraint conveniently.

Besides the normal variables of the sequential base language, the language model introduces classes of global variables that are stored collectively across process groups. Primarily, these are distributed arrays. They provide a global name space in the form of globally subscripted arrays, with assorted distribution patterns. This helps to relieve programmers of error-prone activities such as the local-to-global and global-to-local subscript translations which occur in data parallel applications.

In addition to special data types, the language provides special constructs to facilitate both data parallel and task parallel programming. Through these constructs, different processors can either work simultaneously on globally addressed data or independently execute complex procedures on locally held data. The conversion between these phases is seamless.

In the traditional SPMD mold, the language itself does not provide implicit data movement semantics. This greatly simplifies the task of the compiler, and should encourage programmers to use algorithms that exploit locality. Data on remote processors is accessed exclusively through explicit library calls. In particular, the initial HPJava implementation relies on a library of collective communication routines originally developed as part of an HPF runtime library. Other distributed-array-oriented communication libraries may be bound to the language later. Because of the explicit SPMD programming model, low-level MPI communication is always available as a fall-back. The language itself only provides basic concepts to organize data arrays and process groups. Different communication patterns are implemented as library functions, so if a new communication pattern is needed it can be integrated relatively easily through a new library.

2. Overview of HPJava

HPJava stands for High Performance Java. Java already provides parallelism through threads, but that model of parallelism can only be exploited easily on shared-memory computers. HPJava is targeted at distributed-memory parallel computers (most likely, networks of PCs and workstations).


2.1. HPJava History

Active in the early 1990s, the High Performance Fortran Forum brought together leading high-performance computing practitioners to define a common language for data parallel computing. Inspired by the success of parallel dialects of Fortran such as Connection Machine Fortran, the resulting High Performance Fortran (HPF) language definition extended the then-standard Fortran 90 with several parallel features.

The HPJava project started around 1997, growing out of the group's earlier HPF work. People were starting to talk about Java Grande and the possibility that eventually Java could be a high-performance language. The reason was that the syntax was considerably simpler, and perhaps there was scope to extend it with the features we wanted for data-parallel computing.

2.2. HPJava Philosophy

HPJava lifts some ideas directly from HPF, and its domain of applicability is likely to overlap that of HPF to a large extent. It has an almost equivalent model of distributed arrays (the only very significant difference in this respect is that we eventually abandoned the block-cyclic distribution format); you can write HPJava programs that look very much like corresponding HPF programs. But the philosophies of the languages differ in some significant ways. HPJava was designed in a bottom-up manner.


3. Characteristics

We have explored the practicality of doing parallel computing in Java, and of providing Java interfaces to High Performance Computing software. For various reasons, the success of this exercise was not a foregone conclusion. Java sits on a virtual machine model that is significantly different from the hardware-oriented model which C or Fortran exploit directly. Java discourages or prevents direct access to some of the fundamental resources of the underlying hardware (most extremely, its memory).

Which is the better strategy? In the long term Java may become a major implementation language for large software packages like MPI. It certainly has advantages in respect of portability that could simplify implementations dramatically. In the immediate term, recoding these packages does not appear so attractive; Java wrappers to existing software look more sensible. On a cautionary note, our experience with MPI suggests that interfacing Java to non-trivial communication packages may be less easy than it sounds. Nevertheless, we intend in the future to create a Java interface to an existing run-time library for data parallel computation.

So is Java, as it stands, a good language for High Performance Computing?

It still has to be demonstrated that Java can be compiled to code of efficiency comparable with C or Fortran. Many avenues are being followed simultaneously towards a higher performance Java. Besides the Java chip effort of Sun, it has been reported at this workshop that IBM is developing an optimizing Java compiler which produces binary code directly, that Rice University and Rochester University are working on optimization and restructuring of the bytecode generated by javac, and that Indiana University is working on source restructuring to parallelize Java. Parallel interpretation of bytecode is also an emerging practice. For example, the IBM JVM, an implementation of the JVM on shared-memory architectures, was released in spring 1996, and UIUC has recently started work aimed at parallel interpretation of Java bytecode for distributed memory systems.

Another promising approach under investigation is to integrate interpretation and compilation techniques for parallel execution of Java programs. In such a system, a partially ordered set of interpretive frames is generated by an II/CVM compiler. A frame is a description of some subtask, whose granularity may range from a single scalar assignment statement to a solver for a system of equations. Under supervision of the virtual machine (II/CVM), the actions specified in a frame may be performed in one of three ways:

- The frame is executed directly by an interpretive module, which also incorporates JIT compilation capability.

- Some precompiled computational library function is invoked locally to accomplish the task; this function may be executed sequentially or in parallel.

- The frame is sent to some registered remote system, which will get the work done, once again either sequentially or in parallel.


With this approach, optimized binary code for well-formed computational subtasks exists in runtime libraries, supporting a high-level interpretive environment. Task parallelism is observed among different frames executed by the three mechanisms simultaneously, while data parallelism is observed in the execution of some of the runtime functions.

Presuming these efforts satisfactorily address the performance issue, the second aspect in question concerns the expressiveness of the Java language. Our final interface to MPI is quite elegant, and provides much of the functionality of the standard C and Fortran bindings. But creating this interface was a more difficult process than one might hope, both in terms of getting a good specification and in terms of making the implementation work. The lack of features like C++ templates (or any form of parametric polymorphism) and user-defined operator overloading (available in many modern languages, from functional programming languages to Fortran) made it difficult to produce a completely satisfying interface to a data parallel library. The Java language as currently defined imposes various limits on the creativity of the programmer.

In many respects Java is undoubtedly a better language than Fortran. It is object-oriented to the core and highly dynamic, and there is every reason to suppose that such features will be as valuable in scientific computing as in any other programming discipline. But to displace established scientific programming languages, Java will surely have to acquire some of the facilities taken for granted in those languages.

Popular acclaim aside, there are some reasons to think that Java may be a good language for scientific and parallel programming.


Java is a descendant of C++. C and C++ are used increasingly in scientific programming; they are already used almost universally by implementers of parallel libraries and compilers. In recent years numerous variations on the theme of C++ for parallel computing have appeared.

Java omits various features of C and C++ that are considered difficult, notably pointers. Poor compiler analysis has often been blamed on these features. The inference is that Java, like Fortran, may be a suitable source language for highly optimizing compilers (although direct evidence for this belief is still lacking).

Java comes with built-in multithreading. Independent threads may be scheduled on different processors by a suitable runtime. In any case multithreading can be very convenient in explicit message-passing styles of parallel programming.

HPJava is a language for parallel programming, especially suitable for programming massively parallel, distributed memory computers.

3.1. Multidimensional arrays

First we describe a modest extension to Java that adds a class of true multi-dimensional arrays to the standard Java language. The new arrays allow regular section subscripting, similar to Fortran 90 arrays. The syntax described in this section is a subset of the syntax introduced later for parallel arrays and algorithms: the only motivation for discussing the sequential subset first is to simplify the overall presentation. No attempt is made to integrate the new multidimensional arrays with the standard Java arrays: they are a new kind of entity that coexists in the language with ordinary Java arrays. There are good technical reasons for keeping the two kinds of array separate.


The type signatures and constructors of the multidimensional arrays use double brackets to distinguish them from ordinary arrays:

    int [[,]] a = new int [[5, 5]] ;
    float [[,,]] b = new float [[10, n, 20]] ;
    int [[]] c = new int [[100]] ;

a, b and c are respectively two-, three- and one-dimensional arrays. Of course c is very similar in structure to the standard array d, created by

    int [] d = new int [100] ;

c and d are not identical, though.

Access to individual elements of a multidimensional array goes through a subscripting operation involving single brackets, for example

    for(int i = 0 ; i < 4 ; i++)
        a [i, i + 1] = i + c [i] ;

For reasons that will become clearer in later sections, this style of subscripting is called local subscripting. In the current sequential context, apart from the fact that a single pair of brackets may include several comma-separated subscripts, this kind of subscripting works just like ordinary Java array subscripting. Subscripts always start at zero, in the ordinary Java or C style (there is no Fortran-like lower bound).

In general our language has no idea of Fortran-like array assignments.

    int [[,]] e = new int [[n, m]] ;
    ...
    a = e ;


Here the assignment simply copies a handle to the object referenced by e into a. There is no element-by-element copy involved. Similarly we introduce no idea of elemental arithmetic or elemental function application. If e and a are arrays, the expressions

    e + a
    Math.cos(e)

are type errors.

HPJava does import a Fortran-90-like idea of regular array sections. The syntax for section subscripting is different from the syntax for local subscripting: double brackets are used. These brackets can include scalar subscripts or subscript triplets.

A section is an object in its own right--its type is that of a suitable multi-dimensional array. It describes some subset of the elements of the parent array. This is slightly different from the situation in Fortran, where sections cannot usually be captured as named entities.

    int [[]] e = a [[2, 2 :]] ;
    foo(b [[ : , 0, 1 : 10 : 2]]) ;

Here e becomes an alias for the third row of elements of a. The procedure foo should expect a two-dimensional array as argument; it can read or write to the set of elements of b selected by the section. As in Fortran, upper or lower bounds can be omitted in triplets, defaulting to the actual bounds of the parent array, and the stride entry of the triplet is optional. The subscripts of e, like those of any other array, start at 0, although the first element is identified with a [2, 2].


In our language, unlike Fortran, it is not allowed to use vectors of integers as subscripts. The only sections recognized are regular sections defined through scalar and triplet subscripts.

The language provides a library of functions for manipulating its arrays, closely analogous to the array transformational intrinsic functions of Fortran 90:

    int [[,]] f = new int [[5, 5]] ;
    HPJlib.shift(f, a, -1, 0, CYCL) ;

    float g = HPJlib.sum(b) ;

    int [[]] h = new int [[100]] ;
    HPJlib.copy(h, c) ;

The shift operation with shift-mode CYCL executes a cyclic shift on the data in its second argument, copying the result to its first argument--an array of the same shape. In the example the shift amount is -1, and the shift is performed in dimension 0 of the array--the first of its two dimensions. The sum operation simply adds all elements of its argument array. The copy operation copies the elements of its second argument to its first--it is something like an array assignment. These functions may have to be overloaded to apply to some finite set of array types; for example, they may be defined for arrays with elements of any suitable Java primitive type, up to some maximum rank of array. Alternatively the type hierarchy for arrays can be defined in a way that allows these functions to be more polymorphic.


3.2. Process arrays

HPJava adds class libraries and some additional syntax for dealing with distributed arrays. These arrays are viewed as coherent global entities, but their elements are divided across a set of cooperating processes. As a prerequisite to introducing distributed arrays, we discuss the process arrays over which their elements are scattered.

An abstract base class Procs has subclasses Procs1, Procs2, ..., representing one-dimensional process arrays, two-dimensional process arrays, and so on.

    Procs2 p = new Procs2(2, 2) ;
    Procs1 q = new Procs1(4) ;

These declarations set p to represent a 2 by 2 process array and q to represent a 4-element, one-dimensional process array. In either case the object created describes a group of 4 processes. At the time the Procs constructors are executed the program should be executing on four or more processes. Either constructor selects four processes from this set and identifies them as members of the constructed group.

Procs has a member function called member, returning a boolean value. This is true if the local process is a member of the group, false otherwise.

    if(p.member()) {
        ...
    }


The code inside the if is executed only if the local process is a member of p. We will say that inside this construct the active process group is restricted to p.

The multi-dimensional structure of a process array is reflected in its set of process dimensions. An object is associated with each dimension. These objects are accessed through the inquiry member dim:

    Dimension x = p.dim(0) ;
    Dimension y = p.dim(1) ;
    Dimension z = q.dim(0) ;

The object returned by the dim inquiry has class Dimension. The members of this class include the inquiry crd. This returns the coordinate of the local process with respect to the process dimension. The result is only well-defined if the local process is a member of the parent process array. The inner body code in

    if(p.member())
        if(x.crd() == 0)
            if(y.crd() == 0) {
                ...
            }

will only execute on the first process from p, with coordinates (0, 0).

3.3. Distributed arrays

Some or all of the dimensions of a multi-dimensional array can be declared to be distributed ranges. In general a distributed range is represented by an object of class Range. A Range object defines a range of integer subscripts, and defines how they are mapped into a process array dimension. In fact the Dimension class introduced in the previous section is a subclass of Range. In this case the integer range is just the range of coordinate values associated with the dimension. Each value in the range is mapped, of course, to the process (or slice of processes) with that coordinate. This kind of range is also called a primitive range. More complex subclasses of Range implement more elaborate maps from integer ranges to process dimensions. Some of these will be introduced in later sections. For now we concentrate on arrays constructed with Dimension objects as their distributed ranges.

The syntax of section 3.1 is extended in the following way to support distributed arrays:

- A distributed range object may appear in place of an integer extent in the ``constructor'' of the array (the expression following the new keyword).

- If a particular dimension of the array has a distributed range, the corresponding slot in the type signature of the array should include a # symbol.

- In general the constructor of the distributed array must be followed by an on clause, specifying the process group over which the array is distributed. Distributed ranges of the array must be distributed over distinct dimensions of this group.

Assume p, x and y are declared as in the previous section. Then

    float [[#,#,]] a = new float [[x, y, 100]] on p ;

defines a as a 2 by 2 by 100 array of floating point numbers. Because the first two dimensions of the array are distributed ranges (dimensions of p), a is actually realized as four segments of 100 elements, one in each of the processes of p. The process in p with coordinates i, j holds the section a [[i, j, :]]. The distributed array a is equivalent in terms of storage to four local arrays defined by

    float [] b = new float [100] ;

But because a is declared as a collective object we can apply collective operations to it. The HPJlib functions introduced in section 3.1 apply equally well to distributed arrays, but now they imply inter-processor communication.

    float [[#,#,]] a = new float [[x, y, 100]] on p,
                   b = new float [[x, y, 100]] on p ;

    HPJlib.shift(a, b, -1, 0, CYCL) ;

The shift operation causes the local values of a to be overwritten with values of b from a processor adjacent in the x dimension.

There is a catch in this. When subscripting the distributed dimensions of an array it is simply disallowed to use subscripts that refer to off-processor elements. While this:

    int i = x.crd(), j = y.crd() ;
    a [i, j, 20] = a [i, j, 21] ;

is allowed, this:

    int i = x.crd(), j = y.crd() ;
    a [i, j, 20] = b [(i + 1) % 2, j, 20] ;

is forbidden. The second example could apparently be implemented using a nearest-neighbour communication, quite similar to the shift example above. But our language imposes a strict policy distinguishing it from most data parallel languages: while library functions may introduce communication, language primitives such as array subscripting never imply communication.

If subscripting distributed dimensions is so restricted, why are the i, j subscripts on the arrays needed at all? In the examples of this section these subscripts are only allowed one value on each processor. The inconvenience of specifying the subscripts will be reduced by language constructs introduced later, and the fact that only one subscript value is local is a special feature of the primitive ranges used here. The higher-level distributed ranges introduced later map multiple elements to individual processes, so subscripting will no longer look so redundant.

3.4. The on construct and the active process group

In section 3.2 the construct

    if(p.member()) {
        ...
    }

appeared. Our language provides a short way of writing this construct:

    on(p) {
        ...
    }

In fact the on construct provides some extra value. Informally we said in section 3.2 that the active process group is restricted to p inside the body of the p.member() conditional construct. The language incorporates a more formal idea of an active process group (APG). At any point of execution some process group is singled out as the APG. An on(p) construct specifically changes the value of the APG to p. On exit from the construct, the APG is restored to its value on entry.

Elevating the active process group to a part of the language allows some simplifications. For example, it provides a natural default for the on clause in array constructors. More importantly, formally defining the active process group simplifies the statement of various rules about what operations are legal inside distributed control constructs like on.

3.5. Other features

We have already described most of the important language features we propose to implement. Two additional features that are quite important in practice but have not been discussed are subranges and subgroups. A subrange is simply a range which is a regular section of some other range, created by syntax like x [0 : 49]. Subranges are created tacitly when a distributed array is subscripted with a triplet, and they can also be used directly to create distributed arrays with general HPF-like alignments. A subgroup is some slice of a process array, formed by restricting the process coordinates in one or more dimensions to single values. Subgroups may be created implicitly by section subscripting, this time using a scalar subscript. They also formally describe the state of the active process group inside at and over constructs.
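For instance (an illustrative sketch, not taken from the original report; it reuses p, y and the array a declared in section 3.3, and the type signature of the section is an assumption), a scalar subscript in a distributed dimension implicitly creates such a subgroup:

    float [[#,]] row = a [[:, 0, :]] ;   // section of a with the y subscript fixed at 0;
                                         // its elements live only on the subgroup of p
                                         // whose y-coordinate is 0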

The framework described is much more powerful than space allows us to demonstrate in this paper. This power comes in part from the flexibility to add features by extending the libraries associated with the language. We have only illustrated the simplest kinds of distribution format, but any HPF 1.0 array distribution format, plus various others, can be incorporated by extending the Range hierarchy in the run-time library. We have only illustrated the shift and writeHalo operations from the communication library, but the library also includes much more powerful operations for remapping arrays and performing irregular data accesses. Our intention is to provide minimal language support for distributed arrays, just enough to facilitate further extension through the construction of new libraries.

For a more complete description of a slightly earlier version of the proposed language, see the references.

4. Considerations in HPJava language design

This section discusses some design and implementation issues in the HPJava language. The language is briefly reviewed, then the class library that forms the foundation of the translation scheme is described. Through example code, we illustrate how HPJava source code can be translated straightforwardly to ordinary SPMD Java programs calling this library. This is followed by a discussion of the rationale for introducing the language in the first place, and of how various language features have been designed to facilitate efficient implementation.

4.1. Translation scheme

The initial HPJava compiler is implemented as a source-to-source translator converting an HPJava program to a Java node program, with calls to runtime functions. The runtime system is built on the NPAC PCRC runtime library, which has a kernel implemented in C++ and a Java interface implemented in Java and C++.

4.1.1. Java packages for HPspmd programming

The current runtime interface for HPJava is called adJava. It consists of two Java packages. The first is the HPspmd runtime proper; it includes the classes needed to translate language constructs. The second package provides communication and some simple I/O functions. These two packages are outlined in this section.

The classes in the first package include an environment class, distributed array ``container classes'', and related classes describing process groups and index ranges. The environment class SpmdEnv provides functions to initialize and finalize the underlying communication library (currently MPI). Constructors call native functions to prepare the lower-level communication package. An important field, apg, defines the group of processes that is cooperating in ``loose synchrony'' at the current point of execution.

The other classes in this package correspond directly to HPJava built-in classes. The first hierarchy is based on Group. A group, or process group, defines some subset of the processes executing the SPMD program. Groups have two important roles in HPJava. First, they are used to describe how program variables such as arrays are distributed or replicated across the process pool. Secondly, they are used to specify which subset of processes executes a particular code fragment. Important members of the adJava Group class include the pair on(), no(), used to translate the on construct.
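Purely as an illustration (this is a guess at the translation pattern, not the actual code emitted by the HPJava translator, and the precise semantics of on() and no() are not documented in this report), a source-level on(p) { ... } construct might expand to something like:

    if(p.on()) {      // assumed: true only on processes that are members of p,
                      // and makes p the active process group
        // ... translated body of the on construct ...
        p.no() ;      // assumed: restores the active process group saved by on()
    }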


Figure 1: The HPJava Group hierarchy

The most common way to create a group object is through the constructor for one of the subclasses representing a process grid. The subclass Procs represents a grid of processes and carries information on process dimensions: in particular, an inquiry function dim(r) returns a range object describing the r-th process dimension. Procs is further subclassed by Procs0, Procs1, Procs2, ..., which provide simpler constructors for process grids of fixed dimensionality. The class hierarchy of groups and process grids is shown in figure 1.

The second hierarchy in the package is based on Range. A range is a map from an integer interval 0, ..., N - 1 into some process dimension (ie, some dimension of a process grid). Ranges are used to parametrize distributed arrays and the overall distributed loop.


Figure 2: The HPJava Range hierarchy

The most common way to create a range object is to use the constructor for one of the subclasses representing ranges with specific distribution formats. The current class hierarchy is given in figure 2. Simple block distribution format is implemented by BlockRange, while CyclicRange and BlockCyclicRange represent other standard distribution formats of HPF. The subclass CollapsedRange represents a sequential (undistributed) range. Finally, DimRange represents the range of coordinates of a process dimension itself--just one element is mapped to each process.

The related adJava class Location represents an individual location in a particular distributed range. Important members of the adJava Range class include the function location(i), which returns the i-th location in a range, and its inverse, idx(l), which returns the global subscript associated with a given location. Important members of the Location class include at() and ta(), used in the implementation of the HPJava at construct.
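As a small illustration of the Range and Location inquiries just described (a sketch based on the descriptions above, not code taken from the report), given some distributed range x:

    Location l = x.location(10) ;   // the location holding global subscript 10
    int i = x.idx(l) ;              // inverse inquiry: recovers the subscript, i == 10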


Finally in this package we have the rather complex hierarchy of classes representing distributed arrays. HPJava global arrays declared using [[ ]] are represented by Java objects belonging to classes such as:

    Array1dI, Array1cI,
    Array2ddI, Array2dcI, Array2cdI, Array2ccI,
    ...
    Array1dF, Array1cF,
    Array2ddF, Array2dcF, Array2cdF, Array2ccF,
    ...

Generally speaking, the class Arrayn...T represents an n-dimensional distributed array with elements of type T, where T is currently one of I, F, ..., meaning int, float, ... . The penultimate part of the class name is a string of ``c''s and ``d''s specifying whether each dimension is collapsed or distributed. These correlate with the presence or absence of an asterisk in the slots of the HPJava type signature. The concrete Array... classes implement a series of abstract interfaces. These follow a similar naming convention, but the root of their names is Section rather than Array (so Array2dcI, for example, implements Section2dcI). The hierarchy of Section interfaces is illustrated in figure 3.


Figure 3: The adJava Section hierarchy

The need to introduce the Section interfaces should be evident from the hierarchy diagram. The type hierarchy of HPJava involves a kind of multiple inheritance. The array type int [[*, *]], for example, is a specialization of both the types int [[*, ]] and int [[, *]]. Java allows ``multiple inheritance'' only from interfaces, not classes.

Here we mention some important members of the Section interfaces. The inquiry dat() returns an ordinary one-dimensional Java array used to store the locally held elements of the distributed array. The member pos(i, ...), which takes one argument per array dimension, returns the local offset of the element specified by its list of arguments. Each argument is either a location (if the corresponding dimension is distributed) or an integer (if it is collapsed). The inquiry grp() returns the group over which the elements of the array are distributed. The inquiry rng(d) returns the d-th range of the array.
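For example, a translated node program might read one locally held element roughly as follows (a sketch with assumed names, not code from the report; here a is a Section object for a two-dimensional float array, i is a Location in its distributed dimension and j an int subscript for its collapsed dimension):

    float [] block = a.dat() ;   // flat Java array holding the local elements of a
    int off = a.pos(i, j) ;      // local offset of the element selected by i and j
    float v = block[off] ;       // the element's value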


The second package in adJava is the communication library. The adJava communication package includes classes corresponding to the various collective communication schedules provided in the NPAC PCRC kernel. Most of them provide a constructor to establish a schedule, and an execute method, which carries out the data movement specified by the schedule. The communication schedules provided in this package are based on the NPAC runtime library. Different communication models may eventually be added through further packages.

The collective communication schedules can be used directly by the programmer or invoked through certain wrapper functions. A class named Adlib is defined with static members that create and execute communication schedules and perform simple I/O functions. This class includes, for example, the following methods, each implemented by constructing the appropriate schedule and then executing it:

    static public void remap(Section dst, Section src)

    static public void shift(Section dst, Section src,
                             int shift, int dim, int mode)

    static public void copy(Section dst, Section src)

    static public void writeHalo(Section src,
                                 int[] wlo, int[] whi, int[] mode)

Use of these functions is illustrated below. Polymorphism is achieved by using arguments of class Section.
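As a minimal usage sketch (the report's original examples are not reproduced in this transcript; the BlockRange and CyclicRange constructor arguments below are assumptions), remap can copy a cyclically distributed array into a block-distributed one:

    Procs1 p = new Procs1(4) ;
    on(p) {
        Range x = new BlockRange(100, p.dim(0)) ;    // block-distributed range (assumed signature)
        Range y = new CyclicRange(100, p.dim(0)) ;   // cyclically distributed range (assumed signature)

        float [[#]] a = new float [[x]] ;
        float [[#]] b = new float [[y]] ;

        // ... fill in the local elements of b ...

        Adlib.remap(a, b) ;   // copies b into a, reorganizing between the two
                              // distribution formats; this implies communication
    }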

4.2. Issues in the language design

With some of the implementation mechanisms exposed, we can better discuss the language design itself.


4.2.1. Extending the Java language

The first question to answer is: why use Java as a base language? Actually, the programming model embodied in HPJava is largely language independent; it could be bound to other languages like C, C++ and Fortran. But Java is a convenient base language, especially for initial experiments, because it provides full object orientation (convenient for describing complex distributed data) implemented in a relatively simple setting, conducive to the implementation of source-to-source translators. It has been noted elsewhere that Java has various features suggesting it could be an attractive language for science and engineering.

4.2.2. Datatypes in HPJava

In a parallel language, it is desirable to have both local variables (like the ones in MPI programming) and global variables (like the ones in HPF programming). The former provide flexibility and are ideal for task parallel programming; the latter are convenient especially for data parallel programming.

In HPJava, variable names are divided into two sets. In general, those declared using ordinary Java syntax represent local variables and those declared with [[ ]] represent global variables. The two sectors are independent. In the implementation of HPJava the global variables have special data descriptors associated with them, defining how their components are divided or replicated across processes. The significance of the data descriptor is most obvious when dealing with procedure calls.


Passing array sections to procedure calls is an important component of the array processing facilities of Fortran 90 [1]. The data descriptor of Fortran 90 will include stride information for each array dimension. One can assume that HPF needs a much more complex kind of data descriptor to allow passing distributed arrays across procedure boundaries. In either case the descriptor is not visible to the programmer. Java has a more explicit data descriptor concept; its arrays are considered as objects with, for example, a publicly accessible length field. In HPJava, the data descriptors for global data are similar to those used in HPF, but more explicitly exposed to programmers. Inquiry functions such as grp() and rng() have a role for global data similar to that of the field length in an ordinary Java array.

Keeping two data sectors seems to complicate the language and its syntax, but it provides convenience for both task and data parallel processing. There is no need for anything like the LOCAL mechanism in HPF to call a local procedure on the node processor. The descriptors for ordinary Java variables are unchanged in HPJava. On each node processor, ordinary Java data will be used as local variables, as in an MPI program.
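To illustrate the two sectors (a sketch reusing p, x and y from section 3.3, not an example from the report; the return types of the inquiries are assumptions):

    int [] local = new int [100] ;                     // ordinary Java array: one independent
                                                       // copy per process, descriptor unchanged
    float [[#,#]] global = new float [[x, y]] on p ;   // distributed array with an HPJava
                                                       // data descriptor

    Group g = global.grp() ;    // inquiry: the group the elements are distributed over
    Range r = global.rng(0) ;   // inquiry: the first distributed range of the array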

4.2.3. Programming convenience

The language provides some special syntax for the programmer's convenience. Unlike the syntax for data declaration, which has fundamental significance in the programming model, these extensions purely provide syntactic conveniences.


A limited number of Java operators are overloaded. A group object can be restricted by a location using the / operator, and a subrange or location can be obtained from a range using the [ ] operator enclosing a triplet expression or an integer. These pieces of syntax can be considered as shorthand for suitable constructors in the corresponding classes. This is comparable to the way Java provides special syntax support for the String class constructor.
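A small sketch of this overloaded syntax (an illustration reusing p and x from section 3.2, not an example from the report):

    Location l = x [1] ;   // [ ] applied to a range with an integer gives a location
    Group    q = p / l ;   // / restricts a group by a location: the slice of p whose
                           // coordinate in dimension x is 1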

Another kind of overloading occurs in location shifts, which are used to support ghost regions. A shift operator + is defined between a location and an integer; it will be illustrated in the examples in the next section. This is a restricted operation: it has meaning (and is legal) only in an array subscript expression.

5. Discussion and related work

We have described a conservative set of extensions to Java. In the context of an explicitly SPMD programming environment with a good communication library, we claim these extensions provide much of the concise expressiveness of HPF, without relying on very sophisticated compiler analysis. The object-oriented features of Java are exploited to give an elegant parameterization of the distributed arrays of the extended language. Because of the relatively low-level programming model, interfacing to other parallel-programming paradigms is more natural than in HPF. With suitable care, it is possible to make direct calls to, say, MPI from within the data parallel program. In [3] we suggest a concrete Java binding for MPI.


We will mention two related projects. Spar [11] is a Java-based language for array-parallel programming. Like our language it introduces multi-dimensional arrays, array sections, and a parallel loop. There are some similarities in syntax, but semantically Spar is very different from our language. Spar expresses parallelism but not explicit data placement or communication--it is a higher-level language. ZPL [10] is a new programming language for scientific computations. Like Spar, it is an array language. It has an idea of performing computations over a region, or set of indices. Within a compound statement prefixed by a region specifier, aligned elements of arrays distributed over the same region can be accessed. This idea has certain similarities to our over construct. Communication is more explicit than in Spar, but not as explicit as in the language discussed in this article.

5.1. Implementation of Collectives

In this section we discuss the Java implementation of the Adlib collective operations. For illustration we concentrate on the important remap operation. Although it is a powerful and general operation, it is actually one of the simpler collectives to implement in the HPJava framework. General algorithms for this primitive have been described by other authors; for example, it is essentially equivalent to the operation called Regular_Section_Copy_Sched in [2]. In this section we want to illustrate how this kind of operation can be implemented in terms of the particular Range and Group hierarchies of HPJava (complemented by a suitable set of messaging primitives).

All collective operations in the library are based on communication schedule objects. Each kind of operation has an associated class of schedules. Particular instances of these schedules, involving particular data arrays and other parameters, are created by the class constructors. Executing a schedule initiates the communications required to effect the operation. A single schedule may be executed many times, repeating the same communication pattern. In this way, especially for iterative programs, the cost of the computations and negotiations involved in constructing a schedule can often be amortized over many executions. This paradigm was pioneered in the CHAOS/PARTI libraries [8]. If a communication pattern is to be executed only once, simple wrapper functions can be made available to construct a schedule, execute it, then destroy it. The overhead of creating the schedule is essentially unavoidable, because even in the single-use case individual data movements generally have to be sorted and aggregated for efficiency; the associated data structures are just those associated with schedule construction. The constructor and public method of the remap schedule for distributed arrays of float elements can be described as follows:
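The listing that originally followed is missing from this transcript. The sketch below only reconstructs the general pattern described in the text (a constructor that builds the schedule and an execute method that runs it); the class name and signatures are assumptions:

    public class RemapFloat {

        // Constructing the schedule enumerates, sorts and stores all of the
        // messages and local copies needed to move the elements of src into
        // the distribution format of dst.
        public RemapFloat(Section dst, Section src) {
            // ... build the message lists (see RemapSkeleton/BlockMessSchedule below) ...
        }

        // Executing the schedule performs the communication. The same schedule
        // object can be executed repeatedly, amortizing the set-up cost.
        public void execute() {
            // ... send, receive and copy according to the stored lists ...
        }
    }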

The remap schedule combines two functionalities: it reorganizes data in the way indicated by the distribution formats of the source and destination arrays, and, if the destination array has a replicated distribution format, it broadcasts data to all copies of the destination. Here we will concentrate on the former aspect, which is handled by an object of class RemapSkeleton contained in every Remap object. During construction of a RemapSkeleton schedule, all send messages, receive messages, and internal copy operations implied by execution of the schedule are enumerated and stored in light-weight data structures. These messages have to be sorted before sending, for possible message agglomeration and to ensure a deadlock-free communication schedule. These algorithms, and the maintenance of the associated data structures, are dealt with in a base class of RemapSkeleton called BlockMessSchedule. The API of this superclass is outlined in Figure 11. To set up such a low-level schedule, one makes a series of calls to sendReq and recvReq to define the required messages. Messages are characterized by an offset in some local array segment, and a set of strides and extents parameterizing a multi-dimensional patch of the (flat Java) array. Finally, the build() operation does any necessary processing of the message lists. The schedule is executed in a ``forward'' or ``backward'' direction by invoking gather() or scatter().

    Figure 11: API of the class BlockMessSchedule
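The content of Figure 11 is not reproduced in this transcript. Based on the description above, the API presumably looks something like the following sketch; the method parameter lists are assumptions:

    public abstract class BlockMessSchedule {

        // Register one message to be sent to (or received from) process proc:
        // a multi-dimensional patch of the flat local array, described by an
        // offset plus per-dimension strides and extents.
        void sendReq(int offset, int [] strides, int [] exts, int proc) { /* ... */ }
        void recvReq(int offset, int [] strides, int [] exts, int proc) { /* ... */ }

        // Sort and aggregate the accumulated message lists.
        void build() { /* ... */ }

        // Execute the schedule in the "forward" or "backward" direction.
        void gather() { /* ... */ }
        void scatter() { /* ... */ }
    }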


5.2. Titanium

The Titanium language is designed to support high-performance scientific applications. Historically, few languages that made such a claim have achieved a significant degree of serious use by scientific programmers. Among the reasons are the high learning curve for such languages, the dependence on heroic parallelizing compiler technology and the consequent absence of compilers and tools, and the incompatibilities with languages used for libraries. The goal of Titanium is to provide a language that gives its users access to modern program structuring through the use of object-oriented technology, that enables its users to write explicitly parallel code to exploit their understanding of the computation, and that has a compiler that uses optimizing compiler technology where it is reliable and gives predictable results. The starting design point for Titanium is Java, chosen for several reasons. Although the Titanium project extends the Java language for scientific computing, it compiles down to C or C++. This approach offers the hope of tuning for higher ultimate performance, but sacrifices various benefits of the full Java platform.

5.3. FIDIL

The multidimensional array support in HPJava is strongly influenced by FIDIL maps and domains [6, 11]. HPJava, however, sacrifices expressiveness for performance. FIDIL maps have arbitrary shapes, and FIDIL has only a general domain type, thus making it harder to optimize code that uses the more common rectangular kind.


5.4. Split-C

The parallel execution model and global address space support in HPJava are closely related to Split-C [5] and AC [3]. HPJava shares a common communication layer with Split-C on distributed memory machines, which we have extended as part of the HPJava project to run on shared memory machines. Split-C differs from HPJava in that the default pointer type is local rather than global; a local pointer default simplifies interfacing to existing sequential code, but a global default makes it easier to port shared memory applications to distributed memory machines. Split-C uses sequential consistency as its default consistency model, but provides explicit operators to allow non-blocking operations to be used. In AC the compiler introduces non-blocking memory operations automatically, using only dependence information, not parallel program analysis.


6. Conclusion

Our experience thus far is that Java is a good choice as a base language: it is easy to extend, and its safety features greatly simplify the compiler writer's task. We also believe that extending Java is easier than obtaining high performance within Java's strict language specification (assuming that the latter is at all feasible). Many of the features of HPJava would be hard or impossible to achieve as Java libraries, and the compiler would not be able to perform static analysis and optimizations on them. HPJava will be most helpful for problems that have some degree of regularity. To a first approximation, HPJava's domain is similar to HPF's. Many of the most challenging problems in modern computational science have irregular structure, so the value of our language features in those domains is more controversial.


7. References

http://www.communitygrids.iu.edu/IC2.html
http://www.hpjava.org/index.html

CONTENTS

1 Introduction
2 Overview of HPJava
2.1 HPJava History
2.2 HPJava Philosophy
3 Characteristics
3.1 Multidimensional arrays
3.2 Process arrays
3.3 Distributed arrays
3.4 The on construct and the active process group
3.5 Other features
4 Considerations in HPJava language design
4.1 Translation scheme
4.1.1 Java packages for HPspmd programming
4.2 Issues in the language design
4.2.1 Extending the Java language
4.2.2 Datatypes in HPJava
4.2.3 Programming convenience
5 Discussion and related work
5.1 Implementation of Collectives
5.2 Titanium
5.3 FIDIL
5.4 Split-C
6 Conclusion
7 References


ABSTRACT

The idea that Java may enable new programming environments, combining attractive user interfaces with high performance computation, is gaining increasing attention amongst computational scientists. Java boasts a direct simplicity reminiscent of Fortran, but also incorporates many of the important ideas of modern object-oriented programming. Of course it comes with an established track record in the domains of Web and Internet programming.

The language outlined here provides HPF-like distributed arrays as language primitives, and new distributed control constructs to facilitate access to the local elements of these arrays. In the SPMD mold, the model allows processors the freedom to independently execute complex procedures on local elements: it is not limited by SIMD-style array syntax. All access to non-local array elements must go through library functions--typically collective communication operations. This puts an extra onus on the programmer, but making communication explicit encourages the programmer to write algorithms that exploit locality, and simplifies the task of the compiler writer. On the other hand, by providing distributed arrays as language primitives we are able to simplify error-prone tasks such as converting between local and global array subscripts and determining which processor holds a particular element. As in HPF, it is possible to write programs at a natural level of abstraction where the meaning is insensitive to the detailed mapping of elements. Lower-level styles of programming are also possible.


ACKNOWLEDGEMENTS

I express my sincere thanks to Prof. M.N. Agnisarman Namboothiri (Head of the Department, Computer Science and Engineering, MESCE) and Mr. Zainul Abid (staff in charge) for their kind co-operation in presenting the seminar.

I also extend my sincere thanks to all other members of the faculty of the Computer Science and Engineering Department and to my friends for their co-operation and encouragement.

Rony V John