Crossing The Line: Distributed Computing Across Network and Filesystem Boundaries
Native-Language-Based Distributed Computing
Blending Java and native languages to achieve cross-platform, cross-network, cross-filesystem distributed computing.
The Objective
[Diagram: a heterogeneous set of hosts (Linux, NT, SGI, Solaris, and two clusters) together with data servers and a data repository. Labels: “Spawn This Here”; “Update Database, Post-process data.”]
Implications
• Collaborative programming environment
• Provisional access to computational resources beyond local networks.
• Data mining (send processes to the data).
• Optimal process mapping.
• Nomadic virtual environment.
Demands on the System:
• A protocol to port the static form of an executable program collective to the remote host, resolve its dependencies, check it for security violations, and instantiate it as a process.
• Whom to trust?
• API for message-passing library calls.
Initial Approach: Use Java
• Java provides a means to load processes obtained from remote resources using the java.lang.ClassLoader class.
• The Java SecurityManager provides a general security framework.
• Java bytecode representation provides uniformity in a heterogeneous environment (and more! …later).
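As a rough illustration of loading a process from bytes rather than from the local filesystem, the sketch below subclasses java.lang.ClassLoader and calls its defineClass method. The names ByteArrayClassLoader, SoftInstallDemo, and fetchBytecode are hypothetical and are not part of the system described in these slides; the "remote" bytecode is simply a minimal, empty class assembled in memory as a stand-in for a network transfer.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical loader that defines a class from bytes held only in memory. */
class ByteArrayClassLoader extends ClassLoader {
    ByteArrayClassLoader(ClassLoader parent) {
        super(parent);
    }

    /** Turns raw bytecode (for example, bytes read from a socket) into a Class. */
    Class<?> define(String name, byte[] bytecode) {
        return defineClass(name, bytecode, 0, bytecode.length);
    }
}

class SoftInstallDemo {
    /**
     * Stand-in for a network transfer (assumption): assembles a minimal class file,
     * an empty public class extending java.lang.Object, directly in memory.
     */
    static byte[] fetchBytecode(String className) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeInt(0xCAFEBABE);                 // magic number
            out.writeShort(0);                        // minor version
            out.writeShort(46);                       // major version (46 = JDK 1.2)
            out.writeShort(5);                        // constant pool count = entries + 1
            out.writeByte(7); out.writeShort(2);      // #1 Class -> #2
            out.writeByte(1); out.writeUTF(className);            // #2 Utf8: this class
            out.writeByte(7); out.writeShort(4);      // #3 Class -> #4
            out.writeByte(1); out.writeUTF("java/lang/Object");   // #4 Utf8: super class
            out.writeShort(0x0021);                   // access flags: ACC_PUBLIC | ACC_SUPER
            out.writeShort(1);                        // this_class  -> #1
            out.writeShort(3);                        // super_class -> #3
            out.writeShort(0);                        // interfaces count
            out.writeShort(0);                        // fields count
            out.writeShort(0);                        // methods count
            out.writeShort(0);                        // attributes count
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Defines the class without writing anything to the local filesystem. */
    static Class<?> softInstall(String className) {
        ByteArrayClassLoader loader =
                new ByteArrayClassLoader(SoftInstallDemo.class.getClassLoader());
        return loader.define(className, fetchBytecode(className));
    }

    public static void main(String[] args) {
        Class<?> cls = softInstall("A_process");
        System.out.println(cls.getName() + " defined by "
                + cls.getClassLoader().getClass().getName());
    }
}
```

In a real setting the byte array would come from the remote resource, and a security check would run before define is called.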
Initial Implementation:
• Java-based communication substrate with Java programming bindings to message-passing functions.
• Commands such as add, delete, spawn, kill, and similar PVM-style commands to configure the environment.
• Additional commands to merge, split and register virtual environments.
Initial Implementation:
• Provided the requisite mechanisms to “soft-install” processes upon remote resources without accessing the filesystem.
• Allowed for distributed parallelization and synchronization of processes within the environment.
Functionality...
• The Java-based implementation proved to be well suited for “computationally-lite” distributed computing tasks across network boundaries.
• Although speedups were observable for parallelization of tasks over clusters, performance for traditional distributed computing tasks left a lot to be desired.
Pros/Cons of using Java
• Pros:
– Java offers tremendous potential to the user in terms of portability and heterogeneous execution.
– Bytecode representation, RMI, object serialization.
• Cons:
– As an interpreted language, Java suffers a significant performance penalty.
– As with any new language, the prospect of rewriting existing codes is met with reluctance and a lack of enthusiasm.
Pros/Cons of using Java
[Bar chart: Times (in seconds) to multiply two 500x500 matrices. Axis: 0 to 500 seconds. Implementations compared: C-wrapped LINPACK, Java-wrapped LINPACK, standalone C, standalone Java. Series: Unoptimized and JIT/Optimized. Bar values in extraction order (pairing with the bars not recoverable): 16.89, 17.02, 41.27, 42.67, 24.42, 84.73, 103.4, 437.9.]
So, What Language to Use?
Java is a highly-portable language
Java adheres to the “Write once, run anywhere” philosophy
Java has a well-established collection of scientific library bindings
Java’s executional speed is suitable for HPC
C/Fortran/C++ are highly-portable languages
C/Fortran/C++ adhere to the “Write once, run anywhere” philosophy
C/Fortran/C++ have well-established scientific library bindings
C/Fortran/C++ executional speeds are suitable for HPC
So, What Language to Use?
Java is a highly-portable language
Java adheres to the “Write once, run anywhere” philosophy
C/Fortran/C++ have well-established scientific library bindings
C/Fortran/C++ executional speeds are suitable for HPC
Use Java for its portability and standardization, but focus on using it as a wrapper for porting native code in the form of shared libraries.
Solution: Blend Java with C/Fortran
• Use Java for the initial introduction of the program collective to the remote host. The wrapper class may be analyzed for class dependencies, shared library usage, security violations, etc.
• Use C/Fortran codes as the computational engine of the process. Compiled into shared libraries (.so or .DLL files), they can be encapsulated within the program collective and loaded onto the remote resource.
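A minimal sketch of the wrapper idea, assuming a hypothetical shared library named "icet_demo_blas" that exports a native matrix multiply: the Java class declares the native method, tries to load the library, and falls back to a plain Java loop when the library is absent. The class name MatrixEngine, the library name, and the method signatures are illustrative assumptions, not the actual IceT interface.

```java
/** Hypothetical Java wrapper around a native (C/Fortran) computational engine. */
class MatrixEngine {
    // "icet_demo_blas" is an assumed library name used only for illustration.
    private static final boolean NATIVE_AVAILABLE = tryLoad("icet_demo_blas");

    private static boolean tryLoad(String library) {
        try {
            System.loadLibrary(library);   // resolves libicet_demo_blas.so or icet_demo_blas.dll
            return true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            return false;                  // library missing or loading not permitted
        }
    }

    /** Would be implemented in the shared library; never called if loading failed. */
    private static native void multiplyNative(double[] a, double[] b, double[] c, int n);

    static boolean usesNative() {
        return NATIVE_AVAILABLE;
    }

    /** Multiplies two n x n matrices stored in row-major order. */
    static double[] multiply(double[] a, double[] b, int n) {
        double[] c = new double[n * n];
        if (NATIVE_AVAILABLE) {
            multiplyNative(a, b, c, n);
            return c;
        }
        // Pure Java fallback: straightforward triple loop.
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < n; k++) {
                double aik = a[i * n + k];
                for (int j = 0; j < n; j++) {
                    c[i * n + j] += aik * b[k * n + j];
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[] c = multiply(new double[] {1, 2, 3, 4}, new double[] {5, 6, 7, 8}, 2);
        System.out.println(java.util.Arrays.toString(c) + " native=" + usesNative());
    }
}
```

The string passed to System.loadLibrary is exactly the kind of argument that a bytecode analysis can extract as a native library dependency.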
The (New) Objective:
[Diagram: the same heterogeneous set of hosts (Linux, NT, SGI, Solaris, and two clusters). Labels: “Spawn This Here”; “Native Library, FORTRAN BLAS perhaps.”]
How does it work?
• A request is made to create the Java-based process “A_process.”
• The local search for A_process fails.
• A request is sent for the class A_process.
• The bytecode for A_process.class is returned (shown in the diagram as a stream of hex bytes).
• The class file is run through the “BYTE GRINDER.”
• The output is a list of method calls, a list of native libraries, and a list of dependency classes.
The “BYTE GRINDER”
• The list of method calls aids in imposing security on processes that are obtained remotely and run locally.
• Libraries in the native library list can be obtained from the requesting user or from a trusted third party.
• Classes in the dependency class list are analyzed similarly.
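One way the list of method calls could be put to use is a simple deny-list check before the class is instantiated. The sketch below is only an assumption about how such a check might look: the class CallPolicy, the "owner.method" string format, and the example deny-list entries are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical policy: rejects a class whose method-call list contains denied calls. */
class CallPolicy {
    private final Set<String> denied;

    /** Calls are written as "owner/Class.method", e.g. "java/lang/Runtime.exec". */
    CallPolicy(String... deniedCalls) {
        this.denied = new HashSet<>(Arrays.asList(deniedCalls));
    }

    /** Returns the denied calls found in the list, in the order they appear. */
    List<String> violations(List<String> methodCalls) {
        List<String> found = new ArrayList<>();
        for (String call : methodCalls) {
            if (denied.contains(call)) {
                found.add(call);
            }
        }
        return found;
    }

    /** True when none of the listed calls is on the deny list. */
    boolean permits(List<String> methodCalls) {
        return violations(methodCalls).isEmpty();
    }

    public static void main(String[] args) {
        CallPolicy policy = new CallPolicy("java/lang/Runtime.exec", "java/io/File.delete");
        List<String> calls = Arrays.asList("java/lang/Math.sqrt", "java/lang/Runtime.exec");
        System.out.println("permitted=" + policy.permits(calls)
                + " violations=" + policy.violations(calls));
    }
}
```

A production check would more likely build on the Java SecurityManager framework mentioned earlier; the deny list here only shows where the method-call list fits.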
How does it do that?
• Following the Java Virtual Machine specification, the incoming bytecode is “analyzed,” reading the class file header fields in order:
– Magic Number (CAFEBABE)
– Minor Version
– Major Version
– Constant Pool Count
• Construction of the constant pool. (Watch out for double and long entries: each one occupies two constant pool slots!)
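The header and constant pool walk can be sketched as follows. This is not the BYTE GRINDER itself: the class name ClassFileHeader and the sample bytes are hypothetical. The field order and tag sizes follow the Java Virtual Machine specification, and the sample pool deliberately starts with a long entry to exercise the two-slot rule.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader for the class file header and constant pool. */
class ClassFileHeader {
    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount;
    final List<String> utf8Entries;

    private ClassFileHeader(int minor, int major, int count, List<String> utf8) {
        minorVersion = minor;
        majorVersion = major;
        constantPoolCount = count;
        utf8Entries = utf8;
    }

    static ClassFileHeader parse(byte[] bytecode) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytecode))) {
            if (in.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("bad magic number");
            }
            int minor = in.readUnsignedShort();
            int major = in.readUnsignedShort();
            int count = in.readUnsignedShort();
            List<String> utf8 = new ArrayList<>();
            // Valid indices run from 1 to count - 1.
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1: utf8.add(in.readUTF()); break;          // Utf8
                    case 3: case 4: in.skipBytes(4); break;         // Integer, Float
                    case 5: case 6: in.skipBytes(8); i++; break;    // Long, Double: two slots
                    case 7: case 8: case 16: case 19: case 20:
                        in.skipBytes(2); break;   // Class, String, MethodType, Module, Package
                    case 9: case 10: case 11: case 12: case 17: case 18:
                        in.skipBytes(4); break;   // member refs, NameAndType, (Invoke)Dynamic
                    case 15: in.skipBytes(3); break;                // MethodHandle
                    default:
                        throw new IllegalArgumentException("unknown constant pool tag " + tag);
                }
            }
            return new ClassFileHeader(minor, major, count, utf8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Sample input (assumption): a header plus a pool holding one long and one Utf8 entry. */
    static byte[] sampleBytes() {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeInt(0xCAFEBABE);
            out.writeShort(0);                           // minor version
            out.writeShort(46);                          // major version (46 = JDK 1.2)
            out.writeShort(4);                           // count: slots #1 to #3
            out.writeByte(5); out.writeLong(42L);        // #1 Long, also occupies #2
            out.writeByte(1); out.writeUTF("A_process"); // #3 Utf8
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ClassFileHeader h = parse(sampleBytes());
        System.out.println(h.majorVersion + "." + h.minorVersion + " " + h.utf8Entries);
    }
}
```

If the loop did not advance the index an extra step for long and double entries, it would try to read one entry too many and run off the end of the pool.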
How does it do that?
– Read the super-class entry from the constant pool.
– Read the list of interfaces from constant pool references.
– Read the list of fields from the constant pool.
– Method opcode listings.
– The Java opcodes of each method are analyzed for invocations of calls such as “System.load.” The argument yields the native library dependency.
• Socket calls, file manipulations, etc.
Other Things That Make it Work:
• Processes are (sub-subclassed) extensions of the java.lang.Thread class, which allows their execution to be started, stopped, suspended, prioritized, serialized or otherwise governed by the remote host.
• JNI: Automatic header file generation and a protocol for interfacing with C/C++ codes (which are then used to interface to Fortran).
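To illustrate how a host might govern a Thread-based process, the sketch below starts a process at low priority and then stops it. HostedProcess and HostDemo are hypothetical names, and the class extends Thread directly rather than through the intermediate classes the slide implies. Because Thread.stop and Thread.suspend are deprecated in current JDKs, this sketch uses interruption as the stop mechanism, which is a substitution and not necessarily what the original system did.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical process type: a Thread subclass the host can start, prioritize, and stop. */
class HostedProcess extends Thread {
    private final CountDownLatch firstStep = new CountDownLatch(1);
    private final AtomicLong steps = new AtomicLong();

    HostedProcess(String name) {
        super(name);
    }

    @Override
    public void run() {
        try {
            while (true) {
                steps.incrementAndGet();   // placeholder for one unit of real work
                firstStep.countDown();     // signal that the process is running
                Thread.sleep(1);           // interruption point the host can use
            }
        } catch (InterruptedException stopRequested) {
            // The host asked the process to stop; let the thread end normally.
        }
    }

    long steps() {
        return steps.get();
    }

    void awaitFirstStep() throws InterruptedException {
        firstStep.await();
    }
}

class HostDemo {
    /** Starts a process at low priority, waits until it is running, then stops it. */
    static HostedProcess runAndStop(String name) {
        HostedProcess process = new HostedProcess(name);
        process.setPriority(Thread.MIN_PRIORITY);
        process.start();
        try {
            process.awaitFirstStep();
            process.interrupt();
            process.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // preserve the caller's interrupt status
        }
        return process;
    }

    public static void main(String[] args) {
        HostedProcess p = runAndStop("A_process");
        System.out.println(p.getName() + " alive=" + p.isAlive() + " steps=" + p.steps());
    }
}
```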
Work in progress:
• Adopt the new features in the recent JDK 1.2 release that relate to native method calls.
• Optimize message passing mechanism in the substrate.
• Implement full security mechanisms.
• Generate cross-network IceT demo apps.
Summary
• IceT extends the scope of distributed computing environments by
– locating and migrating static processes across filesystems, and by
– dynamically merging and splitting virtual environments.
• IceT provides a provisional mechanism for supporting program collectives that run concurrently, across multiple networks, and exist in multiple phases.