
Simulation Interoperability Workshop, Paper 09S-SIW-033


Multiplatform Support for the OpenMSA/OSAMS Reference Implementation

Craig N. Lammers (Contractor) Maria E. Valinski (Contractor)

Jeffrey S. Steinman, Ph.D. (Contractor)

Joint Program Executive Office, Chemical and Biological Defense Software Support Activity

Space and Naval Warfare Systems Center, Pacific

53560 Hull Street San Diego, CA 92152-5001

[email protected] [email protected]

[email protected]

Key Words: Parallel and Distributed Simulation, Multicore, Manycore, PDMS, OpenMSA, OSAMS, WarpIV Kernel, SPEEDES

ABSTRACT: This paper discusses issues encountered in porting the Open Modeling and Simulation Architecture (OpenMSA) and the Open System Architecture for Modeling and Simulation (OSAMS) open source reference implementation from UNIX (Linux, Mac OS X, Solaris, HP-UX, etc.) to the Microsoft Windows (XP and Vista) operating system. The OpenMSA architecture provides a high-performance parallel-and-distributed event-processing framework for modeling and simulation. OSAMS facilitates high-performance plug-and-play model interoperability within the OpenMSA. Together, these architectures support parallel and distributed processing on multicore computer systems using shared memory and standard network protocols to facilitate interactions between distributed models. Several striking differences in how (a) processes are created, (b) shared memory is created/accessed, and (c) network communications are supported make this port interesting and challenging. In addition to the many lessons learned, this paper presents timing results of several parallel and distributed performance benchmarks on four-processor multicore Windows, Linux, and Mac OS X platforms.

1.0 Introduction

The Open Modeling and Simulation Architecture (OpenMSA) is an open-source architecture that provides a high-performance parallel-and-distributed discrete event-processing framework for Modeling and Simulation (M&S). The Open System Architecture for Modeling and Simulation (OSAMS) facilitates high-performance plug-and-play model interoperability within the OpenMSA.

At the core, the OpenMSA utilizes network and shared memory communication to coordinate message passing between multiple processes in parallel and distributed execution environments. Until recently, OpenMSA/OSAMS only supported UNIX-based systems including Linux, Mac OS X, Solaris, and HP-UX. Differences in process creation and inter-process communication introduced roadblocks that prohibited support for Windows systems. Multiplatform support for OpenMSA/OSAMS enables the same models to execute on parallel-and-distributed environments on both Windows and UNIX-based systems without modification.

This paper discusses the differences between inter-process communication services on Windows and UNIX-based systems, a solution for supporting multiplatform process creation in OpenMSA/OSAMS, and parallel and distributed performance benchmarks on multicore Windows, Linux, and Mac OS X platforms.

2.0 Multiplatform Porting Issues

This section discusses the issues encountered during the UNIX to Windows software port. Specifically, the following subsections cover network communication, shared memory communication, process creation, the system command, environment variables, secure string functions, and compilation.

2.1 Shared Memory Communication

A shared memory segment is a block of memory that can be simultaneously accessed by multiple processes. Changes made to the memory by one process are immediately reflected in other processes. By allowing multiple processes access to the block of memory, shared memory communication eliminates the need to duplicate data in multi-core applications. To guarantee coordination between multiple processes, flags are set to specify when it is safe to read or write to a shared memory segment. OpenMSA/OSAMS utilizes shared memory to efficiently communicate between multiple processes.

Initially, a process creates a shared memory segment of a specified size with specific access permissions. Given a unique key, other processes can attach themselves to one or more segments in order to read and write to the shared memory. The shared memory segment is mapped into the address space of a process when it attaches to the segment, and un-mapped from the address space when it detaches from the segment. When a shared memory segment is no longer needed, a process cleans up the resources associated with the segment.

There are two major differences between using shared memory communication in the UNIX and Windows environments. The first difference concerns the creation of a shared memory segment. Windows requires the developer to specify a unique key during creation of the shared memory segment, whereas UNIX can automatically generate unique keys. This key is required for multiple processes to access a particular shared memory segment. The second difference concerns cleaning up shared memory resources. In UNIX, shared memory resources for a segment are cleaned up given the key of the segment. In Windows, shared memory resources for a segment are cleaned up given the reference handle that is returned from creating or accessing a segment. In addition to the segment key, the Windows handle to the segment must therefore be saved in order to clean up shared memory resources. Table 1 lists the UNIX and Windows functional equivalents of the shared memory utilities. [3]

Table 1: Mapping of shared memory utilities in UNIX and Windows. Note the shmget function has a dual purpose in UNIX. A shared memory segment is created if the key is set to IPC_PRIVATE and the IPC_CREAT bit is set in the flag.

UTILITY                                            UNIX     WINDOWS
Create shared memory segment                       shmget   CreateFileMapping
Access shared memory segment                       shmget   OpenFileMapping
Map segment into address space of a process        shmat    OpenFileMapping, MapViewOfFile
Unmap segment from the address space of a process  shmdt    UnmapViewOfFile
Clean up segment resources                         shmctl   CloseHandle

Wrapper functions were developed in OpenMSA/OSAMS to abstract the cross-platform nuances away from the developer. When creating a shared memory segment in Windows, OpenMSA/OSAMS automatically generates a unique key based upon the process ID and a global counter, and returns the key by reference. When a process creates or accesses a shared memory segment for the first time in Windows, OpenMSA/OSAMS saves the segment handle in a global tree using the segment key as the key for the tree element. This completely hides the Windows handle from the developer, so the integer key is all that is required to uniquely identify a shared memory segment. When cleaning up the shared memory resources for a specific segment, the wrapper function retrieves the handle from the tree given the segment key.
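The following C++ sketch illustrates this style of wrapper under simplified assumptions; the function names, key-generation scheme, and mapping-object name are hypothetical and are not the actual OpenMSA/OSAMS interfaces. It shows the Windows handle being saved in a global table keyed by the integer segment key, alongside the corresponding UNIX calls.

#ifdef _WIN32
  #include <windows.h>
  #include <map>
  #include <cstdio>

  // Hypothetical global table of Windows handles, analogous to the global
  // tree described above: key -> handle, needed later for cleanup.
  static std::map<int, HANDLE> SegmentHandles;

  void *CreateSharedSegment(int &key, size_t size) {
    key = (int)GetCurrentProcessId() * 1000 + (int)SegmentHandles.size();  // illustrative key scheme
    char name[64];
    std::sprintf(name, "Local\\ShmSegment_%d", key);   // illustrative mapping-object name
    HANDLE handle = CreateFileMappingA(INVALID_HANDLE_VALUE, NULL,
                                       PAGE_READWRITE, 0, (DWORD)size, name);
    if (handle == NULL) return NULL;
    SegmentHandles[key] = handle;                       // save handle for later cleanup
    return MapViewOfFile(handle, FILE_MAP_ALL_ACCESS, 0, 0, 0);
  }

  void DestroySharedSegment(int key, void *address) {
    UnmapViewOfFile(address);
    CloseHandle(SegmentHandles[key]);                   // cleanup requires the saved handle
    SegmentHandles.erase(key);
  }
#else
  #include <sys/ipc.h>
  #include <sys/shm.h>

  void *CreateSharedSegment(int &key, size_t size) {
    key = shmget(IPC_PRIVATE, size, IPC_CREAT | 0666);  // UNIX generates the segment identifier
    if (key == -1) return NULL;
    return shmat(key, NULL, 0);
  }

  void DestroySharedSegment(int key, void *address) {
    shmdt(address);
    shmctl(key, IPC_RMID, NULL);                        // cleanup only needs the key/identifier
  }
#endif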

2.2 Network Communication

OpenMSA/OSAMS provides a variety of services that leverage network communication between machines. At the heart of OpenMSA/OSAMS, a high-speed communications framework integrates shared memory with reliable communication across a network using the Transmission Control Protocol and Internet Protocol (TCP/IP). In addition, interoperability with Distributed Interactive Simulation (DIS) is provided using User Datagram Protocol and Internet Protocol (UDP/IP) connectionless streams. TCP/IP and UDP/IP are the fundamental network communication standards that define the connection, communication, and data transfer procedure between two points.

The Berkeley Software Distribution (BSD) sockets interface, or some variant thereof, provides the core network programming services in most UNIX-based systems. The Windows Sockets (Winsock) Application Programming Interface (API), derived from BSD, provides the core network programming services in Windows systems. This software port utilized the latest Winsock 2 API.

Providing cross-platform support for socket applications is relatively straightforward. With some minor exceptions, the BSD and Winsock APIs almost directly map to one another. There are, however, some differences.

For starters, Windows requires initializing the Winsock framework prior to using any of the socket functions. The Winsock framework is initialized using the WSAStartup function and terminated via the WSACleanup function. The Winsock version is specified as a parameter to the startup function.
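As a minimal sketch of this startup requirement (error handling abbreviated, wrapper names illustrative), a cross-platform initialization routine might look like this:

#ifdef _WIN32
  #include <winsock2.h>
  #pragma comment(lib, "ws2_32.lib")
#endif

// Initialize the sockets layer: WSAStartup on Windows, a no-op on UNIX.
bool InitializeSockets() {
#ifdef _WIN32
  WSADATA wsaData;
  return WSAStartup(MAKEWORD(2, 2), &wsaData) == 0;   // request Winsock version 2.2
#else
  return true;
#endif
}

// Tear down the sockets layer when the application exits.
void CleanupSockets() {
#ifdef _WIN32
  WSACleanup();
#endif
}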

With a few exceptions, socket function names are relatively consistent between the BSD and Winsock APIs. Sockets are closed in Winsock via the closesocket function, versus the close function in BSD. To control the I/O mode of a socket, Winsock provides an ioctlsocket function as an alternative to the ioctl and fcntl BSD functions. These differences are summarized in Table 2. [1]


Table 2: Discrepancies in socket function names.

SOCKET UTILITY                BSD          WINSOCK
Close a socket                close        closesocket
Control I/O mode of a socket  ioctl/fcntl  ioctlsocket

Socket function return types are relatively consistent between BSD and Winsock. With the exception of the socket and accept functions, every function listed in Table 3 returns an integer in both BSD and Winsock implementations. Upon success, the socket and accept functions return a socket descriptor. The socket descriptor is represented as an integer in BSD and an abstract SOCKET data type in Winsock.

Return values on function failure vary between BSD and Winsock. The return values on failure for some of the more common functions are listed in Table 3.

Table 3: Socket function return values on failure.

FUNCTION           BSD FAILURE  WINSOCK FAILURE
socket             -1           INVALID_SOCKET
bind               -1           SOCKET_ERROR
listen             -1           SOCKET_ERROR
accept             -1           INVALID_SOCKET
close/closesocket  -1           SOCKET_ERROR
connect            -1           SOCKET_ERROR
send               -1           SOCKET_ERROR
sendto             -1           SOCKET_ERROR
recv               -1           SOCKET_ERROR
recvfrom           -1           SOCKET_ERROR
shutdown           -1           SOCKET_ERROR
ioctl/ioctlsocket  -1           SOCKET_ERROR
select             -1           SOCKET_ERROR
setsockopt         -1           SOCKET_ERROR

As of Winsock 2.0, the INVALID_SOCKET and SOCKET_ERROR constants are both defined as -1. Therefore, it is possible to account for the difference in data type and return value for the socket and accept functions by casting the socket descriptor to an integer, as shown in Code Segment 1. [2]

int socketFd = int(socket(AF_INET, SOCK_STREAM, 0));
if (socketFd == -1) {
  cout << "socket() error..." << endl;
}

Code Segment 1: Casting a socket descriptor to an integer.

To avoid potential compiler warnings, use of the return-type constants and platform-specific data types through conditional compilation is recommended [2]. Conditional compilation is shown in Code Segment 2.

#ifdef WINSOCK
SOCKET socketFd = socket(AF_INET, SOCK_STREAM, 0);
if (socketFd == INVALID_SOCKET) {
  cout << "socket() error..." << endl;
}
#else
int socketFd = socket(AF_INET, SOCK_STREAM, 0);
if (socketFd == -1) {
  cout << "socket() error..." << endl;
}
#endif

Code Segment 2: Addressing cross-platform discrepancies with conditional compilation.

Socket error code retrieval varies between BSD and Winsock. In BSD, error codes are retrieved using the errno variable. In Winsock, calling the WSAGetLastError function retrieves error codes. In addition, error codes do not directly map to one another. A BSD to Winsock mapping of some of the more common socket error codes is given in Table 4.

Table 4: BSD to Winsock mapping of common socket error codes.

BSD          WINSOCK          DESCRIPTION
EAGAIN       WSAEWOULDBLOCK   Operation on non-blocking socket cannot be completed immediately
EPIPE        WSAESHUTDOWN     Attempting to read or write on a closed socket
EINPROGRESS  WSAEINPROGRESS   Blocking operation is now in progress
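As an illustration of the difference in error retrieval, a small helper (a sketch only, not the OpenMSA/OSAMS interface) can hide whether the code came from errno or WSAGetLastError:

#ifdef _WIN32
  #include <winsock2.h>
#else
  #include <cerrno>
#endif

// Return the error code produced by the most recent socket call.
int GetLastSocketError() {
#ifdef _WIN32
  return WSAGetLastError();   // e.g. WSAEWOULDBLOCK
#else
  return errno;               // e.g. EAGAIN
#endif
}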

OpenMSA/OSAMS provides a high-speed communication library that leverages the fundamental network programming utilities discussed in this section. This library enables asynchronous and synchronous message passing, a client/server Object Request Broker (ORB), a grid system, a multi-replication framework, DIS utilities, and a parallel application launcher.

2.3 Process Creation

Processes are able to create new processes using two fundamental approaches: fork and spawn. Fork essentially enables a process to clone itself. The clone is an exact copy of the original with a separate address space. Both the original and clone continue program execution from the point of the fork. [6] Servers often fork processes in order to simultaneously service multiple clients; OpenMSA/OSAMS forks processes to achieve parallelism on multicore machines. Processes are forked in UNIX systems using the fork function. Windows, unfortunately, does not offer an equivalent to fork.
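The fork behavior described above can be shown with a minimal UNIX-only sketch (not OpenMSA/OSAMS code); both the parent and the clone continue from the point of the fork:

#include <sys/types.h>
#include <unistd.h>
#include <cstdio>

int main() {
  pid_t pid = fork();            // clone the current process
  if (pid == 0) {
    std::printf("child: continues from the fork point\n");
  } else if (pid > 0) {
    std::printf("parent: child pid is %d\n", (int)pid);
  }
  return 0;                      // both processes execute from here on
}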


Windows does, however, offer the ability to spawn new processes. Spawn creates a child process to execute a command. Unlike forked processes, spawned processes start execution from the beginning of the program. Various methods are available to spawn processes. UNIX systems can spawn processes using fork and exec. Windows offers a low-level CreateProcess function that enables a process to spawn a new process. As an alternative, the system command can launch new processes and is supported on both UNIX and Windows platforms. Due to its relative simplicity and multiplatform support, the system command was selected to spawn processes in OpenMSA/OSAMS. The system command is described in more detail in Section 2.4.

Applications in OpenMSA/OSAMS begin parallel execution after the startup of the high-speed communications framework. Prior to the software port, the startup service automatically forked processes to launch the requested number of nodes for parallel processing. Next, shared memory segments are created within the communications infrastructure to coordinate all of the inter-process communications services. Network connections, when executing in a cluster environment, are also established during the startup process. [5]

The high-speed communications startup service was modified to offer both fork and spawn startup modes. Windows systems are restricted to startup via spawn; UNIX systems are free to choose either startup mode. Because spawned child processes start execution from the beginning of the program rather than from the point of a fork, care must be taken to prevent re-executing code sequences that should only run once.

OpenMSA/OSAMS applications start out with a single primary process that launches the other secondary processes. During spawn startup, the primary process first interacts with the shared memory server to create all of the shared memory segments that are necessary to support high-speed communication between all processes that will be launched. Afterwards, system calls are made in place of fork to launch the appropriate number of secondary processes. The primary process continues execution following the spawn, so the number of newly created secondary processes is one less than the number of local nodes. Prior to each system call, environment variables are set specifying (1) the local node ID of the new process and (2) the process ID of the current process. Each newly created process can then retrieve these state variables from the environment.

Once created, each new process re-enters the high-speed communications spawn startup function. Environment variables are immediately retrieved to identify the local node ID of each spawned process. Next, each spawned process creates a shared memory client and connects to the shared memory server. Finally, barrier synchronization takes place to ensure all nodes have completed the startup procedure. Once synchronized, the simulation can begin processing events.
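A simplified C++ sketch of this spawn-based startup is shown below. The function and environment variable names are illustrative only (they are not the actual OpenMSA/OSAMS names), and the sketch omits the shared memory and barrier steps; it focuses on passing state to spawned processes through the environment.

#include <cstdlib>
#include <string>
#ifdef _WIN32
  #include <windows.h>
  #include <process.h>
  #define GETPID() _getpid()
#else
  #include <unistd.h>
  #define GETPID() getpid()
#endif

// Hypothetical helper: set an environment variable on either platform.
static void SetEnvVar(const char *name, const std::string &value) {
#ifdef _WIN32
  SetEnvironmentVariableA(name, value.c_str());
#else
  setenv(name, value.c_str(), 1);          // 1 = overwrite if already defined
#endif
}

// Primary process: launch one secondary process per remaining local node.
void SpawnSecondaryNodes(int numLocalNodes, const char *program) {
  for (int node = 1; node < numLocalNodes; node++) {   // primary keeps node 0
    SetEnvVar("LOCAL_NODE_ID", std::to_string(node));
    SetEnvVar("PARENT_PID", std::to_string(GETPID()));
    std::system(program);   // in practice the command requests background execution
                            // (see Section 2.4) so the secondaries run concurrently
  }
}

// Secondary process: recover its node ID at the top of the spawn startup path.
int GetLocalNodeId() {
  const char *value = std::getenv("LOCAL_NODE_ID");    // inherited from the parent
  return value ? std::atoi(value) : 0;                 // primary defaults to node 0
}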

2.3.1 Process Management

Mechanisms for querying and controlling processes vary between UNIX and Windows. Table 5 lists the functions required to retrieve the process ID, kill a process, and put a process to sleep in UNIX and Windows environments. Note that unlike UNIX, Windows does not maintain parent-child relationships between processes. [4] In Windows, spawned processes run completely independently of the parent process. When a parent process dies in UNIX, existing children are adopted by process ID 1. This does not happen in Windows.

Table 5: Common mechanisms for querying and controlling processes in UNIX and Windows.

TASK                          UNIX         WINDOWS
Get process ID                getpid       GetCurrentProcessId
Get parent PID                getppid      N/A
Kill a process given its PID  kill         OpenProcess, TerminateProcess, CloseHandle
Put process to sleep          sleep (sec)  Sleep (msec)

The task of killing a process requires some explanation. In UNIX systems, a parent can terminate a child process given the process ID of the child. In Windows, a process can terminate any process given its handle. The handle of a process is obtained by passing its process ID to the OpenProcess function. The process can then be terminated via TerminateProcess, and the handle to the process is closed using the CloseHandle function.
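A sketch of a cross-platform kill wrapper based on this sequence (simplified, without error handling; the wrapper name is illustrative) might look like this:

#ifdef _WIN32
  #include <windows.h>
#else
  #include <signal.h>
  #include <sys/types.h>
#endif

// Terminate a process given its process ID.
void KillProcess(int pid) {
#ifdef _WIN32
  HANDLE handle = OpenProcess(PROCESS_TERMINATE, FALSE, (DWORD)pid);  // obtain a handle from the PID
  if (handle != NULL) {
    TerminateProcess(handle, 1);   // 1 = exit code given to the terminated process
    CloseHandle(handle);
  }
#else
  kill((pid_t)pid, SIGKILL);       // or SIGTERM for a catchable termination request
#endif
}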

2.4 System Command

The system command enables an application to launch a program or execute a command on the operating system. UNIX and Windows environments both offer this ability through the system function, which accepts a command and its arguments as a string. OpenMSA/OSAMS utilizes this capability to spawn processes.

When a system command is invoked in UNIX, the operating system launches a new shell as a sub-process, executes the command string, and returns control back to the program upon completion of the command. In Windows, the behavior of the system command varies based on the content of the command string. Preceding the command string with cmd.exe invokes the Microsoft Windows command prompt, allows access to shell commands, and executes the command as a sub-process. Alternatively, preceding the command with start executes the command as an independent process in a separate console window. Adding the /B option suppresses the separate console window; this is the closest Windows comes to running a system command in the background as UNIX does.

There are differences in the format of the command string in Windows and UNIX operating systems. In UNIX, a single forward slash is used as the delimiter between directories and filenames in a path. In Windows, the backslash is used as the delimiter, which appears as two backward slashes (\\) when escaped within a C/C++ string. Windows executables most often contain an ".exe" or ".bat" file extension. Thankfully, these extensions do not need to be specified in the system command when launching an executable, since files of these types are automatically identified as executable files per the PATHEXT system environment variable.

Multiple statements can be executed from a single system command. This is often useful for changing directories prior to launching an executable. UNIX separates statements in a command string with a semicolon, whereas Windows uses an ampersand.

Table 6: System command format in UNIX and Windows environments.

ITEM                  UNIX       WINDOWS
Path delimiter        /          \\
Command separator     ;          &
Environment variable  $ENV_NAME  $(ENV_NAME)
Background task       &          Requires using start

Environment variables contained within a command are represented differently. In UNIX, environment variables are preceded by a dollar sign and of the form $ENV_NAME, where ENV_NAME represents the name of an environment variable. In Windows, environment variables are preceded by a dollar sign and enclosed within parentheses, i.e. $(ENV_NAME). Variations in the format of system commands in UNIX and Windows environments are summarized in Table 6.

A wrapper function was developed in OpenMSA/OSAMS to provide multiplatform support for system commands. The function accepts a simpler UNIX-style command string and, when running Windows, converts the format of the string to a valid Windows system command prior to execution. The function does not, however, map shell commands.
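A simplified sketch of such a conversion is shown below. It is illustrative only: it handles just the path delimiter, statement separator, and background marker from Table 6, and is not the actual OpenMSA/OSAMS wrapper (which also handles environment variables and other details).

#include <string>
#include <cstdlib>

// Convert a UNIX-style command string to Windows form and run it via system().
int RunSystemCommand(std::string command) {
#ifdef _WIN32
  bool background = false;
  if (!command.empty() && command[command.size() - 1] == '&') {   // trailing '&' => background task
    background = true;
    command.erase(command.size() - 1);
  }
  for (size_t i = 0; i < command.size(); i++) {
    if (command[i] == '/') command[i] = '\\';    // path delimiter
    if (command[i] == ';') command[i] = '&';     // statement separator
  }
  if (background) command = "start /B " + command;   // run without opening a new console window
#endif
  return std::system(command.c_str());
}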

2.5 Environment Variables

An environment variable is a unique name-value pair contained within each active process. Processes are free to create, access, modify, and remove environment variables using a set of operating system-dependent functions. Environment variables are passed to new processes created via system calls or fork and exec. OpenMSA/OSAMS uses environment variables to retrieve installation paths and pass state variables to new processes created via system calls.

Table 7 lists the functions required to set (create/modify), get (access), and unset (remove) an environment variable in UNIX and Windows systems.

Table 7: Mapping of environment variable functions in UNIX and Windows. Note the SetEnvironmentVariable function has a dual purpose in Windows. Environment variables are unset by passing NULL as the value parameter to the function.

UTILITY                     UNIX      WINDOWS
Set environment variable    setenv    SetEnvironmentVariable
Get environment variable    getenv    GetEnvironmentVariable
Unset environment variable  unsetenv  SetEnvironmentVariable

There are differences between setting and retrieving environment variables in UNIX and Windows. First, Windows does not provide an option to prevent modifying the value of an existing environment variable in a set operation. If the environment variable exists, it will be overwritten. UNIX, however, can provide protection from setting environment variables that are already defined. Second, for security reasons, Windows requires passing in a buffer to retrieve the value of an environment variable. UNIX simply returns a pointer to the environment variable value.

Environment variable wrapper functions were developed to work around these minor nuances. The functions enable the developer to set and retrieve environment variables as strings, integers, and doubles. In Windows, an optional check was implemented to prevent overwriting the value of a pre-existing environment variable. In order to provide a consistent and simpler interface in OpenMSA/OSAMS, the UNIX approach was mimicked for retrieving environment variables by returning a pointer to the variable. Internally, the value of the environment variable is copied into a global buffer and a pointer to the buffer is returned. Therefore, environment variables retrieved as strings should be copied into a local buffer if the value is required beyond the next call to the function.
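The retrieval side of this approach might look roughly like the following sketch (the function name and buffer size are illustrative, not the actual OpenMSA/OSAMS wrapper):

#include <cstdlib>
#ifdef _WIN32
  #include <windows.h>
#endif

// Retrieve an environment variable as a string, mimicking the UNIX getenv
// interface on both platforms. The single static buffer means callers must
// copy the result if they need it beyond the next call.
const char *GetEnvString(const char *name) {
#ifdef _WIN32
  static char buffer[4096];                    // illustrative fixed-size global buffer
  DWORD length = GetEnvironmentVariableA(name, buffer, (DWORD)sizeof(buffer));
  if (length == 0 || length >= sizeof(buffer)) return NULL;   // not set, or value too long
  return buffer;
#else
  return std::getenv(name);                    // UNIX returns a pointer directly
#endif
}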

2.6 Secure String Functions

To improve the security of the Windows operating system, Microsoft deprecated the core string functions in order to reduce the likelihood of buffer overflow. Some of the core string functions, along with their secure equivalents, are listed in Table 8.

Table 8: Mapping of common string functions to their secure equivalent on Windows.

STRING FUNCTION  SECURE EQUIVALENT
strcasecmp       _stricmp
strcpy           strcpy_s
strncpy          strncpy_s
strcat           strcat_s
strncat          strncat_s
strdup           _strdup
sprintf          sprintf_s

The arguments to the _stricmp and _strdup secure string functions are identical to their non-secure counterparts. The remaining secure string functions require the size of the destination buffer to protect against memory corruption. This provides a nice safeguard. However, it is entirely feasible for a program to pass in an incorrect buffer size or an offset destination pointer; the secure functions offer no protection from poor programming.

Warnings regarding the use of Windows secure string functions can be disabled with a compiler option. To provide a robust software port, this option was not used. Instead, macros were implemented in OpenMSA/OSAMS to map the more common string functions to Windows secure string functions. The mappings assumed the destination buffer had enough room for the source data. Ultimately, a program should verify the destination buffer has enough room for the source data prior to the copy.
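The macro approach might look roughly like the following (the macro names are illustrative, not the actual OpenMSA/OSAMS macros; as noted above, the caller is assumed to have sized the destination buffer correctly):

#ifdef _WIN32
  #include <string.h>
  // Map common string calls to their secure Windows equivalents.
  #define STR_COPY(dst, dstSize, src)   strcpy_s((dst), (dstSize), (src))
  #define STR_CAT(dst, dstSize, src)    strcat_s((dst), (dstSize), (src))
  #define STR_CASECMP(a, b)             _stricmp((a), (b))
  #define STR_DUP(s)                    _strdup((s))
#else
  #include <string.h>
  #include <strings.h>                  // strcasecmp
  // On UNIX the buffer size argument is simply ignored.
  #define STR_COPY(dst, dstSize, src)   strcpy((dst), (src))
  #define STR_CAT(dst, dstSize, src)    strcat((dst), (src))
  #define STR_CASECMP(a, b)             strcasecmp((a), (b))
  #define STR_DUP(s)                    strdup((s))
#endif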

2.7 Compilation

The OpenMSA/OSAMS source code was compiled on Fedora Core 9 using the gcc 4.3.0 compiler, Mac OS X using the gcc 4.2.1 compiler, and Microsoft Windows XP and Vista using Microsoft Visual Studio Express 2008. Visual Studio was chosen because it is the most commonly used development environment on Windows.

OpenMSA/OSAMS provides a make system for UNIX installations. Windows offers an nmake utility; however, documentation for the utility was sparse at best, so a decision was made to compile all code from within a Visual Studio project file.

The OpenMSA/OSAMS infrastructure is composed of a single library and various executable programs. In UNIX systems, the source code can be compiled into either a static (.a) or dynamic (.so) library. In Windows systems, the source code can currently be compiled into a static (.lib) library. Compiling the source code into a dynamic (.dll) library in Windows would have required significant modification to the entire source code. For example, every function, class, and global variable that needs to be accessible from a DLL must be preceded by a macro. [4] Linking with a static library results in a larger executable program since all required functionality from the library is included with the executable. However, with the extra footprint comes improved process initialization performance.

Visual Studio was useful in uncovering a few run-time errors that went undetected on the Mac and Linux. These errors involved overstepping array boundaries in recursive routines.

Several unusual compilation and linking errors were encountered during the software port. As a general programming rule, macro names should be defined in all uppercase letters with words separated by underscores to avoid conflict with function and method names. Microsoft does not follow this rule. Microsoft defines several macros using mixed-case letters that unfortunately matched the names of several OpenMSA/OSAMS methods. At compile time, this caused the Microsoft macros to be expanded in place of the OpenMSA/OSAMS methods, resulting in puzzling errors stating the methods were not defined. Specifically, the following Microsoft macros caused conflicts: SendMessage, GetClassName, and SetPort. Rather than rename the OpenMSA/OSAMS methods, the Microsoft macros were un-defined in places where conflicts arose.
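One way to guard against these clashes, sketched below, is to undefine the offending Windows macros after including windows.h in the translation units where the conflicts arise:

#ifdef _WIN32
  #include <windows.h>
  // windows.h defines these as macros (e.g. SendMessageA/SendMessageW), which
  // would otherwise be substituted into identically named OpenMSA/OSAMS methods.
  #undef SendMessage
  #undef GetClassName
  #undef SetPort
#endif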

3.0 Multiplatform Performance

This section discusses the timing results of several parallel and distributed performance benchmarks on four-processor multicore Windows, Linux, and Mac OS X platforms. The machines consisted of a 2.4 GHz quad-core Dell Inspiron 530 dual-booting Windows XP and Fedora Core 9, and a slightly faster 3 GHz four-processor Mac Pro running OS X Leopard. Each machine had a 100 MB/sec network card and more than sufficient memory to run the tests.

Figure 1 indicates the time required to compile the OpenMSA/OSAMS library with and without optimization on each platform. Compiling with optimization typically improves run-time performance and reduces the size of executables; symbol tables used for debugging are stripped out, and compilation generally takes longer. The source code was compiled into a dynamic library on Mac and Linux, and a static library on Windows. Unlike the gcc compiler on Mac and Linux, the Visual Studio compiler was unable to build the library in parallel. More time was required to compile the library unoptimized versus optimized in Windows because Visual Studio was writing to database files to enable run-time checks and more rigorous debugging.

Figure 1: OpenMSA/OSAMS library compilation time.

Figure 2 indicates the size of the library for each platform. Naturally, a static library occupies a larger footprint than a dynamic library. Optimization typically reduces the size of a library because symbol tables supporting debugging and profiling are stripped, but that was clearly not the case with Windows static libraries.

Figure 2: OpenMSA/OSAMS library size.

Figure 3 summarizes timing results for fundamental machine operations. Each operation was performed within an inline function called repeatedly inside a for-loop. By default, Visual Studio does not expand inline functions. Overheads of the for-loop and function call were subtracted from each operation. In comparison to Windows and Mac, Linux compiled optimized proved to be fast when conducting addition, subtraction, and multiplication, but slow when computing transcendental functions, exponentials, logarithms, square roots, and floating-point division. The Mac performed well overall, but was significantly slower than Linux and Windows at floating-point addition, subtraction, and multiplication when compiled optimized. In comparison to Linux, Windows compiled optimized performed well when computing transcendental functions, exponentials, logarithms, square roots, and floating-point division, was comparable to Linux at fixed and floating-point multiplication and fixed-point division, and was slower than Linux at fixed and floating-point addition and subtraction. Each benchmark performed significantly slower when compiled unoptimized versus optimized.

Figure 3: Machine benchmarks.
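A rough sketch of the measurement approach described above is shown below, with loop counts, timing mechanism, and the operation under test chosen for illustration only; it is not the actual benchmark code.

#include <cstdio>
#include <ctime>

inline double NoOp(double a, double b)     { (void)b; return a; }   // measures loop + call overhead only
inline double Multiply(double a, double b) { return a * b; }        // operation under test

int main() {
  const long iterations = 100000000L;
  volatile double sink = 0.0;                // discourage the optimizer from removing the loops

  // Time the loop and function-call overhead with an empty operation.
  clock_t start = std::clock();
  for (long i = 0; i < iterations; i++) sink = NoOp(1.0001, (double)i);
  double overhead = double(std::clock() - start) / CLOCKS_PER_SEC;

  // Time the loop with the real operation, then subtract the overhead.
  start = std::clock();
  for (long i = 0; i < iterations; i++) sink = Multiply(1.0001, (double)i);
  double total = double(std::clock() - start) / CLOCKS_PER_SEC;

  std::printf("multiply: %.2f ns per operation\n",
              (total - overhead) / iterations * 1.0e9);
  return 0;
}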

Figure 4 and Figure 5 summarize performance results for various network communication tests. Each test measures the number of messages sent per second.

Figure 4 shows the results of network message throughput using TCP/IP to communicate between two processes executing on the same machine. For the synchronous ping-pong test, a client sends a 1-byte message to the server and waits for a 1-byte acknowledgement. This cycle is repeated for five seconds to average the measured performance. For the synchronous variable-length data test, the client sends a variable length message to the server and waits for a 1-byte reply. The asynchronous ping-pong test has the client and server sending 1-byte messages to each other without waiting for replies. Linux was nearly twice as efficient as Mac and Windows when sending small messages. Compiling optimized did not have a significant impact on the results.


Figure 4: Network message throughput.

Figure 5 shows the results of a network bandwidth test. The test has the client asynchronously sending 1-million byte messages to the server. For these larger messages, Linux and Mac were three times more efficient than Windows. Compiling optimized did not have a significant impact on the results.

Figure 5: Network bandwidth performance.

Figure 6 and Figure 7 report the timing results and speedup of a simple OpenMSA/OSAMS parallel simulation using one to four nodes. Compiling optimized reduced the execution time by a factor of four in Windows and a factor of two in Linux and Mac. Windows demonstrated near-linear speedup, but when compiled optimized the test executed almost twice as slowly as on Linux and Mac.

Figure 6: Run time results of event processing test.

Figure 7: Scalability of event processing test.

A high-level particle-based transport and dispersion model was developed that leverages many of the modeling constructs provided by OpenMSA/OSAMS. In this model, individual particles are dispersed in response to random wind gusts. Sensors placed about the grid region generate detections as particles come within range. The run-time performance results, shown in Figure 8, were gathered by executing the particle simulation for 10,800 seconds with 40 sensors and 10,000 particles moving among 4,641 grid cells. The simulation executed more than twice as fast on Linux versus Windows. When compiled optimized, the simulation ran roughly twice as fast in Windows and 30% faster in Mac and Linux.


Figure 8: Chem/Bio particle simulation run-time performance.

Figure 9 shows the speedup of the OpenMSA/OSAMS particle simulation. Interestingly enough, the simulation showed identical and near-linear speedup in Linux, Mac, and Windows platforms. However, keep in mind that Linux and Mac exhibited faster run times than Windows.

Figure 9: Chem/Bio particle simulation scalability.

Tests were conducted to measure the performance of various shared memory communication operations between four nodes on the Mac, Windows, and Linux platforms. The operations consist of synchronization services, reduction algorithms, data distribution mechanisms, and message passing services. Synchronization services are used to coordinate multiple processes by either halting the execution of each node until all nodes have reached the same point (barrier synchronization), or signifying when each has progressed to the same point (fuzzy synchronization). Reduction algorithms allow applications to synchronously obtain the sum, maximum, or minimum of values passed as arguments to each node. Data distribution mechanisms are used to partition, distribute, and collect data between nodes. Message passing services allow applications to send messages in an asynchronous or coordinated manner. Results of the shared memory communication operations between four nodes on the Mac are given in Figure 10.

Figure 10: Mac 4 nodes.

Results of the shared memory communication operations between four nodes on Windows are given in Figure 11.

Figure 11: Windows 4 nodes.

Results of the shared memory communication operations between four nodes on Linux are given in Figure 12. Overall, operations took the least amount of time to execute on the Linux platform. Operations required roughly 50% more time to complete on the Mac. On Windows, reduction operations were on par with Linux but all other operations were twice as slow.


Figure 12: Linux 4 nodes.

Figure 13, Figure 14, and Figure 15 show the results of shared memory communication bandwidth tests conducted between four nodes on the Mac, Windows, and Linux platforms. Each platform tested both asynchronous and coordinated message passing via unicast, destination-based multicast, and broadcast communication modes. Unicast sends a message from one node to another node, destination-based multicast sends a message to multiple nodes, and broadcast sends a message to all other nodes. Message sizes ranged from 1 to 100,000 bytes. Results of the bandwidth tests using four nodes on the Mac are given in Figure 13.

Figure 13: Message bandwidth on Mac using 4 nodes.

Results of the bandwidth tests using four nodes on Windows are given in Figure 14.

Figure 14: Message bandwidth on Windows using 4 nodes.

Results of the bandwidth tests using four nodes on Linux are given in Figure 15. Overall, bandwidth was roughly equivalent across Mac, Windows, and Linux platforms for messages between 1 and 1,000 bytes. However, the performance varied between systems for message sizes beyond 1,000 bytes. Bandwidth peaked for messages around 9,000 bytes in size on the Mac. On Linux, bandwidth appeared to top off as the message size approached 30,000 bytes. On Windows, unicast message bandwidth peaked around 1,100 bytes, multicast message bandwidth peaked around 12,000 bytes, and broadcast messages peaked around 11,000 bytes.

Figure 15: Message bandwidth on Linux using 4 nodes.

4.0 Summary and Conclusions

This paper discussed issues encountered in porting the OpenMSA/OSAMS high-performance discrete-event modeling and simulation architecture from UNIX (Linux, Mac OS X, Solaris, HP-UX, etc.) to the Microsoft Windows (XP and Vista) operating system. Differences in process creation and inter-process communication were discussed, in addition to a cross-platform solution that worked around Windows' inability to fork a process. Overall, performance benchmarks indicated that a typical discrete-event simulation experiment demonstrated scalable speedup on both UNIX and Windows multicore systems. The execution time, however, was roughly twice as slow on Windows as on UNIX multicore systems. Communications did not appear to be the bottleneck; rather, the problem resided in the slower code generated by the Visual Studio compiler.

5.0 Acknowledgements

This work was sponsored by the Joint Program Executive Office for Chemical and Biological Defense (JPEO-CBD), Software Support Activity (SSA) through the Space and Naval Warfare Systems Center, Pacific (SSC-Pacific) in San Diego, CA. The authors of this paper would like to thank Alex Vagus for his contributions in helping make this effort successful.

6.0 Author Biographies

CRAIG N. LAMMERS is a Vice President and Senior Analyst at WarpIV Technologies, Inc. He is currently providing technical support for the JPEO-CBD, SSA, and is the program manager of an effort with AFRL, Rome Labs conducting new research and development in parallel and distributed simulation to support formal estimation and prediction of complex systems using the WarpIV Kernel. His professional interests are in the areas of modeling and simulation, artificial intelligence, user interface design, and music. Mr. Lammers received his Bachelor of Science degree in Industrial and Systems Engineering from the University of Michigan, Dearborn in 2001, where he graduated with high honors.

MARIA E. VALINSKI is a Vice President and Senior Software Engineer at WarpIV Technologies, Inc. She is currently providing modeling, simulation, and integration support for the JPEO-CBD, SSA, and is the lead software engineer for the Joint Virtual Network Centric Warfare (JVNCW) program that involves modeling wireless communications on supercomputers using the WarpIV Kernel for the Navy, Army, and Air Force. Ms. Valinski received her Bachelor of Science degree in Computer Science from East Stroudsburg University in 1997.

JEFFREY S. STEINMAN, President and CEO of WarpIV Technologies, Inc., received his Ph.D. in High Energy Physics from UCLA in 1988. In 1990, Dr. Steinman developed the Synchronous Parallel Environment for Emulation and Discrete Event Simulation (SPEEDES) framework that eventually transitioned to industry and several mainstream programs including BMDS SIM, JMASS, JSIMS, and EADTB. Dr. Steinman currently provides technology and open systems architecture support for the JPEO-CBD, SSA, where he is helping to establish and implement its long-range M&S strategic plan. Dr. Steinman has published more than 60 papers and articles in the field of high performance simulation, has five U.S. patents in high performance M&S technology, is a frequent invited speaker at conferences, seminars, and forums, and is the principal designer/developer of the open source WarpIV Kernel Reference Implementation of the OpenMSA, OSAMS, and OpenCAF architectures. Dr. Steinman is currently heading the newly formed Parallel and Distributed Modeling and Simulation Standing Study Group (PDMS-SSG) that operates within the Simulation Interoperability Standards Organization (SISO).

7.0 References

[1] "Porting Socket Applications to Winsock." Available: http://msdn.microsoft.com/en-us/library/ms740096.aspx

[2] Foster, James and Stuart McClure, 2005. "Sockets, Shellcode, Porting & Coding." Syngress Publishing, Rockland, MA.

[3] Hart, Johnson M., 2005. "Windows System Programming, Third Edition." Addison-Wesley Professional.

[4] Richter, Jeffrey and Christophe Nasarre, 2007. "Windows via C/C++." Microsoft Press.

[5] Steinman, Jeff, 2005. "WarpIV Kernel: High Speed Communications."

[6] Stevens, W. Richard, 1998. "UNIX Network Programming, Volume 1, Second Edition: Networking APIs: Sockets and XTI." Prentice Hall PTR, Upper Saddle River, NJ.