1 PDMS – 2 Hour Tutorial



Page 2: 1 PDMS – 2 Hour Tutorial

•  Multicore computing revolution
   –  The need for change…
•  Proposed Open Unified Technical Framework (OpenUTF) architecture standards
   –  OpenMSA, OSAMS, OpenCAF as future standards
•  Introduction to parallel computing
   –  Programming models
   –  High Speed Communications (HSC) through shared memory
•  Synchronization and Parallel Discrete Event Simulation (PDES)
   –  Event Management
   –  Time Management
•  Open discussion

Page 3: 1 PDMS – 2 Hour Tutorial

MULTICORE Future of computing is…


Page 4: 1 PDMS – 2 Hour Tutorial


"I skate to where the puck is going to be, not where it has been!" (Wayne Gretzky)

Page 5: 1 PDMS – 2 Hour Tutorial

•  Performance wall
   –  Clock speed and power consumption
   –  Memory access bottlenecks
   –  Single instruction level parallelism
•  Multiple processors (cores) on a single chip is the future
   –  No foreseeable limit to the number of cores per chip
   –  Requires software to be written differently
•  Supercomputing community consensus: low-level parallel programming is too hard
   –  Threads, shared memory, locks/semaphores, race conditions, repeatability, etc., are too hard and expensive to develop and debug (fine-grained HPC is not for your average programmer)
   –  Message passing is much easier but can be less efficient
   –  High-level approaches, tools, and frameworks are needed (OpenUTF, new compilers, languages, math libraries, memory management, etc.)

Page 6: 1 PDMS – 2 Hour Tutorial

[Diagram: hardware hierarchy. A computer/blade/cluster contains boards; each board contains chips; each chip contains multiple processing nodes (cores). The whole feeds into cloud computing, net-centric, GIG, systems-of-systems environments.]

World of computing is rapidly changing and will soon demand new parallel and distributed service-oriented programming methodologies and technical frameworks.

Experts say that parallel and distributed programming is too hard for normal development teams. The Open Unified Technical Framework abstracts low-level programming details.

Page 7: 1 PDMS – 2 Hour Tutorial

•  Microsoft
   –  Sponsor of the by-invitation-only 2007 Manycore Computing Workshop that brought together the who's who of supercomputing
   –  Unanimous consensus on the need for multicore computing software tools and frameworks for developers (e.g., OpenUTF)
•  Apple
   –  Snow Leopard will have no new features (focus on multicore computing)
   –  The next version of Apple's OS X operating system will include breakthroughs in programming parallel processors, Apple CEO Steve Jobs told The New York Times in an interview after this week's Worldwide Developers Conference. "The way the processor industry is going is to add more and more cores, but nobody knows how to program those things," Jobs said. "I mean, two, yeah; four, not really; eight, forget it."

http://bits.blogs.nytimes.com/2008/06/10/apple-in-parallel-turning-the-pc-world-upside-down/

Page 8: 1 PDMS – 2 Hour Tutorial

Next generation chips

Intel has disclosed details on a chip that will compete directly with Nvidia and ATI and may take it into uncharted technological and market-segment waters. Larrabee will be a stand-alone chip, meaning it will be very different from the low-end (but widely used) integrated graphics that Intel now offers as part of the silicon that accompanies its processors. And Larrabee will be based on the universal Intel x86 architecture.

…The number of cores in each Larrabee chip may vary, according to market segment. Intel showed a slide with core counts ranging from 8 to 48, claiming performance scales almost linearly as more cores are added: that is, 16 cores will offer twice the performance of eight cores.

http://i4you.wordpress.com/2008/08/05/intel-details-future-larrabee-graphics-chip


Page 9: 1 PDMS – 2 Hour Tutorial

Next generation chips

Intel touts 8-core Xeon monster Nehalem-EX

Intel gave a demo yesterday of its eight-core, 2.3 billion-transistor Nehalem-EX, which is set to launch later this year… Nehalem EX has up to 8 cores, which gives a total of 16 threads per socket.

By Jon Stokes | Last updated May 28, 2009 8:25 AM CT

http://arstechnica.com/hardware/news/2009/05/intel-touts-8-core-xeon-monster.ars


Page 10: 1 PDMS – 2 Hour Tutorial

COMPOSABLE SYSTEMS Open Unified Technical Framework (OpenUTF)…

[Diagram: the OpenUTF Kernel hosting Model Components and Service Components.]

Page 11: 1 PDMS – 2 Hour Tutorial

•  Simulation is not as cost effective as it should be – we need to do things differently… Revolutionary, not evolutionary change!

•  Multicore computing revolution demands change in software development methodology – need standardized framework

•  New architecture standards – we should be building models, not simulations

•  Model and Service components developed in a common framework – automates integration for Test and Evaluation

•  Verification and Validation – need a common test framework with standard processes

•  Open source – Overcomes the technology/cost barrier and supports widespread community involvement


Page 12: 1 PDMS – 2 Hour Tutorial

[Figure: communication latency scale, from 10 ms down through 1 ms, 100 µs, 10 µs, 1 µs, 100 ns, and 10 ns to 1 ns.]

Page 13: 1 PDMS – 2 Hour Tutorial

Requires assessment of the current state Existing tools, technologies, methodologies, data models, existing interfaces, policies, requirements, business models, contract language, lessons learned, impediments to progress, etc.

Requires the right vision for the future Lowered costs, better quality, faster end-to-end execution, easier to use and maintain, feasible technology, optimal use of workforce skill sets, multiuse concepts, composability, modern computational architectures, multiplatform, net-centric, etc.

Requires an executable transition strategy Incremental evolution, risk reduction, phased capability, accurately assessed transition costs, available funding, prioritization, community buy-in and participation, formation of new standards


Page 14: 1 PDMS – 2 Hour Tutorial

1.  Engine and Model Separation
2.  Optimized Communications
3.  Abstract Time
4.  Scheduling Constructs
5.  Time Management
6.  Encapsulated Components
7.  Hierarchical Composition
8.  Distributable Composites
9.  Abstract Interfaces
10.  Interaction Constructs
11.  Publish/Subscribe
12.  Data Translation Services
13.  Multiple Applications
14.  Platform Independence
15.  Scalability
16.  LVC Interoperability Standards
17.  Web Services
18.  Cognitive Behavior
19.  Stochastic Modeling
20.  Geospatial Representations
21.  Software Utilities
22.  External Modeling Framework
23.  Output Data Formats
24.  Test Framework
25.  Community-wide Participation

Page 15: 1 PDMS – 2 Hour Tutorial

•  OpenMSA – Layered Technology
   –  Focuses on parallel and distributed computing technologies
   –  Modularizes technologies through a layered architecture
   –  Contains OSAMS and OpenCAF
   –  Proven technologies based on experience with large programs
   –  Cost-effective strategy for developing scalable computing technology
   –  Provides interoperability without sacrificing performance
   –  Facilitates sequential, parallel, and distributed computing paradigms
•  OSAMS – Model/Service Composability
   –  Focuses on interfaces and software development methodology to support highly interoperable plug-and-play model/service components
   –  Provided by OpenMSA but could be supported by other architectures
•  OpenCAF – Cognitive Intelligent Behavior
   –  Thoughts and stimulus, goal-oriented behaviors, decision branch exploration, five-dimensional excursions
   –  Provided as an extension to OSAMS

Page 16: 1 PDMS – 2 Hour Tutorial

[Diagram: the OpenUTF standards stack.]

•  OpenUTF: Architecture, Standards, Net-centricity, Data Models
•  OpenMSA: Open Source, Technology, HPC/Multicore, Performance, Synchronization
•  OSAMS: Modularity, Composability, Interoperability, Flexibility, Programming Constructs, VV&A
•  OpenCAF: Behaviors, Cognitive Thought Processes, 5D Simulation, Goal-oriented Optimization
•  Supporting layers: HPC, Network, Scheduling, Modeling Framework, Services, Models, Behavior Representation, Cognitive Rule Triggering, Bayesian Branching, Goals and State Machines, Decision Support, Composites, Pub/Sub Services, LVC Interoperability, Web-based SOA

Page 17: 1 PDMS – 2 Hour Tutorial

[Diagram: OpenMSA layered architecture, bottom to top.]

•  Operating System Services, Threads
•  General Software Utilities (OSAMS), ORB Network Services
•  Internal High Speed Communications, External Distributed Communications, Rollback Framework
•  Rollback Utilities (OSAMS), Persistence (OSAMS), Standard Template Library (OSAMS)
•  Event Management Services, Time Management
•  Standard Modeling Framework (OSAMS, OpenCAF)
•  Distributed Simulation Management Services (OSAMS – Pub/Sub Data Distribution), SOM/FOM Data Translation Services
•  External Modeling Framework (EMF) & Distributed Blackboard
•  Gateway Interfaces (HLA, DIS, TENA, Web-based SOA), HPC-RTI Bridge
•  Model & Service Component Repository, Entity Composite Repository, CASE Tools
•  Direct Federate, Abstract Federate, HLA Federate
•  LVC – Federation & Enterprises; External System Visualization/Analysis

Page 18: 1 PDMS – 2 Hour Tutorial

[Diagram: the OpenCAF Reasoning Engine. Stimulus/perception (short-term memory) triggers Thoughts 1 through N; data is received as federation objects and/or interactions; data processing covers behaviors, tasks, notifications, abstract methods, and uncertainty; prioritized goals drive state, action, and task management, with tasks supporting 5D branching.]

Page 19: 1 PDMS – 2 Hour Tutorial

[Diagram: a rule-based reasoning network; inputs W, X, Y, Z feed reasoning nodes A, B, C to produce outputs.]

•  Left brain reasoning
•  Inputs are ints, doubles, or Booleans
•  Inputs are prioritized when they are associated with RBRs
•  Inputs can be fed into multiple reasoning nodes
•  Outputs can be inputs to other reasoning nodes
•  Feedback loops are permitted

Based on the OpenUTF Kernel Sensitivity List:
•  Sensitive variables (stimulus) are registered with sensitive methods (thoughts)
•  Thoughts are automatically triggered whenever registered stimulus is modified
•  Thoughts can modify other stimulus to trigger additional thoughts
•  Terminates when the solution converges or when reaching max thoughts
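The sensitivity-list bullets above can be sketched in a few lines of C++. This is an illustrative toy, not the OpenUTF API: `SensitivityList`, `Register`, `Modify`, and `Run` are hypothetical names, but the trigger-until-convergence-or-cap loop follows the description.

```cpp
#include <functional>
#include <map>
#include <set>
#include <vector>

// Hypothetical sensitivity list: thoughts (callbacks) are registered
// against stimulus variables; modifying a variable queues every thought
// registered on it, and thoughts may modify other stimulus in turn.
class SensitivityList {
 public:
  using Thought = std::function<void()>;
  void Register(int stimulusId, Thought t) { thoughts_[stimulusId].push_back(t); }
  void Modify(int stimulusId) { pending_.insert(stimulusId); }
  // Fires thoughts until no stimulus is pending (convergence) or the cap
  // on total thoughts is reached; returns how many thoughts fired.
  int Run(int maxThoughts) {
    int fired = 0;
    while (!pending_.empty() && fired < maxThoughts) {
      int id = *pending_.begin();
      pending_.erase(pending_.begin());
      for (auto& t : thoughts_[id]) { t(); ++fired; }
    }
    return fired;
  }
 private:
  std::map<int, std::vector<Thought>> thoughts_;
  std::set<int> pending_;
};
```

A thought on stimulus 0 that modifies stimulus 1 triggers the thought registered on stimulus 1, mirroring "thoughts can modify other stimulus to trigger additional thoughts".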


Page 20: 1 PDMS – 2 Hour Tutorial

[Diagram: a training-based reasoning network; inputs W, X, Y, Z feed reasoning nodes A, B, C to produce outputs.]

•  Learned reasoning
•  Inputs are ints, doubles, or Booleans
•  TBR is trained and then utilized to produce outputs (can be continually trained during execution)
•  Inputs can be fed into multiple reasoning nodes
•  Outputs can be inputs to other reasoning nodes
•  Feedback loops are permitted

Page 21: 1 PDMS – 2 Hour Tutorial

ω_W + ω_X + ω_Y + ω_Z = 1

A = [ω_W·Ŵ + ω_X·X̂ + ω_Y·Ŷ + ω_Z·Ẑ] × T_W2·T_X1·T_Y1·T_Z3

•  Right brain reasoning
•  Inputs are normalized, weighted, and summed
•  Sum is multiplied by the product of thresholds to produce the output
•  Output is normalized
•  Inputs can be fed into multiple reasoning nodes
•  Outputs can be inputs to other reasoning nodes
•  Feedback loops are permitted

[Diagram: inputs W, X, Y, Z pass through weights ω and piecewise threshold functions T (each mapping into the range 0 to 1) to produce output A.]
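The node equations above transcribe into a small function. This is a sketch under stated assumptions: inputs are already normalized to [0, 1], the weights sum to 1, and the function name is illustrative.

```cpp
#include <cmath>
#include <vector>

// Right-brain reasoning node: weighted sum of normalized inputs,
// multiplied by the product of the per-input thresholds, then clamped
// so the output itself stays normalized to [0, 1].
double RightBrainNode(const std::vector<double>& weights,
                      const std::vector<double>& normInputs,
                      const std::vector<double>& thresholds) {
  double sum = 0.0;
  for (size_t i = 0; i < weights.size(); ++i) sum += weights[i] * normInputs[i];
  double prod = 1.0;
  for (double t : thresholds) prod *= t;
  double a = sum * prod;
  return std::fmin(1.0, std::fmax(0.0, a));  // normalize the output
}
```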


Page 22: 1 PDMS – 2 Hour Tutorial

•  Arbitrary graphs can be constructed from Rules, Neural Nets, and Emotions
•  Outputs of graphs can trigger changes to behaviors by reprioritizing goals
•  Behaviors are only triggered once reasoning is completed

[Diagram: a reasoning graph mixing Rule Based Reasoning, Training Based Reasoning, and Emotion Based Reasoning nodes.]

Page 23: 1 PDMS – 2 Hour Tutorial

[Diagram: evolution from monolithic applications (collections of hardwired services) and monolithic simulations (collections of hardwired models) to a net-centric enterprise framework of composable systems (LVC, Web, GCCS, data, visualization) built from plug-and-play OpenUTF Kernel service components, model components, abstract interfaces, and a V&V test framework.]


Page 25: 1 PDMS – 2 Hour Tutorial

•  Reusable Software Components
•  Plug and Play Composability
•  Conceptual Model Interoperability
•  Pub/Sub Data Distribution & Abstract Interfaces
•  V&V Test Framework
•  Performance Benchmarks

•  Parallel and Distributed Operation
•  Scalable Run-time Performance
•  Platform/OS Independence
•  OpenMSA: Technology
•  OSAMS: Modeling Constructs
•  OpenCAF: Behavior Representation

•  Composable Systems
•  LVC (HLA, DIS, TENA)
•  Web Services (SOA)
•  Data Model
•  C4I/GCCS
•  Visualization and Analysis

Page 26: 1 PDMS – 2 Hour Tutorial

[Diagram: the OpenUTF Kernel at the center of a composable system of plug-and-play model/service components, supporting:]

•  Net-centric operation: enterprise frameworks, command and control, standard data models
•  Legacy interoperability: distributed federation; training, analysis, test; FOM/SOM
•  Standalone operation: laptops, desktops, clusters, HPC; pub/sub data distribution

Page 27: 1 PDMS – 2 Hour Tutorial

•  Transparently hosts hierarchical services using the same interfaces as model components

•  SOAP interface connects services to external applications

•  Collections of related services are dynamically configured and distributed across processors on multicore systems

•  Services internally communicate through pub/sub services and decoupled abstract interfaces

•  Seamlessly supports LVC integration


[Diagram: a composite net-centric system on a multicore computer. Services communicate through pub/sub data exchanges (subscribed data received, published data provided) and abstract interfaces (provided and invoked). Composites are distributed across processors to achieve parallel performance. The dynamically configured structure connects through Web Services and an LVC interface to net-centric SOA/LVC on networks of single-processor and multicore computers.]

Page 28: 1 PDMS – 2 Hour Tutorial

[Diagram: the OpenUTF component repository under a global installation & make system. Model components (DAS, ETS, T&D, Weather, CCSI, ATP-45) and service components expose interfaces through polymorphic methods, interactions, federation objects, XML interfaces, and Web Services. Each component carries source/include/library artifacts plus verification, validation, and benchmark tests. The OpenUTF Kernel itself is roughly 320,000 lines of code.]

•  General concept…
   –  Government-maintained software configuration management
   –  Automatic platform-independent installation & make system
   –  Test framework (verification, validation, and benchmarks)
   –  Will seamlessly support mainstream interoperability standards
   –  Designed for secure community-wide software distribution

Page 29: 1 PDMS – 2 Hour Tutorial

[Diagram: the OpenUTF Kernel surrounded by LVC interoperability standards, Web standards, models, services, the V&V test framework, data & interfaces, and development, composability, visualization, and analysis tools.]

Page 30: 1 PDMS – 2 Hour Tutorial

PARALLEL COMPUTING Introduction to…

•  16-node hypercube topology: log2(N) worst-case hops
•  2D mesh topology: (m+n) worst-case hops
•  3D mesh topology: (l+m+n) worst-case hops
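The worst-case hop counts above transcribe directly into code, exactly as stated on the slide (N nodes for the hypercube; per-dimension extents l, m, n for the meshes):

```cpp
#include <cmath>

// Worst-case hops in a hypercube: log2(N), one hop per address bit in
// which two node addresses can differ. Rounded to the nearest integer.
int HypercubeWorstHops(int numNodes) {
  return static_cast<int>(std::log2(numNodes) + 0.5);
}

// Worst-case hops in 2D and 3D meshes, per the slide's (m+n) and (l+m+n).
int Mesh2DWorstHops(int m, int n) { return m + n; }
int Mesh3DWorstHops(int l, int m, int n) { return l + m + n; }
```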

Page 31: 1 PDMS – 2 Hour Tutorial

[Diagram: a typical parallel program. After startup, each of the N nodes repeats a process cycle of initialize, compute, communicate, and store results to file.]

Page 32: 1 PDMS – 2 Hour Tutorial

•  Parallel computing vs. distributed computing
   –  Parallel computing maps computations, data, and/or object instances within an application to multiple processors to obtain scalable speedup
      •  Normally occurs on a single multicore computer, but can operate across multiple machines
      •  The entire application crashes if one node or thread crashes
   –  Distributed computing interconnects loosely coupled applications within a network environment to support interoperable execution
      •  Normally occurs on multiple networked machines, but can operate on a single multicore computer
      •  Dynamic connectivity supports fault tolerance but loses scalability

•  Speedup(N) = T1 / TN
•  Efficiency(N) = Speedup(N) / N
•  RelativeEfficiency(M,N) = (M / N) × [Speedup(N) / Speedup(M)]
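The three scalability metrics transcribe directly:

```cpp
// Speedup on N processors: single-processor time over N-processor time.
double Speedup(double t1, double tN) { return t1 / tN; }

// Fraction of ideal linear speedup achieved on n processors.
double Efficiency(double t1, double tN, int n) { return Speedup(t1, tN) / n; }

// Relative efficiency when scaling from m to n processors.
double RelativeEfficiency(int m, int n, double speedupM, double speedupN) {
  return (static_cast<double>(m) / n) * (speedupN / speedupM);
}
```

For example, a run taking 100 s on one node and 25 s on eight nodes has a speedup of 4 and an efficiency of 0.5.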


Page 33: 1 PDMS – 2 Hour Tutorial

•  Time driven (or time stepping) is the simplest approach:

   for (double time = 0.0; time < END_TIME; time += STEP) {
       UpdateSystem(time);
       Communicate();
   }

•  The discrete event approach (or event stepping) manages activities within the system more efficiently
   –  Events occur at a point in time and have no duration
   –  Events do not have to correspond to physical activities (pseudo-events)
   –  Events occur for individual object instances, not for the entire system
   –  Events, when processed, can modify state variables and/or schedule new events

•  Parallel discrete event simulation offers unique synchronization challenges…
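The event-stepping bullets above can be sketched as a minimal sequential event loop. This is illustrative only; the OpenUTF engine layers rollback, GVT, and parallel synchronization on top of the same idea.

```cpp
#include <functional>
#include <queue>
#include <vector>

// Minimal discrete-event engine: events are (time, action) pairs with no
// duration; processing an event may modify state and/or schedule new events.
class EventQueue {
 public:
  using Action = std::function<void(EventQueue&, double)>;
  void Schedule(double time, Action a) { q_.push({time, a}); }
  double Run() {                 // returns the time of the last event
    double now = 0.0;
    while (!q_.empty()) {
      Event e = q_.top();
      q_.pop();
      now = e.time;              // events occur at a point in time
      e.act(*this, now);         // may schedule further events
    }
    return now;
  }
 private:
  struct Event {
    double time;
    Action act;
    // Inverted comparison so the priority queue pops the EARLIEST event.
    bool operator<(const Event& o) const { return time > o.time; }
  };
  std::priority_queue<Event> q_;
};
```

Events run in time order regardless of scheduling order, and an event processed at t = 1 can schedule a new event at t = 3 that runs before a previously scheduled event at t = 5.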


Page 34: 1 PDMS – 2 Hour Tutorial

•  Distributed net-centric computing
   –  Programs communicate through a network interface
      •  TCP/IP, HTTPS, SOA and Web Services, Client/Server, CORBA, Federations, Enterprises, Grid Computing, NCES, etc.
•  Parallel multicore computing
   –  Processors directly communicate through high-speed mechanisms
      •  Threads, shared memory, message passing

[Diagram: progression from a sequential program to multithreaded, shared memory, and message passing parallelism.]

Page 35: 1 PDMS – 2 Hour Tutorial

[Diagram: parallel applications running across shared memory servers connected through a cluster server.]

Page 36: 1 PDMS – 2 Hour Tutorial

•  Startup and terminate
   –  Forks processes
   –  Cleans up shared memory
•  Miscellaneous services
   –  Node info, shared memory tuning parameters, etc.
•  Synchronization
   –  Hard and fuzzy barriers
•  Global reductions
   –  Min, Max, Sum, Product, etc.
   –  Performance statistics
   –  Can support user-defined operations
•  Synchronized data distribution
   –  Broadcast, Scatter, Gather, Form Matrix
•  Asynchronous message passing
   –  Unicast, destination-based multicast, broadcast
   –  Automatic or user-defined memory allocation
   –  Up to 256 message types
•  Coordinated message passing
   –  Patterned after the Crystal Router
   –  Synchronized operation guarantees all messages received by all nodes
   –  Unicast, destination-based multicast, broadcast
•  ORB services
   –  Remote asynchronous method invocation with user-specified interfaces


Page 37: 1 PDMS – 2 Hour Tutorial

[Diagram: example of a global synchronization on five processing nodes. Partial results are combined across stages 0 through 3; each node waits until the operation completes, after which every node holds the final result.]


Page 39: 1 PDMS – 2 Hour Tutorial

One shared memory block per node: slots (a circular buffer, one per sending node) manage incoming messages for each node, and a circular buffer of output messages, with head and tail pointers, manages outgoing messages.

Steps in sending a message:
1.  Write the header and message at the head of the sender's output message buffer.
2.  Write the index of the message header into the receiving node's shared memory slot for the sender's node.

Steps in receiving a message:
1.  Iterate over the slot managers to find messages
2.  Read the message using the index in the slot
3.  Mark the header as read

Potential technical issues: cache coherency, instruction synchronization
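The head/tail mechanics of the outgoing message buffer can be sketched as a single-producer/single-consumer ring. This is a simplified illustration; the real shared-memory version also needs the memory fences implied by the cache-coherency and instruction-ordering issues noted above.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// SPSC circular buffer: the sender writes at head, the receiver reads at
// tail. One slot is left empty so full and empty states are distinguishable.
class Ring {
 public:
  explicit Ring(size_t capacity) : buf_(capacity + 1) {}
  bool Send(int msg) {                        // false if head would chase tail
    size_t next = (head_ + 1) % buf_.size();
    if (next == tail_) return false;          // full
    buf_[head_] = msg;
    head_ = next;
    return true;
  }
  std::optional<int> Receive() {              // empty when tail catches head
    if (tail_ == head_) return std::nullopt;
    int msg = buf_[tail_];
    tail_ = (tail_ + 1) % buf_.size();
    return msg;
  }
 private:
  std::vector<int> buf_;
  size_t head_ = 0, tail_ = 0;
};
```

The two failure states mirror the slide's two pictures: the tail chasing the head during normal draining, and the head chasing the tail when the buffer fills.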

Page 40: 1 PDMS – 2 Hour Tutorial

[Diagram: two circular buffer states, each with head and tail pointers: the tail chasing the head, and the head chasing the tail.]

Page 41: 1 PDMS – 2 Hour Tutorial

Message format: a sequence of headers (Header 1, Header 2, …, Header n), each followed by its message data.

Header format:

   int NumBytes
   int Index
   unsigned short Packet
   unsigned short NumPackets
   char DummyChar0
   char DummyChar1
   char DummyChar2
   char ReadFlag


Page 44: 1 PDMS – 2 Hour Tutorial

SYNCHRONIZATION Parallel Discrete Event Simulation (PDES)…


Page 45: 1 PDMS – 2 Hour Tutorial

•  Standardized processing cycle interfaces to support any time management algorithm
   –  Uses virtual functions on the scheduler to specialize processing steps
   –  Supports reentrant applications (e.g., HPC-RTI, graphical interfaces, etc.)
•  Highly optimized internal algorithms for managing events
   –  Optimized and flexible event queue infrastructure
   –  Native support for sequential, conservative, and optimistic processing
   –  Internal usage of free lists to reduce memory allocation overheads
   –  Optimized memory management with high speed communications
•  Statistics gathering and debug support
   –  Rollback and rollforward application testing
   –  Automatic statistics gathering (live critical path analysis, message statistics, event processing and rollbacks, memory usage, etc.)
   –  Merged trace file generation for debugging parallel simulations that can be tailored to include rollback information, performance data, and user output

Page 46: 1 PDMS – 2 Hour Tutorial

•  Time management modes are generically implemented through class inheritance from WpScheduler
   –  OpenMSA provides a generic framework to support basic parallel and distributed event processing operations, which makes it easy to implement new time management algorithms
   –  OpenMSA creates the object implementing the requested time management algorithm at run time
   –  The base class WpScheduler provides generic event management services for sequential, conservative, and optimistic processing
   –  The WpWarpSpeed, WpSonicSpeed, WpLightSpeed, and WpHyperWarpSpeed time management objects inherit from WpScheduler to implement their specific event processing and synchronization algorithms

Page 47: 1 PDMS – 2 Hour Tutorial

main {
    Plug in User SimObjs
    Plug in User Components
    Plug in User Events
    Execute
}

Execute {
    Initialize
    Process Up To (End Time)
    Terminate
}

Initialize {
    Launch processes
    Establish Communications
    Construct/Initialize SimObjs
    Schedule Initial Events
}

Process Up To (Time) {
    while (GVT < Time) {
        Process GVT Cycle
    }
}

Process GVT Cycle {
    Process Events & User Functions
    Update GVT
    Commit Events
    Print GVT Statistics
}

Terminate {
    Terminate All SimObjs
    Print Final Statistics
    Shut Down Communications
}


Page 50: 1 PDMS – 2 Hour Tutorial

[Diagram: per-object event management. Each logical process (i.e., simulation object) keeps a doubly linked list of processed events, a priority queue of future pending events, and a rollback queue ordered by simulation time, fed by event messages. The scheduler is a priority queue of logical processes ordered by next event time.]

Page 51: 1 PDMS – 2 Hour Tutorial

•  The priority queue uses a new self-correcting tree data structure that employs a heuristic to keep the tree roughly balanced
   –  The tree data structure efficiently supports three critical operations
      •  Element insertion in O(log2(n)) time
      •  Element retraction in O(log2(n)) time
      •  Element removal in O(1) time
   –  Does not require storage of additional information in tree nodes to keep the tree balanced
      •  Tracks depth on insert and find operations to adjust tree organization through specially combined multi-rotation operations
      •  Goal is to minimize long left/left and/or right/right chains of elements in the tree
   –  Competes with the STL Red-Black Tree
      •  Beats STL when compiled unoptimized
      •  Slightly worse than STL when compiled optimized

Page 52: 1 PDMS – 2 Hour Tutorial

OptimalDepth = log2(NumElements)
NumRotations = ActualDepth − OptimalDepth

Rotation heuristic decreases depth to keep the tree roughly balanced.
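The heuristic transcribes directly: a node found or inserted deeper than the optimal depth earns that many rotations toward the root (function name illustrative).

```cpp
#include <cmath>

// Number of corrective rotations per the formulas above:
// OptimalDepth = log2(NumElements); NumRotations = ActualDepth - OptimalDepth.
// A node already at or above optimal depth needs no rotations.
int NumRotations(int actualDepth, int numElements) {
  int optimal = static_cast<int>(std::log2(numElements));
  int r = actualDepth - optimal;
  return r > 0 ? r : 0;
}
```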

Page 53: 1 PDMS – 2 Hour Tutorial

•  Rollback Manager
   –  Manages the list of rollbackable items that were created as rollbackable operations were performed
   –  Each event provides a rollback manager
      •  A global pointer is set before the event is processed
      •  Rollbacks are performed in reverse order to undo operations
•  Rollback Items
   –  Each rollbackable operation generates a Rollback Item that is managed by the Rollback Manager
      •  Rollback utilities include (1) native data types, (2) memory operations, (3) container classes, (4) strings, and (5) various miscellaneous operations
   –  Rollback Items inherit from the base class to provide four virtual functions
      •  Rollback, Rollforward, Commit, Uncommit
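A minimal sketch of the rollback-item pattern described above. The class and method names are illustrative, not the actual OpenUTF API; it shows one rollbackable operation (assignment to a native double) with undo performed in reverse order.

```cpp
#include <memory>
#include <vector>

// Base class for rollbackable operations (Commit/Uncommit omitted here).
struct RollbackItem {
  virtual ~RollbackItem() = default;
  virtual void Rollback() = 0;      // undo the operation
  virtual void Rollforward() = 0;   // redo the operation
};

class RollbackManager {
 public:
  // Record an assignment to a native variable so it can be undone/redone.
  void Assign(double& var, double newValue) {
    struct Item : RollbackItem {
      double* var; double oldValue, newValue;
      void Rollback() override { *var = oldValue; }
      void Rollforward() override { *var = newValue; }
    };
    auto item = std::make_unique<Item>();
    item->var = &var; item->oldValue = var; item->newValue = newValue;
    var = newValue;
    items_.push_back(std::move(item));
  }
  void RollbackAll() {              // undo in reverse order, per the slide
    for (auto it = items_.rbegin(); it != items_.rend(); ++it) (*it)->Rollback();
  }
  void RollforwardAll() {
    for (auto& item : items_) item->Rollforward();
  }
 private:
  std::vector<std::unique_ptr<RollbackItem>> items_;
};
```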


Page 54: 1 PDMS – 2 Hour Tutorial

•  Distributed Synchronization

•  Conservative vs. Optimistic Algorithms

•  Rollbacks in the Time Warp Algorithm

•  The Event Horizon

•  Breathing Time Buckets

•  Breathing Time Warp

•  WarpSpeed

•  Four Flow Control Techniques



Page 56: 1 PDMS – 2 Hour Tutorial

•  Conservative algorithms impose one or more constraints
   –  Object interactions limited to just "neighbors" (e.g., Chandy-Misra)
   –  Object interactions have non-zero time scales (e.g., lookahead)
   –  Object interactions follow a FIFO constraint
•  Optimistic algorithms impose no constraints but require a more sophisticated engine
   –  Support for rollbacks (and advanced features for rollforward)
   –  Require flow control to provide stability
   –  Optimistic approaches can sometimes support real-time applications better...
•  The most important thing is for applications to develop their models to maximize parallelism
   –  Simulations will generally not execute in parallel faster than their critical path

Page 57: 1 PDMS – 2 Hour Tutorial

[Diagram: a network of simulation objects A through G.]

Page 58: 1 PDMS – 2 Hour Tutorial

[Diagram: conservative event processing at object D. FIFO input queues hold scheduled input events and times from C and E, plus self-scheduled events and times from D; scheduled output events and times flow to B and F.]

Page 59: 1 PDMS – 2 Hour Tutorial

•  GVT is defined as the minimum time tag of any:
   –  Unprocessed event
   –  Unsent message
   –  Message or antimessage in transit
•  Theoretically, GVT changes as events are processed
   –  In practice, GVT is updated periodically by a GVT update algorithm
•  To correctly provide time management services to the outside world, GVT must be updated synchronously between internal nodes
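The GVT definition transcribes directly as a minimum over the three time-tag sources (function shape illustrative; a real engine gathers these lists across nodes):

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// GVT: minimum time tag over all unprocessed events, unsent messages,
// and messages/antimessages still in transit. No event can ever be
// scheduled before this time, so state older than GVT can be committed.
double ComputeGVT(const std::vector<double>& unprocessedEventTimes,
                  const std::vector<double>& unsentMessageTimes,
                  const std::vector<double>& inTransitTimes) {
  double gvt = std::numeric_limits<double>::infinity();
  for (const std::vector<double>* v :
       {&unprocessedEventTimes, &unsentMessageTimes, &inTransitTimes})
    for (double t : *v) gvt = std::min(gvt, t);
  return gvt;
}
```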



Page 63: 1 PDMS – 2 Hour Tutorial

[Figure: CPU time versus simulation time for Time Warp and Breathing Time Buckets; proximity detection on 32 nodes with 259 ground sensors and 1099 aircraft.]

Page 64: 1 PDMS – 2 Hour Tutorial

[Figure: processed events and rollbacks versus simulation time, comparing Time Warp rollbacks with Breathing Time Buckets rollbacks; proximity detection on 32 nodes with 259 ground sensors and 1099 aircraft.]



Page 67: 1 PDMS – 2 Hour Tutorial

•  Opposite problems when comparing Breathing Time Buckets and Time Warp
•  Imagine mapping events into a global event queue
•  Events processed by runaway nodes have a good chance of being rolled back
•  Should hold back messages from runaway nodes

Page 68: 1 PDMS – 2 Hour Tutorial

•  Example with four nodes
   –  Time Warp: messages released as events are processed
   –  Breathing Time Buckets: messages held back
   –  GVT: flushes messages out of the network while processing events
   –  Commit: releases event horizon messages and commits events

Page 69: 1 PDMS – 2 Hour Tutorial

•  The abstract representation of logical time uses five tie-breaking fields to guarantee unique time tags
   –  double Time      Simulated physical time of the event
   –  int Priority1    First user-settable priority field
   –  int Priority2    Second user-settable priority field
   –  int Counter      Event counter of the scheduling SimObj
   –  int UniqueId     Globally unique Id of the scheduling SimObj
•  Guaranteed logical times
   –  The OpenUTF automatically increments the SimObj event Counter to guarantee that each SimObj schedules its events with unique time tags
      •  Note: Counter may "jump" to ensure that events have increasing time tags
      •  SimObj Counter = max(SimObj Counter, Event Counter) + 1
   –  The OpenUTF automatically stores the UniqueId of the SimObj in event time tags to guarantee that events scheduled by different SimObjs are unique
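The five-field time tag and the counter rule transcribe directly (struct layout illustrative; the comparison is lexicographic across the fields in the order listed):

```cpp
#include <tuple>

// Five-field logical time tag with lexicographic tie-breaking, so every
// event in the simulation has a unique, totally ordered time.
struct LogicalTime {
  double time;     // simulated physical time of the event
  int priority1;   // first user-settable priority field
  int priority2;   // second user-settable priority field
  int counter;     // event counter of the scheduling SimObj
  int uniqueId;    // globally unique Id of the scheduling SimObj
  bool operator<(const LogicalTime& o) const {
    return std::tie(time, priority1, priority2, counter, uniqueId) <
           std::tie(o.time, o.priority1, o.priority2, o.counter, o.uniqueId);
  }
};

// Counter update rule from the slide, applied when a SimObj schedules an
// event: SimObj Counter = max(SimObj Counter, Event Counter) + 1. The
// max() is what makes the counter "jump" to keep time tags increasing.
int NextCounter(int simObjCounter, int eventCounter) {
  return (simObjCounter > eventCounter ? simObjCounter : eventCounter) + 1;
}
```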


Page 70: 1 PDMS – 2 Hour Tutorial

•  Four algorithms, selectable at run time, are currently supported in the OpenUTF reference implementation
   –  LightSpeed for fast sequential processing
      •  Optimistic processing overheads are removed
      •  Parallel processing overheads are removed
   –  SonicSpeed for ultra-fast sequential and conservative parallel event processing
      •  Highly optimized event management (no bells and whistles)
   –  WarpSpeed for optimistic parallel event processing with four new flow control techniques to ensure stability
      •  Cascading antimessages can be eliminated
      •  Individual event lookahead evaluation for message-sending risk
      •  Message-sending risk based on uncommitted event CPU time
      •  Run-time adaptable flow control for risk and optimistic processing
   –  HyperWarpSpeed for supporting five-dimensional simulation
      •  Branch excursions, event splitting/merging, parallel universes

Page 71: 1 PDMS – 2 Hour Tutorial

[Diagram: two cases relative to GVT time. Case 1: hold back messages; Case 2: OK to send messages.]

Page 72: 1 PDMS – 2 Hour Tutorial

[Diagram: along the time axis, messages within the lookahead window are sent, while messages in the risk region are held back.]

Page 73: 1 PDMS – 2 Hour Tutorial

[Diagram: uncommitted event CPU times (Tcpu0 through Tcpu6) accumulate along the time axis; once the processing threshold is exceeded, further messages are held back.]

Page 74: 1 PDMS – 2 Hour Tutorial

[Diagram: run-time adaptive flow control. When the number of rollbacks over time is unstable, decrease Nopt; when stable, slightly increase Nopt. When the number of antimessages over time is unstable, decrease Nrisk; when stable, slightly increase Nrisk.]


Page 77: 1 PDMS – 2 Hour Tutorial

OPEN DISCUSSION Final thoughts…


Page 78: 1 PDMS – 2 Hour Tutorial

•  Participate in the PDMS Standing Study Group (PDMS-SSG)
   –  Simulation Users
   –  Model Developers
   –  Technologists
   –  Sponsors
   –  Program Managers
   –  Policy Makers
•  Receive OpenUTF hands-on training for the open source reference implementation
   –  One-week hands-on training events can be arranged for groups if there is enough participation
•  Begin considering OpenUTF architecture standards
   –  OpenMSA… layered technology
   –  OSAMS… plug-and-play components
   –  OpenCAF… representation of intelligent behavior