PDMS – 2 Hour Tutorial
TRANSCRIPT
• Multicore computing revolution
  – The need for change…
• Proposed Open Unified Technical Framework (OpenUTF) architecture standards
  – OpenMSA, OSAMS, OpenCAF as future standards
• Introduction to parallel computing
  – Programming models
  – High Speed Communications (HSC) through shared memory
• Synchronization and Parallel Discrete Event Simulation (PDES)
  – Event Management
  – Time Management
• Open discussion
MULTICORE Future of computing is…
"I skate to where the puck is going to be, not where it has been." – Wayne Gretzky
• Performance wall
  – Clock speed and power consumption
  – Memory access bottlenecks
  – Single instruction level parallelism
• Multiple processors (cores) on a single chip is the future
  – No foreseeable limit to the number of cores per chip
  – Requires software to be written differently
• Supercomputing community consensus: low-level parallel programming is too hard
  – Threads, shared memory, locks/semaphores, race conditions, repeatability, etc., are too hard and expensive to develop and debug (fine-grained HPC is not for the average programmer)
  – Message passing is much easier but can be less efficient
  – High-level approaches, tools, and frameworks are needed (OpenUTF, new compilers, languages, math libraries, memory management, etc.)
[Diagram: the hardware hierarchy — a computer/blade/cluster contains boards, each board contains multiple chips, and each chip contains multiple nodes (cores) — all operating within cloud computing, net-centric, GIG, systems-of-systems environments.]
The world of computing is rapidly changing and will soon demand new parallel and distributed service-oriented programming methodologies and technical frameworks.
Experts say that parallel and distributed programming is too hard for normal development teams. The Open Unified Technical Framework abstracts away low-level programming details.
• Microsoft
  – Sponsor of the by-invitation-only 2007 Manycore Computing Workshop that brought together the who's who of supercomputing
  – Unanimous consensus on the need for multicore computing software tools and frameworks for developers (e.g., OpenUTF)
• Apple
  – Snow Leopard will have no new features (focus on multicore computing)
  – The next version of Apple's OS X operating system will include breakthroughs in programming parallel processors, Apple CEO Steve Jobs told The New York Times in an interview after this week's Worldwide Developers Conference. "The way the processor industry is going is to add more and more cores, but nobody knows how to program those things," Jobs said. "I mean, two, yeah; four, not really; eight, forget it."
  http://bits.blogs.nytimes.com/2008/06/10/apple-in-parallel-turning-the-pc-world-upside-down/
Next generation chips: Intel has disclosed details on a chip that will compete directly with Nvidia and ATI and may take it into uncharted technological and market-segment waters. Larrabee will be a stand-alone chip, meaning it will be very different from the low-end–but widely used–integrated graphics that Intel now offers as part of the silicon that accompanies its processors. And Larrabee will be based on the universal Intel x86 architecture.
…The number of cores in each Larrabee chip may vary, according to market segment. Intel showed a slide with core counts ranging from 8 to 48, claiming performance scales almost linearly as more cores are added: that is, 16 cores will offer twice the performance of eight cores.
http://i4you.wordpress.com/2008/08/05/intel-details-future-larrabee-graphics-chip
Next generation chips
Intel touts 8-core Xeon monster Nehalem-EX
Intel gave a demo yesterday of its eight-core, 2.3 billion-transistor Nehalem-EX, which is set to launch later this year… Nehalem EX has up to 8 cores, which gives a total of 16 threads per socket.
By Jon Stokes | Last updated May 28, 2009 8:25 AM CT
http://arstechnica.com/hardware/news/2009/05/intel-touts-8-core-xeon-monster.ars
COMPOSABLE SYSTEMS Open Unified Technical Framework (OpenUTF)…
[Diagram: the OpenUTF Kernel hosting model components and service components.]
• Simulation is not as cost effective as it should be – we need to do things differently… Revolutionary, not evolutionary change!
• Multicore computing revolution demands change in software development methodology – need standardized framework
• New architecture standards – we should be building models, not simulations
• Model and Service components developed in a common framework – automates integration for Test and Evaluation
• Verification and Validation – need a common test framework with standard processes
• Open source – Overcomes the technology/cost barrier and supports widespread community involvement
[Figure: time scale spanning 10 ms, 1 ms, 100 µs, 10 µs, 1 µs, 100 ns, 10 ns, 1 ns.]
Requires assessment of the current state Existing tools, technologies, methodologies, data models, existing interfaces, policies, requirements, business models, contract language, lessons learned, impediments to progress, etc.
Requires the right vision for the future Lowered costs, better quality, faster end-to-end execution, easier to use and maintain, feasible technology, optimal use of workforce skill sets, multiuse concepts, composability, modern computational architectures, multiplatform, net-centric, etc.
Requires an executable transition strategy Incremental evolution, risk reduction, phased capability, accurately assessed transition costs, available funding, prioritization, community buy-in and participation, formation of new standards
1. Engine and Model Separation
2. Optimized Communications
3. Abstract Time
4. Scheduling Constructs
5. Time Management
6. Encapsulated Components
7. Hierarchical Composition
8. Distributable Composites
9. Abstract Interfaces
10. Interaction Constructs
11. Publish/Subscribe
12. Data Translation Services
13. Multiple Applications
14. Platform Independence
15. Scalability
16. LVC Interoperability Standards
17. Web Services
18. Cognitive Behavior
19. Stochastic Modeling
20. Geospatial Representations
21. Software Utilities
22. External Modeling Framework
23. Output Data Formats
24. Test Framework
25. Community-wide Participation
• OpenMSA – Layered Technology
  – Focuses on parallel and distributed computing technologies
  – Modularizes technologies through a layered architecture
  – Contains OSAMS and OpenCAF
  – Proven technologies based on experience with large programs
  – Cost-effective strategy for developing scalable computing technology
  – Provides interoperability without sacrificing performance
  – Facilitates sequential, parallel, and distributed computing paradigms
• OSAMS – Model/Service Composability
  – Focuses on interfaces and software development methodology to support highly interoperable plug-and-play model/service components
  – Provided by OpenMSA but could be supported by other architectures
• OpenCAF – Cognitive Intelligent Behavior
  – Thoughts and stimulus, goal-oriented behaviors, decision branch exploration, five-dimensional excursions
  – Provided as an extension to OSAMS
[Diagram: the OpenUTF architecture stack, with layers spanning HPC, Network, Scheduling, Modeling Framework, Services, Models, Composites, Pub/Sub Services, LVC Interoperability, Web-based SOA, Behavior Representation, Cognitive Rule Triggering, Bayesian Branching, Goals and State Machines, and Decision Support.]
• OpenUTF: Architecture, Standards, Net-centricity, Data Models
• OpenMSA: Open Source, Technology, HPC/Multicore, Performance, Synchronization
• OSAMS: Modularity, Composability, Interoperability, Flexibility, Programming Constructs, VV&A
• OpenCAF: Behaviors, Cognitive Thought Processes, 5D Simulation, Goal-oriented Optimization
[Diagram: OpenMSA layered architecture, bottom to top]
– Operating System Services, Threads
– General Software Utilities (OSAMS), ORB Network Services
– Internal High Speed Communications, External Distributed Communications, Rollback Framework
– Rollback Utilities (OSAMS), Persistence (OSAMS)
– Standard Template Library (OSAMS), Event Management Services
– Time Management, Standard Modeling Framework (OSAMS, OpenCAF)
– Distributed Simulation Management Services (OSAMS – Pub/Sub Data Distribution), SOM/FOM Data Translation Services
– External Modeling Framework (EMF) & Distributed Blackboard, Gateway Interfaces (HLA, DIS, TENA, Web-based SOA), HPC-RTI Bridge
Related elements: Model & Service Component Repository, Entity Composite Repository, CASE Tools, Direct Federate, Abstract Federate, HLA Federate, LVC Federations & Enterprises, External System Visualization/Analysis
[Diagram: the OpenCAF reasoning engine — stimulus/perception (short-term memory) feeds Thoughts 1…N; data is received as federation objects and/or interactions; data processing covers behaviors, tasks, notifications, abstract methods, and uncertainty; prioritized goals drive state, action, and task management, with tasks supporting 5D branching.]
Left-brain reasoning (Rule-Based Reasoning)
• Inputs are ints, doubles, or Booleans
• Inputs are prioritized when they are associated with RBRs
• Inputs can be fed into multiple reasoning nodes
• Outputs can be inputs to other reasoning nodes
• Feedback loops are permitted
[Diagram: inputs W, X, Y, Z feed reasoning nodes producing outputs A, B, C.]
Based on the OpenUTF Kernel Sensitivity List
• Sensitive variables (stimulus) are registered with sensitive methods (thoughts)
• Thoughts are automatically triggered whenever registered stimulus is modified
• Thoughts can modify other stimulus to trigger additional thoughts
• Terminates when the solution converges or when reaching the maximum number of thoughts
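As a hedged sketch (class and method names are illustrative, not the OpenUTF API), the sensitivity-list mechanism described above might look like this: thoughts registered against stimulus fire when the stimulus changes, possibly triggering further thoughts, until values converge or a maximum thought count is reached.

```python
# Sketch of a sensitivity list: "thoughts" (callbacks) registered against
# "stimulus" (variables) fire automatically whenever the stimulus changes.
class SensitivityList:
    def __init__(self, max_thoughts=100):
        self.values = {}        # stimulus name -> current value
        self.thoughts = {}      # stimulus name -> registered callbacks
        self.max_thoughts = max_thoughts
        self.fired = 0

    def register(self, stimulus, thought):
        self.thoughts.setdefault(stimulus, []).append(thought)

    def modify(self, stimulus, value):
        if self.values.get(stimulus) == value:
            return              # no change: converged, no new thoughts fire
        self.values[stimulus] = value
        self.fired += 1
        if self.fired > self.max_thoughts:
            raise RuntimeError("max thoughts reached")
        for thought in self.thoughts.get(stimulus, []):
            thought(self)       # a thought may call modify() again

engine = SensitivityList()
# Thought: whenever 'threat_level' changes, update 'posture' (another stimulus).
engine.register("threat_level",
                lambda e: e.modify("posture",
                                   "defend" if e.values["threat_level"] > 5
                                   else "patrol"))
engine.modify("threat_level", 7)
print(engine.values["posture"])   # -> defend
```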
Learned reasoning (Training-Based Reasoning)
• Inputs are ints, doubles, or Booleans
• The TBR is trained and then utilized to produce outputs (it can be continually trained during execution)
• Inputs can be fed into multiple reasoning nodes
• Outputs can be inputs to other reasoning nodes
• Feedback loops are permitted
[Diagram: inputs W, X, Y, Z feed reasoning nodes producing outputs A, B, C.]
Right-brain reasoning
• Inputs are normalized, weighted, and summed; the weights sum to one:
  ω_W + ω_X + ω_Y + ω_Z = 1
• The sum is multiplied by the product of the thresholds to produce the output:
  A = (ω_W·Ŵ + ω_X·X̂ + ω_Y·Ŷ + ω_Z·Ẑ) × T_W2·T_X1·T_Y1·T_Z3
• The output is normalized
• Inputs can be fed into multiple reasoning nodes
• Outputs can be inputs to other reasoning nodes
• Feedback loops are permitted
[Diagram: inputs W, X, Y, Z with weights ω_W…ω_Z and threshold functions T_W1…T_Z3, each ranging from 0 to 1, producing output A.]
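The node equation can be checked numerically. The function below is an illustrative sketch (the choice of active thresholds and the output clamping are assumptions, not the OpenCAF implementation): normalized inputs are weighted and summed, and the sum is scaled by the product of the active threshold factors.

```python
# Sketch of a single right-brain reasoning node: weighted sum of
# normalized inputs, scaled by the product of threshold factors.
def right_brain_node(inputs, weights, thresholds):
    assert abs(sum(weights) - 1.0) < 1e-9      # weights must sum to 1
    s = sum(w * x for w, x in zip(weights, inputs))
    t = 1.0
    for th in thresholds:                      # product of active thresholds
        t *= th
    return max(0.0, min(1.0, s * t))           # output normalized to [0, 1]

# Example: four inputs W, X, Y, Z with equal weights and two active thresholds.
a = right_brain_node([0.8, 0.4, 0.6, 0.2],
                     [0.25, 0.25, 0.25, 0.25],
                     [0.9, 0.8])
print(round(a, 3))   # -> 0.36
```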
Arbitrary graphs can be constructed from Rules, Neural Nets, and Emotions
Outputs of graphs can trigger changes to behaviors by reprioritizing goals
Behaviors are only triggered once reasoning is completed
[Diagram: a graph combining Emotion-Based Reasoning, Training-Based Reasoning, and Rule-Based Reasoning nodes.]
[Diagram: a net-centric enterprise framework of composable systems — LVC, Web, GCCS, data, and visualization — built on the composable plug-and-play OpenUTF Kernel with service components, model components, abstract interfaces, and a V&V test framework; contrasted with monolithic applications (collections of hardwired services) and monolithic simulations (collections of hardwired models).]
[Diagram: capabilities surrounding the OpenUTF Kernel]
• Reusable Software Components
• Plug and Play Composability
• Conceptual Model Interoperability
• Pub/Sub Data Distribution & Abstract Interfaces
• V&V Test Framework
• Performance Benchmarks
• Parallel and Distributed Operation
• Scalable Run-time Performance
• Platform/OS Independence
• OpenMSA: Technology
• OSAMS: Modeling Constructs
• OpenCAF: Behavior Representation
• Composable Systems
• LVC (HLA, DIS, TENA)
• Web Services (SOA)
• Data Model
• C4I/GCCS
• Visualization and Analysis
Composable System
Plug-and-play Model/Service Components
Net-centric Operation: - Enterprise Frameworks - Command and Control - Standard Data Models
Legacy Interoperability: - Distributed Federation - Training, Analysis, Test - FOM/SOM
Standalone Operation: - Laptops, Desktops, Clusters, HPC - Standalone Operation - Pub/Sub Data Distribution
• Transparently hosts hierarchical services using the same interfaces as model components
• SOAP interface connects services to external applications
• Collections of related services are dynamically configured and distributed across processors on multicore systems
• Services internally communicate through pub/sub services and decoupled abstract interfaces
• Seamlessly supports LVC integration
[Diagram: a composite net-centric system on a multicore computer — subscribed data received, published data provided, abstract services provided and invoked. Services communicate through pub/sub data exchanges and abstract interfaces; composites are distributed across processors to achieve parallel performance. Web services and an LVC interface connect the dynamically configured structure to net-centric SOA/LVC on networks of single-processor and multicore computers.]
[Diagram: the OpenUTF component repository]
• Global installation & make system, plus per-component installation & make systems
• Components (e.g., DAS, ETS, T&D, Weather, CCSI, ATP-45), each providing:
  – Models and services
  – Interfaces: polymorphic methods, interactions, federation objects, XML interfaces, web services
  – Source, include, and library directories
  – Verification, validation, benchmarks, and tests
• OpenUTF Kernel (approximately 320,000 lines of code)
• General concept…
  – Government-maintained software configuration management
  – Automatic platform-independent installation & make system
  – Test framework (verification, validation, and benchmarks)
  – Will seamlessly support mainstream interoperability standards
  – Designed for secure community-wide software distribution
[Diagram: the OpenUTF ecosystem — the OpenUTF Kernel surrounded by LVC interoperability standards, web standards, models, services, a V&V test framework, data & interfaces, and development, composability, visualization, and analysis tools.]
PARALLEL COMPUTING Introduction to…
• 16-node hypercube topology: log2(N) worst-case hops
• 2D mesh topology: (m+n) worst-case hops
• 3D mesh topology: (l+m+n) worst-case hops
[Diagram: per-node processing cycle on Node 0 … Node N-1 — Startup, then Initialize, then a repeated process cycle of Compute and Communicate, and finally Store Results to a file.]
• Parallel computing vs. distributed computing
  – Parallel computing maps computations, data, and/or object instances within an application to multiple processors to obtain scalable speedup
    • Normally occurs on a single multicore computer, but can operate across multiple machines
    • The entire application crashes if one node or thread crashes
  – Distributed computing interconnects loosely coupled applications within a network environment to support interoperable execution
    • Normally occurs on multiple networked machines, but can operate on a single multicore computer
    • Dynamic connectivity supports fault tolerance but loses scalability
• Speedup(N) = T1 / TN
• Efficiency(N) = Speedup(N) / N
• RelativeEfficiency(M, N) = (M / N) × [Speedup(N) / Speedup(M)]
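These metrics translate directly into code; for example:

```python
# The speedup and efficiency formulas above as plain functions.
def speedup(t1, tn):
    """Speedup on N nodes: single-node time over N-node time."""
    return t1 / tn

def efficiency(n, t1, tn):
    """Fraction of ideal linear speedup achieved on n nodes."""
    return speedup(t1, tn) / n

def relative_efficiency(m, n, t1, tm, tn):
    """Efficiency of the n-node run relative to the m-node run."""
    return (m / n) * (speedup(t1, tn) / speedup(t1, tm))

# Example: a run takes 100 s on 1 node and 20 s on 8 nodes.
print(speedup(100, 20))        # -> 5.0
print(efficiency(8, 100, 20))  # -> 0.625
```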
• Time driven (or time stepping) is the simplest approach:

    for (double time = 0.0; time < END_TIME; time += STEP) {
        UpdateSystem(time);
        Communicate();
    }

• The discrete event approach (or event stepping) manages activities within the system more efficiently
  – Events occur at a point in time and have no duration
  – Events do not have to correspond to physical activities (pseudo-events)
  – Events occur for individual object instances, not for the entire system
  – Events, when processed, can modify state variables and/or schedule new events
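The discrete-event bullets above can be sketched as a minimal sequential event loop (an illustration, not the OpenUTF engine): pending events sit in a priority queue ordered by time, and each handler may schedule new events when processed.

```python
# Minimal sequential discrete-event loop: events are processed in time
# order, and a processed event may schedule new (pseudo-)events.
import heapq
from itertools import count

def run(initial_events, end_time):
    seq = count()                 # tie-breaker for events at equal times
    pending = []
    def schedule(t, handler):
        heapq.heappush(pending, (t, next(seq), handler))
    for t, h in initial_events:
        schedule(t, h)
    processed = []
    while pending and pending[0][0] < end_time:
        t, _, handler = heapq.heappop(pending)
        processed.append(t)
        handler(schedule)         # may modify state and/or schedule events
    return processed

# An event at t=1 whose handler schedules a follow-on event at t=3.
times = run([(1.0, lambda sch: sch(3.0, lambda s: None))], 10.0)
print(times)   # -> [1.0, 3.0]
```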
• Parallel discrete event simulation offers unique synchronization challenges…
• Distributed net-centric computing
  – Programs communicate through a network interface
    • TCP/IP, HTTPS, SOA and Web Services, Client/Server, CORBA, Federations, Enterprises, Grid Computing, NCES, etc.
• Parallel multicore computing
  – Processors directly communicate through high-speed mechanisms
    • Threads, shared memory, message passing
[Diagram: programming model progression — sequential program, multithreaded, shared memory, message passing.]
[Diagram: parallel applications running on shared-memory servers connected by a cluster server.]
• Startup and terminate
  – Forks processes
  – Cleans up shared memory
• Miscellaneous services
  – Node info, shared memory tuning parameters, etc.
• Synchronization
  – Hard and fuzzy barriers
• Global reductions
  – Min, Max, Sum, Product, etc.
  – Performance statistics
  – Can support user-defined operations
• Synchronized data distribution
  – Broadcast, Scatter, Gather, Form Matrix
• Asynchronous message passing
  – Unicast, destination-based multicast, broadcast
  – Automatic or user-defined memory allocation
  – Up to 256 message types
• Coordinated message passing
  – Patterned after the Crystal Router
  – Synchronized operation guarantees all messages are received by all nodes
  – Unicast, destination-based multicast, broadcast
• ORB services
  – Remote asynchronous method invocation with user-specified interfaces
[Diagram: example of a global synchronization on five processing nodes — partial results combine across Stages 0–3; each node waits until the operation completes, then receives the final result.]
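The staged reduction pictured above can be sketched as pairwise combining (a generic illustration of the idea, not the OpenMSA implementation): at each stage, nodes combine partial results in pairs, so N values reduce to one in about ceil(log2(N)) stages.

```python
# Staged tree reduction: combine values pairwise per stage until one
# node holds the final result (e.g., a global min across nodes).
def tree_reduce(values, op):
    vals = list(values)
    stages = 0
    while len(vals) > 1:
        nxt = []
        for i in range(0, len(vals), 2):     # pairwise combine per stage
            if i + 1 < len(vals):
                nxt.append(op(vals[i], vals[i + 1]))
            else:
                nxt.append(vals[i])          # odd node passes its value through
        vals = nxt
        stages += 1
    return vals[0], stages

# Five nodes reducing to a global minimum, as in the diagram.
result, stages = tree_reduce([7, 3, 9, 1, 5], min)
print(result, stages)   # -> 1 3
```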
Shared-memory message passing design
• One shared memory block per node; slots (a circular buffer) manage incoming messages for each node, and a separate circular buffer with head and tail pointers manages outgoing messages
• Steps in sending a message:
  1. Write the header and message at the head of the sender's output message buffer
  2. Write the index of the message header into the receiving node's shared-memory slot for the sender's node
• Steps in receiving a message:
  1. Iterate over the slot managers to find messages
  2. Read each message using the index in the slot
  3. Mark the header as read
• Potential technical issues: cache coherency, instruction synchronization
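A toy sketch of this scheme follows, with in-process Python objects standing in for shared memory (class and field names are illustrative, and real cache-coherency and synchronization issues do not arise here): each node owns an output circular buffer, and notifies a receiver by publishing the message's index into the receiver's per-sender slot.

```python
# Sketch of slot-based shared-memory messaging: the sender writes the
# message into its own output buffer, then publishes the buffer index
# into the receiver's slot for that sender.
class Node:
    def __init__(self, node_id, n_nodes, capacity=16):
        self.id = node_id
        self.out = [None] * capacity               # output messages (circular)
        self.head = 0                              # next free position
        self.slots = [[] for _ in range(n_nodes)]  # incoming indices, per sender

    def send(self, receiver, payload):
        idx = self.head % len(self.out)
        self.out[idx] = {"payload": payload, "read": False}  # 1: write message
        self.head += 1
        receiver.slots[self.id].append(idx)                  # 2: publish index

    def receive_all(self, nodes):
        msgs = []
        for sender_id, slot in enumerate(self.slots):  # 1: scan slot managers
            for idx in slot:
                msg = nodes[sender_id].out[idx]        # 2: read via index
                msgs.append(msg["payload"])
                msg["read"] = True                     # 3: mark header as read
            slot.clear()
        return msgs

nodes = [Node(i, 2) for i in range(2)]
nodes[0].send(nodes[1], "hello")
msgs = nodes[1].receive_all(nodes)
print(msgs)   # -> ['hello']
```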
[Diagram: two circular buffer states — the tail chasing the head versus the head chasing the tail (wrapped around), with head and tail pointers delimiting the valid region.]
[Diagram: message format — a sequence of headers (Header 1, Header 2, …, Header n). Header format: int NumBytes; int Index; unsigned short Packet; unsigned short NumPackets; char DummyChar0, DummyChar1, DummyChar2 (padding); char ReadFlag.]
SYNCHRONIZATION Parallel Discrete Event Simulation (PDES)…
• Standardized processing cycle interfaces to support any time management algorithm
  – Uses virtual functions on the scheduler to specialize processing steps
  – Supports reentrant applications (e.g., HPC-RTI, graphical interfaces, etc.)
• Highly optimized internal algorithms for managing events
  – Optimized and flexible event queue infrastructure
  – Native support for sequential, conservative, and optimistic processing
  – Internal usage of free lists to reduce memory allocation overheads
  – Optimized memory management with high speed communications
• Statistics gathering and debug support
  – Rollback and rollforward application testing
  – Automatic statistics gathering (live critical path analysis, message statistics, event processing and rollbacks, memory usage, etc.)
  – Merged trace file generation for debugging parallel simulations that can be tailored to include rollback information, performance data, and user output
• Time Management Modes are generically implemented through class inheritance from WpScheduler
  – OpenMSA provides a generic framework to support basic parallel and distributed event processing operations, which makes it easy to implement new time management algorithms
  – OpenMSA creates the object implementing the requested time management algorithm at run time
  – The base class WpScheduler provides generic event management services for sequential, conservative, and optimistic processing
  – The WpWarpSpeed, WpSonicSpeed, WpLightSpeed, and WpHyperWarpSpeed time management objects inherit from WpScheduler to implement their specific event processing and synchronization algorithms
main {
    Plug in User SimObjs
    Plug in User Components
    Plug in User Events
    Execute
}

Execute {
    Initialize
    Process Up To (End Time)
    Terminate
}

Initialize {
    Launch processes
    Establish Communications
    Construct/Initialize SimObjs
    Schedule Initial Events
}

Process Up To (Time) {
    while (GVT < Time) { Process GVT Cycle }
}

Process GVT Cycle {
    Process Events & User Functions
    Update GVT
    Commit Events
    Print GVT Statistics
}

Terminate {
    Terminate All SimObjs
    Print Final Statistics
    Shut Down Communications
}
[Diagram: per-object event management keyed by simulation time — a doubly linked list of processed events, a priority queue of future pending events, a rollback queue, and event messages. The scheduler is a priority queue of logical processes (i.e., simulation objects) ordered by next event time.]
• The priority queue uses a new self-correcting tree data structure that employs a heuristic to keep the tree roughly balanced
  – The tree data structure efficiently supports three critical operations:
    • Element insertion in O(log2(n)) time
    • Element retraction in O(log2(n)) time
    • Element removal in O(1) time
  – Does not require storage of additional information in tree nodes to keep the tree balanced
    • Tracks depth on insert and find operations to adjust tree organization through specially combined multi-rotation operations
    • The goal is to minimize long left/left and/or right/right chains of elements in the tree
  – Competes with the STL Red-Black Tree
    • Beats STL when compiled unoptimized
    • Slightly worse than STL when compiled optimized
OptimalDepth = log2(NumElements)
NumRotations = ActualDepth − OptimalDepth
The rotation heuristic decreases depth to keep the tree roughly balanced.
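The heuristic in code form: the number of rotations to apply is the gap between the depth actually observed during an insert or find and the optimal depth of a balanced tree.

```python
# NumRotations = ActualDepth - OptimalDepth, where
# OptimalDepth = log2(NumElements) for a balanced binary tree.
import math

def rotations_needed(num_elements, actual_depth):
    optimal_depth = math.log2(num_elements)
    return max(0, int(actual_depth - optimal_depth))  # never negative

print(rotations_needed(1024, 15))   # -> 5  (optimal depth is 10)
```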
• Rollback Manager
  – Manages the list of rollbackable items that were created as rollbackable operations were performed
  – Each event provides a rollback manager
    • A global pointer is set before the event is processed
    • Rollbacks are performed in reverse order to undo operations
• Rollback Items
  – Each rollbackable operation generates a Rollback Item that is managed by the Rollback Manager
    • Rollback utilities include (1) native data types, (2) memory operations, (3) container classes, (4) strings, and (5) various miscellaneous operations
  – Rollback Items inherit from the base class to provide four virtual functions:
    • Rollback, Rollforward, Commit, Uncommit
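A hedged sketch of the rollback-item pattern (class names and the simplified two-hook interface are illustrative, not the OpenMSA API): each rollbackable assignment records an item capturing the old value, and the manager undoes recorded items in reverse order.

```python
# Each rollbackable operation creates an item that knows how to undo
# (rollback) and redo (rollforward) itself; the manager replays them
# in reverse order to restore earlier state.
class AssignItem:
    def __init__(self, state, key, new_value):
        self.state, self.key = state, key
        self.old = state.get(key)     # capture prior value (None if absent)
        self.new = new_value
        state[key] = new_value
    def rollback(self):
        self.state[self.key] = self.old
    def rollforward(self):
        self.state[self.key] = self.new

class RollbackManager:
    def __init__(self):
        self.items = []
    def assign(self, state, key, value):
        self.items.append(AssignItem(state, key, value))
    def rollback(self):
        for item in reversed(self.items):   # undo in reverse order
            item.rollback()

state = {"x": 1}
mgr = RollbackManager()
mgr.assign(state, "x", 2)
mgr.assign(state, "y", 3)
mgr.rollback()
print(state)   # -> {'x': 1, 'y': None}  (a keys absent before stay as None)
```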
• Distributed Synchronization
• Conservative vs. Optimistic Algorithms
• Rollbacks in the Time Warp Algorithm
• The Event Horizon
• Breathing Time Buckets
• Breathing Time Warp
• WarpSpeed
• Four Flow Control Techniques
• Conservative algorithms impose one or more constraints
  – Object interactions limited to just "neighbors" (e.g., Chandy-Misra)
  – Object interactions have non-zero time scales (e.g., lookahead)
  – Object interactions follow a FIFO constraint
• Optimistic algorithms impose no constraints but require a more sophisticated engine
  – Support for rollbacks (and advanced features for rollforward)
  – Require flow control to provide stability
  – Optimistic approaches can sometimes support real-time applications better…
• The most important thing is for applications to develop their models to maximize parallelism
  – Simulations will generally not execute in parallel faster than their critical path
[Diagram: event dependencies among simulation objects A–G, illustrating the critical path through the simulation.]
[Diagram: conservative event processing at object D — FIFO input queues receive scheduled input events and times from C and E, plus self-scheduled events and times from D itself; scheduled output events and times go to B and F.]
• GVT is defined as the minimum time tag of:
  – Any unprocessed event
  – Any unsent message
  – Any message or antimessage in transit
• Theoretically, GVT changes as events are processed
  – In practice, GVT is updated periodically by a GVT update algorithm
• To correctly provide time management services to the outside world, GVT must be updated synchronously between internal nodes
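The definition translates directly into code: GVT is the minimum time tag across all three categories (function and argument names here are illustrative).

```python
# GVT: minimum time tag over unprocessed events, unsent messages, and
# messages/antimessages still in transit.
def compute_gvt(unprocessed, unsent, in_transit):
    candidates = list(unprocessed) + list(unsent) + list(in_transit)
    return min(candidates) if candidates else float("inf")

print(compute_gvt([12.5, 9.0], [10.0], [11.2]))   # -> 9.0
```

No event with a time tag earlier than this minimum can ever be generated, which is why events below GVT are safe to commit.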
[Plot: CPU time vs. simulation time (0–20,000) for Time Warp and Breathing Time Buckets — Proximity Detection benchmark on 32 nodes with 259 ground sensors and 1099 aircraft.]
[Plot: events and rollbacks (0–500,000) vs. simulation time (0–20,000) — processed events, Time Warp rollbacks, and Breathing Time Buckets rollbacks for the same Proximity Detection benchmark (32 nodes, 259 ground sensors, 1099 aircraft).]
[Diagram: generated messages mapped into a global event queue.]
• Breathing Time Buckets and Time Warp exhibit opposite problems
• Imagine mapping events into a global event queue
• Events processed by runaway nodes have a good chance of being rolled back
• Messages from runaway nodes should be held back
• Example with four nodes
  – Time Warp: messages released as events are processed
  – Breathing Time Buckets: messages held back
  – GVT: flushes messages out of the network while processing events
  – Commit: releases event horizon messages and commits events
• The abstract representation of logical time uses five tie-breaking fields to guarantee unique time tags:
  – double Time: simulated physical time of the event
  – int Priority1: first user-settable priority field
  – int Priority2: second user-settable priority field
  – int Counter: event counter of the scheduling SimObj
  – int UniqueId: globally unique ID of the scheduling SimObj
• Guaranteed logical times
  – The OpenUTF automatically increments the SimObj event Counter to guarantee that each SimObj schedules its events with unique time tags
    • Note: the Counter may "jump" to ensure that events have increasing time tags
    • SimObj Counter = max(SimObj Counter, Event Counter) + 1
  – The OpenUTF automatically stores the UniqueId of the SimObj in event time tags to guarantee that events scheduled by different SimObjs are unique
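A sketch of the five-field time tag as a lexicographically compared tuple, together with the counter rule quoted above (the tuple packing and class name are assumptions; only the field order and the counter formula come from the slide):

```python
# Five-field time tag: (Time, Priority1, Priority2, Counter, UniqueId),
# compared lexicographically so later fields break ties on earlier ones.
class TimeTag:
    def __init__(self, time, p1=0, p2=0, counter=0, unique_id=0):
        self.fields = (time, p1, p2, counter, unique_id)
    def __lt__(self, other):
        return self.fields < other.fields
    def __eq__(self, other):
        return self.fields == other.fields

def next_counter(simobj_counter, event_counter):
    # SimObj Counter = max(SimObj Counter, Event Counter) + 1, so the
    # counter may "jump" to keep time tags strictly increasing.
    return max(simobj_counter, event_counter) + 1

a = TimeTag(10.0, 0, 0, 4, 17)
b = TimeTag(10.0, 0, 0, 5, 17)   # same time: the counter breaks the tie
print(a < b)                      # -> True
print(next_counter(4, 9))         # -> 10
```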
• Four algorithms, selectable at run time, are currently supported in the OpenUTF reference implementation
  – LightSpeed for fast sequential processing
    • Optimistic processing overheads are removed
    • Parallel processing overheads are removed
  – SonicSpeed for ultra-fast sequential and conservative parallel event processing
    • Highly optimized event management (no bells and whistles)
  – WarpSpeed for optimistic parallel event processing with four new flow control techniques to ensure stability
    • Cascading antimessages can be eliminated
    • Individual event lookahead evaluation for message-sending risk
    • Message-sending risk based on uncommitted event CPU time
    • Run-time adaptable flow control for risk and optimistic processing
  – HyperWarpSpeed for supporting five-dimensional simulation
    • Branch excursions, event splitting/merging, parallel universes
[Diagram: two cases of message sending relative to GVT along the time axis — Case 1: hold back messages; Case 2: OK to send messages.]
[Diagram: along the time axis, messages within the risk/lookahead window are sent, while messages beyond it are held back.]
[Diagram: uncommitted event CPU times Tcpu0 … Tcpu6 along the time axis — once the accumulated processing threshold is exceeded, further messages are held back.]
[Plots: run-time adaptable flow control — number of rollbacks vs. time (unstable: decrease Nopt; stable: slightly increase Nopt) and number of antimessages vs. time (unstable: decrease Nrisk; stable: slightly increase Nrisk).]
OPEN DISCUSSION Final thoughts…
• Participate in the PDMS Standing Study Group (PDMS-SSG)
  – Simulation users, model developers, technologists, sponsors, program managers, policy makers
• Receive OpenUTF hands-on training for the open source reference implementation
  – One-week hands-on training events can be arranged for groups if there is enough participation
• Begin considering OpenUTF architecture standards
  – OpenMSA… layered technology
  – OSAMS… plug-and-play components
  – OpenCAF… representation of intelligent behavior