What Mum Never Told Me about Parallel Simulation
Karim Djemame, Informatics Research Lab. & School of Computing, University of Leeds


Page 1: What Mum Never Told Me about Parallel Simulation

What Mum Never Told Me about Parallel Simulation

Karim Djemame
Informatics Research Lab. & School of Computing
University of Leeds

Page 2: What Mum Never Told Me about Parallel Simulation

Plan of the Lecture

Goals
• Learn about issues in the design and execution of Parallel Discrete Event Simulation (PDES)

Overview
• Discrete Event Simulation – a Review
• Parallel Simulation – a Definition
• Applications
• Synchronisation Algorithms
  • Conservative
  • Optimistic
  • Synchronous
• Parallel Simulation Languages
• Performance Issues
• Conclusion

Page 3: What Mum Never Told Me about Parallel Simulation

Why Simulation?

Mathematical models too abstract for complex systems

Building real systems with multiple configurations too expensive

Simulation is a good compromise!

Page 4: What Mum Never Told Me about Parallel Simulation

Discrete Event Simulation (DES)

• a DES system can be viewed as a collection of simulated objects and a sequence of event computations

• Changes in state of the model occur at discrete points in time

• The passage of time is modelled using a simulation clock

• Event scheduling is the most widely used approach
  • provides locality in time: each event describes related actions that may all occur in a single instant

• The model maintains a list of events (the Event List) that
  • have been scheduled
  • have not occurred yet

Page 5: What Mum Never Told Me about Parallel Simulation

Processing the Event List on a Uni-processor Computer

• An event contains two fields of information:
  - the event it represents (e.g. arrival in a queue)
  - the time of occurrence: the time when the event should happen, also called the timestamp

[Figure: event list (EVL) e1, e2, …, en with timestamps 7, 9, …, 20]

• The event list (EVL)
  - contains the events
  - is always ordered by increasing time of occurrence
• The events are processed sequentially by a single processor

Page 6: What Mum Never Told Me about Parallel Simulation

Event-Driven Simulation Engine

(1) Remove the 1st event (lowest time of occurrence) from the EVL
(2) Execute the corresponding event routine; modify the state (S) accordingly
(3) Based on the new S, schedule new future events

[Figure: EVL before (e1@7, e2@9, …, en@20); a new event e3@14 is scheduled; EVL after (e2@9, e3@14, …, en@20)]
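The loop above can be sketched as a minimal sequential engine. This is an illustrative sketch, not code from the lecture: the priority-queue event list (Python's heapq) and the toy arrival routine are assumptions.

```python
import heapq

# Minimal sequential event-driven simulation engine (illustrative sketch).
class Simulator:
    def __init__(self):
        self.clock = 0
        self.evl = []        # event list, a priority queue ordered by timestamp
        self.counter = 0     # tie-breaker for events with equal timestamps

    def schedule(self, timestamp, routine):
        heapq.heappush(self.evl, (timestamp, self.counter, routine))
        self.counter += 1

    def run(self):
        while self.evl:
            # (1) remove the 1st event (lowest time of occurrence) from the EVL
            timestamp, _, routine = heapq.heappop(self.evl)
            self.clock = timestamp
            # (2) execute the event routine; (3) it may schedule new future events
            routine(self)

# Toy model (an assumption): a source generating 3 arrivals, 5 time units apart.
arrivals = []
def arrival(sim, n=[0]):
    arrivals.append(sim.clock)
    n[0] += 1
    if n[0] < 3:
        sim.schedule(sim.clock + 5, arrival)

sim = Simulator()
sim.schedule(7, arrival)
sim.run()
print(arrivals)   # [7, 12, 17]
```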

Page 7: What Mum Never Told Me about Parallel Simulation

Why change? It's so simple!

Models become larger and larger
The simulation time is overwhelming, or the simulation is just intractable
Examples:
 - parallel programs with millions of lines of code
 - mobile networks with millions of mobile hosts
 - networks with hundreds of complex switches and routers
 - multicast models with thousands of sources
 - the ever-growing Internet, and much more...

Page 8: What Mum Never Told Me about Parallel Simulation

Some Figures to Convince...

ATM network models:
 - simulation at the cell level, 200 switches, 1000 traffic sources at 50 Mbit/s, 155 Mbit/s links, 1 simulation event per cell arrival

 - simulation time increases as link speed increases
 - there is usually more than 1 event per cell arrival
 - how scalable is traditional simulation?

More than 26 billion events to simulate 1 second: over 7 hours of computation even if 1 event is processed in 1 µs

Page 9: What Mum Never Told Me about Parallel Simulation

Motivation for Parallel Simulation

Sequential simulation is very slow
Sequential simulation does not exploit the parallelism inherent in models

So why not use multiple processors?

• Variety of parallel simulation protocols
• Availability of parallel simulation tools to achieve a certain speedup over the sequential simulator

Page 10: What Mum Never Told Me about Parallel Simulation

Processing the Event List on a Multi-Processor Computer

• The events are processed by many processors. Example:
  • Processor 1 generates event 3 at time 9, to be processed by processor 2
  • Processor 2 has already processed event 2 at time 14

[Figure: timeline for processors p1 and p2; p1 executes event 1 at time 7 and generates event 3 at time 9, while p2 has already executed event 2 at time 14 in parallel]

• Problem:
  - the future can affect the past!
  - this is the causality problem

Page 11: What Mum Never Told Me about Parallel Simulation

Causal Dependencies

[Figure: EVL with events e1@7, e2@9, e3@14, e4@20, e5@27, e6@40, shown first in timestamp order and then reordered by causal dependencies]

• Scheduled events in timestamp order
• Sequence ordered by causal dependencies
• Causal dependencies mean restrictions
  • the sequence of events (e1, e2, e4, e6) can be executed in parallel with (e3, e5)
  • if any other event were simulated together with e1: violation of causal dependencies
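The restriction can be sketched as follows. Assuming the causal chains e1→e2→e4→e6 and e1→e3→e5 (the exact edges are an assumption read off the figure), events whose causal predecessors have all been executed may run in parallel:

```python
# Hypothetical causal dependencies (assumed from the figure): each event
# maps to the events it depends on.
deps = {
    'e1': [], 'e2': ['e1'], 'e3': ['e1'],
    'e4': ['e2'], 'e5': ['e3'], 'e6': ['e4'],
}

def waves(deps):
    """Group events into 'waves'; events in the same wave can run in parallel."""
    done, result = set(), []
    while len(done) < len(deps):
        ready = sorted(e for e in deps
                       if e not in done and all(d in done for d in deps[e]))
        result.append(ready)
        done.update(ready)
    return result

result = waves(deps)
print(result)   # [['e1'], ['e2', 'e3'], ['e4', 'e5'], ['e6']]
```

This is consistent with the slide: after e1, the chain (e2, e4, e6) never has to wait for (e3, e5), so the two sequences can proceed on different processors.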

Page 12: What Mum Never Told Me about Parallel Simulation

Parallel Simulation - Principles

Execution of a discrete event simulation on a parallel or distributed system with several physical processors

The simulation model is decomposed into several sub-models (Logical Processes, LPs) that can be executed in parallel
 - spatial partitioning
 - LPs communicate by sending timestamped messages

Fundamental concepts
 - each LP can be at a different simulation time
 - local causality constraint: events in each LP must be executed in timestamp order

Page 13: What Mum Never Told Me about Parallel Simulation

Parallel Simulation – example 1

[Figure: logical processes (LPs) exchanging packet events and executing in parallel]

Page 14: What Mum Never Told Me about Parallel Simulation

Parallel Simulation – example 2

Logical processes (LPs) modelling airports, air traffic sectors, aircraft, etc.

LPs interact by exchanging messages (events modelling aircraft departures, landings, etc.)

[Figure: a network of LPs exchanging messages]

Page 15: What Mum Never Told Me about Parallel Simulation

Synchronisation Mechanisms

Synchronisation Algorithms
 - Conservative: avoids local causality violations by waiting until it is safe to process a message or event
 - Optimistic: allows local causality violations, but provisions are made to recover from them at runtime
 - Synchronous: all LPs process messages/events with the same timestamp in parallel

Page 16: What Mum Never Told Me about Parallel Simulation

PDES Applications

VLSI circuit simulation
Parallel computing
Communication networks
Combat scenarios
Health care systems
Road traffic
Simulation of models
 - Queueing networks
 - Petri nets
 - Finite state machines

Page 17: What Mum Never Told Me about Parallel Simulation

Conservative Protocols

Architecture of a conservative LP
The Chandy-Misra-Bryant protocol
The lookahead ability

Page 18: What Mum Never Told Me about Parallel Simulation

Architecture of a Conservative LP

LPs communicate by sending messages with non-decreasing timestamps

Each LP keeps a static FIFO channel for each LP with incoming communication

Each FIFO channel (input channel, IC) has a clock ci that ticks according to the timestamp of its topmost message, if any; otherwise it keeps the timestamp of the last message

[Figure: LPA with three input channels, from LPB (c1 = tB1), LPC (c2 = tC3) and LPD (c3 = tD3)]

Page 19: What Mum Never Told Me about Parallel Simulation

A Simple Conservative Algorithm

Each LP has to process events in timestamp order to avoid local causality violations

The Chandy-Misra-Bryant algorithm

while (simulation is not over) {
    determine the ICi with the smallest ci
    if (ICi is empty)
        wait for a message
    else {
        remove the topmost event from ICi
        process the event
    }
}
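A minimal single-LP sketch of this rule (an assumption-laden illustration: channel names and timestamps are made up, and all messages are pre-loaded, so the "wait for a message" branch is only marked by a comment):

```python
from collections import deque

def cmb_process(input_channels):
    """input_channels: dict name -> deque of (timestamp, event), FIFO order.
    Returns events in the order a conservative LP would process them."""
    def clock(ch):
        q = input_channels[ch]
        return q[0][0] if q else float('inf')   # empty channel: no clock yet

    processed = []
    while any(input_channels.values()):
        ic = min(input_channels, key=clock)     # IC with the smallest clock
        if not input_channels[ic]:
            break                               # would block here, waiting on ic
        ts, event = input_channels[ic].popleft()
        processed.append((ts, event))           # safe: nothing earlier can arrive
    return processed

channels = {
    'from_B': deque([(3, 'b1'), (6, 'b2')]),
    'from_C': deque([(1, 'c1'), (4, 'c2'), (7, 'c3')]),
}
result = cmb_process(channels)
print(result)   # [(1, 'c1'), (3, 'b1'), (4, 'c2'), (6, 'b2'), (7, 'c3')]
```

Because each channel delivers non-decreasing timestamps, the topmost event on the smallest-clock channel is always safe to process.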

Page 20: What Mum Never Told Me about Parallel Simulation

Safe but Has to Block

[Figure: LPA with three input channels IC1, IC2 and IC3; at each step LPA selects the input channel with the smallest clock and processes its topmost event; when that channel is empty, LPA must BLOCK and wait for a message on it]

Page 21: What Mum Never Told Me about Parallel Simulation

Blocks and Even Deadlocks!

[Figure: LPs A and B feed a merge point M; S sends all its messages (timestamps 444, 446) to B, so A's channel into M stays empty and M is BLOCKED waiting on it even though work is available]

Page 22: What Mum Never Told Me about Parallel Simulation

How to Solve Deadlock: Null-Messages

Use of null-messages for the artificial propagation of simulation time

[Figure: the blocked merge point of the previous slide is UNBLOCKED once S, A and B propagate null-messages carrying their simulation times]

What frequency?

Page 23: What Mum Never Told Me about Parallel Simulation

How to Solve Deadlock: Null-Messages

A null-message indicates a Lower Bound Time Stamp
The minimum delay between links is 4
LP C is initially at simulation time 0

[Figure: LPs A, B and C connected in a ring, with pending events at times 11 and 10 (A), 9 (B) and 7 (C)]

 - LP C sends a null-message with timestamp 4
 - LP A sends a null-message with timestamp 8
 - LP B sends a null-message with timestamp 12
 - LP C can now process its event with timestamp 7
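The walkthrough above can be sketched as follows (the ring topology C→A→B→C and the lookahead of 4 follow the slide; everything else is illustrative):

```python
LOOKAHEAD = 4                            # minimum link delay from the slide

clocks = {'A': 0, 'B': 0, 'C': 0}        # each LP's lower bound on future sends
ring = {'C': 'A', 'A': 'B', 'B': 'C'}    # who each LP sends null-messages to

def send_null(sender):
    """Sender promises to send nothing earlier than its clock + lookahead."""
    ts = clocks[sender] + LOOKAHEAD
    receiver = ring[sender]
    clocks[receiver] = max(clocks[receiver], ts)   # receiver may advance
    return receiver, ts

print(send_null('C'))   # ('A', 4)
print(send_null('A'))   # ('B', 8)
print(send_null('B'))   # ('C', 12)

# C's pending event at time 7 is now safe to process: 7 <= 12
assert clocks['C'] >= 7
```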

Page 24: What Mum Never Told Me about Parallel Simulation

The Lookahead Ability

Null-messages are sent by an LP to indicate a lower bound time stamp on the future messages it will send

Null-messages rely on the "lookahead" ability
 - communication link delays
 - server processing time (FIFO)

Lookahead is very dependent on the application model and needs to be explicitly identified

Page 25: What Mum Never Told Me about Parallel Simulation

Conservative: Pros & Cons

Pros
 - simple, easy to implement
 - good performance when lookahead is large (communication networks, FIFO queues)

Cons
 - pessimistic in many cases
 - large lookahead is essential for performance
 - no transparent exploitation of parallelism
 - performance may drop even with small changes in the model (adding preemption, adding one small-lookahead link…)

Page 26: What Mum Never Told Me about Parallel Simulation

Optimistic Protocols

Architecture of an optimistic LP
Time Warp

Page 27: What Mum Never Told Me about Parallel Simulation

Architecture of an Optimistic LP

LPs send timestamped messages, not necessarily in non-decreasing time stamp order

no static communication channels between LPs, dynamic creation of LPs is easy

each LP processes events as they are received, no need to wait for safe events

local causality violations are detected and corrected at runtime

Most well known optimistic mechanism: Time Warp

[Figure: LPA receives timestamped messages from LPB, LPC and LPD, with no static channels and in arbitrary timestamp order]

Page 28: What Mum Never Told Me about Parallel Simulation

Processing Events as They Arrive

[Figure: LPA processes messages as they arrive: 11 (LPB), 13 (LPD), 18 (LPB), 22 (LPC), 25 (LPD), 28 (LPC) and 36 (LPB) have already been processed when a late message with timestamp 32 arrives from LPD]

What to do with late messages?

Page 29: What Mum Never Told Me about Parallel Simulation

Time Warp

Do, Undo, Redo

Page 30: What Mum Never Told Me about Parallel Simulation

TimeWarp Rollback - How?

Late messages (stragglers) are handled with a rollback mechanism undo false/uncorrect local computations,

• state saving: save the state variables of an LP• reverse computation

undo false/uncorrect remote computations,• anti-messages: anti-messages and (real) messages

annihilate each other process late messages re-process previous messages: processed events are

NOT discarded!
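A minimal sketch of do/undo/redo with state saving (the integer state and events are invented for illustration; anti-messages, i.e. the remote undo, are omitted):

```python
# Time Warp-style LP sketch: every event saves a state copy so a straggler
# can trigger a rollback, after which the undone events are re-processed.
class OptimisticLP:
    def __init__(self):
        self.state = 0             # the LP's state: here, a running sum
        self.processed = []        # (timestamp, event) in execution order
        self.saved = []            # state copies taken BEFORE each event

    def _execute(self, ts, event):
        self.saved.append(self.state)     # state saving (here: every event)
        self.state += event               # the "event routine"
        self.processed.append((ts, event))

    def receive(self, ts, event):
        redo = []
        # straggler detected: roll back every event with a larger timestamp
        while self.processed and self.processed[-1][0] > ts:
            redo.append(self.processed.pop())   # re-processed, NOT discarded
            self.state = self.saved.pop()       # undo via the restored state
        self._execute(ts, event)                # process the late message
        for undone_ts, undone_event in reversed(redo):
            self._execute(undone_ts, undone_event)   # redo

lp = OptimisticLP()
lp.receive(11, 1)
lp.receive(18, 2)
lp.receive(36, 3)
lp.receive(32, 4)   # straggler: the event at 36 is rolled back, then redone
print(lp.state)                          # 10  (1 + 2 + 4 + 3)
print([ts for ts, _ in lp.processed])    # [11, 18, 32, 36]
```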

Page 31: What Mum Never Told Me about Parallel Simulation

Need for a Global Virtual Time

Motivations
 - an indicator that the simulation time advances
 - reclaim memory (fossil collection)

Basically, GVT is the minimum of
 - all LPs' logical simulation times
 - the timestamps of messages in transit

GVT guarantees that events below GVT are definitive
 - no rollback can occur before the GVT
 - state points before GVT can be reclaimed
 - anti-messages before GVT can be reclaimed
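The definition can be sketched directly (LP names, clock values and saved states are made up for illustration):

```python
# GVT = minimum over all LP clocks and all in-transit message timestamps.
lp_clocks = {'LPA': 45, 'LPB': 52, 'LPC': 38}
in_transit = [41, 60]     # timestamps of messages sent but not yet received

gvt = min(min(lp_clocks.values()), min(in_transit))
print(gvt)   # 38: no rollback can ever reach below this time

# fossil collection: saved states with timestamps before GVT can be reclaimed
saved_states = [(12, 'sA0'), (35, 'sA1'), (44, 'sA2')]
saved_states = [(ts, s) for ts, s in saved_states if ts >= gvt]
print(saved_states)   # [(44, 'sA2')]
```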

Page 32: What Mum Never Told Me about Parallel Simulation

Time Warp - Overheads

Periodic state savings
 - states may be large, very large!
 - copies are very costly

Periodic GVT computations
 - costly in a distributed architecture
 - may block computations

Rollback thrashing
 - cascaded rollbacks, no advancement!

Memory!
 - memory is THE limitation

Page 33: What Mum Never Told Me about Parallel Simulation

Optimistic Mechanisms: Pros & Cons

Pros
 - exploits all the parallelism in the model; lookahead is less important
 - transparent to the end-user
 - can be general-purpose

Cons
 - very complex, needs lots of memory
 - large overheads (state saving, GVT, rollbacks…)

Page 34: What Mum Never Told Me about Parallel Simulation

Mixed/Adaptive Approaches

General framework that (automatically) switches between conservative and optimistic

Adaptive approaches may determine at runtime the amount of conservatism or optimism

[Figure: performance of conservative, optimistic and mixed approaches as a function of message handling; mixed approaches sit between the two extremes]

Page 35: What Mum Never Told Me about Parallel Simulation

Synchronous Protocols

Architecture of a synchronous LP

Page 36: What Mum Never Told Me about Parallel Simulation

Synchronous Protocols

ALL for ONE

and ONE for ALL!

The Three Musketeers, Alexandre Dumas (1802 – 1870)

Page 37: What Mum Never Told Me about Parallel Simulation

A Simple Synchronous Algorithm

Avoids local causality violations
Each LP has the same data structures as a single sequential simulator
A global clock is shared among all LPs – same value
Some data structures are private

[Figure: LPs A, B, C and D report minimum timestamps 5, 12, 10 and 8; the global clock is set to the minimum, 5]

Page 38: What Mum Never Told Me about Parallel Simulation

A Simple Synchronous Algorithm

clock = 0;
while (simulation is not over) {
    t = minimum_timestamp();     // local minimum of this LP
    clock = global_minimum(t);   // reduction over all LPs
    simulate_events(clock);
    synchronise();               // barrier
}

Basic operations
1. Computation of the minimum timestamp – reduction operation
2. Event consumption
3. Message distribution
4. Message reception – barrier operation
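A single-process sketch of this loop (the first minimum timestamps follow the previous slide's figure; the rest is invented; in a real distributed run the reduction and barrier would be collective operations, e.g. MPI_Allreduce and MPI_Barrier):

```python
def synchronous_run(lp_events):
    """lp_events: one sorted list of (timestamp, event) per LP."""
    log = []
    while any(lp_events):
        # reduction: global minimum over every LP's minimum timestamp
        clock = min(ev[0][0] for ev in lp_events if ev)
        # every LP consumes all of its events at the current clock, in parallel
        for ev in lp_events:
            while ev and ev[0][0] == clock:
                log.append(ev.pop(0))
        # (message distribution and a barrier would happen here)
    return log

lps = [
    [(5, 'a1'), (10, 'a2')],
    [(12, 'b1')],
    [(10, 'c1')],
    [(8, 'd1')],
]
result = synchronous_run(lps)
print(result)
# [(5, 'a1'), (8, 'd1'), (10, 'a2'), (10, 'c1'), (12, 'b1')]
```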

Page 39: What Mum Never Told Me about Parallel Simulation

Synchronous Mechanisms: Pros & Cons

Pros
 - simple, easy to implement
 - good performance if parallelism is exploited with a moderate synchronisation cost

Cons
 - pessimistic in many cases
 - worst case: the simulator behaves like the sequential one
 - performance may drop if the cost of LP synchronisation (reduction, barrier) is high

Page 40: What Mum Never Told Me about Parallel Simulation

PDES Simulation Languages

• A number of PDES languages have been developed in recent years
  • PARSEC
  • Compose
  • ModSim
  • etc.
• Most of these languages are general-purpose languages

PARSEC
• Developed at the UCLA Parallel Computing Lab.
• Availability: http://pcl.cs.ucla.edu/projects/parsec/
• Simplicity
• Efficient event scheduling mechanism

Page 41: What Mum Never Told Me about Parallel Simulation

Georgia Tech Time Warp (GTW)

• Optimistic discrete event simulator developed by the PADS group at the Georgia Institute of Technology
  http://www.cc.gatech.edu/computing/pads/tech-parallel-gtw.html
• Supports small-granularity simulation
• GTW runs on shared-memory multiprocessor machines
  • Sun Enterprise, SGI Origin

• TeD: Telecommunications Description Language
  • a language developed mainly for modelling telecommunication network elements and protocols
• Jane: a simulator-independent Client/Server-based graphical interface and scripting tool for interactive parallel simulations
  • TeD/GTW simulations can be executed using the Jane system

Page 42: What Mum Never Told Me about Parallel Simulation

BYOwS !

• BYOwS: Build Your Own Simulator
• Choose a programming language
  • C, C++, Java
• Learn basic MPI
  • MPI: Message Passing Interface
  • point-to-point communication
  • available on the school Linux machines
• Implement a simple PDES protocol
  • case study: a simple queueing network

Page 43: What Mum Never Told Me about Parallel Simulation

Parallel Simulation Today

Lots of algorithms have been proposed variations on conservative and optimistic adaptives approaches

Few end-users Compete with sequential simulators in terms of user

interface, generability, ease of use etc.

Research mainly focus on applications, ultra-large scale simulations tools and execution environments (clusters) Federated simulations

• different simulators interoperate with each other in executing a single simulation

– battle field simulation, distributed multi-user games

Page 44: What Mum Never Told Me about Parallel Simulation

Parallel Simulation - Conclusion

Pros reduction of the simulation time increase of the model size

Cons causality constraints are difficult to maintain need of special mechanisms to synchronize the

different processors increase both the model and the simulation kernel

complexity

Challenges ease of use, transparency.