ICASSP 2005

Deadlock Detection for Distributed Process Networks

Alex G. Olson & Brian L. Evans
The University of Texas at Austin

Motivation for Formal Models

- Applications may require higher input/output and computational rates than one CPU can handle
- Exploit parallelism for high performance
  - Parallel (one machine) or distributed (many machines)
- Pitfalls of parallel/distributed programming
  - Synchronization, shared memory, and deadlock
  - Debugging concurrent code on many processors
- Formal models have provable properties
  - Determinacy: programs are correct by construction
  - Validation: only debug each component separately
  - Scalability: faster execution with more CPUs

Applications

Application                               Input Data Rate   Computation Rate     Output Data Rate
Sonar Beamforming [Allen & Evans 00]      160 MB/s          4-20 GFLOPS          72 MB/s
Bzip2 (block-zip) Compression             1-4 MB/s          ~1-4 GIPS (approx)   1-4 MB/s
MPEG4 Encoding (4CIF)                     18 MB/s           ~2 GIPS              ~1 MB/s
H.264 Video Server (QCIF) [Banerjee 02]   1 MB/s            ~1 GIPS              ~40 KB/s

Design Space Exploration [Vissers & Wolf, 1999]

Image Processing [Webb et al., 1999]

Process Networks [Kahn, 1974]

- Concurrently executing processes
- Communicate only over one-way unbounded channels (FIFO queues)
- Read one input port at a time
  - Node execution suspended until enough data available
  - Data that has been read is dequeued from channel
- Samples (tokens) flow along arcs
  - Samples have value but not time information
  - Flow of (untimed) data drives computation
- Determinate execution
  - Any scheduling algorithm that obeys the above rules will produce the same history of tokens on arcs
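The rules above can be sketched in a few lines. This is a minimal illustration, not the authors' framework: it assumes Python threads as processes and `queue.Queue` objects as unbounded FIFO channels. The `END` sentinel and the process names `producer`, `scale`, and `add` are hypothetical, invented for the example.

```python
import queue
import threading

END = object()  # hypothetical end-of-stream marker, not part of the PN model


def producer(out, values):
    for v in values:
        out.put(v)  # unbounded channel: a write never blocks
    out.put(END)


def scale(inp, out, k):
    while True:
        tok = inp.get()  # blocking read: suspend until a token arrives
        if tok is END:
            out.put(END)
            return
        out.put(k * tok)


def add(in_a, in_b, out):
    while True:
        a = in_a.get()  # read one input port at a time
        b = in_b.get()
        if a is END or b is END:
            out.put(END)
            return
        out.put(a + b)


def run_network(xs, ys):
    # Topology: xs -> a -> scale(2) -> c -> add -> d, and ys -> b -> add
    a, b, c, d = (queue.Queue() for _ in range(4))
    threads = [
        threading.Thread(target=producer, args=(a, xs)),
        threading.Thread(target=producer, args=(b, ys)),
        threading.Thread(target=scale, args=(a, c, 2)),
        threading.Thread(target=add, args=(c, b, d)),
    ]
    for t in threads:
        t.start()
    history = []
    while True:
        tok = d.get()
        if tok is END:
            break
        history.append(tok)
    for t in threads:
        t.join()
    return history
```

Because every read blocks and each process reads one port at a time, `run_network([1, 2, 3], [10, 20, 30])` yields the same token history on the output arc however the threads happen to be interleaved.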

Bounding Size of PN Queues

- Bounded Scheduling [Parks & Lee, 1995]
  - Write to a full queue suspends node execution
  - On global deadlock, resize smallest queue
  - Favors incomplete bounded execution (non-determinate)
- Computational PN [Allen & Evans, 2000]
  - Processes may consume fewer tokens than read
  - All memory allocation can be handled by queues
- Bounded Scheduling [Geilen & Basten, 2003]
  - Show local deadlock may not lead to global deadlock
- Deadlock detection required for bounded communication, but no framework detects local deadlock

Artificial deadlock
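The bounded scheduling rule (block on a full queue, and on deadlock grow the smallest queue) can be illustrated with a small single-threaded simulation. This is a sketch under stated assumptions, not the published scheduler. Processes are modeled as Python generators that yield hypothetical `("read", channel)` and `("write", channel, token)` requests. The `Channel` class and `run` scheduler are invented for the example, and a queue grows by one slot per resize.

```python
from collections import deque


class Channel:
    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.items = deque()


def run(processes):
    """Round-robin scheduler. Each process is a generator that yields
    ('read', channel) or ('write', channel, token) and is resumed once
    the request can be satisfied. Returns the names of resized channels."""
    pending = {}
    for p in processes:
        try:
            pending[p] = next(p)
        except StopIteration:
            pass
    resized = []
    while pending:
        progressed = False
        for p in list(pending):
            req = pending[p]
            ch = req[1]
            try:
                if req[0] == "read" and ch.items:
                    pending[p] = p.send(ch.items.popleft())
                    progressed = True
                elif req[0] == "write" and len(ch.items) < ch.capacity:
                    ch.items.append(req[2])
                    pending[p] = p.send(None)
                    progressed = True
            except StopIteration:
                del pending[p]  # process finished
                progressed = True
        if not progressed:
            # Nobody can move. Collect the queues that have a blocked writer.
            full = [r[1] for r in pending.values() if r[0] == "write"]
            if not full:
                break  # every process waits on a read: true deadlock
            smallest = min(full, key=lambda c: c.capacity)
            smallest.capacity += 1  # artificial deadlock: grow smallest full queue
            resized.append(smallest.name)
    return resized


def producer(x, y, n):
    for i in range(n):
        yield ("write", x, i)
    yield ("write", y, "go")


def consumer(x, y, n, out):
    yield ("read", y)  # waits for the 'go' token before draining x
    for _ in range(n):
        out.append((yield ("read", x)))
```

With `x = Channel("X", 1)` and `y = Channel("Y", 1)`, running `producer(x, y, 3)` against `consumer(x, y, 3, out)` stalls twice: the producer is blocked writing to the full queue X while the consumer waits on the empty queue Y. Each stall is an artificial deadlock that the scheduler resolves by enlarging X.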

Deadlock Detection Algorithm

- Mitchell & Merritt’s algorithm [1984]
  - Detects local and global deadlocks
- Exactly one process detects deadlock
  - Simplifies deadlock resolution
- Pair of labels (numbers) used for deadlock detection
- Deadlock detected when a label makes a “round-trip” among set of blocked processes
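The label mechanics can be sketched as follows. This is an illustrative reading of the Mitchell-Merritt rules, not the authors' implementation. It assumes each label is a `(count, pid)` tuple compared lexicographically, and that a blocking process takes the fresh label `(max count + 1, own pid)`. A hypothetical polling loop, `find_deadlock`, stands in for the message exchange between processes. The wait-for edges in `slide_example` are an inference, chosen so that the labels come out as in the deck's four-process example.

```python
class Process:
    def __init__(self, pid):
        self.pid = pid
        self.public = (1, pid)   # label visible to processes waiting on us
        self.private = (1, pid)  # label only this process knows
        self.waits_on = None


def block(p, q):
    """p starts waiting on q: take a fresh label larger than both public labels."""
    count = max(p.public[0], q.public[0]) + 1
    p.public = p.private = (count, p.pid)
    p.waits_on = q


def transmit(p):
    """Copy a larger public label backwards along the wait-for edge."""
    q = p.waits_on
    if q is not None and q.public > p.public:
        p.public = q.public
        return True
    return False


def detects(p):
    """p sees its own private label come back: the label made a round trip."""
    q = p.waits_on
    return q is not None and p.public == p.private == q.public


def find_deadlock(processes):
    """Apply transmit steps until one process detects deadlock or nothing changes."""
    while True:
        for p in processes:
            if detects(p):
                return p.pid
        if not any([transmit(p) for p in processes]):
            return None


def slide_example():
    p1, p2, p3, p4 = (Process(i) for i in range(1, 5))
    block(p1, p2)  # label becomes (2, 1)
    block(p3, p1)  # label becomes (3, 3)
    block(p2, p3)  # label becomes (4, 2); p4 stays busy with (1, 4)
    return [p1, p2, p3, p4]
```

Calling `find_deadlock(slide_example())` propagates the largest label, `(4, 2)`, around the cycle until process 2 finds its own private label on the process it waits on. Only that process reports the deadlock, which is what makes resolution simple.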

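Mitchell-Merritt Example

[Figure: processes A, B, C, and D, each carrying a public and a private (count, pid) label, shown in two panels, "Initial State" and "Blocking Step". Labels in the initial state (same value for public and private): 1,1; 1,3; 1,2; 1,4. Labels after the blocking step (same value for public and private): 2,1; 3,3; 4,2; 1,4. Node annotations: BUSY, Write to B, Read from C, Read from A.]

Arrows indicate waiting. Artificial deadlock without feedback.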

Mitchell-Merritt Example

[Figure: the same processes A, B, C, and D in two panels, "Transmit Step" and "Deadlock Detected". Labels in both panels, given as public label / private label: 4,2 / 2,1; 4,2 / 3,3; 4,2 / 4,2; 1,4 / 1,4.]

Implementation

- Distributed framework for Computational Process Networks
  - TCP sockets for communication
  - Transmit and receive queues (zero-copy)
  - C++, POSIX threads
- http://www.ece.utexas.edu/~bevans/projects/pn
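To make the idea of a channel carried over a stream socket concrete, here is a small sketch. It is not the framework's code, which is written in C++ with POSIX threads, and the slides do not describe a wire format. The 4-byte length prefix and the helpers `send_token`, `recv_exact`, `recv_token`, and `demo` are all assumptions made for illustration. `socket.socketpair()` stands in for a real TCP connection, and this version copies data rather than being zero-copy.

```python
import socket
import struct
import threading


def send_token(sock, payload):
    # Hypothetical wire format: 4-byte big-endian length, then the token bytes.
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock, n):
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))  # blocks until some bytes arrive
        if not chunk:
            raise EOFError("channel closed")
        buf.extend(chunk)
    return bytes(buf)


def recv_token(sock):
    # Blocking read of one whole token, mirroring PN read semantics.
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def demo(tokens):
    writer_end, reader_end = socket.socketpair()  # stand-in for a TCP link

    def writer():
        for tok in tokens:
            send_token(writer_end, tok)

    t = threading.Thread(target=writer)
    t.start()
    received = [recv_token(reader_end) for _ in tokens]
    t.join()
    writer_end.close()
    reader_end.close()
    return received
```

The reader recovers token boundaries from the length prefix, so tokens arrive in FIFO order and intact even when the socket delivers them in arbitrary-sized chunks.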

Execution Performance

Overhead <1μs per read/write

(Reader and Writer each process 1 token at a time)

[Plot: Throughput (Kbytes/s), axis from 0 to 120 000, versus Token Size (bytes), log scale from 1 to 10000. Data points are labeled with token sizes 4, 16, 64, 256, and 1024; the first point is labeled "1, 2000".]

Execution Performance

Overhead <1μs per read/write

(Only Reader processes 1 token at a time)

[Plot: Throughput (Kbytes/s), axis from 0 to 120 000, versus Token Size (bytes), log scale from 1 to 10000. Data points are labeled with token sizes 4, 16, 64, 256, and 1024; the first point is labeled "1, 5900".]

Conclusion

- Formal models simplify parallel design, implementation, and debugging
- Communication in PN model follows “Single-Resource” semantics
- Mitchell-Merritt algorithm applicable to non-distributed, parallel, and distributed PNs
  - Can be used to implement bounded-memory scheduling algorithms
