ICASSP 2005
Deadlock Detection for Distributed Process Networks
Alex G. Olson & Brian L. Evans
The University of Texas at Austin
Motivation for Formal Models
  Applications may require higher input/output and computational rates than one CPU can handle
  Exploit parallelism for high performance
    Parallel (one machine) or distributed (many machines)
  Pitfalls of parallel/distributed programming
    Synchronization, shared memory, and deadlock
    Debugging concurrent code on many processors
  Formal models have provable properties
    Determinacy: programs are correct by construction
    Validation: only debug each component separately
    Scalability: faster execution with more CPUs
Applications
  Application                              Input Data Rate  Computation Rate    Output Data Rate
  Sonar Beamforming [Allen & Evans 00]     160 MB/s         4-20 GFLOPS         72 MB/s
  Bzip2 (block-zip) Compression            1-4 MB/s         ~1-4 GIPS (approx)  1-4 MB/s
  MPEG4 Encoding (4CIF)                    18 MB/s          ~2 GIPS             ~1 MB/s
  H.264 Video Server (QCIF) [Banerjee 02]  1 MB/s           ~1 GIPS             ~40 KB/s
  Design Space Exploration [Vissers & Wolf, 1999]
  Image Processing [Webb et al., 1999]
Process Networks [Kahn, 1974]
  Concurrently executing processes
  Communicate only over one-way unbounded channels (FIFO queues)
  Read one input port at a time
    Node execution suspended until enough data available
    Data that has been read is dequeued from channel
  Samples (tokens) flow along arcs
    Samples have value but not time information
    Flow of (untimed) data drives computation
  Determinate execution
    Any scheduling algorithm that obeys the above rules will produce the same history of tokens on arcs
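The rules above can be illustrated with a minimal sketch, assuming Python threads stand in for processes and `queue.Queue` for unbounded FIFO channels. The function names, the `None` end-of-stream marker, and the small source/scale/add network are assumptions made for this illustration and are not taken from the authors' framework.

```python
import queue
import threading

def source(outs, n):
    # Write tokens 0..n-1 to every output channel; unbounded writes never block.
    for i in range(n):
        for q in outs:
            q.put(i)
    for q in outs:
        q.put(None)  # end-of-stream marker (an assumption of this sketch)

def scale(in_q, out_q, k):
    while True:
        tok = in_q.get()  # blocking read: the node is suspended until a token arrives
        if tok is None:
            out_q.put(None)
            return
        out_q.put(k * tok)

def add(in_a, in_b, out_q):
    while True:
        a = in_a.get()  # read one input port at a time
        b = in_b.get()
        if a is None or b is None:
            out_q.put(None)
            return
        out_q.put(a + b)

def run_network(n=5):
    """Run source -> (x2, x3) -> add and return the token history on the output arc."""
    a, b, c, d, out = (queue.Queue() for _ in range(5))
    threads = [
        threading.Thread(target=source, args=([a, b], n)),
        threading.Thread(target=scale, args=(a, c, 2)),
        threading.Thread(target=scale, args=(b, d, 3)),
        threading.Thread(target=add, args=(c, d, out)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    history = []
    while True:
        tok = out.get()
        if tok is None:
            return history
        history.append(tok)
```

However the operating system interleaves the four threads, `run_network(5)` yields the same output history, because every process only performs blocking reads on one port at a time.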
Bounding Size of PN Queues
  Bounded Scheduling [Parks & Lee, 1995]
    Write to a full queue suspends node execution
    On global deadlock, resize smallest queue
    Favors incomplete bounded execution (non-determinate)
  Computational PN [Allen & Evans, 2000]
    Processes may consume fewer tokens than read
    All memory allocation can be handled by queues
  Bounded Scheduling [Geilen & Basten, 2003]
    Show local deadlock may not lead to global deadlock
  Deadlock detection required for bounded communication, but no framework detects local deadlock
  Artificial deadlock
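The "resize smallest queue on global deadlock" rule can be sketched with a small single-threaded simulation. This is an illustration under stated assumptions, not the scheduler from any of the cited papers: processes are Python generators that yield `("read", channel)` or `("write", channel, value)` requests, every queue starts with capacity 1, and a resize grows the queue by one token. The names `Channel`, `run`, and the three example processes are hypothetical.

```python
from collections import deque

class Channel:
    def __init__(self, capacity):
        self.capacity = capacity
        self.tokens = deque()

def run(procs, chans):
    """Round-robin scheduler with bounded queues; returns how many resizes were needed."""
    req = {name: next(g) for name, g in procs.items()}  # assumes each process makes a request
    resizes = 0
    while req:
        progressed = False
        for name in list(req):
            kind, ch, *val = req[name]
            c = chans[ch]
            if kind == "read" and c.tokens:
                result = c.tokens.popleft()
            elif kind == "write" and len(c.tokens) < c.capacity:
                c.tokens.append(val[0])
                result = None
            else:
                continue  # blocked: read on empty queue or write on full queue
            progressed = True
            try:
                req[name] = procs[name].send(result)
            except StopIteration:
                del req[name]
        if not progressed:  # global deadlock: every remaining process is blocked
            full = [chans[r[1]] for r in req.values() if r[0] == "write"]
            if not full:
                break  # true deadlock: nobody is blocked on a write
            min(full, key=lambda c: c.capacity).capacity += 1
            resizes += 1
    return resizes

def proc_a():
    for i in range(3):
        yield ("write", "ab", i)  # fills the A->B queue before feeding C
    yield ("write", "ac", 99)

def proc_c():
    tok = yield ("read", "ac")
    yield ("write", "cb", tok)

def proc_b(out):
    out.append((yield ("read", "cb")))  # B insists on C's token first
    for _ in range(3):
        out.append((yield ("read", "ab")))

out = []
chans = {name: Channel(1) for name in ("ab", "ac", "cb")}
resizes = run({"A": proc_a(), "B": proc_b(out), "C": proc_c()}, chans)
```

In this run A blocks writing to the full A-to-B queue while B and C block on reads, an artificial deadlock with no feedback loop in the graph. Two resizes of that queue let the network finish. The sketch only reacts when every process is blocked, so a local deadlock would go unnoticed while other processes keep running.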
Deadlock Detection Algorithm
  Mitchell & Merritt’s algorithm [1984]
    Detects local and global deadlocks
    Exactly one process detects deadlock
      Simplifies deadlock resolution
    Pair of labels (numbers) used for deadlock detection
    Deadlock detected when a label makes a “round-trip” among set of blocked processes
Mitchell-Merritt Example
  Nodes A, B, C, D; arrows indicate waiting
  Artificial deadlock without feedback
  Initial State (processes BUSY)
    Public (count, pid)  Private (count, pid)
    1,1                  1,1
    1,3                  1,3
    1,2                  1,2
    1,4                  1,4
  Blocking Step (Write to B, Read from C, Read from A; one process BUSY)
    Public (count, pid)  Private (count, pid)
    2,1                  2,1
    3,3                  3,3
    4,2                  4,2
    1,4                  1,4
Mitchell-Merritt Example
  Nodes A, B, C, D
  Transmit Step
    Public Label  Private Label
    4,2           2,1
    4,2           3,3
    4,2           4,2
    1,4           1,4
  Deadlock Detected
    Public Label  Private Label
    4,2           2,1
    4,2           3,3
    4,2           4,2
    1,4           1,4
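The block, transmit, and detect steps can be sketched in a few lines, using the (count, pid) labels from the example. This is a minimal single-threaded illustration, not the authors' C++ implementation. The class and function names are hypothetical, and the wait-for order (process 1 waits on 2, then 3 waits on 1, then 2 waits on 3) is an assumption chosen because it reproduces the label values shown in the example.

```python
class Process:
    def __init__(self, pid):
        self.pid = pid
        self.public = (1, pid)   # labels are (count, pid) pairs, compared as tuples
        self.private = (1, pid)
        self.waits_on = None     # None means the process is busy

def block(p, q):
    """Block step: p starts waiting on q and takes a fresh label larger than both."""
    count = max(p.public[0], q.public[0]) + 1
    p.public = p.private = (count, p.pid)
    p.waits_on = q

def transmit(p):
    """Transmit step: a blocked p copies a larger public label from the process it waits on."""
    q = p.waits_on
    if q is not None and q.public > p.public:
        p.public = q.public
        return True
    return False

def detects(p):
    """Detect step: p sees its own label come back, so it has made a round trip."""
    q = p.waits_on
    return q is not None and p.public == p.private == q.public

def run_example():
    procs = {pid: Process(pid) for pid in (1, 2, 3, 4)}
    block(procs[1], procs[2])   # assumed blocking order, see lead-in
    block(procs[3], procs[1])
    block(procs[2], procs[3])   # process 4 stays busy
    after_block = {pid: p.public for pid, p in procs.items()}
    changed = True
    while changed:              # repeat transmit steps until no label moves
        changed = any([transmit(p) for p in procs.values()])
    final = {pid: (p.public, p.private) for pid, p in procs.items()}
    detectors = [pid for pid, p in procs.items() if detects(p)]
    return after_block, final, detectors

after_block, final, detectors = run_example()
```

Only process 2, whose private label (4, 2) is the largest in the cycle, finds its own label on the process it waits on. The busy process 4 never takes part, which is how the scheme reports a local deadlock without requiring the whole network to stall.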
Implementation
  Distributed framework for Computational Process Networks
    TCP sockets for communication
    Transmit and receive queues (zero-copy)
    C++, POSIX threads
  http://www.ece.utexas.edu/~bevans/projects/pn
Execution Performance
  Overhead <1 μs per read/write
  (Reader and Writer each process 1 token at a time)
  [Figure: Throughput (Kbytes/s), 0 to 120 000, versus Token Size (bytes), log scale from 1 to 10000; curves labeled 4, 16, 64, 256, 1024, and "1, 2000"]
Execution Performance
  Overhead <1 μs per read/write
  (Only Reader processes 1 token at a time)
  [Figure: Throughput (Kbytes/s), 0 to 120 000, versus Token Size (bytes), log scale from 1 to 10000; curves labeled 4, 16, 64, 256, 1024, and "1, 5900"]
Conclusion
  Formal models simplify parallel design, implementation, and debugging
  Communication in PN model follows “Single-Resource” semantics
  Mitchell-Merritt algorithm applicable to non-distributed, parallel, and distributed PNs
    Can be used to implement bounded-memory scheduling algorithms