
Upload: buiduong

Post on 04-Jun-2018


A Complexity Theory for Parallel Computation

The questions:
- What is the computing power of “reasonable” parallel computation models? Is it possible to make technology-independent statements?
- Is it possible to relate parallel computations to sequential computations?
- What are the limits of parallel computation: are there algorithmic problems with efficient sequential algorithms which are hard to parallelize?

The answers:
- We relate parallel time and sequential space
- and define the notion of P-completeness, which allows us to identify algorithmic problems which are (apparently) hard to parallelize.

Parallel Time

The question: which algorithmic problems with input size $n$ can be solved by a parallel computer in time $T(n)$?

Observe that we bound the computation time, but not the number of processors.
- We assume an unlimited supply of processors.
- Initially only $n$ processors are active.
- An active processor may activate at most one inactive processor per step.

Which parallel computer? Any “reasonable” parallel computer, maybe the latest computer generation in 2100!

Is there a connection between parallel time and sequential space?
- Intuition: it is possible to simulate a reasonable parallel computer with run time $T(n)$ in sequential space $\mathrm{poly}(T(n))$.
- Can we simulate a sequential computer with space complexity $T(n)$ by a parallel machine with run time $\mathrm{poly}(T(n))$?

A Complexity Theory for Parallel Computation Space Complexity 2 / 48

Space Complexity I

The architecture of an off-line Turing machine:
- an input tape which stores the input $\$\,w_1 \cdots w_n\,\$$. The dollar symbol is used as end marker; $w_1, \ldots, w_n$ belong to the input alphabet $\Sigma$.
- a work tape which is infinite to the left and to the right. The tape alphabet is binary.
- an output tape which is infinite to the right. Cells store letters from an output alphabet $\Gamma$.

Modifying the tapes:
- The input tape has a read-only head which can move to the left or right neighbor of a cell. Initially the head visits the cell storing the leftmost letter of the input.
- The work tape has a read/write head with identical movement options. Initially all cells store the blank symbol B.
- The output tape has a write-only head.

Space Complexity II

The computation of an off-line Turing machine $M$:

The Turing machine computes by
- reading its input,
- modifying its work tape
- and printing onto its output tape.

If the machine stops with output 1, then we say that the machine accepts the input; the machine rejects the input if it writes a 0.

For a Turing machine $M$ and input alphabet $\Sigma$, $L(M) = \{ w \in \Sigma^* \mid M \text{ accepts } w \}$ is the language accepted by $M$.

The space complexity of $M$:
- $\mathrm{space}_M(w)$ is the number of different cells of the work tape which are visited when computing on $w$.
- $\mathrm{space}_M(n) := \max_{w \in \Sigma^n} \mathrm{space}_M(w)$ is the worst-case space complexity of $M$ on inputs of length $n$.

Space Complexity III

Let $s : \mathbb{N} \to \mathbb{N}$ be a function. Then

$$\mathrm{DSPACE}(s) = \{ L \subseteq \{0,1\}^* \mid L = L(M) \text{ for a Turing machine } M \text{ with } \mathrm{space}_M(n) = O(s(n)) \}$$

is the complexity class of all languages computable in (deterministic) space $s$.

Important space classes:
- $\mathrm{DSPACE}(0)$ is the class of regular languages.
- $\mathrm{DL} = \mathrm{DSPACE}(\log_2 n)$: deterministic logarithmic space is required to remember an input position.
- $\mathrm{PSPACE} = \bigcup_{k \in \mathbb{N}} \mathrm{DSPACE}(n^k)$ is an extremely powerful complexity class containing P, NP, polynomial time quantum computations, ...

How Robust Are Space Complexity Classes?

Space classes do not change if we replace Turing machines by reasonable deterministic machine models.

What about nondeterministic machines?
- Define $\mathrm{nspace}_M(w)$ as the maximal number of cells of the work tape which are visited during an accepting computation on input $w$.
- $\mathrm{nspace}_M(n) := \max_{w \in \Sigma^n \cap L(M)} \mathrm{nspace}_M(w)$ is the nondeterministic space of $M$ for input size $n$.
- Finally:

$$\mathrm{NSPACE}(s) := \{ L \subseteq \{0,1\}^* \mid L = L(M) \text{ for a nondeterministic Turing machine } M \text{ with } \mathrm{nspace}_M(n) = O(s(n)) \}.$$

Theorem of Savitch: $\mathrm{NSPACE}(s) \subseteq \mathrm{DSPACE}(s^2)$ for $s(n) = \Omega(\log_2 n)$.

Important space class: $\mathrm{NL} = \mathrm{NSPACE}(\log_2 n)$, all languages recognizable in nondeterministic logarithmic space.

The Computation Graph of A Nondeterministic TM

Let $M$ be a nondeterministic off-line Turing machine with space complexity at most $s$. Let $w$ be an input of $M$.

A configuration consists of the contents of the work tape, the current state and the positions of the input and work tape heads.

The computation graph $G_M(w)$ of $M$ on input $w$:
- The configurations on input $w$ are the nodes of $G_M(w)$.
- An edge $(k_1, k_2)$ from configuration $k_1$ to configuration $k_2$ belongs to $G_M(w)$ iff configuration $k_2$ is a successor of $k_1$.
- An additional node yes is added and, for any accepting configuration $k$, the edge $(k, \text{yes})$ is inserted.

$M$ accepts input $w$ iff there is a path in $G_M(w)$ from the initial configuration $k_0$ to node yes.

If $M$ uses space $s(n)$, then $M$ has at most $2^{O(s(n))} \cdot O(1) \cdot (n + 2) \cdot s(n) = n \cdot 2^{O(s(n))}$ configurations. The four factors count the work tape contents, the states, the input head positions and the work head positions.

NC and Logarithmic Space

- $\mathrm{DL} \subseteq \mathrm{NL}$
- and any language in NL can be accepted in time $O(\log_2^2 n)$ with a polynomial number of processors.

$\mathrm{DL} \subseteq \mathrm{NL}$: obvious.

Let $L$ be an arbitrary language in NL.
- Let $L = L(M)$ for a nondeterministic Turing machine $M$ using logarithmic space.
- $G_M(w)$ has $N = \mathrm{poly}(|w|)$ nodes and can be constructed in logarithmic time with a polynomial number of processors: reserve one processor for any pair of configurations to check whether the pair corresponds to a transition.
- $M$ accepts $w$ iff there is a path in $G_M(w)$ from the initial configuration $k_0$ to the unique accepting node yes.

We need a really fast solution of the transitive closure problem: time $O(\log_2^2 N)$ with $\mathrm{poly}(N)$ processors will do.

The Transitive Closure Problem Revisited

Determine the transitive closure of a graph $G$ with $N$ nodes. Use circuits.

We have parallelized Warshall’s algorithm, but its speed is insufficient. Instead we use boolean matrix multiplication.
- Let $A$ be the adjacency matrix of $G$. Set all diagonal entries to one (i.e., insert self loops) and call the new matrix $B$.
- $B[i,j] = 1$ iff there is a path of length at most one from $i$ to $j$.
- And inductively, $B^{r+1}[i,j] = \bigvee_{k=1}^{N} B^r[i,k] \wedge B[k,j]$: $B^{r+1}[i,j] = 1$ iff there is a path of length at most $r$ from $i$ to some $k$ and an edge (possibly a self loop) from $k$ to $j$.
- There is a path from $i$ to $j$ iff $B^{N-1}[i,j] = 1$.

Compute $B, B^2, \ldots, B^{2^j}, B^{2^{j+1}}, \ldots, B^{2^t}$ for $t = \lceil \log_2 N \rceil$ by repeated squaring.

These are $t = \lceil \log_2 N \rceil$ matrix products. Per matrix product: depth $O(\log_2 N)$ and size $O(N^3)$. All in all: depth $O(\log_2^2 N)$ and size $O(N^3 \cdot \log_2 N)$.
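The repeated-squaring scheme can be sketched sequentially as follows. This is only an illustration under assumptions: the function names `bool_matmul` and `transitive_closure` and the list-of-lists matrix encoding are not from the slides, and the code runs on one processor, so it shows the $\lceil \log_2 N \rceil$ squarings but not the parallel depth.

```python
def bool_matmul(X, Y):
    """Boolean matrix product: (X*Y)[i][j] = OR_k (X[i][k] AND Y[k][j])."""
    n = len(X)
    return [[any(X[i][k] and Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transitive_closure(adj):
    """Reflexive-transitive closure via ceil(log2 N) boolean squarings."""
    n = len(adj)
    # B = adjacency matrix with all diagonal entries set to one (self loops).
    B = [[bool(adj[i][j]) or i == j for j in range(n)] for i in range(n)]
    t = (n - 1).bit_length()  # equals ceil(log2 n) for n >= 1
    for _ in range(t):
        B = bool_matmul(B, B)  # B^(2^j) -> B^(2^(j+1))
    return B


# Directed path 0 -> 1 -> 2 -> 3.
path = [[0, 1, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 1],
        [0, 0, 0, 0]]
closure = transitive_closure(path)
```

Each call to `bool_matmul` corresponds to one layer of depth $O(\log_2 N)$ in the circuit: the $N^3$ conjunctions are independent and every disjunction over $k$ can be evaluated by a binary tree.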

Uniform Circuit Families

We consider $\{\neg, \wedge, \vee\}$-circuits as a parallel computing model.

- Depth corresponds to parallel time.
- Is there a connection to sequential space?

To apply circuits to inputs of various lengths, we have to work with families $(S_n \mid n \in \mathbb{N})$ of circuits $S_n$ for input size $n$. We should require that the description of the family is only moderately complex.
- We say that the circuit family is uniform if, given input $1^n$, its circuit $S_n$ can be described by a deterministic off-line Turing machine $M$ with space complexity $O(\log_2 \mathrm{size}(S_n))$.
- $M$ has to eventually print all edges of $S_n$ onto its output tape as well as all nodes with their assigned operation, respectively with their assigned input bit.

Parallel Time and Sequential Space I

Assume that $s(n) = \Omega(\log_2 n)$. A nondeterministic off-line Turing machine $M$ with space complexity $s(n)$ can be simulated by a uniform circuit family $(S_n \mid n \in \mathbb{N})$ in time (i.e., depth) $O(s^2(n))$.

Input $w$ belongs to $L(M)$ iff there is a path in $G_M(w)$ from the initial configuration $k_0$ to the accepting node yes. Let $N$ be the number of nodes of $G_M(w)$.
- Compute the transitive closure of $G_M(w)$ in depth $O(\log_2^2 N)$.
- We are required to build a uniform circuit family.
  - Configurations have length $O(\log_2 n + s(n))$
  - and space $O(\log_2 n + s(n))$ is sufficient to output all valid transitions between configurations on input $1^n$.

We are done, since $N = 2^{O(s(n))}$ and hence $\log_2^2 N = O(s^2(n))$.

Parallel Time and Sequential Space II

A uniform circuit family $(S_n \mid n \in \mathbb{N})$ of depth $s(n)$ can be simulated by a deterministic off-line Turing machine $M$ within space $O(s^2(n))$.

How does $M$ work? Set $n = |w|$. Since $S_n$ has depth $s(n)$, the circuit has at most $2^{s(n)+1} - 1$ nodes.
- Evaluate $S_n$ via depth-first search with a stack of height $O(s(n))$.
- Since the stack holds nodes of the circuit and every node name has $O(s(n))$ bits, all we need is space complexity $O(s^2(n))$.

Indeed, already space $O(s(n))$ suffices. Why?

We have shown the Theorem of Savitch:
- A nondeterministic Turing machine with space $s(n)$ can be simulated by a uniform circuit family of depth $O(s^2(n))$.
- The uniform circuit family can be simulated by a deterministic Turing machine with space $O(s^2(n))$ (here we use the improved simulation in space linear in the depth).
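The depth-first evaluation can be illustrated with a short sketch. The dictionary encoding of the circuit and the name `eval_dfs` are assumptions made for this example. The function keeps no table of gate values: the only memory in use is the recursion stack, whose height is the depth of the circuit, at the price of possibly re-evaluating shared gates.

```python
def eval_dfs(circuit, node, w):
    """Evaluate `node` by depth-first search without storing gate values.

    `circuit` maps a node name to ("IN", i), ("NOT", a), ("AND", a, b)
    or ("OR", a, b); `w` is the input bit string as a list of 0/1.
    """
    op, *args = circuit[node]
    if op == "IN":
        return bool(w[args[0]])
    if op == "NOT":
        return not eval_dfs(circuit, args[0], w)
    left = eval_dfs(circuit, args[0], w)
    right = eval_dfs(circuit, args[1], w)
    return (left and right) if op == "AND" else (left or right)


# out = (x0 AND NOT x1) OR x1
example_circuit = {
    "x0": ("IN", 0),
    "x1": ("IN", 1),
    "n": ("NOT", "x1"),
    "a": ("AND", "x0", "n"),
    "out": ("OR", "a", "x1"),
}
```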

The Parallel Computation Thesis

We have found a quadratic relationship between the depth of circuits and the space complexity of Turing machines.

The depth of circuit families and the computing time of PRAMs are also polynomially related.

The following conjecture (the parallel computation thesis) is therefore natural:

Parallel time for any reasonable parallel computing model is polynomially related to the space complexity of sequential computations.

The thesis is mainly of theoretical interest, since the size of the parallel computing model is not restricted. However, when trying to design a parallel algorithm,
- first check if low-space algorithms exist.
- If not, then there may be only little speedup possible.

The Class NC

NC is the class of all problems with parallel algorithms running in time $\mathrm{poly}(\log_2 n)$ with a polynomial number of processors.

NC is a subset of P: each parallel step can be simulated in polynomial time.

As a consequence, NC is the class of problems with efficient sequential algorithms and drastic parallel speedups.

Which Algorithmic Problems Belong To NC?

Many important problems of linear algebra:
- the matrix-vector product,
- the matrix-matrix product,
- computing the determinant and solving linear systems,
- and the Fast Fourier Transform.

Many dynamic programming problems such as:
- the transitive closure problem,
- the all-pairs-shortest-path problem,
- pairwise alignment
- and RNA secondary structure prediction.

Computing connected components and minimum spanning trees.

Sorting.

All context-free languages (and hence all regular languages).

Which Problems Probably Do Not Belong To NC?

Which languages in P do not admit “extreme” speedups?

Important observation: there are hardest languages $L \in \mathrm{P}$:
- if $L$ has really fast parallel algorithms with polynomially many processors, then all problems in P have fast algorithms with polynomially many processors.
- Hence, in all likelihood a hardest language has no extreme speedups.

The roadmap:
- Define reductions to compare languages with respect to their potential speedup.
- Show that many important hardest languages exist.

Reductions

$L_1$ is reducible to $L_2$ ($L_1 \le_{\mathrm{par}} L_2$) iff there is a “translation” $T$ with $w \in L_1 \Leftrightarrow T(w) \in L_2$.

$T$ has to be computable in poly-logarithmic time with a polynomial number of processors.

The reduction should compare languages with respect to achievable speedups. Assume $L_1 \le_{\mathrm{par}} L_2$ and that $L_2$ belongs to NC. Does $L_1$ also belong to NC?
- Run the translation $T$ on input $w$ to compute $T(w)$ in poly-logarithmic time with a polynomial number of processors.
- Since $L_2 \in \mathrm{NC}$, determine whether $T(w)$ belongs to $L_2$ in poly-logarithmic time with a polynomial number of processors. (The length of $T(w)$ is polynomial in $|w|$, so the time stays poly-logarithmic in $|w|$.)
- Accept $w$ for language $L_1$ iff $T(w) \in L_2$.

Hence $L_1$ also belongs to NC.

P-complete Problems: The Concept

- A language $K$ is P-hard iff $L \le_{\mathrm{par}} K$ for all languages $L \in \mathrm{P}$.
- $K$ is P-complete iff $K$ is P-hard and $K$ belongs to P.

What happens if a P-hard language $K$ belongs to NC?

We show that all languages in P have “extreme” speedups. Let $L \in \mathrm{P}$ be arbitrary.
- $L \le_{\mathrm{par}} K$, since $K$ is P-hard.
- But then $L \in \mathrm{NC}$, as we have just seen, since $K$ belongs to NC.

If some P-hard language belongs to NC, then P = NC and all problems in P have extreme speedups.

How To Show P-Completeness?

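- If $L_1 \le_{\mathrm{par}} L_2$ and $L_2 \le_{\mathrm{par}} L_3$, then $L_1 \le_{\mathrm{par}} L_3$.
- If $K$ is P-hard and $K \le_{\mathrm{par}} L$, then $L$ is P-hard.

Why is the reduction transitive?
- If $L_1 \le_{\mathrm{par}} L_2$ holds because of translation $T_1$
- and if $L_2 \le_{\mathrm{par}} L_3$ holds because of translation $T_2$,
- then $L_1 \le_{\mathrm{par}} L_3$ holds because of translation $T_2 \circ T_1$: $w \in L_1 \Leftrightarrow T_1(w) \in L_2 \Leftrightarrow T_2(T_1(w)) \in L_3$.

Assume that $K$ is P-hard and that $K \le_{\mathrm{par}} L$ holds.
- Let $M \in \mathrm{P}$ be arbitrary. Then $M \le_{\mathrm{par}} K$, since $K$ is P-hard.
- Moreover $K \le_{\mathrm{par}} L$ holds. And $M \le_{\mathrm{par}} L$ follows by transitivity.
- Hence $L$ is P-hard.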

The Circuit Value Problem

Let $S$ be a circuit with a single output gate. Determine whether $S$ accepts an input $w$.

I.e., $\mathrm{CVP} = \{ (S, w) \mid \text{the circuit } S \text{ accepts } w \}$.

CVP is an easy problem for sequential computation: evaluate a circuit with, say, depth-first search. Hence $\mathrm{CVP} \in \mathrm{P}$.

However, a parallel evaluation in poly-logarithmic time seems hard for circuits of large depth.

Show that CVP is P-complete.
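A minimal sequential decision procedure for CVP might look as follows. The encoding is an assumption made for this sketch: gates are listed in topological order, refer to earlier gates by index, and the last gate is the output. Instead of depth-first search the sketch evaluates the gates in that order, which also takes linear time.

```python
def in_cvp(gates, w):
    """Return True iff the circuit `gates` accepts the input `w`.

    Each gate is ("IN", i), ("NOT", a), ("AND", a, b) or ("OR", a, b),
    where a and b are indices of earlier gates.
    """
    val = []
    for op, *args in gates:
        if op == "IN":
            val.append(bool(w[args[0]]))
        elif op == "NOT":
            val.append(not val[args[0]])
        elif op == "AND":
            val.append(val[args[0]] and val[args[1]])
        elif op == "OR":
            val.append(val[args[0]] or val[args[1]])
        else:
            raise ValueError(f"unknown gate type: {op}")
    return val[-1]


# XOR of two input bits: (x0 AND NOT x1) OR (NOT x0 AND x1)
xor_circuit = [
    ("IN", 0), ("IN", 1),
    ("NOT", 0), ("NOT", 1),
    ("AND", 0, 3), ("AND", 2, 1),
    ("OR", 4, 5),
]
```

The loop visits the gates one after the other, so its running time grows with the size of the circuit; the difficulty for parallel evaluation is that gate `k` cannot be processed before its predecessors.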

Proof Outline

Show $L \le_{\mathrm{par}} \mathrm{CVP}$ for an arbitrary language $L \in \mathrm{P}$.

What do we know about $L$? There is a Turing machine $M$ with a single tape which computes in time $t(n) = O(n^k)$ and

$w \in L \Leftrightarrow M \text{ accepts } w$.

The goal: simulate $M$ by a circuit family $(S_n \mid n \in \mathbb{N})$ of polynomial size such that

$M \text{ accepts } w \Leftrightarrow S_{|w|} \text{ accepts } w$.

Set $T(w) = (S_{|w|}, w)$. Then

$w \in L \Leftrightarrow M \text{ accepts } w \Leftrightarrow S_{|w|} \text{ accepts } w \Leftrightarrow (S_{|w|}, w) \in \mathrm{CVP}$.

How to choose $S_{|w|}$?

The Two-Dimensional Structure of $S_{|w|}$

Assume that $w$ is an input for $M$ and that $w$ has length $n$.

$M$ runs in time $t(n)$ and hence it visits only the cells with addresses $-t(n), -t(n)+1, \ldots, -1, 0, +1, \ldots, t(n)$.
- Here we assume that letter $w_i$ is written on the cell with address $i$
- and that the read/write head of $M$ initially “sits” on the cell with address 0.

We simulate $M$ on input $w$ with a circuit $S_n$ which is built like a two-dimensional mesh.
- The $i$-th “row” of $S_n$ reflects the configuration of $M$ on input $w$ at time $i$: its nodes correspond to the tape cells,
- remember the current state of $M$ and indicate the current position of the read/write head.

The circuit has to compute the configuration at time $i+1$ from the configuration at time $i$.

Updating Configurations

We work with small subcircuits $S_{i,j}$ which are responsible for cell $j$ of the tape at time $i$.
- Subcircuit $S_{i+1,j}$ “asks” subcircuit $S_{i,j}$ if the head is visiting cell $j$.
- If the answer is negative, then $S_{i+1,j}$ assigns the letter remembered by $S_{i,j}$ to cell $j$.
- If the answer is positive:
  - $S_{i,j}$ also stores the current state, which it transmits together with the current contents of cell $j$ to $S_{i+1,j}$.
  - $S_{i+1,j}$ determines the new contents of cell $j$, the new state and the neighbor which will be scanned at time $i+1$.

All subcircuits $S_{i,j}$ are identical except for the subcircuits $S_{0,j}$, since cell $j$ stores $w_j$ for $1 \le j \le n$ and the blank symbol otherwise.

A binary tree on top of row $t(n)$ has to be added to check whether the final state is accepting.

$S_n$ can be constructed in logarithmic time with polynomially many processors.

Difficult Variants of CVP

In M-CVP we assume that $S$ is a monotone circuit and ask whether $S$ accepts input $w$.

A circuit is monotone if it does not have negations.

M2-CVP is defined as M-CVP, except that the circuit is required to have fan-out at most two.

In NOR-CVP the circuit is built from NOR-gates only.

Remember that $\mathrm{NOR}(u, v) = \neg(u \vee v)$.

We show that all variants are P-complete.
- All three problems belong to P.
- $\mathrm{CVP} \le_{\mathrm{par}} \text{M-CVP}$, $\text{M-CVP} \le_{\mathrm{par}} \text{M2-CVP}$ and $\mathrm{CVP} \le_{\mathrm{par}} \text{NOR-CVP}$.

M-CVP: Pushing Negations to the Sources I

$S$ is a $\{\neg, \wedge, \vee\}$-circuit. Is there an equivalent circuit $S'$ which uses negations only for input bits?

Push negations to the inputs using De Morgan's rules $\neg(x \wedge y) \equiv \neg x \vee \neg y$ and $\neg(x \vee y) \equiv \neg x \wedge \neg y$.

However, there is a problem:
- The result of a gate may be needed in its negated as well as in its unnegated form.
- The equivalent circuit $S'$ should provide both results.

$S'$ will at most double in size. Why?

M-CVP: Pushing Negations to the Sources II

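The circuit $S'$ has two gates $u_{\mathrm{pos}}$ and $u_{\mathrm{neg}}$ for any gate $u$ of $S$.
- If $u$ is an input bit $x_i$, then $u_{\mathrm{pos}} = x_i$ and $u_{\mathrm{neg}} = \neg x_i$.
- $u_{\mathrm{pos}}$ keeps the functionality of $u$, $u_{\mathrm{neg}}$ interchanges $\wedge$ and $\vee$.

For any edge $(u, v)$ of $S$, the equivalent circuit $S'$ has two edges, namely $(u_{\mathrm{pos}}, v_{\mathrm{pos}})$ and $(u_{\mathrm{neg}}, v_{\mathrm{neg}})$.

[Figure: a circuit over the inputs x1, x2, x3 containing a negation gate, next to the equivalent circuit over the inputs x1, x2, x3, ¬x3, ¬x2, ¬x1.]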

M-CVP is P-Complete

For any circuit $S$ there is an equivalent circuit $S'$ with negations only for inputs. $S'$ can be constructed in logarithmic time from $S$. The size of $S'$ at most doubles in comparison to $S$.

Let $(S, w)$ be an input for CVP.
- Construct the circuit $S'$, remove all negations from $S'$ and call the resulting monotone circuit $S_m$.
- We have to modify the input for $S_m$, since we removed negations:
  - The input for a not-negated input gate is left unchanged,
  - whereas the input for a negated input gate is flipped.
  - Call the new input $w_m$.
- Define the translation $T(S, w) = (S_m, w_m)$. Then

$(S, w) \in \mathrm{CVP} \Leftrightarrow S \text{ accepts } w \Leftrightarrow S_m \text{ accepts } w_m \Leftrightarrow (S_m, w_m) \in \text{M-CVP}$.

We have verified the reduction $\mathrm{CVP} \le_{\mathrm{par}} \text{M-CVP}$, since $T$ can be computed in logarithmic time.
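The translation $(S, w) \mapsto (S_m, w_m)$ can be sketched sequentially as below. The encoding is an assumption of this example: gates are listed in topological order, refer to earlier gates by index, and the last gate is the output. The formerly negated input bits become the new input positions $n, \ldots, 2n-1$ of $w_m$. The names `to_monotone` and `evaluate` are hypothetical helpers, and a negation gate is handled by swapping the roles of $u_{\mathrm{pos}}$ and $u_{\mathrm{neg}}$.

```python
def evaluate(gates, w):
    """Evaluate a circuit given in topological order; the last gate is the output."""
    val = []
    for op, *args in gates:
        if op == "IN":
            val.append(bool(w[args[0]]))
        elif op == "NOT":
            val.append(not val[args[0]])
        elif op == "AND":
            val.append(val[args[0]] and val[args[1]])
        else:  # "OR"
            val.append(val[args[0]] or val[args[1]])
    return val[-1]


def to_monotone(gates, w):
    """Return (S_m, w_m): a negation-free circuit and its modified input."""
    n = len(w)
    out, pos, neg = [], [], []  # pos[u] / neg[u]: index of u_pos / u_neg in `out`

    def emit(gate):
        out.append(gate)
        return len(out) - 1

    for op, *args in gates:
        if op == "IN":
            pos.append(emit(("IN", args[0])))
            neg.append(emit(("IN", n + args[0])))  # reads the flipped bit
        elif op == "NOT":
            a = args[0]
            pos.append(neg[a])  # a negation only swaps the two rails
            neg.append(pos[a])
        else:
            a, b = args
            dual = "OR" if op == "AND" else "AND"
            pos.append(emit((op, pos[a], pos[b])))
            neg.append(emit((dual, neg[a], neg[b])))  # De Morgan
    w_m = list(w) + [1 - bit for bit in w]
    # Cut the gate list after the positive copy of the output gate.
    return out[: pos[-1] + 1], w_m


xor_circuit = [("IN", 0), ("IN", 1), ("NOT", 0), ("NOT", 1),
               ("AND", 0, 3), ("AND", 2, 1), ("OR", 4, 5)]
```

Every gate of `gates` contributes at most two gates to the result, which matches the claim that the size at most doubles.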

M2-CVP and NOR-CVP are P-Complete

The construction of the reduction $\text{M-CVP} \le_{\mathrm{par}} \text{M2-CVP}$ is left as an exercise.

To show: $\mathrm{CVP} \le_{\mathrm{par}} \text{NOR-CVP}$.
- NOR is a basis, since
  - $\neg u \equiv \mathrm{NOR}(u, u)$,
  - $u \wedge v \equiv \mathrm{NOR}(\neg u, \neg v)$
  - and $u \vee v \equiv \mathrm{NOR}(\mathrm{NOR}(u, v), \mathrm{NOR}(u, v))$.
- If $(S, w)$ is an input of CVP:
  - replace all gates by small NOR-circuits to obtain an equivalent NOR-circuit $S^*$.
  - The translation $(S, w) \mapsto (S^*, w)$ establishes the wanted reduction to NOR-CVP.
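The three identities can be checked exhaustively over all truth assignments. The helper names below (`nor`, `not_`, `and_`, `or_`, `check_nor_basis`) are chosen for this sketch only.

```python
from itertools import product


def nor(u, v):
    return not (u or v)


def not_(u):
    return nor(u, u)


def and_(u, v):
    return nor(not_(u), not_(v))


def or_(u, v):
    return nor(nor(u, v), nor(u, v))


def check_nor_basis():
    """Compare the NOR-only gadgets with Python's own not / and / or."""
    for u, v in product([False, True], repeat=2):
        if not_(u) != (not u):
            return False
        if and_(u, v) != (u and v):
            return False
        if or_(u, v) != (u or v):
            return False
    return True
```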

Linear Programming

Minimize a linear objective function $c^T \cdot x = \sum_{i=1}^{n} c_i x_i$ over the real numbers subject to
- linear constraints $\sum_{j=1}^{n} A[i,j] \cdot x_j \ge b_i$ for $i = 1, \ldots, m$
- and to non-negativity constraints $x_1 \ge 0, \ldots, x_n \ge 0$.

In shorthand:

$$\min c^T \cdot x \quad \text{s.t.} \quad A \cdot x \ge b \text{ and } x \ge 0,$$

where
- $A$ is the $m \times n$ constraint matrix with $A = (A[i,j])_{1 \le i \le m,\, 1 \le j \le n}$,
- $c = (c_1, \ldots, c_n)$ is the vector of coefficients of the objective function
- and $b = (b_1, \ldots, b_m)$ is the vector of right hand sides.

Linear Programming: What Is Known?

Efficient sequential solutions exist, namely the ellipsoid method or interior point methods.

Linear programming is maybe the most powerful optimization method with efficient algorithms. For instance, consider the bipartite matching problem for the graph $G$:
- $\max \sum_{e \in E} y_e$ s.t. $A_G \cdot y \le 1$ and $y \ge 0$.
- The incidence matrix $A_G$ of $G$ has a row for every node $u \in V_0 \cup V_1$ and a column for every edge $e \in E$: $A_G[u, e] = 1$ iff node $u$ is an endpoint of edge $e$ and $A_G[u, e] = 0$ otherwise.
- Although fractional solutions are allowed, all vertices of the polytope are integral.

We show that the general linear programming problem is P-complete. However, fast parallel algorithms exist if all parameters ($A$, $b$ and $c$) are nonnegative.
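To make the matching program concrete, the following sketch builds the incidence matrix $A_G$ and checks the constraints $A_G \cdot y \le 1$, $y \ge 0$ for a candidate vector $y$. The function names and the tiny example graph are assumptions; no LP solver is involved.

```python
def incidence_matrix(nodes, edges):
    """A_G[u][e] = 1 iff node u is an endpoint of edge e."""
    return [[1 if u in e else 0 for e in edges] for u in nodes]


def is_feasible(A, y):
    """Check A * y <= 1 (row by row) and y >= 0."""
    rows_ok = all(sum(a * yi for a, yi in zip(row, y)) <= 1 for row in A)
    return rows_ok and all(yi >= 0 for yi in y)


# Bipartite graph with V0 = {a, b}, V1 = {x, y}.
nodes = ["a", "b", "x", "y"]
edges = [("a", "x"), ("a", "y"), ("b", "y")]
A_G = incidence_matrix(nodes, edges)
```

A 0/1 vector `y` is feasible exactly when the chosen edges are node-disjoint, so the integral feasible points are the matchings of the graph.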

Linear Programming As A Decision Problem

The Linear Inequalities Problem (LIP):
- We are given an $m \times n$ matrix $A \in \mathbb{Z}^{m \times n}$ and a vector $b \in \mathbb{Z}^m$.
- The pair $(A, b)$ belongs to LIP iff the system $A \cdot x \ge b$ of linear inequalities has a solution.

The Linear Programming Problem (LPP):
- We receive the same input as in LIP, but obtain additionally a vector $c \in \mathbb{Z}^n$ and a rational number $\alpha$.
- The quadruple $(A, b, c, \alpha)$ belongs to LPP iff there is a vector $x$ that solves the linear system $A \cdot x \ge b$ and also satisfies $c^T \cdot x \le \alpha$.
- LPP is the language version of the Linear Programming Problem: minimize the linear function $c^T \cdot x$ subject to the linear constraints $A \cdot x \ge b$.

Linear Inequalities are P-Complete

Show the reduction $\text{M-CVP} \le_{\mathrm{par}} \mathrm{LIP}$.

Let $(S, w)$ be an input of M-CVP. Assign linear inequalities to any node $c$ of the monotone circuit $S$.
- $c$ is a source storing the $i$-th input bit: use the inequalities $x_c \ge 0$, $-x_c \ge 0$ if $w_i = 0$, and $x_c \ge 1$, $-x_c \ge -1$ if $w_i = 1$.
- $c \equiv a \wedge b$ is an AND-gate: use the inequalities $x_a - x_c \ge 0$, $x_b - x_c \ge 0$, $x_c - x_a - x_b \ge -1$, $x_c \ge 0$.
- $c \equiv a \vee b$ is an OR-gate: now choose $x_c - x_a \ge 0$, $x_c - x_b \ge 0$, $x_a + x_b - x_c \ge 0$, $-x_c \ge -1$.

Show by induction on the depth of a gate $c$ of $S$: $x_c$ = the value of gate $c$ for input $w$.

Finally, if $t$ is the output gate of the circuit: add the inequality $x_t \ge 1$ to the linear system and

$(S, w) \in \text{M-CVP} \Leftrightarrow S \text{ accepts } w \Leftrightarrow \text{the linear system is solvable}$.
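The induction step rests on the fact that, once $x_a$ and $x_b$ are 0/1 values, the four gate inequalities leave exactly one feasible value for $x_c$. The sketch below computes the feasible interval $[lo, hi]$ for $x_c$ directly from the inequalities; the function names are assumptions made for this illustration.

```python
def and_gate_range(xa, xb):
    """Feasible interval for x_c at an AND-gate c = a AND b."""
    lo = max(0, xa + xb - 1)  # x_c >= 0 and x_c - x_a - x_b >= -1
    hi = min(xa, xb)          # x_a - x_c >= 0 and x_b - x_c >= 0
    return lo, hi


def or_gate_range(xa, xb):
    """Feasible interval for x_c at an OR-gate c = a OR b."""
    lo = max(xa, xb)          # x_c - x_a >= 0 and x_c - x_b >= 0
    hi = min(1, xa + xb)      # x_a + x_b - x_c >= 0 and -x_c >= -1
    return lo, hi
```

For every 0/1 pair the interval collapses to a single point, namely the gate value, so the solution of the linear system is forced gate by gate.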

Linear Programming is P-Complete

Show the reduction $\mathrm{LIP} \le_{\mathrm{par}} \mathrm{LPP}$.

Choose the translation $(A, b) \mapsto (A, b, \vec{0}, 0)$. The existence of a solution $x$ with $A \cdot x \ge b$ and $\vec{0}^T \cdot x \le 0$ is equivalent to the existence of a solution of $A \cdot x \ge b$, since the second condition always holds.

Good parallel algorithms exist for positive linear programs:
- minimize $c^T \cdot x$ such that $A \cdot x \ge b$ and $x \ge 0$.
- The constraint matrix $A$, as well as the right hand side $b$ and the coefficient vector $c$ of the objective function, are non-negative.

Network Flow

An input for the flow problem consists of
- a directed graph $G = (V, E)$,
- two distinguished nodes, the source $s \in V$ and the sink $t \in V$,
- and a capacity function $c : E \to \mathbb{R}$.

A function $f : E \to \mathbb{R}$ is a flow if $0 \le f(e) \le c(e)$ for all edges $e \in E$ and if flow conservation

$$\sum_{u \in V,\, (u,v) \in E} f(u, v) = \sum_{u \in V,\, (v,u) \in E} f(v, u)$$

holds for any node $v \in V \setminus \{s, t\}$: the amount of flow entering $v$ equals the amount of flow leaving $v$. Flow conservation is not required for the source or the sink.

A flow $f$ is maximal if $f$ maximizes

$$|f| = \sum_{u \in V,\, (s,u) \in E} f(s, u) - \sum_{u \in V,\, (u,s) \in E} f(u, s),$$

i.e., if the net flow pumped out of $s$ is maximal.

The Flow Problem

The integrality property holds: if all capacities are integral, then there is a maximal flow which is integral.

As a consequence, the bipartite matching problem

determine a largest collection of node-disjoint edges for a given bipartite graph $B = (V_0 \cup V_1, E)$

is a special case of the flow problem:
- Add a source $s$ and a sink $t$ to $B$.
- Connect $s$ to all nodes in $V_0$ and all nodes in $V_1$ to $t$ by edges of capacity one.
- All edges of $B$ receive capacity one.

The Flow Problem FP: An input $(G, s, t, c)$ belongs to FP iff all capacities are integral and the value of a maximal flow is odd.

M2-CVP ≤par FP

Let $(S, w)$ be an input for M2-CVP and let $G = (V, E)$ be the directed graph of the monotone circuit $S$.

Assumptions on $S$:
- the output gate $t_G$ of $S$ is an OR-gate: if not, then replace $S$ by two copies of $S$ which are fed into an OR-gate.
- each input node of $G$ has fan-out one: otherwise insert additional copies.
- All other nodes have fan-out at most two.

The graph $G^*$ for FP:
- Add two new nodes $s$ and $t$ to $G$.
- The source $s$ is connected to all input gates of $G$ and the output gate $t_G$ of $S$ is connected to the sink $t$.
- Finally add edges from OR-gates to $s$ as well as edges from AND-gates to $t$.

$\mathrm{fanout}(z)$ counts the number of edges of $G$ leaving $z$. However, set $\mathrm{fanout}(t_G) = 1$ to account for the edge $(t_G, t)$.

Defining Capacities

- Assume that $G$ has exactly $n$ nodes.
- Determine a topological sort $nr : V \to \{1, \ldots, n\}$ of $G$.

If $z$ is an input node for variable $x_i$, then

$$c(s, z) = \begin{cases} 2^{n - nr(z)} & \text{if } w_i = 1, \\ 0 & \text{otherwise.} \end{cases}$$

If $z$ is an interior node with immediate predecessors $x$ and $y$, then $c(x, z) = 2^{n - nr(x)}$ and $c(y, z) = 2^{n - nr(y)}$.
- If $z$ is an AND-gate: $c(z, t) = 2^{n - nr(x)} + 2^{n - nr(y)} - \mathrm{fanout}(z) \cdot 2^{n - nr(z)}$.
- If $z$ is an OR-gate: $c(z, s) = 2^{n - nr(x)} + 2^{n - nr(y)} - \mathrm{fanout}(z) \cdot 2^{n - nr(z)}$.

Set $c(t_G, t) = 1$.
- Observe $c(t_G, s) = 2^{n - nr(x)} + 2^{n - nr(y)} - 1$, where $x$ and $y$ are the predecessors of $t_G$ (use $nr(t_G) = n$ and $\mathrm{fanout}(t_G) = 1$).
- If $t_G$ receives the maximal possible flow $2^{n - nr(x)} + 2^{n - nr(y)}$, then the edge $(t_G, t)$ can transport the remaining flow of one to $t$.
- All other nodes have links to $t$ of even capacity.

Capacities Of An AND-Gate

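[Figure: an AND-gate $v$ with predecessors $u$ and $w$. The incoming edges have capacities $2^{n - nr(u)}$ and $2^{n - nr(w)}$, the outgoing $G$-edges have total capacity $\mathrm{fanout}(v) \cdot 2^{n - nr(v)}$, and the edge from $v$ to the sink $t$ has capacity $2^{n - nr(u)} + 2^{n - nr(w)} - \mathrm{fanout}(v) \cdot 2^{n - nr(v)}$.]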

The Maximal Flow $f^*$

Claim: The translation $(S, w) \mapsto (G^*, s, t, c)$ establishes a reduction from M2-CVP to FP.

Define a flow $f^*$:

If $z$ is an input gate: set $f^*(s, z) = c(s, z)$.
- A flow of $2^{n - nr(z)}$ is pumped into $z$ whenever the input bit of gate $z$ is one, and a zero flow otherwise.
- There is one edge leaving $z$, which in the first case has to be filled to its capacity $2^{n - nr(z)}$.

If $z$ is an interior gate which evaluates to one: fill its $\mathrm{fanout}(z)$ many leaving $G$-edges to capacity and push the excess flow to $s$, respectively to $t$. If $t_G$ “fires”, then push flow one along edge $(t_G, t)$.

If $z$ is an interior gate which evaluates to zero: push zero flow across its leaving $G$-edges and any excess flow to $s$, respectively to $t$. If $t_G$ does not fire, then push flow zero along edge $(t_G, t)$.

The Cut

Partition the nodes of $G^*$ into the sets

$V_1 = \{s\} \cup \{ z \in G \mid \text{the gate } z \text{ evaluates to one} \}$,
$V_0 = \{t\} \cup \{ z \in G \mid \text{the gate } z \text{ evaluates to zero} \}$.

All $V_1 \to V_0$ edges of $G^*$ are filled to capacity:
- $G$-edges from a node in $V_1$ to a node in $V_0$ are filled to capacity by definition of $f^*$.
- Edges from AND-gates in $V_1$ to $t$ are also filled to capacity,
- whereas edges from OR-gates in $V_1$ to $s$ stay in $V_1$.
- The edge $(t_G, t)$ is also filled to capacity, provided $t_G$ fires.

All $V_0 \to V_1$ edges of $G^*$ carry flow zero:
- $G$-edges from a node in $V_0$ to a node in $V_1$ carry flow zero.
- Edges from AND-gates in $V_0$ to $t$ stay within $V_0$.
- An OR-gate in $V_0$ receives zero flow along its two $G$-edges and hence an OR-gate in $V_0$ has no flow to distribute.
- The edge $(t_G, t)$ stays inside $V_0$ if $t_G \in V_0$.

Proof of Optimality

$$|f^*| = \sum_{x \in V_1,\, y \in V_0,\, (x,y) \in E} c(x, y).$$

$|g| \le \sum_{x \in V_1,\, y \in V_0,\, (x,y) \in E} c(x, y)$ for any flow $g$, since $s \in V_1$ and $t \in V_0$.

Hence $f^*$ is a maximal flow. As a consequence:
- if $t_G$ fires, then $|f^*|$ is odd, since all edges carry even flow except for the edge $(t_G, t)$ which carries flow 1.
- If $t_G$ does not fire, then $|f^*|$ is even, since all edges carry even flow.

FP is P-Complete

$(S, w) \in \text{M2-CVP}$
$\Leftrightarrow$ $S$ is a monotone circuit with fan-out at most two and $S$ accepts $w$
$\Leftrightarrow$ the maximal flow in $(G^*, s, t, c)$ is odd
$\Leftrightarrow$ $(G^*, s, t, c) \in \mathrm{FP}$.

The translation $(S, w) \mapsto (G^*, s, t, c)$ establishes the reduction $\text{M2-CVP} \le_{\mathrm{par}} \mathrm{FP}$.

Is it necessary to use capacities of exponential size? Most likely yes: FP with capacities of polynomial size can be solved with a randomized NC-algorithm.

Lexicographically First Maximal Independent Set

Determine a maximal independent set for an undirected graph $G = (\{1, \ldots, n\}, E)$:

    I := ∅
    for v = 1 to n do
        if (v is not connected with a node from I) then
            I := I ∪ {v}

$I$ is a maximal independent set: any node $v$ either belongs to $I$ or is connected by an edge to a node in $I$.

In the lexicographically first maximal independent set problem (LFMIS) we are to determine if a node $v$ belongs to the independent set $I$ computed by the for-loop.
- LFMIS is trivial sequentially, since we only have to run the for-loop.
- We show that LFMIS is P-complete and hence parallelizing for-loops is non-trivial.
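The for-loop translates directly into a few lines of Python. The function name `lfmis` and the edge-list input format are assumptions of this sketch.

```python
def lfmis(n, edges):
    """Lexicographically first maximal independent set of ({1..n}, edges)."""
    adj = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    independent = set()
    for v in range(1, n + 1):
        # v joins I iff none of its neighbors was picked before.
        if not (adj[v] & independent):
            independent.add(v)
    return independent


# Undirected path 1 - 2 - 3 - 4.
path_result = lfmis(4, [(1, 2), (2, 3), (3, 4)])
```

The decision about node `v` depends on the decisions for all smaller neighbors, which is the sequential dependency that the P-completeness result says is hard to break.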

The All-Pairs-Longest-Path Problem

Solve the all-pairs-longest-path problem for a directed acyclic graph with $n$ nodes in time $O(\log_2^2 n)$ with $n^3$ processors.

Reduce the all-pairs-longest-path problem to matrix multiplication:
- Define a non-standard matrix multiplication for $n \times n$ matrices:

$$(X * Y)[u, v] = \max_{w \in V} \{ X[u, w] + Y[w, v] \}.$$

  max replaces addition and addition replaces multiplication.
- Let $A$ be the adjacency matrix of $G$ and assume that the diagonal of $A$ is zero, i.e., $G$ has no self-loops. All other entries for missing edges should be read as $-\infty$, so that only actual paths contribute to the maximum. Then

$$A^k[u, v] = \max_{w_1, \ldots, w_{k-1}} \{ A[u, w_1] + A[w_1, w_2] + \cdots + A[w_{k-1}, v] \}.$$

Since $G$ is acyclic, no node $w_i$ is traversed twice and $A^{n-1}[u, v]$ is the length of a longest path from $u$ to $v$.
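A sequential sketch of the max-plus product and of the repeated squaring follows. It assumes nodes numbered from 0, edges of length one, a zero diagonal and $-\infty$ for missing edges; the names `maxplus` and `longest_paths` are chosen for this example.

```python
NEG_INF = float("-inf")


def maxplus(X, Y):
    """(X * Y)[u][v] = max over w of X[u][w] + Y[w][v]."""
    n = len(X)
    return [[max(X[u][w] + Y[w][v] for w in range(n)) for v in range(n)]
            for u in range(n)]


def longest_paths(n, edges):
    """All-pairs longest path lengths in a DAG with nodes 0..n-1."""
    A = [[0 if u == v else NEG_INF for v in range(n)] for u in range(n)]
    for u, v in edges:
        A[u][v] = 1
    # ceil(log2 n) squarings reach a power of at least n - 1.
    for _ in range((n - 1).bit_length()):
        A = maxplus(A, A)
    return A


dag_edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
dist = longest_paths(4, dag_edges)
```

Because the diagonal is zero, higher powers never lose a path that was already found, so overshooting the exponent $n-1$ is harmless.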

Topological Sort

For the all-pairs-longest-path problem:
- To compute $A^{n-1}$: compute the matrix powers $A^2, \ldots, A^{2^i}, A^{2^{i+1}}, \ldots, A^{2^t}$ for $t = \lceil \log_2 n \rceil$.
- The all-pairs-longest-path problem can be solved in time $O(\log_2^2 n)$ with $n^3$ processors.

The topological sort of a directed acyclic graph $G$ is a bijection $\pi : V \to \{1, \ldots, n\}$ with $\pi(u) < \pi(v)$ for any edge $(u, v)$.
- Let $\mathrm{length}(v)$ be the length of a longest path ending in $v$.
- Replace each node $v$ by the pair $(\mathrm{length}(v), v)$ and sort all pairs.
- All in all, time $O(\log_2^2 n)$ with $n^3$ processors is sufficient to perform a topological sort.
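The sort-by-longest-path idea can be sketched as follows. To keep the example self-contained, $\mathrm{length}(v)$ is computed here by simple sequential edge relaxation rather than by the parallel matrix powers; that substitution, the 0-based node numbering and the name `topological_sort` are assumptions of this sketch.

```python
def topological_sort(n, edges):
    """Return pi with pi[u] < pi[v] for every edge (u, v) of a DAG on 0..n-1."""
    length = [0] * n  # length[v] = length of a longest path ending in v
    for _ in range(n - 1):  # n - 1 rounds suffice in a DAG
        for u, v in edges:
            length[v] = max(length[v], length[u] + 1)
    # Sort the pairs (length(v), v); the rank of a pair is the number of v.
    order = sorted((length[v], v) for v in range(n))
    pi = [0] * n
    for rank, (_, v) in enumerate(order, start=1):
        pi[v] = rank
    return pi


sort_edges = [(2, 0), (0, 1), (2, 1), (1, 3)]
pi = topological_sort(4, sort_edges)
```

The method is correct because every edge $(u, v)$ forces $\mathrm{length}(u) < \mathrm{length}(v)$, so $u$ precedes $v$ in the sorted order.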

LFMIS is P-Complete

Construct the reduction $\text{NOR-CVP} \le_{\mathrm{par}} \mathrm{LFMIS}$.

Let $(S, w)$ be an input for NOR-CVP and let $G$ be the directed graph for the NOR-circuit $S$.

Important observation: all gates with value 1 form an independent set. Why? A NOR-gate $\mathrm{NOR}(u, v)$ “fires” iff $u = v = 0$.

The idea:
- Modify the graph $G$, i.e., rename nodes, to obtain a graph $G^*$ such that the for-loop picks all nodes with value 1.
- Then ask whether the output gate $t$ of $S$ belongs to $I$.
- Done, since

$(S, w) \in \text{NOR-CVP} \Leftrightarrow t \text{ fires} \Leftrightarrow t \in I \Leftrightarrow (G^*, t) \in \mathrm{LFMIS}$,

provided the translation $(S, w) \mapsto (G^*, t)$ can be computed in poly-logarithmic time.

The Translation $(S, w) \mapsto (G^*, t)$

Determine a topological sort for the graph $G$ of $S$.

To determine the translation:
- Replace the name of a node by its rank within the topological sort.
- Thus the smallest number is given to a source which may however evaluate to zero. But the node with the smallest number will belong to the independent set!
- Invent a new node 0, connect 0 with all sources which evaluate to 0 and disregard edge directions. Let $G^*$ be the new (undirected) graph.

What does the for-loop do initially?
- The new node 0 is included into $I$ and automatically all sources with value 0 will be disregarded.

Does the reduction work?

Proof of Correctness

The for-loop determines the independent set $I = \{0\} \cup \{ z \mid z \text{ evaluates to one} \}$.

We show the claim by induction on the depth of a node. The claim is correct if $z = 0$ or if $z$ is a previous source of $G$.

If $z = \mathrm{NOR}(u, v)$ and $z$ fires:
- $u = v = 0$, since $z = 1$.
- By induction hypothesis $u \notin I$ and $v \notin I$.
- $z$ is included into $I$.

If $z = \mathrm{NOR}(u, v)$ and $z$ does not fire:
- $u = 1$ or $v = 1$, since $z = 0$.
- By induction hypothesis $u \in I$ or $v \in I$
- and hence $z$ will be left out.
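The whole reduction can be tried out on small circuits. In the sketch below the NOR-circuit is assumed to be given in topological order, so gate number $k$ (counted from 0) simply receives the name $k+1$; the encoding and the function names are assumptions made for the example, and the for-loop starts at the new node 0.

```python
def eval_nor(gates, w):
    """Values of all gates; a gate is ("IN", i) or ("NOR", a, b)."""
    val = []
    for op, *args in gates:
        if op == "IN":
            val.append(bool(w[args[0]]))
        else:
            val.append(not (val[args[0]] or val[args[1]]))
    return val


def nor_cvp_to_lfmis(gates, w):
    """Build the undirected graph G* on nodes 0..n; the target node is n."""
    n = len(gates)
    adj = {v: set() for v in range(n + 1)}

    def link(a, b):
        adj[a].add(b)
        adj[b].add(a)

    for k, (op, *args) in enumerate(gates):
        z = k + 1  # rank within the topological sort
        if op == "IN":
            if not w[args[0]]:
                link(0, z)  # new node 0 blocks sources with value 0
        else:
            link(args[0] + 1, z)
            link(args[1] + 1, z)
    return adj, n


def greedy_mis(adj):
    """The for-loop of LFMIS, run over the nodes in increasing order."""
    independent = set()
    for v in sorted(adj):
        if not (adj[v] & independent):
            independent.add(v)
    return independent


# x0 OR x1, written with NOR-gates only.
or_circuit = [("IN", 0), ("IN", 1), ("NOR", 0, 1), ("NOR", 2, 2)]
```

Comparing `greedy_mis` with `eval_nor` on all inputs of a small circuit is a quick sanity check of the claim $I = \{0\} \cup \{ z \mid z \text{ evaluates to one} \}$.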