coordination capacity - arxivindex terms—common randomness, cooperation capacity, co-ordination...

26
arXiv:0909.2408v2 [cs.IT] 26 May 2010 1 Coordination Capacity Paul Cuff, Member, IEEE, Haim Permuter, Member, IEEE, and Thomas M. Cover, Fellow, IEEE Abstract—We develop elements of a theory of cooperation and coordination in networks. Rather than considering a commu- nication network as a means of distributing information, or of reconstructing random processes at remote nodes, we ask what dependence can be established among the nodes given the communication constraints. Specifically, in a network with communication rates {Ri,j } between the nodes, we ask what is the set of all achievable joint distributions p(x1, ..., xm) of actions at the nodes of the network. Several networks are solved, including arbitrarily large cascade networks. Distributed cooperation can be the solution to many problems such as distributed games, distributed control, and establishing mutual information bounds on the influence of one part of a physical system on another. Index Terms—Common randomness, cooperation capacity, co- ordination capacity, network dependence, rate distortion, source coding, strong Markov lemma, task assignment, Wyner common information. I. I NTRODUCTION C OMMUNICATION is required to establish cooperative behavior. In a network of nodes where relevant informa- tion is known at only some nodes in the network, finding the minimum communication requirements to coordinate actions can be posed as a network source coding problem. This diverges from traditional source coding. Rather than focus on sending data from one point to another with a fidelity constraint, we consider the communication needed to establish coordination summarized by a joint probability distribution of behavior among all nodes in the network. A large variety of research addresses the challenge of collecting or moving information in networks. Network coding [1] seeks to efficiently move independent flows of informa- tion over shared communication links. On the other hand, distributed average consensus [2] involves collecting related information. Sensors in a network collectively compute the average of their measurements in a distributed fashion. The network topology and dynamics determine how many rounds of communication among neighbors are needed to converge to the average and how good the estimate will be at each node [3]. Similarly, in the gossiping Dons problem [4], each node starts with a unique piece of gossip, and one wishes to know how many exchanges of gossip are required to make everything known to everyone. Computing functions in a network is considered in [5], [6], and [7]. This work was partially supported by the National Science Foundation (NSF) through the grant CCF-0635318. Paul Cuff is with the Department of Electrical Engineering at Princeton University and can be reached at [email protected]. Haim Permuter is with the Department of Electrical Engineering at Ben Gurion University and can be reached at [email protected]. Thomas M. Cover is with the Department of Electrical Engineering at Stanford University and can be reached at [email protected]. Our work, introduced in [8], has several distinctions from the network communication examples mentioned. First, we keep the purpose for communication very general, which means sometimes we get away with saying very little about the information in the network while still achieving the desired coordination. We are concerned with the joint distribution of actions taken at the various nodes in the network, and the “information” that enters the network is nothing more than actions that are selected randomly by nature and assigned to certain nodes. Secondly, we consider quantization and rates of communication in the network, as opposed to only counting the number of exchanges. We find that we can gain efficiency by using vector quantization specifically tailored to the network topology. Figure 1 shows an example of a network with rate-limited communication links. In general, each node in the network performs an action where some of these actions are selected randomly by nature. In this example, the source set S indicates which actions are chosen by nature: Actions X 1 , X 2 , and X 3 are assigned randomly according to the joint distribution p 0 (x 1 ,x 2 ,x 3 ). Then, using the communication and common randomness that is available to all nodes, the actions Y 1 , Y 2 , and Y 3 outside of S are produced. We ask, which conditional distributions p(y 1 ,y 2 ,y 3 |x 1 ,x 2 ,x 3 ) are compatible with the network constraints. X 1 X 2 X 3 Y 1 Y 2 Y 3 R 1 R 2 R 3 R 4 R 5 R 6 R 7 p 0 (x 1 ,x 2 ,x 3 ) S P p0 (R)= {p(y 1 ,y 2 ,y 3 |x 1 ,x 2 ,x 3 )} Fig. 1. Coordination capacity. This network represents the general framework we consider. The nodes in this network have rate-limited links of communi- cation between them. Each node performs an action. The actions X 1 , X 2 , and X 3 in the source set S are chosen randomly by nature according to p 0 (x 1 ,x 2 ,x 3 ), while the actions Y 1 , Y 2 , and Y 3 are produced based on the communication and common randomness in the network. What joint distributions p 0 (x 1 ,x 2 ,x 3 )p(y 1 ,y 2 ,y 3 |x 1 ,x 2 ,x 3 ) can be achieved? A variety of applications are encompassed in this frame- work. This could be used to model sensors in a sensor network, sharing information in the standard sense, while also cooperating in their transmission of data. Similarly, a

Upload: others

Post on 30-Aug-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Coordination Capacity - arXivIndex Terms—Common randomness, cooperation capacity, co-ordination capacity, network dependence, rate distortion, source coding, strong Markov lemma,

arX

iv:0

909.

2408

v2 [

cs.IT

] 26

May

201

01

Coordination CapacityPaul Cuff,Member, IEEE, Haim Permuter,Member, IEEE, and Thomas M. Cover,Fellow, IEEE

Abstract—We develop elements of a theory of cooperation andcoordination in networks. Rather than considering a commu-nication network as a means of distributing information, orof reconstructing random processes at remote nodes, we askwhat dependence can be established among the nodes giventhe communication constraints. Specifically, in a network withcommunication rates Ri,j between the nodes, we ask whatis the set of all achievable joint distributions p(x1, ..., xm) ofactions at the nodes of the network. Several networks are solved,including arbitrarily large cascade networks.

Distributed cooperation can be the solution to many problemssuch as distributed games, distributed control, and establishingmutual information bounds on the influence of one part of aphysical system on another.

Index Terms—Common randomness, cooperation capacity, co-ordination capacity, network dependence, rate distortion, sourcecoding, strong Markov lemma, task assignment, Wyner commoninformation.

I. I NTRODUCTION

COMMUNICATION is required to establish cooperativebehavior. In a network of nodes where relevant informa-

tion is known at only some nodes in the network, finding theminimum communication requirements to coordinate actionscan be posed as a network source coding problem. Thisdiverges from traditional source coding. Rather than focuson sending data from one point to another with a fidelityconstraint, we consider the communication needed to establishcoordination summarized by a joint probability distribution ofbehavior among all nodes in the network.

A large variety of research addresses the challenge ofcollecting or moving information in networks. Network coding[1] seeks to efficiently move independent flows of informa-tion over shared communication links. On the other hand,distributed average consensus [2] involves collecting relatedinformation. Sensors in a network collectively compute theaverage of their measurements in a distributed fashion. Thenetwork topology and dynamics determine how many roundsof communication among neighbors are needed to converge tothe average and how good the estimate will be at each node [3].Similarly, in the gossiping Dons problem [4], each node startswith a unique piece of gossip, and one wishes to know howmany exchanges of gossip are required to make everythingknown to everyone. Computing functions in a network isconsidered in [5], [6], and [7].

This work was partially supported by the National Science Foundation(NSF) through the grant CCF-0635318.

Paul Cuff is with the Department of Electrical Engineering at PrincetonUniversity and can be reached at [email protected].

Haim Permuter is with the Department of Electrical Engineering at BenGurion University and can be reached at [email protected].

Thomas M. Cover is with the Department of Electrical Engineering atStanford University and can be reached at [email protected].

Our work, introduced in [8], has several distinctions fromthe network communication examples mentioned. First, wekeep the purpose for communication very general, whichmeans sometimes we get away with saying very little aboutthe information in the network while still achieving the desiredcoordination. We are concerned with the joint distributionofactions taken at the various nodes in the network, and the“information” that enters the network is nothing more thanactions that are selected randomly by nature and assignedto certain nodes. Secondly, we consider quantization andrates of communication in the network, as opposed to onlycounting the number of exchanges. We find that we can gainefficiency by using vector quantization specifically tailored tothe network topology.

Figure 1 shows an example of a network with rate-limitedcommunication links. In general, each node in the networkperforms an action where some of these actions are selectedrandomly by nature. In this example, the source setS indicateswhich actions are chosen by nature: ActionsX1, X2, andX3 are assigned randomly according to the joint distributionp0(x1, x2, x3). Then, using the communication and commonrandomness that is available to all nodes, the actionsY1, Y2,andY3 outside ofS are produced. We ask, which conditionaldistributionsp(y1, y2, y3|x1, x2, x3) are compatible with thenetwork constraints.

PSfrag replacements

X1

X2

X3

Y1

Y2

Y3

R1

R2

R3

R4

R5

R6

R7

p0(x1, x2, x3)

S

Pp0(R) = p(y1, y2, y3|x1, x2, x3)Fig. 1. Coordination capacity.This network represents the general frameworkwe consider. The nodes in this network have rate-limited links of communi-cation between them. Each node performs an action. The actions X1, X2,and X3 in the source setS are chosen randomly by nature according top0(x1, x2, x3), while the actionsY1, Y2, and Y3 are produced based onthe communication and common randomness in the network. What jointdistributionsp0(x1, x2, x3)p(y1, y2, y3|x1, x2, x3) can be achieved?

A variety of applications are encompassed in this frame-work. This could be used to model sensors in a sensornetwork, sharing information in the standard sense, whilealso cooperating in their transmission of data. Similarly,a

Page 2: Coordination Capacity - arXivIndex Terms—Common randomness, cooperation capacity, co-ordination capacity, network dependence, rate distortion, source coding, strong Markov lemma,

2

wireless ad hoc network can improve performance by coop-erating among nodes to allow beam-forming and interferencealignment. On the other hand, some settings do not involvemoving information in the usual sense. The nodes in thenetwork might comprise a distributed control system, wherethe behavior at each node must be related to the behavior atother nodes and the information coming into the system. Also,with computing technology continuing to move in the directionof parallel processing, even across large networks, a networkof computers must coherently perform computations whiledistributing the work load across the participating machines.Alternatively, the nodes might each be agents taking actionsin a multiplayer game.

Network communication can be revisited from the view-point of coordinated actions. Rate distortion theory becomesa special case. More generally, we ask how we can builddependence among the nodes. What is it good for? How dowe use it?

In this paper we deal with two fundamentally differentnotions of coordination which we distinguish asempiricalcoordinationand strong coordination, both associated with adesired joint distribution of actions. Empirical coordination isachieved if the joint type of the actions in the network—theempirical joint distribution—is close to the desired distribu-tion. Techniques from rate-distortion theory are relevanthere.Strong coordination instead deals with the joint probabilitydistribution of the actions. If the actions in the network aregenerated randomly so that a statistician cannot reliably distin-guish (as measured by total variation) between the constructedn-length sequence of actions and random samples from thedesired distribution, then strong coordination is achieved. Theapproach and proofs in this framework are related to thecommon information work by Wyner [9].

Before developing the mathematical formulation, considerthe first surprising observation.

No communication:Suppose we have three nodes choosingactions and no communication is allowed between the nodes(Fig. 2). We assume that common randomness is available toall the nodes. What is the set of joint distributionsp(x, y, z)that can be achieved at these isolated nodes? The answer turnsout to be any joint distribution whatsoever. The nodes canagree ahead of time on how they will behave in the presenceof common randomness (for example, a time stamp used as aseed for a random number generator). Any triple of randomvariables can be created as functions of common randomness.

This would seem to be the end of the problem, but theproblem changes dramatically when one of the nodes isspecified by nature to take on a certain value, as will be thecase in each of the scenarios following.

An eclectic collection of work, ranging from game theory toquantum information theory, has a number of close relation-ships to our approach and results. For example, Anantharamand Borkar [10] let two agents generate actions for a multi-player game based on correlated observations and commonrandomness and ask what kind of correlated actions areachievable. From a quantum mechanics perspective, Barnumet. al. [11] consider quantum coding of mixed quantum states.Kramer and Savari [12] look at communication for the purpose

PSfrag replacements

X = X(ω)

Y = Y (ω)

Z = Z(ω)

P = p(x, y, z)

Fig. 2. No communication.Any distribution p(x, y, z) can be achievedwithout communication between nodes. Define three random variablesX(·),Y (·), and Z(·) with the appropriate joint distribution, on the standardprobability space(Ω,B,P), and let the actions at the nodes beX(ω), Y (ω),andZ(ω), whereω ∈ Ω is the common randomness.

of “communicating probability distributions” in the sensethatthey care about reconstructing a sequence with the properempirical distribution of the sources rather than the sourcesthemselves. Weissman and Ordentlich [13] make statementsabout the empirical distributions of sub-blocks of source andreconstruction symbols in a rate-constrained setting. AndHanand Verdu [14] consider generating a random process via useof a memoryless channel, while Bennett et. al. [15] proposea “reverse Shannon theorem” stating the amount of noise freecommunication necessary to synthesize a memoryless channel.

In this work, we consider coordination of actions in twoand three node networks. These serve as building blocks forunderstanding larger networks. Some of the actions at thenodes are given by nature, and some are constructed by thenode itself. We describe the problem precisely in Section II.For some network settings we characterize the entire solution,but for others we give partial results including bounds andsolutions to special cases. The complete results are presentedin Section III and include a variant of the multiterminal sourcecoding problem. Among the partial results of Section IV, aconsistent trend in coordination strategies is identified,andthe golden ratio makes a surprise appearance.

In Section V we consider strong coordination. We charac-terize the communication requirements in a couple of settingsand discuss the role of common randomness. If commonrandomness is available to all nodes in the network thenempirical coordination and strong coordination seem to requireequivalent communication resources, consistent with the impli-cations of the “reverse Shannon theorem” [15]. Furthermore,we can quantify the amount of common randomness needed,treating common randomness itself as a scarce resource.

Rate-distortion regions are shown to be projections of thecoordination capacity region in Section VI. The proofs forall theorems are presented together in Section VII, where weintroduce a stronger Markov Lemma (Theorem 12) that maybe broadly useful in network information theory. In our closingremarks we show cases where this work can be extrapolated tolarge networks to identify the efficiency of different networktopologies.

II. EMPIRICAL COORDINATION

In this section and the next we address questions of thefollowing nature: If three different tasks are to be performedin a shared effort between three people, but one person is

Page 3: Coordination Capacity - arXivIndex Terms—Common randomness, cooperation capacity, co-ordination capacity, network dependence, rate distortion, source coding, strong Markov lemma,

3

randomly assigned his responsibility, how much must he tellthe others about his assignment in order to divide the labor?

A. Problem specifics

The definitions in this section pinpoint the concept ofempirical coordination. We will consider coordination in avariety of two and three node networks. The basic meaning ofempirical coordination is the same for each network—we usethe network communication to construct a sequence of actionsthat have an empirical joint distribution closely matchingadesired distribution. What’s different from one problem tothenext is the set of nodes whose actions are selected randomlyby nature and the communication limitations imposed by thenetwork topology.

Here we define the problem in the context of the cascadenetwork of Section III-C shown in Figure 3. These definitionshave obvious generalizations to other networks.

PSfrag replacements

Node X

i(Xn, ω)

Node Yyn(I, ω)

j(I, ω)

Node Zzn(J, ω)

Xn

I ∈ [2nR1 ] J ∈ [2nR2 ]

Y n Zn

∼ ∏ni=1 p0(xi)

Fig. 3. Cascade network.Node X is assigned actionsXn chosen by natureaccording top(xn) =

∏ni=1

p0(xi). A messageI in the set1, ...,2nR1is constructed based onXn and the common randomnessω and sent to NodeY, which constructs both an action sequenceY n and a messageJ in the set1, ...,2nR2. Finally, Node Z produces actionsZn based on the messageJ and the common randomnessω. This is summarized in Figure 4.PSfrag replacements

X ∼ p0(x)

Y Z

R1 R2Node X Node Y Node Z

Fig. 4. Shorthand notation for the cascade network of Figure3.

In the cascade network of Figure 3, nodeX has a sequenceof actionsX1, X2, ... specified randomly by nature. Note thata node is allowed to see all of its actions before it summarizesthem for the next node. Communication is used to give NodeY and NodeZ enough information to choose sequencesof actions that are empirically correlated withX1, X2, ...according to a desired joint distributionp0(x)p(y, z|x). Thecommunication travels in a cascade, first from NodeX toNodeY at rateR1 bits per action, and then from NodeY toNodeZ at rateR2 bits per action.

Specifically, a(2nR1 , 2nR2 , n) coordination codeis used asa protocol to coordinate the actions in the network for a blockof n time periods. The coordination code and the distributionof the random actionsXn induce a joint distribution onthe actions in the network. If the joint type of the actionsin the network can be made arbitrarily close to a desireddistribution p0(x)p(y, z|x) with high probability, as dictatedby the distribution induced by a(2nR1 , 2nR2 , n)) coordinationcode, thenp0(x)p(y, z|x) is achievable with the rate pair(R1, R2).

Definition 1 (Coordination code). A (2nR1 , 2nR2 , n) coordi-nation code for the cascade network of Figure 3 consists offour functions—an encoding function

i : Xn × Ω −→ 1, ..., 2nR1,a recoding function

j : 1, ..., 2nR1 × Ω −→ 1, ..., 2nR2,and two decoding functions

yn : 1, ..., 2nR1 × Ω −→ Yn,

zn : 1, ..., 2nR2 × Ω −→ Zn.

Definition 2 (Induced distribution). The induced distributionp(xn, yn, zn) is the resulting joint distribution of the actionsin the networkXn, Y n, and Zn when a (2nR1 , 2R2 , n)coordination code is used.

Specifically, the actionsXn are chosen by nature i.i.d. ac-cording top0(x) and independent of the common randomnessω. Thus, Xn and ω are jointly distributed according to aproduct distribution,

(Xn, ω) ∼ p(ω)

n∏

i=1

p0(xi).

The actionsY n andZn are functions ofXn andω given byimplementing the coordination code as

Y n = yn(i(Xn, ω), ω),

Zn = zn(j(i(Xn, ω), ω), ω).

Definition 3 (Joint type). The joint typePxn,yn,zn of a tupleof sequences(xn, yn, zn) is the empirical probability massfunction, given by

Pxn,yn,zn(x, y, z) ,1

n

n∑

i=1

1((xi, yi, zi) = (x, y, z)),

for all (x, y, z) ∈ X×Y×Z, where1 is the indicator function.

Definition 4 (Total variation). The total variation between twoprobability mass functions is half theL1 distance betweenthem, given by

‖p(x, y, z)− q(x, y, z)‖TV ,1

2

x,y,z

|p(x, y, z)− q(x, y, z)|.

Definition 5 (Achievability). A desired distributionp0(x)p(y, z|x) is achievable for empirical coordinationwith the rate pair (R1, R2) if there exists a sequence of(2nR1 , 2nR2 , n) coordination codes and a choice ofp(ω)such that the total variation between the joint type of theactions in the network and the desired distribution goes tozero in probability (under the induced distribution). Thatis,

‖PXn,Y n,Zn(x, y, z)− p0(x)p(y, z|x)‖TV −→ 0 in probability.

We now define the region of all rate-distribution pairs inDefinition 6 and slice it into rates for a given distributionin Definition 7 and distributions for a given set of rates inDefinition 8.

Page 4: Coordination Capacity - arXivIndex Terms—Common randomness, cooperation capacity, co-ordination capacity, network dependence, rate distortion, source coding, strong Markov lemma,

4

Definition 6 (Coordination capacity region). The coordinationcapacity regionCp0 for the source distributionp0(x) is the clo-sure of the set of rate-coordination tuples(R1, R2, p(y, z|x))that are achievable:

Cp0 , Cl

(R1, R2, p(y, z|x)) :p0(x)p(y, z|x) is achievable at rates(R1, R2)

.

Definition 7 (Rate-coordination region). The rate-coordination region Rp0 is a slice of the coordinationcapacity region corresponding to a fixed distributionp(y, z|x):

Rp0(p(y, z|x)) , (R1, R2) : (R1, R2, p(y, z|x)) ∈ Cp0.

Definition 8 (Coordination-rate region). The coordination-rate regionPp0 is a slice of the coordination capacity regioncorresponding to a tuple of rates(R1, R2):

Pp0(R1, R2) , p(y, z|x) : (R1, R2, p(y, z|x)) ∈ Cp0.

B. Preliminary observations

Lemma 1 (Convexity of coordination). Cp0 , Rp0 , andPp0 areall convex sets.

Proof: The coordination capacity regionCp0 is convexbecause time-sharing can be used to achieve any point on thechord between two achievable rate-coordination pairs. Simplycombine two sequences of coordination codes that achievethe two points in the coordination capacity region by usingone code and then the other in a proportionate manner toachieve any point on the chord. The definition of joint typein Definition 3 involves an average over time. Thus if onesequence is concatenated with another sequence, the resultingjoint type is a weighted average of the joint types of the twocomposing sequences. Rates of communication also combineaccording to the same weighted average. The rate of theresulting concatenated code is the weighted average of thetwo rates.

The rate-coordination regionRp0 is the intersection of thecoordination capacity regionCp0 with a hyperplane, which areboth convex sets. Likewise for the coordination-rate regionPp0 . Therefore,Rp0 andPp0 are both convex.

Common randomness used in conjunction with randomizedencoders and decoders can be a crucial ingredient for somecommunication settings, such as secure communication. Wesee, for example, in Section V that common randomness is avaluable resource for achieving strong coordination. However,it does not play a necessary role in achieving empiricalcoordination, as the following theorem shows.

Theorem 2 (Common randomness doesn’t help). Any desireddistribution p0(x)p(y, z|x) that is achievable for empiricalcoordination with the rate pair(R1, R2) can be achieved withΩ = ∅.

Proof: Suppose thatp0(x)p(y, z|x) is achievable forempirical coordination with the rate pair(R1, R2). Then thereexists a sequence of(2nR1 , 2nR2 , n) coordination codes forwhich the expected total variation between the joint type andp(x, y, z) goes to zero with respect to the induced distribution.

This follows from the bounded convergence theorem sincetotal variation is bounded by one. By iterated expectation,

E[

E[

‖PXn,Y n,Zn − p0(x)p(y, z|x)‖TV |ω]]

=

E ‖PXn,Y n,Zn − p0(x)p(y, z|x)‖TV .

Therefore, there exists a valueω∗ such that

E[

‖PXn,Y n,Zn − p0(x)p(y, z|x)‖TV |ω∗] ≤E ‖PXn,Y n,Zn − p0(x)p(y, z|x)‖TV .

Define a new coordination code that doesn’t depend onω and at the same time doesn’t increase the expected totalvariation:

i∗(xn) = i(xn, ω∗),

j∗(i) = j(i, ω∗),

yn∗(i) = Y n(i, ω∗),

zn∗(j) = Zn(j, ω∗).

This can be done for each(2nR1 , 2nR2 , n) coordination codefor n = 1, 2, ....

C. Generalization

We investigate empirical coordination in a variety of net-works in Sections III and IV. In each case, we explicitlyspecify the structure and implementation of the coordinationcodes, similar to Definitions 1 and 2, while all other definitionscarry over in a straightforward manner.

We use ashorthandnotation in order to illustrate eachnetwork setting with a simple and consistent figure. Figure 4shows the shorthand notation for the cascade network of Figure3. The random actions that are specified by nature are shownwith arrows pointing down toward the node (represented bya block). Actions constructed by the nodes themselves areshown coming out of the node with an arrow downward. Andarrows indicating communication from one node to anotherare labeled with the rate limits for the communication alongthose links.

III. C OORDINATION—COMPLETE RESULTS

In this section we present the coordination capacity regionsCp0 for empirical coordination in four network settings: anetwork of two nodes; a cascade network; an isolated nodenetwork; and a degraded source network. Proofs are left toSection VII. As a consequence of Theorem 2 we need notuse common randomness. Common randomness will only berequired when we try to generate desired distributions overentire n-blocks in Section V.

A. Two nodes

In the simplest network setting shown in Figure 5, weconsider two nodes, X and Y. The actionX is specified bynature according top0(x), and a message is sent at rateR tonode Y.

The (2nR, n) coordination codes consist of an encodingfunction

i : Xn −→ 1, ..., 2nR,

Page 5: Coordination Capacity - arXivIndex Terms—Common randomness, cooperation capacity, co-ordination capacity, network dependence, rate distortion, source coding, strong Markov lemma,

5

PSfrag replacements

X ∼ p0(x)

Y

RNode X Node Y

Fig. 5. Two nodes.The actionX is chosen by nature according top0(x).A message is sent to node Y at rateR. The coordination capacity regionCp0is the set of rate-coordination pairs where the rate is greater than the mutualinformation betweenX andY .

and a decoding function

yn : 1, ..., 2nR −→ Yn.

The actionsXn are chosen by nature i.i.d. according top0(x), and the actionsY n are functions ofXn given byimplementing the coordination code as

Y n = yn(i(Xn)).

Theorem 3 (Coordination capacity region). The coordinationcapacity regionCp0 for empirical coordination in the two-nodenetwork of Figure 5 is the set of rate-coordination pairs wherethe rate is greater than the mutual information betweenX andY . Thus,

Cp0 =

(R, p(y|x)) : R ≥ I(X ;Y )

.

Discussion:The coordination capacity region in this settingyields the rate-distortion result of Shannon [16]. Notice thatwith no communication (R = 0), only independent distribu-tions p0(x)p(y) are achievable, in contrast to the setting ofFigure 2, where none of the actions were specified by natureand all joint distributions were achievable.

Example 1 (Task assignment). Suppose there arek tasksnumbered1 throughk. One task is dealt randomly to node X,and node Y needs to choose one of the remaining tasks. Thiscoordinated behavior can be summarized by a distributionp. The actionX is given by nature according top0(x),the uniform distribution on the set1, ..., k. The desiredconditional distribution of the actionY is p(y|x), the uniformdistribution on the set of tasks different fromx. Therefore,the joint distributionp0(x)p(y|x) is the uniform distributionon pairs of differing tasks from the set1, ..., k. Figure 6illustrates a valid outcome fork larger than5.

PSfrag replacements

X YR

k ≥ 5 :

5 3

Fig. 6. Task assignment in the two-node network.A task from a set oftasks numbered1, ..., k is to be assigned uniquely to each of the nodes Xand Y in the two-node network setting. The task assignment for X is givenrandomly by nature. The communication rateR ≥ log(k/k−1) is necessaryand sufficient to allow Y to select a different task from X.

By applying Theorem 3, we find that the rate-coordinationregionRp0(p(y|x)) is given by

Rp0(p(y|x)) =

R : R ≥ log

(

k

k − 1

)

.

B. Isolated node

Now we derive the coordination capacity region for theisolated-node network of Figure 7. Node X has an actionchosen by nature according top0(x), and a message is sent atrateR from node X to node Y from which node Y producesan action. Node Z also produces an action but receives nocommunication. What is the set of all achievable coordinationdistributionsp(y, z|x)? At first it seems that the action at theisolated node Z must be independent ofY , but we will seeotherwise.

PSfrag replacements

X ∼ p0(x)

Y

Z

R

R2

Node X

Node Y

Node Z

Fig. 7. Isolated node.The actionX is chosen by nature according top0(x),and a message is sent at rateR from node X to node Y. Node Z receivesno communication. The coordination capacity regionCp0 is the set of rate-coordination pairs wherep(x, y, z) = p0(x)p(z)p(y|x, z) and the rateR isgreater than the conditional mutual information betweenX andY given Z.

We formalize this problem as follows. The(2nR, n) coor-dination codes consist of an encoding function

i : Xn −→ 1, ..., 2nR,a decoding function

yn : 1, ..., 2nR −→ Yn,

and a deterministic sequence

zn ∈ Zn.

The actionsXn are chosen by nature i.i.d. according top0(x), and the actionsY n are functions ofXn given byimplementing the coordination code as

Y n = yn(i(Xn)),

Zn = zn.

The coordination capacity region for this network is givenin the following theorem. As we previously alluded, noticethat the actionZ need not be independent ofY , even thoughthere is no communication to node Z.

Theorem 4 (Coordination capacity region). The coordinationcapacity regionCp0 for empirical coordination in the isolated-node network of Figure 7 is the set of rate-coordination pairs

Page 6: Coordination Capacity - arXivIndex Terms—Common randomness, cooperation capacity, co-ordination capacity, network dependence, rate distortion, source coding, strong Markov lemma,

6

whereZ is independent ofX and the rateR is greater thanthe conditional mutual information betweenX and Y givenZ. Thus,

Cp0 =

(R, p(z)p(y|x, z)) : R ≥ I(X ;Y |Z)

.

Discussion:How canY and Z have a dependence whenthere is no communication between them? This dependenceis possible because neitherY nor Z is chosen randomly bynature. In an extreme case, we could let node Y ignore theincoming message from node X and let the actions at node Yand node Z be equal,Y = Z. Thus we can immediately seethat with no communication the coordination region consistsof all distributions of the formp0(x)p(y, z).

If we were to use common randomnessω to generatethe action sequenceZn(ω), then Node Y, which also hasaccess to the common randomness, can use it to producecorrelated actions. This does not increase the coordinationcapacity region (see Theorem 2), but it provides an intuitiveunderstanding of howY and Z can be correlated. Withoutexplicit use of common randomness, we select a deterministsequencezn before-hand as part of our codebook and makeit known to all parties.

It is interesting to note that there is a tension between thecorrelation ofX and Y and the correlation ofY and Z.For instance, if the communication is used to make perfectcorrelation betweenX and Y then any potential correlationbetweenY andZ is forfeited.

Within the results for the more general cascade network inthe sequel (Section III-C) we will find that Theorem 4 is animmediate consequence of Theorem 5 by lettingR2 = 0.

Example 2 (Jointly Gaussian). Jointly Gaussian distributionsillustrate the tradeoff between the correlation ofX andY andthe correlation ofY andZ in the isolated-node network. Con-sider the portion of the coordination-rate regionPp0(R) thatconsists of jointly Gaussian distributions. IfX is distributedaccording toN(0, σ2

X), what set of covariance matrices canbe achieved at rateR?

So far we have discussed coordination for distributionfunctions with finite alphabets. Extending to infinite alphabetdistributions, achievability means that any finite quantizationof the joint distribution is achievable.

Using Theorem 4, we bound the correlations as follows:

R ≥ I(X ;Y |Z)

= I(X ;Y, Z)

=1

2log

|Kx||Kyz||KXY Z |

(a)=

1

2log

σ2x(σ

2yσ

2z − σ2

yz)

σ2xσ

2yσ

2z − σ2

xσ2yz − σ2

zσ2xy

(b)=

1

2log

1−(

σyz

σyσz

)2

1−(

σyz

σyσz

)2

−(

σxy

σxσy

)2

=1

2log

1− ρ2yz1− ρ2yz − ρ2xy

, (1)

whereρxy and ρyz are correlation coefficients. Equality (a)holds becauseσxz = 0 due to the independence betweenX

and Z. Obtain equality (b) by dividing the numerator anddenominator of the argument of thelog by σ2

xσ2yσ

2z .

Unfolding (1) yields a linear tradeoff between theρ2xy andρ2yz, given by

(1− 2−2R)−1ρ2xy + ρ2yz ≤ 1.

Thus all correlation coefficientsρxy and ρyz satisfying thisconstraint are achievable at rateR.

C. Cascade

We now give the coordination capacity region for thecascade of communication in Figure 8. In this setting, theaction at node X is chosen by nature. A message at rateR1 issent from node X to node Y, and subsequently a message atrateR2 is sent from node Y to node Z based on the messagereceived from node X. Nodes Y and Z produce actions basedon the messages they receive.

PSfrag replacements

X ∼ p0(x)

Y Z

R1 R2Node X Node Y Node Z

Fig. 8. Cascade.The actionX is chosen by nature according top0(x). Amessage is sent from node X to node Y at rateR1. Node Y produces an actionY and a message to send to node Z based on the message received from nodeX. Node Z then produces an actionZ based on the message received fromnode Y. The coordination capacity regionCp0 is the set of rate-coordinationtriples where the rateR1 is greater than the mutual information betweenXand (Y,Z), and the rateR2 is greater than the mutual information betweenX andZ.

The formal statement is as follows. The(2nR1 , 2nR2 , n)coordination codes consist of four functions—an encodingfunction

i : Xn −→ 1, ..., 2nR1,a recoding function

j : 1, ..., 2nR1 −→ 1, ..., 2nR2,and two decoding functions

yn : 1, ..., 2nR1 −→ Yn,

zn : 1, ..., 2nR2 −→ Zn.

The actionsXn are chosen by nature i.i.d. according top0(x), and the actionsY n andZn are functions ofXn givenby implementing the coordination code as

Y n = yn(i(Xn)),

Zn = zn(j(i(Xn))).

This network was considered by Yamamoto [17] in thecontext of rate-distortion theory. The same optimal encodingscheme from his work achieves the coordination capacityregion as well.

Theorem 5 (Coordination capacity region). The coordinationcapacity regionCp0 for empirical coordination in the cascade

Page 7: Coordination Capacity - arXivIndex Terms—Common randomness, cooperation capacity, co-ordination capacity, network dependence, rate distortion, source coding, strong Markov lemma,

7

network of Figure 8 is the set of rate-coordination tripleswhere the rateR1 is greater than the mutual informationbetweenX and (Y, Z), and the rateR2 is greater than themutual information betweenX andZ. Thus,

Cp0 =

(R1, R2, p(y, z|x)) :R1 ≥ I(X ;Y, Z),R2 ≥ I(X ;Z).

.

Discussion:The coordination capacity regionCp0 meets thecut-set bound. The trick to achieving this bound is to firstspecifyZ and then specifyY conditioned onZ.

Example 3 (Task assignment). Consider a task assignmentsetting where three tasks are to be assigned without dupli-cation to the three nodes X, Y, and Z, and the assignmentfor node X is chosen uniformly at random by nature. A dis-tribution capturing this coordination behavior is the uniformdistribution over the six permutations of task assignments. Letp0(x) be the uniform distribution on the set1, 2, 3, and letp(y, z|x) give equal probability to both of the assignments toY and Z that produce different tasks at the three nodes. Figure9 illustrates a valid outcome of the task assignments.

PSfrag replacements

X Y ZR1 R2

2 1 3

Fig. 9. Task assignment in the cascade network.Three tasks, numbered1, 2,and3, are distributed among three nodes X, Y, and Z in the cascade networksetting. The task assignment for X is given randomly by nature. The ratesR1 ≥ log 3 andR2 ≥ log 3− log 2 are required to allow Y and Z to choosedifferent tasks from X and from each other.

According to Theorem 5, the rate-coordination regionRp0(p(y, z|x)) is given by

Rp0(p(y, z|x)) =

(R1, R2) :R1 ≥ log 3,R2 ≥ log 3− log 2.

.

D. Degraded source

Here we present the coordination capacity region for thedegraded-source network shown in Figure 10. Nodes X and Yeach have an action specified by nature, andY is a functionof X . That is,p0(x, y) = p0(x)1(y = f0(x)), where1(·) isthe indicator function. Node X sends a message to node Y atrateR1 and a message to node Z at rateR2. Node Y, uponreceiving the message from node X, sends a message at rateR3 to node Z. Node Z produces an action based on the twomessages it receives.

The(2nR1 , 2nR2 , 2nR3 , n) coordination codes for Figure 10consist of four functions—two encoding functions

i : Xn −→ 1, ..., 2nR1,j : Xn −→ 1, ..., 2nR2,

a recoding function

k : 1, ..., 2nR1 × Yn −→ 1, ..., 2nR3,and a decoding function

zn : 1, ..., 2nR2 × 1, ..., 2nR3 −→ Yn.

PSfrag replacements

X ∼ p0(x)

Y = f0(X)

Z

R1R3

R2Node X

Node Y

Node Z

Fig. 10. Degraded source:The actionX is specified by nature according top0(x), and the actionY is a functionf0 of X. A message is sent from nodeX to node Y at rateR1, after which node Y constructs a message for nodeZ at rateR3 based on the incoming message from node X and the actionY .Node X also sends a message directly to node Z at rateR2. The coordinationcapacity regionCp0 is given in Theorem 6.

The actionsXn andY n are chosen by nature i.i.d. accordingto p0(x, y), having the property thatYi = f0(Xi) for all i,and the actionsZn are a function ofXn and Y n given byimplementing the coordination code as

Y n = yn(j(Xn), k(i(Xn), Y n)).

Others have investigated source coding networks in the rate-distortion context where two sources are encoded at separatenodes to be reconstructed at a third node. Kaspi and Berger[18] consider a variety of cases where the encoders sharesome information. Also, Barros and Servetto [19] articulatethe compress and bin strategy for more general bi-directionalexchanges of information among the encoders. While fallingunder the same general compression strategy, the degradedsource network is a special case where optimality can beestablished, yielding a characterization of the coordinationcapacity region.

Theorem 6 (Coordination capacity region). The coordina-tion capacity regionCp0 for empirical coordination in thedegraded-source network of Figure 10 is given by

Cp0 =

(R1, R2, R3, p(z|x, y)) :

∃p(u|x, y, z) such that|U| ≤ |X ||Z|+ 2,R1 ≥ I(X ;U |Y ),R2 ≥ I(X ;Z|U),R3 ≥ I(X ;U).

.

IV. COORDINATION—PARTIAL RESULTS

We have given the coordination capacity region for severalmultinode networks. Those results are complete. We nowinvestigate networks for which we have only partial results.

In this section we present bounds on the coordinationcapacity regionsCp0 for empirical coordination in two net-work settings of three nodes—the broadcast network and thecascade-multiterminal network. A communication techniquethat we find useful in both settings, also used in the degraded-source network of Section III, is to use a portion of thecommunication to send identical messages to all nodes inthe network. The common message serves to correlate thecodebooks used on different communication links and canresult in reduced rates in the network.

Page 8: Coordination Capacity - arXivIndex Terms—Common randomness, cooperation capacity, co-ordination capacity, network dependence, rate distortion, source coding, strong Markov lemma,

8

Proofs are left to Section VII. Again, as a consequenceof Theorem 2 we need not use common randomness in thissection.

A. Broadcast

We now give bounds on the coordination capacity region forthe broadcast network of Figure 11. In this setting, node X hasan action specified by nature according top0(x) and sends onemessage to node Y at rateR1 and a separate message to nodeZ at rateR2. Nodes Y and Z each produce an action basedon the message they receive.

PSfrag replacements

X ∼ p0(x)

Y

Z

R1

R2

Node X

Node Y

Node Z

Fig. 11. Broadcast.The actionX is chosen by nature according top0(x). Amessage is sent from node X to node Y at rateR1, and a separate message issent from node X to node Z at rateR2. Nodes Y and Z produce actions basedon the messages they receive. Bounds on the coordination capacity regionCp0are given in Theorem 7.

Node X serves as the controller for the network. Natureassigns an action to node X, which then tells node Y andnode Z which actions to take.

The (2nR1 , 2nR2 , n) coordination codes consist of two en-coding functions

i : Xn −→ 1, ..., 2nR1,j : Xn −→ 1, ..., 2nR2,

and two decoding functions

yn : 1, ..., 2nR1 −→ Yn.

zn : 1, ..., 2nR2 −→ Zn.

The actionsXn are chosen by nature i.i.d. according top0(x), and the actionsY n andZn are functions ofXn givenby implementing the coordination code as

Y n = yn(i(Xn)).

Zn = zn(j(Xn)).

From a rate-distortion point of view, the broadcast networkis not a likely candidate for consideration. The problem sep-arates into two non-interfering rate-distortion problems, andthe relationship between the sequencesY n andZn is ignored(unless the decoders communicate as in [20]). However, arelated scenario, the problem of multiple descriptions [21],where the combination of two messagesI andJ are used tomake a third estimate of the sourceX , demands considerationof the relationship between the two messages. In fact, the

communication scheme for the multiple descriptions problempresented by Zhang and Berger [22] coincides with our innerbound for the coordination capacity region in the broadcastnetwork.

The set of rate-coordination tuplesCp0,in is an inner boundon the coordination capacity region, given by

Cp0,in ,

(R1, R2, p(y, z|x)) : ∃p(u|x, y, z) such thatR1 ≥ I(X ;U, Y ),R2 ≥ I(X ;U,Z),R1 +R2 ≥ I(X ;U, Y ) + I(X ;U,Z) + I(Y ;Z|X,U).

.

The set of rate-coordination tuplesCp0,out is an outer boundon the coordination capacity region, given by

Cp0,out ,

(R1, R2, p(y, z|x)) :R1 ≥ I(X ;Y ),R2 ≥ I(X ;Z),R1 +R2 ≥ I(X ;Y, Z).

.

Also, defineRp0,in(p(y, z|x)) and Rp0,out(p(y, z|x)) to bethe sets of rate pairs inCp0,in and Cp0,out corresponding tothe desired distributionp(y, z|x).Theorem 7 (Coordination capacity region bounds). The co-ordination capacity regionCp0 for empirical coordination inthe broadcast network of Figure 11 is bounded by

Cp0,in ⊂ Cp0 ⊂ Cp0,out.

Discussion:The regionsCp0,in and Cp0,out are convex. Atime-sharing random variable can be lumped into the auxil-iary random variableU in the definition ofCp0,in to showconvexity.

The inner boundCp0,in is achieved by first sending acommon message, represented byU , to both receivers and thenprivate messages to each. The common message effectivelycorrelates the two codebooks to reduce the required rates forspecifying the actionsY n and Zn. The sum rate takes apenalty ofI(Y ;Z|X,U) in order to assure thatY andZ arecoordinated with each other as well as withX .

The outer boundCp0,out is a consequence of applying thetwo-node result of Theorem 3 in three different ways, oncefor each receiver, and once for the pair of receivers with fullcooperation.

For many distributions, the bounds in Theorem 7 are tightand the rate-coordination regionRp0 = Rp0,in = Rp0,out.This is true for all distributions whereX , Y , and Z forma Markov chain in any order. It is also true for distributionswhereY andZ are independent or whereX is independentpairwise with bothY andZ. For each of these cases, TableI shows the choice of auxiliary random variableU in thedefinition of Rp0,in that yieldsRp0,in = Rp0,out. In case5, the regionRp0,in is optimized by time-sharing betweenU = Y andU = Z.

Notice that if R2 = 0 in the broadcast network we findourselves in the isolated node setting of Section III-B. Con-sider a particular distributionp0(x)p(z)p(y|x, z) that couldbe achieved in the isolated node network. In the setting of the

Page 9: Coordination Capacity - arXivIndex Terms—Common randomness, cooperation capacity, co-ordination capacity, network dependence, rate distortion, source coding, strong Markov lemma,

9

TABLE IKNOWN CAPACITY REGION(CASES WHERERp0,in = Rp0,out).

Condition AuxiliaryCase 1: Y −X − Z U = ∅Case 2: X − Y − Z U = ZCase 3: X − Z − Y U = YCase 4: Y ⊥ Z U = ∅Case 5: X ⊥ Y andX ⊥ Z U = Y, U = Z

broadcast network, it might seem that the message from nodeX to node Z is useless for achievingp0(x)p(z)p(y|x, z), sinceX andZ are independent. However, this is not the case. Forsome desired distributionsp0(x)p(z)p(y|x, z), a positive rateR2 in the broadcast network actually helps reduce the requiredrateR1.

To highlight a specific case where a message to node Zis useful even thoughZ is independent ofX in the desireddistribution, consider the following. Letp0(x)p(z)p(y|x, z) bethe uniform distribution over all combinations of binaryx,y, and z with even parity. The variablesX , Y , and Z areeach Bernoulli-half and pairwise independent, andX ⊕ Y ⊕Z = 0, where⊕ is addition modulo two. This distributionsatisfies both case 4 and case 5 from Table I, so we knowthat Rp0

= Rp0,out. Therefore, the rate-coordination regionRp0

(p(y, z|x)) is characterized by a single inequality,

Rp0(p(y, z|x)) = (R1, R2) ∈ R

2+ : R1 +R2 ≥ 1 bit.

The minimum rateR1 needed when no message is sent fromnode X to node Z is 1 bit, while the required rate in generalis 1−R2 bits.

The following task assignment problem has practical im-portance.

Example 4 (Task assignment). Consider a task assignmentsetting similar to Example 3, where three tasks are to beassigned without duplication to the three nodes X, Y, and Z,and the assignment for node X is chosen uniformly at randomby nature. A distribution capturing this coordination behavioris the uniform distribution over the six permutations of taskassignments. Letp0(x) be the uniform distribution on the set0, 1, 2, and let p(y, z|x) give equal probability to both ofthe assignments to Y and Z that produce different tasks at thethree nodes. Figure 12 illustrates a valid outcome of the taskassignments.

PSfrag replacements

X

Y

Z

R1

R2

1

0

2

Fig. 12. Task assignment in the broadcast network.Three tasks, numbered0, 1, and2, are distributed among three nodes X, Y, and Z in the broadcastnetwork setting. The task assignment for X is given randomlyby nature. WhatratesR1 andR2 are necessary to allow Y and Z to choose different tasksfrom X and each other?

We can explore the achievable rate regionRp0(p(y, z|x))by using the bounds in Theorem 7. In this process, we find

rates as low aslog 3 − logφ to be sufficient on each link,whereφ =

√5+12 is the golden ratio.

PSfrag replacementsA

B

C

D

R1

R2

1

1

Fig. 13. Rate region bounds for task assignment.PointsA, B, C, andD areachievable rates for the task assignment problem in the broadcast network.The solid line indicates the outer boundRp0,out(p(y, z|x)), and the dashedline indicates a subset of the inner boundRp0,in(p(y, z|x)). PointsA andBare achieved by lettingU = ∅. PointC usesU as time-sharing, independentof X. PointD usesU to describeX partially to each of the nodes Y and Z.

First consider the points in the inner boundRp0,in(p(y, z|x)) that are achieved without the use ofthe auxiliary variableU . This consists of a pentagonal regionof rate pairs. The extreme pointA = (log(3/2), log 3), shownin Figure 13, corresponds to the a simple communicationapproach. First node X coordinates with node Y. Theorem 3for the two-node network declares the minimum rate neededto be R1 = log(3/2). After actionY has been established,node X specifies actionZ in it’s entire detail using the rateR2 = log 3. A complementary scheme achieves the extremepoint B in Figure 13. The sum rate achieved by these pointsis R1 +R2 = 2(log2 3− 1/2) bits.

We can explore more of the inner boundRp0,in(p(y, z|x))by adding the element of time-sharing. That is, use an auxiliaryvariableU that is independent ofX . As long as we can assigntasks in the network so thatX , Y , andZ are each unique, thenthere will be a method of using time-sharing that will achievethe desired uniform distribution over unique task assignmentsp. For example, devise six task assignment schemes from theone successful scheme by mapping the tasks onto the sixdifferent permutations of0, 1, 2. By time-sharing equallyamong these six schemes, we achieve the desired distribution.

With the idea of time-sharing in mind, we achieve a bettersum rate by restricting the domain ofY to 0, 1 and Z to0, 2 and letting them be functions ofX in the followingway:

Y =

1, X 6= 1,0, X = 1,

(2)

Z =

2, X 6= 2,0, X = 2.

(3)

We can say thatY takes on a default value of1, andZ takeson a default value of2. Node X just tells nodes Y and Z whenthey need to get out of the way, in which case they switch totask0. To achieve this we only needR1 ≥ H(Y ) = log3 −2/3bits andR2 ≥ H(Z) = log2 3−2/3 bits, represented by pointC in Figure 13.

Page 10: Coordination Capacity - arXivIndex Terms—Common randomness, cooperation capacity, co-ordination capacity, network dependence, rate distortion, source coding, strong Markov lemma,

10

Finally, we achieve an even smaller sum rate in the innerboundRp0,in(p(y, z|x)) by using a more interesting choice ofU in addition to time-sharing.1 LetU ∈ 0, 1, 2 be correlatedwith X in such a way that they are equal more often than onethird of the time. Now restrict the domains ofY andZ basedonU . The actionsY andZ are functions ofX andU definedas follows:

Y =

U + 1 mod 3, X 6= U + 1 mod 3,U, X = U + 1 mod 3,

(4)

Z =

U − 1 mod 3, X 6= U − 1 mod 3,U, X = U − 1 mod 3.

(5)

This corresponds to sending a compressed description ofX ,represented byU , and then assigning default values toY andZ centered aroundU . The actionsY andZ sit on both sidesof U and only move when X tells them to get out of the way.The description rates needed for this method are

R1 ≥ I(X ;U) + I(X ;Y |U)

= I(X ;U) +H(Y |U).

R2 ≥ I(X ;U) + I(X ;Z|U)

= I(X ;U) +H(Z|U). (6)

Using a symmetric conditional distribution fromX to U ,calculus provides the following parameters:

P (U = u|X = x) =

1√5, u = x,

1φ√5, u 6= x,

(7)

(8)

whereφ =√5+12 is the golden ratio. This level of compression

results in a very low rate of description,I(X ;U) ≈ 0.04 bits,for sendingU to each of the nodes Y and Z.

The description rates needed for this method are as follows,and are represented by Point D in Figure 13:

R1 ≥ I(X ;U) +H(Y |U)

= log 3− 1

2log 5− 2

φ√5logφ+H(Y |U)

= log 3− 1

2log 5− 2

φ√5logφ+H

(

1

φ√5

)

= log 3− 2

φ√5logφ+

1

φ√5logφ− φ√

5log φ

= log 3−(

φ+1

φ

)

1√5logφ

= log 3− logφ,

R2 ≥ log 3− logφ, (9)

whereH is the binary entropy function. The above calculationis assisted by observing thatφ = 1

φ + 1 andφ+ 1φ =

√5.

B. Cascade multiterminal

We now give bounds on the coordination capacity region forthe cascade-multiterminal network of Figure 14. In this setting,node X and node Y each have an action specified by nature

1Time-sharing is also lumped intoU , but we ignore that here to simplifythe explanation.

according to the joint distributionp0(x, y). Node X sends amessage at rateR1 to node Y. Based on its own actionY andthe incoming message aboutX , node Y sends a message tonode Z at rateR2. Finally, node Z produces an action basedon the message from node Y.

PSfrag replacements

X ∼ p0(x, y) Y ∼ p0(x, y)

Z

R1 R2Node X Node Y Node Z

Fig. 14. Cascade multiterminal.The actionsX andY are chosen by natureaccording top0(x, y). A message is sent from node X to node Y at rateR1.Node Y then constructs a message for node Z based on the received messagefrom node X and its own action. Node Z produces an action basedon themessage it receives from node Y. Bounds on the coordination capacity regionCp0 are given in Theorem 8.

The (2nR1 , 2nR2 , n) coordination codes consist of an en-coding function

i : Xn −→ 1, ..., 2nR1,a recoding function

j : 1, ..., 2nR1 × Yn −→ 1, ..., 2nR2,and a decoding function

zn : 1, ..., 2nR2 −→ Zn.

The actionsXn andY n are chosen by nature i.i.d. accordingto p0(x, y), and the actionsZn are functions ofXn andY n

given by implementing the coordination code as

Zn = zn(j(i(Xn), Y n)).

Node Y is playing two roles in this network. It acts partiallyas a relay to send on the message from node X to node Z, whileat the same time sending a message about its own actions tonode Z. This situation applies to a variety of source codingscenarios. Nodes X and Y might both be sensors in a sensornetwork, or node Y can be thought of as a relay for connectingnode X to node Z, with side informationY .

This network is similar to multiterminal source codingconsidered by Berger and Tung [23] in that two sourcesof information are encoded in a distributed fashion. In fact,the expansion to accommodate cooperative encoders [18] canbe thought of as a generalization of our network. However,previous work along these lines is missing one key aspect ofefficiency, which is to partially relay the encoded informationwithout changing it.

Vasudevan, Tian, and Diggavi [24] looked at a similarcascade communication system with a relay. In their setting,the relay’s informationY is a degraded version of the de-coder’s side information, and the decoder is only interestedin recoveringX . Because the relay’s observations contain noadditional information for the decoder, the relay does not facethe dilemma of mixing in some of the side information intoits outgoing message. In our cascade multiterminal network,the decoder does not have side information. Thus, the relay is

Page 11: Coordination Capacity - arXivIndex Terms—Common randomness, cooperation capacity, co-ordination capacity, network dependence, rate distortion, source coding, strong Markov lemma,

11

faced with coalescing the two pieces of informationX andY into a single message. Other research involving similarnetwork settings can be found in [25], where Gu and Effrosconsider a more general network but with the restriction thatthe actionY is a function of the actionX , and [26], whereBakshi et. al. identify the optimal rate region for losslessencoding of independent sources in a longer cascade (line)network.

The set of rate-coordination tuplesCp0,in is an inner boundon the coordination capacity region, given by

Cp0,in ,

(R1, R2, p(z|x, y)) :∃p(u, v|x, y, z) such thatp(x, y, z, u, v) = p0(x, y)p(u, v|x)p(z|y, u, v)R1 ≥ I(X ;U, V |Y ),R2 ≥ I(X ;U) + I(Y, V ;Z|U).

.

The set of rate-coordination tuplesCp0,out is an outer boundon the coordination capacity region, given by

Cp0,out ,

(R1, R2, p(z|x, y)) :∃p(u|x, y, z) such thatp(x, y, z, u) = p0(x, y)p(u|x)p(z|y, u)|U| ≤ |X ||Y||Z|,R1 ≥ I(X ;U |Y ),R2 ≥ I(X,Y ;Z).

.

Also, defineRp0,in(p(z|x, y)) and Rp0,out(p(z|x, y)) to bethe sets of rate pairs inCp0,in and Cp0,out corresponding tothe desired distributionp(z|x, y).Theorem 8 (Coordination capacity region bounds). The coor-dination capacity regionCp0 for empirical coordination in thecascade multiterminal network of Figure 14 is bounded by

Cp0,in ⊂ Cp0 ⊂ Cp0,out.

Discussion:The regionsCp0,in and Cp0,out are convex. Atime-sharing random variable can be lumped into the auxil-iary random variableU in the definition ofCp0,in to showconvexity.

The inner boundCp0,in is achieved by dividing the messagefrom node X into two parts. One part, represented byU , issent to all nodes, relayed by node Y to node Z. The otherpart, represented byV , is sent only to node Y. Then node YrecompressesV along withY .

The outer boundCp0,out is a combination of the Wyner-Ziv [27] bound for source coding with side information atthe decoder, obtained by letting node Y and node Z fullycooperate, and the two-node bound of Theorem 3, obtainedby letting node X and node Y fully cooperate.

For some distributions, the bounds in Theorem 8 are tightand the rate-coordination regionRp0 = Rp0,in = Rp0,out.This is true for all distributions whereX − Y − Z form aMorkov chain orY − X − Z form a Markov chain. In thefirst case, whereX − Y − Z form a Morkov chain, choosingU = V = ∅ in the definition ofCp0,in reduces the regionto all rate pairs such thatR2 ≥ I(Y ;Z), which meets theouter boundCp0,out. In the second case, whereY − X − Z

form a Morkov chain, choosingU = Z andV = ∅ reducesthe region to all rate pairs such thatR1 ≥ I(X ;Z|Y ) andR2 ≥ I(X ;Z), which meets the outer bound. Therefore, wefind as special cases that the bounds in Theorem 8 are tightif X is a function ofY , if Y is a function ofX , or if thereconstructionZ is a function ofX andY [28].

Table II shows choices ofU andV from Rp0,in that yieldRp0,in = Rp0,out in each of the above cases. In case 3,V isselected to minimizeR1 along the lines of [29].

TABLE IIKNOWN CAPACITY REGION(CASES WHERERp0,in = Rp0,out).

Condition AuxiliaryCase 1: X − Y − Z U = ∅, V = ∅Case 2: Y −X − Z U = Z, V = ∅Case 3: Z = f(X, Y ) U = ∅

Example 5 (Task assignment). Consider again a task as-signment setting similar to Example 3, where three tasks areto be assigned without duplication to the three nodes X, Y,and Z, and the assignments for nodes X and Y are chosenuniformly at random by nature among all pairs of tasks whereX 6= Y . A distribution capturing this coordination behavioris the uniform distribution over the six permutations of taskassignments. Letp0(x, y) be the distributions obtained bysamplingX andY uniformly at random from the set1, 2, 3without replacement, and letp(z|x, y) be the degeneratedistribution whereZ is the remaining unassigned task in1, 2, 3. Figure 15 illustrates a valid outcome of the taskassignments.

PSfrag replacements

X Y ZR1 R2

3 1 2

Fig. 15. Task assignment in the cascade multiterminal network.Three tasks,numbered1, 2, and3, are distributed among three nodes X, Y, and Z in thecascade multiterminal network setting. The task assignments for X and Y aregiven randomly by nature but different from each other. WhatratesR1 andR2 are necessary to allow Z to choose a different task from both Xand Y?

Task assignment in the cascade multiterminal networkamounts to computing a functionZ(X,Y ), and the boundsin Theorem 8 are tight in such cases. The rate-coordinationregionRp0(p(z|x, y)) is given by

Rp0(p(z|x, y)) =

(R1, R2) :R1 ≥ log 2,R2 ≥ log 3.

.

This is achieved by lettingU = ∅ andV = X in the definitionof Cp0,in. To show that this region meets the outer boundCp0,out, make the observation thatI(X ;U |Y ) ≥ I(X ;Z|Y )in relation to the bound onR1, sinceX − (Y, U)− Z formsa Markov chain.

V. STRONG COORDINATION

So far we have examined coordination where the goal is togenerateY n through communication based onXn so that thejoint type PXn,Y n(x, y) is equal to the desired distribution

Page 12: Coordination Capacity - arXivIndex Terms—Common randomness, cooperation capacity, co-ordination capacity, network dependence, rate distortion, source coding, strong Markov lemma,

12

p0(x)p(y|x). This goal relates to the joint behavior at thenodes in the network averaged over time. There is no imposedrequirement thatY n be random, and the order of the sequenceof the (Xi, Yi) pairs doesn’t matter.

How different does the problem become if we actuallywant the actions at the various nodes in the network tobe random according to a desired joint distribution? In thisvein, we turn to a stronger notion of cooperation whichwe call strong coordination. We require that the induceddistribution over the entire coding blockp(xn, yn) (inducedby the coordination code) be close to the target distributionp(xn, yn) =

∏ni=1 p0(xi)p(yi|xi)—so close that a statistician

could not tell the difference, based on(Xn, Y n), of whether(Xn, Y n) ∼ p(xn, yn) or (Xn, Y n) ∼ p(xn, yn).

Clearly this new strong coordination objective is moredemanding than empirical coordination—after all, if one wereto generate random actions, i.i.d. in time, according to theappropriate joint distribution, then the empirical distributionwould also follow suit. But in some settings it is crucialfor the coordinated behavior to be random. For example,in situations where an adversary is involved, it might beimportant to maintain a mystery in the sequence of actionsthat are generated in the network.

Strong coordination has applications in cooperative gametheory, discussed in [30]. Suppose a team shares the samepayoff in a repeated game setting. An opponent who tries toanticipate and exploit patterns in the team’s combined actionswill be adequately combatted by strong coordination accordingto a well-chosen joint distribution.

A. Problem specifics

Most of the definitions relating to empirical coordinationin Section II-A carry over to strong coordination, includingthe notions of coordination codes and induced distributions.However, in the context of strong coordination, achievabilityhas nothing to do with the joint type. Here we define strongachievability to mean that the distribution of the time-sequenceof actions in the network is close in total variation to thedesired joint distribution, i.i.d. in time. We discuss the strongcoordination capacity regionCp0

, like the region of Definition6, but instead defined by this notion of strong achievability.

Definition 9 (Strong achievability). A desired distributionp(x, y, z) is strongly achievable if there exists a sequenceof (non-deterministic) coordination codes such that the totalvariation between the induced distributionp(xn, yn, zn) andthe i.i.d. desired distribution goes to zero. That is,

p(xn, yn, zn)−n∏

i=1

p(xi, yi, zi)

TV

−→ 0.

A non-deterministic coordination code is a deterministiccode that utilizes an extra argument for each encoder anddecoder which is a random variable independent of all theother variables and actions. It seems quite reasonable to allowthe encoders and decoders to use private randomness duringthe implementation of the coordination code. This allowancewould have also been extended to the empirical coordination

framework of sections II, III, and IV; however, randomizedencoding and decoding is not beneficial in that frameworkbecause the objective has nothing to do with producing randomactions (appropriately distributed). This claim is similar toTheorem 2. Thus, non-deterministic coordination codes do notimprove the empirical coordination capacity over deterministiccoordination codes.

Common randomness plays a crucial role in achievingstrong coordination. For instance, in a network with no com-munication, only independent actions can be generated ateach node without common randomness, but actions can begenerated according to any desired joint distribution if enoughcommon randomness is available, as is illustrated in Figure2of Section I. In addition, for each desired joint distribution wecan identify a specific bit-rate of common randomness thatmust be available to the nodes in the network. This motivatesus to deal with common randomness more precisely.

Aside from the communication in the network, we allowcommon randomness to be supplied to each node. However,to quantify the amount of common randomness, we limit it toa rate ofR0 bits per action. For ann-block coordination code,ω is uniformly distributed on the setΩ = 1, ..., 2nR0. In thisway, common randomness is viewed as a resource alongsidecommunication.

B. Preliminary observations

The strong coordination capacity regionCp0is not convex

in general. This becomes immediately apparent when we con-sider a network with no communication and without any com-mon randomness. An arbitrary joint distribution is not stronglyachievable without communication or common randomness,but any extreme point in the probability simplex correspondsto a degenerate distribution that is trivially achievable.Thuswe see that convex combinations of achievable points inthe strong coordination capacity region are not necessarilystrongly achievable, and cannot be achieved through simpletime-sharing as was done for empirical coordination.

We use total variation as a measurement of fidelity forthe distribution of the actions in the network. This has anumber of implications. If two distributions have a small totalvariation between them, then a hypothesis test cannot reliablytell them apart. Additionally, the expected value of a boundedfunction of these random variables cannot differ by much.Steinberg and Verdu, for example, also use total variationas one of a handful of fidelity criteria when considering thesimulation of random variables in [31]. On the other hand,Wyner used normalized relative entropy as his measurement oferror for generating random variables in [9]. Neither quantity,total variation or normalized relative entropy, is dominated bythe other in general (because of the normalization). However,relative entropy would give infinite penalty if the support ofthe block-distribution of actions is not contained in the supportof the desired joint distribution. We find cases where the ratesrequired under the constraint of normalized relative entropygoing to zero are unpleasantly high. For instance, losslesssource coding would truly have to be lossless, with zero error.

Based on the success of random codebooks in informationtheory and source coding in particular, it seems hopeful that we

Page 13: Coordination Capacity - arXivIndex Terms—Common randomness, cooperation capacity, co-ordination capacity, network dependence, rate distortion, source coding, strong Markov lemma,

13

might always be able to use common randomness to augmenta coordination code intended for empirical coordination toresult in a randomized coordination code that achieves strongcoordination. Bennett et. al. demonstrate this principle forthe two-node setting with their reverse Shannon theorem[15]. They use common randomness to generate a randomcodebook. Then the encoder synthesizes a memoryless channeland finds a sequence in the codebook with the same jointtype as the synthesized output. Will methods like this work inother network coordination settings as well? The followingconjecture makes this statement precise and is consistentwith both networks considered for strong coordination in thissection of the paper.

Conjecture 1 (Strong meets empirical coordination). Withenough common randomness, for instance ifω ∼ Unif[0, 1],the strong coordination capacity region is the same as the em-pirical coordination capacity region for any specific networksetting . That is,

With unlimited common randomness: Cp0= Cp0 .

If Conjecture 1 is true, then results regarding empirical co-ordination should influence strong coordination schemes, andstrong coordination capacity regions will reduce to empiricalcoordination capacity regions under the appropriate limit.

C. No communication

Here we characterize the strong coordination capacity regionC for the no communication network of Figure 16. A collectionof nodes X, Y, and Z generate actions according to thejoint distribution p(x, y, z) using only common randomness(and private randomization). The strong coordination capacityregion characterizes the set of joint distributions that can beachieved with common randomness at a rate ofR0 bits peraction.

PSfrag replacements

X

Y

Z

|Ω| = 2nR0

Fig. 16. No communication.Three nodes generate actionsX, Y , andZ according top(x, y, z) without communication. The rate of commonrandomness needed is characterized in Theorem 9.

Wyner considered a two-node setting in [9], where cor-related random variables are constructed based on commonrandomness. He found the amount of common randomnessneeded and named the quantity “common information.” Herewe extend that result to three nodes, and the conclusion forany number of nodes is immediately apparent.

The n-block coordination codes consist of three non-deterministic decoding functions,

xn : 1, ..., 2nR0 −→ Xn,

yn : 1, ..., 2nR0 −→ Yn,

zn : 1, ..., 2nR0 −→ Zn.

Each function can use private randomization to probabilisti-cally map the common random bitsω to action sequences.That is, the functionsxn(ω), yn(ω), and zn(ω) behave ac-cording to conditional probability mass functionsp(xn|ω),p(yn|ω), andp(zn|ω).

The rate region given in Theorem 9 can be generalized toany number of nodes.

Theorem 9 (Strong coordination capacity region). The strongcoordination capacity regionC for the no communicationnetwork of Figure 16 is given by

C =

p(x, y, z) : ∃p(u|x, y, z) such thatp(x, y, z, u) = p(u)p(x|u)p(y|u)p(z|u)|U| ≤ |X ||Y||Z|,R0 ≥ I(X,Y, Z;U).

.

Discussion:The proof of Theorem 9, sketched in SectionVII, follows nearly the same steps as Wyner’s commoninformation proof. This generalization can be interpretedasa proposed measurement of common information between agroup of random variables. Namely, the amount of commonrandomness needed to generate a collection of random vari-ables at isolated nodes is the amount of common informationbetween them. However, it would also be interesting to con-sider a richer problem by allowing each subset of nodes to havean independent common random variable and investigating allof the rates involved.

Example 6 (Task assignment). Suppose there are tasks num-bered1, ..., k, and three of them are to be assigned randomlyto the three nodes X, Y, and Z without duplication. Thatis, the desired distributionp(x, y, z) for the three actions inthe network is the distribution obtained by samplingX , Y ,and Z uniformly at random from the set1, ..., k withoutreplacement. The three nodes do not communicate but haveaccess to common randomness at a rate ofR0 bits peraction. We want to determine the infimum of ratesR0 requiredto strongly achievep(x, y, z). Figure 17 illustrates a validoutcome of the task assignments.

PSfrag replacements

X

Z

Y

k ≥ 6 :

3

2

6

|Ω| = 2nR0

Fig. 17. Random task assignment with no communication.A task from a setof tasks numbered1, ..., k is to be assigned randomly but uniquely to eachof the nodes X, Y, and Z without any communication between them. The rateof common randomness needed to accomplish this is roughlyR0 ≥ 3 log 3for largek.

Theorem 9 tells us which values ofR0 will result inp(x, y, z) ∈ C. We must optimize over distributions of anauxiliary random variableU . Two things come in to playto make this optimization manageable: The variablesX , Y ,

Page 14: Coordination Capacity - arXivIndex Terms—Common randomness, cooperation capacity, co-ordination capacity, network dependence, rate distortion, source coding, strong Markov lemma,

14

and Z are all conditionally independent givenU ; and thedistribution p has sparsity. For any particular value ofU ,the conditional supports ofX , Y , and Z must be disjoint.Therefore,

I(X,Y, Z;U) = H(X,Y, Z)−H(X,Y, Z|U)

= H(X,Y, Z)−E [H(X,Y, Z|U = u)]

≥ H(X,Y, Z)−E [log(k1,Uk2,Uk3,U )] ,

wherek1,U , k2,U , andk3,U are integers that sum tok for allU . Therefore, we maximizelog(k1,Uk2,Uk3,U ) by letting thethree integers be as close to equal as possible. Furthermore,it is straightforward to find a joint distribution that meetsthisinequality with equality.

If k, the number of tasks, is divisible by three, then we seethat p(x, y, z) ∈ C for values ofR0 > 3 log 3 − log( k

k−1 ) −log( k

k−2 ). No matter how largek is, the required rate neverexceedsR0 > 3 log 3.

D. Two nodes

We can revisit the two-node network from Section III-A andask what communication rate is needed for strong coordina-tion. In this network the action at node X is specified by natureaccording top0(x), and a message is sent from node X to nodeY at rateR. Common randomness is also available to bothnodes at rateR0. The common randomness is independent ofthe actionX .

PSfrag replacementsX ∼ p0(x)

Y

RNode X Node Y |Ω| = 2nR0

Fig. 18. Two nodes.The action at node X is specified by nature accordingto p0(x), and a message is sent from node X to node Y at rateR.Common randomness is also available to both nodes at rateR0. The commonrandomness is independent of the actionX. The strong coordination capacityregionC depends on the amount of common randomness available. With nocommon randomness,C contains all rate-coordination pairs where the rateis greater than the common information betweenX and Y . With enoughcommon randomness,C contains all rate-coordination pairs where the rate isgreater than the mutual information betweenX andY .

The ratesR0 and R required for strong coordination inthe two-node network are characterized in [30] and wereindependently discovered by Bennett et. al. [32] in the contextof synthesizing a memoryless channel. Here we take particularnote of the two extremes: what is the strong coordinationcapacity region when no common randomness is present, andhow much common randomness is enough to maximize thestrong coordination capacity region?

The (2nR, n) coordination codes consist of a non-deterministic encoding function,

i : Xn × 1, ..., 2nR0 −→ 1, ..., 2nR.

and a non-deterministic decoding function,

yn : 1, ..., 2nR × 1, ..., 2nR0 −→ Yn.

Both functions can use private randomization to probabilis-tically map the arguments onto the range of the function.That is, the encoding functioni(xn, ω) behaves according toa conditional probability mass functionp(i|xn, ω), and thedecoding functionyn(i, ω) behaves according to a conditionalprobability mass functionp(yn|i, ω).

The actionsXn are chosen by nature i.i.d. according top0(x), and the actionsY n are constructed by implementingthe non-deterministic coordination code as

Y n = yn(i(Xn, ω), ω).

Let us define two quantities before stating the result. Thefirst is Wyner’s common informationC(X ;Y ) [9], whichturns out to be the communication rate requirement for strongcoordination in the two-node network when no commonrandomness is available:

C(X ;Y ) , minU : X−U−Y

I(X,Y ;U),

where the notationX − U − Y represents a Markov chainfrom X to U to Y . The second quantity we callnecessaryconditional entropyH(Y †X), which we will show to bethe amount of common randomness needed to maximize thestrong coordination capacity region in the two-node network:

H(Y †X) , minf : X−f(Y )−Y

H(f(Y )|X).

Theorem 10 (Strong coordination capacity region). Withno common randomness,R0 = 0, the strong coordinationcapacity regionCp0

for the two-node network of Figure 18is given by

Cp0= (R, p(y|x)) : R ≥ C(X ;Y ) .

On the other hand, if and only if the rate of common ran-domness is greater than the necessary conditional entropy,R0 ≥ H(Y †X), the strong coordination capacity regionCp0

for the two-node network of Figure 18 is given by

Cp0= (R, p(y|x)) : R ≥ I(X ;Y ) .

Discussion:The proof of Theorem 10, found in Section VII,is an application of Theorem 3.1 in [30]. This theorem is con-sistent with Conjecture 1—with enough common randomness,the strong coordination capacity regionCp0

is the same as thecoordination capacity regionCp0 found in Section III-A.

For many joint distributions, the necessary conditional en-tropy H(Y †X) will simply equal the conditional entropyH(Y |X).

Example 7 (Task assignment). Consider again a task assign-ment setting similar to Example 6, where tasks are numbered1, ..., k and are to be assigned randomly to the two nodesXandY without duplication. The actionX is supplied by nature,uniformly at random (p0(x)), and the desired distributionp(y|x) for the actionY is the uniform distribution over alltasks not equal toX . Figure 19 illustrates a valid outcome ofthe task assignments.

Page 15: Coordination Capacity - arXivIndex Terms—Common randomness, cooperation capacity, co-ordination capacity, network dependence, rate distortion, source coding, strong Markov lemma,

15

PSfrag replacements

X YR

k ≥ 7 :

2 7

|Ω| = 2nR0

Fig. 19. Task assignment in the two-node network.A task from a set oftasks numbered1, ..., k is to be assigned randomly but uniquely to each ofthe nodes X and Y in the two-node network. The task assignmentfor Xis given by nature. Common randomness at rateR0 is available to bothnodes, and a message is sent from node X to node Y at rateR. Whenno common randomness is available, the required communication rate isR ≥ 2 − log( k

k−1) bits (for evenk). At the other extreme, if the rate

of common randomness is greater thanlog(k − 1), then R ≥ log( kk−1

)suffices.

To apply Theorem 10 we must evaluate the three quantitiesI(X ;Y ), C(X ;Y ), and H(Y †X). For the joint distributionp0(x)p(y|x), the necessary conditional entropyH(Y †X) isexactly the conditional entropyH(Y |X). The computation ofthe common informationC(X ;Y ) follows the same steps asthe derivation found in Example 6. Let⌈k⌉ take the value ofk rounded up to the nearest even number.

I(X ;Y ) = log

(

k

k − 1

)

,

C(X ;Y ) = 2 bits − log

( ⌈k⌉⌈k⌉ − 1

)

,

H(Y †X) = log (k − 1) .

Without common randomness, we find that the communica-tion rate R ≥ 2 bits − log

(

⌈k⌉⌈k⌉−1

)

is necessary to strongly

achievep0(x)p(y|x). The strong coordination capacity regionC p0

expands as the rate of common randomnessR0 increases.Additional common randomness is no longer useful whenR0 > log(k − 1). With this amount of common randomness,only the communication rateR ≥ log( k

k−1 ) is necessary tostrongly achievep0(x)p(y|x).

VI. RATE-DISTORTION THEORY

The challenge of describing random sources of informationwith the fewest bits possible can be defined in a number of dif-ferent ways. Traditionally, source coding in networks followsthe path of rate-distortion theory by establishing multiple dis-tortion penalties for the multiple sources and reconstructionsin the network. Yet, fundamentally, the rate-distortion problemis intimately connected to empirical coordination.

The basic result of rate-distortion theory for a single mem-oryless source states that in order to achieve any desireddistortion level you must find an appropriate conditionaldistribution of the reconstructionX given the sourceXand then use a communication rate larger than the mutualinformation I(X ; X). This lends itself to the interpretationthat optimal encoding for a rate-distortion setting reallycomesdown to coordinating a reconstruction sequence with a sourcesequence according to a selected joint distribution. Here wemake that observation formal by showing that in general, evenin networks, the rate-distortion region is a projection of thecoordination capacity region.

The coordination capacity regionCp0 is a set of rate-coordination tuples. We can express rate-coordination tu-ples as vectors. For example, in the cascade network ofSection III-C there are two ratesR1 and R2. The actionsin this network areX , Y , and Z, where X is givenby nature. Order the spaceX × Y × Z in a sequence(x1, y1, z1), ..., (xm, ym, zm), where m = |X ||Y||Z|. Therate-coordination tuples(R1, R2, p(y, z|x)) can be expressedas vectors[R1, R2, p(y1, z1|x1), ..., p(ym, zm|xm)]T .

The rate-distortion regionDp0 is the closure of the set ofrate-distortion tuples that are achievable in a network. Wesaythat a distortionD is achievable if there exists a rate-distortioncode that gives an expected average distortion less thanD,using d as a distortion measurement. For example, in thecascade network of Section III-C we might have two distortionfunctions: The functiond1(x, y) measures the distortion in thereconstruction at node Y; the functiond2(x, y, z) evaluatesdistortion jointly between the reconstructions at nodes Y andZ. The rate-distortion regionDp0 would consist of tuples(R1, R2, D1, D2), which indicate that using ratesR1 andR2

in the network, a source distributed according top0(x) canbe encoded to achieve no more thanD1 expected averagedistortion as measured byd1 andD2 distortion as measuredby d2.

The relationship between the rate-distortion regionDp0

and the coordination capacity regionCp0 is that of a linearprojection. Suppose we have multiple finite-valued distortionfunctionsd1, ..., dk. We construct a distortion matrixD usingthe same enumeration(x1, y1, z1), ..., (xm, ym, zm) of thespaceX × Y ×Z as was used to vectorize the tuples inCp0 :

D ,

d1(x1, y1, z1)p0(x1) · · · d1(xm, ym, zm)p0(xm)...

......

dk(x1, y1, z1)p0(x1) · · · dk(xm, ym, zm)p0(xm)

.

The distortion matrixD is embedded in a block diagonalmatrix A where the upper-left block is the identity matrixIwith the same dimension as the number of rates in the network:

A ,

[

I 00 D

]

.

Theorem 11 (Rate-distortion region). The rate-distortion re-gionDp0 for a memoryless source with distributionp0 in anyrate-limited network is a linear projection of the coordinationcapacity regionCp0 by the matrixA,

Dp0 = A Cp0 .

We treat the elements ofDp0 andCp0 as vectors, as discussed,and the matrix multiplication byA is the standard set multi-plication.

Discussion: The proof of Theorem 11 can be found inSection VII. Since the coordination capacity regionCp0 is aconvex set, the rate-distortion regionDp0 is also a convex set.

Clearly we can use a coordination code to achieve thecorresponding distortion in a rate-distortion setting. But thetheorem makes a stronger statement. It says that there is

Page 16: Coordination Capacity - arXivIndex Terms—Common randomness, cooperation capacity, co-ordination capacity, network dependence, rate distortion, source coding, strong Markov lemma,

16

not a more efficient way of satisfying distortion limits inany network setting with memoryless sources than by usinga code that produces the same joint type for almost everyobservation of the sources. It is conceivable that a rate-distortion code for a network setting would produce a varietyof different joint types, each satisfying the distortion limit, butvarying depending on the particular source sequence observed.However, given such a rate-distortion code, repeated useswill produce a longer coordination code that consistentlyachieves coordination according to the expected joint type.The expected joint type of a good rate-distortion code can beshown to satisfy the distortion constraints.

0 0.5 10

0.5

1

PSfrag replacements

P (Y = 1|X = 0)

P(Y

=1|X

=1)

p0p1

R = 0.1

0.40.7

R(p0, p1)

d ≤ D

Fig. 20. Coordination capacity and rate-distortion.The coordination-rateregion for a uniform binary sourceX and binary actionY , where X isdescribed at rateR = 0.1 bits to node Y in the two-node network. The shadedregion shows distributions with Hamming distortion less thanD, whereD ischosen to satisfyR(D) = 0.1 bits.

Geometrically, each distortion constraint defines a hyper-plane that divides the coordination-rate region into two sets—one that satisfies the distortion constraint and one that doesnot. Therefore, minimizing the distortion for fixed rates inthe network amounts to finding optimal extreme points in thecoordination-rate region in the directions orthogonal to thesehyperplanes. Figure 20 shows the coordination-rate regionforR = 0.1 bits in the two-node network of Section III-A, with auniform binary sourceX and binaryY . The figure also showsthe region satisfying a Hamming distortion constraintD.

VII. PROOFS

A. Empirical Coordination - Achievability (Sections III, IV)

For a distributionp(x), define thetypical setT (n)ǫ with

respect top(x) to be sequencesxn whose types areǫ-close top(x) in total variation. That is,

T (n)ǫ , xn ∈ Xn : ‖Pxn(x)− p(x)‖TV < ǫ. (10)

This definition is almost the same as the definition of thestrongly typical setA∗(n)

ǫ found in (10.106) of Cover andThomas [33], and it shares the same important properties.The difference is that here we give a total variation constraint(L1 distance) on the type of the sequence rather than anelement-wise constraint (L∞ distance).2 We deal withT (n)

ǫ

2Additionally, our definition of the typical set handles the zero probabilityevents more liberally, but this doesn’t present any seriouscomplications.

since it relates more closely to the definition of achievabilityin Definition 5. However, the sets are almost the same, as thefollowing sandwich suggests:

A∗(n)ǫ ⊂ T (n)

ǫ ⊂ A∗(n)ǫ|X |.

A jointly typical set with respect to a joint distributionp(x, y) inherits the same definition as (10), where total vari-ation of the type is measured with respect to the joint distri-bution. Thus, achieving empirical coordination with respect toa joint distribution is a matter of constructing actions that areǫ-jointly typical (i.e. in the jointly typical setT (n)

ǫ ) with highprobability for arbitraryǫ.

1) Strong Markov Lemma:If X − Y − Z form a Markovchain, and the pair of sequencesxn and yn are jointlytypical as well as the pair of sequencesyn and zn, it is nottrue in general that the three sequencesxn, yn, and zn arejointly typical as a triple. For instance, consider any triple(xn, yn, zn) that is jointly typical with respect to a non-Markov joint distribution having marginal distributionsp(x, y)and p(y, z). However, the Markov Lemma [23] states that ifZn is randomly distributed according to

∏ni=1 p(zi|yi), then

with high probability it will be jointly typical with bothxn andyn. This lemma is used to establish joint typicality in sourcecoding settings where side information is not known to theencoder. Yet, for a network and encoding scheme that is moreintricate, the standard Markov Lemma lacks the necessarystrength. Here we introduce a generalization that will helpus analyze the layers of “piggy-back”-style codes [34] usedinour achievability proofs.3

Theorem 12(Strong Markov Lemma). Given a joint distribu-tion p(x, y, z) on the finite alphabetX ×Y ×Z that yields aMarkov chainX−Y −Z (i.e. p(x, y, z) = p(y)p(x|y)p(z|y)),let xn andyn be arbitrary sequences that areǫ-jointly typical.Suppose thatZn is randomly chosen from the set ofzn se-quences that areǫ-jointly typical withyn and additionally thatthe distribution ofZn is permutation-invariant with respect toyn, which is to say, any two sequenceszn and zn of the samejoint type withyn have the same probability. That is,

Pyn,zn = Pyn,zn ⇒ P (Zn = zn) = P (Zn = zn). (11)

Then,

Pr(

(xn, yn, Zn) ∈ T (n)4ǫ

)

> ξn,

whereξn → 1 exponentially fast asn goes to infinity.

Notice that permutation invariance is a condition satisfiedby most random codebook based proof techniques—for in-stance, encoding schemes based on i.i.d. codebooks tend tobe permutation invariant. To recover the familiar MarkovLemma, letZn have a distribution based onyn accordingto

∏ni=1 p(zi|yi), whereyn is an ǫ-typical sequence. Due to

the A.E.P.,yn and Zn will be 2ǫ-jointly typical with highprobability. Furthermore, Theorem 12 can be invoked becausethe distribution is permutation invariant.

3Through conversation we discovered that similar effort is being made byYoung-Han Kim and Abbas El Gamal and may soon be found in the StanfordEE478 Lecture Notes.

Page 17: Coordination Capacity - arXivIndex Terms—Common randomness, cooperation capacity, co-ordination capacity, network dependence, rate distortion, source coding, strong Markov lemma,

17

The key to proving Theorem 12 is found in Lemma 13,which uses permutation invariance and counting arguments toshow that most realizations look empirically Markov.

Lemma 13 (Markov Tendency). Let xn ∈ Xn and yn ∈ Yn

be arbitrary sequences. Suppose that the random sequenceZn ∈ Zn has a distribution that is permutation-invariant withrespect toyn, as in (11). Then with high probability whichonly depends on the sizes of the alphabetsX , Y, andZ, thejoint typePxn,yn,Zn will be ǫ-close to the Markov joint typePxn,ynPZn|yn . That is, for anyǫ > 0,

∥Pxn,yn,Zn − Pxn,ynPZn|yn

TV< ǫ, (12)

with a probability of at least1 − 2−αn+β logn, whereα andβ only depend on the alphabet sizes andǫ.

Proof of Theorem 12: The proof of Theorem 12 re-lies mainly on Lemma 13 and repeated use of the triangleinequality. From Lemma 13 we know that with probabilityapproaching one asn tends to infinity, inequality (12) issatisfied, namely,

∥Pxn,yn,Zn − Pxn,ynPZn|yn

TV< ǫ.

In this event, we now show that

(xn, yn, Zn) ∈ T (n)4ǫ .

By the definition of total variation one can easily show that

‖Pxn,ynPZn|yn − pX,Y PZn|yn‖TV

= ‖Pxn,yn − pX,Y ‖TV

< ǫ.

Similarly,

‖pY pX|Y PZn|yn − PynpX|Y PZn|yn‖TV

= ‖pY − Pyn‖TV

< ǫ.

And finally,

‖Pyn,ZnpX|Y − pX,Y,Z‖TV

= ‖Pyn,Zn − pY,Z‖TV

< ǫ.

Thus, the triangle inequality gives

‖Pxn,yn,Zn − pX,Y,Z‖TV < 4ǫ.

Proof of Lemma 13:We start by defining two constantsthat simplify this discussion. The first constant,α, is the keyto obtaining the uniform bound that Lemma 13 provides.

α , minp(x,y,z)∈SX,Y,Z : ‖p(x,y,z)−p(x,y)p(z|y)‖TV ≥ǫ

I(X ;Z|Y ),

β , 2|X ||Y||Z|.HereSX ,Y,Z is the simplex with dimension corresponding tothe product of the alphabet sizes. Notice thatα is defined asa minimization of a continuous function over a compact set;therefore, by analysis we know that the minimum is achievedin the set. SinceI(X ;Z|Y ) is positive for any distribution that

does not form a Markov chainX − Y − Z, we find thatα ispositive for ǫ > 0. The constantsα andβ are functions ofǫand the alphabet sizes|X |, |Y|, and |Z|.

We categorize sequences into sets with the same joint type.The type classTp(y,z) is defined as

Tp(y,z) , (yn, zn) : Pyn,zn = p(y, z).

We also define aconditional type classTp(z|y)(yn) to be the

set ofzn sequences such that the pair(yn, zn) are in the typeclassTp(y,z). Namely,

Tp(z|y)(yn) , zn : Pyn,zn = p(z|y)Pyn.

We will show that the statement made in (12) is trueconditionally for each conditional type classTp(z|y)(y

n) andtherefore must be true overall.

SupposeZn falls in the conditional type classTPzn|yn(yn).

By assumption (11), allzn in this type class are equally likely.Assessing probabilities simply becomes a matter of counting.From the method of types [33] we know that

∣TPzn|yn(yn)

∣ ≥ n−|Y||Z|2nHPyn,zn(Z|Y ).

We also can bound the number ofzn sequences inTPzn|yn

(yn) that do not satisfy (12). These sequences mustfall in a conditional type classTPzn|xn,yn

(xn, yn) where∥

∥Pxn,yn,zn − Pxn,ynPzn|yn

TV≥ ǫ.

For each such type class, the size can be bounded by∣

∣TPzn|xn,yn(xn, yn)

∣ ≤ 2nHPxn,yn,zn(Z|X,Y )

= 2n(

HPyn,zn(Z|Y )−IPxn,yn,zn

(X;Z|Y ))

≤ 2n(

HPyn,zn(Z|Y )−α

)

.

Furthermore, there are only polynomially many types,bounded byn|X ||Y||Z|. Therefore, the probability thatZn doesnot satisfy (12) for any conditional typePzn|yn is bounded by

Pr( not (12) | Zn ∈ TPzn|yn(yn) )

=

∣zn ∈ TPzn|yn(yn) : not (12)

∣TPzn|yn(yn)

≤ n|X ||Y||Z|2n(

HPyn,zn(Z|Y )−α

)

n−|Y||Z|2nHPyn,zn(Z|Y )

= n|Y||Z|+|X ||Y||Z|2−αn

≤ 2−αn+β logn.

2) Generic Achievability Proof:The coding techniques forachieving the empirical coordination regions in Sections IIIand IV are familiar from rate distortion theory. For the proofs,we construct random codebooks for communication and showthat the resulting encoding schemes perform well on average,producing jointly-typical actions with high probability.Thisproves that there must be at least one deterministic scheme thatperforms well. Here we prove one generally useful example toverify that the rate-distortion techniques actually do work forachieving empirical coordination. The technique here is very

Page 18: Coordination Capacity - arXivIndex Terms—Common randomness, cooperation capacity, co-ordination capacity, network dependence, rate distortion, source coding, strong Markov lemma,

18

similar to the source coding technique of “piggy-back” codesintroduced by Wyner [34].

Consider the two-node source coding setting of Figure 21with arbitrary sequencesxn, yn, and zn that areǫ-jointlytypical according to a joint distributionp(x, y, z). The se-quencesxn and yn are available to the encoder at node 1,while yn and zn are available to the decoder at node 2. Wecan think ofxn as the source to be encoded andyn and zn

as side information known to either both nodes or the decoderonly, respectively. Communication from node 1 to node 2 atrate R is used to produce a sequenceUn. Original resultsrelated to this setting in the context of rate-distortion theorycan be found in the work of Wyner and Ziv [27]. Here weanalyze a randomized coding scheme that attempts to producea sequenceUn at the decoder such that(xn, yn, zn, Un)are (8ǫ)-jointly typical with respect to a joint distribution ofthe form p(x, y, z)p(u|x, y). We give a scheme that uses acommunication rate ofR > I(X ;U |Y, Z) and is successfulwith probability approaching one asn tends to infinity for alljointly typical sequencesxn, yn, andzn.

PSfrag replacementsxn, yn

Un

I ∈[

2nR]

Node 1 Node 2

yn, zn

Fig. 21. Two nodes with side information.This network represents a genericsource coding setting encountered in networks and will illustrate standardencoding techniques. The sequencesxn, yn, andzn are jointly typical withrespect top0(x, y, z). Only xn andyn are observed by the encoder at node1. A message is sent to specifyUn to node 2 at rateR. A randomized codingscheme can produceUn to be jointly typical with(xn, yn, zn) with respectto a Markov chainZ − (X, Y )− U with high probability, regardless of theparticular sequencesxn, yn, andzn, as long as the rate is greater than theconditional mutual informationI(X;U |Y,Z).

The (2nR, n) coordination codes consist of a randomizedencoding function

i : Xn × Yn × Ω −→ 1, ..., 2nR,

and a randomized decoding function

un : 1, ..., 2nR × Yn ×Zn × Ω −→ Un.

These functions are random simply because the commonrandomnessω is involved for generating random codebooks.

The sequencesxn, yn, andzn are arbitrary jointly typicalsequences according top0(x, y, z), and the sequenceUn is arandomized function ofxn, yn, andzn given by implementingthe coordination code as

Un = un(i(xn, yn, ω), yn, zn, ω).

Lemma 14(Generic Coordination with Side Information). Forthe two-node network with side information of Figure 21 andany discrete joint distribution of the formp(x, y, z)p(u|x, y),there exists a functionδ(ǫ) which goes to zero asǫ goes to zerosuch that, for anyǫ > 0 and rateR > I(X ;U |Y, Z) + δ(ǫ),

there exists a sequence of randomized coordination codes atrate R for which

Pr(

(xn, yn, zn, Un) ∈ T (n)δ(ǫ)

)

→ 1

asn goes to infinity, uniformly for all(xn, yn, zn) ∈ T (n)ǫ .

Proof: Consider a joint distributionp(x, y, z)p(u|x, y)and defineγ to be the excess rate,γ = R−I(X ;U |Y, Z). Theconditions of Lemma 14 require thatγ > δ(ǫ) for someδ(ǫ)that goes to zero asǫ goes to zero. We will identify a validfunction δ(ǫ) at the conclusion of the following analysis.

We first over-cover the typical set of(xn, yn) using acodebook of size2nRc , whereRc = I(X,Y ;U) + γ/2. Wethen randomly categorize the codebook sequences into2nR

bins, yielding roughly2nRb sequences in each bin, where

Rb = Rc −R

= I(X,Y ;U)− I(X ;U |Y, Z)− γ/2

= I(X,Y, Z;U)− I(X ;U |Y, Z)− γ/2

= I(Y, Z;U)− γ/2.

Codebook:Using ω, generate a codebookC of 2nRc se-quencesun(j) independently according to the marginal distri-butionp(u), namely

∏ni=1 p(ui). Randomly and independently

assign each one a bin numberb(un(j)) in the set1, ..., 2nR.Encoder: The encoding functioni(xn, yn, ω) can be ex-

plained as follows. Search the codebookC and identify anindex j such that(xn, yn, un(j)) ∈ T (n)

2ǫ . If multiple exist,select the first suchj. If none exist, selectj = 1. Send thebin numberi(xn, yn, ω) = b(un(j)).

Decoder: The decoding functionun(i, yn, zn, ω) can beexplained as follows. Consider the codebookC and identifyan indexj such that(yn, zn, un(j)) ∈ T (n)

8ǫ andb(un(j)) = i.If multiple exist, select the first suchj. If none exist, selectj = 1. Produce the sequenceUn = un(j).

Error Analysis:We conservatively declare errors for any ofthe following,E1, E2, or E3.

Error 1: The encoder does not find a(2ǫ)-jointly typicalsequence in the codebook.By the method of types one canshow, as in Lemma 10.6.2 of [33], that each sequence inC is(2ǫ)-jointly typical with (xn, yn) with probability greater than2−n(I(X,Y ;U)+δ1(ǫ)) for n large enough, whereδ1(ǫ) goes tozero asǫ goes to zero.

Each sequence in the codebookC is generated indepen-dently, so the probability that none of them are jointly typicalis bounded by

Pr(E1) ≤ (1 − 2−n(I(X,Y ;U)+δ1(ǫ)))2nRc

≤ e−2nRc2−n(I(X,Y ;U)+δ1(ǫ))

= e−2n(Rc−I(X,Y ;U)−δ1(ǫ))

= e−2n(γ/2−δ1(ǫ))

.

Error 2: The sequence identified by the encoder is not(8ǫ)-jointly typical with (xn, yn, zn). AssumingE1 did notoccur, because of the MarkovityZ − (X,Y ) − U impliedby p(x, y, z)p(u|x, y) and the symmetry of our codebookconstruction, we can invoke Theorem 12 to verify that the

Page 19: Coordination Capacity - arXivIndex Terms—Common randomness, cooperation capacity, co-ordination capacity, network dependence, rate distortion, source coding, strong Markov lemma,

19

conditional probabilityPr(E2|Ec1) is arbitrarily small for large

enoughn.Error 3: The decoder finds more than one eligible action

sequence.Assume thatE1 andE2 did not occur. If the decoderconsiders the same indexj as the encoder selected, thencertainly un(j) will be be eligible, which is to say it willbe (8ǫ)-jointly typical with (yn, zn), and the bin index willmatch the received message. For all other sequences in thecodebookC, an appeal to the property of iterated expectationindicates that the probability of eligibility is slightly less thanthe a priori probability that a randomly generated sequenceand bin number will yield eligibility (had you not known thatit was not the sequence selected by the encoder), which isupper bounded by2−nR2−n(I(Y,Z;U)−δ2(ǫ)). Therefore, by themethod of types and the union bound,

Pr(E3|Ec1, E

c2) ≤ 2nRc2−nR2−n(I(Y,Z;U)−δ2(ǫ))

= 2−n(R−Rc+I(Y,Z;U)−δ2(ǫ))

= 2−n(I(Y,Z;U)−Rb−δ2(ǫ))

= 2−n(γ/2−δ2(ǫ)).

Thus we can selectδ(ǫ) = max2δ1(ǫ), 2δ2(ǫ), 8ǫ to makeall error terms go to zero and satisfy the lemma.

With the result of Lemma 14 in mind, we can confidentlytalk about using communication to establish coordination ofsequences across links in a network. Throughout the followingexplanations we will no longer pay particular attention to theǫ in the ǫ-jointly typical set. Instead, we will simply makereference to the generic jointly typical set, with the assumptionthat ǫ is sufficiently small andn is sufficiently large.

3) Two nodes - Theorem 3:It is clear from Lemma 14that an action sequenceY n jointly typical with Xn can bespecified with high probability using any rateR > I(X ;Y ).With high probabilityXn will be a typical sequence. ApplyLemma 14 withY = Z = ∅.

4) Isolated node - Theorem 4:No proof is necessary, asthis is a special case of the cascade network withR2 = 0.

5) Cascade - Theorem 5:The cascade network of Figure8 has a sequenceXn given by nature. The actionsXn

will be typical with high probability. Consider the desiredcoordinationp(y, z|x). A sequenceZn can be specified withrate RZ > I(X ;Z) to be jointly typical with Xn. Thiscommunication is sent to node Y and forwarded on to nodeZ. Additionally, now that every node knowsZn, a sequenceY n can be specified with rateRY > I(X ;Y |Z) and sent tonode Y. The rates used areR1 = RY +RZ > I(X ;Y, Z) andR2 = RZ > I(X ;Z).

R1 = RY +RZ > I(X ;Y, Z),

R2 = RZ > I(X ;Z).

6) Degraded source - Theorem 6:The degraded sourcenetwork of Figure 10 has a sequenceXn given by nature,known to node X, and another sequenceY n, which is a letter-by-letter function ofXn, known to node Y. Incidentally,Y n isalso known to node X because it is a function of the availableinformation. The actionsXn and Y n will be jointly typicalwith high probability.

Consider the desired coordinationp(z|x, y) and choose adistribution for the auxiliary random variablep(u|x, y, z) tohelp achieve it. The encoder first specifies a sequenceUn

that is jointly typical withXn and Y n. This requires a rateRU > I(X,Y ;U) = I(X ;U), but with binning we only needa rate ofR1 > I(X ;U |Y ) to specifyUn from node X tonode Y. Binning is not used whenUn is forwarded to nodeZ. Finally, after everyone knowsUn, the action sequenceZn

jointly typical with Xn, Y n, andUn is specified to node Zat a rate ofR2 > I(X,Y ;Z|U) = I(X ;Z|U). Thus, all ratesare achievable which satisfy

R1 > I(X ;U |Y ),

R2 > I(X ;Z|U),

R3 = RU > I(X ;U).

7) Broadcast - Theorem 7:The broadcast network of Figure11 has a sequenceXn given by nature, known to node X. Theaction sequenceXn will be typical with high probability.

Consider the desired coordinationp(y, z|x) and choose adistribution for the auxiliary random variablep(u|x, y, z) tohelp achieve it. We will focus on achieving one corner pointof the pentagonal rate region. The encoder first specifies asequenceUn that is jointly typical with Xn using a rateRU > I(X ;U). This sequence is sent to both node Y andnode Z. After everyone knowsUn, the encoder specifies anaction sequenceY n that is jointly typical withXn andUn

using rateRY > I(X ;Y |U). Finally, the encoder at node X,knowing bothXn and Y n, can specify an action sequenceZn that is jointly typical with (Xn, Y n, Un) using a rateRZ > I(X,Y ;Z|U). This results in rates

R1 = RU +RY > I(X ;U) + I(X ;Y |U) = I(X ;U, Y ),

R2 = RU +RX > I(X ;U) + I(X,Y ;Z|U).

8) Cascade multiterminal - Theorem 8:The cascade mul-titerminal network of Figure 14 has a sequenceXn given bynature, known to node X, and another sequenceY n given bynature, known to node Y. The actionsXn and Y n will bejointly typical with high probability.

Consider the desired coordinationp(z|x, y) and choosea distribution for the auxiliary random variablesU andV according to the inner bound in Theorem 8. That is,p(x, y, z, u, v) = p(x, y)p(u, v|x)p(z|y, u, v). We specify asequenceUn to be jointly typical withXn. By the StrongMarkov Lemma (Theorem 12), in conjunction with the sym-metry of our random coding scheme and the Markovity of thedistribution p(x, y)p(u|x), the sequenceUn will be jointlytypical with the pair(Xn, Y n) with high probability. Usingbinning, we only need a rate ofRU,1 > I(X ;U |Y ) to specifyUn from node X to node Y (as in Lemma 14). However, wecannot use binning for the message to node Z, so we sendthe index of the codework itself at a rate ofRU,2 > I(X ;U).Now that everyone knows the sequenceUn, it is treated asside information.

A second auxiliary sequenceV n is specified from nodeX to node Y to be jointly typical with(Xn, Y n, Un). Thisscenario coincides exactly with Lemma 14, and a sufficientrate isRV > I(X ;V |U, Y ). Finally, an action sequenceZn

Page 20: Coordination Capacity - arXivIndex Terms—Common randomness, cooperation capacity, co-ordination capacity, network dependence, rate distortion, source coding, strong Markov lemma,

20

is specified from node Y to node Z to be jointly typicalwith (Y n, V n, Un), where Un is side information knownto the encoder and decoder. We achieve this using a rateRZ > I(Y, V ;Z|U). Again, because of the symmetry of ourencoding scheme, the Strong Markov Lemma (Theorem 12)tells us that(Xn, Y n, Un, V n, Zn) will be jointly typical, andtherefore,(Xn, Y n, Zn) will be jointly typical.

The rates used by this scheme are

R1 = RU,1 +RV > I(X ;U, V |Y ),

R2 = RU,2 +RZ > I(X ;U) + I(Y, V ;Z|U).

B. Empirical Coordination - Converse (Sections III, IV)

In proving outer bounds for the coordination capacity ofvarious networks, a commontime mixingtrick is to make useof a random time variableQ and then consider the value of arandom sequenceXn at the random timeQ using notationXQ. We first make this statement precise and discuss theimplications of such a construction.

Considering a coordination code for a block lengthn. WeassignQ to have a uniform distribution over the set1, ..., n,independent of the action sequences in the network. ThevariableXQ is simply a function of the sequenceXn and thevariableQ; namely, the variableXQ takes on the value of theQth element in the sequenceXn. Even though all sequences ofactions and auxiliary variables in the network are independentof Q, the variableXQ need not be independent ofQ.

Here we list a couple of key properties of time mixing.Property 1: If all elements of a sequenceXn are identically

distributed, thenXQ is independent ofQ. Furthermore,XQ

has the same distribution asX1. Verifying this property iseasy when one considers the conditional distribution ofXQ

givenQ.Property 2: For a collection of random sequencesXn, Y n,

andZn, the expected joint typeEPXn,Y n,Zn is equal to thejoint distribution of the time-mixed variables(XQ, YQ, ZQ).

E PXn,Y n,Zn(x, y, z)

=∑

xn,yn,zn

p(xn, yn, zn)PXn,Y n,Zn(x, y, z)

=∑

xn,yn,zn

p(xn, yn, zn)1

n

n∑

q=1

1((xq , yq, zq) = (x, y, z))

=1

n

n∑

q=1

xn,yn,zn

p(xn, yn, zn)1((xq , yq, zq) = (x, y, z))

=1

n

n∑

q=1

pXq,Yq,Zq (x, y, z)

=

n∑

q=1

pXQ,YQ,ZQ|Q(x, y, z|q)p(q)

= pXQ,YQ,ZQ(x, y, z).

1) Two nodes - Theorem 3:Assume that a rate-coordinationpair (R, p(y|x)) is in the interior of the coordination capacityregionCp0 for the two-node network of Figure 5 with sourcedistribution p0(x). For a sequence of(2nR, n) coordination

codes that achieves(R, p(y|x)), consider the induced distri-bution on the action sequences.

Recall thatI is the message from node X to node Y.

nR ≥ H(I)

≥ I(Xn;Y n)

=

n∑

q=1

I(Xq;Yn|Xq−1)

=n∑

q=1

I(Xq;Yn, Xq−1)

≥n∑

q=1

I(Xq;Yq)

= nI(XQ;YQ|Q)a= nI(XQ;YQ, Q)

≥ nI(XQ;YQ).

Equality a comes from Property 1 of time mixing.We would like to be able to say that the joint distribution

of XQ and YQ is arbitrarily close top0(x)p(y|x) for somen. That way we could conclude, by continuity of the entropyfunction, thatR ≥ I(X ;Y ).

The definition of achievability (Definition 5) states that

‖PXn,Y n,Zn(x, y, z)− p0(x)p(y, z|x)‖TV −→ 0 in probability.

Because total variation is bounded, this implies that

E ‖PXn,Y n,Zn(x, y, z)− p0(x)p(y, z|x)‖TV −→ 0.

Furthermore, by the Jensen Inequality,

EPXn,Y n,Zn(x, y, z) −→ p0(x)p(y, z|x).

Now Property 2 of time mixing allows us to conclude theargument for Theorem 3.

2) Isolated node - Theorem 4:No proof is necessary, asthis is a special case of the cascade network withR2 = 0.

3) Cascade - Theorem 5:For the cascade network of Figure8, apply the bound from the two-node network twice—once toshow that the rateR1 ≥ I(X ;Y, Z) is needed even if node Yand node Z are allowed to fully cooperate, and once to showthat the rateR2 ≥ I(X ;Z) is needed even if node X and nodeY are allowed to fully cooperate.

4) Degraded source - Theorem 6:Assume that a rate-coordination quadruple(R1, R2, R3, p(z|x, y)) is in the inte-rior of the coordination capacity regionCp0 for the degradedsource network of Figure 10 with source distributionp0(x)and the degraded relationshipYi = f0(xi). For a sequenceof (2nR1 , 2nR2 , 2nR3 , n) coordination codes that achieves(R1, R2, R3, p(z|x, y)), consider the induced distribution onthe action sequences.

Recall that the message from node X to node Y at rateR1

is labeledI, the message from node X to node Z at rateR2

is labeledJ , and the message from node Y to node Z at rateR3 is labeledK. We identify the auxiliary random variableUas the collection of random variables(K,XQ−1, Q).

Page 21: Coordination Capacity - arXivIndex Terms—Common randomness, cooperation capacity, co-ordination capacity, network dependence, rate distortion, source coding, strong Markov lemma,

21

nR1 ≥ H(I)

≥ H(I|Y n)a= H(I,K|Y n)

≥ H(K|Y n)

= I(Xn;K|Y n)

=

n∑

q=1

I(Xq;K|Y n, Xq−1)

=

n∑

q=1

I(Xq;K,Xq−1, Y q−1, Y nq+1|Yq)

≥n∑

q=1

I(Xq;K,Xq−1|Yq)

= nI(XQ;K,XQ−1|YQ, Q)b= nI(XQ;K,XQ−1, Q|YQ)

= nI(XQ;U |YQ).

Equalitya is justified because the messageK is a function ofthe messageI and the sequenceY n. Equality b comes fromProperty 1 of time mixing.

nR2 ≥ H(J)

≥ H(J |K)a= H(J, Zn|K)

≥ H(Zn|K)

= I(Xn;Zn|K)

≥n∑

q=1

I(Xq;Zn|K,Xq−1)

≥n∑

q=1

I(Xq;Zq|K,Xq−1)

= nI(XQ;ZQ|K,XQ−1, Q)

= nI(XQ;ZQ|U).

Equality a is justified because the action sequenceZn is afunction of the messagesJ and K. Equality b comes fromProperty 1 of time mixing.

nR3 ≥ H(K)

= I(Xn;K)

=

n∑

q=1

I(Xq;K|Xq−1)

= nI(XQ;K|XQ−1, Q)a= nI(XQ;K,XQ−1, Q)

= nI(XQ;U).

Equality a comes from Property 1 of time mixing.As seen in the proof for the two-node network, the joint

distribution of XQ, YQ, and ZQ is arbitrarily close top0(x)1(y = f0(x))p(z|x, y). Therefore, sinceCp0 is a closed

set, (R1, R2, R3, p(z|x, y)) is in the coordination capacityregion stated in Theorem 6.

It remains to bound the cardinality ofU . We can use thestandard method rooted in the support lemma of [35]. ThevariableU should have|X ||Z| − 1 elements to preserve thejoint distribution p(x, z), which in turn preservesp(x, y, z),H(X), and H(X |Y ), and three more elements to preserveH(X |U), H(X |Y, U), andH(X |Z,U).

5) Broadcast - Theorem 7:For the broadcast network ofFigure 11, apply the bound from the two-node network threetimes—once to show that the rateR1 ≥ I(X ;Y ) is needed andonce to show that the rateR2 ≥ I(X ;Z) is needed, and finallya third time to show that the sum-rateR1 +R2 = I(X ;Y, Z)is needed even if node Y and node Z are allowed to fullycooperate.

6) Cascade multiterminal - Theorem 8:Assume that a rate-coordination triple(R1, R2, p(z|x, y)) is in the interior of thecoordination capacity regionCp0 for the cascade multiterminalnetwork of Figure 14 with source distributionp0(x, y). For asequence of(2nR1 , 2nR2 , n) coordination codes that achieves(R1, R2, p(z|x, y)), consider the induced distribution on theaction sequences.

Recall that the message from node X to node Y at rateR1 islabeledI, and the message from node Y to node Z at rateR2 islabeledJ . We identify the auxiliary random variableU as thecollection of random variables(I,XQ−1, Y Q−1, Y n

Q+1, Q).This is the same choice of auxiliary variable used by Wynerand Ziv [27]. Notice thatU satisfies the Markov chain prop-ertiesU −XQ − YQ andXQ − (YQ, U)− ZQ

nR1 ≥ H(I)

≥ H(I|Y n)

= I(Xn; I|Y n)

=

n∑

q=1

I(Xq; I|Y n, Xq−1)

=

n∑

q=1

I(Xq; I,Xq−1, Y q−1, Y n

q+1|Yq)

= nI(XQ; I,XQ−1, Y Q−1, Y n

Q+1|YQ, Q)a= nI(XQ; I,X

Q−1, Y Q−1, Y nQ+1, Q|YQ)

= nI(XQ;U |YQ).

Equality a comes from Property 1 of time mixing.

Page 22: Coordination Capacity - arXivIndex Terms—Common randomness, cooperation capacity, co-ordination capacity, network dependence, rate distortion, source coding, strong Markov lemma,

22

nR2 ≥ H(J)

≥ I(Xn, Y n;Zn)

=

n∑

q=1

I(Xq, Yq;Zq|Xq−1, Y q−1)

=n∑

q=1

I(Xq, Yq;Zq, Xq−1, Y q−1)

≥n∑

q=1

I(Xq, Yq;Zq)

= nI(XQ, YQ;ZQ|Q)a= nI(XQ, YQ;ZQ, Q)

≥ I(XQ, YQ;ZQ).

Equality a comes from Property 1 of time mixing.As seen in the proof for the two-node network, the

joint distribution of XQ, YQ, and ZQ is arbitrarily closeto p0(x, y)p(z|x, y). Therefore, sinceCp0 is a closed set,(R1, R2, p(z|x, y)) is in the coordination capacity regionstated in Theorem 8.

It remains to bound the cardinality ofU . We can againuse the standard method of [35]. Notice thatp(x, y, z|u) =p(x|u)p(y|x)p(z|y, u) captures all of the Markovity con-straints of the outer bound. Therefore, convex mixtures ofdistributions of this form are valid for achieving points inthe outer bound. The variableU should have|X ||Y||Z| − 1elements to preserve the joint distributionp(x, y, z), whichin turn preservesI(X,Y ;Z) and H(X |Y ), and one moreelement to preserveH(X |Y, U).

C. Strong Coordination (Section V)

1) No communication - Theorem 9:The network of Figure16 with no communication generalizes Wyner’s common in-formation work [9] to three nodes. Here we provide a sketchof the proof.

The following phenomenon was noticed both by Wyner [9]and by Han and Verdu [14]. Consider a memoryless channelp(x|u). A channel input with distributionp(u) induces anoutput with distributionp(x) =

u p(u)p(x|u). If the inputsare i.i.d. then the outputs are i.i.d. as well. Now suppose thatinstead a channel input sequenceUn is chosen uniformly atrandom from a setM of 2nR deterministic sequences. IfR > I(X ;U) then the setM can be chosen so that the outputdistribution is arbitrarily close in total variation to thei.i.d.distribution

∏ni=1 p(xi) for large enoughn.

Figure 22 illustrates how to achieve the strong coordinationcapacity regionC of Theorem 9. Let each decoder simulate amemoryless channel fromU to X , Y , or Z, depending on theparticular node. The common randomnessω is used to indexa sequenceUn(ω) that is used as the inputs to the channels.Notice that the action sequencesXn, Y n, andZn producedvia these three separate channels are distributed the same asif they were generated as outputs of a single channel becausep(x, y, z|u) = p(x|u)p(y|u)p(z|u) according to the definition

PSfrag replacements

ω

Common randomness

Node X

Node Y

Node Z

p(x|u)

p(y|u)

p(z|u)

Un(ω)

Un(ω)

Un(ω)

Xn

Y n

Zn

Fig. 22. Achievability for no-communication network.The strong coordi-nation capacity regionC of Theorem 9 is achieved in a network with nocommunication by using the common randomness to specify a sequenceUn(ω) that is then passed through a memoryless channel at each nodeusingprivate randomness.

of C in the theorem. SinceR > I(X,Y, Z;U) for points inthe interior ofC, this scheme will achieve strong coordination.

For the converse, identify the auxiliary variableU asω andnotice thatXq, Yq, andZq are conditionally independent (forall q) givenω.

nR ≥ H(ω)

≥ I(Xn, Y n, Zn;ω)

≥ I(Xn, Y n, Zn;U).

SinceXn, Y n, andZn have a joint distribution close in totalvariation to the i.i.d. distribution

∏ni=1 p(xi, yi, zi), it can be

shown that they can essentially be treated as i.i.d. sequencesin the mutual information bounds (see [30]). If they were i.i.d.we would have

I(Xn, Y n, Zn;U)

=

n∑

q=1

I(Xq, Yq, Zq;U |Xq−1, Y q−1, Zq−1)

=

n∑

q=1

I(Xq, Yq, Zq;U,Xq−1, Y q−1, Zq−1)

≥n∑

q=1

I(Xq, Yq, Zq;U)

≥ nminU

I(X,Y, Z; U),

where the minimization is over all eligible auxiliaryU thatseparateX , Y , andZ into conditional independence.

It remains to bound the cardinality ofU . We can againuse the standard method of [35]. The variableU shouldhave|X ||Y||Z| − 1 elements to preserve the joint distributionp(x, y, z), which in turn preservesH(X,Y, Z), and one moreelement to preserveH(X,Y, Z|U).

2) Two nodes - Theorem 10:The strong coordinationcapacity region for the two-node network of Figure 18 is the

Page 23: Coordination Capacity - arXivIndex Terms—Common randomness, cooperation capacity, co-ordination capacity, network dependence, rate distortion, source coding, strong Markov lemma,

23

main result of [30]:

Cp0=

p(y|x) : ∃p(u|x, y) such thatp(x, y, u) = p(u)p(x|u)p(y|u)|U| ≤ |X ||Y| + 1,R ≥ I(X ;U),R0 +R ≥ I(X,Y ;U).

, (13)

whereR0 refers to the rate of common randomness, andRrefers to the communication rate.

In the case of no common randomness (R0 = 0), thestronger inequality in (13) on the rateR become the second,R ≥ I(X,Y ;U). Because of the Markov constraint onU ,the minimum value of the right-hand side of this inequality isWyner’s common informationC(X ;Y ).

Additionally, Theorem 10 states that ifR0 is greater thanthe necessary conditional entropyH(Y †X) then ratesR >I(X ;Y ) are sufficient for achieving strong coordination. Thisis a straightforward application of the definition ofH(Y †X).We can verify this with the following choice ofU :

U = argminf(Y ) : X−f(Y )−Y

H(f(Y )|X).

Notice that this choice ofU separatesX andY into a Markovchain by definition. Also, the mutual informationI(X ;U) isless than or equal toI(X ;Y ), sinceU is a function ofY ,thus satisfying the first rate inequality in (13). The secondinequality is satisfied because of the chain rule,

I(X,Y ;U) = I(X ;U) + I(Y ;U |X)

= I(X ;U) +H(Y †X)

≤ I(X ;Y ) +H(Y †X).

Furthermore, we can show that this is the least amountof common randomness needed to fully expand the strongcoordination capacity region. In other words, the minimumR0 such that(R0, I(X ;Y )) is in the strong rate-coordinationregionRp0

p(y|x) is H(Y †X).To prove this, first consider the implications ofR =

I(X ;Y ). This means that in order to satisfy the first rate in-equality in (13), we must haveI(X ;U) ≤ I(X ;Y ). However,because of the Markovity,I(X ;U) = I(X ;U, Y ). Therefore,I(X ;U |Y ) = 0, which implies a second Markov conditionX − Y − U in addition toX − U − Y .

We are concerned with minimizing the required rate ofcommon randomnessR0. SinceR = I(X ;Y ), the second rateinequality in (13) becomesR0 ≥ I(Y ;U |X). The conditionalentropy H(Y |X) is fixed, so we want to maximize theconditional entropyH(Y |U,X).

With the distributionp(x|y) in mind, we can clump valuesof Y together for which the channel fromY to X is identical.Define a functionf with the property that

f(y) = f(y) ⇐⇒ p(x|y) = p(x|y) for ∀x ∈ X . (14)

LettingU = f(Y ) will be the choice ofU that simultaneouslymaximizesH(Y |U,X) and satisfies the Markov conditionsX − U − Y and X − Y − U . We can compareU to anyother choiceU that satisfies the conditions and show that theresulting conditional entropyH(Y |U ,X) is smaller.

Another way to state the two Markov conditions is thatfor all values of y and u such thatp(y, u) > 0, the con-ditional distributionsp(x|y) and p(x|u) are equal becausep(x|y) = p(x|y, u) = p(x|u). Notice that the value ofU = f(Y ), characterized in (14), only depends on the channelp(x|y). However, with probability one the value ofU canbe determined fromU based on the conditional distributionp(x|u). Therefore,

H(Y |U ,X) = H(Y, U |U ,X)

= H(Y |U, U,X) + I(U |U ,X)

= H(Y |U, U,X)

≤ H(Y |U,X).

D. Rate-distortion theory (Sections VI)

We establish the relationship from Theorem 11 between thecoordination capacity region and the rate-distortion region intwo parts. First we show thatDp0 containsACp0 and then theother way around. To keep clutter to a minimum and withoutloss of generality, we only discuss a single distortion measured, rateR, and a pair of sequences of actionsXn andY n.

1) Coordination implies distortion (Dp0 ⊃ ACp0 ): Thedistortion incurred with respect to a distortion functiond ona set of sequences of actions is a function of the joint type ofthe sequences. That is,

d(n)(xn, yn) =1

n

n∑

i=1

d(xi, yi)

=1

n

n∑

i=1

x,y

1(xi = x, yi = y)d(x, y)

=∑

x,y

d(x, y)1

n

n∑

i=1

1(xi = x, yi = y)

=∑

x,y

d(x, y)Pxn,yn(x, y)

= EPxn,ynd(X,Y ). (15)

When a rate-coordination tuple(R, p(x, y)) is in the interiorof the coordination capacity regionCp0 , we are assured theexistence of a coordination code for anyǫ > 0 for which

Pr(‖PXn,Y n − p‖TV > ǫ) < ǫ.

Therefore, with probability greater that1− ǫ,

EPXn,Y nd(X,Y ) ≤ Epd(X,Y ) + ǫdmax.

Recalling (15) yields,

Ed(n)(xn, yn) ≤ Epd(X,Y ) + 2ǫdmax.

As expected, a sequence of(2nR, n) coordination codesthat achieves empirical coordination for the joint distributionp(x, y) also achieves the point in the rate-distortion regionwith the same rate and with distortion valueEpd(X,Y ).

Page 24: Coordination Capacity - arXivIndex Terms—Common randomness, cooperation capacity, co-ordination capacity, network dependence, rate distortion, source coding, strong Markov lemma,

24

2) Distortion implies coordination (Dp0 ⊂ ACp0 ): Sup-pose that a(2nR, n) rate-distortion codes achieves distortionEd(n)(Xn, Y n) ≤ D. Substituting from (15),

E[

EPXn,Y nd(X,Y )]

≤ D.

However,

E[

EPXn,Y nd(X,Y )]

= EEPXn,Y nd(X,Y )

by linearity.We can achieve the rate-coordination pair(R,EPXn,Y n) by

augmenting the rate-distortion code. If we repeat the use ofthe rate-distortion code overk blocks of lengthn each, thenwe induce a joint distribution on(Xkn, Y kn) that consists ofi.i.d. sub-blocks(Xn, Y n), ..., (Xkn

kn−n+1, Yknkn−n+1) denoted

as (X(1)n, Y (1)n), ..., (X(k)n, Y (k)n).By the weak law of large number,

PXkn,Y kn =1

k

k∑

i=1

PX(i)n,Y (i)n

−→ EPXn,Y n in probability.

Point-wise convergence in probability implies that ask grows∥

∥PXkn,Y kn −EPXn,Y n

TV−→ 0 in probability.

Thus, for any point(R,D) in the rate-distortion regionwe have identified an associated point(R,EPXn,Y n) in thecoordination-capacity region. Indeed, the rate-distortion regionis a linear projection of the coordination-capacity region.

VIII. R EMARKS

Rather than inquire about the possibility of moving datain a network, we have asked for the set of all achievablejoint distribution on actions at the nodes. For some three-node networks we have fully characterized the answer to thisquestion, while for others we have established bounds.

Some of the results discussed in this work extend nicelyto larger networks. Consider for example an extended cascadenetwork shown in Figure 23, whereX is given randomly bynature andY1 throughYk−1 are actions based on a cascadeof communication. Just as in the cascade network of SectionIII-C, we can achieve ratesRi ≥ I(X ;Yi, ..., Yk) for empiricalcoordination by sending messages to the last nodes in the chainfirst and conditioning later messages on earlier ones. Theserates meet the cut-set bound. We now can make an interestingobservation about assigning unique tasks to nodes in such anetwork. Supposek tasks are to be completed by thek nodesin this cascade network, one at each node. Node X is assigneda task randomly, and the communication in the network is usedto assign a permutation of all the tasks to the nodes in thenetwork. The necessary rates in the network areRi ≥ log(ki ).The sum of all the rates in the network, for largek, is thenapproximatelyRtotal ≥ k nats, wherek is the number of tasksand nodes in the network.

Now consider the same task assignment scenario for anextended broadcast network shown in Figure 24. Here againXis given randomly by nature, butY1 throughYk−1 are actionsbased on individual messages sent to each of the nodes. Again,

PSfrag replacements

R1 R2 Rk−1

X ∼ p0(x)

Y1 Y2 Yk−2 Yk−1

Fig. 23. Extended cascade network.This is an extension of the cascadenetwork of Section III-C. ActionX is given randomly by nature according top0(x), and a cascade of communication is used to produce actionsY1 throughYk−1. The coordination capacity region contains all rate-coordination tuplesthat satisfyRi ≥ I(X;Yi, ..., Yk) for all i. In particular, the sum rate neededto assign a permutation ofk tasks to thek nodes grows linearly with thenumber of nodes.

we want to assign a permutation of all thek tasks to all of thek nodes. We can use ideas from the broadcast network resultsof Section IV-A. For example, let us assign default tasks to thenodes so thatY1 = 1, ..., Yk−1 = k− 1 unless told otherwise.Now the communication is simply used to tell each node whenit must choose taskk rather than the default task, which willhappen about one time out ofk. The rates needed for thisscheme areRi ≥ H(1/k), whereH is the binary entropyfunction. For largek, the sum of all the rates in the networkis approximatelyRtotal ≥ ln k + 1 nats. The cut-set boundgives us a lower bound on the sum rate ofRtotal ≥ ln k nats.Therefore, we can conclude that the optimal sum rate scaleswith the logarithm of the number of nodes in the network.

PSfrag replacements

X ∼ p0(x) Y1

Y2

Y3

Y4

Yk−2

Yk−1

Fig. 24. Extended broadcast network.This is an extension of the broadcastnetwork of Section IV-A. ActionX is given randomly by nature according top0(x), and each peripheral node produces an actionYi based on an individualmessage at rateRi. Bounds on the coordination capacity region show thatthe sum rate needed to assign a permutation ofk tasks to thek nodes growslogarithmically with the number of nodes.

Even without explicitly knowing the coordination capacityregion for the broadcast network, we are able to use boundsto establish the scaling laws for the total rate needed to assigntasks uniquely, and we can compare the efficiency of thebroadcast network (logarithmic in the network size) with thatof the cascade network (linear in the network size) for thiskind of coordination.

We would also like to understand the coordination capacityregion for a noisy network. For example, the communicationcapacity region for the broadcast channelp(y1, y2|x) of Figure25 has undergone serious investigation. The standard questionis, how many bits of independent information can be com-

Page 25: Coordination Capacity - arXivIndex Terms—Common randomness, cooperation capacity, co-ordination capacity, network dependence, rate distortion, source coding, strong Markov lemma,

25

municated fromX to Y1 and fromX to Y2. We know theanswer if the broadcast channel is degraded; that is, ifY2

can be viewed as a noisy version ofY1. We also know theanswer if the channel can be separated into two orthogonalchannels or is deterministic. But what if instead we are tryingto coordinate actions via the broadcast channel, similar tothebroadcast network of Section IV-A? Now we care about thedependence betweenY1 and Y2. The broadcast channel willimpose a natural dependence between the channel outputsY1 and Y2 that we abolish if we try to send independentinformation to the two nodes. After all, the communicationcapacity region for the broadcast channel depends only on themarginalsp(y1|x) andp(y2|x). Here we are wasting a valuableresource—the natural conditional dependence betweenY1 andY2 given X.

PSfrag replacements

X

Y2

Y1

X

Y2

Y1

Encoder

Decoder

DecoderChannel

p(y1, y2|x)p0(x)

Fig. 25. Broadcast channel.When a noisy channel is used to coordinatejoint actions(X, Y1, Y2), what is the resulting coordination capacity region?The broadcast network of Section IV-A is a noiseless specialcase.

Again, we are enlarging the focus from communication ofindependent information to the creation of coordinated actions.This larger question may force a simpler solution and illu-minate the problem of independent information (the standardchannel capacity formulation) as a special case. Presumably,information is being communicated for a reason—so futurecooperative behavior can be achieved.

IX. F INAL REMARKS

At first it seems that the nodes in a network can cooperatearbitrarily without communication. Prior arrangement achievesthat. Also common randomness achieves it.

But the problem changes dramatically when some of thenodes take actions specified by nature. Now some communi-cation to the remaining nodes becomes necessary to establishthe desired dependence.

We have established the rate-dependence tradeoff for cas-cade networks and isolated node networks found in Section III.The broadcast network of Figure 11 remains elusive, perhapsfor the same reason that the broadcast channel is difficult.

REFERENCES

[1] R. Ahlswede, N. Cai, S.-Y. Li, and R. Yeung. Network information flow.IEEE Trans. on Info. Theory, 46(4):1204–1216, July 2000.

[2] J. Tsitsiklis, D. Bertsekas, and M. Athans. Distributedasynchronousdeterministic and stochastic gradient optimization algorithms. IEEETrans. on Automatic Control, 31(9):803–812, Sept. 1986.

[3] L. Xiao, S. Boyd, and S.-J. Kim. Distributed average consensuswith least-mean-square deviation.Journal of Parallel and DistributedComputing, 67(1):33–46, Jan. 2007.

[4] B. Bollobas. The Art of Mathematics: Coffee Time in Memphis.Cambridge University Press, 2006.

[5] A. Yao. Some complexity questions related to distributive comput-ing(preliminary report). InACM Symposium on Theory of Computing,pages 209–213, 1979.

[6] A. Orlitsky and A. El Gamal. Average and randomized communicationcomplexity. IEEE Trans. on Info. Theory, 36(1):3–16, Jan. 1990.

[7] O. Ayaso, D. Shah, and M. Dahleh. Distributed computation under bitconstraints. InIEEE Conference on Decision and Control, pages 4837–4842, Dec. 2008.

[8] T. Cover and H. Permuter. Capacity of coordinated actions. In IEEEInternational Symp. on Info. Theory, Nice, 2007.

[9] A. Wyner. The common information of two dependent randomvariables.IEEE Trans. on Info. Theory, 21(2):163–179, March 1975.

[10] V. Anantharam and V. Borkar. Common randomness and distributedcontrol; a counterexample.Systems and Control Letters, 56:568–572,2007.

[11] H. Barnum, C. Caves, C. Fuchs, R. Jozsa, and B. Schumacher. Onquantum coding for ensembles of mixed states.Journal of Physics A:Mathematical and General, 34:6767–6785, 2001.

[12] G. Kramer and S. Savari. Communicating probability distributions.IEEETrans. on Info. Theory, 53(2):518–525, Feb. 2007.

[13] T. Weissman and E. Ordentlich. The empirical distribution of rate-constrained source codes.IEEE Trans. on Info. Theory, 51(11):3718–3733, Nov. 2005.

[14] T. Han and S. Verdu. Approximation theory of output statistics. IEEETrans. on Info. Theory, 39(3):752–772, May 1993.

[15] C. Bennett, P. Shor, J. Smolin, and A. Thapliyal. Entanglement-assistedcapacity of a quantum channel and the reverse shannon theorem. IEEETrans. on Info. Theory, 48(10):2637–2655, Oct. 2002.

[16] C. Shannon. Coding theorems for a discrete source with fidelity criterion.In R. Machol, editor,Information and Decision Processes, pages 93–126. 1960.

[17] H. Yamamoto. Source coding theory for cascade and branchingcommunication systems.IEEE Trans. on Info. Theory, 27(3):299–308,May 1981.

[18] A. Kaspi and T. Berger. Rate-distortion for correlatedsources withpartially separated encoders.IEEE Trans. on Info. Theory, 28:828–840,Nov. 1982.

[19] J. Barros and S. Servetto. A note on cooperative multiterminal sourcecoding. In Conference on Information Sciences and Systems, March2004.

[20] H. Yamamoto. Source coding theory for a triangular communicationsystem.IEEE Trans. on Info. Theory, 42(3):848–853, May 1996.

[21] J. Wolf, A. Wyner, and J. Ziv. Source coding for multipledescriptions.Bell System Technical Journal, 59:1417–1426, Oct. 1980.

[22] Z. Zhang and T. Berger. New results in binary multiple descriptions.IEEE Trans. on Info. Theory, 33:502–521, July 1987.

[23] T. Berger. Multiterminal source coding. In G. Longo, editor, InformationTheory Approach to Communications, pages 171–231. CISM Course andLecture, 1978.

[24] D. Vasudevan, C. Tian, and S. Diggavi. Lossy source coding fora cascade communication system with side-informations. InAllertonConference on Communication, Control, and Computing, Sep. 2006.

[25] W. Gu and M. Effros. On multi-resolution coding and a two-hopnetwork. InData Compression Conference, 2006.

[26] M. Bakshi, M. Effros, W. Gu, and R. Koetter. On network codingof independent and dependent sources in line networks. InIEEEInternational Symp. on Info. Theory, Nice, 2007.

[27] A. Wyner and J. Ziv. The rate-distortion function for source codingwith side information at the decoder.IEEE Trans. on Info. Theory,22(1):1–10, Jan. 1976.

[28] P. Cuff, H. Su, and A. El Gamal. Cascade multiterminal source coding.In IEEE International Symp. on Info. Theory, Seoul, 2009.

[29] A. Orlitsky and J. Roche. Coding for computing.IEEE Trans. on Info.Theory, 47(3):903–917, March 2001.

[30] P. Cuff. Communication requirements for generating correlated randomvariables. InIEEE International Symp. on Info. Theory, pages 1393–1397, Toronto, 2008.

[31] Y. Steinberg and S. Verdu. Simulation of random processes and rate-distortion theory.IEEE Trans. on Info. Theory, 42(1):63–86, Jan. 1996.

[32] I. Devetak, A. Harrow, P. Shor, A. Winter, and C. Ben-nett. Quantum reverse shannon theorem. Presentation:http://www.research.ibm.com/people/b/bennetc/QRSTonlineVersion.pdf,2007.

[33] T. Cover and J. Thomas.Elements of Information Theory. Wiley, NewYork, 2nd edition, 2006.

[34] A. Wyner. On source coding with side-information at thedecoder.IEEETrans. on Info. Theory, 21(3):294–300, May 1975.

[35] I. Csiszar and J. Korner.Information Theory: Coding Theorems forDiscrete Memoryless Systems. Academic, New York, 1981.

Page 26: Coordination Capacity - arXivIndex Terms—Common randomness, cooperation capacity, co-ordination capacity, network dependence, rate distortion, source coding, strong Markov lemma,

26

Paul Cuff (S’08-M’10) received the B.S. degree in electrical engineering fromBrigham Young University in 2004 and the M.S and Ph.D. degrees in electricalengineering from Stanford University in 2006 and 2009. He was awarded theISIT 2008 Student Paper Award for his work titled ”Communication Require-ments for Generating Correlated Random Variables” and was arecipient ofthe National Defense Science and Engineering Graduate Fellowship and theNumerical Technologies Fellowship.

Dr. Cuff is an Assistant Professor of Electrical Engineering at PrincetonUniversity.

Haim Permuter (M’08) received his B.Sc. (summa cum laude) and M.Sc.(summa cum laude) degree in Electrical and Computer Engineering fromthe Ben-Gurion University, Israel, in 1997 and 2003, respectively, and Ph.D.degrees in Electrical Engineering from Stanford University, California in 2008.Between 1997 and 2004, he was an officer at a research and developmentunit of the Israeli Defense Forces. He is currently a Lecturer at Ben-Gurion university. He is a recipient of the Fullbright Fellowship, the StanfordGraduate Fellowship (SGF), Allon Fellowship, and Bergman award.

Thomas M. Cover, the K.T. Li Professor of Electrical Engineering andProfessor of Statistics at Stanford, does research in information theory,communication theory and statistics, and is the coauthor ofthe textbook,Elements of Information Theory. He was Lab Director of the InformationSystems Laboratory in Electrical Engineering from 1989 to 1996. He hasbeen the contract statistician for the California State Lottery and a consultantto AT&T Laboratories and IBM. He received the 1990 Claude E. ShannonAward in information theory and has also received the IEEE Neural NetworkCouncils Pioneer Award in 1993 for his work on the capacity ofneural nets.He received the 1997 IEEE Richard M. Hamming medal for contributionsto information, communication theory and statistics and isa member of theNational Academy of Engineering and the American Academy ofArts andSciences. He is currently working on network information theory and theinterplay between information theory and investment.