
G METHOD IN ACTION: FROM EXACT SAMPLING TO APPROXIMATE ONE

UDREA PĂUN

Communicated by Marius Iosifescu

The main contribution of this work is the unification, by the G method (using Markov chains, therefore, a Markovian unification), in the finite case, of five (even six) sampling methods from exact and approximate sampling theory. This unification is in conjunction with a main problem, our problem of interest: finding the fastest Markov chains in sampling theory based on Metropolis-Hastings chains and their derivatives. We show, in the finite case, that the cyclic Gibbs sampler (the Gibbs sampler for short) belongs to our collection of hybrid Metropolis-Hastings chains from [U. Păun, A hybrid Metropolis-Hastings chain, Rev. Roumaine Math. Pures Appl. 56 (2011), 207-228]. So, we obtain, for any type of Gibbs sampler (cyclic, random, etc.), in the finite case, the structure of the matrices corresponding to the coordinate updates. Concerning our hybrid Metropolis-Hastings chains, to do unifications, and, as a result of these, to do comparisons and improvements, we construct, by the G method, a chain which we call the reference chain. This is a very fast chain because it attains its stationarity at time 1. Moreover, the reference chain is the best one we can construct (concerning our hybrid chains), see the Uniqueness Theorem. The reference chain is constructed such that it can do what an exact sampling method, which we call the reference method, does. This method contains, as special cases, the alias method and the swapping method. Coming back to the reference chain, it is sometimes identical with the Gibbs sampler or with a special Gibbs sampler with grouped coordinates; we illustrate this case for two classes of wavy probability distributions. As a result of these facts, we give a method for generating, exactly (not approximately), a random variable with geometric distribution. Finally, we state the fundamental idea on the speed of convergence:

the nearer our hybrid chain is to its reference chain,

the faster our hybrid chain is.

The addendum shows that we are on the right track.

AMS 2010 Subject Classification: 60J10, 65C05, 65C20, 68U20.

Key words: G method, unification, exact sampling, alias method, swapping method, reference method, reference chain, approximate sampling, hybrid Metropolis-Hastings chain, optimal hybrid Metropolis chain, Gibbs sampler, wavy probability distribution, uniqueness theorem, fundamental idea on the speed of convergence, G comparison.

REV. ROUMAINE MATH. PURES APPL. 62 (2017), 3, 413-452


1. SOME BASIC THINGS

In this section, we present some basic things (notation, notions, and results) from [17-18], with completions. [17] refers, especially, to very fast Markov chains (i.e., Markov chains which converge very fast), while in [18], based on [17] (Example 2.11 was the starting point), a collection of hybrid Metropolis-Hastings chains is constructed. This collection of Markov chains has something in common (this is interesting!) with some exact sampling methods (see the next sections); one of these methods, the swapping method, in fact suggested the construction of this collection (in Example 2.11 from [17] mentioned above, there is a chain which can do what the swapping method does). These common things could help us to find fast approximate sampling methods or, at best, fast exact sampling methods (see the fundamental idea on the speed of convergence, etc.). This way of finding efficient exact or approximate sampling methods, based on exact sampling methods, is our way, the best way. On the other hand, the exact sampling methods are important sources for obtaining very fast Markov chains.

Set

Par(E) = {∆ | ∆ is a partition of E},

where E is a nonempty set. We shall agree that the partitions do not contain the empty set.

Definition 1.1. Let ∆1, ∆2 ∈ Par(E). We say that ∆1 is finer than ∆2 if ∀V ∈ ∆1, ∃W ∈ ∆2 such that V ⊆ W.

Write ∆1 ⪯ ∆2 when ∆1 is finer than ∆2.

In this article, a vector is a row vector and a stochastic matrix is a row stochastic matrix.

The entry (i, j) of a matrix Z will be denoted Z_ij or, if confusion can arise, Z_{i→j}.

Set

〈m〉 = {1, 2, ...,m} (m ≥ 1),

〈〈m〉〉 = {0, 1, ...,m} (m ≥ 0),

Nm,n = {P | P is a nonnegative m × n matrix},

Sm,n = {P | P is a stochastic m × n matrix},

Nn = Nn,n,

Sn = Sn,n.

Let P = (P_ij) ∈ Nm,n. Let ∅ ≠ U ⊆ 〈m〉 and ∅ ≠ V ⊆ 〈n〉. Set the matrices

P_U = (P_ij)_{i∈U, j∈〈n〉}, P^V = (P_ij)_{i∈〈m〉, j∈V}, and P^V_U = (P_ij)_{i∈U, j∈V}


(e.g., if

$$P = \begin{pmatrix} 1 & 1 & 2\\ 2 & 3 & 4 \end{pmatrix},$$

then, e.g., $P_{\{2\}} = \begin{pmatrix} 2 & 3 & 4 \end{pmatrix}$, $P^{\{3\}} = \begin{pmatrix} 2\\ 4 \end{pmatrix}$, and $P^{\{1\}}_{\{1\}} = (1)$).

Set

({i})_{i∈{s1,s2,...,st}} = ({s1}, {s2}, ..., {st});

({i})_{i∈{s1,s2,...,st}} ∈ Par({s1, s2, ..., st}).

E.g., ({i})_{i∈〈n〉} = ({1}, {2}, ..., {n}).

Definition 1.2. Let P ∈ Nm,n. We say that P is a generalized stochastic matrix if ∃a ≥ 0, ∃Q ∈ Sm,n such that P = aQ.

Definition 1.3 ([17]). Let P ∈ Nm,n. Let ∆ ∈ Par(〈m〉) and Σ ∈ Par(〈n〉). We say that P is a [∆]-stable matrix on Σ if P^L_K is a generalized stochastic matrix, ∀K ∈ ∆, ∀L ∈ Σ. In particular, a [∆]-stable matrix on ({i})_{i∈〈n〉} is called [∆]-stable for short.

Definition 1.4 ([17]). Let P ∈ Nm,n. Let ∆ ∈ Par(〈m〉) and Σ ∈ Par(〈n〉). We say that P is a ∆-stable matrix on Σ if ∆ is the least fine partition for which P is a [∆]-stable matrix on Σ. In particular, a ∆-stable matrix on ({i})_{i∈〈n〉} is called ∆-stable, while a (〈m〉)-stable matrix on Σ is called stable on Σ for short. A stable matrix on ({i})_{i∈〈n〉} is called stable for short.

For interesting examples of ∆-stable matrices on Σ for some ∆ and Σ, see Sections 2 and 3.

Let ∆1 ∈ Par(〈m〉) and ∆2 ∈ Par(〈n〉). Set (see [17] for G_{∆1,∆2} and [18] for Ḡ_{∆1,∆2})

G_{∆1,∆2} = {P | P ∈ Sm,n and P is a [∆1]-stable matrix on ∆2}

and

Ḡ_{∆1,∆2} = {P | P ∈ Nm,n and P is a [∆1]-stable matrix on ∆2}.

When we study or even when we construct products of nonnegative matrices (in particular, products of stochastic matrices) using G_{∆1,∆2} or Ḡ_{∆1,∆2}, we shall refer to this as the G method.

Let P ∈ Ḡ_{∆1,∆2}. Let K ∈ ∆1 and L ∈ ∆2. Then ∃a_{K,L} ≥ 0, ∃Q_{K,L} ∈ S_{|K|,|L|} such that P^L_K = a_{K,L} Q_{K,L}. Set

P^{-+} = (P^{-+}_{KL})_{K∈∆1, L∈∆2},  P^{-+}_{KL} = a_{K,L}, ∀K ∈ ∆1, ∀L ∈ ∆2


(P^{-+}_{KL}, K ∈ ∆1, L ∈ ∆2, are the entries of the matrix P^{-+}). If confusion can arise, we write P^{-+}(∆1, ∆2) instead of P^{-+}. In this article, when we work with the operator (·)^{-+} = (·)^{-+}(∆1, ∆2), we suppose, for labeling the rows and columns of matrices, that ∆1 and ∆2 are ordered sets (i.e., these are sets where the order in which we write their elements counts), even if we omit to specify this. E.g., let

$$P = \begin{pmatrix} \frac{1}{10} & \frac{3}{10} & \frac{4}{10} & \frac{2}{10}\\[3pt] \frac{2}{10} & \frac{2}{10} & \frac{3}{10} & \frac{3}{10}\\[3pt] \frac{1}{10} & \frac{1}{10} & \frac{1}{10} & \frac{7}{10} \end{pmatrix}.$$

P ∈ G_{∆1,∆2}, where ∆1 = ({1, 2}, {3}) and ∆2 = ({1, 2}, {3, 4}). Further, we have

$$P^{-+} = P^{-+}(\Delta_1, \Delta_2) = \begin{pmatrix} \frac{4}{10} & \frac{6}{10}\\[3pt] \frac{2}{10} & \frac{8}{10} \end{pmatrix}.$$

({1, 2} and {3} are the first and the second element of ∆1, respectively; based on this order, the first and the second row of P^{-+} are labeled {1, 2} and {3}, respectively. The columns of P^{-+} are labeled similarly.)
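As an aside for readers who want to experiment, here is a minimal Python sketch (not from the paper) of the operator (·)^{-+} applied to the 3 × 4 example above; the function name minus_plus and the 0-based indexing of states are our own choices.

```python
import numpy as np

def minus_plus(P, delta1, delta2):
    """Compute P^{-+} for a matrix P assumed [Delta1]-stable on Delta2:
    each block P[K, L] has constant row sums, and the entry of P^{-+}
    indexed by (K, L) is that common row sum a_{K,L}."""
    M = np.empty((len(delta1), len(delta2)))
    for a, K in enumerate(delta1):
        for b, L in enumerate(delta2):
            sums = P[np.ix_(K, L)].sum(axis=1)
            assert np.allclose(sums, sums[0]), "block is not generalized stochastic"
            M[a, b] = sums[0]
    return M

# The 3 x 4 example above (0-based indices here).
P = np.array([[1, 3, 4, 2],
              [2, 2, 3, 3],
              [1, 1, 1, 7]]) / 10
delta1 = [[0, 1], [2]]        # Delta_1 = ({1,2}, {3})
delta2 = [[0, 1], [2, 3]]     # Delta_2 = ({1,2}, {3,4})
print(minus_plus(P, delta1, delta2))   # [[0.4 0.6]
                                       #  [0.2 0.8]]
```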

Below we give a basic result.

Theorem 1.5 ([18]). Let P ∈ Ḡ_{∆1,∆2} ⊆ Nm,n and Q ∈ Ḡ_{∆2,∆3} ⊆ Nn,p. Then

(i) PQ ∈ Ḡ_{∆1,∆3} ⊆ Nm,p;

(ii) (PQ)^{-+} = P^{-+} Q^{-+}.

Proof. See [18]. □

In this article, the transpose of a vector x is denoted x′. Set e = e(n) = (1, 1, ..., 1) ∈ R^n, ∀n ≥ 1.

Below we give an important result.

Theorem 1.6 ([18]). Let P1 ∈ Ḡ_{(〈m1〉),∆2} ⊆ N_{m1,m2}, P2 ∈ Ḡ_{∆2,∆3} ⊆ N_{m2,m3}, ..., P_{n−1} ∈ Ḡ_{∆_{n−1},∆_n} ⊆ N_{m_{n−1},m_n}, Pn ∈ Ḡ_{∆_n,({i})_{i∈〈m_{n+1}〉}} ⊆ N_{m_n,m_{n+1}}. Then

(i) P1 P2 ... Pn is a stable matrix;

(ii) (P1 P2 ... Pn)_{{i}} = P1^{-+} P2^{-+} ... Pn^{-+}, ∀i ∈ 〈m1〉 ((P1 P2 ... Pn)_{{i}} is the row i of P1 P2 ... Pn); therefore, P1 P2 ... Pn = e′π, where π = P1^{-+} P2^{-+} ... Pn^{-+}.

Proof. See [18]. □

Remark 1.7. Under the assumptions of Theorem 1.6, but taking P1 ∈ S_{m1,m2}, P2 ∈ S_{m2,m3}, ..., P_{n−1} ∈ S_{m_{n−1},m_n}, Pn ∈ S_{m_n,m_{n+1}}, we have

p P1 P2 ... Pn = π


for any probability distribution p on 〈m1〉. Consequently, Theorem 1.6 could be used to prove that certain Markov chains have finite convergence time (see [17] and Sections 2 and 3 for some examples).

Let P ∈ Nm,n. Set

$$\alpha(P) = \min_{1\le i,j\le m} \sum_{k=1}^{n} \min(P_{ik}, P_{jk})$$

and

$$\bar{\alpha}(P) = \frac{1}{2} \max_{1\le i,j\le m} \sum_{k=1}^{n} |P_{ik} - P_{jk}|.$$

If P ∈ Sm,n, then α(P) is called the Dobrushin ergodicity coefficient of P ([5]; see, e.g., also [13, p. 56]).

Theorem 1.8. (i) ᾱ(P) = 1 − α(P), ∀P ∈ Sm,n.

(ii) ‖µP − νP‖₁ ≤ ‖µ − ν‖₁ ᾱ(P), ∀µ, ν, µ and ν are probability distributions on 〈m〉, ∀P ∈ Sm,n.

(iii) ᾱ(PQ) ≤ ᾱ(P) ᾱ(Q), ∀P ∈ Sm,n, ∀Q ∈ Sn,p.

Proof. (i) See, e.g., [13, p. 57] or [14, p. 144]. (ii) See, e.g., [5] or [14, p. 147]. (iii) See, e.g., [5], or [13, pp. 58-59], or [14, p. 145]. □
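The coefficients α and ᾱ are easy to compute directly from their definitions; the following short Python sketch (ours, with an arbitrary illustrative matrix) checks Theorem 1.8(i) and the submultiplicativity in (iii) numerically.

```python
import numpy as np

def alpha(P):
    """Dobrushin ergodicity coefficient: min over row pairs of the sum
    of entrywise minima."""
    m = P.shape[0]
    return min(np.minimum(P[i], P[j]).sum() for i in range(m) for j in range(m))

def alpha_bar(P):
    """alpha_bar(P) = (1/2) max over row pairs of the L1 distance of the rows."""
    m = P.shape[0]
    return max(0.5 * np.abs(P[i] - P[j]).sum() for i in range(m) for j in range(m))

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])
print(alpha(P), alpha_bar(P), 1 - alpha(P))          # Theorem 1.8(i)
print(alpha_bar(P @ P) <= alpha_bar(P) * alpha_bar(P))  # Theorem 1.8(iii) for Q = P
```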

Theorem 1.6 (see also Remark 1.7) could be used, e.g., in exact sampling theory based on finite Markov chains (see Section 3; see also Section 2), while the next result could be used, e.g., in approximate sampling theory based on finite Markov chains (see Section 4).

Theorem 1.9 ([18]). Let P1 ∈ N_{m1,m2}, P2 ∈ N_{m2,m3}, ..., Pn ∈ N_{mn,m_{n+1}}. Let ∆1 = (〈m1〉), ∆2 ∈ Par(〈m2〉), ..., ∆n ∈ Par(〈mn〉), ∆_{n+1} = ({i})_{i∈〈m_{n+1}〉}. Consider the matrices L_l = ((L_l)_{VW})_{V∈∆_l, W∈∆_{l+1}} ((L_l)_{VW} is the entry (V, W) of the matrix L_l), where

$$(L_l)_{VW} = \min_{i\in V} \sum_{j\in W} (P_l)_{ij}, \quad \forall l \in \langle n\rangle, \forall V \in \Delta_l, \forall W \in \Delta_{l+1}.$$

Then

$$\alpha(P_1 P_2 \cdots P_n) \ge \sum_{K\in\Delta_{n+1}} (L_1 L_2 \cdots L_n)_{\langle m_1\rangle K}.$$

(Since L1 L2 ... Ln is a 1 × |〈m_{n+1}〉| matrix, it can be thought of as a row vector, but above we used and below we shall use, if necessary, the matrix notation for its entries instead of the vector one. Above the matrix notation (L1 L2 ... Ln)_{〈m1〉K} was used instead of the vector one (L1 L2 ... Ln)_K because, in this article, the notation A_U, where A ∈ N_{p,q} and ∅ ≠ U ⊆ 〈p〉, means something different.)


Proof. See [18]. (Theorem 1.9 is part of Theorem 1.8 from [18].) □

Definition 1.10 (see, e.g., [20, p. 80]). Let P ∈ Nm,n. We say that P is a row-allowable matrix if it has at least one positive entry in each row.

Let P ∈ Nm,n. Set

$$\bar{P} = (\bar{P}_{ij}) \in N_{m,n}, \quad \bar{P}_{ij} = \begin{cases} 1 & \text{if } P_{ij} > 0,\\ 0 & \text{if } P_{ij} = 0, \end{cases} \quad \forall i \in \langle m\rangle, \forall j \in \langle n\rangle.$$

We call P̄ the incidence matrix of P (see, e.g., [13, p. 222]).

In this article, some statements on the matrices hold, obviously, possibly after a permutation of rows and columns. For simplification, further, we omit to specify this fact.

Warning! In this article, if a Markov chain has the transition matrix P = P1 P2 ... Ps, where s ≥ 1 and P1, P2, ..., Ps are stochastic matrices, then any 1-step transition of this chain is performed via P1, P2, ..., Ps, i.e., doing s transitions: one using P1, one using P2, ..., one using Ps. (See also Section 2.)

Let S = 〈r〉. Let π = (πi)_{i∈S} = (π1, π2, ..., πr) be a positive probability distribution on S. One way to sample approximately or, at best, exactly from S when r ≥ 2 is by means of the hybrid Metropolis-Hastings chain from [18]. Below we define this chain.

Let E be a nonempty set. Set ∆ ≺ ∆′ if ∆′ ⪯ ∆ and ∆′ ≠ ∆, where ∆, ∆′ ∈ Par(E).

Let ∆1, ∆2, ..., ∆_{t+1} ∈ Par(S) with ∆1 = (S) ≺ ∆2 ≺ ... ≺ ∆_{t+1} = ({i})_{i∈S}, where t ≥ 1. Let Q1, Q2, ..., Qt ∈ Sr such that

(C1) Q̄1, Q̄2, ..., Q̄t are symmetric matrices;

(C2) (Ql)^L_K = 0, ∀l ∈ 〈t〉 − {1}, ∀K, L ∈ ∆l, K ≠ L (this assumption implies that Ql is a block diagonal matrix and a ∆l-stable matrix on ∆l, ∀l ∈ 〈t〉 − {1});

(C3) (Ql)^U_K is a row-allowable matrix, ∀l ∈ 〈t〉, ∀K ∈ ∆l, ∀U ∈ ∆_{l+1}, U ⊆ K.

Although Ql, l ∈ 〈t〉, are not irreducible matrices if l ≥ 2, we define the matrices Pl, l ∈ 〈t〉, as in the Metropolis-Hastings case ([16] and [12]; see, e.g., also [7, pp. 233-236], [9, Chapter 6], [11, pp. 5-12], [15, pp. 63-66], and [21, Chapter 10]), namely,

$$P_l = \left((P_l)_{ij}\right) \in S_r, \qquad (P_l)_{ij} = \begin{cases} 0 & \text{if } j \ne i \text{ and } (Q_l)_{ij} = 0,\\[4pt] (Q_l)_{ij} \min\left(1, \dfrac{\pi_j (Q_l)_{ji}}{\pi_i (Q_l)_{ij}}\right) & \text{if } j \ne i \text{ and } (Q_l)_{ij} > 0,\\[4pt] 1 - \sum_{k \ne i} (P_l)_{ik} & \text{if } j = i, \end{cases}$$


∀l ∈ 〈t〉. Set P = P1 P2 ... Pt.
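The construction of each Pl from Ql is the usual Metropolis-Hastings acceptance step; the Python sketch below (ours; the 3-state proposal and target are purely illustrative) shows it for a single pair (Q, π) and checks the invariance πP = π.

```python
import numpy as np

def metropolis_hastings_matrix(Q, pi):
    """Build P from a proposal matrix Q and a positive target pi by the rule
    used above for each P_l: off-diagonal P_ij = Q_ij * min(1, pi_j Q_ji / (pi_i Q_ij))
    when Q_ij > 0, and the diagonal entry collects the remaining mass."""
    r = Q.shape[0]
    P = np.zeros_like(Q, dtype=float)
    for i in range(r):
        for j in range(r):
            if j != i and Q[i, j] > 0:
                P[i, j] = Q[i, j] * min(1.0, pi[j] * Q[j, i] / (pi[i] * Q[i, j]))
        P[i, i] = 1.0 - P[i].sum()       # remaining mass on the diagonal
    return P

# Tiny illustration with a hypothetical symmetric proposal on 3 states.
pi = np.array([0.5, 0.3, 0.2])
Q = np.array([[0.4, 0.3, 0.3],
              [0.3, 0.4, 0.3],
              [0.3, 0.3, 0.4]])
P = metropolis_hastings_matrix(Q, pi)
print(np.allclose(pi @ P, pi))           # pi is invariant for P
```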

Theorem 1.11 ([18]). Concerning P above we have πP = π and P > 0.

Proof. See [18]. □

By Theorem 1.11, P^n → e′π as n → ∞. We call the Markov chain with transition matrix P the hybrid Metropolis-Hastings chain. In particular, we call this chain the hybrid Metropolis chain when Q1, Q2, ..., Qt are symmetric matrices.

We call the conditions (C1)-(C3) the basic conditions of the hybrid Metropolis-Hastings chain. In particular, we call these conditions the basic conditions of the hybrid Metropolis chain when Q1, Q2, ..., Qt are symmetric matrices.

The basic conditions (C1)-(C3) and other conditions, which we call the special conditions, determine special hybrid Metropolis-Hastings chains. E.g., the following special hybrid Metropolis chain was considered in [18].

Supposing that ∆l = (K^{(l)}_1, K^{(l)}_2, ..., K^{(l)}_{u_l}), ∀l ∈ 〈t + 1〉, this chain satisfies the conditions (C1)-(C3) and, moreover, the conditions:

(c1) |K^{(l)}_1| = |K^{(l)}_2| = ... = |K^{(l)}_{u_l}|, ∀l ∈ 〈t + 1〉 with u_l ≥ 2;

(c2) r = r1 r2 ... rt with r1 r2 ... rl = |∆_{l+1}|, ∀l ∈ 〈t − 1〉, and rt = |K^{(t)}_1| (this condition is compatible with ∆1 ≺ ∆2 ≺ ... ≺ ∆_{t+1});

(c3) (c3.1) Ql is a symmetric matrix such that (c3.2) (Ql)_{ii} > 0, ∀i ∈ S, and (Ql)_{i1 j1} = (Ql)_{i2 j2}, ∀i1, i2, j1, j2 ∈ S with i1 ≠ j1, i2 ≠ j2, and (Ql)_{i1 j1}, (Ql)_{i2 j2} > 0, ∀l ∈ 〈t〉 ((c3.2) says that all the positive entries of Ql, excepting the entries (Ql)_{ii}, ∀i ∈ S, are equal, ∀l ∈ 〈t〉);

(c4) (Ql)^U_K has in each row just one positive entry, ∀l ∈ 〈t〉, ∀K ∈ ∆l, ∀U ∈ ∆_{l+1} with U ⊆ K (this condition is compatible with (c3.1) because (Ql)^W_V is a square matrix, ∀l ∈ 〈t〉, ∀V, W ∈ ∆_{l+1}).

The condition (c1) is superfluous because it follows from (C1) and (c4). (c2) is also superfluous because it follows from (c1) and ∆1 ≺ ∆2 ≺ ... ≺ ∆_{t+1}.

It is interesting to note that the matrices P1, P2, ..., Pt satisfy conditions similar to (C1)-(C3) and, for this special chain, moreover, (c4); simply, we replace Ql with Pl, ∀l ∈ 〈t〉, in (C1)-(C3) and, if need be, in (c4). (c1)-(c2) are common conditions for Q1, Q2, ..., Qt and P1, P2, ..., Pt.

In [18], for the chain satisfying the conditions (C1)-(C3) and (c1)-(c4), the positive entries of the matrices Ql, l ∈ 〈t〉, were, taking Theorem 1.9 into account, optimally chosen, i.e., they were chosen such that the lower bound of α(P1 P2 ... Pt) from Theorem 1.9 is as large as possible (we need this condition to obtain a chain with a speed of convergence as large as possible). More precisely, setting

$$f_l = \min_{i,j\in S,\ (Q_l)_{ij}>0} \frac{\pi_j}{\pi_i}$$

(do not forget the condition (Ql)_{ij} > 0!) and

x_l = (Q_l)_{ij},

where i, j ∈ S are fixed such that i ≠ j and (Ql)_{ij} > 0 (see (c3) again), it was found (taking Theorem 1.9 into account)

$$x_l = \frac{1}{f_l + r_l - 1}.$$

We call this chain the optimal hybrid Metropolis chain with respect to the conditions (C1)-(C3) and (c1)-(c4) and the inequality from Theorem 1.9; we call it the optimal hybrid Metropolis chain for short.

In Section 3, we show that the Gibbs sampler on 〈〈h〉〉^n, h, n ≥ 1 (more generally, on 〈〈h1〉〉 × 〈〈h2〉〉 × ... × 〈〈hn〉〉, h1, h2, ..., hn, n ≥ 1), belongs to our collection of hybrid Metropolis-Hastings chains. Moreover, we shall show that the Gibbs sampler on 〈〈h〉〉^n satisfies all the conditions (c1)-(c4), excepting (c3).

As to the estimate of pn − π (pn and π are defined below), we have the next result.

Theorem 1.12 (see, e.g., [18]). Let P ∈ Sr be an aperiodic irreducible matrix. Consider a Markov chain with transition matrix P and limit probability distribution π. Let pn be the probability distribution of the chain at time n, ∀n ≥ 0. Then

‖pn − π‖₁ ≤ 2 ᾱ(P^n), ∀n ≥ 0

(P⁰ = I_r; by Theorem 1.8(iii),

2 ᾱ(P^n) ≤ 2 (ᾱ(P))^n, ∀n ≥ 0,

2 ᾱ(P^n) ≤ 2 (ᾱ(P^k))^{⌊n/k⌋}, ∀n ≥ 1, ∀k ∈ 〈n〉

(⌊x⌋ = max{b | b ∈ Z, b ≤ x}, ∀x ∈ R), etc.).

Proof. See, e.g., [18] (Theorem 1.8(ii) is used for the proof). □

2. EXACT SAMPLING

In this section, we consider a similarity relation. This has some interesting properties. Then we consider two methods for generating random variables exactly, in the finite case only. The first one, the alias method, is a special case of the second one. For each of these methods, we associate a Markov chain such that this chain can do what the method does. These associated chains are important for our unification. Finally, we associate a hybrid chain with a reference chain.

Definition 2.1. Let P, Q ∈ Ḡ_{∆1,∆2} ⊆ Nm,n. We say that P is similar to Q if P^{-+} = Q^{-+}.

Set P ∼ Q when P is similar to Q. Obviously, ∼ is an equivalence relation on Ḡ_{∆1,∆2}.

Theorem 2.2. Let P1, U1 ∈ Ḡ_{∆1,∆2} ⊆ N_{m1,m2} and P2, U2 ∈ Ḡ_{∆2,∆3} ⊆ N_{m2,m3}. Suppose that P1 ∼ U1 and P2 ∼ U2. Then

P1 P2 ∼ U1 U2.

Proof. By Theorem 1.5 we have P1 P2, U1 U2 ∈ Ḡ_{∆1,∆3} ⊆ N_{m1,m3}. By Theorem 1.5 and Definition 2.1 we have

(P1 P2)^{-+} = P1^{-+} P2^{-+} = U1^{-+} U2^{-+} = (U1 U2)^{-+}.

Therefore, P1 P2 ∼ U1 U2. □

Theorem 2.3. Let P1, U1 ∈ Ḡ_{∆1,∆2} ⊆ N_{m1,m2}, P2, U2 ∈ Ḡ_{∆2,∆3} ⊆ N_{m2,m3}, ..., Pn, Un ∈ Ḡ_{∆n,∆_{n+1}} ⊆ N_{mn,m_{n+1}}. Suppose that P1 ∼ U1, P2 ∼ U2, ..., Pn ∼ Un. Then

P1 P2 ... Pn ∼ U1 U2 ... Un.

If, moreover, ∆1 = (〈m1〉) and ∆_{n+1} = ({i})_{i∈〈m_{n+1}〉}, then

P1 P2 ... Pn = U1 U2 ... Un

(therefore, when ∆1 = (〈m1〉) and ∆_{n+1} = ({i})_{i∈〈m_{n+1}〉}, a product of n representatives, the first of an equivalence class included in Ḡ_{∆1,∆2}, the second of an equivalence class included in Ḡ_{∆2,∆3}, ..., the nth of an equivalence class included in Ḡ_{∆n,∆_{n+1}}, does not depend on the choice of representatives).

Proof. The first part follows by Theorem 2.2 and induction.

As to the second part, by Theorem 1.6, P1 P2 ... Pn and U1 U2 ... Un are stable matrices and, further,

(P1 P2 ... Pn)_{{i}} = P1^{-+} P2^{-+} ... Pn^{-+} = U1^{-+} U2^{-+} ... Un^{-+} = (U1 U2 ... Un)_{{i}}, ∀i ∈ 〈m1〉.

Therefore, P1 P2 ... Pn = U1 U2 ... Un. □


The reader is assumed to be acquainted with the first method below. Recall that for each of the two methods below we associate a Markov chain such that this chain can do what the method does.

1. The alias method (see, e.g., [4, pp. 107-113] and [15, pp. 25-27]). To illustrate our Markovian modeling here, we consider, for simplification, the example from [15, p. 25]. Following this example, we have a random variable X with the values 1, 2, 3, 4, 5 and probabilities

π1 = 0.41, π2 = 0.27, π3 = 0.07, π4 = 0.14, π5 = 0.11,

where πi = P(X = i), ∀i ∈ 〈5〉. The alias method leads, following the example from [15, p. 25] too, to the table (having 2 rows and 5 columns)

0.07 | 3    0.11 | 5    0.14 | 4    0.19 | 1    0.20 | 2
0.13 | 1    0.09 | 1    0.06 | 2    0.01 | 2    0    | 2

0.07, 0.13, etc. are probabilities, while 1, 2, 3, 4, 5 (printed in bold in the original table, here written after each probability) are values of X. In each column of the table, the sum of the probabilities is equal to 0.20. We associate the alias method for generating X (when this method is applied to the generation of X) with the Markov chain (X_n)_{n≥0} with state space

S = {(3, 1), (1, 1), (5, 2), (1, 2), (4, 3), (2, 3), (1, 4), (2, 4), (2, 5)}

(if (x1, x2) ∈ S, then x1 denotes a value of X while x2 denotes the column of the table in which the value x1 is; for x2 = 5 (column 5), we only consider the state (2, 5) because in column 5 the second probability is 0) and transition matrix P = P1 P2, where P1 is the 9 × 9 matrix whose rows, labeled (3, 1), (1, 1), (5, 2), (1, 2), (4, 3), (2, 3), (1, 4), (2, 4), (2, 5) from top to bottom, are all equal to

(0.20, 0, 0.20, 0, 0.20, 0, 0.20, 0, 0.20)

(the columns are labeled similarly, i.e., (3, 1), (1, 1), (5, 2), (1, 2), (4, 3), (2, 3), (1, 4), (2, 4), (2, 5) from left to right) and


P2 is the 9 × 9 block diagonal matrix (the rows and columns being labeled as above and grouped according to ∆2 below) with diagonal blocks

$$\begin{pmatrix} \frac{0.07}{0.20} & \frac{0.13}{0.20}\\[3pt] \frac{0.07}{0.20} & \frac{0.13}{0.20} \end{pmatrix}, \quad \begin{pmatrix} \frac{0.11}{0.20} & \frac{0.09}{0.20}\\[3pt] \frac{0.11}{0.20} & \frac{0.09}{0.20} \end{pmatrix}, \quad \begin{pmatrix} \frac{0.14}{0.20} & \frac{0.06}{0.20}\\[3pt] \frac{0.14}{0.20} & \frac{0.06}{0.20} \end{pmatrix}, \quad \begin{pmatrix} \frac{0.19}{0.20} & \frac{0.01}{0.20}\\[3pt] \frac{0.19}{0.20} & \frac{0.01}{0.20} \end{pmatrix}, \quad (1),$$

corresponding to the blocks {(3, 1), (1, 1)}, {(5, 2), (1, 2)}, {(4, 3), (2, 3)}, {(1, 4), (2, 4)}, {(2, 5)} of ∆2, respectively.

P1 ∈ G_{∆1,∆2}, P2 ∈ G_{∆2,∆3}, where

∆1 = (S),

∆2 = ({(3, 1), (1, 1)}, {(5, 2), (1, 2)}, {(4, 3), (2, 3)}, {(1, 4), (2, 4)}, {(2, 5)}),

∆3 = ({(x, y)})_{(x,y)∈S}.

By Theorem 1.6 it follows that P is a stable matrix and, more precisely,

P = e′ρ,

where

ρ = (0.07, 0.13, 0.11, 0.09, 0.14, 0.06, 0.19, 0.01, 0.20)

(see the table again). Recall (even if, here, P = e′ρ) that any 1-step transition of this chain is performed via P1, P2, i.e., doing two transitions: one using P1 and the other using P2.

Passing this Markov chain from an initial state, say, (3, 1) (the state at time 0) to a state at time 1 is done using, one after the other, the probability distributions (P1)_{{(3,1)}} (this is the first row of the matrix P1) and then, supposing that using this probability distribution the chain arrives at the state (i, j), (P2)_{{(i,j)}}. The alias method for generating X uses these probability distributions too, in the same order; the 0's do not count, they can be removed. E.g.,

(P1)_{{(3,1)}} = (0.20, 0, 0.20, 0, 0.20, 0, 0.20, 0, 0.20)

leads, removing the 0's, to

(0.20, 0.20, 0.20, 0.20, 0.20),

which is the probability distribution used by the alias method in its first step (when, obviously, this method is applied to X from here). Therefore, this chain can do what the alias method does (we need to run this chain just one step (or two steps due to P1 and P2) until time 1 inclusive).
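A short Python sketch (ours) of this sampling scheme: picking a column of the table uniformly corresponds to the P1 step of the associated chain, and picking a row of that column with probabilities proportional to its entries corresponds to the P2 step; the empirical frequencies should approach π = (0.41, 0.27, 0.07, 0.14, 0.11).

```python
import random
from collections import Counter

# The 2 x 5 table reconstructed above: each column holds two
# (probability, value) pairs whose probabilities sum to 0.20.
table = [
    [(0.07, 3), (0.13, 1)],
    [(0.11, 5), (0.09, 1)],
    [(0.14, 4), (0.06, 2)],
    [(0.19, 1), (0.01, 2)],
    [(0.20, 2), (0.00, 2)],
]

def sample_alias():
    """One draw of X: pick a column uniformly (the P1 step), then pick a
    row of that column with probabilities p / 0.20 (the P2 step)."""
    (p_top, v_top), (_, v_bottom) = random.choice(table)
    return v_top if random.random() < p_top / 0.20 else v_bottom

draws = Counter(sample_alias() for _ in range(100000))
print({v: draws[v] / 100000 for v in sorted(draws)})
```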

By Theorem 2.3 we can replace P1 with any matrix U1 similar to P1; obviously, it is more advantageous that each of the matrices

U1^{{(3,1),(1,1)}}, U1^{{(5,2),(1,2)}}, U1^{{(4,3),(2,3)}}, U1^{{(1,4),(2,4)}}, U1^{{(2,5)}}

has in each row just one positive entry. E.g., we can take

$$U_1 = \begin{pmatrix} 0.20 & 0 & 0 & 0.20 & 0.20 & 0 & 0.20 & 0 & 0.20\\ 0 & 0.20 & 0.20 & 0 & 0.20 & 0 & 0.20 & 0 & 0.20\\ 0.20 & 0 & 0.20 & 0 & 0.20 & 0 & 0.20 & 0 & 0.20\\ 0.20 & 0 & 0.20 & 0 & 0 & 0.20 & 0.20 & 0 & 0.20\\ 0 & 0.20 & 0.20 & 0 & 0.20 & 0 & 0 & 0.20 & 0.20\\ 0 & 0.20 & 0 & 0.20 & 0.20 & 0 & 0.20 & 0 & 0.20\\ 0.20 & 0 & 0.20 & 0 & 0.20 & 0 & 0.20 & 0 & 0.20\\ 0.20 & 0 & 0.20 & 0 & 0.20 & 0 & 0.20 & 0 & 0.20\\ 0.20 & 0 & 0 & 0.20 & 0.20 & 0 & 0 & 0.20 & 0.20 \end{pmatrix}$$

(the rows and columns being labeled (3, 1), (1, 1), (5, 2), (1, 2), (4, 3), (2, 3), (1, 4), (2, 4), (2, 5), as before)

and have

P = P1P2 = U1P2.

2. The reference method for our collection of hybrid chains (in particular, for the Gibbs sampler). We call it the reference method for short. Its name, as well as the name of the chain determined by it, called the reference chain (see below for this chain), were inspired by the reference point from physics and other fields.

The reference method is an iterative composition method; we include the degenerate case when no composition is done (this case corresponds to the case t = 1 of the reference method). Below we present the reference method.

Let X be a random variable with positive probability distribution π = (π1, π2, ..., πr) = (πi)_{i∈S}, where S = 〈r〉. Let ∆1, ∆2, ..., ∆_{t+1} ∈ Par(S) with ∆1 = (S) ≺ ∆2 ≺ ... ≺ ∆_{t+1} = ({i})_{i∈S}, where t ≥ 1. Set

$$a^{(l)}_{K,L} = \frac{\sum_{i\in L} \pi_i}{\sum_{i\in K} \pi_i}, \quad \forall l \in \langle t\rangle, \forall K \in \Delta_l, \forall L \in \Delta_{l+1} \text{ with } L \subseteq K.$$

Obviously,

a^{(l)}_{K,L} = P(X ∈ L | X ∈ K), ∀l ∈ 〈t〉, ∀K ∈ ∆l, ∀L ∈ ∆_{l+1} with L ⊆ K

(a^{(1)}_{S,L} = ∑_{i∈L} πi = P(X ∈ L), ∀L ∈ ∆2),


and (a^{(l)}_{K,L})_{L∈K∩∆_{l+1}} is a probability distribution on

K ∩ ∆_{l+1} = {K ∩ A | A ∈ ∆_{l+1}} = {B | B ∈ ∆_{l+1}, B ⊆ K}, ∀l ∈ 〈t〉, ∀K ∈ ∆l.

We generate the random variable X as follows. (See also discrete mixtures in, e.g., [4, p. 16], the decomposition method in, e.g., [4, p. 66], and the composition method in, e.g., [4, p. 66].)

Step 1. Generate L^{(1)} ∼ (a^{(1)}_{S,L})_{L∈S∩∆2}. Suppose that we obtained L^{(1)} = L1 (L1 ∈ S ∩ ∆2 = ∆2). Set K1 = L1.

Step 2. Generate L^{(2)} ∼ (a^{(2)}_{K1,L})_{L∈K1∩∆3}. Suppose that we obtained L^{(2)} = L2 (L2 ∈ K1 ∩ ∆3). Set K2 = L2.

...

Suppose that at Step t − 1 we obtained L^{(t−1)} = L_{t−1} (L_{t−1} ∈ K_{t−2} ∩ ∆t). Set K_{t−1} = L_{t−1}.

Step t. Generate L^{(t)} ∼ (a^{(t)}_{K_{t−1},L})_{L∈K_{t−1}∩∆_{t+1}}. Suppose that we obtained L^{(t)} = Lt (Lt ∈ K_{t−1} ∩ ∆_{t+1}).

Since ∆_{t+1} = ({i})_{i∈S}, it follows that ∃i ∈ S such that Lt = {i}. Set X = i; this value of X is generated according to its probability distribution π because, by the general multiplicative formula (see, e.g., [13, p. 26]), we have

$$P(X = i) = P(X \in \{i\}) = P(X \in L_t) = P(X \in L_1 \cap L_2 \cap ... \cap L_t) = a^{(1)}_{S,L_1}\, a^{(2)}_{L_1,L_2} \cdots a^{(t)}_{L_{t-1},L_t} = \pi_i.$$
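A minimal Python sketch (ours) of the reference method: at each step we draw, inside the current block K, a block L of the next partition with probability a^{(l)}_{K,L} = π(L)/π(K); the partitions and the distribution in the example are hypothetical.

```python
import random

def reference_method(pi, partitions):
    """pi: dict {state: probability}; partitions: list Delta_1, ..., Delta_{t+1},
    each a list of blocks (sets of states), Delta_1 = (S), Delta_{t+1} = singletons,
    each partition strictly finer than the previous one."""
    K = partitions[0][0]                                  # the whole state space S
    for next_partition in partitions[1:]:
        blocks = [L for L in next_partition if L <= K]    # blocks of Delta_{l+1} inside K
        weights = [sum(pi[i] for i in L) for L in blocks] # proportional to a^{(l)}_{K,L}
        K = random.choices(blocks, weights=weights)[0]
    (i,) = K                                              # the final block is a singleton {i}
    return i

# Example: r = 4, pi = (0.1, 0.2, 0.3, 0.4), t = 2.
pi = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}
partitions = [[{1, 2, 3, 4}], [{1, 2}, {3, 4}], [{1}, {2}, {3}, {4}]]
print(reference_method(pi, partitions))
```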

The reference method is very fast if, practically speaking, we know the quantities a^{(l)}_{K,L}, l ∈ 〈t〉, K ∈ ∆l, L ∈ ∆_{l+1} with L ⊆ K. Unfortunately, this does not happen in general if S is too large. But, fortunately, we can compute all or part of the quantities a^{(l)}_{K,L} when K and L are small; this is an important thing (see, e.g., in Section 3, the hybrid chains with P*).

To connect the reference method to our collection of hybrid chains, we associate the reference method (for generating a (finite) random variable (when this method is applied to the generation of a (finite) random variable)) with a (finite) Markov chain. To do this, first, recall that the partitions for the reference method are ∆1, ∆2, ..., ∆_{t+1} ∈ Par(S) with ∆1 = (S) ≺ ∆2 ≺ ... ≺ ∆_{t+1} = ({i})_{i∈S}, where t ≥ 1. Let R1, R2, ..., Rt ∈ Sr such that

(A1) R1 ∈ G_{∆1,∆2}, R2 ∈ G_{∆2,∆3}, ..., Rt ∈ G_{∆t,∆_{t+1}};

(A2) (Rl)^L_K = 0, ∀l ∈ 〈t〉 − {1}, ∀K, L ∈ ∆l, K ≠ L (this assumption implies that Rl is a block diagonal matrix and a ∆l-stable matrix on ∆l, ∀l ∈ 〈t〉 − {1});

(A3) (Rl)^U_K has in each row just one positive entry, ∀l ∈ 〈t〉, ∀K ∈ ∆l, ∀U ∈ ∆_{l+1}, U ⊆ K (this assumption implies that (Rl)^U_K is a row-allowable matrix, ∀l ∈ 〈t〉, ∀K ∈ ∆l, ∀U ∈ ∆_{l+1}, U ⊆ K).

(A2) and (A3) are similar to (C2) and (c4) from Section 1, respectively. Therefore, the matrices P1, P2, ..., Pt of the hybrid chain (obviously, we refer to our hybrid Metropolis-Hastings chain) and the matrices R1, R2, ..., Rt have some common things, respectively. This fact contributes to our unification (see Section 4).

Suppose that each positive entry of (the matrix) (Rl)^U_K is equal to a^{(l)}_{K,U} (by (A1) and (A3), all the positive entries of (Rl)^U_K are equal), ∀l ∈ 〈t〉, ∀K ∈ ∆l, ∀U ∈ ∆_{l+1}, U ⊆ K. Set R = R1 R2 ... Rt.

The following result is a main one both for the reference chain (this is defined below) and for (our) hybrid chains.

Theorem 2.4. Under the above assumptions the following statements hold.

(i) Rl R_{l+1} ... Rt is a block diagonal matrix, ∀l ∈ 〈t〉 − {1}, and a ∆l-stable matrix, ∀l ∈ 〈t〉.

(ii) π Rl R_{l+1} ... Rt = π, ∀l ∈ 〈t〉.

(iii) R is a stable matrix.

(iv) R = e′π.

Proof. (i) The first part follows from (A2). Now, we show the second part. By Theorem 1.5(i) and (A1), Rl R_{l+1} ... Rt ∈ G_{∆l,∆_{t+1}}. Consequently, Rl R_{l+1} ... Rt is a [∆l]-stable matrix, ∀l ∈ 〈t〉, because ∆_{t+1} = ({i})_{i∈S} (see Definition 1.3).

Case 1. l = 1. Since ∆1 = (S), it follows that R1 R2 ... Rt is a ∆1-stable (stable for short) matrix (see Definition 1.4).

Case 2. l ∈ 〈t〉 − {1}. By (A2) it follows that Rl R_{l+1} ... Rt is a ∆l-stable matrix.

(ii) Let j ∈ S. Then ∃! U^{(t)} ∈ ∆t such that {j} ⊆ U^{(t)} (∃! = there exists a unique). By (A1)-(A3) we have

(R_t^{-+})_{U^{(t)} {j}} > 0 and (R_t^{-+})_{V {j}} = 0, ∀V ∈ ∆t, V ≠ U^{(t)}.

Further, since ∆_{t−1} ≺ ∆t, ∃! U^{(t−1)} ∈ ∆_{t−1} such that U^{(t)} ⊆ U^{(t−1)}. By (A1)-(A3) we have

(R_{t−1}^{-+})_{U^{(t−1)} U^{(t)}} > 0 and (R_{t−1}^{-+})_{V U^{(t)}} = 0, ∀V ∈ ∆_{t−1}, V ≠ U^{(t−1)}.

Proceeding in this way, we find a sequence U^{(1)}, U^{(2)}, ..., U^{(t+1)} such that

U^{(1)} ∈ ∆1, U^{(2)} ∈ ∆2, ..., U^{(t+1)} ∈ ∆_{t+1},

{j} = U^{(t+1)} ⊆ U^{(t)} ⊆ ... ⊆ U^{(1)} = S,


and

(R_l^{-+})_{U^{(l)} U^{(l+1)}} > 0 and (R_l^{-+})_{V U^{(l+1)}} = 0, ∀l ∈ 〈t〉, ∀V ∈ ∆l, V ≠ U^{(l)}.

Let l ∈ 〈t〉. Let i ∈ U^{(l)} (j ∈ U^{(l)} as well). By (i) (the fact that Rl R_{l+1} ... Rt is a ∆l-stable matrix), (A1)-(A2), and Theorem 1.5 we have

$$(R_l R_{l+1} \cdots R_t)_{ij} = (R_l R_{l+1} \cdots R_t)^{-+}_{U^{(l)}\{j\}} = \left(R_l^{-+} R_{l+1}^{-+} \cdots R_t^{-+}\right)_{U^{(l)}\{j\}} = \left(R_l^{-+}\right)_{U^{(l)} U^{(l+1)}} \left(R_{l+1}^{-+}\right)_{U^{(l+1)} U^{(l+2)}} \cdots \left(R_t^{-+}\right)_{U^{(t)}\{j\}} = a^{(l)}_{U^{(l)},U^{(l+1)}}\, a^{(l+1)}_{U^{(l+1)},U^{(l+2)}} \cdots a^{(t)}_{U^{(t)},\{j\}} = \frac{\pi_j}{\sum_{k\in U^{(l)}} \pi_k}$$

(it was to be expected, see (i), that this ratio does not depend on i ∈ U^{(l)}). Consequently,

π Rl R_{l+1} ... Rt = π.

(iii) This follows by (i) or (iv).

(iv) By the proof of (ii) we have R_{ij} = πj, ∀i, j ∈ S. Therefore, R = e′π. □

We associate the reference method for generating X with the Markov chain (this depends on X too) with state space S = 〈r〉 and transition matrix R = R1 R2 ... Rt. Recall (even if, here, by Theorem 2.4(iv), R = e′π) that any 1-step transition of this chain is performed via R1, R2, ..., Rt, i.e., doing t transitions: one using R1, one using R2, ..., one using Rt.

We call the above Markov chain the reference (Markov) chain. This is another example of a chain with finite convergence time (see, e.g., also [17] for other examples of chains with finite convergence time). The best case is when we know the quantities a^{(l)}_{K,L}, l ∈ 〈t〉, K ∈ ∆l, L ∈ ∆_{l+1}, L ⊆ K; we can always know all or part of the quantities a^{(l)}_{K,L} when K and L are small, a happy case!

Passing the reference chain from an initial state, say, 1 (the state at time 0) to a state at time 1 is done using, one after the other, the probability distributions (R1)_{{1}} (this is the first row of the matrix R1), (R2)_{{i2}}, ..., (Rt)_{{it}}, where il = the state the chain arrives at using (R_{l−1})_{{i_{l−1}}}, ∀l ∈ 〈t〉 − {1}, setting i1 = 1. The reference method for generating X uses these probability distributions too, in the same order; the 0's do not count, they can be removed. E.g., if

(R1)_{{1}} = (a^{(1)}_{S,K^{(2)}_1}, 0, 0, a^{(1)}_{S,K^{(2)}_2}),

where S = 〈4〉, K^{(2)}_1 = 〈2〉, K^{(2)}_2 = {3, 4}, then, removing the 0's, we obtain

(a^{(1)}_{S,K^{(2)}_1}, a^{(1)}_{S,K^{(2)}_2}),


which is the probability distribution used by the reference method in its first step, ∆2 being, here, equal to (K^{(2)}_1, K^{(2)}_2). Therefore, the reference chain can do what the reference method does (we need to run this chain just one step (or t steps due to R1, R2, ..., Rt) until time 1 inclusive).

To illustrate the reference chain, we consider a random variable X with probability distribution π = (π1, π2, ..., π8). Taking the partitions

∆1 = (〈8〉), ∆2 = ({1, 2, 3, 4}, {5, 6, 7, 8}), ∆3 = ({1, 2}, {3, 4}, {5, 6}, {7, 8}), ∆4 = ({i})_{i∈〈8〉},

a reference chain is the Markov chain with state space S = 〈8〉 and transition matrix R = R1 R2 R3, where

$$R_1 = \begin{pmatrix} a^{(1)}_{S,K^{(2)}_1} & 0 & 0 & 0 & a^{(1)}_{S,K^{(2)}_2} & 0 & 0 & 0\\ 0 & a^{(1)}_{S,K^{(2)}_1} & 0 & 0 & a^{(1)}_{S,K^{(2)}_2} & 0 & 0 & 0\\ 0 & 0 & a^{(1)}_{S,K^{(2)}_1} & 0 & 0 & a^{(1)}_{S,K^{(2)}_2} & 0 & 0\\ 0 & 0 & 0 & a^{(1)}_{S,K^{(2)}_1} & 0 & 0 & a^{(1)}_{S,K^{(2)}_2} & 0\\ a^{(1)}_{S,K^{(2)}_1} & 0 & 0 & 0 & 0 & a^{(1)}_{S,K^{(2)}_2} & 0 & 0\\ 0 & 0 & a^{(1)}_{S,K^{(2)}_1} & 0 & 0 & 0 & a^{(1)}_{S,K^{(2)}_2} & 0\\ 0 & a^{(1)}_{S,K^{(2)}_1} & 0 & 0 & 0 & 0 & 0 & a^{(1)}_{S,K^{(2)}_2}\\ 0 & a^{(1)}_{S,K^{(2)}_1} & 0 & 0 & 0 & 0 & a^{(1)}_{S,K^{(2)}_2} & 0 \end{pmatrix},$$

K^{(2)}_1 = {1, 2, 3, 4}, K^{(2)}_2 = {5, 6, 7, 8},

a^{(1)}_{S,K^{(2)}_1} = π1 + π2 + π3 + π4, a^{(1)}_{S,K^{(2)}_2} = π5 + π6 + π7 + π8

(R1 ∈ G_{∆1,∆2}),

$$R_2 = \begin{pmatrix} R^{(2)}_1 & \\ & R^{(2)}_2 \end{pmatrix},$$

$$R^{(2)}_1 = \begin{pmatrix} 0 & a^{(2)}_{K^{(2)}_1,K^{(3)}_1} & 0 & a^{(2)}_{K^{(2)}_1,K^{(3)}_2}\\ a^{(2)}_{K^{(2)}_1,K^{(3)}_1} & 0 & a^{(2)}_{K^{(2)}_1,K^{(3)}_2} & 0\\ a^{(2)}_{K^{(2)}_1,K^{(3)}_1} & 0 & 0 & a^{(2)}_{K^{(2)}_1,K^{(3)}_2}\\ 0 & a^{(2)}_{K^{(2)}_1,K^{(3)}_1} & 0 & a^{(2)}_{K^{(2)}_1,K^{(3)}_2} \end{pmatrix}, \qquad R^{(2)}_2 = \begin{pmatrix} a^{(2)}_{K^{(2)}_2,K^{(3)}_3} & 0 & a^{(2)}_{K^{(2)}_2,K^{(3)}_4} & 0\\ a^{(2)}_{K^{(2)}_2,K^{(3)}_3} & 0 & 0 & a^{(2)}_{K^{(2)}_2,K^{(3)}_4}\\ 0 & a^{(2)}_{K^{(2)}_2,K^{(3)}_3} & a^{(2)}_{K^{(2)}_2,K^{(3)}_4} & 0\\ a^{(2)}_{K^{(2)}_2,K^{(3)}_3} & 0 & 0 & a^{(2)}_{K^{(2)}_2,K^{(3)}_4} \end{pmatrix},$$

K^{(3)}_1 = {1, 2}, K^{(3)}_2 = {3, 4}, K^{(3)}_3 = {5, 6}, K^{(3)}_4 = {7, 8},

$$a^{(2)}_{K^{(2)}_1,K^{(3)}_1} = \frac{\pi_1+\pi_2}{\pi_1+\pi_2+\pi_3+\pi_4}, \quad a^{(2)}_{K^{(2)}_1,K^{(3)}_2} = \frac{\pi_3+\pi_4}{\pi_1+\pi_2+\pi_3+\pi_4},$$

$$a^{(2)}_{K^{(2)}_2,K^{(3)}_3} = \frac{\pi_5+\pi_6}{\pi_5+\pi_6+\pi_7+\pi_8}, \quad a^{(2)}_{K^{(2)}_2,K^{(3)}_4} = \frac{\pi_7+\pi_8}{\pi_5+\pi_6+\pi_7+\pi_8}$$

(R2 ∈ G_{∆2,∆3} (moreover, it is a ∆2-stable matrix on ∆2, a ∆2-stable matrix on ∆3, and a block diagonal matrix)), and

$$R_3 = \begin{pmatrix} R^{(3)}_1 & & & \\ & R^{(3)}_2 & & \\ & & R^{(3)}_3 & \\ & & & R^{(3)}_4 \end{pmatrix}, \qquad R^{(3)}_w = \begin{pmatrix} a^{(3)}_{K^{(3)}_w,K^{(4)}_{2w-1}} & a^{(3)}_{K^{(3)}_w,K^{(4)}_{2w}}\\ a^{(3)}_{K^{(3)}_w,K^{(4)}_{2w-1}} & a^{(3)}_{K^{(3)}_w,K^{(4)}_{2w}} \end{pmatrix}, \ \forall w \in \langle 4\rangle,$$

K^{(4)}_i = {i}, ∀i ∈ 〈8〉,

$$a^{(3)}_{K^{(3)}_w,K^{(4)}_{2w-1}} = \frac{\pi_{2w-1}}{\pi_{2w-1}+\pi_{2w}}, \quad a^{(3)}_{K^{(3)}_w,K^{(4)}_{2w}} = \frac{\pi_{2w}}{\pi_{2w-1}+\pi_{2w}}, \ \forall w \in \langle 4\rangle$$

(R3 ∈ G_{∆3,∆4} (moreover, it is a ∆3-stable matrix on ∆3, a ∆3-stable matrix (because ∆4 = ({i})_{i∈〈8〉}, see Definition 1.4), and a block diagonal matrix)). By Theorem 2.4(iv) or direct computation we have R = e′π.

Warning! In the above example, R1, R2, R3 are representatives of certain equivalence classes: Rl, where l ∈ 〈3〉, is a representative of the equivalence class determined by the quantities a^{(l)}_{K,L}, K ∈ ∆l, L ∈ ∆_{l+1} (the number of elements of this class can easily be determined; e.g., R1 belongs to an equivalence class with cardinal equal to 4^16 because R_1^{K^{(2)}_1} and R_1^{K^{(2)}_2} are 8 × 4 matrices, ...). Each triple (R1, R2, R3) of representatives determines a reference chain, all these chains having the product R1 R2 R3 equal to e′π (see Theorems 2.3 and 2.4(iv)).
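The following Python sketch (ours) builds one admissible choice of representatives R1, R2, R3 for the 8-state example above (placing the positive entry of each row block in the column given by the smallest element of the corresponding block, which is just one of the many choices allowed by the Warning) and verifies Theorem 2.4(iv) numerically for a hypothetical π.

```python
import numpy as np

pi = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
pi /= pi.sum()                                   # any positive distribution works

partitions = [
    [list(range(8))],                            # Delta_1 = (S)
    [[0, 1, 2, 3], [4, 5, 6, 7]],                # Delta_2
    [[0, 1], [2, 3], [4, 5], [6, 7]],            # Delta_3
    [[i] for i in range(8)],                     # Delta_4 (singletons)
]

def representative(delta_l, delta_next):
    """One representative R_l: every row of the block (K, L) carries the whole
    conditional mass a^{(l)}_{K,L} = pi(L)/pi(K) in the column min(L)."""
    R = np.zeros((8, 8))
    for K in delta_l:
        piK = pi[K].sum()
        for L in (L for L in delta_next if set(L) <= set(K)):
            R[np.ix_(K, [min(L)])] = pi[L].sum() / piK
    return R

R1, R2, R3 = (representative(partitions[l], partitions[l + 1]) for l in range(3))
R = R1 @ R2 @ R3
print(np.allclose(R, np.outer(np.ones(8), pi)))  # Theorem 2.4(iv): R = e' pi
```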


Now, it is easy to see that the chain associated with the alias method is a special case of the reference chain. Therefore (this was to be expected), the alias method is a special case of the reference method.

Another interesting special case of the reference method (and of the reference chain) is the method of uniform generation of the random permutations of order n from, e.g., Example 2.11 in [17]. In Example 2.11 from [17], a Markov chain which can do what the swapping method does is presented and analyzed (for the swapping method, see, e.g., [4, pp. 645-646]). When the probability distribution of interest, π, is uniform, we have a^{(l)}_{K,L} = |L| / |K|, ∀l ∈ 〈t〉, ∀K ∈ ∆l, ∀L ∈ ∆_{l+1}, L ⊆ K; another happy case! We here supposed that there are t + 1 partitions; in Example 2.11 from [17], t = n − 1.

Finally, for the next sections, we need to associate a hybrid Metropolis-Hastings chain with a reference chain. Below we state the terms of this association.

Remark 2.5. This association makes sense (warning!) if, obviously, both chains are defined by means of the same state space S, the same probability distribution (of interest) π on S, and the same sequence of partitions ∆1, ∆2, ..., ∆_{t+1} on S with ∆1 = (S) ≺ ∆2 ≺ ... ≺ ∆_{t+1} = ({i})_{i∈S}, where t ≥ 1. We shall use expressions such as "the hybrid Metropolis-Hastings chain and its reference chain", "the Gibbs sampler and its reference chain", "the reference chain of a hybrid Metropolis-Hastings chain", etc., meaning that both chains from each expression are associated in this manner and the reference chain (with transition matrix R = R1 R2 ... Rt) from each expression has the matrices R1, R2, ..., Rt specified or not; in the latter case, R1, R2, ..., Rt are only from the equivalence classes R̂1, R̂2, ..., R̂t, respectively (R̂l = the equivalence class of Rl, ∀l ∈ 〈t〉), so, the reader, in this latter case, has complete freedom to choose the matrices R1, R2, ..., Rt as he/she wishes.

The association from Remark 2.5 is good for our unification and, as a result of this, is good for comparisons and improvements (see the next sections).

3. EXACT SAMPLING USING HYBRID CHAINS

In this section, first, we show that the Gibbs sampler on 〈〈h〉〉^n (i.e., on {0, 1, ..., h}^n), h, n ≥ 1, belongs to our collection of hybrid Metropolis-Hastings chains from [18]. Second, we give some interesting classes of probability distributions; they are interesting because: 1) supposing that the generation time is not limited, we can generate any random variable with probability distribution belonging to the union of these classes exactly (not approximately) by the Gibbs sampler (sometimes by the optimal hybrid Metropolis chain) or by a special Gibbs sampler with grouped coordinates in just one step; 2) sometimes, the Gibbs sampler or a special Gibbs sampler with grouped coordinates is identical with its reference chain. An application to random variables with geometric distribution is given. Third, results on the hybrid chains or reference chains are given.

We begin with the definition of the Gibbs sampler; we refer to the cyclic Gibbs sampler. Below we consider the (cyclic) Gibbs sampler on 〈〈h〉〉^n, h, n ≥ 1 (more generally, we can consider the state space 〈〈h1〉〉 × 〈〈h2〉〉 × ... × 〈〈hn〉〉, h1, h2, ..., hn, n ≥ 1), see [10]; see, e.g., also [1, 2-3, 6], [7, pp. 236-241], [8], [9, Chapter 5], [11, pp. 12 and 51-54], [15, pp. 69-81], and [21, Chapters 5 and 7].

Recall that the entry (i, j) of a matrix Z is denoted Z_ij or, if confusion can arise, Z_{i→j}.

We use the convention that an empty term vanishes.

Let x = (x1, x2, ..., xn) ∈ S = 〈〈h〉〉^n, h, n ≥ 1. Set

x[k | l] = (x1, x2, ..., x_{l−1}, k, x_{l+1}, ..., xn), ∀k ∈ 〈〈h〉〉, ∀l ∈ 〈n〉

(consequently, x[k | l] ∈ S, ∀k ∈ 〈〈h〉〉, ∀l ∈ 〈n〉). Let π be a positive probability distribution on S = 〈〈h〉〉^n (h, n ≥ 1). Set the matrices Pl, l ∈ 〈n〉, where

$$(P_l)_{xy} = \begin{cases} 0 & \text{if } y \ne x[k\,|\,l],\ \forall k \in \langle\langle h\rangle\rangle,\\[6pt] \dfrac{\pi_{x[k|l]}}{\sum_{j\in\langle\langle h\rangle\rangle} \pi_{x[j|l]}} & \text{if } y = x[k\,|\,l] \text{ for some } k \in \langle\langle h\rangle\rangle, \end{cases}$$

∀l ∈ 〈n〉, ∀x, y ∈ S. Set P = P1 P2 ... Pn.

Consider the Markov chain with state space S = 〈〈h〉〉^n (h, n ≥ 1) and transition matrix P above. This chain is called the cyclic Gibbs sampler, the Gibbs sampler for short.
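For concreteness, here is a small Python sketch (ours) that builds the matrices P1, ..., Pn of the cyclic Gibbs sampler on 〈〈h〉〉^n directly from the definition above and checks the invariance πP = π from Theorem 1.11 on a hypothetical target distribution.

```python
import itertools
import numpy as np

def gibbs_factors(pi, h, n):
    """Build P_1, ..., P_n of the cyclic Gibbs sampler on <<h>>^n from a
    positive target pi (a dict keyed by n-tuples over {0, ..., h})."""
    states = list(itertools.product(range(h + 1), repeat=n))   # lexicographic order
    index = {x: k for k, x in enumerate(states)}
    factors = []
    for l in range(n):
        P = np.zeros((len(states), len(states)))
        for x in states:
            denom = sum(pi[x[:l] + (j,) + x[l + 1:]] for j in range(h + 1))
            for k in range(h + 1):
                y = x[:l] + (k,) + x[l + 1:]                    # y = x[k | l+1]
                P[index[x], index[y]] = pi[y] / denom
        factors.append(P)
    return states, factors

# Tiny illustration on <<1>>^2 with a hypothetical target distribution.
pi = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}
states, (P1, P2) = gibbs_factors(pi, h=1, n=2)
P = P1 @ P2
pi_vec = np.array([pi[x] for x in states])
print(np.allclose(pi_vec @ P, pi_vec))                          # pi P = pi
```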

For labeling the rows and columns of P1, P2, ..., Pn and other things, we consider the states of S = 〈〈h〉〉^n in lexicographic order, i.e., in the order (0, 0, ..., 0), (0, 0, ..., 0, 1), ..., (0, 0, ..., 0, h), (0, 0, ..., 0, 1, 0), (0, 0, ..., 0, 1, 1), ..., (0, 0, ..., 0, 1, h), ..., (0, 0, ..., 0, h, 0), (0, 0, ..., 0, h, 1), ..., (0, 0, ..., 0, h, h), ..., (h, h, ..., h, 0), (h, h, ..., h, 1), ..., (h, h, ..., h).

Further, we show that the Gibbs sampler on 〈〈h〉〉^n, h, n ≥ 1 (more generally, on 〈〈h1〉〉 × 〈〈h2〉〉 × ... × 〈〈hn〉〉, h1, h2, ..., hn, n ≥ 1), belongs to our collection of hybrid Metropolis-Hastings chains from [18] and satisfies, moreover, the conditions (c1)-(c4), excepting (c3). More precisely, we show that the Gibbs sampler on 〈〈h〉〉^n satisfies all the conditions (C1)-(C3) (basic conditions) and (c1)-(c4) (special conditions), excepting (c3), and the equations from the definition of the hybrid Metropolis-Hastings chain. To see this, following the second special case from Section 3 in [18] (there, a more general framework was considered, namely, when the coordinates are grouped (blocked) into groups (blocks) of size v), set

K_{(x1,x2,...,xl)} = {(y1, y2, ..., yn) | (y1, y2, ..., yn) ∈ S and yi = xi, ∀i ∈ 〈l〉},

∀l ∈ 〈n〉, ∀x1, x2, ..., xl ∈ 〈〈h〉〉 (obviously, K_{(x1,x2,...,xn)} = {(x1, x2, ..., xn)}),

∆1 = (S),

and

∆_{l+1} = (K_{(x1,x2,...,xl)})_{x1,x2,...,xl∈〈〈h〉〉}, ∀l ∈ 〈n〉.

Obviously, ∆1 = (S) ≺ ∆2 ≺ ... ≺ ∆_{n+1} = ({x})_{x∈S}. Note also that the sets S, K_{(x1,x2,...,xl)}, x1, x2, ..., xl ∈ 〈〈h〉〉, l ∈ 〈n〉, determine, by the inclusion relation, a tree which we call the tree of inclusions. For simplification, below we give the tree of inclusions for S = 〈〈1〉〉^n (i.e., for S = {0, 1}^n).

[Tree of inclusions for S = 〈〈1〉〉^n: the root is S; its children are K_(0) and K_(1); the children of K_(0) are K_(0,0) and K_(0,1), those of K_(1) are K_(1,0) and K_(1,1); and so on, down to the leaves K_(0,0,...,0), K_(0,0,...,0,1), ..., K_(1,1,...,1,0), K_(1,1,...,1).]

Following, e.g., [7, p. 242], we define the matrices Q1, Q2, ..., Qn as follows:

Ql = Pl, ∀l ∈ 〈n〉.

It is easy to prove that the matrices Ql, l ∈ 〈n〉, satisfy the basic conditions (C1)-(C3) from Section 1. Further, it is easy to prove that the matrices Pl and Ql satisfy the equations

$$(P_l)_{xy} = \begin{cases} 0 & \text{if } y \ne x \text{ and } (Q_l)_{xy} = 0,\\[4pt] (Q_l)_{xy} \min\left(1, \dfrac{\pi_y (Q_l)_{yx}}{\pi_x (Q_l)_{xy}}\right) & \text{if } y \ne x \text{ and } (Q_l)_{xy} > 0,\\[4pt] 1 - \sum_{z\in S, z\ne x} (P_l)_{xz} & \text{if } y = x, \end{cases}$$

∀l ∈ 〈n〉, ∀x, y ∈ S. (Further, it follows that the conclusion of Theorem 1.11 holds, in particular, for P = P1 P2 ... Pn.) Therefore, the Gibbs sampler on 〈〈h〉〉^n belongs to our collection of hybrid Metropolis-Hastings chains from [18]. Now, it is easy to prove that the Gibbs sampler on 〈〈h〉〉^n satisfies, moreover, the special conditions (c1)-(c4), excepting (c3). Finally, we have the following result.

Theorem 3.1. The Gibbs sampler on 〈〈h〉〉^n belongs to our collection of hybrid Metropolis-Hastings chains. Moreover, this chain satisfies the conditions (c1)-(c4), excepting (c3).

Proof. See above. □

Based on Theorem 3.1, it is easy now to show that the chain on 〈〈h〉〉^n, h, n ≥ 1 (more generally, on 〈〈h1〉〉 × 〈〈h2〉〉 × ... × 〈〈hn〉〉, h1, h2, ..., hn, n ≥ 1), defined below is, according to our definition, a hybrid Metropolis-Hastings chain which satisfies, moreover, all or part of the conditions (c1)-(c4). This chain on 〈〈h〉〉^n is a generalization of the (cyclic) Gibbs sampler on 〈〈h〉〉^n as follows: the matrices Q1, Q2, ..., Qn of the Gibbs sampler (see before Theorem 3.1) are, more generally, replaced with matrices Q1, Q2, ..., Qn (we used the same notation for these) such that Q̄1, Q̄2, ..., Q̄n of the former matrices are identical with Q̄1, Q̄2, ..., Q̄n of the latter matrices, respectively; P = P1 P2 ... Pn is the transition matrix of this chain, where, using the Metropolis-Hastings rule, P1, P2, ..., Pn are defined by means of the more general matrices Q1, Q2, ..., Qn, respectively.

Since we now know the structure of the matrices P1, P2, ..., Pn corresponding to the update of coordinates 1, 2, ..., n, respectively, we could study other types of Gibbs samplers on 〈〈h〉〉^n, h, n ≥ 1 (the random Gibbs sampler, etc., see, e.g., [1], [8], [15, pp. 71-73], and [19]), and, more generally, other types of chains on 〈〈h〉〉^n, h, n ≥ 1 (a generalization of the random Gibbs sampler, etc.), derived from the generalization of the Gibbs sampler given in the above paragraph.

Recall that R+ = {x | x ∈ R and x > 0}. Recall that the states of S = 〈〈h〉〉^n are considered in lexicographic order.

Theorem 3.2. Let S = 〈〈h〉〉^n, h, n ≥ 1. Let w = (h + 1)^t − 1, 0 ≤ t ≤ n. Consider on S the probability distribution

π = (c0, c0 a, ..., c0 a^w, c1, c1 a, ..., c1 a^w, ..., ch, ch a, ..., ch a^w,
c0, c0 a, ..., c0 a^w, c1, c1 a, ..., c1 a^w, ..., ch, ch a, ..., ch a^w, ...,
c0, c0 a, ..., c0 a^w, c1, c1 a, ..., c1 a^w, ..., ch, ch a, ..., ch a^w)

(the sequence c0, c0 a, ..., c0 a^w, c1, c1 a, ..., c1 a^w, ..., ch, ch a, ..., ch a^w appears (h + 1)^{n−t−1} times if 0 ≤ t < n, and only c0, c0 a, ..., c0 a^w appears if t = n), where c0, c1, ..., ch, a ∈ R+. Then, for the Gibbs sampler and, when h = 1, for the optimal hybrid Metropolis chain with the matrices Q1, Q2, ..., Qn such that Q̄1, Q̄2, ..., Q̄n are identical with Q̄1, Q̄2, ..., Q̄n of the Gibbs sampler on S = 〈〈1〉〉^n, respectively, we have, using the same notation, P = P1 P2 ... Pn, for the transition matrices of these two chains,

P = e′π

(therefore, the stationarity of these chains is attained at time 1).

Proof. Since (see the proof of Theorem 3.1)

∆1 = (S),

∆2 = (K_(0), K_(1), ..., K_(h)),

∆3 = (K_(0,0), K_(0,1), ..., K_(0,h), K_(1,0), K_(1,1), ..., K_(1,h), ..., K_(h,0), K_(h,1), ..., K_(h,h)),

...

∆_{n+1} = ({x})_{x∈S},

we have

|S| = (h + 1)^n

(|S| is the cardinal of S),

|K_(0)| = |K_(1)| = ... = |K_(h)| = (h + 1)^{n−1},

|K_(0,0)| = |K_(0,1)| = ... = |K_(0,h)| = |K_(1,0)| = |K_(1,1)| = ... = |K_(1,h)| = ... = |K_(h,0)| = |K_(h,1)| = ... = |K_(h,h)| = (h + 1)^{n−2},

...

|{x}| = (h + 1)^0 = 1, ∀x ∈ S.

Let l ∈ 〈n〉. Let K ∈ ∆l and L ∈ ∆_{l+1}, L ⊆ K. Then ∃v1, v2, ..., vl ∈ 〈〈h〉〉 such that

$$K = \begin{cases} S & \text{if } l = 1,\\ K_{(v_1,v_2,...,v_{l-1})} & \text{if } l \ne 1, \end{cases}$$

and

L = K_{(v1,v2,...,vl)}.

Let x = (x1, x2, ..., xn) ∈ K. It follows that x1 = v1, x2 = v2, ..., x_{l−1} = v_{l−1} (these equations vanish when l = 1) and, obviously,

x[vl | l] = (x1, x2, ..., x_{l−1}, vl, x_{l+1}, ..., xn) = (v1, v2, ..., v_{l−1}, vl, x_{l+1}, ..., xn) ∈ L.

Note also that

|K| = (h + 1)^{n−l+1}, |L| = (h + 1)^{n−l}, and (Pl)_{x x[vl|l]} > 0.

(The reader, if he/she wishes, can use the notation (Pl)_{x→x[vl|l]} instead of (Pl)_{x x[vl|l]}.)


First, we consider the Gibbs sampler. To compute the probabilities (Pl)_{x x[vl|l]}, we consider three cases: n − l < t; n − l = t; n − l > t.

The case n − l < t is a bit more difficult. In this case, the probabilities πx corresponding to the elements x ∈ K are, keeping the order,

ci a^v, ci a^{v+1}, ..., ci a^{v+(h+1)^{n−l+1}−1}

for some i ∈ 〈〈h〉〉 and v ∈ 〈〈w − (h + 1)^{n−l+1} + 1〉〉, and those corresponding to the elements x ∈ L are, keeping the order,

ci a^{v+vl(h+1)^{n−l}}, ci a^{v+vl(h+1)^{n−l}+1}, ..., ci a^{v+(vl+1)(h+1)^{n−l}−1}.

We have

$$\frac{c_i a^{v+v_l(h+1)^{n-l}+z}}{\sum_{s=0}^{h} c_i a^{v+s(h+1)^{n-l}+z}} = \frac{a^{v_l(h+1)^{n-l}}}{\sum_{s=0}^{h} a^{s(h+1)^{n-l}}}, \quad \forall z \in \langle\langle (h+1)^{n-l} - 1\rangle\rangle.$$

It follows that the first ratio does not depend on z, ∀z ∈ 〈〈(h + 1)^{n−l} − 1〉〉. Moreover, it does not depend on ci and v.

The other two cases are obvious.

We now have

$$(P_l)_{x\,x[v_l|l]} = \begin{cases} \dfrac{a^{v_l(h+1)^b}}{\sum_{s=0}^{h} a^{s(h+1)^b}} & \text{if } n - l = b \text{ for some } b \in \langle\langle t-1\rangle\rangle,\\[8pt] \dfrac{c_{v_l}}{\sum_{i=0}^{h} c_i} & \text{if } n - l = t,\\[8pt] \dfrac{1}{h+1} & \text{if } n - l > t. \end{cases}$$

Consequently, P1 ∈ G_{∆1,∆2}, P2 ∈ G_{∆2,∆3}, ..., Pn ∈ G_{∆n,∆_{n+1}}. By Theorems 1.6, 1.11, and 3.1,

P = e′π.

Second, we consider the optimal hybrid Metropolis chain when h = 1. In this case, w = 2^t − 1, 0 ≤ t ≤ n, and

π = (c0, c0 a, ..., c0 a^w, c1, c1 a, ..., c1 a^w, c0, c0 a, ..., c0 a^w, c1, c1 a, ..., c1 a^w, ..., c0, c0 a, ..., c0 a^w, c1, c1 a, ..., c1 a^w).

As to the positions of the positive entries of Q1, Q2, ..., Qn, we have, by hypothesis, Q1, Q2, ..., Qn such that Q̄1, Q̄2, ..., Q̄n are identical with Q̄1, Q̄2, ..., Q̄n of the Gibbs sampler on S = 〈〈1〉〉^n (see Section 1 and Theorem 3.1; see also the second special case from Section 3 in [18]), respectively. It follows that

$$f_l = \begin{cases} \min\left(a^{-2^b}, a^{2^b}\right) & \text{if } n - l = b \text{ for some } b \in \langle\langle t-1\rangle\rangle,\\[4pt] \min\left(\dfrac{c_0}{c_1}, \dfrac{c_1}{c_0}\right) & \text{if } n - l = t,\\[4pt] 1 & \text{if } n - l > t, \end{cases}$$

$$x_l = \frac{1}{f_l + r_l - 1} = \frac{1}{f_l + 2 - 1} = \frac{1}{f_l + 1} = \begin{cases} \dfrac{1}{\min\left(a^{-2^b}, a^{2^b}\right) + 1} & \text{if } n - l = b \text{ for some } b \in \langle\langle t-1\rangle\rangle,\\[8pt] \dfrac{1}{\min\left(\frac{c_0}{c_1}, \frac{c_1}{c_0}\right) + 1} & \text{if } n - l = t,\\[8pt] \dfrac{1}{2} & \text{if } n - l > t, \end{cases}$$

and (cases for c0 and c1: c0 ≤ c1, c0 > c1 (c0, c1 ∈ R+); cases for a: a ≤ 1, a > 1 (a ∈ R+))

$$(P_l)_{x\,x[v_l|l]} = \begin{cases} \dfrac{1 - v_l + v_l a^{2^b}}{1 + a^{2^b}} & \text{if } n - l = b \text{ for some } b \in \langle\langle t-1\rangle\rangle,\\[8pt] \dfrac{c_{v_l}}{c_0 + c_1} & \text{if } n - l = t,\\[8pt] \dfrac{1}{2} & \text{if } n - l > t. \end{cases}$$

Note that 1 − vl + vl a^{2^b} = a^{vl 2^b} because vl ∈ 〈〈1〉〉. It follows that these transition probabilities are identical with those for the Gibbs sampler when h = 1. This is an interesting thing. See also Theorem 3.6 (the optimal hybrid Metropolis chain is not considered there because of this thing).

Proceeding as in the Gibbs sampler case, it follows that

P = e′π. □

We call the probability distribution from Theorem 3.2 the wavy probability distribution (of first type).
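A quick numerical illustration of Theorem 3.2 (ours, with hypothetical constants): for a wavy distribution on 〈〈2〉〉² with t = 1, the product of the Gibbs coordinate-update matrices equals e′π, i.e., stationarity is attained at time 1.

```python
import itertools
import numpy as np

# Wavy distribution (first type) on <<h>>^n with h = 2, n = 2, t = 1:
# w = (h+1)^t - 1 = 2, pi proportional to (c0, c0 a, c0 a^2, c1, ..., c2 a^2).
h, n, a = 2, 2, 1.7
c = [0.2, 0.5, 1.3]                                       # hypothetical constants
states = list(itertools.product(range(h + 1), repeat=n))  # lexicographic order
weights = np.array([c[x[0]] * a ** x[1] for x in states])
pi = weights / weights.sum()
idx = {x: k for k, x in enumerate(states)}

def gibbs_factor(l):
    """Coordinate-update matrix P_{l+1} of the cyclic Gibbs sampler."""
    P = np.zeros((len(states), len(states)))
    for x in states:
        denom = sum(pi[idx[x[:l] + (j,) + x[l + 1:]]] for j in range(h + 1))
        for k in range(h + 1):
            y = x[:l] + (k,) + x[l + 1:]
            P[idx[x], idx[y]] = pi[idx[y]] / denom
    return P

P = gibbs_factor(0) @ gibbs_factor(1)
print(np.allclose(P, np.outer(np.ones(len(states)), pi)))   # P = e' pi
```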

Remark 3.3. As to the class of wavy distributions from Theorem 3.2, the Gibbs sampler is better than the optimal hybrid Metropolis chain when the latter chain has the matrices Q1, Q2, ..., Qn such that Q̄1, Q̄2, ..., Q̄n are identical with Q̄1, Q̄2, ..., Q̄n of the Gibbs sampler on S = 〈〈h〉〉^n, respectively. For this, see Theorem 3.2 and the following two examples.

(1) Consider (the probability distribution)

π = (c0, c0 a, c0 a², c1, c1 a, c1 a², c2, c2 a, c2 a²)

on S = 〈〈2〉〉², where c0, c1, c2, a ∈ R+, ci ≠ cj, ∀i, j ∈ 〈〈2〉〉 with i ≠ j. π is a wavy probability distribution. Suppose, for simplification, that c0 < c1 < c2. By Theorem 3.2, for the Gibbs sampler, we have

P = e′π

(P = P1 P2). It is easy to prove, for the optimal hybrid Metropolis chain, that

P ≠ e′π

(P = P1 P2; we used the same notation for matrices in both cases).

(2) Consider

π = (c, ca, ..., ca^w, c, ca, ..., ca^w, ..., c, ca, ..., ca^w)

on S = 〈〈h〉〉^n, c, a ∈ R+, h, n ≥ 1. π is also a wavy probability distribution (the case when c0 = c1 = ... = ch := c). By Theorem 3.2, for the Gibbs sampler, we have

P = e′π

(P = P1 P2 ... Pn). It is easy to prove, for the optimal hybrid Metropolis chain when, e.g.,

π = (c, ca, ..., ca^8)

on S = 〈〈2〉〉², c, a ∈ R+, a ≠ 1 (for a = 1, π = the uniform probability distribution), that

P ≠ e′π

(we also used the same notation for matrices in both cases, with the only difference that P = P1 P2 here).

By Theorem 3.2 and Remark 3.3 it is possible that, on 〈〈h〉〉^n, the Gibbs sampler or a special generalization of it is the fastest chain in our collection of hybrid Metropolis-Hastings chains. The word "fastest" refers to Markov chains strictly, not to computers. The running time of our hybrid chains on a computer is another matter (the computational cost per step is the main problem; on a computer, a step of a Markov chain may or may not be feasible to perform).

Example 3.4. Consider the probability distribution π on S = 〈〈1〉〉^100 (i.e., on S = {0, 1}^100), where

$$\pi_{(0,0,...,0)} = d, \qquad \pi_x = \frac{1-d}{2^{100}-1},\ \forall x \in S,\ x \ne (0, 0, ..., 0),$$

where d ∈ (0, 1) (e.g., d = 1/2, or d = 3/4, or d = 9/10). Since the sampling from S using the Gibbs sampler or the optimal hybrid Metropolis chain can be intractable (on any computer) for some d, one way is to break d into many "pieces". For this we consider the probability distribution ρ on S1 = 〈〈1〉〉^101, where

$$\rho_{(0,x)} = \frac{1-d}{2^{100}-1}, \qquad \rho_{(1,x)} = \frac{1 - \frac{(1-d)\,2^{100}}{2^{100}-1}}{2^{100}}, \quad \forall x \in S$$

((0, x) and (1, x) are vectors from S1). Since

ρ_(0,x) = πx, ∀x ∈ S, x ≠ (0, 0, ..., 0),

it follows that the sampling from S can be performed via the sampling from S1. Indeed, letting X be a random variable with the probability distribution π, if, using ρ (on S1), we select a value equal to (0, u) for some u ∈ S, u ≠ (0, 0, ..., 0) ∈ S, then we set X = u (this value of X is selected according to its probability distribution π on S), while, if, using ρ too, we select a value equal to (0, 0, ..., 0) ∈ S1 or (1, v) for some v ∈ S, then we set X = (0, 0, ..., 0) (obviously, (0, 0, ..., 0) ∈ S); this value of X is also selected according to its probability distribution. By Theorem 3.2 the Gibbs sampler and the optimal hybrid Metropolis chain sample exactly (not approximately) from S1 (equipped with ρ); this implies that the sampling from S (equipped with π) is also exact.

The wavy probability distribution(s) from Theorem 3.2 has (have) something in common with the geometric distribution. This fact suggests the next application.

Application 3.5. To generate, exactly, a random variable with geometric distribution (p, pq, pq², ...), p, q ∈ (0, 1), q = 1 − p, we can proceed as follows (see, e.g., also [4, p. 500]). We split the geometric distribution into two parts, a tail carrying small probability and a main body of size 2^n, where n, n ≥ 0, is suitably chosen. The main body contains the first 2^n values of the geometric distribution and determines the probability distribution

π = (Zp, Zpq, Zpq², ..., Zpq^{2^n−1}),

where

$$Z = \frac{1}{1 - q^{2^n}}.$$

We choose the main body with probability 1 − q^{2^n} (= p + pq + ... + pq^{2^n−1}) and the tail with probability q^{2^n}. (See also discrete mixtures in, e.g., [4, p. 16], the decomposition method in, e.g., [4, p. 66], and the composition method in, e.g., [4, p. 66].) If the output of the choice is the main body, then we can sample exactly (not approximately) from {1, 2, ..., 2^n} (equipped with the probability distribution π), using the Gibbs sampler or the optimal hybrid Metropolis chain when n ≥ 1, see Theorem 3.2 (the stationarity is attained at time 1 for each of these chains). Obviously, to use the former or the latter chain, we need another distribution, µ: we replace the probability distribution π = (πi) on {1, 2, ..., 2^n}, π1 = Zp, π2 = Zpq, ..., π_{2^n} = Zpq^{2^n−1}, with µ = (µi) on 〈〈1〉〉^n, µ_(0,0,...,0) = π1, µ_(0,0,...,0,1) = π2, µ_(0,0,...,0,1,0) = π3, µ_(0,0,...,0,1,1) = π4, ..., µ_(1,1,...,1,0) = π_{2^n−1}, µ_(1,1,...,1) = π_{2^n}. Otherwise, i.e., if the output of the choice is the tail, we can proceed as follows. Supposing that X is a random variable with the geometric distribution above, i.e., (p, pq, pq², ...), then, due to the lack-of-memory property of X, X − 2^n (X > 2^n here) is a random variable with the same geometric distribution as X, i.e., (p, pq, pq², ...). Therefore, further, we can work with X − 2^n and its probability distribution (p, pq, pq², ...) (we again split this distribution into two parts, a main body and a tail, ...), etc. The case when all the main bodies are of size 1 (2^0 = 1) is well known, see, e.g., [4, p. 498]; we here gave a generalization of this case by the Gibbs sampler or the optimal hybrid Metropolis chain.
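A compact Python sketch (ours) of Application 3.5: the splitting into main bodies of size 2^n and the use of the lack-of-memory property are as described above; for brevity the main body is sampled here by plain inversion of its truncated distribution, whereas the text does this step exactly by the Gibbs sampler or the optimal hybrid Metropolis chain (Theorem 3.2).

```python
import random

def sample_geometric(p, n=4):
    """Draw from the geometric distribution (p, pq, pq^2, ...) on {1, 2, ...}
    by repeatedly splitting it into a main body of size m = 2**n and a tail."""
    q = 1.0 - p
    m = 2 ** n
    offset = 0
    while True:
        if random.random() < 1.0 - q ** m:          # main body, prob. 1 - q^{2^n}
            u = random.random() * (1.0 - q ** m)    # invert the truncated distribution
            k, cum, prob = 1, 0.0, p
            while True:
                cum += prob
                if u < cum or k == m:
                    return offset + k
                prob *= q
                k += 1
        offset += m                                  # tail: X - 2^n is geometric again
```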

The next result says that, sometimes, the Gibbs sampler (in some cases even the optimal hybrid Metropolis chain, see Theorem 3.2 and its proof and the next result) is identical with its reference chain.

Theorem 3.6. Consider on S = 〈〈h〉〉^n, h, n ≥ 1, the wavy probability distribution π from Theorem 3.2. Consider on S (equipped with π) the Gibbs sampler with transition matrix P = P1 P2 ... Pn and its reference chain with transition matrix R = R1 R2 ... Rn (see Remark 2.5). Then

(Pl)_{xy} = a^{(l)}_{K,L}, ∀l ∈ 〈n〉, ∀K ∈ ∆l, ∀L ∈ ∆_{l+1} with L ⊆ K, ∀x ∈ K, ∀y ∈ L with (Pl)_{xy} > 0

(∆l, l ∈ 〈n + 1〉, are the partitions determined by the Gibbs sampler, see the proof of Theorem 3.1), and

P = R.

If, moreover, R̄l = P̄l, ∀l ∈ 〈n〉, then

Pl = Rl, ∀l ∈ 〈n〉.

(Therefore, under all the above conditions, the Gibbs sampler is identical with its reference chain, leaving the initial probability distribution aside.)

Proof. First, we show that (Pl)_{xy} = a^{(l)}_{K,L}, ∀l ∈ 〈n〉, ∀K ∈ ∆l, ∀L ∈ ∆_{l+1} with L ⊆ K, ∀x ∈ K, ∀y ∈ L with (Pl)_{xy} > 0. For the Gibbs sampler on S = 〈〈h〉〉^n, in the proof of Theorem 3.2, it was shown that

$$(P_l)_{x\,x[v_l|l]} = \begin{cases} \dfrac{a^{v_l(h+1)^b}}{\sum_{s=0}^{h} a^{s(h+1)^b}} & \text{if } n - l = b \text{ for some } b \in \langle\langle t-1\rangle\rangle,\\[8pt] \dfrac{c_{v_l}}{\sum_{i=0}^{h} c_i} & \text{if } n - l = t,\\[8pt] \dfrac{1}{h+1} & \text{if } n - l > t. \end{cases}$$


Recall that

$$a^{(l)}_{K,L} = \frac{\sum_{x\in L} \pi_x}{\sum_{x\in K} \pi_x}, \quad \forall l \in \langle n\rangle, \forall K \in \Delta_l, \forall L \in \Delta_{l+1},\ L \subseteq K,$$

see the reference method in Section 2. Recall that |K| = (h + 1)^{n−l+1} and |L| = (h + 1)^{n−l}, see the proof of Theorem 3.2.

Case 1. n − l = b for some b ∈ 〈〈t − 1〉〉. By the proof of Theorem 3.2 we have

$$a^{(l)}_{K,L} = \frac{\sum_{x\in L} \pi_x}{\sum_{x\in K} \pi_x} = \frac{\sum_{w_1=0}^{(h+1)^b - 1} c_i a^{v+v_l(h+1)^b + w_1}}{\sum_{w_2=0}^{(h+1)^{b+1} - 1} c_i a^{v + w_2}} = \frac{\sum_{w_1=0}^{(h+1)^b - 1} a^{v_l(h+1)^b + w_1}}{\sum_{w_2=0}^{(h+1)^{b+1} - 1} a^{w_2}}.$$

Since

$$\sum_{w_1=0}^{(h+1)^b - 1} a^{v_l(h+1)^b + w_1} = a^{v_l(h+1)^b} \left(1 + a + ... + a^{(h+1)^b - 1}\right)$$

and

$$\sum_{w_2=0}^{(h+1)^{b+1} - 1} a^{w_2} = \left(1 + a + ... + a^{(h+1)^b - 1}\right) + \left(a^{(h+1)^b} + a^{(h+1)^b+1} + ... + a^{2(h+1)^b - 1}\right) + ... + \left(a^{h(h+1)^b} + a^{h(h+1)^b+1} + ... + a^{(h+1)^{b+1} - 1}\right) =$$
$$= \left(1 + a + ... + a^{(h+1)^b - 1}\right) + a^{(h+1)^b}\left(1 + a + ... + a^{(h+1)^b - 1}\right) + ... + a^{h(h+1)^b}\left(1 + a + ... + a^{(h+1)^b - 1}\right) =$$
$$= \left(1 + a + ... + a^{(h+1)^b - 1}\right)\left(1 + a^{(h+1)^b} + ... + a^{h(h+1)^b}\right),$$

we have

$$a^{(l)}_{K,L} = \frac{a^{v_l(h+1)^b}}{\sum_{s=0}^{h} a^{s(h+1)^b}}.$$

Case 2. n − l = t. By the definition of π (see Theorem 3.2) and the proof of Theorem 3.2 we have

$$\sum_{x\in L} \pi_x = c_{v_l} + c_{v_l} a + ... + c_{v_l} a^w = c_{v_l}\,(1 + a + ... + a^w)$$

and

$$\sum_{x\in K} \pi_x = (c_0 + c_0 a + ... + c_0 a^w) + (c_1 + c_1 a + ... + c_1 a^w) + ... + (c_h + c_h a + ... + c_h a^w) = (1 + a + ... + a^w) \sum_{i=0}^{h} c_i.$$

Consequently,

$$a^{(l)}_{K,L} = \frac{c_{v_l}}{\sum_{i=0}^{h} c_i}.$$

Case 3. n − l > t. By the definition of π and the proof of Theorem 3.2, setting

$$\sigma_L = \sum_{x\in L} \pi_x,$$

it is easy to see that

$$\sum_{x\in K} \pi_x = (h + 1)\,\sigma_L.$$

Consequently,

$$a^{(l)}_{K,L} = \frac{1}{h+1}.$$

From Cases 1-3, we have (Pl)_{x x[vl|l]} = a^{(l)}_{K,L}. Therefore, (Pl)_{xy} = a^{(l)}_{K,L} (y = x[vl | l] for some vl ∈ 〈〈h〉〉), ∀l ∈ 〈n〉, ∀K ∈ ∆l, ∀L ∈ ∆_{l+1} with L ⊆ K, ∀x ∈ K, ∀y ∈ L with (Pl)_{xy} > 0.

By Theorem 1.6 and the above result we have P = e′π. By Theorem 2.4(iv), R = e′π. Therefore, P = R.

The other part of the conclusion is obvious. □

In [18], we modified our hybrid (Metropolis-Hastings) chains such that the modified hybrid chains have better upper bounds for ‖pn − π‖₁ (see Theorem 1.12; see also Theorems 1.8 and 1.9). Below we present this modification.

If P = P1 P2 ... Pt (see Section 1) is the transition matrix of a hybrid Metropolis-Hastings chain, we replace the product P_{s+1} P_{s+2} ... Pt (1 ≤ s < t) by the block diagonal ∆_{s+1}-stable matrix (recall that ∆l = (K^{(l)}_1, K^{(l)}_2, ..., K^{(l)}_{u_l}), ∀l ∈ 〈t + 1〉, see Section 1)

$$P^* = P^*(s) = \begin{pmatrix} A^{*(s+1)}_1 & & & \\ & A^{*(s+1)}_2 & & \\ & & \ddots & \\ & & & A^{*(s+1)}_{u_{s+1}} \end{pmatrix},$$

where

$$A^{*(s+1)}_z = \frac{1}{\sum_{i\in K^{(s+1)}_z} \pi_i}\, e'\!\left(\left|K^{(s+1)}_z\right|\right) (\pi_i)_{i\in K^{(s+1)}_z}, \quad \forall z \in \langle u_{s+1}\rangle$$

(recall that e = e(n) = (1, 1, ..., 1) ∈ R^n and e′ is its transpose; (πi)_{i∈K^{(s+1)}_z} is a (row) vector, ∀z ∈ 〈u_{s+1}〉).

It is easy to prove that the chain with transition matrix P1 P2 ... Ps P* is also a hybrid Metropolis-Hastings chain. Therefore, we did not enlarge our collection of hybrid Metropolis-Hastings chains, we only found an interesting subcollection of it.
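A minimal Python sketch (ours) of the block diagonal matrix P* = P*(s): on each block K^{(s+1)}_z every row equals the conditional distribution of π on that block, so π is invariant for P*; the partition and the numerical π below are hypothetical.

```python
import numpy as np

def p_star(pi, delta):
    """Build P*: delta is the partition Delta_{s+1} (a list of blocks of
    0-based indices); on each block K the diagonal block is
    e'(|K|) (pi_i)_{i in K} / pi(K)."""
    r = len(pi)
    P = np.zeros((r, r))
    for K in delta:
        cond = pi[K] / pi[K].sum()                 # conditional distribution on K
        P[np.ix_(K, K)] = np.tile(cond, (len(K), 1))
    return P

# Hypothetical example: r = 6, Delta_{s+1} with blocks of sizes 3, 2, 1.
pi = np.array([0.1, 0.2, 0.1, 0.25, 0.15, 0.2])
Pstar = p_star(pi, [[0, 1, 2], [3, 4], [5]])
print(np.allclose(pi @ Pstar, pi))                 # pi P* = pi
```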

The chain with transition matrix P1 P2 ... Ps P*, being a hybrid Metropolis-Hastings chain, has a reference chain, which is associated as in Remark 2.5. If the partitions on S (the state space) of the hybrid chain with transition matrix P above are ∆1, ∆2, ..., ∆_{t+1}, where ∆1 = (S) ≺ ∆2 ≺ ... ≺ ∆_{t+1} = ({i})_{i∈S}, then the partitions on S of the chain with transition matrix P1 P2 ... Ps P* and of its reference chain are ∆1, ∆2, ..., ∆_{s+1}, ∆_{t+1} (obviously, ∆1 = (S) ≺ ∆2 ≺ ... ≺ ∆_{s+1} ≺ ∆_{t+1} = ({i})_{i∈S}), see also Remark 3.10.

If we consider that, see Remark 2.5, the reference chain of a hybrid (Metropolis-Hastings) chain with transition matrix P = P1 P2 ... Pt has the transition matrix R = R1 R2 ... Rt, then it is easy to see that

P* = P*(s) = R_{s+1} R_{s+2} ... Rt

(by Theorem 2.4(i), R_{s+1} R_{s+2} ... Rt is a block diagonal ∆_{s+1}-stable matrix). Therefore, this modification of our hybrid chains is also related to the reference chain; this is a good thing because the reference chain is a very fast chain (see Theorem 2.4). We can work with P* or with the matrices R_{s+1}, R_{s+2}, ..., Rt, which are sparser than P* (the latter way was also suggested in [18]). Obviously, the reference chain of a hybrid chain with transition matrix P1 P2 ... Ps P* has the transition matrix R1 R2 ... Rs P*.

To unify the theory of hybrid chains with P* (this means that we refer to the theory of hybrid chains with transition matrix P1 P2 ... Ps P*) and that without P*, below we consider that s ∈ 〈t〉 and set (see [18])

$$P = P(s) = \begin{cases} P_1 P_2 ... P_t & \text{if } s = t,\\ P_1 P_2 ... P_s P^* & \text{if } 1 \le s < t. \end{cases}$$

Theorem 3.7 ([18]). Concerning P above we have πP = π and P > 0.

Proof. See [18]. □

Now, we consider the Gibbs sampler on 〈〈h〉〉^n (h, n ≥ 1) with transition matrix P = P_1P_2...P_n. By Theorem 3.1 this is a hybrid chain. Further, we consider a generalization of the Gibbs sampler on 〈〈h〉〉^n. This is a chain which updates the first coordinate, then the second one, ..., then the sth one, then the last n−s ones simultaneously, 1 ≤ s < n, then the first one, then the second one, ..., then the sth one, then the last n−s ones simultaneously, etc., the transition matrices being P_1 for the update of the first coordinate, P_2 for the update of the second one, ..., P_s for the update of the sth one, P_{(s+1,s+2,...,n)} for the update of the last n−s ones, where, letting x = (x_1, x_2, ..., x_n) ∈ 〈〈h〉〉^n and setting
\[
x[k_1, k_2, ..., k_{n-s}\,|\,s+1, s+2, ..., n] = (x_1, x_2, ..., x_s, k_1, k_2, ..., k_{n-s}),
\]

∀k_1, k_2, ..., k_{n−s} ∈ 〈〈h〉〉, we consider
\[
\left(P_{(s+1,s+2,...,n)}\right)_{xy} =
\begin{cases}
0 & \text{if } y \ne x[k_1, k_2, ..., k_{n-s}\,|\,s+1, s+2, ..., n],\ \forall k_1, k_2, ..., k_{n-s} \in \langle\langle h\rangle\rangle,\\[10pt]
\dfrac{\pi_{x[k_1, k_2, ..., k_{n-s}|\,s+1, s+2, ..., n]}}{\displaystyle\sum_{j_1, j_2, ..., j_{n-s}\in\langle\langle h\rangle\rangle}\pi_{x[j_1, j_2, ..., j_{n-s}|\,s+1, s+2, ..., n]}} & \text{if } y = x[k_1, k_2, ..., k_{n-s}\,|\,s+1, s+2, ..., n] \text{ for some } k_1, k_2, ..., k_{n-s} \in \langle\langle h\rangle\rangle
\end{cases}
\]
(y ∈ 〈〈h〉〉^n, π is the target probability distribution). We call this chain, having the transition matrix P_1P_2...P_sP_{(s+1,s+2,...,n)}, the Gibbs sampler (on 〈〈h〉〉^n) with the last n−s coordinates grouped. It is easy to see that
\[
P_{(s+1,s+2,...,n)} = P^{*},
\]
where P^* = P^*(s) (see above for P^*; here, t = n), so the transition matrix P = P_1P_2...P_n of the Gibbs sampler must be replaced, to obtain the transition matrix of the other chain, simply by the transition matrix
\[
P_1P_2...P_sP^{*}
\]
(for s = n−1, we have P^* = P^*(n−1) = P_n; therefore, the two chains have the same transition matrix when s = n−1). So, concerning the Gibbs sampler, the above modification of (our) hybrid chains refers simply to grouping the last n−s coordinates. All these lead to the following generalization of Theorem 3.1.

Theorem 3.8. The Gibbs sampler on S = 〈〈h〉〉^n (h, n ≥ 1) with the last n−s coordinates grouped or not belongs to our collection of hybrid (Metropolis-Hastings) chains from [18]. Moreover, this chain satisfies the conditions (c1)–(c4), excepting (c3).

Proof. See above and Theorem 3.1. □
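To make the grouped update concrete, here is a minimal sketch (Python/NumPy; the lexicographic ordering of 〈〈h〉〉^n and the helper name grouped_update are our own choices) that builds P_{(s+1,s+2,...,n)} directly from its defining formula.

```python
import itertools
import numpy as np

def grouped_update(pi, h, n, s):
    """P_(s+1,...,n): from state x, resample the last n-s coordinates jointly from
    their conditional distribution given x_1, ..., x_s (the first s coordinates
    stay fixed).  States of {0,...,h}^n are taken in lexicographic order and
    pi is indexed accordingly."""
    states = list(itertools.product(range(h + 1), repeat=n))
    index = {x: i for i, x in enumerate(states)}
    P = np.zeros((len(states), len(states)))
    for x in states:
        targets = [x[:s] + tail
                   for tail in itertools.product(range(h + 1), repeat=n - s)]
        z = sum(pi[index[y]] for y in targets)        # conditional normalizing constant
        for y in targets:
            P[index[x], index[y]] = pi[index[y]] / z
    return P
```

For s = n − 1 this reduces to the usual update of the last coordinate, in agreement with the case s = n − 1 noted above.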


Other types of Gibbs samplers with grouped coordinates could also belong to our collection of hybrid chains. For grouping the coordinates and the group (block) update (in which a group of coordinates is updated simultaneously), see, e.g., [1] and [19].

Below we give a result similar to Theorem 3.2.

Theorem 3.9. Let S = 〈〈h〉〉^n, h, n ≥ 1. Let w = (h+1)^t − 1, 0 ≤ t < n. Consider on S the probability distribution
\[
\pi = (c_0u_0, c_0u_1, ..., c_0u_w, c_1u_0, c_1u_1, ..., c_1u_w, ..., c_hu_0, c_hu_1, ..., c_hu_w,
\]
\[
c_0u_0, c_0u_1, ..., c_0u_w, c_1u_0, c_1u_1, ..., c_1u_w, ..., c_hu_0, c_hu_1, ..., c_hu_w, ...,
\]
\[
c_0u_0, c_0u_1, ..., c_0u_w, c_1u_0, c_1u_1, ..., c_1u_w, ..., c_hu_0, c_hu_1, ..., c_hu_w)
\]
(the sequence c_0u_0, c_0u_1, ..., c_0u_w, c_1u_0, c_1u_1, ..., c_1u_w, ..., c_hu_0, c_hu_1, ..., c_hu_w appears (h+1)^{n−t−1} times), where c_0, c_1, ..., c_h, u_0, u_1, ..., u_w ∈ R_+. Then, for the Gibbs sampler with the last t coordinates grouped, having the transition matrix P = P_1P_2...P_{n−t}P^*, we have
\[
P = e'\pi
\]
(therefore, the stationarity of this chain is also attained at time 1).

Proof. We use part of the proof of Theorem 3.2, namely, all before "First, we consider the Gibbs sampler.". Let l ∈ 〈n−t〉. We have (recall, from the proof of Theorem 3.2, that |K| = (h+1)^{n−l+1}, |L| = (h+1)^{n−l}, v_l ∈ 〈〈h〉〉, ...)
\[
(P_l)_{x\,x[v_l|l]} =
\begin{cases}
\dfrac{c_{v_l}}{\sum_{i=0}^{h}c_i} & \text{if } n-l = t,\\[10pt]
\dfrac{1}{h+1} & \text{if } n-l > t.
\end{cases}
\]
Consequently, P_1 ∈ G_{∆_1,∆_2}, P_2 ∈ G_{∆_2,∆_3}, ..., P_{n−t} ∈ G_{∆_{n−t},∆_{n−t+1}}. Since P^* ∈ G_{∆_{n−t+1},∆_{n+1}} (recall that ∆_{n+1} = ({x})_{x∈S}), by Theorems 1.6, 3.7, and 3.8 we have
\[
P = e'\pi. \qquad \Box
\]

We call the probability distribution from Theorem 3.9 the wavy probability distribution (of second type).
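For instance, for h = 1, n = 2, t = 1 we have w = 1 and the wavy probability distribution of second type reads
\[
\pi = (c_0u_0, c_0u_1, c_1u_0, c_1u_1);
\]
taking, e.g., c_0 = c_1 = 0.1, u_0 = 1, u_1 = 4 gives π = (0.1, 0.4, 0.1, 0.4), and the corresponding chain P_1P^* is just the ordinary Gibbs sampler (the case s = n − 1 discussed before Theorem 3.8).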

Remark 3.10. Consider a hybrid Metropolis-Hastings chain with transition matrix P = P(s), see before Theorem 3.7 (s ∈ 〈t〉). Then the reference chain of this chain has the transition matrix
\[
R = R(s) =
\begin{cases}
R_1R_2\cdots R_t & \text{if } s = t,\\
R_1R_2\cdots R_sR^{*} & \text{if } 1 \le s < t,
\end{cases}
\]
where R^* = P^* = R_{s+1}R_{s+2}...R_t, see the second paragraph before Theorem 3.7 for the case 1 ≤ s < t. The hybrid chain and its reference chain have (see Remark 2.5) the same partitions in the case s = t (∆_1, ∆_2, ..., ∆_{t+1} with ∆_1 = (S) ⪯ ∆_2 ⪯ ... ⪯ ∆_{t+1} = ({i})_{i∈S} (S = the state space)) and the same partitions in the case 1 ≤ s < t (∆_1, ∆_2, ..., ∆_{s+1}, ∆_{t+1} with ∆_1 = (S) ⪯ ∆_2 ⪯ ... ⪯ ∆_{s+1} ⪯ ∆_{t+1} = ({i})_{i∈S}).

The next result is similar to Theorem 3.6.

Theorem 3.11. Consider on S = 〈〈h〉〉^n, h, n ≥ 1, the wavy probability distribution π from Theorem 3.9. Consider on S (equipped with π) the Gibbs sampler with the last t coordinates grouped, having the transition matrix P = P_1P_2...P_{n−t}P^*, and its reference chain, having the transition matrix R = R_1R_2...R_{n−t}R^* (see the paragraph before Theorem 3.8 and Remark 3.10). Then
\[
(P_l)_{xy} = a^{(l)}_{K,L},\ \forall l \in \langle n-t\rangle,\ \forall K \in \Delta_l,\ \forall L \in \Delta_{l+1} \text{ with } L \subseteq K,\ \forall x \in K,\ \forall y \in L \text{ with } (P_l)_{xy} > 0
\]
(∆_1, ∆_2, ..., ∆_{n−t+1}, ∆_{n+1} (∆_{n+1} = ({x})_{x∈S}) are the partitions determined by the Gibbs sampler with the last t coordinates grouped), and
\[
P = R.
\]
If, moreover, \bar{R}_l = \bar{P}_l (i.e., the incidence matrices coincide), ∀l ∈ 〈n−t〉, then
\[
P_l = R_l,\ \forall l \in \langle n-t\rangle.
\]
(Therefore, under all the above conditions, the Gibbs sampler with the last t coordinates grouped is identical with its reference chain, leaving the initial probability distribution aside.)

Proof. See the proofs of Theorems 3.6 and 3.9. □

Theorems 3.6 and 3.11 are special cases of the next result.

Theorem 3.12 (Uniqueness Theorem). Let S = 〈r〉. Let π be a positive probability distribution on S. Let ∆_1, ∆_2, ..., ∆_{t+1} ∈ Par(S) with ∆_1 = (S) ⪯ ∆_2 ⪯ ... ⪯ ∆_{t+1} = ({i})_{i∈S}, where t ≥ 1. Consider the reference chain with transition matrix R determined by S, π, and ∆_1, ∆_2, ..., ∆_{t+1}, see Section 2 (R = R_1R_2...R_t, ...). Let P_1, P_2, ..., P_t ∈ S_r such that

(a1) P_1 ∈ G_{∆_1,∆_2}, P_2 ∈ G_{∆_2,∆_3}, ..., P_t ∈ G_{∆_t,∆_{t+1}};

(a2) (P_l)_{LK} = 0, ∀l ∈ 〈t〉 − {1}, ∀K, L ∈ ∆_l, K ≠ L (therefore, P_l is a block diagonal matrix and ∆_l-stable matrix on ∆_l, ∀l ∈ 〈t〉 − {1});

(a3) (P_l)_{KU} is a row-allowable matrix, ∀l ∈ 〈t〉, ∀K ∈ ∆_l, ∀U ∈ ∆_{l+1}, U ⊆ K.

Let
\[
b^{(l)}_{K,L} = \sum_{j\in L}(P_l)_{ij},\ \forall l \in \langle t\rangle,\ \forall K \in \Delta_l,\ \forall L \in \Delta_{l+1},\ L \subseteq K,
\]
where i ∈ K. (By (a1), b^{(l)}_{K,L} does not depend on i ∈ K.) Consider the chain with state space S and transition matrix P = P_1P_2...P_t. Suppose that πP = π. Then
\[
P = R
\]
and
\[
b^{(l)}_{K,L} = a^{(l)}_{K,L},\ \forall l \in \langle t\rangle,\ \forall K \in \Delta_l,\ \forall L \in \Delta_{l+1},\ L \subseteq K.
\]
If, moreover, \bar{R}_l = \bar{P}_l, ∀l ∈ 〈t〉, then
\[
P_l = R_l,\ \forall l \in \langle t\rangle.
\]
(Therefore, under all the above conditions, the chain with transition matrix P = P_1P_2...P_t is identical with the reference chain, leaving the initial probability distribution aside.)

Proof. We only prove the first part of the conclusion; the other is obvious. By (a1) and Theorem 1.6, P is a stable matrix. It follows that P = e′π because πP = π. Since, by Theorem 2.4(iv), R = e′π, it follows that
\[
P = R.
\]
We have, by Theorem 1.6,
\[
\pi_j = \left(P_1^{-+}P_2^{-+}\cdots P_t^{-+}\right)_{S\{j\}},\ \forall j \in S.
\]
Let j ∈ S. Proceeding as in the proof of Theorem 2.4(ii), we find a sequence U^{(1)}, U^{(2)}, ..., U^{(t+1)} such that
\[
U^{(1)} \in \Delta_1,\ U^{(2)} \in \Delta_2,\ ...,\ U^{(t+1)} \in \Delta_{t+1},
\]
\[
\{j\} = U^{(t+1)} \subseteq U^{(t)} \subseteq ... \subseteq U^{(1)} = S,
\]
and
\[
\left(P_l^{-+}\right)_{U^{(l)}U^{(l+1)}} > 0 \ \text{ and } \ \left(P_l^{-+}\right)_{VU^{(l+1)}} = 0,\ \forall l \in \langle t\rangle,\ \forall V \in \Delta_l,\ V \ne U^{(l)}.
\]
Consequently,
\[
\pi_j = \left(P_1^{-+}\right)_{SU^{(2)}}\left(P_2^{-+}\right)_{U^{(2)}U^{(3)}}\cdots\left(P_t^{-+}\right)_{U^{(t)}\{j\}} = b^{(1)}_{S,U^{(2)}}\,b^{(2)}_{U^{(2)},U^{(3)}}\cdots b^{(t)}_{U^{(t)},\{j\}}.
\]
We conclude that
\[
\pi_k = b^{(1)}_{S,U^{(2)}}\,b^{(2)}_{U^{(2)},U^{(3)}}\cdots b^{(t)}_{U^{(t)},\{k\}},\ \forall k \in U^{(t)}
\]
(the same sequence U^{(1)}, U^{(2)}, ..., U^{(t)} serves every k ∈ U^{(t)} because the partitions are nested and, by (a2), no other blocks contribute).

Set
\[
d^{(t-1)} = b^{(1)}_{S,U^{(2)}}\,b^{(2)}_{U^{(2)},U^{(3)}}\cdots b^{(t-1)}_{U^{(t-1)},U^{(t)}}.
\]
d^{(t−1)} does not depend on k ∈ U^{(t)}. Then
\[
b^{(t)}_{U^{(t)},\{k\}} = \frac{\pi_k}{d^{(t-1)}},\ \forall k \in U^{(t)}.
\]
Since P_t ∈ S_r, by (a2) we have
\[
\sum_{k\in U^{(t)}} b^{(t)}_{U^{(t)},\{k\}} = 1.
\]
It follows that
\[
d^{(t-1)} = \sum_{k\in U^{(t)}}\pi_k.
\]
Therefore,
\[
b^{(t)}_{U^{(t)},\{k\}} = \frac{\pi_k}{\sum_{i\in U^{(t)}}\pi_i} = a^{(t)}_{U^{(t)},\{k\}},\ \forall k \in U^{(t)}.
\]
Further, the two equations for d^{(t−1)} imply
\[
\sum_{k\in U^{(t)}}\pi_k = b^{(1)}_{S,U^{(2)}}\,b^{(2)}_{U^{(2)},U^{(3)}}\cdots b^{(t-1)}_{U^{(t-1)},U^{(t)}}.
\]
We conclude that
\[
\sum_{k\in W}\pi_k = b^{(1)}_{S,U^{(2)}}\,b^{(2)}_{U^{(2)},U^{(3)}}\cdots b^{(t-1)}_{U^{(t-1)},W},\ \forall W \in \Delta_t,\ W \subseteq U^{(t-1)}.
\]
Set
\[
d^{(t-2)} = b^{(1)}_{S,U^{(2)}}\,b^{(2)}_{U^{(2)},U^{(3)}}\cdots b^{(t-2)}_{U^{(t-2)},U^{(t-1)}}.
\]
d^{(t−2)} does not depend on W ∈ ∆_t, W ⊆ U^{(t−1)}. Then
\[
b^{(t-1)}_{U^{(t-1)},W} = \frac{\sum_{k\in W}\pi_k}{d^{(t-2)}},\ \forall W \in \Delta_t,\ W \subseteq U^{(t-1)}.
\]
Since P_{t−1} ∈ S_r, by (a2) we have
\[
\sum_{W\in\{A \,|\, A\in\Delta_t,\ A\subseteq U^{(t-1)}\}} b^{(t-1)}_{U^{(t-1)},W} = 1.
\]
It follows that
\[
d^{(t-2)} = \sum_{k\in U^{(t-1)}}\pi_k.
\]
Therefore,
\[
b^{(t-1)}_{U^{(t-1)},W} = \frac{\sum_{k\in W}\pi_k}{\sum_{k\in U^{(t-1)}}\pi_k} = a^{(t-1)}_{U^{(t-1)},W},\ \forall W \in \Delta_t,\ W \subseteq U^{(t-1)}.
\]
Etc. Finally, taking the structure of the matrices P_1, P_2, ..., P_t into account and since j was arbitrarily chosen, we obtain
\[
b^{(l)}_{K,L} = a^{(l)}_{K,L},\ \forall l \in \langle t\rangle,\ \forall K \in \Delta_l,\ \forall L \in \Delta_{l+1},\ L \subseteq K. \qquad \Box
\]


Assumptions (a1) and (a2) from the Uniqueness Theorem are similar to (A1) and (A2) from Section 2, respectively. (a3) is a bit more general than (A3) from Section 2. The Uniqueness Theorem says that the reference chain is the best chain performing an exact selection at time 1 from a (finite) state space (equipped with a probability distribution) which we can construct for our collection of hybrid Metropolis-Hastings chains.

In this section, we considered, on 〈〈h〉〉^n, h, n ≥ 1, the Gibbs sampler, etc. We can also consider Markov chains on other state spaces, such as S_n, the set of permutations of order n (n ≥ 2), which have something in common with the Gibbs sampler on 〈〈h〉〉^n. We can construct such a chain on S_n using, e.g., the sets of neighbors of the optimal hybrid Metropolis chain from Section 3 in [18]. (Let A = (A_{ij}) be a nonnegative m × m matrix. Suppose that its incidence matrix, \bar{A}, is symmetric. Let u, v ∈ 〈m〉, u ≠ v. We say that v is a neighbor of u if A_{uv} > 0.) Using these sets, the positive probabilities (P_l)_{ij} of this chain are defined as follows:
\[
(P_l)_{ij} = \frac{\pi_j}{\sum_{k\in V_l(i)\cup\{i\}}\pi_k},\ \forall l \in \langle n\rangle,\ \forall i \in S_n,\ \forall j \in V_l(i)\cup\{i\},
\]
where π = (π_i)_{i∈S_n} is the probability distribution of interest and V_l(i) is the set of neighbors of i. It is easy now to prove that this chain on S_n with transition matrix P = P_1P_2...P_n also belongs to our collection of hybrid Metropolis-Hastings chains.
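A minimal sketch of one such update matrix (Python/NumPy; the states are indexed 0, 1, ..., m−1 and the mapping neighbors, which stands in for the sets V_l(i), is assumed given, e.g., built as in [18], and is not specified here):

```python
import numpy as np

def neighbor_update_matrix(pi, neighbors):
    """One update matrix P_l of the neighbor-based chain:
    (P_l)_{ij} = pi_j / sum_{k in V_l(i) U {i}} pi_k  for j in V_l(i) U {i},
    and (P_l)_{ij} = 0 otherwise.  `neighbors[i]` plays the role of V_l(i)."""
    m = len(pi)
    P = np.zeros((m, m))
    for i in range(m):
        support = sorted(set(neighbors[i]) | {i})   # V_l(i) together with i itself
        z = sum(pi[k] for k in support)             # normalizing constant
        for j in support:
            P[i, j] = pi[j] / z
    return P
```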

4. APPROXIMATE SAMPLING USING HYBRID CHAINS

In this section, we discuss certain basic things on approximate sampling via hybrid chains (in particular, via the Gibbs sampler).

A Metropolis-Hastings type chain is a chain designed such that
\[
\pi P = \pi,
\]
where π is the probability distribution of interest and P is the transition matrix of the chain. This equation is related to the static part of the chain, i.e., to the stationarity of the chain, but not to the dynamic part of the chain, i.e., to the speed of convergence of the chain.

Concerning the dynamic part of our hybrid Metropolis-Hastings chains, we constructed a chain, called the reference chain (we do not use, obviously, just one reference chain for all our hybrid Metropolis-Hastings chains; see, for specification, Remark 2.5, etc.). The reference chain is a chain intended for unifications and, as a result of these, for comparisons (a hybrid Metropolis-Hastings chain can be compared, and this makes sense, with its reference chain) and improvements.


The previous sections lead to the fundamental idea on the speed of convergence:

the nearer our hybrid chain is to its reference chain,

the faster our hybrid chain is.

Concerning "nearer", the comparison rule is as follows. We compare
\[
\left|\left\|(P_l)_{KL}\right\|\right|_\infty \quad\text{with}\quad \left|\left\|(R_l)_{KL}\right\|\right|_\infty = a^{(l)}_{K,L},\ \forall l \in \langle t\rangle,\ \forall K \in \Delta_l,\ \forall L \in \Delta_{l+1},\ L \subseteq K,
\]
where |‖·‖|_∞ is the matrix ∞-norm, (P_l)_{KL} denotes the submatrix of P_l with rows in K and columns in L, and P_1, P_2, ..., P_t and R_1, R_2, ..., R_t are the matrices of the hybrid chain and of its reference chain, respectively. We call this the G comparison. Obviously, the G comparison is not an entrywise comparison.
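A small sketch of the G comparison for one pair of blocks (Python/NumPy; reading (P_l)_{KL} as the rows-K, columns-L submatrix is our interpretation of the notation, and K, L are lists of state indices):

```python
import numpy as np

def g_comparison(P_l, R_l, K, L):
    """Matrix infinity-norm (max absolute row sum) of the submatrix with rows K
    and columns L, for the hybrid chain matrix P_l and its reference matrix R_l;
    for the reference chain this norm equals a^{(l)}_{K,L}."""
    sub = lambda M: M[np.ix_(K, L)]                  # rows K, columns L
    norm_P = np.abs(sub(P_l)).sum(axis=1).max()
    norm_R = np.abs(sub(R_l)).sum(axis=1).max()
    return norm_P, norm_R
```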

This idea has, in general, a positive part and a negative one. The negative part of our idea refers to the fact that, in general, we do not know the differences |‖(P_l)_{KL}‖|_∞ − a^{(l)}_{K,L} when K and L are too large. On the other hand, fortunately, we can know all or part of these differences when K and L are small; based on these differences, we could guess whether the hybrid chain is fast or not.

Excepting some cases, such as those where P_1P_2...P_t = P_1P_2...P_sP^* (e.g., for the Gibbs sampler on 〈〈h〉〉^n, P^* = P^*(n−1) = P_n (here, s = n−1) and, therefore, P_1P_2...P_n = P_1P_2...P_{n−1}P^*), we believe, not going into details on the terms of comparison, that the hybrid chains with P^* are faster than those without P^* because P^* = P^*(s) = R_{s+1}R_{s+2}...R_t. This statement refers to Markov chains strictly, not to computers (for the latter, it remains to be seen which of these chains is faster). To analyze the hybrid chains with P^* or without P^* (the fundamental idea on the speed of convergence in each of these cases, etc.), Theorems 1.8, 1.9, and 1.12 could be of use (see also [18]). In particular, Theorems 1.8, 1.9, and 1.12 could be of use to analyze the Gibbs sampler with grouped coordinates (into blocks of v, etc.) or not. For grouping the coordinates and the group (block) update (in which a group of coordinates is updated simultaneously), recall, e.g., [1] and [19].

The classes of wavy probability distributions from Theorems 3.2 and 3.9 are very good guides for our collection of hybrid chains:

the nearer the target probability distribution is to one of these distributions,

the faster our hybrid chain is.

(In Theorem 3.9, the hybrid chain is a Gibbs sampler with grouped coordinates.)

If the target probability distribution is not a wavy probability distribution, then one way, which could become popular and fruitful, is breaking some π_i's of the target probability distribution π into a few or many "pieces" (see Example 3.4), or adding some π_i's of π, or, more generally than these two, transforming π into another probability distribution such that the new probability distribution is as near as possible to some wavy probability distribution.

Example 4.1 (A useful transformation). Consider the probability distribution π = (π_i) on S = 〈〈1〉〉^2, π_{(0,0)} = 0.1, π_{(0,1)} = 0.4, π_{(1,0)} = 0.4, π_{(1,1)} = 0.1. We transform π into another probability distribution, ν = (ν_i) on S, ν_{(0,0)} = π_{(0,0)}, ν_{(0,1)} = π_{(0,1)}, ν_{(1,0)} = π_{(1,1)}, ν_{(1,1)} = π_{(1,0)}. The Gibbs sampler (or optimal hybrid Metropolis chain) for ν attains its stationarity at time 1 (see Theorem 3.2) while the Gibbs sampler for π does not.
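A numerical check of this example (a minimal sketch in Python/NumPy; the state ordering (0,0), (0,1), (1,0), (1,1) and the helper name are our own choices): it builds the transition matrix P = P_1P_2 of the systematic-scan Gibbs sampler for π and for ν and tests whether P = e′π, i.e., whether stationarity is attained at time 1. The check succeeds for ν and fails for π.

```python
import itertools
import numpy as np

def gibbs_update_matrix(pi, states, coord):
    """Update matrix P_l for coordinate `coord` (0 or 1): resample that binary
    coordinate from its conditional distribution, the other coordinate fixed."""
    index = {x: i for i, x in enumerate(states)}
    P = np.zeros((len(states), len(states)))
    for x in states:
        targets = [x[:coord] + (v,) + x[coord + 1:] for v in (0, 1)]
        z = sum(pi[index[y]] for y in targets)            # conditional normalizer
        for y in targets:
            P[index[x], index[y]] = pi[index[y]] / z
    return P

states = list(itertools.product((0, 1), repeat=2))        # (0,0), (0,1), (1,0), (1,1)
pi = np.array([0.1, 0.4, 0.4, 0.1])                       # target from Example 4.1
nu = np.array([0.1, 0.4, 0.1, 0.4])                       # transformed distribution

for name, dist in (("pi", pi), ("nu", nu)):
    P = gibbs_update_matrix(dist, states, 0) @ gibbs_update_matrix(dist, states, 1)
    print(name, "attains stationarity at time 1:",
          np.allclose(P, np.outer(np.ones(4), dist)))
```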

We think that our hybrid chains (in particular, the Gibbs sampler) are now better understood. Moreover, due to the G method, part of the sampling methods from exact and approximate sampling theory are now unified, Markovian unified. More precisely, we have put together:

1) the alias method, the swapping method, and the reference method (these are exact sampling methods; the last method is a generalization of the first two methods);

2) the methods based on the Gibbs sampler and some of its related chains (namely, the Gibbs sampler with the last n−s coordinates grouped, the chain from the first paragraph after the proof of Theorem 3.1, and the chain from the last paragraph of Section 3), and the method based on our hybrid Metropolis-Hastings chains (these are, in general (warning!), approximate sampling methods; the last method is a generalization of the former methods);

3) the reference method and the method based on our hybrid Metropolis-Hastings chains (the unification of the methods from 1) and 2)).

The last unification, 3), is due to two things:

a) the matrices P_1, P_2, ..., P_t of the hybrid chain and the matrices R_1, R_2, ..., R_t of its reference chain have certain common properties, respectively (see the conditions (C1)–(C3), (c1)–(c4), and (A1)–(A3)); supposing that the hybrid chain satisfies, in addition, the condition (c4) and its reference chain has the representatives R_1, R_2, ..., R_t such that \bar{R}_1 = \bar{P}_1, \bar{R}_2 = \bar{P}_2, ..., \bar{R}_t = \bar{P}_t, we can even have P_1 = R_1, P_2 = R_2, ..., P_t = R_t in some cases, see Theorems 3.6 and 3.11;

b) lim_{n→∞} P^n = R, where P = P_1P_2...P_t and R = R_1R_2...R_t (see Theorem 1.11 and the first paragraph after its proof, and Theorem 2.4(iv)); we can even have P = R (P = R implies P^n = R, ∀n ≥ 1) in some cases, see Theorems 3.6 and 3.11.

This unification (done in a nonhomogeneous Markovian framework) strengthens our statements from [18, p. 227]: "We conclude saying that we believe that the homogeneous Markov chain framework is too narrow to design fast algorithms of Metropolis-Hastings type. Moreover, we believe that our hybrid Metropolis-Hastings chain works better than the Metropolis-Hastings chain, at least on S_n, the set of permutations of order n, and on {0, 1, ..., h}^n."

This unification (five methods in all) is the main contribution of this work. To enlarge this unification, the reader (as an exercise) should first take the conditional distribution method in the finite case (on 〈〈h_1〉〉 × 〈〈h_2〉〉 × ... × 〈〈h_n〉〉, h_1, h_2, ..., h_n, n ≥ 1) into account. For the conditional distribution method, see, e.g., [4, p. 555]. A great dream is the unification of sampling methods, as many as possible, from exact and approximate sampling theory, but the greatest dream is finding the fastest sampling methods.

Addendum. The Mallows model through the Cayley metric and that through the Kendall metric (models for ranked data) are two interesting examples of probability distributions which have something in common with the wavy probability distribution(s) of first type. The achievements from

[U. Păun, G method in action: fast exact sampling from set of permutations of order n according to Mallows model through Cayley metric, Braz. J. Prob. Stat. 31 (2017), 338–352]

and

[U. Păun, G method in action: fast exact sampling from set of permutations of order n according to Mallows model through Kendall metric, Submitted]

of our hybrid Metropolis-Hastings chain are impressive: fast exact samplings, computing the normalizing constants (exactly) by the equation P = e′π, and computing certain important probabilities (also exactly) by the Uniqueness Theorem. Our opinion is that the Metropolis-Hastings chain cannot do these things.

REFERENCES

[1] Y. Amit and U. Grenander, Comparing sweep strategies for stochastic relaxation. J. Multivariate Anal. 37 (1991), 197–222.

[2] A.A. Barker, Monte Carlo calculations of the radial distribution functions for a proton-electron plasma. Aust. J. Phys. 18 (1965), 119–133.

[3] G. Casella and E.I. George, Explaining the Gibbs sampler. Amer. Statist. 46 (1992), 167–174.

[4] L. Devroye, Non-Uniform Random Variate Generation. Springer-Verlag, New York, 1986; available at http://cg.scs.carleton.ca/~luc/rnbookindex.html.

[5] R.L. Dobrushin, Central limit theorem for nonstationary Markov chains, I, II. Theory Probab. Appl. 1 (1956), 65–80, 329–383.

[6] B. Efron, A 250-year argument: belief, behavior and the bootstrap. Bull. Amer. Math. Soc. 50 (2013), 129–146.

[7] M. Evans and T. Swartz, Approximating Integrals via Monte Carlo and Deterministic Methods. Oxford University Press, Oxford, 2000.

[8] G.S. Fishman, Coordinate selection rules for Gibbs sampling. Ann. Appl. Probab. 6 (1996), 444–465.

[9] D. Gamerman and H.F. Lopes, Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference, 2nd Edition. Chapman & Hall, Boca Raton, 2006.

[10] S. Geman and D. Geman, Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. 6 (1984), 721–741.

[11] W.R. Gilks, S. Richardson and D.J. Spiegelhalter (Eds.), Markov Chain Monte Carlo in Practice. Chapman & Hall, Boca Raton, 1996.

[12] W.K. Hastings, Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57 (1970), 97–109.

[13] M. Iosifescu, Finite Markov Processes and Their Applications. Wiley, Chichester & Ed. Tehnică, Bucharest, 1980; corrected republication by Dover, Mineola, N.Y., 2007.

[14] D.L. Isaacson and R.W. Madsen, Markov Chains: Theory and Applications. Wiley, New York, 1976; republication by Krieger, 1985.

[15] N. Madras, Lectures on Monte Carlo Methods. AMS, Providence, Rhode Island, 2002.

[16] N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller and E. Teller, Equation of state calculations by fast computing machines. J. Chem. Phys. 21 (1953), 1087–1092.

[17] U. Păun, G_{∆_1,∆_2} in action. Rev. Roumaine Math. Pures Appl. 55 (2010), 387–406.

[18] U. Păun, A hybrid Metropolis-Hastings chain. Rev. Roumaine Math. Pures Appl. 56 (2011), 207–228.

[19] G.O. Roberts and S.K. Sahu, Updating schemes, correlation structure, blocking and parametrization for the Gibbs sampler. J. Roy. Statist. Soc. Ser. B 59 (1997), 291–317.

[20] E. Seneta, Non-negative Matrices and Markov Chains, 2nd Edition. Springer-Verlag, Berlin, 1981; revised printing, 2006.

[21] G. Winkler, Image Analysis, Random Fields and Dynamic Monte Carlo Methods: A Mathematical Introduction, 2nd Edition. Springer-Verlag, Berlin, 2003.

Received 14 July 2016

Romanian Academy,
Department of Mathematics,
"Gheorghe Mihoc – Caius Iacob Institute"
of Mathematical Statistics
and Applied Mathematics,
Calea 13 Septembrie nr. 13,
050711 Bucharest 5, Romania
[email protected]