

A version of Geiringer-like theorem for decision making in the environments with randomness and incomplete information

Boris Mitavskiy, Department of Computer Science, Aberystwyth University, Aberystwyth, UK

Jonathan Rowe, School of Computer Science, University of Birmingham, Birmingham, UK, and

Chris Cannings, School of Mathematics and Statistics, University of Sheffield, Sheffield, UK

Abstract
Purpose – The purpose of this paper is to establish a version of a theorem that originated from population genetics and was later adopted in evolutionary computation theory, and that will lead to novel Monte-Carlo sampling algorithms which provably increase the AI potential.

Design/methodology/approach – In the current paper the authors set up a mathematical framework, and state and prove a version of a Geiringer-like theorem that is very well-suited for the development of Monte-Carlo sampling algorithms to cope with randomness and incomplete information when making decisions.

Findings – This work establishes an important theoretical link between classical population genetics, evolutionary computation theory and model free reinforcement learning methodology. Not only may the theory explain the success of the currently existing Monte-Carlo tree sampling methodology, but it also leads to the development of novel Monte-Carlo sampling techniques guided by a rigorous mathematical foundation.

Practical implications – The theoretical foundations established in the current work provide guidance for the design of powerful Monte-Carlo sampling algorithms in model free reinforcement learning, to tackle numerous problems in computational intelligence.

Originality/value – Establishing a Geiringer-like theorem with non-homologous recombination was a long-standing open problem in evolutionary computation theory. Apart from overcoming this challenge in a mathematically elegant fashion and establishing a rather general and powerful version of the theorem, this work leads directly to the development of novel provably powerful algorithms for decision making in environments involving randomness, hidden or incomplete information.

Keywords: Decision making, Programming and algorithm theory, Monte Carlo methods, Markov processes, Reinforcement learning, Partially observable Markov decision processes, Monte Carlo tree search, Geiringer theorem, Evolutionary computation theory, Markov chains

Paper type Research paper


This work has been sponsored by the EPSRC EP/D003/05/1 "Amorphous Computing" and EPSRC EP/I009809/1 "Evolutionary Approximation Algorithms for Optimization: Algorithm Design and Complexity Analysis" grants.


Received 21 October 2011; Revised 14 November 2011; Accepted 7 December 2011

International Journal of Intelligent Computing and Cybernetics, Vol. 5 No. 1, 2012, pp. 36-90. © Emerald Group Publishing Limited, 1756-378X. DOI 10.1108/17563781211208233


1. Introduction
A great number of questions in machine learning, computer game intelligence, control theory, and numerous other applications involve the design of algorithms for decision-making by an agent under a specified set of circumstances. In the most general setting, the problem can be described mathematically in terms of state and action pairs as follows. A state-action pair is an ordered pair of the form (s, ã) where ã = {a1, a2, …, an} is the set of actions (or moves, in case the agent is playing a game, for instance) that the agent is capable of taking when it is in the state s (in case of a game, a state might sometimes be referred to as a position). Due to randomness, hidden features, lack of memory, limitation of the sensor capabilities, etc. the state may be only partially observable by the agent. Mathematically this means that there is a function f : S → O (as a matter of fact, a random variable with respect to the unknown probability space structure on the set S) where S is the set of all states, which could be either finite or infinite, while O is the set (usually finite due to memory limitations) of observations, having the property that whenever f(s1) = f(s2) (i.e. whenever the agent cannot distinguish the states s1 and s2) the corresponding state-action pairs (s1, ã) and (s2, b̃) are such that ã = b̃ (i.e. the agent knows which actions it can possibly take based only on the observation it makes). The general problem of reinforcement learning is to decide which action is best suited given the agent's knowledge (that is, the observation that the agent has made as well as the agent's past experience). In computational settings "suitability" is naturally described in terms of a numerical reward value. In the probability theoretic sense the agent aims to maximize the expected reward (the expected reward considered as a random variable on the enormous and unknown conditional probability space of states given a specific observation and an action taken). Most common models such as partially observable Markov decision processes (POMDPs) assume that the next state and the corresponding numerical rewards depend stochastically only on the current observation and action. In a number of situations the immediate rewards after executing a single action are unknown. The so-called "model free" reinforcement learning methods, such as Monte Carlo techniques (i.e. algorithms based on repeated random sampling), are exploited to tackle problems of this type. In such algorithms a large number of rollouts (i.e. simulations or self-plays) are made and actions are assigned numerical payoffs that get updated dynamically (i.e. at every simulation of the algorithm). While the simulated self-plays started with a specific chosen action, say a, are entirely random, the action a itself is chosen with respect to a dynamically updated probability distribution which ensures the exploration versus exploitation balance: the technique known as upper confidence bounds (UCB). It may be worth emphasizing that the UCB methodology is based on a solid mathematical foundation (Agrawal, 1995; Kaelbling, 1994a, b; Auer, 2002). A combination of UCB with Monte Carlo sampling led to a tremendous breakthrough in computer Go performance level (Chaslot et al., 2006; Coulom, 2007; Gelly and Silver, 2011) and much research is currently under way to widen the applicability of the method.
Some of the particularly challenging and interesting directions involve decision making in environments (or games) involving randomness, hidden information and uncertainty, or in "continuous" environments where appropriate similarities on the set of states must be constructed due to runtime and memory limitations, and where action evaluation policies must be enhanced to cope with drastic changes in the payoffs as well as an enormous combinatorial explosion in the branching factor of the decision tree. In recent years a number of heuristic approaches have been proposed based on the existing probabilistic planning methodology. Although some of these newly developed methods have already achieved surprisingly powerful performance levels (Yoon et al., 2007; Zinkevich et al., 2007; Ciancarini and Favini, 2010; Van den Broeck et al., 2009), the authors believe there is still room for drastic improvement based on the rigorous mathematical theory that originated in classical population genetics (Geiringer, 1944) and was later adopted in traditional evolutionary computation theory (Poli et al., 2003; Mitavskiy and Rowe, 2005, 2006). Theorems of this type are known as Geiringer-like results and they address the limiting "frequency of occurrence" of various sets of "genes" as recombination is repeatedly applied over time. The main objective of the current work is to establish a rather general and powerful version of a Geiringer-like theorem with "non-homologous" recombination operators in the setting of Monte Carlo sampling. This theorem leads to simple dynamic algorithms that exploit the intrinsic similarity within the space of observations to increase exponentially the size of the already existing sample of rollouts, yielding significantly more informative action evaluation at very little or even no additional computational cost at all. The details of how this is done will be described in Sections 3 and 4. Due to space limitations, the actual algorithms will appear in sequel papers. As a matter of fact, we believe the interested readers may actually design such algorithms on their own after studying Sections 3 and 4. It may be worth pointing out that other researchers also emphasize the idea of exploiting intrinsic similarities within the state space when coping with POMDPs (Kee-Eung, 2008).
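For readers who want a concrete picture of the UCB methodology referred to above, the following is a minimal illustrative sketch (not an algorithm from this paper) of UCB1-style action selection driving random rollouts; the exploration constant c, the simulate callback and the toy environment are assumptions made purely for this illustration.

```python
import math
import random

def ucb1_select(counts, values, c=1.4):
    """Pick the index of the action maximizing the UCB1 score.

    counts[i] -- number of rollouts already started with action i
    values[i] -- sum of the payoffs of those rollouts
    """
    # Try every action at least once before applying the confidence bound.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    total = sum(counts)
    scores = [values[i] / counts[i] + c * math.sqrt(math.log(total) / counts[i])
              for i in range(len(counts))]
    return max(range(len(counts)), key=lambda i: scores[i])

def evaluate_actions(num_actions, num_rollouts, simulate):
    """Repeatedly choose an action by UCB1 and run one random rollout;
    simulate(a) returns the numerical payoff of an entirely random self-play."""
    counts = [0] * num_actions
    values = [0.0] * num_actions
    for _ in range(num_rollouts):
        a = ucb1_select(counts, values)
        payoff = simulate(a)
        counts[a] += 1
        values[a] += payoff
    return [values[i] / counts[i] for i in range(num_actions)]

if __name__ == "__main__":
    # Toy environment: action 2 is best on average.
    means = [0.2, 0.5, 0.8]
    print(evaluate_actions(3, 500, lambda a: random.random() < means[a]))
```

The balance between exploration and exploitation is governed entirely by the confidence term c·sqrt(log N / n_i); the rollouts themselves remain completely random, exactly as described above.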

2. Overview
Due to the interdisciplinary nature of this work the authors did their best to make the paper accessible on various levels to a potentially wide audience having diverse backgrounds and research interests ranging from practical software engineering to applied mathematics, theoretical computer science and high-level algorithm design based on solid mathematical foundations. Section 3 is essential for understanding the main idea of the paper. It provides the notation and sets up a rigorous mathematical framework, while the informal comments motivating the various notions introduced assist the reader's comprehension. Section 4 contains all the necessary definitions and concepts required to state and to explain the results of the article. It ends with the statement of the Geiringer-like theorem aimed at applications to decision making in environments with randomness and incomplete information where no immediate rewards are available. This is the central aim of the paper. A reader who is only after a calculus-level understanding with the aim of developing applications within an appropriate area of software engineering may be satisfied reading Section 4 and finishing their study at this point. Section 5 is devoted to establishing and deriving the main results of the article in a mathematically rigorous fashion. Clearly this is fundamentally important for understanding where these results come from and how one may modify them as needed. We strongly encourage all the interested readers to attempt understanding the entire Section 5. Subsection 1 does require familiarity with elementary group theory. A number of textbooks on this subject are available (Dummit and Foote, 1991) but all of them contain way more material than necessary to understand our work. To get the minimal necessary understanding, the reader is invited to look at the previous papers on finite population Geiringer theorems of the first two authors (Mitavskiy and Rowe, 2005, 2006). Finally, Section 6 is included only for the sake of strengthening the general finite-population Geiringer theorem to emphasize its validity for non-homogeneous time Markov chains, namely Theorem 23. Example 24 explains why this is of interest for the algorithm development. The material in Section 6 is entirely independent of the rest of the paper. One could read it either at the beginning or at the end. The authors suspect this theory is known in modern mathematics, but literature emphasizing Theorems 78 and 82 is virtually impossible to locate. Moreover, the mathematics behind these theorems is classical, general, simple and elegant. While Section 6 is probably not of any interest to software engineers (Theorem 23 may be thought of as strengthening the justification of the main ideas), a more mathematically inclined audience will find it enjoyable and easy to read.

3. Equivalence/similarity relation on the states
Let S denote the set of states (enormous but finite in this framework). Formally each state s̃ ∈ S is an ordered pair (s, ã) where ã is the set of actions an agent can possibly take when in the state s̃. Let ∼ be an equivalence relation on S. Without loss of generality we will denote every equivalence class by an integer 1, 2, …, i, … ∈ ℕ, so that each element of S is written as an ordered pair (i, a) where i ∈ ℕ and a ∈ A with A being some finite alphabet. With this notation (i, a) ∼ (j, b) iff i = j. Intuitively, S is the set of states and ∼ is the similarity relation on the states. For example, in a card game, if two states corresponding to the same player have cards of roughly equivalent value (for that specific game) and their opponent's cards are unknown (and there might be some more hidden and random effects), then the two states will be considered equivalent under ∼. We will also require that for two equivalent states s̃1 = (s1, ã1) and s̃2 = (s2, ã2) under ∼ there are bijections f1 : ã1 → ã2 and f2 : ã2 → ã1. For the time being, these bijections should be obvious from the representation of the environment (and actions) and reflect the similarity between these actions.

Remark 1. In theory we want the functions f1 and f2 to be bijections and inverses of one another for the theoretical model to be perfectly rigorous, but in practice there should probably be no strict requirement on that. In fact, we believe that in practice one may even want to relax the assumption that ∼ is an equivalence relation.

As described in Sections 1 and 2, the most challenging question when applying an MCT type of algorithm to deal with randomness and incomplete information, or simply with a large branching factor of the game tree, is to evaluate the actions under consideration making the most out of the sample of independent rollouts. Quite surprisingly, very powerful programs have already been developed and tested in practice against human players (Zinkevich et al., 2007; Ciancarini and Favini, 2010; Van den Broeck et al., 2009; Xie et al., 2010); however, the action-evaluation algorithms used in these programs are purely heuristic and no theoretical foundation is presented to explain their success. In the next section we will set up the stage to state the main result of this paper, which motivates new algorithms for evaluating actions (or moves) at the chance nodes and hopefully will provide some understanding of the success of the already existing techniques in future research.

4. Mathematical framework, notion of crossover/recombination and statement of the finite population Geiringer theorem for action evaluation
4.1 Basic mathematical framework and definitions
Definition 2. Suppose we are given a chance node s̃ = (s, ã) and a sequence {a_i}_{i=1}^b of actions in ã (it is possible that a_i = a_j for i ≠ j). We may then call s̃ a root state, or a state in question; the sequence {a_i}_{i=1}^b, the sequence of moves (actions) under evaluation; and the set of moves A = {a | a = a_i for some i with 1 ≤ i ≤ b}, the set of actions (or moves) under evaluation.

Definition 3. A rollout with respect to the state in question s̃ = (s, ã) and an action a ∈ ã is a sequence of states following the action a and ending with a terminal label f ∈ Σ, where Σ is an arbitrary set of labels[1], which looks like (a, s1, s2, …, s_{t−1}, f). For technical reasons which will become obvious later we will also require that s_i ≠ s_j for i ≠ j (it is possible and common to have s_i ∼ s_j though). We will say that the total number of states in a rollout (which is t − 1 in the notation of this definition) is the height of the rollout.

Remark 4. Notice that in Definition 3 we included only the initial move a made at the state in question (see Definition 2), which is the move under evaluation (see Definition 2). The moves between the intermediate states are chosen randomly and are not evaluated, so that there is no reason to consider them.

Remark 5. In Section 3 we have introduced a convenient notation for states to emphasize their respective equivalence classes. With such notation a typical rollout would appear as a sequence (a, (i1, a1), (i2, a2), …, (i_{t−1}, a_{t−1}), f) with i_j ∈ ℕ while a_i ∈ A. According to the requirement in Definition 3, i_j = i_k for j ≠ k implies a_k ≠ a_j. A single rollout provides rather little information about an action, particularly due to the combinatorial explosion in the branching factor of possible moves of the player and the opponents. Normally a large, yet comparable with total resource limitations, number of rollouts is thrown at evaluating the actions at various positions. The challenging question which the current work addresses is how one can take full advantage of the parallel sequence of rollouts. Since the main idea is motivated by the Geiringer theorem which originated in population genetics (Geiringer, 1944) and later has also been involved in evolutionary computation theory (Poli et al., 2003; Mitavskiy and Rowe, 2005, 2006), we shall exploit the terminology of the evolutionary computation community here.

Definition 6. Given a state in question s̃ = (s, ã) and a sequence {a_i}_{i=1}^b of moves under evaluation (in the sense of Definition 2), a population P with respect to the state s̃ = (s, ã) and the sequence {a_i}_{i=1}^b is a sequence of rollouts P = {r^i_{l(i)}}_{i=1}^b where r^i = (a_i, s^i_1, s^i_2, …, s^i_{l(i)−1}, f^i). Just as in Definition 3 we will assume that s^i_k ≠ s^j_q whenever i ≠ j (which, in accordance with Definition 3, is as strong as requiring that s^i_k ≠ s^j_q whenever i ≠ j or k ≠ q)[2]. Moreover, we also assume that the terminal labels f^i are all distinct within the same population, i.e. for i ≠ j the terminal labels f^i ≠ f^j[3]. In a very special case when s^i_j ∼ s^q_k implies j = k we will say that the population P is homologous. Loosely speaking, a homologous population is one where equivalent states cannot appear at different "heights".

Remark 7. Each rollout r^i_{l(i)} in Definition 6 starts with the corresponding move a_i of the sequence of moves under evaluation (see Definition 2). It is clear that if one were to permute the rollouts without changing the actual sequences of states, the corresponding populations should provide identical values for the corresponding actions under evaluation. In fact, most authors in evolutionary computation theory (Vose, 1999) do assume that such populations are equivalent and deal with the corresponding equivalence classes of multisets corresponding to the individuals (these are sequences of rollouts). Nonetheless, when dealing with finite-population Geiringer-like theorems it is convenient, for technical reasons which will become clear when the proof is presented (Mitavskiy and Rowe, 2005, 2006), to assume the ordered multiset model, i.e. the populations are considered formally distinct when the individuals are permuted. Incidentally, ordered multiset models are useful for other types of theoretical analysis in Schmitt (2001, 2004).

Example 8. A typical population with the convention as in Remark 7 might look as in Figure 1. The height of the leftmost rollout in Figure 1 would then be 5 since it contains five states. The reader can easily see that the heights of the rollouts in this population, read from left to right, are 5, 4, 3, 5, 3, 1 and 4, respectively. Clearly, the total number of states within the population is the sum of the heights of all the rollouts in the population. In fact, this very simple observation is rather valuable when establishing the main result of the current article, as will become clear in Subsection 4 of Section 5.
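As a concrete (and purely illustrative) rendering of Definitions 3 and 6, one may encode a rollout as its initial action, its list of (equivalence class, letter) states and its terminal label. The Python names below (Rollout, height) are our own choices, and the sample rollout is only a schematic stand-in for the leftmost rollout of Figure 1, not its exact contents.

```python
from dataclasses import dataclass
from typing import List, Tuple

State = Tuple[int, str]          # (equivalence class, distinguishing letter), e.g. (1, "a")

@dataclass
class Rollout:
    action: str                  # move under evaluation the rollout starts with
    states: List[State]          # intermediate states, all formally distinct
    terminal: str                # terminal label, e.g. "f1"

    def height(self) -> int:
        # Definition 3: the height is the number of states in the rollout.
        return len(self.states)

# A schematic guess at the leftmost rollout of Figure 1: action alpha, five states, label f1.
r1 = Rollout("alpha", [(1, "a"), (5, "a"), (2, "a"), (6, "a"), (7, "a")], "f1")

# A population (Definition 6) is simply a sequence of rollouts sharing the root state.
population = [r1]
print(r1.height(), sum(r.height() for r in population))   # 5 and the total number of states
```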

The main idea is that the random actions taken at the equivalent states should be interchangeable, since they are chosen somehow at random during the simulation stage of the MCT algorithm. In the language of evolutionary computing, such a swap of moves is called a crossover. Due to randomness or incomplete information (together with the equivalence relation which can be defined using the expert knowledge of a specific game being analyzed), in order to obtain the most out of a sample (population in our language) of the parallel rollouts it is desirable to explore all possible populations obtained by making various swaps of the corresponding rollouts at the equivalent positions. Computationally this task seems expensive if one were to run the type of genetic programming (GP) described precisely below, yet it turns out that we can predict exactly what the limiting outcome of this "mixing procedure" would be[4]. We now continue with the rigorous definitions of crossover.

Figure 1. An example of a population consisting of seven rollouts, rooted at the state in question during the simulation and evaluation stages of MCT. Notes: equivalence classes of states are denoted by distinct numbers, so that the letters written next to these numbers distinguish the individual states as in Remark 5; distinct actions under evaluation (see Definition 2) are denoted by different letters of the Greek alphabet.


The representation of rollouts suggested in Remark 5 is convenient for defining crossover operators on two given rollouts. We will introduce two crossover operations below.

Definition 9. Given two rollouts r1 = (α1, (i1, a1), (i2, a2), …, (i_{t(1)−1}, a_{t(1)−1}), f) and r2 = (α2, (j1, b1), (j2, b2), …, (j_{t(2)−1}, b_{t(2)−1}), g) of lengths t(1) and t(2), respectively, that share no state in common (i.e. as in Definition 3), there are two (non-homologous) crossover (or recombination) operators we introduce here. For an equivalence class label m ∈ ℕ and letters c, d ∈ A define the one-point non-homologous crossover transformation χ_{m,c,d}(r1, r2) = (t1, t2) where:

t1 = (α1, (i1, a1), (i2, a2), …, (i_{k−1}, a_{k−1}), (j_q, b_q), (j_{q+1}, b_{q+1}), …, (j_{t(2)−1}, b_{t(2)−1}), g)

and:

t2 = (α2, (j1, b1), (j2, b2), …, (j_{q−1}, b_{q−1}), (i_k, a_k), (i_{k+1}, a_{k+1}), …, (i_{t(1)−1}, a_{t(1)−1}), f)

if [i_k = j_q = m and either (a_k = c and b_q = d) or (a_k = d and b_q = c)], and (t1, t2) = (r1, r2) otherwise. Likewise, we introduce a single position swap crossover ν_{m,c,d}(r1, r2) = (v1, v2) where:

v1 = (α1, (i1, a1), (i2, a2), …, (i_{k−1}, a_{k−1}), (j_q, b_q), (i_{k+1}, a_{k+1}), …, (i_{t(1)−1}, a_{t(1)−1}), f)

while:

v2 = (α2, (j1, b1), (j2, b2), …, (j_{q−1}, b_{q−1}), (i_k, a_k), (j_{q+1}, b_{q+1}), …, (j_{t(2)−1}, b_{t(2)−1}), g)

if [i_k = j_q = m and either (a_k = c and b_q = d) or (a_k = d and b_q = c)], and (v1, v2) = (r1, r2) otherwise. In addition, a single swap crossover is defined not only on pairs of rollouts but also on a single rollout, swapping equivalent states in the analogous manner. If:

r = (α, (i1, a1), (i2, a2), …, (i_{j−1}, a_{j−1}), (i_j, a_j), (i_{j+1}, a_{j+1}), …, (i_{k−1}, a_{k−1}), (i_k, a_k), (i_{k+1}, a_{k+1}), …, (i_{t(1)−1}, a_{t(1)−1}), f)

and [i_j = i_k = m and either (a_j = c and a_k = d) or (a_j = d and a_k = c)], then:

ν_{m,c,d}(r) = (α, (i1, a1), (i2, a2), …, (i_{j−1}, a_{j−1}), (i_j, a_k), (i_{j+1}, a_{j+1}), …, (i_{k−1}, a_{k−1}), (i_k, a_j), (i_{k+1}, a_{k+1}), …, (i_{t(1)−1}, a_{t(1)−1}), f)

and, of course, ν_{m,c,d}(r) fixes r (i.e. ν_{m,c,d}(r) = r) otherwise.

Remark 10. Notice that Definition 9 makes sense thanks to the assumption in Definition 3 that no rollout contains an identical pair of states.

Remark 11. Intuitively, performing a one-point crossover means that the corresponding player might have changed their strategy in a similar situation due to randomness, while a single swap crossover corresponds to the player not knowing the exact state they are in due to incomplete information, for instance.
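A minimal sketch of the two operators of Definition 9, acting on rollouts encoded as (action, list of (class, letter) states, terminal label) tuples; the encoding, the helper _match and the toy rollouts are illustrative assumptions only, and the single-rollout variant of ν is deferred to the population-level sketch given after Definition 12.

```python
from typing import List, Tuple

State = Tuple[int, str]                       # (equivalence class, letter)
Rollout = Tuple[str, List[State], str]        # (initial action, states, terminal label)

def _match(states1, states2, m, c, d):
    """Find positions k, q with class m whose letters are {c, d} in either order."""
    for k, (i, a) in enumerate(states1):
        for q, (j, b) in enumerate(states2):
            if i == j == m and {a, b} == {c, d}:
                return k, q
    return None

def chi(r1: Rollout, r2: Rollout, m: int, c: str, d: str):
    """One-point non-homologous crossover of Definition 9: exchange the
    subrollouts rooted at the matched states, together with the terminal labels."""
    (a1, s1, f), (a2, s2, g) = r1, r2
    pos = _match(s1, s2, m, c, d)
    if pos is None:                            # no admissible cut: fix the pair
        return r1, r2
    k, q = pos
    return (a1, s1[:k] + s2[q:], g), (a2, s2[:q] + s1[k:], f)

def nu(r1: Rollout, r2: Rollout, m: int, c: str, d: str):
    """Single position swap crossover of Definition 9: exchange only the matched states."""
    (a1, s1, f), (a2, s2, g) = r1, r2
    pos = _match(s1, s2, m, c, d)
    if pos is None:
        return r1, r2
    k, q = pos
    return (a1, s1[:k] + [s2[q]] + s1[k + 1:], f), (a2, s2[:q] + [s1[k]] + s2[q + 1:], g)

if __name__ == "__main__":
    r1 = ("alpha", [(1, "a"), (5, "a"), (1, "c")], "f1")
    r2 = ("beta",  [(2, "b"), (1, "d"), (7, "b")], "f2")
    print(chi(r1, r2, 1, "c", "d"))   # tails after (1, c) and (1, d) are exchanged
    print(nu(r1, r2, 1, "c", "d"))    # only the states (1, c) and (1, d) are exchanged
```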

Just as in the case of defining crossover operators for pairs of rollouts, thanks to the assumption that all the states in a population of rollouts are formally distinct (see Definition 6), it is easy to extend Definition 9 to entire populations of rollouts. In view of Remark 11, to get the most informative picture out of the sequence of parallel rollouts one would want to run the GP routine without selection and mutation, using only the crossover operators specified above, for as long as possible, and then, in order to evaluate a certain move a, collect the weighted average of the terminal values (i.e. the values assigned to the terminal labels via some rational-valued assignment function) of all the rollouts starting with the move a which ever occurred in the process. We now describe precisely what the process is and give an example.

Definition 12. Given a population P and a transformation of the form χ_{i,x,y}, there exists at most one pair of distinct rollouts in the population P, namely the pair of rollouts r1 and r2 such that the state (i, x) appears in r1 and the state (i, y) appears in r2. If such a pair exists, then we define the recombination transformation χ_{i,x,y}(P) = P′ where P′ is the population obtained from P by replacing the pair of rollouts (r1, r2) with the pair χ_{i,x,y}(r1, r2) as in Definition 9. In any other case we do not make any change, i.e. χ_{i,x,y}(P) = P. The transformation ν_{i,x,y}(P) is defined in an entirely analogous manner with one more amendment: if the states (i, x) and (i, y) appear within the same individual (rollout), call it:

r = (α, (j1, a1), (j2, a2), …, (i, x), …, (i, y), …, (j_{t(1)−1}, a_{t(1)−1}), f),

and the state (i, x) precedes the state (i, y), then these states are interchanged, obtaining the new rollout:

r′ = (α, (j1, a1), (j2, a2), …, (i, y), …, (i, x), …, (j_{t(1)−1}, a_{t(1)−1}), f).

Of course, it could be that the state (i, y) precedes the state (i, x) instead, in which case the definition would be analogous: if:

r = (α, (j1, a1), (j2, a2), …, (i, y), …, (i, x), …, (j_{t(1)−1}, a_{t(1)−1}), f)

then replace the rollout r with the rollout:

r′ = (α, (j1, a1), (j2, a2), …, (i, x), …, (i, y), …, (j_{t(1)−1}, a_{t(1)−1}), f).

Remark 13. It is very important for the main theorem of our paper that each of the crossover transformations χ_{i,x,y} and ν_{i,x,y} is a bijection on their common domain, that is, the set of all populations of rollouts at the specified chance node. As a matter of fact, the reader can easily verify by direct computation from Definitions 12 and 9 that each of the transformations χ_{i,x,y} and ν_{i,x,y} is an involution on its domain, i.e. ∀ i, x, y we have χ²_{i,x,y} = ν²_{i,x,y} = 1 where 1 is the identity transformation.
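To illustrate Definition 12 and the involution property of Remark 13, here is a sketch (our own code, not the authors' implementation) of the population-level single position swap ν_{i,x,y}; the extension of χ_{i,x,y} is analogous, except that it acts only when the two states lie in distinct rollouts and exchanges the whole subrollouts rooted at them.

```python
from typing import List, Tuple

State = Tuple[int, str]
Rollout = Tuple[str, List[State], str]
Population = List[Rollout]

def _find(pop: Population, state: State):
    """Locate the unique occurrence of `state`; the states of a population are
    formally distinct (Definition 6), so at most one position can match."""
    for r_idx, (_, states, _) in enumerate(pop):
        for s_idx, s in enumerate(states):
            if s == state:
                return r_idx, s_idx
    return None

def nu_pop(pop: Population, i: int, x: str, y: str) -> Population:
    """Population-level single position swap nu_{i,x,y} of Definition 12."""
    px, py = _find(pop, (i, x)), _find(pop, (i, y))
    if px is None or py is None:
        return pop                                    # nothing to swap: identity
    new = [(a, list(s), f) for (a, s, f) in pop]      # copy, leave the argument intact
    (r1, k), (r2, q) = px, py
    new[r1][1][k], new[r2][1][q] = new[r2][1][q], new[r1][1][k]
    return new

if __name__ == "__main__":
    P = [("alpha", [(1, "a"), (6, "a")], "f1"),
         ("beta",  [(6, "b"), (3, "a")], "f2")]
    Q = nu_pop(P, 6, "a", "b")                        # swaps (6, a) and (6, b)
    print(Q)
    print(nu_pop(Q, 6, "a", "b") == P)                # Remark 13: nu is an involution -> True
```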

The examples below illustrate pictorially the important extension of recombination operators to arbitrary populations.

Example 14. Continuing with Example 8, suppose we were to apply the recombination (crossover) operator χ_{1,c,d} to the population of seven rollouts pictured in Figure 1. The unique location of the states (1, c) and (1, d) in the population is emphasized by the boxes in Figure 2. After applying the crossover operator χ_{1,c,d} we obtain the population pictured in Figure 3.

On the other hand, applying the crossover transformation ν_{1,c,d} to the population in Figures 1 and 2 results in the population pictured in Figure 4.


Example 15. Consider now the population Q pictured in Figure 5. Suppose we apply the transformations χ_{6,a,b} and ν_{6,a,b} to the population Q.

The states (6, a) and (6, b) are enclosed within the dashed squares in Figure 6. Since these states appear within the same rollout, according to Definition 12, the crossover transformation χ_{6,a,b} fixes the population Q (i.e. χ_{6,a,b}(Q) = Q). On the other hand, the population ν_{6,a,b}(Q) is pictured in Figure 7.

4.2 Example: a specific card game "Écarté"
An illustrative example comes from the two-player card game Écarté, popular in France in the nineteenth century. The game proceeds in two phases: the discard phase and the play. We will be concerned with the discard phase, in which each player tries to obtain the best possible hand. Hands are "evaluated" during the second phase by being played against each other (similarly to games in the whist family).

Figure 2. The unique states (1, c) and (1, d) in the population pictured in Figure 1 are enclosed in dashed squares.

Figure 3. The subrollouts rooted at the states (1, c) and (1, d) in the population pictured in Figure 2 are pruned and then swapped.


Each player is dealt five cards. The non-dealer can then propose an exchange of cards. If the dealer agrees, both players discard a (non-zero) number of cards and replace them with new cards from the deck. Further rounds of exchange can take place, as long as the non-dealer proposes and the dealer accepts.

One can see in this discard phase of the game three important elements. First, there is a considerable amount of missing information, for example regarding the opponent's cards and the random order of the remaining cards in the deck. Second, it is possible for a sequence of exchanges to return a player to a state equivalent to a previous one. For example, if a player holds the nine of clubs and the remaining cards are spades, an exchange of the nine of clubs for the nine of diamonds leaves the hand essentially unchanged. A more subtle situation would be if the nine of clubs were exchanged for the ten of clubs. Again the hand is essentially unchanged because the set of cards in play that each card can defeat is the same. Third, it is not known how long the exchange phase will last. The decision to terminate this phase is one of the available strategies. Consequently, rollouts have to accommodate variable length plays with potentially repeating equivalent states, and a large degree of missing information.

Figure 4. The uniquely positioned labels (1, c) and (1, d), which are enclosed within the dashed squares in Figure 2, are swapped.

Figure 5. A population of rollouts Q.

Figure 6. The uniquely positioned labels (6, a) and (6, b) are enclosed within the dashed squares.

Figure 7. The uniquely positioned labels (6, a) and (6, b), which are enclosed within the dashed squares in Figure 6, are interchanged to obtain the population ν_{6,a,b}(Q).

4.3 The main idea of the current work
Suppose a certain initial population of rollouts has been simulated during a simulation stage of the MCT. Although the rollouts have been simulated independently, the similarity classes of states encountered during the simulations are likely to repeat in a number of settings (such as, for instance, the French card game Écarté described in the previous subsection). Assume now we were to run a GP routine performing swaps (crossovers/recombinations) of rollouts in accordance with Definitions 9 and 12 without any selection or mutation. Clearly the swaps correspond to potential populations of rollouts that could have been simulated just as likely, provided that something different in the environment (and/or the opponent's hand) had taken place. Intuitively speaking, if we were to run the GP for a longer and longer time, we would obtain significantly more enriched information about potential outcomes and hence improve the quality of the payoff estimates. The central idea of this article is that one can actually anticipate the long term (or limiting) frequency of occurrence of various rollouts provided such a GP routine has run. This type of prediction is what the Geiringer-like theorems are about. In Mitavskiy and Rowe (2006) a rather general, simple and powerful theorem (named the "finite population Geiringer theorem") has been established in the setting of Markov chains (populations being the states of the Markov chain: more on this in the next section) which tells us that under certain conditions, satisfied by most recombination operators, the stationary distribution of this Markov chain is uniform. Furthermore, a methodology has been developed to derive what we call "Geiringer-like" theorems that address the limiting frequency of occurrence of various schemata (in our case subsets of rollouts: more on this in the upcoming Subsection 6). Based on such a theorem it is not hard to invent efficient parallel dynamic algorithms that estimate the expected action payoff values based on the sample obtained after the entire "infinite time" run of the GP routine described above. It may be worthwhile to mention that "homologous recombination" versions of a Geiringer-like theorem (translating into the setting of MCT, this would mean that the similarity classes may occur only at the same heights of the corresponding rollouts) have been obtained previously in the setting of GP using the methodology appearing in Mitavskiy and Rowe (2005, 2006). A version of the Geiringer-like theorem with non-homologous recombination (Theorem 40) remained an open question and will be established in the current article.
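The mixing procedure just described can be sketched in a few lines. The code below is an illustration under assumed encodings, not one of the algorithms announced for the sequel papers: it repeatedly applies randomly chosen one-point crossovers χ_{i,x,y} to a population and, after every step, averages the terminal values of the rollouts by the action they currently start with. The terminal_value table and the uniform choice of swaps are assumptions made for this sketch only.

```python
import random
from collections import defaultdict

# A rollout is (initial action, [(class, letter), ...], terminal label).
def chi_pop(pop, i, x, y):
    """Population-level one-point crossover chi_{i,x,y} (Definitions 9 and 12):
    if (i, x) and (i, y) occur in two distinct rollouts, the subrollouts rooted
    at them (together with the terminal labels) are exchanged."""
    loc = {}
    for ri, (_, states, _) in enumerate(pop):
        for si, s in enumerate(states):
            if s in ((i, x), (i, y)):
                loc[s] = (ri, si)
    if len(loc) < 2:
        return pop
    (r1, k), (r2, q) = loc[(i, x)], loc[(i, y)]
    if r1 == r2:                      # chi fixes the population in this case
        return pop
    a1, s1, f1 = pop[r1]
    a2, s2, f2 = pop[r2]
    new = list(pop)
    new[r1] = (a1, s1[:k] + s2[q:], f2)
    new[r2] = (a2, s2[:q] + s1[k:], f1)
    return new

def estimate_payoffs(pop, terminal_value, steps=10000, seed=0):
    """Run the crossover-only mixing and average terminal values per initial action."""
    rng = random.Random(seed)
    totals, counts = defaultdict(float), defaultdict(int)
    letters = defaultdict(set)                     # letters seen in each equivalence class
    for _, states, _ in pop:
        for i, x in states:
            letters[i].add(x)
    swaps = [(i, x, y) for i, xs in letters.items() for x in xs for y in xs if x < y]
    for _ in range(steps):
        if swaps:
            pop = chi_pop(pop, *rng.choice(swaps))
        for action, _, label in pop:
            totals[action] += terminal_value[label]   # value assigned to the terminal label
            counts[action] += 1
    return {a: totals[a] / counts[a] for a in totals}
```

In a practical MCT setting the accumulated averages would play the role of the action payoffs estimated from the (exponentially enlarged) virtual sample of rollouts obtained by mixing.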

4.4 Specializing the finite population Geiringer theorem to the setting of Monte Carlo sampling for POMDPs
Definition 16. Let n = {1, 2, …, n} denote the set of the first n natural numbers. Consider any probability distribution μ on the set of all finite sequences of crossover transformations:

F = ∪_{n=1}^∞ ({χ_{i,x,y} | x, y ∈ A and i ∈ ℕ} ∪ {ν_{i,x,y} | x, y ∈ A and i ∈ ℕ})^n ∪ {1}

which assigns a positive probability to the singleton sequences[5] and to the identity element 1, i.e. to every element of the subset:

S = {1} ∪ ({χ_{i,x,y} | x, y ∈ A and i ∈ ℕ} ∪ {ν_{i,x,y} | x, y ∈ A and i ∈ ℕ})^1.

Given a sequence of transformations Q̃ = {Q_{i(j),x(j),y(j)}}_{j=1}^n where each Q is either χ or ν (i.e. ∀ j either Q_{i(j),x(j),y(j)} = χ_{i(j),x(j),y(j)} or Q_{i(j),x(j),y(j)} = ν_{i(j),x(j),y(j)}), consider the transformation:

Q̃ = Q_{i(n),x(n),y(n)} ∘ Q_{i(n−1),x(n−1),y(n−1)} ∘ … ∘ Q_{i(2),x(2),y(2)} ∘ Q_{i(1),x(1),y(1)}

on the set of all populations starting at the specified chance node, obtained by composing all the transformations in the sequence Q̃. The identity element 1 stands for the identity map on the set of all possible populations of rollouts. Now define the Markov transition matrix M_μ on the set of all populations of rollouts (see Definition 6 and Remark 5) as follows: given populations X and Y of the same size k, the probability of obtaining the population Y from the population X after performing a single crossover stage is p_{X→Y} = μ(S_{X→Y}) where:

S_{X→Y} = {G | G ∈ F and T(G)(X) = Y}

and where T(G) = Q̃ if G is a sequence of transformations Q̃, while T(G) is the identity map if G = 1.

Example 17 below illustrates the first part of Definition 16.

Example 17. Consider the sequence of five recombination transformations:

Q̃ = (χ_{1,c,d}, χ_{2,c,e}, χ_{5,a,b}, χ_{1,a,b}, χ_{2,a,b}).

According to Definition 16 the sequence Q̃ gives rise to the composed recombination transformation:

Q̃ = χ_{2,a,b} ∘ χ_{1,a,b} ∘ χ_{5,a,b} ∘ χ_{2,c,e} ∘ χ_{1,c,d}.

The reader may verify as a small exercise that Q̃(P) = Q where P is the population shown in Figure 1 while the population Q is the one appearing in Figure 5. If one were to append the recombination transformation ν_{6,a,b} to the sequence of transformations Q̃, obtaining the sequence:

Q̃_1 = (χ_{1,c,d}, χ_{2,c,e}, χ_{5,a,b}, χ_{1,a,b}, χ_{2,a,b}, ν_{6,a,b}),

then, by associativity of composition, we have Q̃_1 = ν_{6,a,b} ∘ Q̃, so that Q̃_1(P) = ν_{6,a,b}(Q̃(P)) = ν_{6,a,b}(Q) where Q, as above, is the population shown in Figure 5; according to Example 15, the population Q̃_1(P) is then the one appearing in Figure 7.
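The composition step of Definition 16 and Example 17 is simply function composition applied in the order in which the sequence lists the transformations. The following sketch (ours, with ν as the only elementary operator and a uniform random choice of sequence standing in for the measure μ) makes this explicit.

```python
import random
from functools import reduce

def compose(transformations):
    """Definition 16: a sequence (Q1, Q2, ..., Qn) acts as the composition
    Qn o ... o Q2 o Q1, i.e. Q1 is applied to the population first."""
    return lambda population: reduce(lambda pop, q: q(pop), transformations, population)

def single_swap(i, x, y):
    """nu_{i,x,y} on a population of rollouts (action, [(class, letter), ...], label)."""
    def apply(pop):
        loc = {}
        for ri, (_, states, _) in enumerate(pop):
            for si, s in enumerate(states):
                if s in ((i, x), (i, y)):
                    loc[s] = (ri, si)
        if len(loc) < 2:
            return pop
        new = [(a, list(s), f) for (a, s, f) in pop]
        (r1, k), (r2, q) = loc[(i, x)], loc[(i, y)]
        new[r1][1][k], new[r2][1][q] = new[r2][1][q], new[r1][1][k]
        return new
    return apply

if __name__ == "__main__":
    P = [("alpha", [(1, "a"), (6, "a")], "f1"),
         ("beta",  [(6, "b"), (1, "c")], "f2")]
    # One crossover stage: draw a random sequence of elementary swaps and apply it.
    elementary = [single_swap(1, "a", "c"), single_swap(6, "a", "b")]
    sequence = [random.choice(elementary) for _ in range(3)]
    print(compose(sequence)(P))
```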

Remark 18. Evidently the map T : F → 𝒫^𝒫 introduced at the end of Definition 16 can be regarded as a random variable on the set F described at the beginning of Definition 16, where 𝒫 denotes the set of all populations of rollouts containing k individuals, so that 𝒫^𝒫 is the set of all endomorphisms (functions with the same domain and codomain) on 𝒫, and the probability measure μ_T on 𝒫^𝒫 is the "pushforward" measure induced by T, i.e. μ_T(S) = μ(T^{−1}(S))[6]. To alleviate the complexity of verbal (or written) presentation we will usually abuse the language and use the set F in place of 𝒫^𝒫, so that a transformation F ∈ 𝒫^𝒫 is identified with the entire set T^{−1}(F) ⊆ F. For example, if we write μ({F | F ∈ F and F(X) = Y}) we mean μ({G | G ∈ F and T(G)(X) = Y}).


It may be worth pointing out that the set T^{−1}(F) is not necessarily a singleton, i.e. the map T is not one-to-one, as Example 19 below demonstrates.

Example 19. Consider any i ≠ j and any a, b, c and d ∈ A. Notice that the transformations ν_{i,a,b} and ν_{j,c,d} commute, since the order in which elements of distinct equivalence classes are interchanged within the same population of rollouts is irrelevant. Thus, the sequences x1 = (ν_{i,a,b}, ν_{j,c,d}) and x2 = (ν_{j,c,d}, ν_{i,a,b}) induce exactly the same transformation Q on the set of populations of rollouts. Here is another very important example. Notice that every transformation Q_{i,a,b}, where Q could be either χ or ν, is an involution on the set of populations of rollouts, i.e. Q_{i,a,b} ∘ Q_{i,a,b} = e where e is the identity map, since performing a swap at identical positions twice brings back the original population of rollouts. Therefore, any ordered pair (Q_{i,a,b}, Q_{i,a,b}) of repeated transformations induces exactly the same transformation as the symbol 1, namely the identity transformation on the populations of rollouts.

One more remark is in order here.

Remark 20. Notice that any concatenation of sequences in F (which is what corresponds to the composition of the corresponding functions) stays in F. In other words, the family of maps induced by F is closed under composition.

Of course, running the Markov process induced by the transition matrix in Definition 16 infinitely long is impossible, but fortunately one does not have to do it. The central idea of the current paper is that the limiting outcome as time goes to infinity can be predicted exactly using the Geiringer-like theory, and the desired evaluations of moves can be well-estimated at rather little computational cost in most cases. As pointed out in Example 19 above, each of the transformations Q_{i,a,b} is an involution and, in particular, is bijective. Therefore, every composition of these transformations is a bijection as well. We deduce, thereby, that the family F consists of bijections only (see Remark 18). The finite population Geiringer theorem (Mitavskiy and Rowe, 2006) now applies and tells us the following.

Definition 21. Given populations P and Q of rollouts at a specified state in question as in Definition 6 (see also Remark 5), we say that P ∼ Q if there is a transformation F ∈ F such that Q = F(P).

Theorem 22 (the Geiringer theorem for POMDPs). The relation ∼ introduced in Definition 21 is an equivalence relation. Given a population P of rollouts at a specified state in question, the restriction of the Markov transition matrix introduced in Definition 16 to the equivalence class [P] of the population P under ∼ is a well-defined Markov transition matrix which induces an irreducible and aperiodic Markov chain on [P], and the unique stationary distribution of this Markov chain is the uniform distribution on [P].

In fact, thanks to the application of the classical contraction mapping principle[7] described in Section 6 of the current paper (namely Theorem 82; the interested reader is welcome to familiarize themselves with Section 6, although this is not essential for understanding the main objective of the paper), the stationary distribution is uniform in a rather strong sense described below.

Theorem 23. Suppose we are given finitely many probability measures μ1, μ2, …, μN on the collection of sequences of transformations F as in Definition 16, where each probability measure μi satisfies the conditions of Definition 16. Denote by Mi the corresponding Markov transition matrix induced by the probability measure μi. Let M = {Mi}_{i=1}^N. Now consider the following stochastic process {(F_n, X_n)}_{n=0}^∞ on the state space M × [P], where [P] is the equivalence class of the initial population of rollouts at the state in question as in Theorem 22: F_n is an arbitrary stochastic process (not necessarily Markovian) on M which satisfies the following requirements.

The random variable F_n is independent of the random variables X_n, X_{n+1}, …   (1)

The random variable F_0 is arbitrary while X_0 = P (recall that P is the initial population of rollouts at the node in question) with probability 1. ∀ n ∈ ℕ the probability distribution of the random variable X_n is:

Prob(X_n = ·) = F_{n−1}(ω) · Prob(X_{n−1} = ·).   (2)

It then follows that lim_{n→∞} Prob(X_n = ·) = π where π is the uniform distribution on [P].

We now pause and take some time to interpret Theorem 23 intuitively. Example 24 below illustrates a scenario where Theorem 23 applies.

Example 24. Consider the set S of all finite sequences of populations in the equivalence class [P] of the initial population P which start with the initial population P (notice that S is a countably infinite set since [P] is a finite set). Intuitively, each sequence in S represents prior history. Every sequence P̃ = P, P1, P2, P3, …, Pt is associated with a probability measure h(P̃) on the set of populations [P]. Suppose further that to every population Q ∈ [P] we assign a probability measure μ_Q on the family of recombination transformations induced by F, where each measure μ_Q satisfies Definition 16. Intuitively, each probability distribution μ_Q might represent the probability that the swaps (or sequences of swaps) are reasonable to perform in a specific population regardless of the knowledge of the prior history or experience in playing the game, for instance. Starting with the initial population P we apply the probability measure h(P) (here P denotes a singleton sequence) to obtain a population Q1 ∈ [P]. Independently, we now apply the Markov transition matrix induced by the probability measure μ_P to obtain another population P1 ∈ [P]. Next, we select a population Q2 with respect to the probability measure h(P, P1) and, again independently, apply the Markov transition matrix induced by μ_{Q1} to the population P1 to obtain a population P2 in the next generation. Continuing recursively, let us say that after time t ∈ ℕ we obtained a population Qt at step t and a sequence of populations P̃_t = P, P1, P2, …, Pt. Select a population Q_{t+1} with respect to the probability measure h(P̃_t). Independently select a population P_{t+1} via an application of the Markov transition matrix induced by the probability measure μ_{Qt} to the population Pt. Theorem 23 now applies and tells us that in the limit as t → ∞ we are equally likely to encounter any population Q ∈ [P] regardless of the choice of the measures involved, as long as the probability measures μ_Q satisfy Definition 16. A word of caution is in order here: it is not in vain that we emphasize that selection is made "independently" here. Theorem 23 simply does not hold without this assumption.
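A toy rendering of the process of Example 24 (entirely our own construction): the transition kernel used at each step is chosen by a rule depending on the history, yet the swap it performs is drawn independently of that choice, and the empirical visit frequencies of the populations in [P] approach the uniform distribution predicted by Theorem 23. The two hand-made "measures" and the tiny population below are assumptions of the sketch only.

```python
import random
from collections import Counter

def nu(pop, i, x, y):
    """Single position swap on a population of rollouts (action, [(class, letter)], label)."""
    loc = {}
    for ri, (_, states, _) in enumerate(pop):
        for si, s in enumerate(states):
            if s in ((i, x), (i, y)):
                loc[s] = (ri, si)
    if len(loc) < 2:
        return pop
    new = [(a, list(s), f) for (a, s, f) in pop]
    (r1, k), (r2, q) = loc[(i, x)], loc[(i, y)]
    new[r1][1][k], new[r2][1][q] = new[r2][1][q], new[r1][1][k]
    return new

def freeze(pop):
    return tuple((a, tuple(s), f) for (a, s, f) in pop)

if __name__ == "__main__":
    rng = random.Random(1)
    P = [("alpha", [(1, "a"), (2, "a")], "f1"),
         ("beta",  [(1, "b"), (2, "b")], "f2")]
    # Two "measures": each favours swaps in a different equivalence class,
    # and both give positive weight to doing nothing (the identity).
    measures = [
        [(None, 0.2), ((1, "a", "b"), 0.6), ((2, "a", "b"), 0.2)],
        [(None, 0.2), ((1, "a", "b"), 0.2), ((2, "a", "b"), 0.6)],
    ]
    steps = 100000
    visits = Counter()
    history = [freeze(P)]
    pop = P
    for _ in range(steps):
        # The kernel may be chosen from the history (here: parity of its length) ...
        swaps, weights = zip(*measures[len(history) % 2])
        # ... but the actual swap is drawn independently of that choice.
        choice = rng.choices(swaps, weights)[0]
        if choice is not None:
            pop = nu(pop, *choice)
        history.append(freeze(pop))
        visits[freeze(pop)] += 1
    for n in visits.values():
        print(round(n / steps, 3))   # the four reachable populations each come out near 0.25
```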

Evidently Example 24 represents just one of numerous possible interpretations of Theorem 23. We hope that other authors will elaborate on this point. Knowing that the limiting frequency of occurrence of any two given populations Q1 and Q2 ∈ [P] is the same, it is sometimes possible to compute the limiting frequency of occurrence of any specific rollout and even certain subsets of rollouts using the machinery developed in Mitavskiy and Rowe (2005, 2006), which is also presented in Subsection 2 of Section 5 of the current paper for the sake of self-containment. To state and derive these "Geiringer-like" results we need to introduce the appropriate notions of schemata (Antonisse, 1989; Poli and Langdon, 1998) here.

4.5 Schemata for the MCT algorithm
Definition 25. Given a state (s, ã) in question (see Definition 2), a rollout Holland-Poli schema is a sequence consisting of entries from the set ã ∪ ℕ ∪ {#} ∪ Σ of the form h = {x_i}_{i=1}^k for some k ∈ ℕ such that for k > 1 we have x_1 ∈ ã, each x_i ∈ ℕ with 1 < i < k represents an equivalence class of states, and x_k ∈ {#} ∪ Σ could represent either a terminal label, if it is a member of the set of terminal labels Σ, or any substring defining a valid rollout, if it is a # sign[8]. For k = 1 there is a unique schema of the form #.

Every schema uniquely determines the set S_h of rollouts which fit the schema in the sense mentioned above:

S_h = {(x_1, (x_2, a_2), (x_3, a_3), …, (x_{k−1}, a_{k−1}), x_k) | a_i ∈ A for 1 < i < k} if k > 1 and x_k ∈ Σ;

S_h = {(x_1, (x_2, a_2), (x_3, a_3), …, (x_{k−1}, a_{k−1}), (y_k, a_k), (y_{k+1}, a_{k+1}), …, f) | a_i ∈ A for 1 < i < k, y_j ∈ ℕ and a_j ∈ A} if k > 1 and x_k = #;

S_h = the entire set of all possible rollouts if k = 1 or, equivalently, h = #.

We will often abuse the language and use the same word schema to mean either the schema h as a formal sequence as above or the schema as the set S_h of rollouts which fit it. For example, if h and h* are schemata, we will write h ∩ h* as a shorthand notation for S_h ∩ S_{h*}, where ∩ denotes the usual intersection of sets. Just as in Definition 3, we will say that k − 1, the number of states in the schema h, is the height of the schema h.

We illustrate the important notion of a schema with an example below.

Example 26. Suppose we are given a schema h = (α, 1, 2, #). Then the rollouts (α, 1a, 2c, 5a, 3c, f) and (α, 1d, 2a, 3a, 3d, g) ∈ S_h, or one could say that both of them fit the schema h. On the other hand the rollout (β, 1a, 2c, 5a, 3c, f) ∉ S_h (or does not fit the schema h) unless α = β. A rollout (α, 1a, 3a, 5a, 3c, f) ∉ S_h does not fit the schema h either, since x_2 = 2 ≠ 3. Neither of the rollouts above fits the schema h* = (α, 1, 2, f), since the appropriate terminal label is not reached in the fourth position. An instance of a rollout which fits the schema h* would be (α, 1c, 2b, f).
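Membership in the set S_h of Definition 25 reduces to a prefix check, as the following sketch (ours, using the same illustrative rollout encoding as before) shows; the demo lines reproduce the verdicts of Example 26.

```python
def fits(rollout, schema):
    """Does a rollout (action, [(class, letter), ...], terminal label) fit a
    Holland-Poli schema (Definition 25)?  A schema is a tuple whose first entry
    is an action, whose middle entries are equivalence classes, and whose last
    entry is either a terminal label or the wildcard "#"."""
    if schema == ("#",):                       # the unique height-0 schema fits everything
        return True
    action, states, terminal = rollout
    *prefix, last = schema
    if prefix and prefix[0] != action:
        return False
    classes = [i for (i, _) in states]
    wanted = list(prefix[1:])                  # equivalence classes the schema prescribes
    if classes[:len(wanted)] != wanted:
        return False
    if last == "#":
        return True                            # any continuation is allowed
    # Otherwise the schema must account for the whole rollout and name its terminal label.
    return len(states) == len(wanted) and terminal == last

if __name__ == "__main__":
    h = ("alpha", 1, 2, "#")
    print(fits(("alpha", [(1, "a"), (2, "c"), (5, "a"), (3, "c")], "f"), h))   # True
    print(fits(("beta",  [(1, "a"), (2, "c"), (5, "a"), (3, "c")], "f"), h))   # False
    print(fits(("alpha", [(1, "a"), (3, "a"), (5, "a"), (3, "d")], "g"), h))   # False
    h_star = ("alpha", 1, 2, "f")
    print(fits(("alpha", [(1, "c"), (2, "b")], "f"), h_star))                  # True
```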

The notion of schema is useful for stating and proving Geiringer-like results largely thanks to the following notion of partial order.

Definition 27. Given schemata h and g, we will write h > g either if h = # and g ≠ #, or if h = (x_1, x_2, x_3, …, x_{k−1}, #) while g = (x_1, x_2, x_3, …, x_{k−1}, y_k, y_{k+1}, …, y_{l−1}, y_l) where y_l could be either of the allowable values: a # or a terminal label f ∈ Σ. However, if y_l = # then we require that l > k.

An obvious fact following immediately from Definitions 25 and 27 is the following.

Proposition 28. Suppose we are given schemata h and g. Then h ≥ g implies S_h ⊇ S_g.

4.6 The statement of Geiringer-like theorems for the POMDPs
In evolutionary computation, Geiringer-like results address the limiting frequency of occurrence of a set of individuals fitting a certain schema (Poli et al., 2003; Mitavskiy and Rowe, 2005, 2006). In this work our theory rests on the finite population model based on the stationary distribution of the Markov chain of all populations potentially encountered in the process (see Theorems 22 and 23 and Example 24). The "limiting frequency of occurrence" (the rigorous definition appears in Section 5, Subsection 2, Definitions 43 and 46; however, for readers who aim only at a "calculus-level" understanding with the goal of applying the main ideas directly in their software engineering work, we will discuss the intuitive idea in more detail below) of a certain subset of individuals determined by a Holland-Poli schema h, among all the populations in the equivalence class [P] of the initial population of rollouts P as time increases (i.e. as t → ∞), will be expressed solely in terms of the initial population P and the schema h. These quantities are defined below.

Definition 29. For any action under evaluation a define a set-valued function a⇒ from the set of populations of b rollouts to the power set of the set of natural numbers ℘(ℕ) as follows: a⇒(P) = {i | i ∈ ℕ and at least one of the rollouts in the population P fits the Holland schema (a, i, #)}. Likewise, for an equivalence class label i ∈ ℕ, define a set-valued function on the populations of size b as i⇒(P) = {j | ∃ x and y ∈ A and a rollout r in the population P such that r = (…, (i, x), (j, y), …)} ∪ {f | f ∈ Σ and ∃ an x ∈ A and a rollout r in the population P such that r = (…, (i, x), f)}. In words, the set i⇒(P) is the set of all equivalence classes, together with the terminal labels, which appear immediately after the equivalence class i in at least one of the rollouts from the population P. Finally, introduce one more function, namely i⇒_Σ, from the populations of size b to ℕ ∪ {0}, by letting i⇒_Σ(P) = |{f | f ∈ Σ ∩ i⇒(P)}|, that is, the total number of terminal labels (which are assumed to be all formally distinct for convenience) following the equivalence class i in a rollout of the population P.

As always, we illustrate Definition 29 in Example 30 below.

Example 30. Continuing with Example 8, we return to the population P in Figure 1. From the picture we see that the only equivalence class i such that a rollout from the population P fits the Holland schema (α, i, #) is i = 1, so that α⇒(P) = {1}. Likewise, the only equivalence class following the action β is 2, the only equivalence class following the action γ is 4 and the only one following π is 3, so that β⇒(P) = {2}, γ⇒(P) = {4} and π⇒(P) = {3}. The only equivalence classes i following the action ξ in the population P are i = 3 and i = 2, so that the set ξ⇒(P) = {2, 3}.

Likewise, the fragment (1, a), (5, a) appears in the first (leftmost) rollout in P, (1, b), (3, c) in the second rollout, (1, c), (4, b) in the fourth rollout and (1, d), (2, e) in the last, seventh rollout. No other equivalence class or terminal label follows the equivalence class of the state 1 in the population P, and so it follows that 1⇒(P) = {5, 3, 4, 2} and 1⇒_Σ(P) = |∅| = 0. Likewise, the equivalence class 1 follows the equivalence class 2 in the second rollout, 7 follows 2 in the fourth rollout, 4 follows 2 in the fifth rollout and 6 follows 2 in the last, seventh rollout. The only terminal label that follows the equivalence class 2 is f6 in the sixth rollout. Thus, we have 2⇒(P) = {1, 7, 4, 6, f6} and 2⇒_Σ(P) = |{f6}| = 1. We leave the reader to verify that:

3⇒(P) = {7, 6, 2, 1} so that 3⇒_Σ(P) = 0;

4⇒(P) = {6, 2, f5} so that 4⇒_Σ(P) = 1;

5⇒(P) = {6, f3, f4} and so 5⇒_Σ(P) = 2;

6⇒(P) = {3, 5, f2, f7} and so 6⇒_Σ(P) = 2

and, finally, 7⇒(P) = {5, f1} so that 7⇒_Σ(P) = 1.



Remark 31. Note that, by assumption, all the terminal labels within the same population are distinct (see Definition 6 together with the comment in the footnote there). But then, since every rollout ends with a terminal label, we must have $\sum_{i=1}^{\infty} i\to_\Sigma(P) = b$ (of course, only finitely many summands, namely those equivalence classes that appear in the population P, may contribute nonzero values to $\sum_{i=1}^{\infty} i\to_\Sigma(P)$), where b is the number of rollouts in the population P, i.e. the size of the population P. For instance, in Example 30, b = 7 and there are in total seven equivalence classes, namely 1-7, that occur within the population in Figure 1, so that we have

$$\sum_{i=1}^{\infty} i\to_\Sigma(P) = \sum_{i=1}^{7} i\to_\Sigma(P) = 0 + 1 + 0 + 1 + 2 + 2 + 1 = 7 = b.$$

Another important and related definition we need to introduce is the following.

Definition 32. Given a population P and integers i and j ∈ N representing equivalence classes, let:

$$
\mathrm{Order}(i\to j, P)=\begin{cases}0 & \text{if } i\to(P)=\emptyset \text{ or } j\notin i\to(P)\\[4pt] \bigl|\{((i,a),(j,b)) \mid \text{the segment } ((i,a),(j,b)) \text{ appears in one of the rollouts in the population } P\}\bigr| & \text{otherwise.}\end{cases}
$$

Loosely speaking, Order(i → j, P) is the total number of times the equivalence class j follows the equivalence class i within the population of rollouts P.

Likewise, given a population of rollouts P, an action a under evaluation and an integer j ∈ N, let:

$$
\mathrm{Order}(a\to j, P)=\begin{cases}0 & \text{if } a\to(P)=\emptyset \text{ or } j\notin a\to(P)\\[4pt] \bigl|\{(a,(j,b)) \mid \text{the segment } (a,(j,b)) \text{ appears in one of the rollouts in the population } P\}\bigr| & \text{otherwise.}\end{cases}
$$

We now provide an example to illustrate Definition 32.

Example 33. Continuing with Example 30 and the population P appearing in Figure 1, we recall that a→(P) = {1}. We immediately deduce that Order(a → j, P) = 0 unless j = 1. There are two rollouts, namely the first and the fourth, that fit the schema (a, 1, #), so that Order(a → 1, P) = 2. Likewise, b→(P) = {2} and exactly one rollout, namely the second one, fits the Holland schema (b, 2, #), so that Order(b → j, P) = 0 unless j = 2, while Order(b → 2, P) = 1. Continuing in this manner (the reader may want to look back at Example 30), we list all the nonzero values of the function Order(action → ·, P) for the population P in Figure 1: Order(g → 4, P) = Order(j → 3, P) = Order(j → 2, P) = Order(p → 3, P) = 1.

Likewise, recall from Example 30 that 1→(P) = {5, 3, 4, 2}, so that Order(1 → j, P) = 0 unless j = 5, j = 3, j = 4 or j = 2. It so happens that a unique rollout exists in the population P fitting each fragment (1, (j, something in A)) for j = 5, j = 3, j = 4 and j = 2, respectively, namely the first, the second, the fourth and the last (seventh) rollouts. According to Definition 32, we then have Order(1 → 5, P) = Order(1 → 3, P) = Order(1 → 4, P) = Order(1 → 2, P) = 1. Analogously, 2→(P) = {1, 7, 4, 6, f6}, so that Order(2 → j, P) = 0 unless j = 1, 7, 4 or 6. The only rollout in the population P involving the fragment with 1 following 2 is the second one, the only one involving 7 following 2 is the fourth, the only one involving 4 following 2 is the fifth, and the only one involving 6 following 2 is the last (the seventh) rollout, respectively, so that Order(2 → 1, P) = Order(2 → 7, P) = Order(2 → 4, P) = Order(2 → 6, P) = 1. Continuing in this manner, we list all the remaining nonzero values of the "Order" function introduced in Definition 32 for the population P in Figure 1:

Order!3 # 7;P" # Order!3 # 6;P" # Order!3 # 2;P" # Order!3 # 1;P" # 1;

Order!4 # 6;P" # Order!4 # 2;P" # 1;

Order!5 # 6;P" # Order!6 # 3;P" # Order!6 # 5;P" # Order!7 # 5;P" # 1:

Remark 34. It must be noted that all the functions introduced in Definitions 29 and 32 remain invariant if one applies the "primitive" recombination transformations from the family S, as in Definitions 16 and 12, to the population in the argument. More explicitly, given any population of rollouts P, an action a under evaluation, an equivalence class i ∈ N, a Holland-Poli schema h = (a, i_1, i_2, ..., i_{k-1}, x_k), an integer j with 1 ≤ j ≤ k, and any recombination transformation R ∈ S, we have:

$$
a\to(P)=a\to(R(P)),\quad i\to(P)=i\to(R(P)),\quad i\to_\Sigma(P)=i\to_\Sigma(R(P)),\quad \mathrm{Order}(q\to r,P)=\mathrm{Order}(q\to r,R(P)).
$$

Indeed, the reader may easily verify that performing a swap of the elements of the same equivalence class, or of the corresponding subtrees pruned at equivalent labels, preserves all the states which are present within the population and creates no new ones. Moreover, the equivalence-class sequel is also preserved and hence the invariance of the functions a→ and i→, etc. follows. Since every transformation in the family F is a composition of the crossover transformations from the family S, it follows at once that all of the functions introduced in Definitions 29 and 32 are constant on the equivalence classes of populations under the equivalence relation introduced in Definition 21.

Example 35. Recall from Example 14 that the populations in Figures 1, 3 and 4 are equivalent and, likewise, according to Example 15, the populations in Figures 5 and 7 are equivalent. Moreover, Example 19 demonstrates that the populations shown in Figures 1 and 5 are also equivalent. Thus, all of the populations that appear in Figures 1, 3-5 and 7 belong to the same equivalence class under the relation ∼ introduced in Definition 21. In view of Remark 34, all the functions appearing in Definitions 29 and 32 produce identical values on the populations shown in Figures 1, 3-5 and 7.

Observe that applying any recombination transformation of the form χ_{i,a,b} or ν_{i,a,b} to a population P of rollouts neither removes any states from the population nor adds any new ones, and hence the following invariance property of equivalent populations, which will greatly simplify the theoretical analysis in Section 5, follows.

Remark 36. Given any population Q ∈ [P], the total number of states in the population Q is the same as that in the population P. Evidently, as we already mentioned, the total number of states in a population is the sum of the heights of all rollouts in that population (see Definitions 3 and 6). It follows, then, that the sum of the heights of all rollouts within a population is an invariant quantity under the equivalence relation in Definition 21. In other words, if Q ∼ P then the sum of the heights of the rollouts in the population Q is the same as the sum of the heights of the rollouts in the population P.

There is yet one more important notion, namely that of the "limiting frequency of occurrence" of a schema as one runs the GP routine with recombination only, which we need to introduce in order to state the Geiringer-like results of the current paper. A rigorous definition in the most general framework appears in Subsection 2 of Section 5 (namely, Definitions 43 and 46); nonetheless, for less patient readers, who aim only at the "calculus-level" understanding, we explain informally what the limiting frequency of occurrence is.

Informal description of the limiting frequency of occurrence: given a schema h and a population P of size m, suppose we run the Markov process {X_n}_{n=0}^{∞} on the populations in the equivalence class [P] of the initial population of rollouts P, as in Definition 16, or, more generally, the time-non-homogeneous Markov process described in Theorem 23 (where the Markov transition matrices introduced in Definition 16 are chosen randomly with respect to another stochastic process (not necessarily Markovian) that does not depend on the current population but may depend on the entire history of former populations as well as on other external parameters independent of the current population). As discussed in the preceding paragraph, this corresponds to "running the GP routine forever", and each recombination models the changes in players' strategies due to incomplete information, randomness, personality, etc. Up to time t a total of m·t individuals (counting repetitions) have been encountered. Among these a certain number, say h(t), fit the schema h in the sense of Definition 25. We now let F(P, h, t) = h(t)/(m·t) be the proportion of the individuals fitting the schema h out of the total number of individuals encountered up to time t. It follows from Theorem 22, via the instruments presented in Section 2 (also available in Mitavskiy and Rowe (2005, 2006)), that lim_{t→∞} F(P, h, t) exists, and the formula for it will be given purely in terms of the parameters of the initial population P (more specifically, in terms of the functions described in Definitions 29 and 32). Although it may be possible to derive the formulas for lim_{t→∞} F(P, h, t) in the most general case when the initial population of rollouts P is non-homologous (in other words, when the states representing the same equivalence class may appear at various "heights" in the same population of rollouts: see Definition 6), the formulas obtained in this manner would be significantly more cumbersome and not as well suited for algorithm development[9] as the limiting result with respect to "inflating" the initial population P in the sense described below. Remarkably, the formula for the limiting result in the general non-homologous initial population case coincides with the one for homologous populations.

Definition 37. Given a population P = {r_i^{l(i)}}_{i=1}^{b} of rollouts in the sense of Definition 6, where r_i = (a_i, (j_{i1}, a_{i1}), (j_{i2}, a_{i2}), ..., (j_{i,l(i)-1}, a_{i,l(i)-1}), f_i), and a positive integer m, we first increase the size of the alphabet A by a factor of m: formally, let the alphabet

$$A\times m=\{(a,i)\mid a\in A,\ i\in\mathbb{N}\ \text{and}\ 1\le i\le m\}.$$

Likewise, we also increase the terminal set of labels Σ by a factor of m, so that:

$$\Sigma\times m=\{(f,i)\mid f\in\Sigma,\ i\in\mathbb{N}\ \text{and}\ 1\le i\le m\}.$$



Now we let

$$P_m=\{r_{i,k}^{l(i)}\}_{1\le i\le b,\ 1\le k\le m}$$

where

$$r_{i,k}^{l(i)}=\bigl(a_i,\,(j_{i1},(a_{i1},k)),\,(j_{i2},(a_{i2},k)),\ldots,(j_{i,l(i)-1},(a_{i,l(i)-1},k)),\,(f_i,k)\bigr).$$

We will say that the population P_m is an inflation of the population P by a factor of m. Essentially, a population P_m consists of m formally distinct copies of each rollout in the population P. Intuitively speaking, the stochastic information captured in the sample of rollouts comprising the population P_m (such as the frequency of obtaining a state in the equivalence class of j after a state in the equivalence class of i) is the same as the one contained within the population P, emphasized by the factor of m. In fact, the following rather important and obvious facts make some of this intuition precise.

Proposition 38. Given a population P of rollouts and a positive integer m, consider the inflation P_m of the population P by a factor of m, as in Definition 37. Then the following are true:

$$a\to(P_m)=a\to(P),\qquad i\to(P_m)=i\to(P),\qquad i\to_\Sigma(P_m)=m\cdot i\to_\Sigma(P)$$

while

$$\mathrm{Order}(a\to j,P_m)=m\cdot\mathrm{Order}(a\to j,P),\qquad \mathrm{Order}(q\to r,P_m)=m\cdot\mathrm{Order}(q\to r,P).\tag{3}$$

For any population of rollouts Q let Total(Q) denote the total number of states in the population Q, which is, of course, the same thing as the sum of the heights of all rollouts in the population Q. Then clearly Total(P_m) = m·Total(P). In the special case when P is a homologous population, ∀ m ∈ N so is the population P_m.

When using Holland-Poli schemata with respect to any population Q ∈ [P_m] we will adopt the following convention.

Remark 39. Given a Holland-Poli schema h = (a, i_1, i_2, ..., i_{k-1}, f) and a population Q ∈ [P_m], an individual (i.e. a rollout) r of the population Q fits the schema h if and only if it is of the form r = (a, (i_1, (a_1, j_1)), (i_2, (a_2, j_2)), ..., (i_{k-1}, (a_{k-1}, j_{k-1})), (f, j_k)). Informally speaking, everything is as in Definition 25, with the exception that the terminal symbol of the schema h, namely f ∈ Σ, is compared with the terminal symbol of the rollout r, which is an ordered pair consisting of a terminal symbol coupled with a numerical label between 1 and m, so that we require only the first element of the ordered pair, namely the function label f, to match.

We are finally ready to state the main result of the current paper.

Theorem 40. The Geiringer-like Theorem for MCT. Repeat verbatim the assumptions of Theorem 23. Let

$$h=(a,i_1,i_2,\ldots,i_{k-1},x_k)$$

where x_k ∈ {#} ∪ Σ, be a given Holland-Poli schema. For m ∈ N consider the random variable F(P_m, h, t) described in the paragraph just above (alternatively, a rigorous definition in the most general framework appears in Subsection 2 of Section 5: Definitions 43 and 46) with respect to the Markov process X_n^m, where m indicates that the initial population of rollouts is the inflated population P_m as in Definition 37, with the new alphabet A × m labeling the states (see also Example 24 for help with understanding the Markov process X_n). Then:

$$
\lim_{m\to\infty}\lim_{t\to\infty}F(P_m,h,t)=\frac{\mathrm{Order}(a\to i_1,P)}{b}\times\prod_{q=2}^{k-1}\frac{\mathrm{Order}(i_{q-1}\to i_q,P)}{\sum_{j\in i_{q-1}\to(P)}\mathrm{Order}(i_{q-1}\to j,P)+i_{q-1}\to_\Sigma(P)}\cdot\mathrm{LF}(P,h)\tag{4}
$$

where:

$$
\mathrm{LF}(P,h)=\begin{cases}1 & \text{if } x_k=\#\\ 0 & \text{if } x_k=f\in\Sigma \text{ and } f\notin i_{k-1}\to(P)\\ \mathrm{Fraction} & \text{if } x_k=f\in\Sigma \text{ and } f\in i_{k-1}\to(P)\end{cases}
$$

where:

$$
\mathrm{Fraction}=\frac{1}{\sum_{j\in i_{k-1}\to(P)}\mathrm{Order}(i_{k-1}\to j,P)+i_{k-1}\to_\Sigma(P)}
$$

(we write "LF" as short for "Last Factor"). Furthermore, in the special case when the initial population P is homologous (see Definition 6), one does not need to take the limit as m → ∞, in the sense that lim_{t→∞} F(P_m, h, t) is a constant independent of m and its value is given by the right hand side of equation (4)[10].

An important comment is in order here: it is possible that the denominator of one of the fractions involved in the product is 0. However, in such a case the numerator is also 0, and we adopt the convention (in this theorem only) that if the numerator is 0 then, regardless of the value of the denominator (i.e. even if the denominator is 0), the fraction is 0. As a matter of fact, a denominator of some fraction involved is 0 if and only if one of the following holds: a→(P) = ∅, or there exists an index q with 1 ≤ q ≤ k − 1 such that no state in the equivalence class of i_q appears in the population P (and hence in any of the inflated populations P_m).

Theorem 40 tells us that, given any Holland-Poli rollout schema h and a generating population P, ∀ ε > 0 ∃ a sufficiently large M so that the right hand side of equation (4) provides an approximation of the limiting frequency of occurrence of the set of rollouts fitting the schema h starting with the initial population P_m (the inflation of the population P by a factor of m > M), namely lim_{t→∞} F(P_m, h, t), with an error of at most ε.

As usual, we illustrate Theorem 40 with a numerical example.

Example 41. Suppose we were to start with the initial population P pictured in Figure 1 (of course, it does not matter which of the populations in Figures 5 and 6, etc. we were to start with, since all of them are in the same equivalence class under recombination). Let us say we are interested in the limiting frequency of occurrence of the Holland rollout schema h = (g, 4, 6, f7). Theorem 40 then tells us that:

$$
\lim_{m\to\infty}\lim_{t\to\infty}F(P_m,h,t)=\frac{\mathrm{Order}(g\to 4,P)}{b}\cdot\frac{\mathrm{Order}(4\to 6,P)}{\sum_{j\in 4\to(P)}\mathrm{Order}(4\to j,P)+4\to_\Sigma(P)}\times\frac{1}{\sum_{j\in 6\to(P)}\mathrm{Order}(6\to j,P)+6\to_\Sigma(P)}.
$$

Clearly b = 7, since there are in total seven rollouts in the population P. In Examples 30 and 33 we have seen that Order(g → 4, P) = 1, Order(4 → 6, P) = Order(4 → 2, P) = 1, 4→_Σ(P) = 1, Order(6 → 3, P) = Order(6 → 5, P) = 1 and 6→_Σ(P) = 2, while the remaining function values in the right hand side of the equation above vanish. We then deduce that:

$$
\lim_{m\to\infty}\lim_{t\to\infty}F(P_m,h,t)=\frac{1}{7}\cdot\frac{1}{1+1+1}\cdot\frac{1}{1+1+2}=\frac{1}{84}.
$$
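The arithmetic above is easy to mechanize. The short sketch below (our own; the dictionaries simply transcribe the nonzero values listed in Examples 30 and 33, and all names are hypothetical) evaluates the right-hand side of equation (4) for the schema h = (g, 4, 6, f7) and reproduces 1/84.

```python
# Numerical check of equation (4) for h = (g, 4, 6, f7), using values from Examples 30 and 33.
from fractions import Fraction

b = 7
order_action = {("g", 4): 1}                           # Order(g -> 4, P)
order = {(4, 6): 1, (4, 2): 1, (6, 3): 1, (6, 5): 1}   # relevant Order(i -> j, P) values
sigma = {4: 1, 6: 2}                                   # i ->_Sigma(P)

def denom(i):
    """Sum of Order(i -> j, P) over j in i->(P), plus i->_Sigma(P)."""
    return sum(v for (src, _dst), v in order.items() if src == i) + sigma[i]

last_factor = Fraction(1, denom(6))                    # f7 does follow class 6, so LF = Fraction
value = Fraction(order_action[("g", 4)], b) * Fraction(order[(4, 6)], denom(4)) * last_factor
print(value)                                           # 1/84
```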

Theorem 40 is the main result of the current work. It motivates a variety of algorithms for evaluating the actions based on the entire, fairly large and seemingly pairwise disconnected sample of independent parallel rollouts, algorithms that fully take advantage of the exponentially many possibilities already available within that sample and, at the same time, should be rather efficient in many situations. The idea behind the construction of such Monte Carlo modification algorithms is to have a number of independent agents crawling over the dynamically constructed and updated digraph, with transition probabilities equal to the fractional factors in the statement of Theorem 40, each agent finishing at some terminal state and then updating the expected payoff value stored at the action from which it started, using the same policy as adopted in traditional MCT. These algorithms, as well as the experiments, will be the subject of sequel papers.
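Since the concrete algorithms are deferred to the sequel papers, the following is only our own rough Python sketch of the crawling idea described above: an agent starts at an action under evaluation and walks the digraph of equivalence classes, choosing each successor with probability proportional to the corresponding Order count (the fractional factors of Theorem 40) until it reaches a terminal label. The payoff dictionary and all names are hypothetical.

```python
# Rough sketch (not the sequel papers' algorithm) of an agent crawling the Order digraph.
import random

def crawl(action, order_action, order, sigma_labels, payoff, max_steps=1000):
    """order_action[(a, i)] and order[(i, j)] are Order counts; sigma_labels[i] lists the
    terminal labels following class i; payoff[f] is a (hypothetical) payoff of terminal f."""
    succ = [(j, w) for (a, j), w in order_action.items() if a == action]
    if not succ:
        return 0.0
    classes, weights = zip(*succ)
    i = random.choices(classes, weights=weights)[0]
    for _ in range(max_steps):
        options = [(("class", j), w) for (src, j), w in order.items() if src == i]
        options += [(("terminal", f), 1) for f in sigma_labels.get(i, [])]
        if not options:
            return 0.0
        kind, target = random.choices([o for o, _ in options],
                                      weights=[w for _, w in options])[0]
        if kind == "terminal":
            return payoff[target]
        i = target
    return 0.0
```

Repeated crawls from each action can then be averaged to estimate its expected payoff, exactly as a traditional MCT back-up would, but over the recombined (exponentially richer) sample of rollouts.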

5. Deriving Geiringer-like theorems for POMDPs

5.1 Setting, notation and the general finite-population Geiringer theorem

Throughout Section 5 (the current section) the following notation will be used: V is a finite set, called a search space. We fix an integer b ∈ N and we call V_b = {(x_1, x_2, ..., x_b) | x_i ∈ V} the set of populations of size b; every element x̄ = (x_1, x_2, ..., x_b)^T ∈ V_b is called a population of size b and every element x ∈ V is called an individual. Notice that we prefer to think of a population as a "column vector" (hence the "transpose" symbol). Of course, this is just a matter of preference, but normally, when we list the individuals, it is natural to write each individual as a string of "genes or alleles" appearing on the same row, so that the b individuals appear on b separate rows. It is important to emphasize here that populations are ordered b-tuples, so that (x_1, x_2, ..., x_b)^T ≠ (x_b, x_2, ..., x_1)^T unless x_1 = x_b. By a family of recombination transformations we mean a family of functions F = {F | F : V_b → V_b}. The general finite population Geiringer theorem then says the following.

Theorem 42. The Finite Population Geiringer Theorem for Evolutionary Algorithms. Suppose we are given a probability measure μ on the family of recombination transformations F on the set of populations V_b of size b as described above. Suppose further that there is a subfamily S ⊆ F which generates the entire family F, in the sense that ∀ F ∈ F ∃ a finite sequence of transformations S_1, S_2, ..., S_l ∈ S such that F = S_1 ∘ S_2 ∘ ... ∘ S_l. Assume the following about the probability measure μ:

$$\forall\,S\in\mathcal{S}\ \text{we have}\ \mu(S)>0.\tag{5}$$

$$\text{The identity map}\ 1:V_b\to V_b\ \text{is in}\ \mathcal{S}.\tag{6}$$


Most importantly, assume that every recombination transformation S ∈ S is bijective (i.e. a one-to-one and onto function on V_b). Consider the Markov transition matrix M with state space V_b defined as follows: given populations x̄ and ȳ ∈ V_b, we let:

$$p_{\bar{x}\to\bar{y}}=\mu(\{F\mid F\in\mathcal{F}\ \text{and}\ F(\bar{x})=\bar{y}\}).\tag{7}$$

Now define a relation ∼ on V_b as follows: x̄ ∼ ȳ if and only if ∃ k ∈ N and recombination transformations F_1, F_2, ..., F_k ∈ F such that (F_1 ∘ F_2 ∘ ... ∘ F_k)(x̄) = ȳ. We now assert the following facts:

∼ is an equivalence relation. (8)

Given the equivalence class of some population x̄, call it [x̄], the restriction of the Markov transition matrix M to [x̄] is a well-defined Markov transition matrix on the state space [x̄]; call it M|_[x̄]. (9)

∀ x̄ ∈ V_b the Markov transition matrix M|_[x̄] is doubly stochastic and it defines an irreducible and aperiodic Markov chain on [x̄]. (10)

∀ x̄ ∈ V_b the unique stationary distribution of M|_[x̄] is the uniform distribution on [x̄]. (11)

Theorem 42 is a simple yet elegant consequence of basic group theory. In this paper we assume that the reader is familiar with fundamental notions about groups and group actions. Nearly any standard textbook in abstract algebra, such as, for instance, Dummit and Foote (1991), contains far more group-theoretic material than is necessary for our purpose. For a brief introduction we invite the reader to study Mitavskiy and Rowe (2006).

Proof. Since the family of transformations S consists entirely of bijections and any composition of bijections is also a bijection, the family F also consists solely of bijections. It follows, then, that the family F generates a subgroup G of the group of all permutations on the finite set V_b. Notice that the probability measure μ naturally extends to the entire group G generated by F by defining:

$$
\mu_{\mathrm{ext}}(g)=\begin{cases}\mu(g) & \text{if } g\in\mathcal{F}\\ 0 & \text{otherwise.}\end{cases}
$$

Clearly the Markov process defined in the statement of Theorem 42 (see equation (7)) can be redefined as:

$$p_{\bar{x}\to\bar{y}}=\mu_{\mathrm{ext}}(\{g\mid g\in G\ \text{and}\ g(\bar{x})=\bar{y}\}).\tag{12}$$

Furthermore, notice that the group G is of size no bigger than |V_b|! < ∞, since |V| < ∞. It follows, then, that every element g ∈ G can be written as a finite composition g = F_1 ∘ F_2 ∘ ... ∘ F_k for F_1, F_2, ..., F_k ∈ F (because every element F ∈ F ⊆ G is a torsion element of G, i.e. F^l = 1 for some l ∈ N, so that F^{l−1} = F^{−1}). But then the relation ∼ can be redefined as: x̄ ∼ ȳ if and only if ∃ g ∈ G such that g(x̄) = ȳ.



We now quickly recognize that the relation ∼ is the orbit-defining equivalence relation which partitions the set of all populations of size b, V_b, into the orbits under the action of the group G. The assertions expressed in equations (8) and (9) now follow at once. To verify equation (10) we choose any ȳ ∈ V_b and compute directly:

$$
\sum_{\bar{x}\in V_b}p_{\bar{x}\to\bar{y}}=\sum_{\bar{x}\in V_b}\mu_{\mathrm{ext}}(\{g\mid g\in G\ \text{and}\ g(\bar{x})=\bar{y}\})=\sum_{\bar{x}\in V_b}\mu_{\mathrm{ext}}(\{g\mid g\in G\ \text{and}\ g^{-1}(\bar{y})=\bar{x}\})=\mu_{\mathrm{ext}}(G)=1
$$

since the sets K(x̄) = {g | g ∈ G and g^{−1}(ȳ) = x̄} clearly form a partition of G. We have now shown that the Markov transition matrix M is doubly stochastic. Irreducibility follows from finiteness together with the fact that S generates F. Since 1 ∈ S, aperiodicity follows as well. Now the classical result about Markov chains tells us that there is a unique stationary distribution and, since M is doubly stochastic, it must be the uniform distribution, so that the final assertions expressed in equations (10) and (11) follow at once. □

5.2 A methodology for the derivation of Geiringer-like results

The classical Geiringer theorem (Geiringer, 1944) from population genetics tells us something about the "limiting frequency of occurrence of certain individuals in a population" rather than referring to the limiting distribution of populations. In fact, the mathematical model of the classical Geiringer theorem in Geiringer (1944) is entirely different from that of the finite-population Geiringer theorem described in the previous section. Nonetheless, the finite-population Markov chain model is much better suited when dealing with evolutionary algorithms, since all the structures, including the search space and populations, in the computational setting are finite, and the model in Mitavskiy and Rowe (2005, 2006), as well as in the current paper, describes exactly what happens during a stochastic simulation. Knowing that some stochastic process {X_t}_{t=0}^{∞} on some equivalence class of populations [x̄] tends to the uniform distribution over the populations (i.e. ∀ ȳ ∈ [x̄] we have lim_{t→∞} P(X_t = ȳ) = 1/|[x̄]|), it is often possible to deduce what we call Geiringer-like theorems, which express the limiting frequency of occurrence of specific individuals, and specific sets of individuals, in terms of the information contained in a single representative of the equivalence class only (say, the initial population). Of course, we need to formulate precisely what the "limiting frequency of occurrence" is.

Definition 43. Consider a function X : ℘(V) × V_b → {0, 1, 2, ..., b}, where ℘(V) denotes the power set of V (i.e. the set of all subsets of V) and V_b is the set of all populations of size b, as usual, defined as follows: given a subset S ⊆ V and a population x̄ = (x_1, x_2, ..., x_b) ∈ V_b, we define X(S, x̄) = |{i | 1 ≤ i ≤ b, x_i ∈ S}| to be the number of individuals in the population x̄ which belong to the subset S (counting their multiplicities).

Example 44. Let us say S = {u} is a singleton set, b = 3 and x̄ = (u, v, u) where u ≠ v. Then X(S, x̄) = 2 since x_1 = x_3 = u ∈ S while x_2 = v ∉ S.

Remark 45. Observe that if we fix a subset S ⊆ V and let the second argument in the function X vary, then we get a function of one variable X(S, A) : V_b → {0, 1, 2, ..., b}, defined naturally by plugging a population of size b in place of the A.


Definition 46. Choose a subset S ⊆ V and an equivalence class [x̄] of populations of size b, and let {X_t}_{t=0}^{∞} be any stochastic process on [x̄] (x̄ could be an initial population, for instance). It makes sense now to define a random variable:

$$F(S,\bar{x},t)=\frac{\sum_{i=0}^{t-1}X(S,X_i)}{b\cdot t}.$$

Clearly the random variable F(S, x̄, t) counts the fraction of occurrence of (or frequency of encountering) the individuals from the set S before time t. In general lim_{t→∞} F(S, x̄, t) does not exist. However, under the "nice" circumstances described below everything works out rather well.

Lemma 47. Suppose there is an "attractor" probability distribution ρ on the equivalence class [x̄] for the stochastic process {X_t}_{t=0}^{∞}, in the sense that if X_0 = x̄ with probability 1 then lim_{t→∞} P(X_t = ·) = ρ, where P(X_t = ·) denotes the probability distribution of the random variable X_t, which can be thought of as a vector in R^{|[x̄]|}, so that the lim_{t→∞} is taken with respect to the L_1 norm, let us say[11]. Then:

$$\lim_{t\to\infty}F(S,\bar{x},t)=\frac{1}{b}\,E_\rho\!\left[X(S,A)|_{[\bar{x}]}\right]$$

where E_ρ denotes the expectation with respect to the probability distribution ρ on [x̄], while X(S, A)|_[x̄] is the restriction of the function X(S, A) introduced in Remark 45 to the equivalence class [x̄].

A sketch of the proof. Consider a "constant" stochastic process Y_t where each random variable Y_t is distributed according to ρ. By assumption ‖P(X_t = ·) − P(Y_t = ·)‖_{L_1} → 0 as t → ∞. On the other hand, by the law of large numbers:

$$
E_\rho\!\left[X(S,A)|_{[\bar{x}]}\right]=\lim_{t\to\infty}\frac{\sum_{i=0}^{t-1}X(S,Y_i)}{t}=\lim_{t\to\infty}\frac{\sum_{i=0}^{t-1}X(S,X_i)}{t}=b\cdot\lim_{t\to\infty}\frac{\sum_{i=0}^{t-1}X(S,X_i)}{b\cdot t}=b\cdot\lim_{t\to\infty}F(S,\bar{x},t)
$$

(the second equality holds after routine ε/2 details), so that the desired assertion follows after dividing both sides of the equation above by b. □

In our specific case, thanks to Theorem 42, the probability distribution ρ in Lemma 47 is the uniform distribution on the equivalence class [x̄].

Notice that the random variable X(S, A) decomposes as:

$$X(S,A)=\sum_{i=1}^{b}I_i(S,A)\tag{13}$$

where I_i(S, A) is the indicator function of the ith individual in the argument population with respect to membership in the subset S. More explicitly, if we are given a population x̄ = (x_1, x_2, ..., x_b)^T then:

$$
I_i(S,\bar{x})=\begin{cases}1 & \text{if } x_i\in S\\ 0 & \text{otherwise.}\end{cases}\tag{14}
$$

Assume now that all transpositions of individuals within the same population are among the transformations in the family S (see the statement of Theorem 42). In other words, ∀ i < j the transformation T_{i,j} sending a population x̄ = (x_1, x_2, ..., x_{i−1}, x_i, x_{i+1}, ..., x_{j−1}, x_j, x_{j+1}, ..., x_b)^T into the population T_{i,j}(x̄) = (x_1, x_2, ..., x_{i−1}, x_j, x_{i+1}, ..., x_{j−1}, x_i, x_{j+1}, ..., x_b)^T has positive probability of being chosen. Notice that this is usually a very reasonable assumption, since the order of individuals in a population should not matter in practical applications. Then we immediately deduce that any given population ȳ is a member of [x̄] if and only if the corresponding population T_{i,j}(ȳ), obtained by swapping the ith and the jth individuals in the population ȳ, is a member of [x̄]. When ρ is the uniform distribution (as in Theorem 42), this is equivalent to saying that all the indicator random variables I_i(S, A) defined in equation (14) above are identically distributed, independently of the index i. In particular, they are all distributed as I_1(S, A). Using equation (13) together with linearity of expectation, we now deduce that if π denotes the uniform distribution on [x̄] then:

Ep X!S;A"j%~x&# $

#Xb

1#1

Ep Ii!S;A"j%~x&# $

# b ·Ep I1!S;A"j%~x&# $

#

# b ·p!{~yj~y# ! y1;y2; . . . ;yb"T [ %~x& and y1 [ S}" # b ·jV!~x;S"jj%~x&j :

!15"

where:

V!~x; S" # {~yj~y # ! y1; y2; . . . ; yb"T [ %~x& and y1 [ S} !16"

is the subset of [~x& consisting solely of populations in %~x& the first individuals of whichare members of the subset S # V. Combining equation (15) with the conclusion ofLemma 47 immediately produces the following very useful fact.

Lemma 48. Under exactly the same setting and assumptions as in Theorem 42, together with the additional assumption that all the "swap" transformations defined and discussed in the paragraph following equation (14) are members of the subfamily S of the family F of recombination transformations, it is true that ∀ S ⊆ V we have:

$$\lim_{t\to\infty}F(S,\bar{x},t)=\frac{|V(\bar{x},S)|}{|[\bar{x}]|}$$

where the set V(x̄, S) is defined in equation (16).
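Lemma 48 can be checked by direct simulation on the same toy setup used earlier for Theorem 42. The following self-contained sketch (our own) runs the chain of randomly chosen recombinations and swaps and compares the observed frequency F(S, x̄, t) with the exact ratio |V(x̄, S)|/|[x̄]| counted on the enumerated orbit; the set S and all data are hypothetical.

```python
# Toy check of Lemma 48: simulated frequency vs. the exact orbit ratio (our own example).
import random

def swap_individuals(pop): return (pop[1], pop[0])
def exchange_gene(k):
    def f(pop):
        x, y = list(pop[0]), list(pop[1])
        x[k], y[k] = y[k], x[k]
        return (tuple(x), tuple(y))
    return f

transforms = [lambda p: p, swap_individuals, exchange_gene(0), exchange_gene(1)]
start = ((0, 0), (1, 1))
S = {(1, 1)}                                   # the set of individuals we track

orbit, frontier = {start}, [start]
while frontier:
    pop = frontier.pop()
    for T in transforms:
        q = T(pop)
        if q not in orbit:
            orbit.add(q)
            frontier.append(q)
exact = sum(1 for pop in orbit if pop[0] in S) / len(orbit)   # |V(x,S)| / |[x]|

pop, hits, steps = start, 0, 200_000
for _ in range(steps):
    hits += sum(1 for ind in pop if ind in S)  # X(S, X_t)
    pop = random.choice(transforms)(pop)
print(hits / (2 * steps), exact)               # both should be close to 0.25 here
```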

Lemma 48 allows us to derive Geiringer-like theorems in a rather straightforward fashion for several classes of evolutionary algorithms via the following simple strategy. Suppose we are given a subset S ⊆ V. According to Lemma 48, all we have to do to compute the desired limiting frequency of occurrence of a certain subset S ⊆ V is to calculate the ratio |V(x̄, S)|/|[x̄]|. For some subsets of the search space such a ratio is quite obvious, yet for others it may be combinatorially unachievable. In evolutionary computation, it is often possible to define an appropriate notion of schemata (this is precisely what we have done in Section 4 for the case of MCT) which has, intuitively speaking, a "product-like flavor" that allows us to exploit the following observation: suppose we can find a sequence of subsets S_1 ⊇ S_2 ⊇ ⋯ ⊇ S_{n−1} ⊇ S_n = S. We can then write:


$$
\lim_{t\to\infty}F(S,\bar{x},t)=\frac{|V(\bar{x},S)|}{|[\bar{x}]|}=\frac{|V(\bar{x},S)|}{|V(\bar{x},S_{n-1})|}\cdot\frac{|V(\bar{x},S_{n-1})|}{|V(\bar{x},S_{n-2})|}\cdots\frac{|V(\bar{x},S_1)|}{|[\bar{x}]|}\overset{\text{by Lemmas 48 and 47}}{=}\frac{1}{b}\,E_\rho\!\left[X(S_1,A)|_{[\bar{x}]}\right]\cdot\prod_{k=1}^{n-1}\frac{|V(\bar{x},S_{k+1})|}{|V(\bar{x},S_k)|}\tag{17}
$$

The idea is that the individual ratios in the right hand side of equation (17) may be quite simple to compute, as happens to be the case when deriving finite population Geiringer-like theorems for GP with homologous crossover (Mitavskiy and Rowe, 2005, 2006). When deriving the finite population version of the Geiringer-like theorem with non-homologous recombination in the limit of large population size, rather than computing the ratios in equation (17), we will instead estimate each one of them from above and from below, exploiting the main Geiringer theorem (Theorem 42) together with the methodology for estimating the stationary distributions of Markov chains based on the lumping quotient construction appearing in Mitavskiy et al. (2006, 2008) and Mitavskiy and Cannings (2007). All of the necessary apparatus, and one enhanced lemma, will be summarized and presented in the next subsection for the sake of completeness.

5.3 Lumping quotients of Markov chains and Markov inequality

Throughout the current subsection we shall be dealing with a Markov chain M (not necessarily irreducible) over a finite state space X. {p_{x→y}} denotes the Markov transition matrix, with the convention that p_{x→y} is the probability of getting y in the next stage given x. Let π denote a stationary distribution of the Markov chain M (here we will assume that at least one stationary distribution does exist). Furthermore, we will assume that the stationary distribution π has the property that ∀ x ∈ X π(x) ≠ 0. Suppose we are given an equivalence relation ∼ partitioning the state space X. The aim of the current subsection is to construct a Markov chain over the equivalence classes under ∼ (i.e. over the set X/∼) whose stationary distribution is compatible with the distribution π, and then to exploit the constructed lumped quotient chain to estimate certain ratios of the stationary distribution values. In fact, this methodology has been successfully used to establish some properties of the stationary distributions of the irreducible Markov chains modeling a wide class of evolutionary algorithms (Mitavskiy et al., 2006, 2008; Mitavskiy and Cannings, 2007).

Definition 49. Given a Markov chain M over a finite state space X determined by the transition matrix {p_{x→y}}, an equivalence relation ∼ on X, and a stationary distribution π of the Markov chain M satisfying the property that ∀ x ∈ X π(x) ≠ 0, define the quotient Markov chain M/∼ over the state space X/∼ of equivalence classes via ∼ to be determined by the transition matrix {p̃_{U→V}}_{U,V∈X/∼} given as:

$$
\tilde{p}_{U\to V}=\frac{1}{\pi(U)}\sum_{x\in U}\pi(x)\cdot p_{x\to V}=\frac{1}{\pi(U)}\sum_{x\in U}\sum_{y\in V}\pi(x)\cdot p_{x\to y}.
$$

Here p_{x→V} denotes the transition probability of getting somewhere inside of V given x. Since V = ∪_{y∈V}{y}, it follows that p_{x→V} = Σ_{y∈V} p_{x→y}, and hence the equation above holds.
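The lumping construction is a few lines of code. The following self-contained sketch (our own numerical illustration; the 4-state chain and the partition into two blocks are arbitrary) builds the quotient matrix of Definition 49 and checks the statements of Theorem 50 and equation (18) below.

```python
# Sketch of the lumping quotient (Definition 49), checked against Theorem 50 and equation (18).
import numpy as np

rng = np.random.default_rng(0)
P = rng.random((4, 4)); P /= P.sum(axis=1, keepdims=True)    # arbitrary 4-state chain

w, v = np.linalg.eig(P.T)                                    # stationary distribution pi
pi = np.real(v[:, np.argmax(np.real(w))]); pi /= pi.sum()

blocks = [[0, 1], [2, 3]]                                    # two equivalence classes A and B
Q = np.zeros((2, 2))
for U, us in enumerate(blocks):
    for V, vs in enumerate(blocks):
        Q[U, V] = sum(pi[x] * P[x, y] for x in us for y in vs) / pi[us].sum()

pi_q = np.array([pi[us].sum() for us in blocks])
assert np.allclose(pi_q @ Q, pi_q)                           # Theorem 50: lumped pi is stationary
assert np.isclose(pi_q[0] / pi_q[1], Q[1, 0] / Q[0, 1])      # equation (18)
```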



Intuitively, the quotient Markov chain M/∼ is obtained by running the original chain M starting with the stationary distribution π and computing the transition probabilities of the associated stochastic process conditioned with respect to the stationary input. Thereby, the following fact should not come as a surprise.

Theorem 50. Let π denote a stationary distribution of a Markov chain M determined by the transition matrix {p_{x→y}}_{x,y∈X} and having the property that ∀ x ∈ X π(x) ≠ 0. Suppose we are given an equivalence relation ∼ partitioning the state space X. Then the probability distribution π̃ defined as π̃({O}) = π(O) is a stationary distribution of the quotient Markov chain M/∼, assigning nonzero probability to every state (i.e. to every equivalence class under ∼).

Proof. This fact can be verified by direct computation. Indeed, we obtain:

$$
\sum_{O\in X/\sim}\tilde{\pi}(\{O\})\cdot\tilde{p}_{O\to U}=\sum_{O\in X/\sim}\pi(O)\cdot\frac{1}{\pi(O)}\sum_{x\in O}\sum_{z\in U}\pi(x)\cdot p_{x\to z}=\sum_{x\in X}\sum_{z\in U}\pi(x)\cdot p_{x\to z}=\sum_{z\in U}\sum_{x\in X}\pi(x)\cdot p_{x\to z}\overset{\text{by stationarity of }\pi}{=}\sum_{z\in U}\pi(z)=\pi(U)=\tilde{\pi}(\{U\}).
$$

This establishes the stationarity of π̃, and Theorem 50 now follows. □

Although Theorem 50 is rather elementary, it allows us to deduce interesting and insightful results (Mitavskiy et al., 2006, 2008; Mitavskiy and Cannings, 2007) via the observations presented below. To state these results it is convenient to generalize the notion of transition probabilities in the following manner (which is coherent with Definition 49).

Definition 51. Given a Markov chain M with state space X and a stationary distribution π, for any two subsets A and B ⊆ X, we define:

$$p_{A\to B}=\sum_{a\in A}\frac{\pi(a)}{\pi(A)}\,p_{a\to B}$$

where p_{a→B} = Σ_{b∈B} p_{a→b}.

Remark 52. It is worth emphasizing that in the case when B ⊆ A or A ∩ B = ∅, the transition probabilities p_{A→B} are precisely the transition probabilities of various quotient Markov chains which have A and B among their states, according to Definition 49. In particular, if we consider the quotient Markov chain comprised of the states A and A^c, where A^c denotes the complement of A, we have 1 − p_{A→A} = p_{A→A^c}.

In the current paper we will use a lumping quotient chain consisting of only two equivalence classes, A and B = A^c (i.e. the complement of A in the state space X). For a 2-by-2 Markov transition matrix we easily see that, if π denotes the unique stationary distribution of the original Markov chain M then, thanks to Theorem 50, we have π(A)p_{A→A} + π(B)p_{B→A} = π(A), so that π(B)p_{B→A} = π(A)(1 − p_{A→A}) = π(A)p_{A→B} and, if neither A nor B is empty, we have:

$$\frac{\pi(A)}{\pi(B)}=\frac{p_{B\to A}}{p_{A\to B}}\tag{18}$$


Equation (18) tells us that, in order to estimate the ratio of the stationary distribution values of the Markov chain M on a pair of complementary subsets A and B = A^c of the state space, it is sufficient to estimate the ratio of the generalized transition probabilities p_{B→A} and p_{A→B}. Although these transition probabilities do depend on the stationary distribution itself, it is sometimes possible to estimate them using a convexity-based bound appearing in Mitavskiy et al. (2006, 2008) and Mitavskiy and Cannings (2007). For the purpose of the present work we need to introduce a mild generalization of this bound, which appears below.

Lemma 53. Suppose, as in Definition 51, A and B ⊆ X, and U ⊆ X is such that:

$$\frac{\pi(U\cap A)}{\pi(A)}\le\epsilon<1.$$

Suppose further that for some constant k with 0 ≤ k ≤ 1 the following is true: ∀ a ∈ A ∩ U^c we have p_{a→B} ≤ k. Then p_{A→B} ≤ (1 − ε)k + ε. Dually, assume that for a constant l with 0 ≤ l ≤ 1 it is true that ∀ a ∈ A ∩ U^c we have p_{a→B} ≥ l. Then p_{A→B} ≥ (1 − ε)l.

Proof. Indeed, we have:

$$p_{A\to B}=\sum_{a\in A}\frac{\pi(a)}{\pi(A)}p_{a\to B}=\sum_{a\in A\cap U^c}\frac{\pi(a)}{\pi(A)}p_{a\to B}+\sum_{a\in A\cap U}\frac{\pi(a)}{\pi(A)}p_{a\to B}.\tag{19}$$

Notice that:

$$\sum_{a\in A\cap U^c}\frac{\pi(a)}{\pi(A)}=\frac{\pi(A\cap U^c)}{\pi(A)}=1-\frac{\pi(U\cap A)}{\pi(A)}\ge 1-\epsilon\quad\text{while}\quad 0\le\sum_{a\in A\cap U}\frac{\pi(a)}{\pi(A)}=\frac{\pi(A\cap U)}{\pi(A)}\le\epsilon.\tag{20}$$

The desired inequalities now follow when we plug the bounds in the assumptions into equation (19) and then use the inequalities in equation (20), together with the fact that probabilities are always between 0 and 1. □

In the special case when U = ∅, Lemma 53 entails the following.

Corollary 54. Given any two subsets A and B ⊆ X, if for some constant k with 0 ≤ k ≤ 1 it is true that ∀ a ∈ A we have p_{a→B} ≤ k, then p_{A→B} ≤ k. Dually, if for some constant l with 0 ≤ l ≤ 1 it is true that ∀ a ∈ A we have p_{a→B} ≥ l, then p_{A→B} ≥ l. Consequently, if for some constant γ it happens that ∀ a ∈ A we have p_{a→B} = γ, then p_{A→B} = γ.

Combining equation (18) with Lemma 53 readily gives us the following.

Lemma 55. Suppose A and B ⊆ X is a complementary pair of subsets (i.e. A ∩ B = ∅ and A ∪ B = X). Suppose further that U ⊆ X is such that:

$$\frac{\pi(U\cap A)}{\pi(A)}<\epsilon<1\quad\text{and}\quad\frac{\pi(U\cap B)}{\pi(B)}<\delta<1.$$

Assume now that we find constants l_1, l_2, k_1 and k_2 such that ∀ b ∈ U^c ∩ B we have l_1 ≤ p_{b→A} ≤ k_1, and ∀ a ∈ U^c ∩ A we have l_2 ≤ p_{a→B} ≤ k_2. Then we have:

$$\frac{(1-\delta)l_1}{(1-\epsilon)k_2+\epsilon}\le\frac{\pi(A)}{\pi(B)}\le\frac{(1-\delta)k_1+\delta}{(1-\epsilon)l_2}.$$

In order to apply Lemma 55 effectively we need to know that both π(U ∩ A)/π(A) and π(U ∩ B)/π(B) are small. As we shall see in the next subsection, the inductive hypothesis will imply that at least one of these ratios is small. The following simple lemma will allow us to deduce that the remaining ratio is also small, as long as a certain ratio of generalized transition probabilities is bounded below.

Lemma 56. Suppose A and B ⊆ X with A ∩ B = ∅ (notice that we do not require A ∪ B = X). Then:

$$\pi(A)\ge\pi(B)\cdot\frac{p_{B\to A}}{p_{A\to A^c}}.$$

Proof. Let C = X ∩ (A ∪ B)^c. Consider the lumped Markov chain on the state space {A, B, C}. Since π is the stationary distribution of the Markov chain M, by Theorem 50 (see also Definition 51 and Remark 52) we have:

$$\pi(A)=\pi(B)p_{B\to A}+\pi(A)p_{A\to A}+\pi(C)p_{C\to A}$$

so that:

$$(1-p_{A\to A})\pi(A)=\pi(B)p_{B\to A}+\pi(C)p_{C\to A}\ge\pi(B)p_{B\to A}$$

since probabilities are nonnegative. The desired conclusion now follows when dividing both sides of the inequality above by 1 − p_{A→A} = p_{A→A^c}. □

Finally, there is another very simple and general classical inequality that will be elegantly exploited in the next subsection to set the stage for the application of Lemma 55, allowing us to avoid unpleasant combinatorial complications.

Lemma 57. Markov inequality. Suppose H is a non-negative-valued random variable on a probability space Ω with probability measure Pr. Then ∀ λ > 0 we have:

$$0\le\Pr(H>\lambda\cdot E(H))\le\frac{1}{\lambda}\to 0\ \text{as}\ \lambda\to\infty.$$

Proof. By the definition of expectation we have:

$$E(H)=\int_{\Omega}H\,d\Pr\ \overset{\text{by non-negativity of }H}{\ge}\ \int_{H>\lambda\cdot E(H)}H\,d\Pr\ \ge\ \Pr(H>\lambda\cdot E(H))\cdot(\lambda\cdot E(H)).$$

Now, if Pr(H > 0) = 0 then H = 0 almost surely, so that E(H) = 0 and

$$\Pr(H>\lambda\cdot E(H))=\Pr(H>0)=0<\frac{1}{\lambda}.$$

Otherwise, Pr(H > 0) > 0 ⟹ E(H) = ∫_Ω H dPr > 0, and the desired inequality follows when dividing both sides of the equation above by λ·E(H). □

We end this section with a very well-known elementary fact about Markov chains having symmetric transition matrices, which will also be used in the proof of Theorem 40.


Proposition 58. Let M be any Markov chain determined by a symmetric transition matrix. Then the uniform distribution is a stationary distribution of the Markov chain M (notice that M is not assumed to be irreducible).

Proof. The reader may easily see that the Markov transition matrix is doubly stochastic, or verify that the uniform distribution is stationary directly from the detailed balance equations[12]. □

5.4 Deriving the Geiringer-like theorem (Theorem 40) for the MCT algorithm

We now recall the setting of Section 4. At first we will prove the theorem for a mildly extended family of recombination transformations F̃ where, in addition to the transformations in Definition 12, F̃ also contains all the transpositions (or swaps) of the rollouts in a population, and these are selected with positive probability (a detailed description appears in the paragraph following equation (14)). Since every transposition of rollouts is a bijection on the set of all populations, Theorem 42 still applies, except that the equivalence classes will be enlarged by a factor of (b·m)!, i.e. |[P_m]_F̃| = (b·m)!·|[P_m]_F| (this is so because every permutation is a composition of transpositions). Thanks to this assumption we will be in a position to apply the tools based on Lemma 48, namely equation (17). This assumption will be dropped at the end via apparent symmetry considerations. Indeed, any permutation of the rollouts in a population Q ∈ [P_m] naturally commutes with all the recombination transformations in Definition 16, thereby providing a family of bijections between the equivalence class [P_m]_F and each of the (b·m)! disjoint pieces comprising the partition of the equivalence class [P_m]_F̃. Furthermore, permutations preserve the multisets of rollouts within a population, so that the frequencies of occurrence of various subsets in the corresponding pieces will be preserved and, thereby, the conclusion of Theorem 40 with the family of recombination transformations F replaced by F̃ will be exactly the same.

Recall the schema

$$h=(a,i_1,i_2,\ldots,i_{k-1},x_k)$$

of height k − 1 ≥ 0 in the statement of Theorem 40. Notice that, thanks to Proposition 28, we can write the given schema h as

$$h=h_k\subseteq h_{k-1}\subseteq h_{k-2}\subseteq\ldots\subseteq h_2\subseteq h_1\tag{21}$$

where h_1 = (a, i_1, #) and, in general, when 1 ≤ j < k,

$$h_j=(a,i_1,i_2,\ldots,i_j,\#)$$

are Holland schemata. Thanks to equation (17), ∀ m ∈ N we have:

$$
\lim_{t\to\infty}F(h,P_m,t)=\frac{1}{b\cdot m}\,E_\rho\!\left[X(h_1,A)|_{[P_m]_{\tilde{\mathcal{F}}}}\right]\cdot\prod_{q=1}^{k-1}\frac{|V(P_m,h_{q+1})|}{|V(P_m,h_q)|}\tag{22}
$$

and, taking the limit as m → ∞:

$$
\lim_{m\to\infty}\lim_{t\to\infty}F(h,P_m,t)=\frac{1}{b}\,\lim_{m\to\infty}\frac{1}{m}\,E_\rho\!\left[X(h_1,A)|_{[P_m]_{\tilde{\mathcal{F}}}}\right]\cdot\prod_{q=1}^{k-1}\lim_{m\to\infty}\frac{|V(P_m,h_{q+1})|}{|V(P_m,h_q)|}\tag{23}
$$



where ρ is the uniform distribution on [P_m]_F̃. First of all, notice that ∀ m ∈ N the random variable X(h_1, A)|_{[P_m]_F̃} is a constant function which is equal to Order(a → i_1, P_m) = m·Order(a → i_1, P) (see Remark 34 and Proposition 38). It follows trivially, then, that E_ρ[X(h_1, A)|_{[P_m]_F̃}] = m·Order(a → i_1, P), giving us (after dividing by the population size b·m) the first factor, Order(a → i_1, P)/b, in the right hand side of equation (4). In particular, when h = h_1 is a schema of height 0 ending with a #, there is no need to take the limit as m → ∞, regardless of whether or not the population P is homologous. To deal with the remaining ratios in the general case, when the population P is not necessarily homologous, we will exploit the classical and elementary Markov inequality (Lemma 57) in a rather elegant manner to set the stage for the application of Lemmas 53 and 55, as follows.

Consider the random variable H_i : [P_m] → N, where [P_m] is equipped with the uniform probability measure ρ, measuring the height of the ith rollout in the population Q ∈ [P_m]. In other words:

$$H_i(Q)=\text{the height of the }i\text{th rollout in the population }Q.$$

Notice that ∀ i and j with 1 ≤ i ≤ j ≤ b·m the random variables H_i and H_j are identically distributed (indeed, thanks to Theorem 42, the swap of the rollouts i and j in the population P_m is an isomorphism of the probability space [P_m] with itself, call it τ, such that H_i ∘ τ = H_j and vice versa). In particular, these random variables have the same expectation. Thanks to Remark 36 and Proposition 38, we deduce that:

$$
E(H_1)=\frac{\sum_{i=1}^{b\cdot m}E(H_i)}{b\cdot m}=\frac{E\!\left(\sum_{i=1}^{b\cdot m}H_i\right)}{b\cdot m}=\frac{\mathrm{Total}(P_m)}{b\cdot m}=\frac{m\cdot\mathrm{Total}(P)}{b\cdot m}=\frac{\mathrm{Total}(P)}{b}.\tag{24}
$$

Notice that the right hand side of equation (24) does not depend on m. In other words, ∀ m ∈ N the expected height of the first rollout in the population P_m is the same and is equal to Total(P)/b. At the same time, according to Proposition 38, the functions

$$\mathrm{Order}(a\to j,P_m)\to\infty\ \text{and}\ \mathrm{Order}(i\to j,P_m)\to\infty\ \text{as}\ m\to\infty.\tag{25}$$

The above observation opens the door for the application of the Markov inequality, which will, in turn, allow us to exploit Lemma 55 with the aim of estimating the desired ratios involved in equation (17), and then showing that the upper and the lower bounds on these fractions converge to the corresponding ratios involved in the right hand side of equation (4) in the conclusion of Theorem 40. We now proceed in detail. Let δ > 0 be an arbitrarily small number (informally speaking, δ ≪ 1). Choose M ∈ N large enough so that:

$$\delta^2\cdot M>\frac{\mathrm{Total}(P)}{b}=E(H_1)$$

(see equation (24)). For m > M let:

$$U_m^\delta=\{Q\mid Q\in[P_m]\ \text{and}\ H_1(Q)>\delta\cdot m\}\tag{26}$$


and observe that the Markov inequality (Lemma 57) tells us that:

$$
\rho(U_m^\delta)=\rho(\{Q\mid H_1(Q)>\delta\cdot m\})=\rho\!\left(H_1>\tfrac{1}{\delta}\cdot(\delta^2\cdot m)\right)\le\rho\!\left(H_1>\tfrac{1}{\delta}\cdot E(H_1)\right)\le\frac{1}{1/\delta}=\delta\tag{27}
$$

(the first inequality holds since m > M and by the definition of U_m^δ in equation (26), and the last one is the Markov inequality), where ρ denotes the uniform probability distribution on the set [P_m].

where r denotes the uniform probability distribution on the set [Pm].As the reader probably anticipates by now, our aim is to show that each of the ratios

of the form:

m!1limjV!Pm; hq$1"jjV!Pm; hq"j

# Order!iq # iq$1;P"Pj[iq#!P" Order!iq # j;P" $ iq #S !P"

so that equation (4) in the conclusion of Theorem 40 would follow from equation (22) when taking the limit of both sides as m → ∞. First of all, let us take care of the "trivial extremes": when for some q with 1 ≤ q ≤ k − 1 we have Order(i_{q−1} → i_q, P) = 0; or when for some such q we have Order(i_{q−1} → j, P) = 0 for all j ≠ i_q and i_{q−1}→_Σ(P) = 0; or when x_k ∈ Σ and either i_{k−1}→_Σ(P) = 0 or x_k ∉ i_{k−1}→(P), or Order(i_{k−1} → j, P) = 0 for all j ∈ N and x_k is the only terminal label in the set i_{k−1}→(P) (i.e. i_{k−1}→(P) ∩ Σ = {x_k}); or when x_k = #. According to Proposition 38, each of the statements above holds for a population P if and only if ∀ m ∈ N it holds when the population P is replaced with P_m. In the case when either Order(i_{q−1} → i_q, P_m) = 0 or (with x_k ∈ Σ) i_{k−1}→_Σ(P_m) = 0 or x_k ∉ i_{k−1}→(P), no individual fitting the schema h is present in any population Q ∈ [P_m], so that ∀ m and t ∈ N we have F(P_m, h, t) = 0. Thereby the left hand side of equation (4) is trivially 0. The right hand side is 0 as well in this case, since either the numerator of one of the fractions in the product or the last factor LF(P, h) is 0 (see the convention remark in the statement of Theorem 40). This finishes the verification of one trivial extreme case. Suppose now that for some index q it is the case that ∀ j ≠ i_q we have Order(i_{q−1} → j, P) = 0 and i_{q−1}→_Σ(P) = 0. In this case we observe that any individual occurring in a population Q ∈ [P_m] which fits the schema h_{q−1} also fits the schema h_q. In particular, the sets V(P_m, h_q) and V(P_m, h_{q−1}) are equal and we trivially have ∀ m ∈ N |V(P_m, h_q)|/|V(P_m, h_{q−1})| = 1. Of course, the corresponding ratio

Order!iq # iq$1;P"Pj[iq#!P" Order!iq # j;P" $ iq #S !P" # 1

as well since Order!iq # j;P" is the only nonzero contributing summand in thedenominator. The last factor ratio is supposed to coincide with the ratio!jV!Pm; h"j"=!jV!Pm; hk21"j". This ratio is either 0 or 1 in the extreme cases andverifying the validity of equation (4) is entirely analogous to the above. We now moveon to the interesting case when none of the trivial extremes above happen. Forschemata x and y we write xny # Sx > !Sy"c (see Definition 25) to denote the set ofrollouts fitting the schema x and not fitting the schema y. Rather than estimating or, incase of homologous population P, evaluating exactly the ratios of the form!jV!Pm; hq$1"j"=!jV!Pm; hq"j" we estimate and, in case of homologous recombination,

Geiringer-liketheorem for

decision making

69

Page 35: IJICC A version of Geiringer-like theorem for decision ...jer/papers/version.pdf · Findings – This work establishes an important theoretical link between classical population genetics,

evaluate the ratios of the form !jV!Pm; hq$1"j"=!jV!Pm; hqnhq$1"j" since these aremore convenient to tackle using the tools in Section 3. The following very simple factdemonstrates the connection between the two.

Lemma 59. Suppose that, ∀ m, whenever 1 ≤ q < k − 1,

$$
\frac{|V(P_m,h_{q+1})|}{|V(P_m,h_q\setminus h_{q+1})|}=\frac{\mathrm{Order}(i_q\to i_{q+1},P)}{\sum_{j\in i_q\to(P),\,j\ne i_{q+1}}\mathrm{Order}(i_q\to j,P)+i_q\to_\Sigma(P)}
$$

and neither the numerator nor the denominator of any of the fractions is 0. Then:

$$
\frac{|V(P_m,h_{q+1})|}{|V(P_m,h_q)|}=\frac{\mathrm{Order}(i_q\to i_{q+1},P)}{\sum_{j\in i_q\to(P)}\mathrm{Order}(i_q\to j,P)+i_q\to_\Sigma(P)}.
$$

Likewise, if

$$
\lim_{m\to\infty}\frac{|V(P_m,h_{q+1})|}{|V(P_m,h_q\setminus h_{q+1})|}=\frac{\mathrm{Order}(i_q\to i_{q+1},P)}{\sum_{j\in i_q\to(P),\,j\ne i_{q+1}}\mathrm{Order}(i_q\to j,P)+i_q\to_\Sigma(P)}
$$

and for all sufficiently large m neither the numerator nor the denominator of any of the fractions involved vanishes, then:

$$
\lim_{m\to\infty}\frac{|V(P_m,h_{q+1})|}{|V(P_m,h_q)|}=\frac{\mathrm{Order}(i_q\to i_{q+1},P)}{\sum_{j\in i_q\to(P)}\mathrm{Order}(i_q\to j,P)+i_q\to_\Sigma(P)}.
$$

Proof. Clearly V(P_m, h_q) = V(P_m, h_{q+1}) ⊎ V(P_m, h_q∖h_{q+1}), where ⊎ emphasizes that this is a union of disjoint sets. The rest is just a matter of careful verification: we have:

$$
\frac{|V(P_m,h_{q+1})|}{|V(P_m,h_q)|}=\frac{|V(P_m,h_{q+1})|}{|V(P_m,h_{q+1})|+|V(P_m,h_q\setminus h_{q+1})|}=\frac{1}{1+|V(P_m,h_q\setminus h_{q+1})|/|V(P_m,h_{q+1})|}.\tag{28}
$$

Taking the limit as m → ∞ on both sides of equation (28) yields:

$$
\lim_{m\to\infty}\frac{|V(P_m,h_{q+1})|}{|V(P_m,h_q)|}=\frac{1}{1+\lim_{m\to\infty}\bigl(|V(P_m,h_q\setminus h_{q+1})|/|V(P_m,h_{q+1})|\bigr)}.\tag{29}
$$

The right hand sides of equations (28) and (29) are easily computed directly from the corresponding formulas in the assumptions, and each of them is:

$$
\frac{1}{1+\Bigl(\sum_{j\in i_q\to(P),\,j\ne i_{q+1}}\mathrm{Order}(i_q\to j,P)+i_q\to_\Sigma(P)\Bigr)\big/\mathrm{Order}(i_q\to i_{q+1},P)}=\frac{\mathrm{Order}(i_q\to i_{q+1},P)}{\sum_{j\in i_q\to(P)}\mathrm{Order}(i_q\to j,P)+i_q\to_\Sigma(P)}
$$

yielding the asserted conclusions. □

Entirely analogously:

Lemma 60. Suppose that ∀ m:

jV!Pm; hk"jjV!Pm; hk21nhk"j

# 1Pj[ik21#!P" Order!ik21 # j;P" $ ik21 #S !P"2 1

and the denominators do not vanish. Then:

jV!Pm; hk"jjV!Pm; hk21"j

# 1Pj[ik21#!P" Order!ik21 # j;P" $ ik21 #S !P" :

Likewise, if:

m!1limjV!Pm; hk"j

jV!Pm; hk21nhk"j# 1P

j[ik21#!P" Order!ik21 # j;P" $ ik21 #S !P"2 1

and for all sufficiently large m the denominators of any of the fractions involvedvanishes, then:

m!1limjV!Pm; hk"jjV!Pm; hk21"j

# 1Pj[ik21#!P" Order!ik21 # j;P" $ ik21 #S !P" :

To estimate or, in the special case of a homologous population P, to compute exactly the ratios |V(P_m, h_{q+1})|/|V(P_m, h_q∖h_{q+1})|, the following strategy will be employed. For a given m ∈ N consider the set of all populations V(P_m, h_q) (i.e. the set of those populations in [P_m] the first individual of which fits the schema h_q). Let now π_{q,m} denote the uniform probability measure on the set V(P_m, h_q). We then have:

$$
\frac{|V(P_m,h_{q+1})|}{|V(P_m,h_q\setminus h_{q+1})|}=\frac{|V(P_m,h_{q+1})|/|V(P_m,h_q)|}{|V(P_m,h_q\setminus h_{q+1})|/|V(P_m,h_q)|}=\frac{\pi_{q,m}(V(P_m,h_{q+1}))}{\pi_{q,m}(V(P_m,h_q\setminus h_{q+1}))}\tag{30}
$$

and, more generally, for any set of rollouts S:

$$
\frac{|V(P_m,h_{q+1}\cap S)|}{|V(P_m,h_q\setminus h_{q+1})|}=\frac{|V(P_m,h_{q+1}\cap S)|/|V(P_m,h_q)|}{|V(P_m,h_q\setminus h_{q+1})|/|V(P_m,h_q)|}=\frac{\pi_{q,m}(V(P_m,h_{q+1}\cap S))}{\pi_{q,m}(V(P_m,h_q\setminus h_{q+1}))}.\tag{31}
$$

The idea behind equations (30) and (31) is to construct a Markov chain with a uniform stationary distribution on the state space V(P_m, h_q), thereby opening the door to an application of Lemma 55. It seems the easiest construction to accomplish our task uses Proposition 58. Recall the transformations of the form χ_{i,x,y}, as in Definitions 9 and 12, from Definition 16. We now construct our Markov chain, call it M_q, on the set V(P_m, h_q), where q < k, as follows: given a population of rollouts Q_t ∈ V(P_m, h_q) at time t, let (i_q, x) be the state in the first rollout and qth position in the population Q_t. Consider the set:

Statesm!iq # Qt" # {! j; z"jj [ iq # !Qt"; z [ A £m and the state ! j; z"

appears in the population Qt following a state with equivalence class iq}<

<{! f ; j"j1 # j # m and f [ iq #S P}:

Now select a state or a terminal label, calling either one of these v, from the finite set States_m(i_q → Q_t) uniformly at random. Since each state appears uniquely in a population Q_t, by the definition of the set States_m(i_q → Q_t) in equation (32), the state preceding the element v selected from States_m(i_q → Q_t), call it u, is of the form u = (i_q, y) where y ∈ A × m. Now let Q_{t+1} = χ_{i_q,x,y}(Q_t). Notice that there are two mutually exclusive cases here:

• Case 1. The states u and (i_q, x) appear in different rollouts (or, equivalently, the state u does not appear in the first rollout, since the state (i_q, x) does by definition). In this case Q_{t+1} ≠ Q_t and the state in the first rollout of the population Q_{t+1} in the (q+1)st position is v. In this case we will say that the element v is mobile.

• Case 2. The states u and (i_q, x) appear in the same rollout (of course, it has to be the first rollout). In this case Q_{t+1} = Q_t. We will say that the element v is immobile.

Notice that in either of the cases the population Q_{t+1} ∈ V(P_m, h_q), so that the Markov process is well defined on the set of populations V(P_m, h_q) ⊆ [P_m]. We now emphasize the following simple but important facts.

Lemma 61. ∀ Q ∈ V(P_m, h_q), |States_m(i_q → Q)| = m·|States_1(i_q → P)|, and

$$|\mathrm{States}_1(i_q\to P)|=\sum_{j\in i_q\to(P)}\mathrm{Order}(i_q\to j,P)+i_q\to_\Sigma(P).$$

Proof. The fact that |States_1(i_q → P)| = Σ_{j∈i_q→(P)} Order(i_q → j, P) + i_q→_Σ(P) follows directly from the definitions. The definition of the set States_m(i_q → Q) in equation (32), together with Remark 34, tells us that States_m(i_q → Q) = States_1(i_q → P_m) (where P_m plays the role of P for the time being), so that:

$$
|\mathrm{States}_m(i_q\to Q)|=|\mathrm{States}_1(i_q\to P_m)|=\sum_{j\in i_q\to(P_m)}\mathrm{Order}(i_q\to j,P_m)+i_q\to_\Sigma(P_m)\overset{\text{by Proposition 38}}{=}\sum_{j\in i_q\to(P)}m\cdot\mathrm{Order}(i_q\to j,P)+m\cdot i_q\to_\Sigma(P)=m\cdot\Bigl(\sum_{j\in i_q\to(P)}\mathrm{Order}(i_q\to j,P)+i_q\to_\Sigma(P)\Bigr)\overset{\text{by the already proven fact}}{=}m\cdot|\mathrm{States}_1(i_q\to P)|.\quad\square
$$

Another very simple but important observation is the following.


Lemma 62. Given any two populations $Q, Q' \in V(P_m, h_q)$, let $p^q_{Q \to Q'}$ denote the transition probability of the Markov chain $\mathcal{M}_q$ constructed above. Then either $p^q_{Q \to Q'} = 0$ or $p^q_{Q \to Q'} = 1 / \big(m \cdot |\operatorname{States}_1(i_q \to P)|\big)$. Moreover, $p^q_{Q \to Q'} = p^q_{Q' \to Q}$ and the uniform distribution is a stationary distribution of the Markov chain $\mathcal{M}_q$.

Proof. From the construction it is clear that if $p^q_{Q \to Q'} \neq 0$ then there must be an element $s \in \operatorname{States}_m(i_q \to Q)$ which appears in a rollout of the population $Q$ different from the first one and which is the state at the $(q+1)$st position of the first rollout of the population $Q'$, while Definition 16 tells us that the state $(i_q, x)$ in the $q$th position of the first rollout of the population $Q$ appears in $Q'$ in some rollout that is not the first one (namely at the former position of the state $s$ that is now in position $q+1$ of the first rollout of $Q'$), and it is also a member of the set $\operatorname{States}_m(i_q \to Q')$ according to the way $\operatorname{States}_m(i_q \to Q')$ is introduced in equation (32). According to Lemma 61, $|\operatorname{States}_m(i_q \to Q)| = |\operatorname{States}_m(i_q \to Q')| = m \cdot |\operatorname{States}_1(i_q \to P)|$, so that the desired conclusion that $p^q_{Q \to Q'} = p^q_{Q' \to Q}$ follows from the construction of the Markov chain $\mathcal{M}_q$. The uniform probability distribution is a stationary distribution of the Markov chain $\mathcal{M}_q$ since we have just shown that the Markov transition matrix is symmetric (see also Proposition 58). $\square$

Recall the generalized transition probabilities introduced in Definition 51. For the remaining part of this section it is convenient to introduce the following definition.

Definition 63. Given a population $Q \in V(P_m, h_{q+1})$, let $\operatorname{Mobile}_q(Q)$ denote the number of mobile elements (see Case 1 above) in the set $\operatorname{States}_m(i_q \to Q)$ that move the population $Q$ away from the set $V(P_m, h_{q+1})$ (and hence into the set $V(P_m, h_q \setminus h_{q+1})$) under the application of the Markov chain $\mathcal{M}_q$ constructed above. Dually, given $Q \in V(P_m, h_q \setminus h_{q+1})$, let $\operatorname{Mobile}_q(Q)$ denote the number of mobile elements in the set $\operatorname{States}_m(i_q \to Q)$ that move the population $Q$ away from the set $V(P_m, h_q \setminus h_{q+1})$ (and hence into the set $V(P_m, h_{q+1})$).

Suppose, for the time being, that the set $V(P_m, h_{q+1}) \neq \emptyset$. Given a population $Q \in V(P_m, h_{q+1})$, notice that:

$$\operatorname{Mobile}_q(Q) \le \begin{cases} \sum_{j \in i_q \to Q,\ j \neq i_{q+1}} \operatorname{Order}(i_q \to j)(Q) + i_q \to_S (Q) & \text{if } q < k - 1 \\ \sum_{j \in i_q \to Q} \operatorname{Order}(i_q \to j)(Q) + i_q \to_S (Q) - m & \text{if } q = k - 1 \end{cases} \quad (33)$$

Notice that in case the population $P$ is homologous (and hence so are $P_m$ and $Q$) there are no immobile elements in the population $Q$, so that the inequality (33) turns into an exact equation. In general, from Case 2 above it is clear that the total number of all the immobile elements is crudely bounded above by the height of the first rollout in the population $Q$, $H_1(Q)$. We now obtain a lower bound on the total number of mobile elements in the set $\operatorname{States}_m(i_q \to Q)$ that move the population $Q$ away from the set $V(P_m, h_{q+1})$ into the set $V(P_m, h_q \setminus h_{q+1})$; this number is at least:

Mobileq!Q" $

$j[iq#Q and j–iq$1

POrder!iq # j"!Q" $ iq #S !Q"2 H 1!Q" if q , k2 1

j[iq#Q

POrder!iq # j"!Q" $ iq #S !Q"2m2 H 1!Q" if q # k2 1

8>><

>>:!34"


Analogously, if the population $Q \in V(P_m, h_q \setminus h_{q+1})$, then the total number of mobile elements in the set $\operatorname{States}_m(i_q \to Q)$ that move the population $Q$ away from the set $V(P_m, h_q \setminus h_{q+1})$ (and hence into the set $V(P_m, h_{q+1})$) satisfies:

$$\operatorname{Mobile}_q(Q) \le \begin{cases} \operatorname{Order}(i_q \to i_{q+1})(Q) & \text{if } q < k - 1 \\ m & \text{if } q = k - 1 \end{cases} \quad (35)$$

and, as before, the inequality turns into an exact equation in the case when $Q$ is a homologous population. At the same time:

$$\operatorname{Mobile}_q(Q) \ge \begin{cases} \operatorname{Order}(i_q \to i_{q+1})(Q) - H_1(Q) & \text{if } q < k - 1 \\ m - H_1(Q) & \text{if } q = k - 1 \end{cases} \quad (36)$$

In view of Proposition 38 and Remark 34, inequalities (33)-(36) can be rewritten verbatim replacing $\operatorname{Order}(i_q \to i_{q+1})(Q)$ with $m \cdot \operatorname{Order}(i_q \to i_{q+1})(P)$, and $\operatorname{Order}(i_q \to j)(Q)$ with $m \cdot \operatorname{Order}(i_q \to j)(P)$.

For the case of a homologous population $Q$ the situation is particularly simple.

Lemma 64. Suppose the population $P$ is homologous. Suppose further that neither one of the sets $V(P_m, h_{q+1})$ and $V(P_m, h_q \setminus h_{q+1})$ is empty. Then $\forall\, m \in \mathbb{N}$ we have:

$$p^q_{V(P_m, h_{q+1}) \to V(P_m, h_q \setminus h_{q+1})} = \begin{cases} \dfrac{\sum_{j \in i_q \to P,\ j \neq i_{q+1}} \operatorname{Order}(i_q \to j)(P) + i_q \to_S (P)}{\sum_{j \in i_q \to P} \operatorname{Order}(i_q \to j)(P) + i_q \to_S (P)} & \text{if } q < k - 1 \\[2ex] \dfrac{\sum_{j \in i_q \to P} \operatorname{Order}(i_q \to j)(P) + i_q \to_S (P) - 1}{\sum_{j \in i_q \to P} \operatorname{Order}(i_q \to j)(P) + i_q \to_S (P)} & \text{if } q = k - 1 \end{cases}$$

$$p^q_{V(P_m, h_q \setminus h_{q+1}) \to V(P_m, h_{q+1})} = \begin{cases} \dfrac{\operatorname{Order}(i_q \to i_{q+1})(P) + i_q \to_S (P)}{\sum_{j \in i_q \to P} \operatorname{Order}(i_q \to j)(P) + i_q \to_S (P)} & \text{if } q < k - 1 \\[2ex] \dfrac{1}{\sum_{j \in i_q \to P} \operatorname{Order}(i_q \to j)(P) + i_q \to_S (P)} & \text{if } q = k - 1 \end{cases}$$

Consequently, $\forall\, m \in \mathbb{N}$:

$$\frac{p_{q,m}(V(P_m, h_{q+1}))}{p_{q,m}(V(P_m, h_q \setminus h_{q+1}))} = \begin{cases} \dfrac{\operatorname{Order}(i_q \to i_{q+1})(P) + i_q \to_S (P)}{\sum_{j \in i_q \to P,\ j \neq i_{q+1}} \operatorname{Order}(i_q \to j)(P) + i_q \to_S (P)} & \text{if } q < k - 1 \\[2ex] \dfrac{1}{\sum_{j \in i_q \to P} \operatorname{Order}(i_q \to j)(P) + i_q \to_S (P) - 1} & \text{if } q = k - 1 \end{cases}$$

Proof. The first and the second conclusions follow from equations (33) and (35) combined with Lemma 62, Definition 51 and the comment following equation (36). The last conclusion is an immediate application of equation (18) to the lumping quotient of the Markov chain $\mathcal{M}_q$ into the two states $A = V(P_m, h_{q+1})$ and $B = V(P_m, h_q \setminus h_{q+1})$. $\square$

All that remains to do now to establish Theorem 40 in the special case of a homologous population $P$ is to show that whenever $1 \le q \le k - 1$ and none of the "trivial extremes" takes place (see the beginning of this subsection), the sets $V(P_m, h_{q+1})$ and $V(P_m, h_q \setminus h_{q+1})$ are nonempty. This will be done later jointly with the corresponding fact needed for the general case. Meanwhile, we return to the


estimation of the ratios of the form $p_{q,m}(V(P_m, h_{q+1})) \, / \, p_{q,m}(V(P_m, h_q \setminus h_{q+1}))$ in the general case. Suppose, for now, that the following statement is true:

$$\forall\, q \text{ with } 1 \le q < k \ \exists\, \operatorname{const}(q) \in (0, 1) \text{ such that for all sufficiently large } m \text{ we have } r_m(V(P_m, h_{q+1})) > \operatorname{const}(q) \text{ and } r_m(V(P_m, h_q \setminus h_{q+1})) > \operatorname{const}(q). \quad (37)$$

In the general case of a non-homologous population $P$ the presence of immobile states significantly complicates the situation. This is where Markov's inequality comes to the rescue, telling us that as $m$ increases, the event that the height of the first rollout (and hence the number of immobile states) is large becomes more and more rare, so that the bounds in the inequalities (33) and (34), as well as in the inequalities (35) and (36), get closer and closer together. We now proceed in detail. Recall the construction of the sets $U^\delta_m$ starting with equation (24) and ending with inequality (27). Let $\delta > 0$ be given. According to inequality (27), $\exists\, M_1$ large enough so that $\forall\, m > M_1$ we have $r_m\big(U^{\delta \cdot \operatorname{const}(q+1)}_m\big) < \delta \cdot \operatorname{const}(q+1)$, where $\operatorname{const}(q+1)$ is as in the assumption statement (equation (37)). We now have:

$$\frac{p_{q,m}\big(V(P_m, h_{q+1}) \cap U^{\delta \cdot \operatorname{const}(q+1)}_m\big)}{p_{q,m}(V(P_m, h_{q+1}))} \le \frac{p_{q,m}\big(U^{\delta \cdot \operatorname{const}(q+1)}_m\big)}{p_{q,m}(V(P_m, h_{q+1}))} = \frac{|U^{\delta \cdot \operatorname{const}(q+1)}_m \cap V(P_m, h_q)| \, / \, |V(P_m, h_q)|}{|V(P_m, h_{q+1})| \, / \, |V(P_m, h_q)|} \le \frac{|U^{\delta \cdot \operatorname{const}(q+1)}_m|}{|V(P_m, h_{q+1})|} = \frac{|U^{\delta \cdot \operatorname{const}(q+1)}_m| \, / \, |[P_m]|}{|V(P_m, h_{q+1})| \, / \, |[P_m]|} = \frac{r_m\big(U^{\delta \cdot \operatorname{const}(q+1)}_m\big)}{r_m(V(P_m, h_{q+1}))} \le \frac{\delta \cdot \operatorname{const}(q+1)}{\operatorname{const}(q+1)} = \delta. \quad (38)$$

Analogously:

$$\frac{p_{q,m}\big(V(P_m, h_q \setminus h_{q+1}) \cap U^{\delta \cdot \operatorname{const}(q+1)}_m\big)}{p_{q,m}(V(P_m, h_q \setminus h_{q+1}))} \le \frac{p_{q,m}\big(U^{\delta \cdot \operatorname{const}(q+1)}_m\big)}{p_{q,m}(V(P_m, h_q \setminus h_{q+1}))} \le \frac{r_m\big(U^{\delta \cdot \operatorname{const}(q+1)}_m\big)}{r_m(V(P_m, h_q \setminus h_{q+1}))} \le \delta. \quad (39)$$

Now observe that as long as a population $Q \in V(P_m, h_{q+1}) \setminus U^{\delta \cdot \operatorname{const}(q+1)}_m$, the height of the first rollout satisfies $H_1(Q) \le (\delta \cdot \operatorname{const}(q+1)) \cdot m \le \delta \cdot m$ (recall how the sets of the form $U^\delta_m$ are introduced in equation (26)). Now, for $q < k - 1$, inequalities (33) and (34) and Lemma 61 tell us that $\forall\, m > M_1$ we have:

$$\frac{m \cdot \Big( \sum_{j \in i_q \to (P),\, j \neq i_{q+1}} \operatorname{Order}(i_q \to j, P) + i_q \to_S (P) \Big) - \delta \cdot m}{m \cdot |\operatorname{States}_1(i_q \to P)|} \le p_{Q \to V(P_m, h_q \setminus h_{q+1})} \le \frac{m \cdot \Big( \sum_{j \in i_q \to (P),\, j \neq i_{q+1}} \operatorname{Order}(i_q \to j, P) + i_q \to_S (P) \Big)}{m \cdot |\operatorname{States}_1(i_q \to P)|}$$


so that dividing the numerator and the denominator by $m$ gives:

$$\frac{\sum_{j \in i_q \to (P),\, j \neq i_{q+1}} \operatorname{Order}(i_q \to j, P) + i_q \to_S (P) - \delta}{|\operatorname{States}_1(i_q \to P)|} \le p_{Q \to V(P_m, h_q \setminus h_{q+1})} \le \frac{\sum_{j \in i_q \to (P),\, j \neq i_{q+1}} \operatorname{Order}(i_q \to j, P) + i_q \to_S (P)}{|\operatorname{States}_1(i_q \to P)|} \quad (40)$$

An entirely analogous and, by now, familiar argument, with inequality (39) playing the role of inequality (38), shows that whenever $m > M_1$ and a population $Q \in V(P_m, h_q \setminus h_{q+1}) \setminus U^{\delta \cdot \operatorname{const}(q+1)}_m$, we have:

$$\frac{\operatorname{Order}(i_q \to i_{q+1}, P) - \delta}{|\operatorname{States}_1(i_q \to P)|} \le p_{Q \to V(P_m, h_{q+1})} \le \frac{\operatorname{Order}(i_q \to i_{q+1}, P)}{|\operatorname{States}_1(i_q \to P)|} \quad (41)$$

Now inequalities (38)-(41) allow us to apply Lemma 55 with $A = V(P_m, h_{q+1})$, $B = V(P_m, h_q \setminus h_{q+1})$ and $U = U^{\delta \cdot \operatorname{const}(q+1)}_m$, concluding that $\forall\, m > M_1$ we have:

$$\frac{(1 - \delta) \cdot \big(\operatorname{Order}(i_q \to i_{q+1}, P) - \delta\big) \, / \, |\operatorname{States}_1(i_q \to P)|}{(1 - \delta) \cdot \Big( \sum_{j \in i_q \to (P),\, j \neq i_{q+1}} \operatorname{Order}(i_q \to j, P) + i_q \to_S (P) \Big) \, / \, |\operatorname{States}_1(i_q \to P)| + \delta} \le \frac{p_{q,m}(V(P_m, h_{q+1}))}{p_{q,m}(V(P_m, h_q \setminus h_{q+1}))} \le \frac{(1 - \delta) \cdot \operatorname{Order}(i_q \to i_{q+1}, P) \, / \, |\operatorname{States}_1(i_q \to P)| + \delta}{(1 - \delta) \cdot \Big( \sum_{j \in i_q \to (P),\, j \neq i_{q+1}} \operatorname{Order}(i_q \to j, P) + i_q \to_S (P) - \delta \Big) \, / \, |\operatorname{States}_1(i_q \to P)|}.$$

Multiplying the numerator and the denominator of the leftmost and the rightmost fractions by the constant $|\operatorname{States}_1(i_q \to P)|$, which does not depend on $m$ (and weakening the bounds slightly further, using $|\operatorname{States}_1(i_q \to P)| \ge 1 \ge 1 - \delta$), we obtain:

$$\frac{(1 - \delta) \cdot \operatorname{Order}(i_q \to i_{q+1}, P) - \delta \cdot |\operatorname{States}_1(i_q \to P)|}{(1 - \delta) \cdot \Big( \sum_{j \in i_q \to (P),\, j \neq i_{q+1}} \operatorname{Order}(i_q \to j, P) + i_q \to_S (P) \Big) + \delta \cdot |\operatorname{States}_1(i_q \to P)|} \le \frac{p_{q,m}(V(P_m, h_{q+1}))}{p_{q,m}(V(P_m, h_q \setminus h_{q+1}))} \le \frac{(1 - \delta) \cdot \operatorname{Order}(i_q \to i_{q+1}, P) + \delta \cdot |\operatorname{States}_1(i_q \to P)|}{(1 - \delta) \cdot \Big( \sum_{j \in i_q \to (P),\, j \neq i_{q+1}} \operatorname{Order}(i_q \to j, P) + i_q \to_S (P) \Big) - \delta \cdot |\operatorname{States}_1(i_q \to P)|} \quad (42)$$

Now simply observe that the leftmost and the rightmost sides of the inequality (42) are both differentiable (and, hence, continuous) functions of $\delta$ on the domain $(-0.5, 0.5)$ (notice that the denominators do not vanish on this domain thanks to the assumption that neither of the trivial extremes takes place). It follows immediately that both the leftmost and the rightmost sides of the inequality (42) converge to the same value, namely to the desired ratio:


$$R = \frac{\operatorname{Order}(i_q \to i_{q+1}, P)}{\sum_{j \in i_q \to (P),\, j \neq i_{q+1}} \operatorname{Order}(i_q \to j, P) + i_q \to_S (P)}$$

as $\delta \to 0$. From the definition of the limit of a real-valued function at a point, it follows that given any $\epsilon > 0$ we can choose $\delta > 0$ small enough that both the leftmost and the rightmost sides of the inequality (42) are within an error of $\epsilon$ of $R$. We have now shown that, depending on this $\delta$, we can then choose $M$ sufficiently large so that the ratio $p_{q,m}(V(P_m, h_{q+1})) \, / \, p_{q,m}(V(P_m, h_q \setminus h_{q+1}))$, being squeezed between two quantities within an error of $\epsilon$ of $R$, is itself within an error of at most $\epsilon$ of $R$. In summary, we have finally proved the following.

Lemma 65. Assume that the statement in equation (37) is true. Then whenever $1 \le q < k - 1$ we have:

$$\lim_{m \to \infty} \frac{p_{q,m}(V(P_m, h_{q+1}))}{p_{q,m}(V(P_m, h_q \setminus h_{q+1}))} = \frac{\operatorname{Order}(i_q \to i_{q+1}, P)}{\sum_{j \in i_q \to (P),\, j \neq i_{q+1}} \operatorname{Order}(i_q \to j, P) + i_q \to_S (P)}.$$

An entirely analogous argument shows the following.

Lemma 66. Assume that the statement in equation (37) is true. Then:

$$\lim_{m \to \infty} \frac{p_{k-1,m}(V(P_m, h_k))}{p_{k-1,m}(V(P_m, h_{k-1} \setminus h_k))} = \frac{1}{\sum_{j \in i_{k-1} \to (P)} \operatorname{Order}(i_{k-1} \to j, P) + i_{k-1} \to_S (P) - 1}.$$

According to Lemmas 59 and 60, equations (30) and (31), Lemmas 65, 66, 64 andequations (22) and (23), all that remains to be proven to establish Theorem 40 is thefollowing.

Lemma 67. Suppose neither of the trivial extremes takes place. Then the statementin equation (37) is true. Furthermore, in case of homologous recombination thestatement is true for all m (not only for large enough m).

Proof. We proceed by induction on the index $q$. First of all, recall from the beginning of the current subsection that we have already shown that $\forall\, m \in \mathbb{N}$ we have:

$$r_m(V(P_m, h_1)) \stackrel{\text{Lemma 48}}{=} \lim_{t \to \infty} F(h_1, P_m, t) = \frac{\operatorname{Order}(a \to i_1, P)}{b} > 0,$$

where the last inequality holds because none of the trivial extremes takes place, so that $\operatorname{Order}(a \to i_1, P) \neq 0$ (recall that $r_m$ denotes the uniform probability distribution on $[P_m]$, so that $r_m(V(P_m, h_1)) = |V(P_m, h_1)| \, / \, |[P_m]|$). Since $V(P_m, h_1) = V(P_m, h_2) \sqcup V(P_m, h_1 \setminus h_2)$ we also have $r_m(V(P_m, h_2)) + r_m(V(P_m, h_1 \setminus h_2)) = r_m(V(P_m, h_1)) = \operatorname{Order}(a \to i_1, P) / b = \operatorname{const}_0$, where $1 \ge \operatorname{const}_0 > 0$ and $\operatorname{const}_0$ is independent of $m$. It follows then that at least one of the following is true: $r_m(V(P_m, h_2)) \ge \operatorname{const}_0 / 2$ or $r_m(V(P_m, h_1 \setminus h_2)) \ge \operatorname{const}_0 / 2$. In the general case, choose $M_1$ large enough so that $\forall\, m > M_1$ we have $r_m\big(U^{\operatorname{const}_0 / 4}_m\big) \le \operatorname{const}_0 / 4$ (recall the part of the proof starting with equation (24) and ending with inequality (27)). It follows then that either:


rm V!Pm; h2"nUconst0

4m

% &$

const04

or rm V!Pm; h1nh2"nUconst0

4m

% &$

const04

:

An already familiar argument exploiting Corollary 54, inequalities (33)-(36) and Lemma 61 shows that, thanks to the assumption that no trivial extremes take place, and observing that $1 - \operatorname{const}_0 / 4 \ge 1/4$, for all large enough $m$ the ratios

$$\frac{p^1_{V(P_m, h_2) \setminus U^{\operatorname{const}_0 / 4}_m \to V(P_m, h_1 \setminus h_2)}}{p^1_{V(P_m, h_1 \setminus h_2) \to V(P_m, h_2)}} \ge k_1$$

and, likewise,

$$\frac{p^1_{V(P_m, h_1 \setminus h_2) \setminus U^{\operatorname{const}_0 / 4}_m \to V(P_m, h_2)}}{p^1_{V(P_m, h_2) \to V(P_m, h_1 \setminus h_2)}} \ge k_2,$$

where both $k_1$ and $k_2$ are positive and independent of $m$. Now applying Lemma 56 to the sets $B = V(P_m, h_2) \setminus U^{\operatorname{const}_0 / 4}_m$ and $A = V(P_m, h_1 \setminus h_2)$ in the case when $r_m\big(V(P_m, h_2) \setminus U^{\operatorname{const}_0 / 4}_m\big) \ge \operatorname{const}_0 / 4$, or to the pair of sets $B = V(P_m, h_1 \setminus h_2) \setminus U^{\operatorname{const}_0 / 4}_m$ and $A = V(P_m, h_2)$ in the case when $r_m\big(V(P_m, h_1 \setminus h_2) \setminus U^{\operatorname{const}_0 / 4}_m\big) \ge \operatorname{const}_0 / 4$, tells us that if we let $\operatorname{const}(1) = \min\{\operatorname{const}_0 / 4,\ (\operatorname{const}_0 / 4) \cdot k_1,\ (\operatorname{const}_0 / 4) \cdot k_2\}$ then the statement in equation (37) is true for $q = 1$. This establishes the base case of the induction. Now observe that if the statement in equation (37) holds for some $q$ then, in particular, there is a constant $\operatorname{const}(q)$ independent of $m$ such that for all large enough $m$ we have $r_m(V(P_m, h_{q+1})) > \operatorname{const}(q)$. The validity of the statement in equation (37) for $q + 1$ then follows from an entirely analogous argument to the one in the base case of the induction, with $\operatorname{const}(q)$ playing the role of $\operatorname{const}_0$ and the Markov chain $\mathcal{M}_{q+1}$ replacing the Markov chain $\mathcal{M}_1$. In the case of homologous recombination, an even simpler (since there is no need to worry about the height of the first rollout) analogous argument shows that the statement in equation (37) holds $\forall\, m \in \mathbb{N}$. $\square$

6. A further strengthening of the general finite population Geiringer Theorem for evolutionary algorithms

6.1 A form of the classical contraction mapping principle for a family of maps having the same fixed point

The material of this section requires familiarity with elementary point-set topology or with the basic theory of metric spaces (Simmons, 1983). Throughout this section $(X, d)$ denotes a complete metric space. We recall the following from the classical theory of metric spaces.

Definition 68. We say that a map $f: X \to X$ is a contraction on $X$ if $\exists\, k < 1$ such that $\forall\, x, y \in X$ we have $d(f(x), f(y)) \le k \cdot d(x, y)$. We also call $k$ a contraction rate[13]. We may then say that $f$ is a contraction with contraction rate at most $k$.

The classical result known as the contraction mapping principle states the following.

Theorem 69 (Contraction Mapping Principle). Suppose $(X, d)$ is a complete metric space and $f: X \to X$ is a contraction on $X$ in the sense of Definition 68. Then $\exists!\, z \in X$ such that $\forall\, y \in X$ we have $\lim_{n \to \infty} f^n(y) = z$.


Proof. The proof can be found in nearly every textbook on point-set topology, such as Simmons (1983), for instance. $\square$

In our application we will exploit the following natural extension of Definition 68.

Definition 70. Suppose $(X, d)$ is a complete metric space. We say that a family of maps $\mathcal{F} = \{f \mid f: X \to X\}$ is an equi-contraction family if $\exists\, k < 1$ such that $\forall\, f \in \mathcal{F}$ and $\forall\, x, y \in X$ we have $d(f(x), f(y)) \le k \cdot d(x, y)$.

Evidently, if the family $\mathcal{F}$ of contractions is finite, one can take the maximum of the set $K = \{k_f \mid \forall\, x, y \in X \text{ we have } d(f(x), f(y)) \le k_f \cdot d(x, y)\}$, so that we immediately deduce the following important (for our application) corollary.

Corollary 71. If $\mathcal{F}$ is any finite family of contractions on the metric space $X$ then $\mathcal{F}$ is an equi-contraction family.

The classical contraction mapping principle says that every contraction map on a complete metric space has a unique fixed point. Here we need a slight extension of Theorem 69, which probably appears as an exercise in some point-set topology or real analysis textbook, but for the sake of completeness it is included in our paper.

Theorem 72. Suppose we are given an equi-contraction family $\mathcal{F}$ on the complete metric space $(X, d)$. Suppose further that every $f \in \mathcal{F}$ has the same unique fixed point $z$ (in accordance with Theorem 69). Consider any sequence of composed functions $g_1 = f_1$, $g_2 = f_2 \circ g_1, \ldots, g_n = f_n \circ g_{n-1}$, where each $f_i \in \mathcal{F}$ (it is allowed that $f_i = f_j$ when $i \neq j$). Then $\forall\, y \in X$, $\lim_{n \to \infty} g_n(y) = z$ exponentially fast, at rate $k^n$ for some constant $k < 1$. In particular, the convergence rate does not depend on the sequence $\{g_i\}_{i=1}^{\infty}$ (as long as it is constructed in the manner described above). Moreover, in case $d$ is a bounded metric (i.e. $\sup_{x, y \in X} d(x, y) < \infty$), the convergence rate does not depend even on the choice of the initial point $y \in X$.

Proof. Since all the functions $f_i$ have the same fixed point $z$, it is clear by induction that $\forall\, n$ we have $g_n(z) = z$. Since $\mathcal{F}$ is an equi-contraction family, in accordance with Definition 70 $\exists\, k < 1$ such that $d(f(x), f(y)) \le k \cdot d(x, y)$ for every $f \in \mathcal{F}$. We now have $d(g_1(y), z) = d(f_1(y), f_1(z)) \le k \cdot d(y, z)$. If $d(g_m(y), z) \le k^m \cdot d(y, z)$, then $d(g_{m+1}(y), z) = d(f_{m+1}(g_m(y)), f_{m+1}(z)) \le k \cdot d(g_m(y), z) \le k \cdot (k^m \cdot d(y, z)) = k^{m+1} \cdot d(y, z)$, so that by induction it follows that $\forall\, n \in \mathbb{N}$ we have $d(g_n(y), z) \le k^n \cdot d(y, z)$. But $k < 1$, so that $d(g_n(y), z) \to 0$ exponentially fast as $n \to \infty$, which is another way of stating the first desired conclusion. If $\sup_{x, y \in X} d(x, y) < \infty$ then $d(g_n(y), z) \le k^n \cdot d(y, z) \le k^n \cdot \sup_{x, y \in X} d(x, y)$. $\square$
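As a quick numerical illustration of Theorem 72, the following Python snippet composes randomly chosen members of a small equi-contraction family on the complete metric space $(\mathbb{R}, |\cdot|)$, all sharing the fixed point $z$; the particular maps and constants are made up purely for the demonstration.

```python
import math
import random

# Numerical illustration of Theorem 72 on the complete metric space (R, |.|).
# Every map in the (made-up) family below is a contraction with rate at most
# 1/2 and fixes the same point z; they are composed in an arbitrary order.
z = 3.0
family = [
    lambda x: z + 0.5 * (x - z),          # linear, contraction rate 1/2
    lambda x: z + 0.5 * math.sin(x - z),  # nonlinear, rate at most 1/2
    lambda x: z - 0.25 * (x - z),         # linear, rate 1/4
]

y = 100.0                                  # arbitrary starting point
for n in range(1, 21):
    y = random.choice(family)(y)           # g_n = f_n o g_{n-1}
    print(n, abs(y - z))                   # bounded by (1/2)**n * |100 - z|
```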

6.2 What does Theorem 72 tell us about Markov chains?

Suppose $\mathcal{M}$ is a Markov chain on a finite state space $X$ with transition matrix $P = \{p_{x \to y}\}_{x, y \in X}$. Clearly $P$ extends to a linear map on the free vector space $\mathbb{R}^X$ spanned by the point-mass probability distributions, which form an orthonormal basis of this vector space (isomorphic to $\mathbb{R}^{|X|}$, of course) under the $L_1$ norm defined as the sum of the absolute values of the coordinates: $\big\|\sum_{x \in X} r_x x\big\|_{L_1} = \sum_{x \in X} |r_x|$. The linear endomorphism $P$ defined by the matrix $\{p_{x \to y}\}_{x, y \in X}$ with respect to the basis $X$ restricts to the probability simplex:

$$\widetilde{X} = \Big\{\sum_{x \in X} r_x x \;\Big|\; \forall\, x \in X,\ 0 \le r_x \le 1 \text{ and } \sum_{x \in X} r_x = 1\Big\} \quad (43)$$


(which is closed and bounded in $\mathbb{R}^X$ and hence compact, which is far stronger than we need). The following well-known fact from basic Markov chain theory allows us to apply the tools from Subsection 6.1. For the sake of completeness a proof is included.

Theorem 73. Suppose $\mathcal{M}$, with notation as above, is an irreducible Markov chain (meaning that $\forall\, x, y \in X$ we have $p_{x \to y} > 0$). Then $P = \{p_{x \to y}\}_{x, y \in X} : \widetilde{X} \to \widetilde{X}$ (see equation (43)) is a contraction (see Definition 68) on the complete and bounded probability simplex $\widetilde{X}$ with respect to the metric induced by the $L_1$ norm, i.e. $\|\vec{u}\|_{L_1} = \sum_{x \in X} |u_x|$ where $\vec{u} = \sum_{x \in X} u_x x$[14]. Moreover, the contraction rate (see Definition 68) is at most $1 - |X|\epsilon$, where $\epsilon > 0$ is any number smaller than $\min_{x, y \in X} p_{x \to y}$.

Proof. First notice that given any Markov transition matrix R # {rx!y}x;y[X, andany two probability distributions p and s [ eX, we have:

kR!p2 s"kL1 #y[X

X

x[X

Xrx!y!p!x"2 s!x""

''''''

''''''#

y[X

X

x[lX

Xrx!yjp!x"2 s!x"j #

#x[X

X

y[X

Xrx!yjp!x"2 s!x"j #

x[X

Xjp!x"2 s!x"j # kp2 skL1 :

In summary, we have shown that:

; Markov transition matrix R # {rx!y}x;y[X on the state space X and

; probability distributions p;s [ eX we have

kR!p2 s"kL1# kR!p"2 R!s"kL1

# kp2 skL1 !44"There is one more simple fact we observe: let J denote anX £ Xmatrix with all entriesequal to 1. Given any vector ~u #Px[X uxx, we have J · ~u # ~v #Px[X vxx where;y [ Xwe have vy #

Px[X ux independently of y. It is clear then that the kernel of the

matrix J:

Ker!J " # {~uj~u #x[X

Xuxx and

x[X

Xux # 0}:

In particular, if p and s are probability distributions on X, then the sums ofcoordinates

Px[X !p !x"" #Px[X !s !x"" # 1 so that the vector p 2 s [ Ker! J ",

i.e. J !p 2 s" # 0. In summary, we deduce the following:

; probability distributions p and s [ eX we have J !p2 s" # 0: !45"The assumption that px!y . 0 together with the assumption thatX is a finite set implythat we can find a positive number 1 . 0 such that 0 , 1 , min{px!yjx; y [ X}. LetN # jXj denote the size of the state space X and notice that by the choice of 1 in theprevious sentence, ;x [ Xwe have N · [,

Py[X px!y # 1 so that a # 1 2 N1 . 0.

We can now write:

P # !P 2 1J " $ 1J # a1

a!P 2 1J "

% &$ 1J # aQ$ 1J !46"

whereQ # !1=a"!P 2 1J " # {qx!y}x;y[X is a stochastic matrix, i.e. ;x [ X the sum ofthe entries:


$$\sum_{y \in X} q_{x \to y} = \sum_{y \in X} \frac{p_{x \to y} - \epsilon}{a} = \frac{\sum_{y \in X} (p_{x \to y} - \epsilon)}{1 - N\epsilon} = \frac{\big(\sum_{y \in X} p_{x \to y}\big) - N\epsilon}{1 - N\epsilon} = 1,$$

so that $Q$ is a matrix representing a Markov chain on the state space $X$. Now, given any two distributions $\pi$ and $\sigma \in \widetilde{X}$, using the decomposition of the matrix $P$ given in equation (46) together with the facts expressed in equation (45) we obtain:

$$P(\pi - \sigma) = (aQ + \epsilon J)(\pi - \sigma) = aQ(\pi - \sigma) + \epsilon J(\pi - \sigma) = aQ(\pi - \sigma),$$

so that, since $Q$ is a matrix which represents a Markov chain, the fact expressed in equation (44) readily gives us the desired conclusion that:

$$\|P(\pi - \sigma)\|_{L_1} = \|aQ(\pi - \sigma)\|_{L_1} = a\|Q(\pi - \sigma)\|_{L_1} \le a\|\pi - \sigma\|_{L_1},$$

which shows that $P$ is a contraction, since we demonstrated before that $0 < a < 1$. $\square$
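The contraction rate bound of Theorem 73 is easy to probe numerically. The sketch below, with made-up data, draws a strictly positive stochastic matrix, picks $\epsilon$ just below its smallest entry, and checks that one step of the chain shrinks the $L_1$ distance between two distributions by a factor of at most $a = 1 - |X|\epsilon$.

```python
import numpy as np

# A quick numerical check of the contraction rate claimed in Theorem 73:
# for a strictly positive stochastic matrix P on N states and any eps below
# the smallest entry, the L1 distance between two distributions shrinks by
# a factor of at most a = 1 - N*eps after one step.  All numbers are made up.
rng = np.random.default_rng(0)
N = 5
P = rng.random((N, N)) + 0.1
P /= P.sum(axis=1, keepdims=True)          # rows sum to 1, all entries > 0

eps = 0.9 * P.min()
a = 1.0 - N * eps                          # contraction rate bound

for _ in range(5):
    pi = rng.dirichlet(np.ones(N))         # two random probability vectors
    sigma = rng.dirichlet(np.ones(N))
    before = np.abs(pi - sigma).sum()
    after = np.abs((pi - sigma) @ P).sum() # one step of the chain
    assert after <= a * before + 1e-12
```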

In Corollary 71 we saw that any finite family of contraction maps is anequi-contraction family. For Markov transition matrices (also called stochasticmatrices in the literature) significantly more is true. The following notion is naturallymotivated by Definition 70 and Theorem 73.

Definition 74. Given a family of Markov transition matrices

$$\mathcal{F} = \Big\{ \{p^i_{x \to y}\}_{x, y \in X} \;\Big|\; i \in I,\ \pi \in \widetilde{X},\ \forall\, i \in I \text{ and } \forall\, y \in X \text{ we have } \sum_{x \in X} p^i_{x \to y} \pi_x = \pi_y,\ \text{and } b = \inf_{i \in I,\ x, y \in X} p^i_{x \to y} > 0 \Big\}$$

indexed by some set $I$, sharing a common stationary distribution $\pi$ and such that the greatest lower bound of all the entries from all the matrices in $\mathcal{F}$, let us call it $b$, is strictly positive (or, equivalently, is not 0), we say that $\mathcal{F}$ is a family of interchangeable Markov transition matrices with lower bound $b$.

Evidently, Theorem 73 immediately implies the following.

Corollary 75. Every interchangeable family $\mathcal{F}$ of Markov transition matrices with lower bound $b$ is an equi-contraction family with a common contraction rate of at most $a = 1 - |X|\epsilon$ for any $\epsilon$ with $0 < \epsilon < b$.

Moreover, families of interchangeable Markov transition matrices can often be easily extended as follows.

Corollary 76. Suppose that a family $\mathcal{F}$ of Markov transition matrices over the same state space $X$ is interchangeable with lower bound $b$. Then so is the convex hull of the family $\mathcal{F}$:

$$\mathcal{E}(\mathcal{F}) = \Big\{ T \;\Big|\; T = \sum_{i=1}^{k} t_i M_i \text{ where } k \in \mathbb{N},\ M_i \in \mathcal{F} \text{ and } \forall\, 1 \le i \le k \text{ we have } 0 < t_i \le 1,\ \sum_{i=1}^{k} t_i = 1 \Big\}.$$

Proof. Given a matrix $T = \{t_{x \to y}\}_{x, y \in X} \in \mathcal{E}(\mathcal{F})$, we can write $T = \sum_{j=1}^{k} t_j M_j$ with $M_j = \{p^j_{x \to y}\}_{x, y \in X} \in \mathcal{F}$, $0 < t_j \le 1$ and $\sum_{j=1}^{k} t_j = 1$. But then $\forall\, x, y \in X$


we have $t_{x \to y} = \sum_{j=1}^{k} t_j \cdot p^j_{x \to y} \ge \sum_{j=1}^{k} t_j \cdot b = b$, so that the desired conclusion follows at once. $\square$

Combining Theorem 73 and Corollaries 71 and 76 readily gives the following.

Corollary 77. Suppose we are given a finite family $\mathcal{F}$ of Markov transition matrices such that all the entries of each matrix $M \in \mathcal{F}$ are strictly positive. Then $\mathcal{E}(\mathcal{F})$ is an equi-contraction family.
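Corollaries 76 and 77 can likewise be checked on a toy example. In the sketch below (all matrices made up), two strictly positive doubly stochastic matrices share the uniform stationary distribution; every convex combination of them keeps that distribution stationary and keeps all entries bounded below by the family's lower bound $b$.

```python
import numpy as np

# Sanity check of Corollaries 76 and 77 with made-up matrices: two strictly
# positive doubly stochastic 3x3 matrices (so both have the uniform
# stationary distribution); every convex combination of them keeps the
# uniform distribution stationary and keeps all entries >= the lower bound b.
M1 = np.array([[0.5, 0.3, 0.2],
               [0.2, 0.5, 0.3],
               [0.3, 0.2, 0.5]])
M2 = np.array([[0.2, 0.5, 0.3],
               [0.3, 0.2, 0.5],
               [0.5, 0.3, 0.2]])
b = min(M1.min(), M2.min())                # lower bound of the family
pi = np.full(3, 1.0 / 3.0)                 # common stationary distribution

for t in np.linspace(0.0, 1.0, 11):
    T = t * M1 + (1.0 - t) * M2            # element of the convex hull E(F)
    assert np.allclose(pi @ T, pi)         # pi is still stationary
    assert T.min() >= b - 1e-12            # entries stay bounded below by b
```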

Corollary 77 extends the applicability of the finite population Geiringer theorem appearing in Mitavskiy and Rowe (2005, 2006) (and, possibly, some other time-homogeneous Markov chain constructions) to non-homogeneous-time Markov chains generated by arbitrary stochastic processes in the sense below.

Theorem 78. Consider any finite set $X$. Let $\mathcal{F}$ denote a finite family of Markov transition matrices on $X$ such that all the entries of each matrix $M \in \mathcal{F}$ are strictly positive and all the matrices in $\mathcal{F}$ have a common stationary distribution $\pi$. Now consider any stochastic process $\{Z_n\}_{n=0}^{\infty}$ with each $Z_n = (F_n, X_n)$ on $\mathcal{F} \times X$ having the following properties:

$$F_0 \text{ and } X_0 \text{ are independent random variables.} \quad (47)$$

$$\text{For } n \ge 1,\ F_n \text{ does not depend on } X_n, X_{n+1}, \ldots \text{ (however, it may depend on } X_0, X_1, \ldots, X_{n-1} \text{ as well as on many other implicit parameters).} \quad (48)$$

The stochastic process $X_n$ is a non-homogeneous-time Markov chain on $X$ with transition matrices $F_n(\omega)$. More explicitly:

$$\text{If } F_{n-1}(\omega) = \{p^{n-1}_{x \to y}\}_{x, y \in X} \text{ then } \forall\, y \in X \text{ we have } P(X_n = y) = \sum_{x \in X} P(X_{n-1} = x)\, p^{n-1}_{x \to y}. \quad (49)$$

Then the non-homogeneous-time Markov chain converges to the unique stationary distribution $\pi$ exponentially fast regardless of the initial distribution of $X_0$. More precisely, $\exists\, a \in (0, 1)$ such that $\forall\, t \in \mathbb{N}$ we have:

$$\|P(X_t = \cdot) - \pi\|_{L_1} \le a^t,$$

where $P(X_t = \cdot)$ denotes the probability distribution of the random variable $X_t$.

Proof. Observe that if we want to compute the distribution of $X_1$ given the distribution of $X_0$, we need to select a Markov transition matrix $M = \{m_{x \to y}\}_{x, y \in X} \in \mathcal{F}$ with respect to the probability distribution of $F_0$, which is independent of $X_0$. The value of $X_1$ is then obtained by selecting a value $x$ of $X_0$ with respect to the initial distribution $P(X_0 = \cdot)$ and then obtaining the next state $X_1 = y$ with probability $m_{x \to y}$. Thereby $\forall\, y \in X$ we may write:

P!X1 # y" #M[F

X

x[X

XP!F0 # M and X0 # x"mx!y

by independence#

#M[F

X

x[X

XP!X0 # x"P!F0 # M "mx!y #

IJICC5,1

82

Page 48: IJICC A version of Geiringer-like theorem for decision ...jer/papers/version.pdf · Findings – This work establishes an important theoretical link between classical population genetics,

#x[X

XP!X0 # x"

M[F

XP!F0 # M "mx!y: !50"

Since F is a finite set, ;M [ F we have P!F0 # M " [ %0; 1& andPx[X P!X0 # x" # 1, we deduce that the matrix T0 #

PM[F P!F0 # M " ·M [

e!F" is a Markov transition matrix and equation (50) can be alternatively written inthe vector form as:

P!X1 # · " # T0 ·P!X0 # · ": !51"

Continuing inductively, if we assume:

$$P(X_k = \cdot) = T_{k-1} \circ \ldots \circ T_1 \circ T_0 \cdot P(X_0 = \cdot) \quad (52)$$

for $k \ge 1$, where the Markov transition matrices $T_i \in \mathcal{E}(\mathcal{F})$, then it follows analogously to the above reasoning that:

$$P(X_{k+1} = y) = \sum_{M \in \mathcal{F}} \sum_{x \in X} P(F_k = M \text{ and } X_k = x)\, m_{x \to y} \stackrel{\text{by independence of } F_k \text{ and } X_k}{=} \sum_{x \in X} P(X_k = x) \sum_{M \in \mathcal{F}} P(F_k = M)\, m_{x \to y},$$

so that for the same reasons as before we may conclude that:

$$P(X_{k+1} = \cdot) = T_k \cdot P(X_k = \cdot) = T_k \cdot \big( T_{k-1} \circ \ldots \circ T_1 \circ T_0 \cdot P(X_0 = \cdot) \big) = T_k \circ T_{k-1} \circ \ldots \circ T_1 \circ T_0 \cdot P(X_0 = \cdot), \quad (53)$$

where $T_k = \sum_{M \in \mathcal{F}} P(F_k = M) \cdot M \in \mathcal{E}(\mathcal{F})$ for the same reason that $T_0 \in \mathcal{E}(\mathcal{F})$. We now conclude by induction that $\forall\, t \in \mathbb{N}$ we have:

$$P(X_t = \cdot) = T_{t-1} \circ \ldots \circ T_1 \circ T_0 \cdot P(X_0 = \cdot), \quad (54)$$

where $\forall\, i \in \mathbb{N} \cup \{0\}$ we have $T_i \in \mathcal{E}(\mathcal{F})$. According to Corollary 77 the family of Markov transition matrices $\mathcal{E}(\mathcal{F})$ is an equi-contraction family with the same common stationary distribution $\pi$, and now the desired conclusion follows immediately from Theorem 72. $\square$
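The following small simulation illustrates Theorem 78 under assumptions chosen purely for the demonstration: the family consists of matrices of the form $(1-\alpha)I + \alpha\,\mathbf{1}\pi$, which are strictly positive and share the stationary distribution $\pi$, and the matrix used at each step is drawn independently of the current state. The exact distribution of $X_t$ is propagated and its $L_1$ distance to $\pi$ shrinks geometrically, as the theorem predicts.

```python
import numpy as np

# Illustration of Theorem 78 with a made-up family: each matrix has the form
# (1 - alpha) * I + alpha * (1 pi), so all entries are strictly positive and
# pi is a common stationary distribution.  At every step a matrix is drawn
# from the family independently of the current state, and the distribution
# of X_t converges to pi exponentially fast in the L1 norm.
rng = np.random.default_rng(1)
pi = np.array([0.5, 0.3, 0.2])
N = len(pi)
family = [(1 - a) * np.eye(N) + a * np.outer(np.ones(N), pi)
          for a in (0.3, 0.6, 0.9)]

dist = np.array([1.0, 0.0, 0.0])           # arbitrary initial distribution of X_0
for t in range(1, 11):
    T = family[rng.integers(len(family))]  # the matrix selected at this step
    dist = dist @ T                        # equation (49): propagate the distribution
    print(t, np.abs(dist - pi).sum())      # L1 distance shrinks geometrically
```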

Remark 79. It is interesting to notice that the non-homogeneous-time Markov process $X_n$ in Theorem 78 may be generated by non-Markovian processes $F_n$, where the Markov transition matrices $F_n$ depend not only on the past history $F_0, F_1, \ldots, F_{n-1}$ but also on the history of the stochastic process $X_n$ itself. This property is interesting not only from the mathematical point of view but also in regard to the main subject of the current paper: the application to the Monte Carlo tree search method. Due to the past history in a certain game, as well as other possibly hidden circumstances (such as human mood, psychological state, etc.), a player may suspect the states of being interchangeable to a greater or lesser degree. Theorems like Theorem 78 demonstrate that in most cases this does not matter in the limit, which strengthens the theoretical foundation in support of the main ideas presented in this work.


One can extend Theorem 78 further to make it applicable to a wider class of families of Markov transition matrices having a common stationary distribution than just those having all positive entries.

Definition 80. We say that a family $\mathcal{F}$ of Markov transition matrices is irreducible and aperiodic with a common stationary distribution $\pi$ if $\pi$ is a stationary distribution of every matrix $M \in \mathcal{F}$ and $\exists\, k \in \mathbb{N}$ such that for every sequence of transformations $\{M_i\}_{i=1}^{k}$ with $M_i \in \mathcal{F}$, the composed Markov transition matrix $T = M_1 \circ M_2 \circ \ldots \circ M_k$ has strictly positive entries. We also say that $k$ is the common reachable index.

If we were to start with a finite irreducible and aperiodic family of Markov transition matrices $\mathcal{F}$ with a common reachable index $k$ in the sense of Definition 80, then the corresponding family

$$\widetilde{\mathcal{F}} = \{L \mid L = M_1 \circ M_2 \circ \ldots \circ M_k \text{ with } M_i \in \mathcal{F}\} \quad (55)$$

has size $|\widetilde{\mathcal{F}}| \le |\mathcal{F}|^k < \infty$ and every matrix in the family $\widetilde{\mathcal{F}}$ has strictly positive entries. It follows immediately from Corollary 77 that $\mathcal{E}(\widetilde{\mathcal{F}})$ is an equi-contraction family. Now suppose that we are dealing with the same stochastic process as described in the statement of Theorem 78, with the only exception that the family $\mathcal{F}$ is a finite irreducible and aperiodic family with a common reachable index $k$ rather than "a finite family of Markov transition matrices on $X$ such that all the entries of each matrix $M \in \mathcal{F}$ are strictly positive". Notice that the proof of Theorem 78 does not use the assumption that the Markov transition matrix entries are strictly positive up to the last step following equation (54). Therefore, it follows that the same equation holds for a finite irreducible and aperiodic family of Markov transition matrices, i.e.:

$$\forall\, t \in \mathbb{N} \text{ we have } P(X_t = \cdot) = T_{t-1} \circ \ldots \circ T_1 \circ T_0 \cdot P(X_0 = \cdot), \quad (56)$$

where $\forall\, i \in \mathbb{N} \cup \{0\}$ we have $T_i \in \mathcal{E}(\mathcal{F})$. We now observe the following simple fact.

Lemma 81. The family of linear transformations (and Markov transition matrices in particular) $\widetilde{\mathcal{E}(\mathcal{F})}$ satisfies

$$\widetilde{\mathcal{E}(\mathcal{F})} \subseteq \mathcal{E}(\widetilde{\mathcal{F}}),$$

where

$$\widetilde{\mathcal{E}(\mathcal{F})} = \{T \mid T = T_1 \circ T_2 \circ \ldots \circ T_k \text{ with } T_i \in \mathcal{E}(\mathcal{F})\} \quad (57)$$

and the family $\mathcal{E}(\widetilde{\mathcal{F}})$ is the convex hull of the family $\widetilde{\mathcal{F}}$ introduced in equation (55), in the sense of the defining equation in Corollary 76.

Proof. Given a transformation

$$T = T_1 \circ T_2 \circ \ldots \circ T_k \in \widetilde{\mathcal{E}(\mathcal{F})}, \quad (58)$$

since each $T_i \in \mathcal{E}(\mathcal{F})$, we have:


$$\forall\, i \text{ with } 1 \le i \le k \text{ we have } T_i = \sum_{j=1}^{l(i)} t^i_j M^{(i)}_j \text{ with } M^{(i)}_j \in \mathcal{F},\ 0 \le t^i_j \le 1 \text{ and } \sum_{j=1}^{l(i)} t^i_j = 1. \quad (59)$$

Plugging equation (59) into equation (58) and using the linearity of the $T_i$'s we obtain:

$$T = \Big( \sum_{j=1}^{l(1)} t^1_j M^{(1)}_j \Big) \circ \Big( \sum_{j=1}^{l(2)} t^2_j M^{(2)}_j \Big) \circ \ldots \circ \Big( \sum_{j=1}^{l(k)} t^k_j M^{(k)}_j \Big) = \sum_{j(1)=1}^{l(1)} \sum_{j(2)=1}^{l(2)} \ldots \sum_{j(k)=1}^{l(k)} \Big( \prod_{i=1}^{k} t^i_{j(i)} \Big) M^{(1)}_{j(1)} \circ M^{(2)}_{j(2)} \circ \ldots \circ M^{(k)}_{j(k)},$$

where each composition $M^{(1)}_{j(1)} \circ M^{(2)}_{j(2)} \circ \ldots \circ M^{(k)}_{j(k)} \in \widetilde{\mathcal{F}}$. Since $0 \le \prod_{i=1}^{k} t^i_{j(i)} \le 1$ and

$$\sum_{j(1)=1}^{l(1)} \sum_{j(2)=1}^{l(2)} \ldots \sum_{j(k)=1}^{l(k)} \prod_{i=1}^{k} t^i_{j(i)} = \Big( \sum_{j=1}^{l(1)} t^1_j \Big) \Big( \sum_{j=1}^{l(2)} t^2_j \Big) \ldots \Big( \sum_{j=1}^{l(k)} t^k_j \Big) = 1$$

from equation (59), the desired conclusion follows at once. $\square$

Now continue with equation (56), so that we can write:

$$\forall\, t \in \mathbb{N} \text{ we have } P(X_t = \cdot) = T_{t-1} \circ \ldots \circ T_1 \circ T_0 \cdot P(X_0 = \cdot) = \overbrace{T_{t-1} \circ \ldots \circ T_{mk+1} \circ T_{mk}}^{r\text{-fold composition}} \circ \underbrace{T_{mk-1} \circ \ldots \circ T_{(m-1)k+1} \circ T_{(m-1)k}}_{k\text{-fold composition}} \circ \ldots \circ \underbrace{T_{2k-1} \circ \ldots \circ T_{k+1} \circ T_k}_{k\text{-fold composition}} \circ \underbrace{T_{k-1} \circ \ldots \circ T_1 \circ T_0}_{k\text{-fold composition}} \cdot P(X_0 = \cdot) = T_{t-1} \circ \ldots \circ T_{mk+1} \circ T_{mk} \circ F_{m-1} \circ F_{m-2} \circ \ldots \circ F_1 \circ F_0 \cdot P(X_0 = \cdot), \quad (60)$$

where $m = \lfloor t/k \rfloor$, $r < k$ is the remainder after dividing $t$ by $k$, and each $F_i \in \widetilde{\mathcal{E}(\mathcal{F})} \subseteq \mathcal{E}(\widetilde{\mathcal{F}})$ thanks to Lemma 81. Since $\mathcal{E}(\widetilde{\mathcal{F}})$ is an equi-contraction family (see equation (55) and the discussion which follows this equation), it follows immediately that we can find a constant $a \in [0, 1)$ such that:

$$\|F_{m-1} \circ F_{m-2} \circ \ldots \circ F_1 \circ F_0 \cdot P(X_0 = \cdot) - \pi\|_{L_1} \le a^m \cdot \|P(X_0 = \cdot) - \pi\|_{L_1}.$$

Furthermore, according to equation (44), which concludes the first part of the proof of Theorem 73, we also have:

$$\|T_{t-1} \circ \ldots \circ T_{mk+1} \circ T_{mk} \circ F_{m-1} \circ \ldots \circ F_1 \circ F_0 \cdot P(X_0 = \cdot) - \pi\|_{L_1} = \big\| (T_{t-1} \circ \ldots \circ T_{mk+1} \circ T_{mk}) \big( F_{m-1} \circ \ldots \circ F_1 \circ F_0 \cdot P(X_0 = \cdot) - \pi \big) \big\|_{L_1} \le \|F_{m-1} \circ \ldots \circ F_1 \circ F_0 \cdot P(X_0 = \cdot) - \pi\|_{L_1} \le a^m \cdot \|P(X_0 = \cdot) - \pi\|_{L_1}.$$

The observations above lead to the following extension of Theorem 78.


Theorem 82. Consider any finite set $X$. Suppose $\mathcal{F}$ is a finite irreducible and aperiodic family with a common reachable index $k$ and all the matrices in $\mathcal{F}$ have a common stationary distribution $\pi$. Now consider any stochastic process $\{Z_n\}_{n=0}^{\infty}$ with each $Z_n = (F_n, X_n)$ on $\mathcal{F} \times X$ having the following properties:

$$F_0 \text{ and } X_0 \text{ are independent random variables.} \quad (61)$$

$$\text{For } n \ge 1,\ F_n \text{ does not depend on } X_n, X_{n+1}, \ldots \text{ (however, it may depend on } X_0, X_1, \ldots, X_{n-1} \text{ as well as on many other implicit parameters).} \quad (62)$$

The stochastic process $X_n$ is a non-homogeneous-time Markov chain on $X$ with transition matrices $F_n(\omega)$. More explicitly:

$$\text{If } F_{n-1}(\omega) = \{p^{n-1}_{x \to y}\}_{x, y \in X} \text{ then } \forall\, y \in X \text{ we have } P(X_n = y) = \sum_{x \in X} P(X_{n-1} = x)\, p^{n-1}_{x \to y}. \quad (63)$$

Then the non-homogeneous-time Markov chain converges to the unique stationary distribution $\pi$ exponentially fast regardless of the initial distribution of $X_0$. More precisely, $\exists\, a \in (0, 1)$ such that $\forall\, t \in \mathbb{N}$ we have:

$$\|P(X_t = \cdot) - \pi\|_{L_1} \le a^{m(t)},$$

where $P(X_t = \cdot)$ denotes the probability distribution of the random variable $X_t$ and $m(t) = \lfloor t/k \rfloor$.

7. Conclusions and upcoming work

This is the first in a sequence of papers leading to the development and application of very promising and novel Monte Carlo sampling techniques for reinforcement learning in the setting of POMDPs. In this work we have established a version of a Geiringer-like theorem with non-homologous recombination (namely Theorem 40) well suited for the development of dynamic programming Monte Carlo search algorithms that cope with randomness and incomplete information. More explicitly, the theorem provides an insight into how one may take full advantage of a sample of seemingly independent rollouts by exploiting symmetries within the space of observations as well as additional similarities that may be provided as expert knowledge. Due to space limitations the actual algorithms will appear in upcoming work. Additionally, the general finite-population Geiringer theorem appearing in the PhD thesis of the first author as well as in Mitavskiy and Rowe (2005, 2006) has been further strengthened in Section 6 (using Theorem 82) with the aim of amplifying the reasons why the above ideas are highly promising in applications, not to mention their mathematical importance (see the special case of Theorem 82, namely Theorem 23).

Notes

1. Intuitively, each terminal label in the set S represents a terminal state to which we can assign a numerical value via a function $f: S \to \mathbb{Q}$. The reason we introduce the set S of formal labels, as opposed to requiring that each terminal label is a rational number straight away, is to avoid confusion in the upcoming definitions.


2. The last assumption, that all the states in a population are formally distinct (although they may be equivalent), will be convenient later to extend the crossover operators from pairs to entire populations. This assumption also makes sense from the intuitive point of view, since the exact state in most games involving randomness or incomplete information is simply unknown.

3. This assumption does not reduce any generality, since one can choose an arbitrary (possibly many-to-one) assignment function $f: S \to \mathbb{Q}$, yet the complexity of the statements of our main theorems will be mildly alleviated.

4. In this paper we will need to "inflate" the population first and then take the limit of a sequence of these limiting procedures as the inflation factor increases. All of this will be rigorously presented and discussed in Subsection 6 and in Section 5.

5. This technical assumption may be altered in various manners as long as the induced Markov chain remains irreducible.

6. The sigma-algebra on P is the one generated by T with respect to the sigma-algebra that is originally chosen on F; however, in practical applications the sets involved are finite and so all the sigma-algebras can safely be assumed to be power sets.

7. This simple and elegant classical result about complete metric spaces lies at the heart of many important theorems, such as the existence and uniqueness theorem in the theory of differential equations, for instance.

8. This notion of a schema is somewhat of a mixture between Holland's and Poli's notions.

9. This is an open question, yet its practical importance is highly unclear.

10. The case of homologous recombination has been established in a different but mathematically equivalent framework in Mitavskiy and Rowe (2005, 2006); nonetheless we will derive it along with the general fact expressed in equation (4) to illustrate the newly enhanced methodology based on the lumping quotients of Markov chains described in Subsection 3.

11. It is well known that any two norms on finite-dimensional real or complex vector spaces are equivalent, so that the choice of the norm is irrelevant here.

12. This is also a particular case of the well-known reversibility property of Markov chains.

13. Evidently, the contraction rate is not unique under such a notion. Nonetheless, the minimal contraction rate does exist, since it is $\inf\{k \mid k \text{ is a contraction rate}\}$.

14. Of course, the total variation norm, which is a constant scaling of the $L_1$ norm by a factor of 1/2, can be used in place of the $L_1$ norm alternatively.

References

Agrawal, R. (1995), "Sample mean based index policies with O(log n) regret for the multi-armed bandit problem", Advances in Applied Probability, Vol. 27, pp. 1054-78.

Antonisse, J. (1989), "A new interpretation of schema notation that overturns the binary encoding constraint", Proceedings of the Third International Conference on Genetic Algorithms.

Auer, P. (2002), "Using confidence bounds for exploration-exploitation trade-offs", Journal of Machine Learning Research, Vol. 3, pp. 397-422.

Chaslot, G., Saito, J., Bouzy, B., Uiterwijk, J. and van den Herik, H. (2006), "Monte-Carlo strategies for computer Go", Proceedings of the 18th Belgian-Dutch Conference on Artificial Intelligence.

Ciancarini, P. and Favini, G. (2010), "Monte Carlo tree search in Kriegspiel", Artificial Intelligence, Vol. 174 No. 11, pp. 670-84.

Coulom, R. (2007), "Comparing Elo ratings of move patterns in the game of Go", paper presented at Computer Games Workshop 2007.

Dummit, D. and Foote, R. (1991), Abstract Algebra, Prentice-Hall, Englewood Cliffs, NJ.

Geiringer, H. (1944), "On the probability of linkage in Mendelian heredity", Annals of Mathematical Statistics, Vol. 15, pp. 25-57.

Gelly, S. and Silver, D. (2011), "Monte-Carlo tree search and rapid action value estimation in computer Go", Artificial Intelligence, Vol. 175 No. 11, pp. 1856-75.

Kaelbling, L. (1994a), "Associative reinforcement learning: a generate and test algorithm", Machine Learning, Vol. 15, pp. 299-319.

Kaelbling, L. (1994b), "Associative reinforcement learning: functions in k-DNF", Machine Learning, Vol. 15, pp. 279-98.

Kee-Eung, K. (2008), "Exploiting symmetries in POMDPs for point-based algorithms", Proceedings of the 23rd National Conference on Artificial Intelligence.

Mitavskiy, B. and Cannings, C. (2007), "An improvement of the 'quotient construction' method and further asymptotic results on the stationary distribution of the Markov chains modeling evolutionary algorithms", paper presented at IEEE Congress on Evolutionary Computation (CEC-2007).

Mitavskiy, B. and Rowe, J. (2005), "A schema-based version of Geiringer theorem for nonlinear genetic programming with homologous crossover", in Proceedings of Foundations of Genetic Algorithms 8 (FOGA-2005), Springer, Berlin, pp. 156-75.

Mitavskiy, B. and Rowe, J. (2006), "An extension of Geiringer theorem for a wide class of evolutionary algorithms", Evolutionary Computation, Vol. 14 No. 1, pp. 87-118.

Mitavskiy, B., Rowe, J., Wright, A. and Schmitt, L. (2006), "Exploiting quotients of Markov chains to derive properties of the stationary distribution of the Markov chain associated to an evolutionary algorithm", paper presented at Simulated Evolution and Learning (SEAL-2006).

Mitavskiy, B., Rowe, J., Wright, A. and Schmitt, L. (2008), "Quotients of Markov chains and asymptotic properties of the stationary distribution of the Markov chain associated to an evolutionary algorithm", Genetic Programming and Evolvable Machines, Vol. 9 No. 2, pp. 109-23.

Poli, R. and Langdon, B. (1998), "Schema theory for genetic programming with one-point crossover and point mutation", Evolutionary Computation, Vol. 6 No. 3, pp. 231-52.

Poli, R., Stephens, C., Wright, A. and Rowe, J. (2003), "A schema theory based extension of Geiringer's theorem for linear GP and variable length GAs under homologous crossover", paper presented at Foundations of Genetic Algorithms 7 (FOGA-2003).

Schmitt, L. (2001), "Theory of genetic algorithms", Theoretical Computer Science, Vol. 259, pp. 1-61.

Schmitt, L. (2004), "Theory of genetic algorithms II: models for genetic operators over the string-tensor representation of populations and convergence to global optima for arbitrary fitness function under scaling", Theoretical Computer Science, Vol. 310, pp. 181-231.

Simmons, G. (1983), Introduction to Topology and Modern Analysis, R.E. Krieger Pub. Co., Malabar, FL.

Van den Broeck, G., Driessens, K. and Ramon, J. (2009), "Monte-Carlo tree search in poker using expected reward distributions", Advances in Machine Learning, Proceedings Book Series: Lecture Notes in Artificial Intelligence, Vol. 5828, Springer, Berlin.

Vose, M. (1999), The Simple Genetic Algorithm: Foundations and Theory, MIT Press, Cambridge, MA.

Xie, F., Liu, Z., Wang, Y., Huang, W. and Wang, S. (2010), Systematic Improvement of Monte-Carlo Tree Search with Self-generated Neural-networks Controllers, Learning and Intelligent Optimization Book Series: Lecture Notes in Computer Science, Vol. 6073, Springer, Berlin.

Yoon, S., Fern, A. and Givan, R. (2007), "FF-replan: a baseline for probabilistic planning", paper presented at International Conference on Automated Planning and Scheduling (ICAPS-2007).

Zinkevich, M., Bowling, M. and Burch, N. (2007), "A new algorithm for generating equilibria in massive zero-sum games", paper presented at AAAI Conference on Artificial Intelligence-2007.

Further reading

Auger, A. and Doerr, B. (2011), Theory of Randomized Search Heuristics: Foundations and Recent Developments, World Scientific Publishing Company, River Edge, NJ.

Kocsis, L. and Szepesvari, C. (2006), "Bandit based Monte-Carlo planning", paper presented at 15th European Conference on Machine Learning.

About the authors

Boris Mitavskiy gained his PhD in Mathematics in 2004 at the University of Michigan, USA. He was then employed as a Postdoctoral Research Fellow to work on the Digital Business Ecosystems Project in the University of Birmingham Computer Science Department under the supervision of Dr Jonathan Rowe. In May 2006 he joined the University of Sheffield as a Research Associate on the Amorphous Computing Project. Recently he was appointed as a Research Associate on the Evolutionary Algorithms Approximate Complexity Project in the Department of Computer Science, Aberystwyth University. His research interests are in applications of category theory, Markov chains, random walks on groups, large deviation inequalities, and other mathematical structures and concepts to the theory of evolutionary computing, the theory of small-world and scale-free networks, random graph theory, gene-regulatory networks and other complex systems dealing with artificial intelligence and genetics. Boris Mitavskiy is the corresponding author and can be contacted at: [email protected]

Jonathan Rowe is a Professor in Natural Computation at the University of Birmingham. He obtained his PhD in 1991 from the University of Exeter and worked in industry for a couple of years. He then returned to academia where, amongst other topics such as multi-agent systems, artificial life and other complex adaptive systems, he has studied the theory of genetic and other evolutionary algorithms. In addition to publishing many papers on these subjects, he co-authored the graduate-level textbook Genetic Algorithms: Principles and Perspectives with Colin Reeves. He has also helped organize several workshops and conferences in the field, including "Parallel Problem Solving from Nature", "Foundations of Genetic Algorithms" (FOGA) and the Dagstuhl seminars on the Theory of Evolutionary Computation.


Chris Cannings is a Professor of Mathematics in the Division of Genomic Medicine, University of Sheffield. His research interests include deterministic and stochastic modeling in evolutionary biology, population, molecular and human genetics. Current projects lie within genetic epidemiology, evolutionary games, genomics and proteomics, and the theory of random graphs, combinatorics and stochastic processes. He has published over 100 research papers and is on the editorial boards of the Journal of Applied Probability and Advances in Applied Probability, is a member of the Scientific Advisory Board of Myriad Genetics, an Expert for INSERM, a member of the EPSRC Peer Review Panel and of the MRC Panel of Experts.
