the conditional markov chain in a genetic context

14
Journal of Genetics Volume 63, Number 2 Dec. 1977 THE CONDITIONAl. MARKOV CHAIN IN A GENETIC CONTEXT PREM NARAIN Institute of Agricultural ResearchStat~stics, Library Avenue New Delhi 110 01Z Population genetics deals with a study of changes which the gene pool of a Mendelian population may undergo when it is exposed to systematic forces such as selection, mutation and migration. When the population is of limited size, the sample of genes transmitted to the next generation can deviate randomly from the true genetic composition of the parental generation and these random changes can accumulate over several generations. In other words, the change in the gene frequency over time due to systematic as well as random forces is a stochastic process. Usually the behaviour of the gene frequency in a generation depends only on its value in the immediately preceding generation so that the process is Markovian in structure. It can be "studied either approximately as a diffusion process in which gene frequency as well as .generations are treated as continuous or strictly as a finite Marker chain in which gene frequency is a dis- crete random variable and generations are discrete. In a series of investigations [Narain (1969}, Narain and Robertson {1969}, Roberts0n and Narain (1971}, Narain (1971a}], it was shqwn how the process can be treated as a finite Markov chain and how the use of a transition probability matrix can be helpful in a genetic context..In particular, it was shown how to calculate the probability of fixation of a gene as well as the first two moments of the distribution of time taken for its fixation, disregarding the cases in which it is lost. The calculation of the first two moments of time until fixation of a particular allele was also attempted by the diffusion approach [Narain (1970), Narain (.1974)]. The last investigation as well as that of Ewens (1973).have demonstrated that invoking a conditional process facilitates the calculation of the moments of the distribution of time taken for the fixation of a gene. This aspect is intimately connected with the concept of average number of generations required to attain limits of genetic improvement due to artificial selection which was first introduced in Narain (1969) and later elaborated in Narain (1971b). Although the diffusion approach to the conditional process is completely documented in Ewens (1973) and Narain (1974), the transition matrix approach is only briefly indicated in the former reference. The purpose of this paper is therefore to describe the conditional Marker chain and demonstrate its application In a genetic context relating to response to selection in finite populations. In addition, the theory has been applied to study the effect of linkage on the mean and. variance of time until fixa- tion of a gamete in populations practising self-fertilization. 49

Upload: prem-narain

Post on 25-Aug-2016

216 views

Category:

Documents


3 download

TRANSCRIPT

Journal of Genetics Volume 63, Number 2 Dec. 1977

THE CONDITIONAl. MARKOV CHAIN IN A G E N E T I C C O N T E X T

P R E M NARAIN Institute of Agricultural ResearchStat~stics, Library Avenue

New Delhi 110 01Z

P o p u l a t i o n g e n e t i c s d e a l s wi th a s tudy of c h a n g e s w h i c h the gene pool of a M e n d e l i a n p o p u l a t i o n m a y u n d e r g o when it is e x p o s e d to s y s t e m a t i c f o r c e s s u c h as s e l e c t i o n , m u t a t i o n and m i g r a t i o n . When the p o p u l a t i o n is of l i m i t e d s i z e , the s a m p l e of genes t r a n s m i t t e d to the next g e n e r a t i o n c a n d e v i a t e r a n d o m l y from the true genetic composition of the parental generation and these random

changes can accumulate over several generations. In other words, the change in the gene frequency over time due to systematic as well as random forces is a stochastic process. Usually the behaviour of the gene frequency in a generation

depends only on its value in the immediately preceding generation so that the process is Markovian in structure. It can be "studied either approximately as a diffusion process in which gene frequency as well as .generations are treated as continuous or strictly as a finite Marker chain in which gene frequency is a dis- crete random variable and generations are discrete. In a series of investigations

[Narain (1969}, Narain and Robertson {1969}, Roberts0n and Narain (1971}, Narain (1971a}], it was shqwn how the process can be treated as a finite Markov chain and how the use of a transition probability matrix can be helpful in a genetic context..In particular, it was shown how to calculate the probability of fixation of a gene as well as the first two moments of the distribution of time taken for its fixation, disregarding the cases in which it is lost. The calculation of the first two moments of time until fixation of a particular allele was also attempted by the diffusion approach [Narain (1970), Narain (.1974)]. The last investigation as well as that of Ewens (1973).have demonstrated that invoking a conditional process facilitates the calculation of the moments of the distribution of time

taken for the fixation of a gene. This aspect is intimately connected with the concept of average number of generations required to attain limits of genetic improvement due to artificial selection which was first introduced in Narain (1969) and later elaborated in Narain (1971b). Although the diffusion approach to the conditional process is completely documented in Ewens (1973) and Narain (1974), the transition matrix approach is only briefly indicated in the former

reference. The purpose of this paper is therefore to describe the conditional

Marker chain and demonstrate its application In a genetic context relating to response to selection in finite populations. In addition, the theory has been applied to study the effect of linkage on the mean and. variance of time until fixa- tion of a gamete in populations practising self-fertilization.

49

50 C o n d i t i o n a l M a r k o v C h a i n in g e n e t i c s

CONDITIONAL MARKOV CHAIN

A s s u m i n g no m u t a t i o n , c o n s i d e r a f i n i t e p o p u l a t i o n of g a m e t e s of s i z e 2N ( co r r e spon .d ing to a popu la t ion of d ip lo id i n d i v i d u a l s of s i ze N) and a s i n g l e locus w i t h t w o a l l e l e s A and a. Such a p o p u l a t i o n c a n a s s u m e (ZN+I) s t a t e s ~20, E1 . . . . E Z N _ I , E 2 N , . the s t a t e E i r e p r e s e n t i n g the s t a t e of i A g e n e s and (2N-i) a g e n e s . The f r e q u e n c y of A, deno ted by x i fo r the populat[ocn in s t a t e Ei , can t h e n take va lues i /2N, i= 0,1 . . . . ( 2N-1 ) ,ZN. When x i= 0 or 1 fo r i = 0 and 2N r e s p e c t i v e l y , the p o p u l a t i o n is s a id to be f ixed for A or a r e s p e c t i v e l y . But when x i ~ 0 o r 1, the p o p u l a t i o n is sa id to be s e g r e g a t i n g for A a nd a a l l e l e s . Such a g e n e t i c s i t u - a t ion c o r r e s p o n d s to a f i n i t e a b s o r b i n g M a r k o v C h a i n wi th two a b s o r b i n g s t a t e s

:E o and E2N and (2N-1) t r a n s i e n t s t a t e s E 1 , E 2 . . . . EZN_ 1. A d e t a i l e d d e s c r i p - t ion of th i s cha in , in such a c o n t e x t , is g i v e n i n N a r a i n (1971a). If ~ j r e p r e s e n t s one s tep t r a n s i t i o n p r o b a b i l i t y f o r the s y s t e m to m o v e f r o m E i to E j , the t r a n s i - t ion p r o b a b i l i t y m a t r i x ~ of o r d e r (2N+l) x ] Z N + I ) take~ the f o r m

Fi o o -]

_ _0 1

where__Q is of o r d e r ( 2 N - 1 ) x ( Z N - 1 ) , g iv ing the o n e - s t e p t r a n s i t i o n p r o b a b i l i t i e s a m o n g s t the t r a n s i e n t s t a t e s on ly , _Po and --PZN a r e c o l u m n - v e c t o r s of o r d e r ( 2 .N-1 )x l r e p r e s e n t i n g the one~-step t r a n s i t i o n p r o b a b i l i t { e s f r o m a t r a n s i e n t s t a te to E o and E2N r e s p e c t i v e l y . T h e v e c t o r s of the e v e n t u a l p r o b a b i l i t i e s of f i x a t i o n of A and a, deno t ed by_U and L a r e r e s p e c t i v e l y g iven by

_U = ( ! - _Q)-I _PZN (Z)

L = ( i - @)-1 _% (3)

C o n s i d e r now a f i n i t e a b s o r b i n g M a r k o v C h a i n c o n d i t i o n a l to the e v e n t u a l a b s o r p t i o n in~TZN. We then have only one a b s o r b i n g s t a t e G2N and ( 2 N - l ) t r a n - s i en t s t a t e s E1 . . . . E2N.1 f r o m which a b s o r p t i o n is on ly p o s s i b l e in EZN, L e t p!.G1) be the o n e - s t e p t r a n s i t i o n p r o b a b i l i t y fo r the s y s t e m to m o v e f r o m E i to

t j Ej r e l a t i v e to the even t of u l t i m a t e a b s o r p t i o n in EZN. D e n o t i n g by Ui, the i - t h e l e m e n t of v e c t o r U, the e v e n t u a l p r o b a b i l i t y of f i x a t i o n of A when i n i t i a l l y the p o p u l a t i o n was in s t a t e E i and f o l l o w i n g K e m e n y m n d S n e l l (1960), we c a n d e f i n e

w i t h U 2 N = 1. We then have the c o n d i t i o n a l one s t e p t r a n s i t i o n p r o b a b i l i t y m a t r i x p ( C l ) , of o r d e r 2 N x 2 N g i v e n by

[~ (C 11 ~ ( C I I ] =p(Cl) = :ZN (s)

_ 1

P r e m Na r a i n 51

where Q(CI) i.s of order (ZN-I)x(ZN-I), giving the one-step transition probabi- lities amongst the transient states only, conditional to fixation in EZN and p(Cl)

--ZN is the COlL~mn vector of order (ZN-I]xl representing the one-step transition probability from a transient state to EZN relative to the eventual absorption in EZN. The corresponding t-step transition matrix is given by

-- Q(cl ) ( t ) ~--ZN (t)

P '~'=(ci)'t' = 0, i (6)

and the use of C h a p m a n - K o l m o g o r o v [ F e l l e r (1951)] f o r the C o n d i t i o n a l M a r k o v Cha in g ives

__p(CI)(t) = [__p(CI)~ t (v)

so tha t

__Q(CI)(t) = E=Q(CI)It

p(CI)zN (t) = II_(=Q(CI)(t)] ~I.Q(CI)] -I

(8)

p(cl) ZN (9)

F o l l o w i n g Na. ra in (1971a), the c o l u m n vec tor_U(C1)( t ) of the p r o b a b i l i t y of f i x a t i o n of A by the t - t h g e n e r a t i o n r e l a t i v e to the e v e n t u a l f i x a t i o n for A is o b t a i n e d as

D(C 1 ) --_ u(cl)(t) = LZN (t) (I0)

As t tends to infinity, (~(Cl))t. tends to zero so that the vector of the eventual probability of fixation for A, for the conditional process, is given by

u (el) [~ -c~ (cl)-['ID('c I) = | ~_ZN ( l l )

Writ ing D I = d l a g ( U I , U z , , . . , U z N _ I ) of order (ZN- I ) x (ZN- I ) , we f ind

_Q(cl) = _D[I~__DI (IZ) t

= D~ 1 _-D1 (13)

[J__-~(cl)] - I = D~IL_I-=Q)'I DI (14)

p(Cl) D~I -ZN -- -PZN (15)

It, therefore, follows that

tr(cl) ]I (16) _ =D U = ~_

52 Conditional Markov Chain in genetics

a c o l u m n v e c t o r of u n i t i e s , as e x p e c t e d d u e t o the c o n d i t i o n i n g of the p r o c e s s . Fu r the . r , we get

u(CI'(t} = il- =D:I =Qt DI~ e - {17)

for working out the probabilities of fixation of A by the t-th generation.

S i m i l a r l y , if we c o n s i d e r a f in i t e a b s o r b i n g M a r k o v C h a i n c o n d i t i o n a l to the e v e n t u a l a b s o r p t i o n in E~, we have a g a i n one a b s o r b i n g s t a t e and (ZN-1) ~ r a n s i - eat s t a t e s . Def in ing t h e c o r r e s p o n d i n g o n e - s t e p t r a n s i t i o n p r o b a b i l i t y p ~ f 0 ) = P i j L j / L i with L o= 1 and p r o c e e d i n g in the s a m e way as above , we get

i pl 0' 1 P l C ~ = _ (co)(t) _.9(co)(t)J (191

w h e r e , w r i t i n g =oD = d i a g ( L 1 , L 2 . . . . . LZN _1),

2(c~ = _Do I 9 t _D o (z o)

and

= L(C~ , (zl)

giving the probabilities of fixation of a by the t-th generation in the conditional process with L (C0) = e as usual.

CONDITIONAL EXPECTED RESPONSE "DUE TO ARTIFICIAL SEL'ECTION

The random change in gene frequency due to finite population size has im- .p~rtant applications in animal breeding as shown by Robertson {1960). The pro- bab, ility of fixation of the desirable allele can be converted into the expected response in the character under selection at the limit by making use of the rela- tion between the selective advantage of a gene with its effect on the metric cha- racter under selection given first by Haldane (1931). Under the assumption of independent segregation of several loci affecting the character, the expected response at the limit, expressed in relation to the initial genetic standard devi- ation, is a functionof Nih (the product of population size, intensity of selection and the square-root of heritability) and the initial frequency, p of the desirable allele, assumed equal at all loci. Narain (1971a) showed that this expected limit of response to selection, expressed in terms of the vector of changes in the frequency of desirable allele and denoted by E(R) is given by

E(R_) = ~ - 9) -1 E(8.p_) (zz)

w h e r e E ( ~ p ) is the v e c t o r of i n i t i a l e x p e c t e d r e s p o n s e s . A l s o , the the v e c t o r of

Prem Narain 53

the e x p e c t e d r e s p o n s e by the t - t h g e n e r a t i o n , E[R(t ) ] was show n to be equa l to

E[R_(t)]= (I __Qt)E(_R) (23)

Invoking a conditional process of selection along the same lines as in the previous section., we get the vector of conditional expected response due to selec- tion by the t-th generation as

E[_R(C)(t)]- - [_I-(_Q(CI))t]E(R (C)) (Z4)

r e l a t i v e to the e v e n t u a l f i x a t i o n of A r e g a r d e d a s a d e s i r a b l e a l l e l e . T h e e x p r e s - E(R(C)), however, becomes, as expected slon for

E(~ (c)) = s - _Q(cl))_l E(Bs

= (~- o(Cl)) -I = -zNP(CI) - p_(0) (25)

where p_(.0) is the vector of the frequency of desirable allele in the initial' popula- tion. We then have,

PROBABILITY GENERATING FUNCTION OF THE DISTRIBUTION OF TIME UNTIL FIXATION OF A PARTICULAR ALLELE

Le t T i be the t i m e t a k e n to f i r s t r e a c h f i x a t i o n of A, g i v e n tha t the i n i t i a l p o p u l a t i o n c o n t a i n s i A g e n e s and,{2N-i) , a g e n e s r e l a t i v e to the h y p o t h e s i s of eventLtal a b s o r p t i o n i n E Z N . L e t S[ t) be the p r o b a b i l i t y t ha t T i = t . T h e n c l e a r l y ,

S!I) = p(Cl) (Z7) t - i , 2 N

The p r o b a b i l i t y g e n e r a t i n g f u n c t i o n 7r!C1)(z/,. in t h i s c a s e , c a n then be e x p r e s s e d a s

CO

ll+ Z ".ts! ~ Z

co 2N=I = . ~(CI) zt.l

---i,ZN + z X Z t=2 i= I

pi(Cl)s(t-*) k k

_- ~p!Cll ZN-I i,ZN +" Z P~k el)

i= i

In matrix notations, we can write it as

(C "m k 1)(z) (z 8)

54 C o n d i t i o n a l M a r k o v C h a i n in g e n e t i c s

~(Cl)(z)-- z(l-z__(Cl)) -I (l-~(Cl)) e (zg)

where z is still a sca|ar and 7r(Cl)(z) is the vector of probability generating func- tions.conditional to fixation of A. Using the relationship between functions of Q(CI) and Q given in Section Z, we get

-Tr(Cl)(z) = z__Dll (__I-zQ) "1 (I - _Q) U (30)

The vector of the first moments of the distributions of time until fixation of A is obtained by differentiating _7r(Cl)(z) once and putting z = I. This gives

E(T I) = I (d/dz) ~(C l)(z) I z= l = I (d/dz)=D'l I (z" II-Q)- 1 {!_9)U I z= 1

= _

= 1 3 1 )

It i s e a s y to s e e t h a t t h i s i s a l s o e q u i v a l e n t to ~ = = Q ( C I ) ) = I e , T h e v e c t o r of the s e c o n d f a c t o r i a l m o m e n t i s g i v e n by

Z E(TI)-E(T-I) = l(dZ/dzZ)--vr(Cl)(z)Iz=1

: ZD[I[it-Q)'Z-Q-Q) - l ] _ _ = _U

as

(3z}

Using (3.1) and (32), the vector of the second moment about origin is obtained

E(T~) = =D-11[Z(/I-=O)'Z-(_I-_Q) -1 ] U (33)

W i t h t h e h e l p of t he e l e m e n t s of t h e v e c t o r s g i v e n b y (31) and (33) , o n e c a n o b t a i n t he v a r i a n c e s o f t h e t i m e u n t i l f i x a t i o n of A.

In a s i m i l a r m a n n e r , we o b t a i n t h e v e c t o r of t h e p r o b a b i l i t y g e n e r a t i n g f u n c - t i o n s ~ ( C 0 ) ( z ) , of t n e d i s t r i b u t i o n s of t i m e u n t i l f i x a t i o n of a . T h i s i s g i v e n by

_vr(C0)(z ) = z~ .z=Q(C0)) - I (_I__Q(C0) ) e

= z DO I (_I- z _Q)" I (_I-_Q)L (34)

T h e v e c t o r s of t he f i r s t a n d s e c o n d m o m e n t s in t h i s c a s e a r e g i v e n by

E(T0) = __DSI(I-Q)-IL (35)

E(T~) = D ~ I [ 2 ~ - _ Q ) - Z - ( I - Q ) -11 L (36)

F r o m w h i c h one c a n ge t t h e c o r r e s p o n d i n g v a r i a n c e s of t h e t i m e u n t i l f i x - a t i o n of a .

Prem Narain 55

EIGEN-ROOTS AND EIGEN-VECqTORS OF THE CONDITIONAL MARKOV C H A I N W I T H B I N O M I A L TRANSITION PROBABILITIES

It is apparent from the above matrix derivations that for applying this theory, the element of p(Cl), p(C0) D, and Dn are required to be known. Analytically, this involves working out the eigen-roots and eigen-vectors of p(C ) and p(C0). Since the conditiona[ transition matrices _p(Cl) and p(C0) depen~ on the condi- tional transition matrix P and since D 1 and _D O are shown to be certain functions of__P [Hardin (197.1a)], the problem boils downto a Study of _P or its derivative ~. For specifying the elements of =P, we consider, as an example, the binomial transition probabilities. This case is commonly known as Wright's model [Wright'q1931)]. It assumes absence of selective forces and considers only random drift based on binomial sampling with a constant population size N. The eigen-roots and vectors of__P in such a case are also known [Feller (1951)]. Ex- tension of such a model so as to involve selection in the context of limits Df res- ponse to selection has been extensiyely studied by Hardin and Rohertson (1969). However, it is still of interest to study the eigen-roots and eigen-vectors of the conditional transition matrices p(Cl) and p(C0) For this purpose we follow the approach given in Fel ler (1951).

W i t h b i n o m i a l s a m p l i n g a n d no s e l e c t i o n , w e h a v e

P i j = (ZjN) �9 2 N - j �9 ~ ( t - p i ) i= 0, I .... 2N

j = O, 1 . . . . 2N (37)

i= l,Z ..... ZN j= I, Z ..... ZN (38)

where Pi = i/2N.

The eigen-roots of ~p(Cl) are obtained by solving the characteristic equation

I__e(cl).x=~1 = o (39)

It is found that the roots are given by

~(r C) = ( 1 - r / Z N ) ( Z r N ) r � 8 2 / ( 2 N ) r , r = 0 , 1 . . . . ( Z N - 1 ) (40)

For r= 1.2 .... (2N-I), the roots are the same as that of Q(CI) and similarly as t h a t of__Q(C0), w h i c h in v i e w of (12), a r e t he s a m e a s t ha t=of ~ v i z .

X r = (2rN) r w./(ZN) r , r = 1 . . . . 2 N - 1 (41)

W r i t i n g J(v) = j ( j - 1 ) . . . ( j - v 4 ~ l ) , we g e t

2N I j= 1

(CI). v-I Pij JCv) = [(ZN)(v+I)P~+ v(ZN)(v) Pi ]/ZN (4z)

56 C o n d i t i o n a l M a r k o v Chain in g e n e t i c s

T h i s s h o w s tha t , t a k i n g v= 1, the e x p e c t e d v a l u e of the g e n e f r e q u e n c y , wi th p u r e r a n d o m d r i f t , wi l l not be s i m p l y Pi but i n s t e a d g i v e n by

ZN (CI) (j/ZN)= Pi + (I'Pi)IZN Pij

j= 1 (43)

so tha t

E(Spi ) = (l-Pl)/gN (44)

S i m i l a r l y , wi th v=2, we ge t

Z

T h i s s h o w s tha t , u s i n g (43), the v a r i a n c e of the c h a n g e in t he g e n e f r e q u e n c y , wi th p u r e r a n d o m d r i f t , w i l l not be s i m p l y p i ( 1 - P i ) / Z N but . i n s t e a d , g i v e n by

V(~Pi) = (i/ZN)(l-I/ZN)Pi(l-Pi) (46)

From (44) and (46) it is evident that for a population so barge that (I/ZN) 2 is neg- ligible, the mean and variance of the change in gent frequency due to random

drift, in the conditional process, are (l,Pi)/2N and Pi(l-Pi)/ZN as against 0 and Pi(l-Pi)/ZN respectively in the unconditional case. This is exactly what we get from the diffusion approach for the pure random drift case as shown in Narain (1974). It is however interesting to note that for the exact process, conditioning the pro- cess increases the mean hut decreases the variance.

C o r r e s p o n d i n g to e a c h c h a r a c t e r i s t i c r o o t X (C) g i v e n by (40), t h e s y s t e m os r

linear equations

ZN I) Z ~r ~) i• I, 2, ., 2N (47) P t j X j r = X Xir , . .

j= I

a d m i t s a n o n - t r i v i a l s o l u t i o n x r -- ( X l r . . . X(ZN)r) known a s t he r i g h t - h a n d e i g e n - v e c t o r f o r X(rC). It i s , t h e r e f o r e , a l w a y s p o s s i b l e to f ind c o n s t a n t s a 0 , a l , . . . a r (not a l l of t h e m z e r o ) s u c h t ha t

r

Xjr = Z avJ(v) (48) v'~O

is a s o l u t i o n of (47). S u b s t i t u t i n g (48) in (47) a n d u s i n g (42), we g e t

P r e m N a r a i n 57

r

a v [ ( Z N ' v ) p i + v ] ' ( g N ) v p ~ - I = X{rC) I a v i ( v ) 2N v=0 v=0

i= 1 ,Z . . . . . ZN (49}

S i n c e c o e f f i c i e n t s of a v on the b o t h s i d e s of (49) a r e p o l y n o m i a l s of d e g r e e v i n i , it is possible to write

V

v = Z v i (s) Pi Cs, s=O

(5o)

! w h e r e C s , v s a r e i n d e p e n d e n t of i . S u b s t i t u t i n g (50) in (49) a n d e q u a t i n g t h e c o e f - f i c i e n t s of i ( t ) f o r t = 0,1, . . . . r , we ge t ,

r x(C)a r t = a t (ZN) ( t + l I C t t / ( Z N ) +

v = t + l a v [ ( 2 N "v)Ctv+VC t, (v - 1 ) ] ( 2 N ) J ( Z N ) ,

(t" O, I ..... r) (51)

If we t a k e t= r i n (51) a n d u s e (40) a s w e l l a s (50), [t i s f o u n d t h a t (51) i s s a t i s f i e d fo r v= r a n d a r b i t r a r y a r . We c a n t h e n put a r = 1 in ( 5 i ) f o r t= ( r - l ) g i v i n g a r _ 1. T h i s p r o c e d u r e a l l o w s us to c a l c u l a t e a r . 2 . . . . . a 1 a n d a 0 in s u c c e s s i o n , g i v i n g _ t h e r e b y t he j - t h e l e m e n t of the r i g h t - h a n d e i g e n - v e c t o r c o r r e s p o n d i n g to ~(r C) g i v e n by (48) .

Using the above procedure of obtaining the eigen-vectors for r= 1,2 and 3, we find that for the three eigen-roots,

c)

X (c) 15z)

x c)

= ( 1 - 1 / Z N )

= (1- 1 / Z N ) ( 1 - 2 / 2 N )

= ( 1 - I / Z N ) ( 1 - Z / Z N ) ( 1 - 3 / 2 N )

the v e c t o r s a r e , r e s p e c t i v e l y , g i v e n b y

x j l = ( 1 - p j )

x jg = ( 1 - p j ) ( 1 - Z p j )

x j3 = { 1 - p j ) [ ( 7 - N - 1 ) / ( 1 0 N - 6 ) - p j { 1 - p j ) ] (53)

wi th j = 1 , 2 , . . . , 2N, T h e s e r e s u l t s c a n be c o m p a r e d w i t h t he c o r r e s p o n d i n g r e s u l t s of the u n c o n d i t i o n a l c a s e d e t a i l e d in N a r a i n a n d R o b e r t s o n (1969) . A l - t h o u g h t h e r o o t s a r e the s a m e , the e l e m e n t s of the v e c t o r s a r e now ( 1 / p j ) Of t h o s e in the u n c o n d i t i o n a l c a s e . A l t e r n a t i v e l y , s i n c e k (r C) is a n e i g e n v a l u e of _Q(C1) w i th t he a s s o c i a t e d r i g h t e i g e n - v e c t o r , Xr , we h a v e

58 C o n d i t i o n a l M a r k o v C h a i n in g e n e t i c s

Q(CI)x_ r = ~k~C)xr , r: l,Z ..... ZN.-I

In view of (12), we get

_Q =D I x r = x(rCSDI X_r , r= 1,2 . . . . . 2N-1

This shows that )k(rC) is also an eigenvalue of__(2 with the associated right-eigen- vector Zr=__Dlar, so that by definition of __DI, the j-th ele'ment of x r is ZN/j times as l a r g e as the j - t h e l e m e n t of ~r i . e . t hose c o r r e s p o n d i n g to the uncondi l ; ional case.

EFFECT OF LINKAGE ON THE MEAN AND VARIANCE OF TIME UNTIL FIXATION OF A GAMETE IN SELFED POPULATIONS

The effect of linkage on the probability of fixation of a gamete in @opulations practising self-fertilization was studied inNarain (1971a) which can be consulted for details. The case of self-fertilization corresponds to the situatLon when N=I. The population is sub-divided into lines from each of which two gametes are chosen to form one mature individual only. With two linked loci each wlth two

alleles A-a and B-b respectively with recombination probabilitylr, with s= l-r, and assuming no mutation, there are 10 states of the system corresponding to 10 types of lines out of which four homozygous ones represent absorbing states and the remaining six are trasient states. Amongst the transient states, the two corresponding to two double heterozygotes, AB/ab (coupling5 andAb/aB (repul- sion) are important from the point of view of linkage, the remaining four involving single heterozygotes only. Taking the P-matrix of the process and the_UAB vec- tor of the probabilities of fixation of gamete AI3 from Narain (1971a), the p(CII) matrix for the process, conditional to absorption inAB/AB, has the form

_,_r ] = (545 I01 I- �9

where

~f)' = (I/Z, l/Z, 0,0, sZ(l+ZrS/Z,r(l+Zr)/45

__9(cll) = I 1/Z 0 0 0 0 0 ] i i /z o o o o

�9 0 0 0 0 0

0 0 0 0 0 I rs(l+Zr)/Z rs(l+Zr)/g 0 0 sZ/Z r 3 [_s(l+Z r ) /4 sll+Z r ) /4 0 O r / 4 sZ/Z 'I

(555

(56)

The ordering of the states being AB/Ab, AB/aB, Ab/ab, aB/ab, AB/ab and Ab/aB. With the help of the results given in Narain {1971a 5 and using (31) ~ {335, the vec-

tors of mean as well as second moment about origin of time until fixation of AB

P r e m N a r a i n 59

are respectively given by

!

~(_TAB ) = [Z,Z 0 ,0 ,~(c) u(r) I (57) ' A B ' A]3 J

E(_~ 'B) = [6, 6, 0, 0, 8 (c) [3 ( r ) ] (58) ' -AB ' "AB ~

where e (c)'AB, I~AB~(C) corresgondin ~ L m to the situatkon where the population is initially in the coupling phase, .are given by

u (c) = (l+2r)(l+4rs)/(l+Zrs)+(l-Zr)/(l+Zr) (59) AB

~(c) = (l+2r)(3+26rs+Z4rZsZ)/(l+Zrs) 2 +(l-Zr)(3-Zr)/(l+Zr) z (60) VAB

and u~r~, ~(Ar~ corresponding to the situation when the poptllation is initially in

the repulsion phase, are given hy

u (r) = (l+Zr)(l+4rs)/Zr(l+Zrs)-(l-Zr)/Zr(l+Zr) (61) AB

~(r) = (l+Zr)(3+Z6rs+24rZsZ)/Zr(l+Zrs) 2- (l-Zr)(3-Zr)/Zr(l+2r) Z (62) AB

In a sit-nilar manner, we get the corresponding vectors for moments of time until fixation of Ab, aB and ab. It is found that. when the population is initially in the coupling phase, the means and second moments aboLzt origin of time until fixation of Ab as well as aB are the same as that given by (61) and (62} respectively where- as when the initial population is in repLllsion phase, these are correspondingly given by (59) and (6'0). These results for the time until fixation of ab are exactly the same as that until fixation of ~AB given by (59) to (62). In each case, the variance of time untiVfixatioh is calculated by subtracting the square of u from ~.

It is interesting to note that tr~ean and variance of time until homozygosity can further be obtained by multiplying the mean and variance of time until fixation of a ga,-nete by the corresponding probability of fixation and adding over the four possible cases. For the situation when the initial population is in the coupling

phase, these are given by

(63}

Var(T) = U(f )a(c) +,(c) s(C)+u(C)a(C)+u(C)0(c) [E(T)]2 VAB ~Ab VAb aB WaB ab ab -

= Z (I+i 0rs-8rZsZ)/(l+Z rs) Z (64)

U (c) u(C)_ I/Z(l+Zr) and u(C)= U(~ = r/(l+Zr) [Narain (1971a)]. Be- w h e r e AB - - ab - c a u s e of s y m m e t r y , (63) and (64} h o A ~ f o r the r e p u l s i o n p h a s e a l s o . T h e v a l u e s of E(T) and V a t ( T ) o b t a i n e d h e r e a r e e x a c t l y the s a m e as t h o s e o b t a i n e d by P u r i (1968) who o b t a i n e d t h e m d i r e c t l y wi thout work ing out t he t i m e unt i l f i x a t i o n of a p a r t i c u l a r g a m e t e .

60 C o n d i t i o n a l M a r k o v C h a i n [n g e n e t i c s

T h e e f f e c t of the r e c o m b i n a t i o n f r a c t i o n ' r on the m e a n and the s t a n d a r d deviation of time until fixation was numerically studied with the help of expres-

sions (59) to (6Z). The results for the case when the initial population is in coupling phase are presented in Table I. For the case when the initial popula-

tion is inrepulsion phase, the results are obtainable from the Table by inter- changing either A and a or B and b.

T a b l e 1 : M e a n and s t a n d a r d d e v i a t i o n of t i m e ( n u m b e r of g e n e r a t i o n s ) unt i l f i x a t i o n of a g a m e t e fo r the i n i t i a l po~Julation with h e t e r o z y g o t e s in coupling p h a s e .

AB or ab Ab o r aD r M e a n s . d . M e a n s . d .

0.0000 2.0000 1.414Z 4.0000 2.0000 0.0625 2.0208 1.4337 3.7244 1.8622 0.1250 2.0743 1.4753 3:4973 1.7682 0.1875 Z.1506 1.5Z13 3.3103 1.7078 0.2500 2.2424 1.5635 3.1516 1.6709 0.3125 Z.3441 I'.5959 3.0120 1.6486 0.3750 2.4513 1.6187 2;8873 1.638Z 0.4375 2.5602 1.6306 2.7734 1.6354 0.5000 2.6666 1.6329 2.6666 1.63Z9

It is found tha t when the i n i t i a l p o p u l a t i o n . i s in c o u p l i n g p h a s e , the e f f e c t of l i n k a g e is to d e c r e a s e the a v e r a g e and s t a n d a r d d e v i a t i o n of the n u m b e r of g ' ene - r a t i o n s un t i l f i x a t i o n f o r a coup led g a m e t e (AB o r ab) but t o i n c r e a s e t he s a m e fo r a r e p u l s e d g a m e t e (Ab or aB) . It m a y be no t ed tha t when we c o n s i d e r i n d e - p e n d e n t l y s e g r e g a t i n g l o c i ( r = 0.50):and f i x a t i o n of c o u p l e d g a m e t e s wi th i n i t i a l popu la t i on in c o u p l i n g p h a s e (or of r e p u l s e d g a m e t e s wi th i n i t i a l p o p u l a t i o n in r e p u l s i o n phase ) , the a v e r a g e t i m e to f i x a t i o n is abou t 1.33 t i m e s t h a t f o r c o m - p l e t e l y l i n k e d l o c i w h e r e a s the c h a n c e of f i x a t i o n is ha l f of i ts v a l u e f o r the c o m p l e t e l y l i n k e d c a s e . As e x p e c t e d , wi th i n d e p e n d e n t s e g r e g a t i o n , t h e a v e r a g e t i m e to f i x a t i o n of c o r r e s p o n d i n g r e p u l s e d (o r c o u p l e d ) g a m e s t s , t h e c h a n c e of f i x a t i o n b e i n g the s a m e viz . ,0 .25 in a l l the f o u r c a s e s . But f o r e v e r y t i g h t l i n k - age (r a p p r o a c h i n g z e r o ) a v e r a g e t i m e to f i x a t i o n of a r e p u l s e d g a m e t e wi th i n i t i a l p o p u l a t i o n in coup l ing p h a s e (or of a c o u p l e d g a m e t e wi th i n i t i a l p o p u l a t i o n in r e p u l s i o n phase) t ends to a l i m i t i n g va lue of 4 wi th c h a n c e of i t s f i x a t i o n b e c o m i n g v e r y v e r y s m a l l . As r e g a r d s v a r i a b i l i t y in t i m e to f i x a t i o n , a c h a r a c t e r i s t i c f e a t u r e , t r u e fo r a l l s i t u a t i o n s , is tha t a l a r g e r m e a n is a c c o m p a n i e d by a l a r g e r s t a n d a r d d e v i a t i o n .

S U M M A R Y

A t h e o r y of the s t o c h a s t i c c h a n g e in the f r e q u e n c y of a g e n e in f i n i t e p o p u l a -

t i ons c o n d i t i o n a l to i t s e v e n t u a l f i x a t i o n has b e e n d e v e l o p e d e m p l o y i n g a C o n d i -

Prem N a r a i n 61

ditional Markov Chain. The probability generating f~nctlon of the distribution of time until fixation of a particular allele as well as the eigen-roots and eigen- vectors of the conditional process with binomial transition probabilities have been studied. The theory has been applied to investigate the effect of linkage on

the mean and standard deviation of time until fixation of a gamete in populations practising self-fertilization. It has been found that linkage decreases or in- creases the average and standard deviation of time to fixation of a coupled gamete according as the initial population consists of a coupling or repulsion heterozygote respectively.

A C K N O W L E D G E M E N T S

The author is grateful to Professor E. Pollak, Iowa State University, U.S.A. as well as the referee for helpful suggestions which have considerably'improved the presentation.

REFERENCES

EWENS, W.G., 1973. Conditional diffhs~ion processes in population genetics.

Theor. Pop. Bio. 4, 21-30.

FELLER, W., 1951. Diffusion processes "in 'genetics. Proc. Znd Berkeley Symp. Math. Stat.ist. and Prob., pp. ZZ7-Z46, Univ. of California Press,

Berkeley, C.A.

HALDANE, J.B.S., 1931. A mathematical theory of natural and artificial selec- tion. VII. Selection intensity as a function of mortality rate. Proc. Camb. Phil. Soc. 27, 131-136.

KEMENY, J.G. and SNELL, J.L., 1960. Finite Markov Chains. Princeton, New York.

NARAIN, P., 1969. Response to selection in finite populations. Ph.D. thesis, Edinburgh University.

NARAIN, P., 1970. A note on the diffusion approximation for the variance of the number of generations until fixation of a neutral mutant gene. Genet. Res,, Camb. 15, Z51-255.

NARAIN, P., 1971a. The use of transition probability matrices in studies on limits of response to selection. J. Indian S0 c.Agri.Statist. Z3, 48-60.

NARAIN, P., 1971b. Average number of generations required to at tainlimits of genetic improvement. SABRAO Newsletter. 3, 135-14Z.

NARAIN, P., 1974. The conditioned diffusion equation and its uses in popnla- tion genetics. J.Royal Statist. Soc. London B 36, 258-266.

NARAIN, P. and RO]3ERTSON, A., 1969. Limits and duration of response to

selection in finite populations: the use of transition probability matrices. Indian J.Heredity, l, 79-97.

62 Cond i t iona l M a r k o v Chain in g e n e t i c s

PURl, P .S . , 1968. A s tudy th rough a M a r k o v Chain of a popu la t ion unde rgo ing c e r t a i n mat ing s y s t e m s in the p r e s e n c e of l i n k a g e . C o n t r i b u t i o n s in S t a t i s t . a n d A g r i . Sc i ences . Vol. p r e s e n t e d to Dr . V . G . P a n s e at h is 6Znd b i r t h d a y : 1Z7-141.

ROBERTSON, A., 1960. A t h e o r y of l i m i t s in a r t i f i c i a l s e l e c t i o n . P r o c . Roy. Soc. London D 153, 234-249 .

ROBEt~TSON, A. and NARAIN, P . , 1971. S u r v i v a l of r e c e s s i v e l e t h a l s in f in i te popula t ions . T h e o r . Pop. Bio. 2, 24-50 .

WRIGHT, S., 1931. Evo lu t ion in M e n d e l i a n p o p u l a t i o n s . G e n e t i c s , 16, 97-159.