Lecture 13: Linkage Analysis VI

Date: 10/08/02 Complex models Pedigrees Elston-Stewart Algorithm Lander-Green Algorithm

Complex Linkage Models

The simplest linkage models involve only pairwise recombination fractions $\theta_{ij}$ or adjacent map distances $m_{i,i+1}$, together with map function parameters.

Such models are insufficient to describe many real-life data scenarios.

For Example

Incomplete penetrance. Differential penetrance.

Genetic imprinting. No controlled, repeated crosses available.

Inference on Pedigrees

Pedigrees are extended families sampled from a natural population. They are used when one cannot set up repeated and controlled crosses.

Unknown phenotypes. Unknown genotypes. Founders.

Ordered vs. Unordered Genotype

An unordered genotype includes neither phase information nor the parental source of alleles.

An ordered genotype includes phase information and parental source of alleles.

Unordered Genotype    Ordered Genotype(s)
A1A2B1B1              A1B1/A2B1, A2B1/A1B1
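
The table can be reproduced mechanically. The following Python sketch enumerates the ordered genotypes compatible with an unordered multilocus genotype. The function name `ordered_genotypes` and the representation of a haplotype as a tuple of allele labels are illustrative assumptions, not notation from the lecture.

```python
from itertools import product


def ordered_genotypes(unordered):
    """Enumerate the ordered genotypes consistent with an unordered genotype.

    `unordered` is a list with one (allele, allele) pair per locus.
    Each result is a pair (first haplotype, second haplotype), where a
    haplotype is a tuple holding one allele per locus.
    """
    results = set()
    # At every locus, either allele of the pair may sit on the first haplotype.
    for flips in product([False, True], repeat=len(unordered)):
        hap1 = tuple(b if flip else a for (a, b), flip in zip(unordered, flips))
        hap2 = tuple(a if flip else b for (a, b), flip in zip(unordered, flips))
        results.add((hap1, hap2))  # the set removes duplicates from homozygous loci
    return sorted(results)


# The slide's example: A1A2 at locus A and B1B1 at locus B.
example = ordered_genotypes([("A1", "A2"), ("B1", "B1")])
for hap1, hap2 in example:
    print("".join(hap1) + "/" + "".join(hap2))
```

For the slide's genotype this lists A1B1/A2B1 and A2B1/A1B1. A double heterozygote would give four ordered genotypes, because phase then matters as well as parental source.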

Penetrance Parameters

A penetrance parameter is introduced in the model to explain the relationship between genotype and phenotype.

We code the phenotype as a random vector of discrete or continuous variables, e.g. X=(X1, X2, ..., Xm).

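The phenotype $X_i$ of an individual i is conditionally independent of the genotypes and phenotypes of all other family members, given his/her own genotype and other characteristics (sex, age, etc.):

$$P(X_i \mid G_1, \dots, G_n,\; X_j \text{ for } j \ne i,\; C_1, \dots, C_n) = P(X_i \mid G_i, C_i)$$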

Penetrance Parameters - Assumptions

We assume individual i's phenotype is a single number (discrete or continuous) that is conditionally independent of all other genotypes and loci once we condition on the genotype at a particular locus. That is, we assume one phenotypic variable per locus.

This assumption forces us to ignore multilocus phenotypes and pleiotropic loci.

Conditional Likelihood of Observed Phenotypes

The conditional independence implies that the likelihood of particular phenotypes observed on a pedigree, conditional on the observed genotypes, is simply a product.

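$$P(X \mid G, C) = \prod_{i=1}^{n} P(X_i \mid G_i, C_i) = \prod_{i=1}^{n} \prod_{j=1}^{l} P(X_{ij} \mid G_{ij}, C_i)$$

Here $X_{ij}$ and $G_{ij}$ are the phenotype and genotype of individual i at locus j.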

Penetrance Parameters: Simple Dominant Disease

Dominant Disease (A1 > A2)

Ordered Genotype    P(Xi | Gi, Ci) = P(Xi | Gi)
A1A1                1
A1A2                1
A2A1                1
A2A2                0

Penetrance Parameters: Dominant Disease with C

Dominant Disease (A1 > A2) but Sex-Dependent

Ordered Genotype    P(Xi | Gi, male)    P(Xi | Gi, female)
A1A1                1                   0
A1A2                1                   0
A2A1                1                   0
A2A2                0                   0

Liability Classes

Classes of individuals who differ in penetrance parameters are called liability classes.

In one of the examples above males and females form two different liability classes.
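
As a minimal sketch of how liability classes enter a computation, the Python below stores one penetrance table per class and multiplies the per-individual terms, following the product form of the conditional likelihood. The table values are those of the sex-dependent dominant example; the names `PENETRANCE`, `phenotype_prob` and `conditional_likelihood`, and the three-person family, are hypothetical.

```python
# P(affected | ordered genotype, liability class), taken from the
# sex-dependent dominant example: only males can express the disease.
PENETRANCE = {
    "male": {"A1A1": 1.0, "A1A2": 1.0, "A2A1": 1.0, "A2A2": 0.0},
    "female": {"A1A1": 0.0, "A1A2": 0.0, "A2A1": 0.0, "A2A2": 0.0},
}


def phenotype_prob(affected, genotype, liability_class):
    """P(X_i | G_i, C_i) for a binary affected/unaffected phenotype."""
    p_affected = PENETRANCE[liability_class][genotype]
    return p_affected if affected else 1.0 - p_affected


def conditional_likelihood(individuals):
    """Product over individuals of P(X_i | G_i, C_i).

    `individuals` is an iterable of (affected, genotype, liability_class).
    """
    likelihood = 1.0
    for affected, genotype, liability_class in individuals:
        likelihood *= phenotype_prob(affected, genotype, liability_class)
    return likelihood


# Hypothetical family: an affected male carrier, an unaffected female
# carrier, and an unaffected male non-carrier.
family = [
    (True, "A1A2", "male"),
    (False, "A1A2", "female"),
    (False, "A2A2", "male"),
]
family_likelihood = conditional_likelihood(family)
```

Adding an age-based class would only mean adding another key to the table; the likelihood code does not change.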

Incomplete Penetrance with Liability Classes

Suppose that a dominant disease affects individuals under 30 with probability a and individuals aged 30 or over with probability b.

Class         AA    Aa    aa
<30 years     a     a     0
>=30 years    b     b     0

Penetrance Parameters: Phenocopies

Dominant Disease (A1 > A2) with Phenocopy Rate pr

Ordered Genotype    P(Xi | Gi)
A1A1                1
A1A2                1
A2A1                1
A2A2                pr

Dealing with Penetrance and Phenocopies

Biological solution. Identify features that differentiate genetic and non-genetic forms of the phenotype. Then, the phenotype can be recoded as fully-penetrant with no phenocopies.

Approximation. Estimate genotype-specific risk from segregation ratios observed in a family, then set penetrance to the estimates.

Example

Genotype    Expected Frequency    Observed Frequency
AA          0.5                   0.75
Aa          0.5                   0.25

Either 50% of Aa individuals are phenocopies of AA, or the a allele has only 50% penetrance.
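
As a worked check of the 50% figure, assume the shortfall in the observed Aa class is entirely due to Aa individuals presenting with the AA phenotype. The fraction of Aa individuals that are misclassified is then:

```latex
\frac{\text{expected Aa} - \text{observed Aa}}{\text{expected Aa}}
  = \frac{0.5 - 0.25}{0.5} = 0.5
```

The same quarter of the sample accounts for the excess in the AA class, 0.75 - 0.5 = 0.25.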

Penetrance Parameters – More Assumptions

Unless a phenotype is affected by genomic imprinting, we usually assume that different ordered genotypes with the same alleles have the same phenotype.

Genomic imprinting means that the parental origin of an allele affects its expression. For example, a gene may be expressed only if it was inherited from the mother.

Genetic Imprinting in Humans?

Prader-Willi syndrome causes morbid obesity in humans. The disease loci are found on chromosome 15, and working copies must be transmitted from the father.

Angelman syndrome causes developmental problems, including speech impairment and a balance disorder. It is caused by the loss of a region of chromosome 15 that is normally active only on the maternal chromosome.

Problem: Ordered Genotypes are not Observed

Pedigrees almost invariably include missing data: members with no known genotype.

In addition, there will always be many members for whom phase and parental origin cannot be determined.

In essence, G is not actually observed, so the likelihood must sum over all possible genotypes:

$$P(X) = \sum_{g} P(X \mid G = g)\, P(G = g)$$

Transmission Parameters

The genotypes in a pedigree are related through genetic inheritance.

Conditional on the parental genotypes, the offspring genotypes are independent of all other members in the pedigree.

Transmission parameters are those parameters which determine the transmission of genes: the recombination fractions.

Independence of Transmission Probabilities

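Let $G_k$ be the genotype of offspring k, and let $G_m$ and $G_p$ be the genotypes of the mother and father. Let $G_{kM}$ be the allele transmitted by the offspring's mother and $G_{kP}$ the allele transmitted by the father. Then,

$$P(G_k \mid G_m, G_p) = P(G_{kM} \mid G_m)\, P(G_{kP} \mid G_p)$$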
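
For a single locus the factorisation is easy to compute directly. The sketch below assumes Mendelian segregation (each parental allele is transmitted with probability 1/2) and represents an ordered genotype as a pair (maternal allele, paternal allele). The helper names `transmit` and `offspring_prob` are illustrative.

```python
from fractions import Fraction


def transmit(parent, allele):
    """P(the parent transmits `allele`), for a parent given as a pair of alleles."""
    # Each of the parent's two alleles is passed on with probability 1/2.
    return Fraction(sum(a == allele for a in parent), 2)


def offspring_prob(child, mother, father):
    """P(G_k | G_m, G_p) = P(G_kM | G_m) * P(G_kP | G_p) for an ordered child genotype."""
    g_kM, g_kP = child  # allele received from the mother, allele received from the father
    return transmit(mother, g_kM) * transmit(father, g_kP)


# A heterozygous mother and a homozygous father:
p_child = offspring_prob(("A", "a"), ("A", "a"), ("a", "a"))
print(p_child)  # probability that the child receives A maternally and a paternally
```

Summing `offspring_prob` over all four ordered child genotypes gives 1, which is a quick sanity check on any transmission model.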

Maternal Transmission: Generate Haplotype

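[Figure: the mother's maternal haplotype $(G^1_{mM}, G^2_{mM}, \dots, G^l_{mM})$ and paternal haplotype $(G^1_{mP}, G^2_{mP}, \dots, G^l_{mP})$, and below them the transmitted haplotype Z assembled from segments of the two.]

In the pictured example,

$$Z = (G^1_{mM}, G^2_{mP}, \dots, G^l_{mP})$$

The crossover pattern is recorded in $\gamma = (\gamma_1, \dots, \gamma_l)$, where

$$\gamma_i = \begin{cases} 1 & \text{if there is a recombination between loci } i-1 \text{ and } i \\ -1 & \text{otherwise} \end{cases}$$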

Maternal Transmission: Transmit Haplotype

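$$P(Z \mid G_m) = \prod_{i=1}^{l} \begin{cases} \theta_{i-1,i} & \text{if } \gamma_i = 1 \\ 1 - \theta_{i-1,i} & \text{if } \gamma_i = -1 \end{cases}$$

where $\theta_{i-1,i}$ is the recombination fraction between loci i-1 and i, $\gamma_i = 1$ marks a recombination in that interval, and $\theta_{0,1} = 1/2$ because the transmitted haplotype is equally likely to start on either of the mother's chromosomes.

$$P(G_{kM} \mid G_m) = \sum_{Z} P(G_{kM}, Z \mid G_m) = \sum_{Z} P(G_{kM} \mid Z, G_m)\, P(Z \mid G_m)$$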
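
The probability of one transmitted haplotype can be sketched in a few lines of Python. This assumes no crossover interference, so that recombination events in different intervals are independent. A haplotype is described by the origin of its allele at each locus ("M" for the mother's maternal chromosome, "P" for her paternal chromosome); this encoding and the name `gamete_probability` are assumptions made for the illustration.

```python
def gamete_probability(origins, thetas):
    """Probability that a parent transmits a haplotype with the given origins.

    origins: string of 'M'/'P', one character per locus, giving which of the
             parent's two chromosomes supplied the allele at that locus.
    thetas:  thetas[i] is the recombination fraction between locus i and
             locus i + 1 (0-based), so len(thetas) == len(origins) - 1.
    Assumes no interference (independent recombination across intervals).
    """
    if len(thetas) != len(origins) - 1:
        raise ValueError("need one recombination fraction per interval")
    prob = 0.5  # the first locus comes from either chromosome with probability 1/2
    for i in range(1, len(origins)):
        theta = thetas[i - 1]
        recombined = origins[i] != origins[i - 1]
        prob *= theta if recombined else 1.0 - theta
    return prob


# Three loci: a crossover in the first interval, none in the second.
p_example = gamete_probability("MPP", [0.1, 0.2])
```

Here `p_example` is 0.5 × 0.1 × 0.8. Summing over all 2^l origin patterns returns 1, as it must for a probability distribution over gametes.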

Population Parameters

What about the pedigree members that have no parents? There are no parental genotypes on which to condition.

The distribution of genotypes in these individuals is determined by the so-called population parameters.

In the worst case, this would require $(m_1 m_2 \cdots m_l)^2 - 1$ independent parameters, where $m_i$ is the number of alleles at locus i.

Population Parameters - Assumptions

Assume Hardy-Weinberg equilibrium (random union of haplotypes) so that the genotype frequencies are determined by the haplotype frequencies. Then there are $m_1 m_2 \cdots m_l - 1$ independent parameters.

Assume linkage equilibrium (random union of alleles at multiple loci into haplotypes). Then there are $m_1 + m_2 + \cdots + m_l - l$ independent allele frequencies.
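
To see how much each assumption saves, the three parameter counts can be compared for a concrete set of loci. The Python below is a small illustration; the function name and the example allele counts (2, 3 and 4 alleles at three loci) are made up.

```python
from math import prod


def population_parameter_counts(allele_counts):
    """Number of independent population parameters under each assumption.

    allele_counts: list [m_1, ..., m_l] with the number of alleles per locus.
    """
    n_haplotypes = prod(allele_counts)  # m_1 * m_2 * ... * m_l
    return {
        # One frequency per ordered multilocus genotype, minus the sum-to-one constraint.
        "no_assumptions": n_haplotypes ** 2 - 1,
        # Hardy-Weinberg: genotype frequencies follow from haplotype frequencies.
        "hardy_weinberg": n_haplotypes - 1,
        # Linkage equilibrium as well: only allele frequencies at each locus remain.
        "linkage_equilibrium": sum(allele_counts) - len(allele_counts),
    }


counts = population_parameter_counts([2, 3, 4])
print(counts)
```

With these three loci the count drops from 575 to 23 to 6.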

Overall Genotype Probabilities

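$$P(G) = P(G_1) \cdots P(G_f)\, P(G_{f+1} \mid G_{f+1,m}, G_{f+1,p}) \cdots P(G_n \mid G_{n,m}, G_{n,p})$$

where individuals 1, ..., f are the founders and $G_{i,m}$, $G_{i,p}$ are the genotypes of the mother and father of individual i.

$$P(X) = \sum_{g} P(X \mid G = g)\, P(G = g) = \sum_{G_1} \cdots \sum_{G_n} \underbrace{\prod_{i=1}^{n} P(X_i \mid G_i)}_{\text{pen}}\; \underbrace{\prod_{i=1}^{f} P(G_i)}_{\text{pop}}\; \underbrace{\prod_{i=f+1}^{n} P(G_i \mid G_{i,m}, G_{i,p})}_{\text{trans}}$$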

Computation

There are $(m_1 m_2 \cdots m_l)^{2n}$ terms in the summation.

There are 2n probabilities in each product. Thus, there are $(m_1 m_2 \cdots m_l)^{2n}(2n-1)$ multiplications and $(m_1 m_2 \cdots m_l)^{2n} - 1$ additions.

The calculation grows exponentially in the number of loci l and the number of individuals n.
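
For a very small pedigree the summation can simply be carried out. The Python sketch below evaluates it by brute force for a father-mother-child trio at one biallelic locus, with a fully penetrant dominant disease and Hardy-Weinberg founder frequencies. It enumerates unordered genotypes, so it visits 3^n terms rather than the ordered-genotype count given above. The allele frequency, the phenotypes and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

GENOTYPES = ("AA", "Aa", "aa")  # unordered genotypes at one locus
p_A = Fraction(1, 4)            # hypothetical frequency of the disease allele A
p_a = 1 - p_A


def pop(g):
    """Population term: Hardy-Weinberg genotype frequency of a founder."""
    return {"AA": p_A ** 2, "Aa": 2 * p_A * p_a, "aa": p_a ** 2}[g]


def pen(x, g):
    """Penetrance term: fully penetrant dominant disease, x = 1 means affected."""
    affected = 1 if "A" in g else 0
    return 1 if x == affected else 0


def gamete(parent, allele):
    """P(a parent with this genotype transmits the given allele)."""
    return Fraction(parent.count(allele), 2)


def trans(child, mother, father):
    """Transmission term: Mendelian P(child | mother, father)."""
    total = Fraction(0)
    for x, y in product("Aa", repeat=2):  # x from the mother, y from the father
        if "".join(sorted(x + y)) == child:
            total += gamete(mother, x) * gamete(father, y)
    return total


# Phenotypes only; no genotype is observed.
x_father, x_mother, x_child = 1, 0, 0

likelihood = Fraction(0)
n_terms = 0
for g_f, g_m, g_c in product(GENOTYPES, repeat=3):
    n_terms += 1
    likelihood += (pen(x_father, g_f) * pen(x_mother, g_m) * pen(x_child, g_c)
                   * pop(g_f) * pop(g_m)
                   * trans(g_c, g_m, g_f))
```

Only one genotype assignment (father Aa, mother aa, child aa) has non-zero probability, so the sum equals 2·p_A·p_a · p_a² · 1/2. The loop still visits all 27 combinations, and adding a person multiplies that count by three, which is the exponential growth described above.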

Elston-Stewart Algorithm

The algorithm resembles the forward-backward computation for hidden Markov models. The hidden states are the genotypes.

One must classify people as falling ahead of or behind other people; that is, we need a linear arrangement of the people in the pedigree.

Ordering People in a Pedigree

[Figure: a pedigree with individual k marked.]

Forward/Backwards Probabilities

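$$\alpha_k(G_k) = P(X_i,\ i \le k;\ G_k)$$

$$\beta_k(G_k) = \begin{cases} P(X_i,\ i > k \mid G_k) & \text{if } P(G_k) > 0 \\ 0 & \text{if } P(G_k) = 0 \end{cases}$$

[Figure: chain of hidden genotypes $G_1, G_2, \dots, G_k, G_{k+1}, \dots$, each with its observed phenotype $X_1, X_2, \dots, X_k, X_{k+1}, \dots$]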

Total Probability

$$P(X_1, \dots, X_n) = \sum_{G_k} \alpha_k(G_k)\, \beta_k(G_k)$$

Calculating Forward Probability

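For a founder:

$$\alpha_k(G_k) = P(G_k)\, P(X_k \mid G_k), \quad k \le f$$

For a non-founder with mother m and father p:

$$\alpha_k(G_k) = P(X_k \mid G_k) \sum_{G_m} \sum_{G_p} P(G_k \mid G_m, G_p)\, \alpha_m(G_m)\, \alpha_p(G_p) \prod_{s \in \text{siblings}} \sum_{G_s} P(X_s \mid G_s)\, P(G_s \mid G_m, G_p)\, \beta_s(G_s)$$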

Calculating Backward Probabilities

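$$\beta_k(G_k) = 1 \quad \text{if } k \text{ is a leaf}$$

Otherwise, with spouse s:

$$\beta_k(G_k) = P(X_k \mid G_k) \sum_{G_s} \alpha_s(G_s)\, P(X_s \mid G_s) \prod_{c \in \text{children}} \sum_{G_c} P(X_c \mid G_c)\, P(G_c \mid G_k, G_s)\, \beta_c(G_c)$$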

Example

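[Figure: three-generation pedigree of nine individuals. Founders 1 (AA) and 2 (aa) are the parents of 4 (Aa) and 5 (Aa). Individual 4 and founder 3 (aa) are the parents of 7 (aa) and 8 (Aa). Individual 5 and founder 6 (aa) are the parents of 9 (Aa).]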

Using 5 as Proband

$$P(X) = P(X_1, \dots, X_9) = \sum_{G_5} \alpha_5(G_5)\, \beta_5(G_5)$$

Example – Calculations Needed

[Figure: the example pedigree, repeated on four consecutive slides.]

Forward Probabilities: Founders

$$\alpha_1(AA) = p_A^2, \qquad \alpha_2(aa) = p_a^2, \qquad \alpha_3(aa) = p_a^2, \qquad \alpha_6(aa) = p_a^2$$

Backward Probabilities: Leaves

$$\beta_7(aa) = 1, \qquad \beta_8(Aa) = 1, \qquad \beta_9(Aa) = 1$$

Backward Probability 4

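$$\beta_k(G_k) = P(X_k \mid G_k) \sum_{G_s} \alpha_s(G_s)\, P(X_s \mid G_s) \prod_{c \in \text{children}} \sum_{G_c} P(X_c \mid G_c)\, P(G_c \mid G_k, G_s)\, \beta_c(G_c)$$

$$\beta_4(Aa) = P(1 \mid Aa)\; \alpha_3(aa)\, P(0 \mid aa)\; \big[ P(0 \mid aa)\, P(aa \mid Aa, aa)\, \beta_7(aa) \big] \big[ P(1 \mid Aa)\, P(Aa \mid Aa, aa)\, \beta_8(Aa) \big]$$

$$= 1 \cdot p_a^2 \cdot 1 \cdot \big( 1 \cdot \tfrac{1}{2} \cdot 1 \big) \big( 1 \cdot \tfrac{1}{2} \cdot 1 \big) = \frac{p_a^2}{4}$$

1 means affected; 0 means not affected.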

Forward Probability 5

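$$\alpha_5(Aa) = P(1 \mid Aa)\, P(Aa \mid AA, aa)\; \alpha_1(AA)\, \alpha_2(aa)\; \big[ P(1 \mid Aa)\, P(Aa \mid AA, aa)\, \beta_4(Aa) \big]$$

$$= 1 \cdot 1 \cdot p_A^2\, p_a^2 \cdot 1 \cdot 1 \cdot \frac{p_a^2}{4} = \frac{p_A^2\, p_a^4}{4}$$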

Backward Probability 5

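$$\beta_5(Aa) = P(1 \mid Aa)\; \alpha_6(aa)\, P(0 \mid aa)\; \big[ P(1 \mid Aa)\, P(Aa \mid Aa, aa)\, \beta_9(Aa) \big]$$

$$= 1 \cdot p_a^2 \cdot 1 \cdot 1 \cdot \tfrac{1}{2} \cdot 1 = \frac{p_a^2}{2}$$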

Example – Final Calculation

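$$P(X) = P(X_1, \dots, X_9) = \sum_{G_5} \alpha_5(G_5)\, \beta_5(G_5) = \alpha_5(Aa)\, \beta_5(Aa) = \frac{p_A^2\, p_a^4}{4} \cdot \frac{p_a^2}{2} = \frac{p_A^2\, p_a^6}{8}$$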
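
The peeling steps of this example can be reproduced in code. The Python sketch below is specialised to the nine-person example pedigree: founders 1 (AA) and 2 (aa) are the parents of 4 and 5 (both Aa); 4 and founder 3 (aa) have children 7 (aa) and 8 (Aa); 5 and founder 6 (aa) have child 9 (Aa). It is not a general Elston-Stewart implementation. It assumes Hardy-Weinberg founder frequencies, treats the data as observed unordered genotypes through a 0/1 indicator `obs`, counts each person's data term exactly once, and uses a hypothetical allele frequency. All function names are illustrative.

```python
from fractions import Fraction
from itertools import product

GENOTYPES = ("AA", "Aa", "aa")
p_A = Fraction(3, 10)  # hypothetical allele frequency
p_a = 1 - p_A

# Observed unordered genotypes of the example pedigree.
observed = {1: "AA", 2: "aa", 3: "aa", 4: "Aa", 5: "Aa",
            6: "aa", 7: "aa", 8: "Aa", 9: "Aa"}


def prior(g):
    """Hardy-Weinberg genotype frequency of a founder."""
    return {"AA": p_A ** 2, "Aa": 2 * p_A * p_a, "aa": p_a ** 2}[g]


def obs(k, g):
    """Data term: 1 if genotype g agrees with what was observed for person k."""
    return 1 if observed[k] == g else 0


def gamete(parent, allele):
    return Fraction(parent.count(allele), 2)


def trans(child, parent1, parent2):
    """Mendelian P(child | parents) for unordered autosomal genotypes."""
    total = Fraction(0)
    for x, y in product("Aa", repeat=2):
        if "".join(sorted(x + y)) == child:
            total += gamete(parent1, x) * gamete(parent2, y)
    return total


def alpha_founder(k, g):
    return prior(g) * obs(k, g)


def beta(g_k, spouse, children):
    """Backward term for a person with a founder spouse and leaf children."""
    total = Fraction(0)
    for g_s in GENOTYPES:
        term = alpha_founder(spouse, g_s)
        for c in children:  # leaves have backward probability 1
            term *= sum(obs(c, g_c) * trans(g_c, g_k, g_s) for g_c in GENOTYPES)
        total += term
    return total


beta4 = {g: beta(g, 3, [7, 8]) for g in GENOTYPES}
beta5 = {g: beta(g, 6, [9]) for g in GENOTYPES}

# Forward term for the proband 5: parents 1 and 2, sibling 4.
alpha5 = {}
for g5 in GENOTYPES:
    total = Fraction(0)
    for g1, g2 in product(GENOTYPES, repeat=2):
        sibling = sum(obs(4, g4) * trans(g4, g1, g2) * beta4[g4] for g4 in GENOTYPES)
        total += (trans(g5, g1, g2) * alpha_founder(1, g1)
                  * alpha_founder(2, g2) * sibling)
    alpha5[g5] = obs(5, g5) * total

likelihood = sum(alpha5[g] * beta5[g] for g in GENOTYPES)
```

Because exact fractions are used, `likelihood` can be compared directly with p_A² p_a⁶ / 8. The sums over genotypes are kept in the code even though only one genotype per person survives here, so the same structure would still work if some genotypes were unobserved.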

Efficiency of the Elston-Stewart Algorithm

In our example, each genotype was defined without ambiguity. There were no sums over genotypes.

In general, this is not true, and the forward and backward probabilities must sum over the possible parental genotypes or spousal genotypes, respectively.

The cost of the ES algorithm grows exponentially with the number of loci, because the number of possible multilocus genotypes per person does.

Fortunately, the ES algorithm calculations only increase linearly in the number of pedigree members.

Lander-Green Algorithm

View the pedigree as a Hidden Markov model on haplotypes.

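The pattern of inheritance at a single locus is described by an inheritance vector v, a vector of 0's and 1's of length 2(n – f) indicating whether each transmitted allele is paternal (0) or maternal (1) in origin. Here n is the number of pedigree members and f the number of founders.

There are $2^{2(n-f)}$ possible inheritance vectors.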

Inheritance Vector v

[Figure: the example pedigree drawn with ordered genotypes.]

Gamete    v
4M        0|1
4P        0|1
5M        0|1
5P        0|1
7M        1
7P        0|1
8M        0
8P        0|1
9M        0
9P        0|1

An entry 0|1 means that either value is compatible with the genotypes.
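
The table can be turned into a count of compatible inheritance vectors. The Python sketch below enumerates all 0/1 vectors over the ten gametes listed and keeps those that agree with the three determined entries. The gamete labels and fixed values are copied from the table; the names `GAMETES`, `FIXED` and `compatible_vectors` are illustrative.

```python
from itertools import product

# One entry per gamete received by a non-founder (n - f = 5 non-founders).
GAMETES = ["4M", "4P", "5M", "5P", "7M", "7P", "8M", "8P", "9M", "9P"]

# Entries that the table determines; every other entry is listed as 0|1.
FIXED = {"7M": 1, "8M": 0, "9M": 0}


def compatible_vectors():
    """Yield every inheritance vector that agrees with the fixed entries."""
    for bits in product((0, 1), repeat=len(GAMETES)):
        v = dict(zip(GAMETES, bits))
        if all(v[g] == value for g, value in FIXED.items()):
            yield bits


n_total = 2 ** len(GAMETES)                       # 2^(2(n - f)) vectors in all
n_compatible = sum(1 for _ in compatible_vectors())
```

Three of the ten entries are fixed, so 2^7 of the 2^10 vectors remain. The data narrow the set of inheritance vectors without identifying a single one.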

Conditional Probability

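$$P(X \mid v) = \sum_{G} P(X \mid G)\, P(G \mid v)$$

Prior to viewing the data, all inheritance vectors are equally likely.

$$P(X) \propto \mathbf{1}^t\, Q\, \mathbf{1}$$

where Q is the diagonal matrix with one entry $P(X \mid v)$ per inheritance vector v.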

Multiple Loci

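Suppose there are l loci. Then, the joint probability can be factored as

$$P(X_1, X_2, \dots, X_l) = P(X_1)\, P(X_2 \mid X_1)\, P(X_3 \mid X_1, X_2) \cdots P(X_l \mid X_1, X_2, \dots, X_{l-1})$$

But, conditional on $v_i$, $X_i$ is independent of all $X_j$ with j < i:

$$P(X_i \mid v_i, X_1, X_2, \dots, X_{i-1}) = P(X_i \mid v_i)$$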

Multiple Loci (cont)

And, conditional on the inheritance vectors of preceding loci, the inheritance vector at locus i is independent of all but the immediately preceding inheritance vector.

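$$P(v_i \mid v_1, v_2, \dots, v_{i-1}) = P(v_i \mid v_{i-1}) = \prod_{j=1}^{2(n-f)} \theta_{i-1,i}^{\,d_j}\, (1 - \theta_{i-1,i})^{1 - d_j}$$

where $\theta_{i-1,i}$ is the recombination fraction between loci i-1 and i, and $d_j = 1$ if component j differs between $v_{i-1}$ and $v_i$ and 0 otherwise.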

Multiple Loci (cont)

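$$P(X_1, \dots, X_l) = \sum_{v_1} \sum_{v_2} \cdots \sum_{v_l} P(v_1)\, P(X_1 \mid v_1)\, P(v_2 \mid v_1)\, P(X_2 \mid v_2) \cdots P(v_l \mid v_{l-1})\, P(X_l \mid v_l)$$

$$\propto \mathbf{1}^t\, Q_1 T_1 Q_2 T_2 \cdots T_{l-1} Q_l\, \mathbf{1}$$

where $Q_i$ is the diagonal matrix of $P(X_i \mid v_i)$ and $T_i$ is the matrix of transition probabilities $P(v_{i+1} \mid v_i)$.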
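
The matrix product can be evaluated left to right as a forward recursion over loci. The Python sketch below does this with plain dictionaries. It assumes that each bit of the inheritance vector flips independently between adjacent loci with probability equal to the recombination fraction, and that all inheritance vectors are equally likely a priori. The inputs (`Q` as a list of dictionaries of P(X_i | v), the made-up values in the example) and the function names are illustrative.

```python
from itertools import product


def transition(v, w, theta):
    """P(w at the next locus | v at this locus): independent bit flips."""
    d = sum(a != b for a, b in zip(v, w))  # number of bits that changed
    return theta ** d * (1.0 - theta) ** (len(v) - d)


def lander_green_likelihood(Q, thetas, n_bits):
    """Likelihood of the data at all loci under the inheritance-vector HMM.

    Q:      Q[i][v] = P(X_i | v) for every inheritance vector v (tuple of bits).
    thetas: thetas[i] is the recombination fraction between loci i and i + 1.
    n_bits: length of the inheritance vector, 2(n - f).
    """
    states = list(product((0, 1), repeat=n_bits))
    uniform = 1.0 / len(states)
    # forward[v] = P(X_1, ..., X_i, v_i = v)
    forward = {v: uniform * Q[0][v] for v in states}
    for i in range(1, len(Q)):
        forward = {
            w: Q[i][w] * sum(forward[v] * transition(v, w, thetas[i - 1])
                             for v in states)
            for w in states
        }
    return sum(forward.values())


# Two loci and a two-bit inheritance vector, with made-up values of P(X_i | v).
Q = [
    {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 0.0, (1, 1): 0.25},
    {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 1.0, (1, 1): 0.5},
]
lik_unlinked = lander_green_likelihood(Q, [0.5], 2)
lik_complete_linkage = lander_green_likelihood(Q, [0.0], 2)
```

At a recombination fraction of 0.5 the loci are independent and the likelihood is the product of the per-locus averages of `Q`. At 0 the inheritance vector cannot change between loci. Each step costs on the order of the square of the number of inheritance vectors, so the work is linear in the number of loci but exponential in the number of non-founders.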