The drop-out/drop-in model

31 May 2012

The Likelihood Ratio model

Hinda

Outline

- Illustration of the LR principle applied to DNA mixtures

- Two-person mixtures to explain the principle (but no general formula is given!)

- Example with and without allelic dropout

The LR model, Avila, June 2013

DNA mixtures

- Two or more individuals contributing to the sample

- More than two peaks per locus


Why are mixtures challenging?

What genotypes created the mixture?

- 12,12/13,15

- 12,15/13,15

- 12,13/13,15

- ...
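A minimal sketch of the enumeration behind this question, assuming exactly two contributors, no dropout and no drop-in, and taking the observed alleles to be 12, 13 and 15 (read off the genotype examples above). The function names `genotypes` and `explaining_pairs` are hypothetical.

```python
from itertools import combinations_with_replacement


def genotypes(alleles):
    """All unordered genotypes (allele pairs) that can be built from `alleles`."""
    return list(combinations_with_replacement(sorted(alleles), 2))


def explaining_pairs(observed):
    """Unordered pairs of genotypes whose combined alleles are exactly `observed`.

    Assumes two contributors, no dropout and no drop-in.
    """
    observed = set(observed)
    pairs = []
    for g1, g2 in combinations_with_replacement(genotypes(observed), 2):
        if set(g1) | set(g2) == observed:
            pairs.append((g1, g2))
    return pairs


for g1, g2 in explaining_pairs({12, 13, 15}):
    print(f"{g1[0]},{g1[1]}/{g2[0]},{g2[1]}")
```

This prints six combinations, including the three listed on the slide; the "..." hides the other three.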


ISFG DNA commission recommendations

The likelihood ratio is the preferred approach to mixture interpretation. (DNA commission 2005)

Probabilistic approaches and likelihood ratio principles are superior to classical methods.

(DNA commission 2012)


The Bayesian framework: likelihood ratios

$$\mathrm{LR} = \frac{\Pr(\text{data} \mid H_{\text{prosecution}})}{\Pr(\text{data} \mid H_{\text{defence}})}$$

- data: alleles and their peaks

- a ratio of two probabilities or, equivalently, a ratio of two likelihoods


Interpretation

- Need for an interpretation framework that applies to all types of samples:
  • High template
  • Low template: PCR-related stochastic effects are exacerbated, creating uncertainty about the composition of the crime sample

Reporting officers make pre-case assessments and formulate the propositions to be evaluated within the likelihood ratio framework.


Dropout/Drop-in definitions

Allele or locus dropout is defined as a signal that is below the limit-of-detection threshold; it occurs when one or both alleles of a heterozygote fail to PCR-amplify.

Allele drop-in is an allele that is not associated with the crime sample and remains unexplained by the contributors under either Hp or Hd.


Low/High template DNA

High template DNA

- The epg (electropherogram) reflects the composition of the sample:
  • no dropout
  • no drop-in

Low level DNA

- The epg does not reflect the composition of the sample:
  • allele dropout
  • allele drop-in
  • stutters
  • ...


Part 1: High template DNA, the epg reflects the composition of the sample.


Two-person mixture example

- Two-person mixture


Two-person mixture example

Locus 1

Evidence: 9, 11, 12

Suspect: 9, 11

Victim: 11, 12

- Hp: Suspect + Victim contributed to the sample

- Hd: Victim + Unknown person (unrelated to the suspect) contributed to the sample


Two-person mixture: Under Hp

Locus 1

Evidence: 9, 11, 12

Suspect: 9, 11

Victim: 11, 12

Hp: Suspect + Victim contributed to the sample

Pr(Evidence|Hp) = 1


Two-person mixture: Under Hd

Locus 1

Evidence: 9, 11, 12

Victim: 11, 12

Unknown: ?

Hd: Unknown + Victim contributed to the sample


Two-person mixture: Under Hd

- The victim’s profile explains 11 and 12

- The unknown has to carry allele 9: allele 9 is constrained

Locus 1

Evidence: 9, 11, 12

Victim: 11, 12

Unknown: 9,11 or 9,12 or 9,9

$$\Pr(\text{Evidence} \mid H_d) = 2p_9p_{11} + 2p_9p_{12} + p_9^2$$


Two-person mixture: LR


- Hd: Victim + Unknown person (unrelated to the suspect) contributed to the sample

Pr(Evidence|Hp) = 1

$$\Pr(\text{Evidence} \mid H_d) = 2p_9p_{11} + 2p_9p_{12} + p_9^2$$

$$\mathrm{LR} = \frac{1}{2p_9p_{11} + 2p_9p_{12} + p_9^2}$$
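To make the arithmetic concrete, here is a small sketch that evaluates this LR for the Locus 1 example. The allele frequencies are made up for illustration only, and `lr_locus1` is a hypothetical helper name.

```python
def lr_locus1(p):
    """LR for the Locus 1 example (Evidence 9,11,12; Suspect 9,11; Victim 11,12).

    Hp: Suspect + Victim  -> Pr(Evidence | Hp) = 1
    Hd: Victim + Unknown  -> the unknown must carry allele 9 (9,11 / 9,12 / 9,9)
    `p` maps each allele to its population frequency.
    """
    pr_hp = 1.0
    pr_hd = 2 * p[9] * p[11] + 2 * p[9] * p[12] + p[9] ** 2
    return pr_hp / pr_hd


# Hypothetical allele frequencies, for illustration only
freqs = {9: 0.1, 11: 0.2, 12: 0.3}
print(round(lr_locus1(freqs), 2))  # 9.09, i.e. 1 / 0.11
```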


What is the underlying model?

- The LR is a function of the genotype frequencies, which are computed from the allele frequencies

- Assumes alleles within a locus are independent: Hardy-Weinberg equilibrium

- Multiply across loci: linkage equilibrium

The product rule
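A sketch of the product rule under the two assumptions above: Hardy-Weinberg genotype frequencies within each locus, multiplied across loci. The locus names `"L1"`, `"L2"` and all frequencies are hypothetical.

```python
import math


def genotype_freq(a, b, p):
    """Hardy-Weinberg genotype frequency at one locus: p_a^2 or 2 p_a p_b."""
    return p[a] ** 2 if a == b else 2 * p[a] * p[b]


def profile_freq(profile, freqs):
    """Product rule: multiply single-locus genotype frequencies across loci.

    profile: {locus: (allele_a, allele_b)}; freqs: {locus: {allele: frequency}}.
    """
    return math.prod(genotype_freq(a, b, freqs[locus])
                     for locus, (a, b) in profile.items())


# Two hypothetical loci with made-up frequencies
freqs = {"L1": {9: 0.1, 11: 0.2}, "L2": {14: 0.3}}
profile = {"L1": (9, 11), "L2": (14, 14)}
print(profile_freq(profile, freqs))  # (2 * 0.1 * 0.2) * 0.3**2
```

`math.prod` needs Python 3.8 or later.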


Summary

- Derive the possible genotypes for the unknowns

- Determine the genotype probabilities

- Sum up the probabilities for all plausible genotypes

- Calculate the ratio of the probabilities under Hp and under Hd

You should not do this by hand!

- Usually, 15 or more loci are analysed simultaneously

- Calculations get complicated with two or more unknowns
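The four steps above can be sketched as one brute-force function for a single locus under the classical model (no dropout, no drop-in). This is an illustrative implementation, not the software used in the slides; `pr_evidence` and its parameters are hypothetical names, and the frequencies are made up.

```python
from itertools import combinations_with_replacement, product


def genotype_prob(g, p):
    """Hardy-Weinberg probability of genotype g = (a, b)."""
    a, b = g
    return p[a] ** 2 if a == b else 2 * p[a] * p[b]


def pr_evidence(evidence, known, n_unknown, p):
    """Pr(evidence | H) at one locus under the classical model (no dropout, no drop-in).

    evidence: set of observed alleles
    known: genotypes of the contributors that H names (e.g. the victim)
    n_unknown: number of unknown contributors under H
    p: allele frequencies
    """
    evidence = set(evidence)
    known_alleles = {a for g in known for a in g}
    if not known_alleles <= evidence:
        return 0.0  # a known allele is missing from the evidence: impossible without dropout
    # Step 1: without dropout, an unknown can only carry alleles that were observed
    candidates = list(combinations_with_replacement(sorted(evidence), 2))
    total = 0.0
    # Unknowns are distinct people, so sum over ordered genotype tuples
    for combo in product(candidates, repeat=n_unknown):
        if known_alleles | {a for g in combo for a in g} == evidence:
            term = 1.0
            for g in combo:
                term *= genotype_prob(g, p)  # step 2: genotype probabilities
            total += term                    # step 3: sum over plausible genotypes
    return total


p = {9: 0.1, 11: 0.2, 12: 0.3}  # hypothetical frequencies
pr_hp = pr_evidence({9, 11, 12}, [(9, 11), (11, 12)], 0, p)  # Suspect + Victim
pr_hd = pr_evidence({9, 11, 12}, [(11, 12)], 1, p)           # Victim + Unknown
print(pr_hp, pr_hd, pr_hp / pr_hd)                            # step 4: the ratio
```

With the victim known and one unknown, the sum reproduces the three terms of the Locus 1 example.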


What happens if there are two unknowns under Hd?


- Hd: Two unknown individuals (unrelated to the suspect) contributed to the sample

Locus 1

Evidence: 9, 11, 12

Unknown 1: ?

Unknown 2: ?

- We have to consider all the plausible genotype combinations for the unknowns that explain alleles 9, 11, 12 observed in the crime sample.


Under Hd: two unknowns

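Unknown 1   Unknown 2
9,9         11,12
11,11       9,12
12,12       9,11
9,11        9,12
9,11        11,12
9,12        11,12

$$\Pr(\text{Evidence} \mid H_d) = 2\left(p_9^2 \cdot 2p_{11}p_{12} + p_{11}^2 \cdot 2p_9p_{12} + p_{12}^2 \cdot 2p_9p_{11} + 2p_9p_{11} \cdot 2p_9p_{12} + 2p_9p_{11} \cdot 2p_{11}p_{12} + 2p_9p_{12} \cdot 2p_{11}p_{12}\right)$$

The factor 2 accounts for the two ways of assigning each pair of genotypes to Unknown 1 and Unknown 2.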


LR: two unknowns

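$$\mathrm{LR} = \frac{1}{2\left(p_9^2 \cdot 2p_{11}p_{12} + p_{11}^2 \cdot 2p_9p_{12} + p_{12}^2 \cdot 2p_9p_{11} + 2p_9p_{11} \cdot 2p_9p_{12} + 2p_9p_{11} \cdot 2p_{11}p_{12} + 2p_9p_{12} \cdot 2p_{11}p_{12}\right)}$$

- Increasing the number of unknowns increases the number of terms under Hd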
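A sketch that checks the closed-form two-unknown Pr(Evidence | Hd) against a brute-force enumeration and counts how many terms appear as unknowns are added. It assumes no dropout and no drop-in; the helper names and frequencies are hypothetical.

```python
from itertools import combinations_with_replacement, product


def closed_form_hd(p9, p11, p12):
    """Pr(Evidence | Hd) for two unknowns and Evidence 9,11,12, as written in the slides."""
    return 2 * (p9**2 * 2*p11*p12 + p11**2 * 2*p9*p12 + p12**2 * 2*p9*p11
                + 2*p9*p11 * 2*p9*p12 + 2*p9*p11 * 2*p11*p12
                + 2*p9*p12 * 2*p11*p12)


def enumerate_terms(evidence, n_unknown, p):
    """One probability term per ordered genotype tuple of the unknowns that
    explains the evidence exactly (no dropout, no drop-in)."""
    evidence = set(evidence)
    candidates = list(combinations_with_replacement(sorted(evidence), 2))
    terms = []
    for combo in product(candidates, repeat=n_unknown):
        if {a for g in combo for a in g} == evidence:
            term = 1.0
            for a, b in combo:
                term *= p[a] ** 2 if a == b else 2 * p[a] * p[b]
            terms.append(term)
    return terms


p = {9: 0.1, 11: 0.2, 12: 0.3}  # hypothetical frequencies
terms = enumerate_terms({9, 11, 12}, 2, p)
print(len(terms))  # 12: six unordered pairs, each counted in both orders
print(abs(sum(terms) - closed_form_hd(p[9], p[11], p[12])) < 1e-12)  # True
for k in (2, 3, 4):
    print(k, len(enumerate_terms({9, 11, 12}, k, p)))
```

The last loop shows the growth in the number of terms as unknowns are added.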


Part 2: Low template DNA, the epg does not reflect the composition of the sample.


Likelihood ratios vs. Low template DNA

- Classical approach to the LR: the product rule

- Main source of uncertainty in previous examples: genotypes of unknown contributors

We will now see how we can modify the classical LR approach to account for uncertainty in the data due to low template DNA conditions.


Uncertainty in the data: single-source example

Locus 1

Evidence: 11

Suspect: 9, 11

- Hp: Suspect contributed to the sample

- Hd: Unknown person (unrelated to the suspect) contributed to the sample

- Classical LR: Pr(Evidence|Hp) = 0

- LR with dropout and drop-in: Pr(Evidence|Hp) ≠ 0


LR with dropout and drop-in

- Main theory described by:
  • Haned et al., FSIG, 2012
  • DNA commission of the ISFG, FSIG, 2012
  • Gill et al., FSI, 2007
  • Curran et al., FSI, 2005

- Two key parameters in the model:
  • dropout: heterozygote, homozygote
  • drop-in: not treated here

Basic model: qualitative data only, also called the drop-model.


LR with dropout and drop-in

- An allele drops out with probability d

- An allele does not drop out with probability 1 − d

- Allele dropout from a heterozygote: d

- Allele dropout from a homozygote: d′


Single-source example: Under Hp

- Hp: Suspect contributed to the sample

Allele 9: dropout
Allele 11: no dropout

$$\Pr(\text{Evidence} \mid H_p) = \Pr(\text{dropout of } 9) \times \Pr(\text{no dropout of } 11) = d(1-d)$$


Single-source example: Under Hd

- Hd: Unknown contributed to the sample

Locus 1

Evidence: 11

Unknown: ?


The Q alleles

- What are the possible genotypes for the unknown?
  • The dropped-out alleles are gathered under a virtual allele Q
  • Q is a placeholder for all the possible alleles that were not observed
  • The unknown’s genotype has to explain allele 11 (no drop-in)


Under Hd

Locus 1

Evidence: 11

Unknown: 11,11 or 11,Q

- Q can be anything except 11

- The unknown’s genotype must explain 11

- This leaves us with two possibilities:
  • Homozygote: 11,11
  • Heterozygote: 11,Q


Q allele

  • Locus L has four alleles: {9, 10, 11, 12}
  • $p_9 + p_{10} + p_{11} + p_{12} = 1$
  • $p_Q = 1 - p_{11}$
  • $p_Q = p_9 + p_{10} + p_{12}$

- 11,Q can be:
  • 9,11
  • 10,11
  • 11,12

No need to worry about deriving all the genotypes!

- All three genotypes are grouped under 11,Q with frequency $2p_{11}p_Q$
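A quick numerical check of the grouping, using made-up frequencies for the four alleles at locus L: the three explicit heterozygotes and the single 11,Q genotype give the same total.

```python
import math

# Hypothetical frequencies for the four alleles at locus L (they sum to 1)
p = {9: 0.1, 10: 0.25, 11: 0.2, 12: 0.45}
p_Q = 1 - p[11]  # Q stands for every allele except 11

# Sum the three heterozygotes 9,11 + 10,11 + 11,12 explicitly ...
explicit = sum(2 * p[11] * p[a] for a in p if a != 11)
# ... or use the single grouped 11,Q genotype
grouped = 2 * p[11] * p_Q

print(explicit, grouped, math.isclose(explicit, grouped))  # isclose -> True
```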


Summary

- Two possible genotypes: 11,11 and 11,Q

Genotype   Dropout      Genotype probability
11,11      $(1-d')$     $p_{11}^2$
11,Q       $(1-d)d$     $2p_{11}p_Q$

$$\mathrm{LR} = \frac{d(1-d)}{(1-d')\,p_{11}^2 + (1-d)d \cdot 2p_{11}p_Q}$$
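A sketch of this single-source LR as a function of the dropout probability. The slides keep the homozygote dropout d′ as a separate parameter; purely for illustration, the code below **assumes** d′ = d² when no value is given (our assumption, not the slides'). The frequency p11 = 0.2 and the name `lr_single_source` are hypothetical.

```python
def lr_single_source(d, p11, d_hom=None):
    """LR for Evidence 11, Suspect 9,11 (Hp) vs. an unknown (Hd); no drop-in.

    d: heterozygote dropout probability; d_hom: homozygote dropout probability d'.
    If d_hom is not given we ASSUME d' = d**2, purely for illustration.
    """
    if d_hom is None:
        d_hom = d ** 2  # assumption, not taken from the slides
    p_q = 1 - p11
    pr_hp = d * (1 - d)                      # 9 drops out, 11 does not
    pr_hd = ((1 - d_hom) * p11 ** 2          # unknown is 11,11
             + (1 - d) * d * 2 * p11 * p_q)  # unknown is 11,Q and Q drops out
    return pr_hp / pr_hd


# Hypothetical frequency p11 = 0.2; scan the dropout probability
for d in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(d, round(lr_single_source(d, p11=0.2), 3))
```

At d = 0 the LR is 0, recovering the classical result Pr(Evidence|Hp) = 0; the scan shows how strongly the LR depends on d.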


LR vs. probability of dropout


Low-template mixture

- Low-template DNA mixture


Two-person mixture example: one dropout, no drop-in

Locus 1

Evidence: 9, 10, 12

Suspect: 9, 11

Victim: 10, 12

- Hp: Suspect + Victim

- Hd: Two unknowns (unrelated to suspect/victim)


Under Hp: Dropout from the suspect

Suspect 9,11: $d(1-d)$

Victim 10,12: $(1-d)^2$

$$\Pr(\text{Evidence} \mid H_p) = d(1-d)^3$$


Under Hd: dropout is possible

- Hd: two unknowns

- Dropout is possible: the Q allele can be anything except 9, 10, 12

No dropout:
Unknown 1   Unknown 2
9,9         10,12
10,10       9,12
12,12       9,10
9,12        9,10
9,12        10,12
10,12       9,10

One dropout:
Unknown 1   Unknown 2
9,Q         10,12
10,Q        9,12
12,Q        9,10


Under Hd: dropout is possible

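- Hd: two unknowns

- Dropout is possible: the Q allele can be anything except 9, 10, 12

Dropout              Genotypes       Genotype probability
$(1-d')(1-d)^2$      9,9 / 10,12     $p_9^2 \times 2p_{10}p_{12}$
$(1-d')(1-d)^2$      10,10 / 9,12    $p_{10}^2 \times 2p_9p_{12}$
$(1-d')(1-d)^2$      12,12 / 9,10    $p_{12}^2 \times 2p_9p_{10}$
$(1-d)^4$            9,12 / 9,10     $2p_9p_{12} \times 2p_9p_{10}$
$(1-d)^4$            9,12 / 10,12    $2p_9p_{12} \times 2p_{10}p_{12}$
$(1-d)^4$            10,12 / 9,10    $2p_{10}p_{12} \times 2p_9p_{10}$
$d(1-d)^3$           9,Q / 10,12     $2p_9p_Q \times 2p_{10}p_{12}$
$d(1-d)^3$           10,Q / 9,12     $2p_{10}p_Q \times 2p_9p_{12}$
$d(1-d)^3$           12,Q / 9,10     $2p_{12}p_Q \times 2p_9p_{10}$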
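A brute-force sketch of this mixture calculation with no drop-in. It follows the table's simplification: every copy of an observed allele must not drop out, and every unobserved allele (including Q) must drop out. The enumeration runs over ordered genotype pairs, so its Hd sum is twice the table's row sum, mirroring the factor 2 in the high-template two-unknown formula. All names, frequencies and dropout values are hypothetical.

```python
from itertools import combinations_with_replacement, product

Q = "Q"  # virtual allele standing for any allele that was not observed


def genotype_prob(g, p):
    a, b = g
    return p[a] ** 2 if a == b else 2 * p[a] * p[b]


def dropout_term(g, evidence, d, d_hom):
    """Dropout term for one contributor, following the table: observed alleles
    must not drop out, unobserved ones (including Q) must drop out."""
    a, b = g
    if a == b:
        return (1 - d_hom) if a in evidence else d_hom
    term = 1.0
    for allele in (a, b):
        term *= (1 - d) if allele in evidence else d
    return term


def pr_known(genotypes, evidence, d, d_hom):
    """Pr(Evidence | H) when all contributors are known (no drop-in)."""
    carried = {a for g in genotypes for a in g}
    if not set(evidence) <= carried:
        return 0.0  # an unexplained allele would require drop-in
    term = 1.0
    for g in genotypes:
        term *= dropout_term(g, evidence, d, d_hom)
    return term


def pr_unknowns(evidence, n_unknown, p, d, d_hom):
    """Pr(Evidence | H) for n_unknown unknown contributors (no drop-in)."""
    evidence = set(evidence)
    p = {**p, Q: 1 - sum(p[a] for a in evidence)}  # p_Q = 1 - sum of observed
    candidates = list(combinations_with_replacement(sorted(evidence) + [Q], 2))
    total = 0.0
    for combo in product(candidates, repeat=n_unknown):  # ordered: 2x the table
        if {a for g in combo for a in g} - {Q} != evidence:
            continue  # every observed allele must come from an unknown
        term = 1.0
        for g in combo:
            term *= genotype_prob(g, p) * dropout_term(g, evidence, d, d_hom)
        total += term
    return total


p = {9: 0.1, 10: 0.2, 12: 0.3}  # hypothetical frequencies; p_Q = 0.4
d, d_hom = 0.2, 0.04            # hypothetical dropout probabilities
E = {9, 10, 12}
pr_hp = pr_known([(9, 11), (10, 12)], E, d, d_hom)  # equals d(1-d)^3
pr_hd = pr_unknowns(E, 2, p, d, d_hom)
print(pr_hp, pr_hd, pr_hp / pr_hd)
```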


Likelihood ratio


LR vs. dropout probability

[Figure: LR vs. drop-out probability d, with d from 0 to 1 and the LR axis from 5 to 30]


How about drop-in probability?

Under Hp: Dropout from the suspect

Suspect 9,11: $d(1-d)$

Victim 10,12: $(1-d)^2$

- If drop-in = 0: $\Pr(\text{Evidence} \mid H_p) = d(1-d)^3$

- If drop-in ≠ 0: $\Pr(\text{Evidence} \mid H_p) = d(1-d)^3 \times (1-c)$

- c is the probability of drop-in


Under Hd: two unknowns

- Dropout is possible, no drop-in: the Q allele can be anything except 9, 10, 12

- If drop-in is possible: the Q allele can be anything!

- So the genotypes of the unknowns no longer have to explain alleles 9, 10, 12.

- This increases the number of terms under Hd


Think of drop-in as a scaling factor

- If an allele i is a drop-in: multiply by $c \times p_i$, where $p_i$ is the frequency of allele i

- If no allele at the locus is a drop-in: multiply by $(1-c)$
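A sketch of the drop-in scaling factor for one locus under one hypothesis, read as in the Hp example: a single (1 − c) factor when nothing drops in, otherwise c × p_i for each drop-in allele. `dropin_factor` is a hypothetical helper, and the values of d and c are made up.

```python
def dropin_factor(dropin_alleles, p, c):
    """Drop-in term for one locus under one hypothesis (hypothetical helper).

    dropin_alleles: observed alleles that no contributor explains under the
    hypothesis; p: allele frequencies; c: drop-in probability.
    """
    if not dropin_alleles:
        return 1 - c          # no drop-in at this locus
    term = 1.0
    for a in dropin_alleles:
        term *= c * p[a]      # each drop-in allele: c x its frequency
    return term


d, c = 0.2, 0.05  # hypothetical dropout and drop-in probabilities
# Hp of the two-person example: suspect 9,11 (11 drops out) + victim 10,12, no drop-in
pr_hp = d * (1 - d) ** 3 * dropin_factor([], {}, c)
print(pr_hp)
```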


LR vs. dropout and drop-in probability

[Figure: LR vs. drop-out probability d for drop-in = 0, 0.01 and 0.05, with d from 0 to 1 and the LR axis from 5 to 30]


Summary

- Derive the possible genotypes for the unknowns

- Determine the genotype probabilities

- Sum up the probabilities for all plausible genotypes

- Determine the corresponding dropout probabilities

- Calculate the ratio of the probabilities under Hp and under Hd


Software

- Deriving the genotypes of the unknowns is the key issue

- Assign a probability to each genotype

- The number of possibilities increases with the number of contributors: deriving LRs for mixtures by hand is not realistic!
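A small counting sketch of why hand calculation breaks down: with n alleles at a locus there are n(n+1)/2 genotypes, so k unknowns give (n(n+1)/2)^k ordered genotype assignments to check at that locus before filtering. The example of four alleles (three observed plus Q) is illustrative; the function names are hypothetical.

```python
def n_genotypes(n_alleles):
    """Number of distinct genotypes (unordered allele pairs) from n_alleles alleles."""
    return n_alleles * (n_alleles + 1) // 2


def n_assignments(n_alleles, n_unknowns):
    """Ordered genotype assignments for n_unknowns unknowns at one locus,
    before discarding those that do not explain the evidence."""
    return n_genotypes(n_alleles) ** n_unknowns


# e.g. three observed alleles plus the virtual allele Q -> 4 alleles, 10 genotypes
for k in range(1, 5):
    print(k, n_assignments(4, k))  # 10, 100, 1000, 10000
```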


Casework example 1: A 3-person mixture

- Victim is the major contributor

- At least two minor contributors


Casework example 1: A 3-person mixture

- Hp: Victim + Suspect + Unknown

- Hd: Victim + two unknowns


Sensitivity analysis: Overall LR

Same dropout probability for all contributors

[Figure: log10 LR (axis 7.5 to 11.0) vs. dropout probability d (0.01 to 0.99); overall LR for the 10 SGM+ loci]


Sensitivity analysis: Overall LR

Average dropout probability vs. splitting dropout per contributor ⇒ no significant differences between the models!

[Figure: log10 LR (axis 7.5 to 11.0) vs. dropout probability d (0.01 to 0.99) for the Basic model and the SplitDrop model]



Plausible ranges for PrD?

LR                   Dropout
≤ $10^{10}$          0.01 ≤ D ≤ 0.50
[$10^9$, $10^8$]     0.50 < D ≤ 0.99

[Figure: log10 LR (axis 7.5 to 11.0) vs. dropout probability d (0.01 to 0.99)]



Casework example 2: two-person mixture

      LR                     Dropout
(1)   [$10^{10}$, $10^9$]    0 ≤ D ≤ 0.50
(2)   [$10^9$, $10^6$]       0.50 < D ≤ 0.76
(3)   [$10^6$, $10^4$]       0.76 < D ≤ 0.84
(4)   [$10^4$, 1]            D > 0.84

[Figure: log10 LR (axis 0 to 10) vs. probability of dropout, divided into ranges (1) to (4); x-axis ticks at 0.01, 0.50, 0.76, 0.93]


Casework example 3: three-person mixture

      LR                     Dropout
(1)   [$10^{14}$, $10^9$]    0 ≤ D ≤ 0.08
(2)   [$10^9$, $10^6$]       0.08 < D ≤ 0.53
(3)   [$10^6$, $10^4$]       0.53 < D ≤ 0.75
(4)   [$10^4$, 100]          0.75 < D ≤ 0.86
(5)   [100, 1]               0.86 < D ≤ 0.93

[Figure: log10 LR (axis 0 to 15) vs. probability of dropout, divided into ranges (1) to (5); x-axis ticks at 0.08, 0.53, 0.75, 0.86]


All models are wrong...

- Continuous models are expected to extract more information from the data, but their implementation is difficult and tedious in practice

- Semi-continuous methods are easier to implement and can serve as a good approximation


How to inform dropout probabilities?

- Estimate dropout probabilities via logistic regression
  • difficult to extend to mixtures of more than two persons

- Define plausible ranges of dropout:
  • based on expert belief
  • based on the maximum likelihood principle

- Bayesian approach: combine prior belief and likelihood to yield a posterior distribution
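One simple reading of the maximum-likelihood bullet, sketched under our own assumption (not stated in the slides) that the likelihood to maximise is Pr(Evidence | Hp) = d(1 − d)³ from the two-person low-template example. The code searches a grid of d values; real casework would combine information from all loci. Analytically, d(1 − d)³ peaks at d = 1/4. The function name `pr_hp` is hypothetical.

```python
def pr_hp(d):
    """Pr(Evidence | Hp) = d(1 - d)^3 for the two-person low-template example."""
    return d * (1 - d) ** 3


grid = [i / 100 for i in range(1, 100)]  # d = 0.01, 0.02, ..., 0.99
d_hat = max(grid, key=pr_hp)
print(d_hat)  # 0.25, the analytic maximiser of d(1 - d)^3
```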

