Identification of an Individual in
Complex Forensic Mixtures Using
SNP Microarrays
Sheri Ayers, MS Research Associate
Cellmark Forensics
Dallas, TX
NIJ Award# 2011-DN-BX-K555
Introduction
» DNA mixtures of multiple individuals
• Evidence touched by several people
• Sexual assaults with multiple perpetrators
• Mass disasters with a large number of victims
» Drawbacks of STRs:
• Limited usefulness for interpreting mixtures
• Subjective
- Low-level peaks: stutter, alleles, artifacts, etc.
• Lack of statistical power
- Low-level mixtures
- Complex mixtures (≥3 contributors)
SNP Markers
» Single nucleotide polymorphisms
• Bi-allelic markers (two possible alleles at each locus)
• More markers needed for same power of discrimination
» Current SNP methods
• Test >100,000 SNP loci
• Minor allele frequency (MAF) ~0.5
» Drawbacks of current methods:
• Large DNA input
• High per-sample cost
• Does not allow exclusion of relatives
• Requires accurate reference population data
• Privacy concerns
http://en.wikipedia.org/wiki/File:Dna-SNP.svg
Proposed Method
» Each individual has a unique set of minor alleles
• If these minor alleles are present in a mixture, it can be concluded that the individual is a contributor
» Proposed method
• Genotype a few thousand SNPs
• Choose loci with low MAF (0.05-0.1)
• High power of discrimination
• More cost-effective
Proposed Method
Voskoboinik L, Darvasi A. Forensic Sci Int Genet 2011;5:428-435.
» Number of loci
• A few SNP loci have a large effect on power of discrimination
» Optimal MAF
• Optimal MAF depends on number of SNPs and contributors (0.05-0.1)
Proposed Method
» High MAF
• Too many shared alleles
• Not enough statistical power
» Very low MAF
• Some individuals are not represented
• Not enough alleles to identify an individual
» Low MAF (0.05-0.1)
• Little or no overlap between individuals
• Enough minor alleles to identify an individual
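The MAF trade-off above can be illustrated with a rough calculation. The sketch below is not taken from the cited paper: it assumes Hardy-Weinberg proportions, independent loci, and a single MAF shared by all loci, and the function name `expected_informative_loci` is hypothetical. It counts loci where one person carries the minor allele and none of the other contributors do.

```python
def expected_informative_loci(n_loci, maf, n_other_contributors):
    """Expected number of loci where one individual carries a minor allele
    that no other contributor carries (illustrative assumption: HWE,
    independent loci, identical MAF at every locus)."""
    # Probability the individual carries at least one minor allele.
    carries_minor = 1.0 - (1.0 - maf) ** 2
    # Probability that all alleles of the other contributors are major.
    others_lack_minor = (1.0 - maf) ** (2 * n_other_contributors)
    return n_loci * carries_minor * others_lack_minor


if __name__ == "__main__":
    # 3000 loci, one person of interest plus 5 other contributors.
    for maf in (0.5, 0.1, 0.05, 0.001):
        print(maf, round(expected_informative_loci(3000, maf, 5), 1))
```

Under these assumptions the count is small at high MAF (alleles are shared by everyone) and at very low MAF (few people carry any minor allele), and largest at intermediate low values.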
Proposed Method
Voskoboinik L, Darvasi A. Forensic Sci Int Genet 2011;5:428-435.
Method is robust even in the presence of:
1) Genotyping errors
2) Population admixture
3) Relatives
Proposed Method
» Collaboration of Cellmark Forensics with Dr. Ariel Darvasi and Lev Voskoboinik (Hebrew University of Jerusalem)
» Proof of concept
• Pilot study with high DNA input
• Limited by available technology
» Testing the theory
• 3000 SNPs
• MAF ~0.05-0.1
» Testing the technology and chemistry
• Sensitivity to low DNA contributors?
• Performance of whole genome amplification?
Materials & Methods
HumanCytoSNP-12
» HumanCytoSNP-12
• Illumina® Infinium® HD array
• 300,000 SNPs
• 200ng DNA input
» SNP selection
• MAF 0.02-0.15
• 3,000 loci chosen for analysis
• Loci with best performance
http://www.illumina.com/images/products/WG_311_1220_th.jpg
HumanCytoSNP-12
[Figure: diagram with six numbered steps (1-6) from the Illumina® Infinium® HD datasheet]
http://www.illumina.com/documents/products/datasheets/datasheet_infiniumhd.pdf
» Mixtures 1-7
• 3-7 individuals
• 1% - 66% contribution
• Caucasian Americans or Ashkenazi Jews
» Mixture 8
• 10 individuals
• 10% contribution each
• Caucasian Americans
Sample Preparation
% Contribution per individual
Mixture     1     2     3     4     5     6     7     8     9    10
1         50%   30%   20%
2         30%   25%   20%   15%   10%
3         30%   20%   20%   15%   10%    5%
4         35%   35%   15%    5%    5%    5%
5         65%   15%   15%    2%    2%    1%
6         63%   10%   10%   10%    5%    2%
7         66%   10%   10%    5%    5%    2%    2%
8         10%   10%   10%   10%   10%   10%   10%   10%   10%   10%
Sample Preparation
» Sample Preparation
• Pristine samples
• Inhibitors/Degradation (mixtures 1 and 7 only)
- Hematin
- Humic acid
- Indigo dye
- Degradation
- Control
• Whole genome amplification
- Qiagen REPLI-g® Mini kit
- 25ng, 5ng, 1ng DNA
» SNP Genotyping
• Mixtures typed in duplicate
• DNA input
- 200ng unamplified DNA
- 200ng whole genome amplified DNA (if available)
Analysis
» RMNE and LR calculations on all mixtures
» Random man not excluded (RMNE)
• What is the probability that someone randomly picked from the general population will not be excluded as a contributor?
• Based on allele frequency at each SNP locus
• Low RMNE: support for suspect being in the mixture
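As a rough illustration of the RMNE idea, the sketch below treats a random person as "not excluded" at a locus when every allele they carry is seen in the mixture. It assumes Hardy-Weinberg proportions, independent loci, no drop-out or drop-in, and that the major allele is always present in the mixture; the function name `rmne` and its inputs are hypothetical and are not taken from the presentation.

```python
def rmne(mixture_has_minor, mafs):
    """Probability that a random person is not excluded by a mixture.

    mixture_has_minor: one bool per locus, True if the minor allele was
        detected in the mixture.
    mafs: minor allele frequency per locus.
    Assumes HWE, independent loci, and no genotyping error.
    """
    prob = 1.0
    for seen, p in zip(mixture_has_minor, mafs):
        if not seen:
            # Only major-major homozygotes are compatible with this locus.
            prob *= (1.0 - p) ** 2
        # If the minor allele is present, every genotype is compatible.
    return prob


if __name__ == "__main__":
    # Toy example: minor allele absent from the mixture at 2000 of 3000 loci.
    calls = [True] * 1000 + [False] * 2000
    print(rmne(calls, [0.05] * 3000))
```

With thousands of low-MAF loci, the product becomes very small, which is why a suspect who is not excluded gains strong support.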
Analysis
Voskoboinik L, Darvasi A. Forensic Sci Int Genet 2011;5:428-435.
» Likelihood ratio
• Compares the probability of the evidence under two different hypotheses
• A mixture of “n” individuals:
- Prosecutor hypothesis (HP): the suspect and “n-1” other individuals contributed to the mixture
- Defense hypothesis (HD): “n” other individuals contributed to the mixture
• LR = P(evidence | HP) / P(evidence | HD)
- LR <1: support for suspect not being in mixture
- LR >1: support for suspect being in mixture
- Stronger support as LR increases
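The two hypotheses can be compared with a simplified presence/absence model. The sketch below is an illustration only and is not the formula from Voskoboinik and Darvasi: it assumes Hardy-Weinberg proportions, independent loci, and hypothetical per-locus `dropout` and `dropin` probabilities, and it scores only whether the minor allele was detected in the mixture.

```python
import math


def presence_prob(maf, n_people):
    """Probability that at least one of n_people carries the minor allele."""
    return 1.0 - (1.0 - maf) ** (2 * n_people)


def log10_lr(mixture_calls, suspect_minor_counts, mafs, n_contributors,
             dropout=0.05, dropin=0.01):
    """log10 of P(evidence | HP) / P(evidence | HD) under a toy model.

    mixture_calls: bool per locus, True if the minor allele was detected.
    suspect_minor_counts: suspect's minor allele count per locus (0, 1, 2).
    dropout and dropin are assumed error rates; keep both above 0 so that
    no probability in the ratio is exactly 0.
    """
    total = 0.0
    for seen, g, p in zip(mixture_calls, suspect_minor_counts, mafs):
        # HP: suspect plus n-1 unknowns; HD: n unknowns.
        q_hp = 1.0 if g > 0 else presence_prob(p, n_contributors - 1)
        q_hd = presence_prob(p, n_contributors)
        p_seen_hp = q_hp * (1 - dropout) + (1 - q_hp) * dropin
        p_seen_hd = q_hd * (1 - dropout) + (1 - q_hd) * dropin
        if seen:
            total += math.log10(p_seen_hp / p_seen_hd)
        else:
            total += math.log10((1 - p_seen_hp) / (1 - p_seen_hd))
    return total


if __name__ == "__main__":
    mafs = [0.05] * 6
    mixture = [True, True, False, False, False, False]
    print(log10_lr(mixture, [1, 1, 0, 0, 0, 0], mafs, 3))  # matching suspect
    print(log10_lr(mixture, [0, 0, 1, 1, 0, 0], mafs, 3))  # mismatching suspect
```

Working in log10 keeps the product over thousands of loci numerically stable; a positive value corresponds to LR >1 and a negative value to LR <1.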
Analysis
» Likelihood ratio
• Estimate the number of contributors
Voskoboinik L, Darvasi A. Forensic Sci Int Genet 2011;5:428-435.
[Figure: values shown are 91% and 99%]
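One way to estimate the number of contributors is to ask which value of n best explains how often the minor allele is detected across loci. The sketch below is an assumption-laden illustration and not the procedure from the cited paper: it assumes Hardy-Weinberg proportions and independent loci, and the function name `estimate_contributors` and the `dropout` and `dropin` parameters are hypothetical.

```python
import math


def estimate_contributors(mixture_calls, mafs, max_n=10,
                          dropout=0.0, dropin=0.0):
    """Return the n in 1..max_n that maximizes the likelihood of the
    observed minor-allele presence/absence pattern.

    mixture_calls: bool per locus, True if the minor allele was detected.
    mafs: minor allele frequency per locus (each strictly between 0 and 1).
    """
    best_n, best_ll = None, -math.inf
    for n in range(1, max_n + 1):
        ll = 0.0
        for seen, p in zip(mixture_calls, mafs):
            # Probability that at least one of n people carries the allele.
            q = 1.0 - (1.0 - p) ** (2 * n)
            p_seen = q * (1 - dropout) + (1 - q) * dropin
            ll += math.log(p_seen if seen else 1.0 - p_seen)
        if ll > best_ll:
            best_n, best_ll = n, ll
    return best_n


if __name__ == "__main__":
    # Toy data: minor allele detected at 469 of 1000 loci with MAF 0.1.
    calls = [True] * 469 + [False] * 531
    print(estimate_contributors(calls, [0.1] * 1000))
```

In the toy data, about 47% of loci show the minor allele, which is close to the 1 - 0.9^6 expected for three contributors at MAF 0.1.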
Results
Whole Genome Amplification
Dropout increases as DNA input decreases
Likelihood Ratio
» 267 non-contributors chosen from HapMap database (CEU, CHB, JPT, YRI)
» Likelihood ratio of contributors increases when:
• Input DNA increases
• The proportion of the contributor increases
Likelihood Ratio
» 267 non-contributors chosen from HapMap database (CEU, CHB, JPT, YRI)
» When is LR>>1?
• 1ng DNA or higher: individuals contributing ≥15% (≥150pg)
• 5ng DNA or higher: individuals contributing ≥5% (≥250pg)
• 25ng DNA or higher: individuals contributing ≥2% (≥500pg)
» Non-contributors:
• Less than 0.1% of non-contributors had LR>1
Likelihood Ratio
» 25ng DNA input: contributors ≥ 2%
» 5ng DNA input: contributors ≥ 5%
» 1ng DNA input: contributors ≥ 15%
% Contribution per individual (same table shown for each DNA input)
Mixture     1     2     3     4     5     6     7     8     9    10
1         50%   30%   20%
2         30%   25%   20%   15%   10%
3         30%   20%   20%   15%   10%    5%
4         35%   35%   15%    5%    5%    5%
5         65%   15%   15%    2%    2%    1%
6         63%   10%   10%   10%    5%    2%
7         66%   10%   10%    5%    5%    2%    2%
8         10%   10%   10%   10%   10%   10%   10%   10%   10%   10%
Likelihood Ratio
DNA contribution per individual (pg)
» 25ng DNA input: contributors ≥ 500pg
Mixture      1      2      3      4      5      6      7      8      9     10
1        12500   7500   5000
2         7500   6250   5000   3750   2500
3         7500   5000   5000   3750   2500   1250
4         8750   8750   3750   1250   1250   1250
5        16250   3750   3750    500    500    250
6        15750   2500   2500   2500   1250    500
7        16500   2500   2500   1250   1250    500    500
8         2500   2500   2500   2500   2500   2500   2500   2500   2500   2500
» 5ng DNA input: contributors ≥ 250pg
Mixture      1      2      3      4      5      6      7      8      9     10
1         2500   1500   1000
2         1500   1250   1000    750    500
3         1500   1000   1000    750    500    250
4         1750   1750    750    250    250    250
5         3250    750    750    100    100     50
6         3150    500    500    500    250    100
7         3300    500    500    250    250    100    100
8          500    500    500    500    500    500    500    500    500    500
» 1ng DNA input: contributors ≥ 150pg
Mixture      1      2      3      4      5      6      7      8      9     10
1          500    300    200
2          300    250    200    150    100
3          300    200    200    150    100     50
4          350    350    150     50     50     50
5          650    150    150     20     20     10
6          630    100    100    100     50     20
7          660    100    100     50     50     20     20
8          100    100    100    100    100    100    100    100    100    100
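The picogram values follow directly from the total DNA input and each individual's percent contribution (1ng = 1000pg). A minimal sketch of that arithmetic, using the mixture 5 proportions from the table; the helper name `contribution_pg` is hypothetical.

```python
def contribution_pg(input_ng, percent):
    """DNA contributed by one individual, in picograms.

    input_ng: total DNA input in nanograms.
    percent: the individual's share of the mixture, in percent.
    """
    return input_ng * 1000 * percent / 100


if __name__ == "__main__":
    mixture_5 = [65, 15, 15, 2, 2, 1]  # percent contribution per individual
    for input_ng in (25, 5, 1):
        print(input_ng, [contribution_pg(input_ng, pct) for pct in mixture_5])
```

This reproduces the stated thresholds: 2% of 25ng is 500pg, 5% of 5ng is 250pg, and 15% of 1ng is 150pg.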
Likelihood Ratio
10 person mixture
10% contribution each
Inhibition and Degradation
control > indigo dye > hematin > degradation > humic acid
Random Man Not Excluded
» 267 non-contributors chosen from HapMap database (CEU, CHB, JPT, YRI)
» Non-contributors:
• All excluded
» As DNA input increases, fewer contributors are excluded
» Not excluded:
• 5ng DNA or higher: 5% contributors (250pg) not excluded
• 1ng DNA or higher: 15% contributors (150pg) not excluded
Related Pairs
» Question:
• Will a non-contributing individual be wrongly included if a relative is in the mixture?
» Data set includes 2 parent-offspring pairs
• Preliminary data
» How to address this?
• Adjust LR calculations
• Allow less drop-out
» Result?
• A higher proportion of the true contributor is required to identify his presence
• Non-contributing relatives won’t be included
• Minor contributors can still be identified
Conclusions
» LR and RMNE
• Strong support for presence of contributors
- Low-level contributors
- Multiple contributors
- Number of contributors does not affect accuracy
• Exclusion of non-contributors
- Rarely have LR>1
- Excluded using RMNE
Conclusions
» HumanCytoSNP-12
• Practical applicability of off-the-shelf SNP microarrays
• Technology meets accuracy and sensitivity requirements of forensics
• Whole genome amplification is compatible with HumanCytoSNP-12
Conclusions
Voskoboinik L, Darvasi A. Forensic Sci Int Genet 2011;5:428-435.
Method is robust even in the presence of:
1) Genotyping errors: YES
2) Population admixture: YES
3) Relatives: YES
Future Research
» Focused array with a few thousand SNPs
• Illumina® Infinium® custom array
• Benefits
- Reduced DNA input for forensic needs
- Reduced cost
- Increased accuracy
• Whole genome amplification when necessary
- Qiagen REPLI-g® FFPE kit
• Addition of SNPs for ancestry or appearance
» Next generation sequencing
• Preliminary data on Illumina® MiSeq platform
• Alternative option for SNP genotyping
Acknowledgments
Cellmark Forensics
• Dr. Aaron LeFebvre
• Cynthia Smitherman
Hebrew University of Jerusalem
• Dr. Ariel Darvasi
• Lev Voskoboinik
Baylor Institute for Immunology Research
• Esperanza Anguiano
University of Texas Southwestern Medical Center Genomics & Microarray Facility
• Dr. Quan Li
• Tao Chen
• Bo Zhang
National Institute of Justice
• Funding provided by NIJ award# 2011-DN-BX-K555
Questions?
Calling threshold chosen to:
1) Minimize drop-out and drop-in
2) Maximize LR of contributors
3) Minimize LR of non-contributors
Conclusions
» Short term
• Commercially available SNP arrays
• Large number of SNPs to increase statistical power
• Whole genome amplification when necessary
» Long term
• Focused array with a few thousand SNPs
- Reduced DNA input for forensic needs
- Reduced cost
- Increased accuracy
• Whole genome amplification when necessary
- Qiagen REPLI-g® FFPE kit
• Addition of SNPs for ancestry or appearance
Analysis
» Related pairs
• Can this method distinguish between contributors and close relatives?
• Problem 1:
- The suspect is not a real contributor to the mixture
- The suspect has a high LR because his relative is a contributor
• Problem 2:
- The suspect is a real contributor
- The suspect claims he has a high LR because his relative is a contributor
• Data set includes 2 parent-child pairs
Proposed Method
» High MAF
• Too many minor alleles in the mixture
• Too many shared alleles
• Not enough statistical power
» Very low MAF
• Too few minor alleles in the mixture
• Some individuals are not represented
• Not enough alleles to identify an individual
» Low MAF (0.05-0.1)