Tools for Using NIST Reference Materials
![Page 1: Tools for Using NIST Reference Materials](https://reader035.vdocuments.site/reader035/viewer/2022062216/558a13fcd8b42ab1588b468f/html5/thumbnails/1.jpg)
Genome in a Bottle: Tools for Using NIST Reference Materials
Next Generation Diagnostics Summit Short Course
August 2014
Justin Zook, Marc Salit, and the Genome in a Bottle Consortium
![Page 2: Tools for Using NIST Reference Materials](https://reader035.vdocuments.site/reader035/viewer/2022062216/558a13fcd8b42ab1588b468f/html5/thumbnails/2.jpg)
Learning Objectives
• How can Genome in a Bottle Reference Materials help with validating NGS assays?
• Comparing your variant calls to high-confidence calls
• Tools available for understanding potential false positives and false negatives
• Examples of how labs are using our high-confidence calls
![Page 3: Tools for Using NIST Reference Materials](https://reader035.vdocuments.site/reader035/viewer/2022062216/558a13fcd8b42ab1588b468f/html5/thumbnails/3.jpg)
NIST-hosted Genome in a Bottle Consortium
• Infrastructure for performance assessment of NGS
  – support science-based regulatory oversight
• No widely accepted set of metrics to characterize the fidelity of variant calls from NGS…
• Genome in a Bottle Consortium is developing standards to address this…
  – human genomes as Reference Materials (RMs)
    • characterized and disseminated by NIST
  – tools and methods to use these RMs
    • common sequencing instruments
    • bioinformatics workflows
http://genomeinabottle.org
![Page 4: Tools for Using NIST Reference Materials](https://reader035.vdocuments.site/reader035/viewer/2022062216/558a13fcd8b42ab1588b468f/html5/thumbnails/4.jpg)
Whole genome sequencing technologies disagree about 100,000’s of variants
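[Figure: Venn diagram of SNP calls from Platform #1, Platform #2, and Platform #3. Each region is labeled with # SNPs (% of SNPs detected by any platform). Region values as extracted: 3,198,316 (80.05%); 125,574 (3.14%); 230,311 (5.76%); 121,440 (3.04%); 208,038 (5.21%); 71,944 (1.80%); 39,604 (0.99%).]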
![Page 5: Tools for Using NIST Reference Materials](https://reader035.vdocuments.site/reader035/viewer/2022062216/558a13fcd8b42ab1588b468f/html5/thumbnails/5.jpg)
Bioinformatics programs also disagree
O’Rawe et al. Genome Medicine 2013, 5:28
![Page 6: Tools for Using NIST Reference Materials](https://reader035.vdocuments.site/reader035/viewer/2022062216/558a13fcd8b42ab1588b468f/html5/thumbnails/6.jpg)
Measurement Process
Generic measurement process: Sample → gDNA isolation → Library Prep → Sequencing → Alignment/Mapping → Variant Calling → Confidence Estimates → Downstream Analysis
• gDNA reference materials will be developed to characterize performance of a part of the process
  – materials will be certified for their variants against a reference sequence, with confidence estimates
![Page 7: Tools for Using NIST Reference Materials](https://reader035.vdocuments.site/reader035/viewer/2022062216/558a13fcd8b42ab1588b468f/html5/thumbnails/7.jpg)
NIST Human Genome RMs in the pipeline
• All 10 µg samples of DNA isolated from multistage large growth cell cultures
  – all are intended to act as stable, homogeneous references suitable for use in regulated applications
  – all genomes also available from Coriell repository
• Pilot Genome
  – ~8400 tubes
• Ashkenazim Jewish Trio
  – ~10000 son; ~2500 each parent
• Asian Trio
  – ~10000 son; parents not yet planned as NIST RM
![Page 8: Tools for Using NIST Reference Materials](https://reader035.vdocuments.site/reader035/viewer/2022062216/558a13fcd8b42ab1588b468f/html5/thumbnails/8.jpg)
Goals for Data to Accompany RM
• ~0 false positive AND false negative calls in confident regions
• Include as much of the genome as possible in the confident regions (i.e., don’t just take the intersection)
• Avoid bias towards any particular platform
  – take advantage of strengths of each platform
• Avoid bias towards any particular bioinformatics algorithms
![Page 9: Tools for Using NIST Reference Materials](https://reader035.vdocuments.site/reader035/viewer/2022062216/558a13fcd8b42ab1588b468f/html5/thumbnails/9.jpg)
Integration Methods to Establish Reference Variant Calls
Candidate variants
Concordant variants
Find characteristics of bias
Arbitrate using evidence of bias
Confidence Level
Zook et al., Nature Biotechnology, 2014.
![Page 10: Tools for Using NIST Reference Materials](https://reader035.vdocuments.site/reader035/viewer/2022062216/558a13fcd8b42ab1588b468f/html5/thumbnails/10.jpg)
Assigning confidence to genotypes
High-confidence sites
• Sequencing/bioinformatics methods agree or we understand the biases causing disagreement
• At least some methods have no evidence of bias
• Inherited as expected
Less confident sites
• In a region known to be difficult for current technologies
• State reasons for lower confidence
• If a site is near a low confidence site, make it low confidence
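The last rule (a site near a low-confidence site is itself made low confidence) can be sketched as a simple window check. This is only an illustration: the 50 bp window and the function name `demote_near_low_confidence` are assumptions, since the slides do not state how "near" is defined.

```python
from bisect import bisect_left

def demote_near_low_confidence(high_sites, low_sites, window=50):
    """Split high_sites into (kept, demoted) by distance to any low site.

    high_sites, low_sites: positions on one chromosome (ints).
    window: hypothetical distance in bp; the slides give no value.
    """
    low_sorted = sorted(low_sites)
    kept, demoted = [], []
    for pos in sorted(high_sites):
        i = bisect_left(low_sorted, pos)
        # Nearest low-confidence neighbours on either side of pos.
        neighbours = low_sorted[max(i - 1, 0):i + 1]
        if any(abs(pos - low) <= window for low in neighbours):
            demoted.append(pos)
        else:
            kept.append(pos)
    return kept, demoted

kept, demoted = demote_near_low_confidence([100, 500, 1000], [130, 2000])
print(kept, demoted)
```

Here position 100 lies 30 bp from the low-confidence site at 130, so it is demoted; the other two sites are kept.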
![Page 11: Tools for Using NIST Reference Materials](https://reader035.vdocuments.site/reader035/viewer/2022062216/558a13fcd8b42ab1588b468f/html5/thumbnails/11.jpg)
Reasons we exclude regions from high-confidence set
![Page 12: Tools for Using NIST Reference Materials](https://reader035.vdocuments.site/reader035/viewer/2022062216/558a13fcd8b42ab1588b468f/html5/thumbnails/12.jpg)
Challenges with assessing performance
• All variant types are not equal
• All regions of the genome are not equal
  – Homopolymers, STRs, duplications
  – Can be similar or different in different genomes
• Labeling difficult variants as uncertain leads to higher apparent accuracy when assessing performance
• Genotypes fall in 3+ categories (not positive/negative)
  – standard diagnostic accuracy measures not well posed
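Because genotypes fall into at least three classes, one way to look at performance is a genotype-by-genotype concordance table instead of a single positive/negative split. The following is a minimal sketch, assuming genotypes are encoded as the strings "0/0", "0/1", and "1/1" and keyed by a (chrom, pos) tuple; the encoding, the `missing` label, and the function name are illustrative choices, not part of the slides.

```python
from collections import Counter

def genotype_concordance(truth, test, missing="./."):
    """Count (truth genotype, test genotype) pairs over all sites.

    truth, test: dicts mapping (chrom, pos) -> genotype string.
    Sites absent from one dict are counted with the `missing` label.
    """
    table = Counter()
    for site in sorted(set(truth) | set(test)):
        table[(truth.get(site, missing), test.get(site, missing))] += 1
    return table

truth = {("1", 100): "0/1", ("1", 200): "1/1", ("1", 300): "0/1"}
test = {("1", 100): "0/1", ("1", 200): "0/1", ("1", 400): "1/1"}
table = genotype_concordance(truth, test)
for (t, q), n in sorted(table.items()):
    print(f"truth={t} test={q} count={n}")
```

Inside high-confidence regions, a site absent from the truth set could arguably be treated as homozygous reference by passing `missing="0/0"`; that choice is also an assumption.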
![Page 13: Tools for Using NIST Reference Materials](https://reader035.vdocuments.site/reader035/viewer/2022062216/558a13fcd8b42ab1588b468f/html5/thumbnails/13.jpg)
Preliminary uses of high-confidence NIST-GIAB genotypes for NA12878
• NIST has released several versions of high-confidence genotypes for its pilot RM
• These data are presently being used for benchmarking
  – prior to release of RMs
  – SNPs & indels
• ~77% of the genome
![Page 14: Tools for Using NIST Reference Materials](https://reader035.vdocuments.site/reader035/viewer/2022062216/558a13fcd8b42ab1588b468f/html5/thumbnails/14.jpg)
NIST Plays a Role in the First FDA Authorization for Next-Generation Sequencer
November 20, 2013
![Page 15: Tools for Using NIST Reference Materials](https://reader035.vdocuments.site/reader035/viewer/2022062216/558a13fcd8b42ab1588b468f/html5/thumbnails/15.jpg)
Integrating NIST Call Sets into a Validation Workflow
Validation Report
• False Positive Ratio: FPR = FP/(FP + TN)
• False Discovery Rate: FDR = FP/(FP + TP)
• Sensitivity: Sens. = TP/(TP + FN)
• Specificity: Spec. = TN/(FP + TN)
• Balanced Accuracy: (Sens. + Spec.)/2
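The validation-report formulas can be computed directly from the four counts. This is a small sketch, not part of any NIST tool; the function name and the example counts are made up for illustration, and it does not guard against zero denominators.

```python
def validation_metrics(tp, fp, tn, fn):
    """Return the validation-report metrics as a dict of fractions."""
    sens = tp / (tp + fn)
    spec = tn / (fp + tn)
    return {
        "FPR": fp / (fp + tn),
        "FDR": fp / (fp + tp),
        "sensitivity": sens,
        "specificity": spec,
        "balanced_accuracy": (sens + spec) / 2,
    }

# Made-up counts, for illustration only.
m = validation_metrics(tp=90, fp=10, tn=890, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Note that FPR and FDR share a numerator but not a denominator: with many true negatives, FPR can be tiny while FDR is not.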
![Page 16: Tools for Using NIST Reference Materials](https://reader035.vdocuments.site/reader035/viewer/2022062216/558a13fcd8b42ab1588b468f/html5/thumbnails/16.jpg)
GCAT – Interactive Performance Metrics
• NIST is working with GCAT to use our highly confident variant calls
• Assess performance of many combinations of mappers and variant callers
• Currently assesses only exome sequencing
• www.bioplanet.com/gcat
![Page 17: Tools for Using NIST Reference Materials](https://reader035.vdocuments.site/reader035/viewer/2022062216/558a13fcd8b42ab1588b468f/html5/thumbnails/17.jpg)
GCAT Tests
![Page 18: Tools for Using NIST Reference Materials](https://reader035.vdocuments.site/reader035/viewer/2022062216/558a13fcd8b42ab1588b468f/html5/thumbnails/18.jpg)
GCAT Variant Calling Tests
Pre-run Tests
Upload your own variant calls
![Page 19: Tools for Using NIST Reference Materials](https://reader035.vdocuments.site/reader035/viewer/2022062216/558a13fcd8b42ab1588b468f/html5/thumbnails/19.jpg)
GCAT – Upload your own exome calls
![Page 20: Tools for Using NIST Reference Materials](https://reader035.vdocuments.site/reader035/viewer/2022062216/558a13fcd8b42ab1588b468f/html5/thumbnails/20.jpg)
Freebayes SNP calls changed very little in 2013
http://www.bioplanet.com/gcat/reports/1933-westleouzm/variant-calls/illumina-100bp-pe-exome-150x/bwamem-freebayes-0-9-10-131226/compare-1934-akckizzzfr-1931-laqgzjytqw-1935-xwckffckoa/snp/group-quality
![Page 21: Tools for Using NIST Reference Materials](https://reader035.vdocuments.site/reader035/viewer/2022062216/558a13fcd8b42ab1588b468f/html5/thumbnails/21.jpg)
Freebayes indel calls improved in 2013
http://www.bioplanet.com/gcat/reports/1933-westleouzm/variant-calls/illumina-100bp-pe-exome-150x/bwamem-freebayes-0-9-10-131226/compare-1934-akckizzzfr-1931-laqgzjytqw-1935-xwckffckoa/indel/group-quality
![Page 22: Tools for Using NIST Reference Materials](https://reader035.vdocuments.site/reader035/viewer/2022062216/558a13fcd8b42ab1588b468f/html5/thumbnails/22.jpg)
Analytical Validation of Clinical Whole-Exome Sequencing (CWES) Test
Background
• Clinical laboratory – Division of Genomic Diagnostics, certified by regulatory agencies (CAP).
• CWES test requires stringent validation per CAP criteria to establish performance metrics of the test.
Utilizing NIST data in validation of CWES Test
• Sequence and call variants of NA12878 at CHOP
• CHOP ROI: Agilent SureSelect V5+ (SSV5+) baits file
• Compare CHOP dataset to NIST data set for concordance
NIST Data Set Details:
• High quality reference data set on NA12878 (Dec. 2013)
• NIST’s highly confident Regions of Interest (ROI)
• Variants called in 219,222 regions on hg19 assembly
NIST: National Institute of Standards and Technology
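One plausible reading of this workflow is that concordance is assessed only on bases that lie in both the lab's ROI and the NIST high-confidence regions. A minimal sketch of that interval intersection follows, assuming BED-style half-open (start, end) intervals on a single chromosome that are already sorted and non-overlapping; the coordinates are invented.

```python
def intersect_intervals(a, b):
    """Intersect two sorted, non-overlapping lists of half-open (start, end) intervals."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out

roi = [(100, 200), (300, 400)]        # hypothetical lab ROI
high_conf = [(150, 350), (390, 500)]  # hypothetical high-confidence regions
shared = intersect_intervals(roi, high_conf)
print(shared, sum(end - start for start, end in shared))
```

The summed length of `shared` gives the number of bases over which calls can be compared.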
![Page 23: Tools for Using NIST Reference Materials](https://reader035.vdocuments.site/reader035/viewer/2022062216/558a13fcd8b42ab1588b468f/html5/thumbnails/23.jpg)
SENSITIVITY / SPECIFICITY, RefGene +/- 15bp (SSV5+)
[Figure: Venn diagram of CHOP and NIST call sets]

| Category | SNVs | INDELs |
|---|---|---|
| TP | 18480 | 396 |
| FP | 26 | 3 |
| FN | 63 | 30 |

FP: False Positive; TP: True Positive; FN: False Negative; TN: True Negative

| Metric | SNVs | INDELs |
|---|---|---|
| Sensitivity (TP/(TP+FN)) | 99.66% | 92.96% |
| Specificity (TN/(TN+FP)) | ~100% | ~100% |
| FDR (FP/(FP+TN)) | 0.02% | 0.08% |
| Accuracy ((TP+TN)/(TP+TN+FP+FN)) | ~100% | ~100% |

TN = NIST highly confident regions – CHOP ROIs
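As a worked check, the sensitivity figures can be recomputed from the TP and FN counts on this slide. The sketch below also computes FP/(FP+TP), which is how FDR is defined on the earlier validation-workflow slide. The FDR row here uses TN in the denominator, and the TN count is not given in this transcript, so the 0.02% and 0.08% values are not recomputed.

```python
# Counts from the slide (RefGene +/- 15 bp, SSV5+).
counts = {
    "SNVs": {"TP": 18480, "FP": 26, "FN": 63},
    "INDELs": {"TP": 396, "FP": 3, "FN": 30},
}

def sensitivity(c):
    return c["TP"] / (c["TP"] + c["FN"])

def fdr_over_calls(c):
    # FP / (FP + TP), the FDR definition from the validation-workflow slide.
    return c["FP"] / (c["FP"] + c["TP"])

for kind, c in counts.items():
    print(f"{kind}: sensitivity={sensitivity(c):.2%}, FP/(FP+TP)={fdr_over_calls(c):.2%}")
```

The recomputed sensitivities (99.66% for SNVs, 92.96% for INDELs) match the slide.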
![Page 24: Tools for Using NIST Reference Materials](https://reader035.vdocuments.site/reader035/viewer/2022062216/558a13fcd8b42ab1588b468f/html5/thumbnails/24.jpg)
Further analysis on presumptive 93 FNs and 29 FPs
• 93 FNs: 63 SNVs, 30 INDELs
• 29 FPs: 26 SNVs, 3 INDELs
![Page 25: Tools for Using NIST Reference Materials](https://reader035.vdocuments.site/reader035/viewer/2022062216/558a13fcd8b42ab1588b468f/html5/thumbnails/25.jpg)
Using the GeT-RM Browser
• http://www.ncbi.nlm.nih.gov/variation/tools/get-rm/
• Allows visualization of questionable calls
![Page 26: Tools for Using NIST Reference Materials](https://reader035.vdocuments.site/reader035/viewer/2022062216/558a13fcd8b42ab1588b468f/html5/thumbnails/26.jpg)
GeT-RM Load alignments for visualization
![Page 27: Tools for Using NIST Reference Materials](https://reader035.vdocuments.site/reader035/viewer/2022062216/558a13fcd8b42ab1588b468f/html5/thumbnails/27.jpg)
Chr6:151669820 Chr6:151669828
Difficult site in homopolymer in intron of gene AKAP12
![Page 28: Tools for Using NIST Reference Materials](https://reader035.vdocuments.site/reader035/viewer/2022062216/558a13fcd8b42ab1588b468f/html5/thumbnails/28.jpg)
Chr1:1666303
SNP in Gene SLC35E2, which is also in a pseudogene and a segmental duplication
![Page 29: Tools for Using NIST Reference Materials](https://reader035.vdocuments.site/reader035/viewer/2022062216/558a13fcd8b42ab1588b468f/html5/thumbnails/29.jpg)
[Figure labels: Segmental Duplication; Pseudogene; Structural Variant]
![Page 30: Tools for Using NIST Reference Materials](https://reader035.vdocuments.site/reader035/viewer/2022062216/558a13fcd8b42ab1588b468f/html5/thumbnails/30.jpg)
Feedback from MoCha lab in NCI
• We built a targeted amplicon NGS assay for detecting mutations in clinical tumor specimens
• To assess the assay’s specificity, we compared 84 runs of CEPH NA12878 data from our assay with NIST’s consensus variant list (VCF v2.15)
• We observed a high overall concordance with a few FP variants in homopolymeric regions unique to our platform
• We concluded that NIST GIAB is a useful reference standard to evaluate assay specificity
![Page 31: Tools for Using NIST Reference Materials](https://reader035.vdocuments.site/reader035/viewer/2022062216/558a13fcd8b42ab1588b468f/html5/thumbnails/31.jpg)
Using Genome in a Bottle calls to benchmark clinical exome sequencing at Mount Sinai School of Medicine
“We evaluate a set of NA12878 technical replicates against GIAB for each new pipeline version.”
![Page 32: Tools for Using NIST Reference Materials](https://reader035.vdocuments.site/reader035/viewer/2022062216/558a13fcd8b42ab1588b468f/html5/thumbnails/32.jpg)
Benchmarking somatic variant calling at Qiagen
![Page 33: Tools for Using NIST Reference Materials](https://reader035.vdocuments.site/reader035/viewer/2022062216/558a13fcd8b42ab1588b468f/html5/thumbnails/33.jpg)
HSPH – Brad Chapman Comparing variant callers
http://bcbio.wordpress.com/2013/10/21/updated-comparison-of-variant-detection-methods-ensemble-freebayes-and-minimal-bam-preparation-pipelines/
![Page 34: Tools for Using NIST Reference Materials](https://reader035.vdocuments.site/reader035/viewer/2022062216/558a13fcd8b42ab1588b468f/html5/thumbnails/34.jpg)
NextSeq: New Chemistry – Does it work?
| Whole Genome Metrics | NextSeq500 | HiSeq2500 |
|---|---|---|
| % Genome Covered (>= 10X in Q20 bases) | 96% | 96% |
| Mean Coverage in Q20 Bases | 28.3X | 31.8X |
| SNPs Called (% dbSNP 129) | 3,643,998 (89%) | 3,664,014 (88%) |
| InDels Called (% dbSNP 129) | 646,907 (65.7%) | 686,547 (64.5%) |
| Genome in a Bottle SNP Sensitivity / Precision | 99.07% / 99.04% | 99.25% / 99.90% |
| Genome in a Bottle Indel Sensitivity / Precision | 86.90% / 98.85% | 93.29% / 97.54% |
![Page 35: Tools for Using NIST Reference Materials](https://reader035.vdocuments.site/reader035/viewer/2022062216/558a13fcd8b42ab1588b468f/html5/thumbnails/35.jpg)
Ion Benchmarking I
![Page 36: Tools for Using NIST Reference Materials](https://reader035.vdocuments.site/reader035/viewer/2022062216/558a13fcd8b42ab1588b468f/html5/thumbnails/36.jpg)
Ion Benchmarking II
![Page 37: Tools for Using NIST Reference Materials](https://reader035.vdocuments.site/reader035/viewer/2022062216/558a13fcd8b42ab1588b468f/html5/thumbnails/37.jpg)
Command-line tools for variant benchmarking
• USeq VCFComparator
  – http://sourceforge.net/projects/useq/
• RTG vcfeval
  – ftp://ftp-trace.ncbi.nih.gov/giab/ftp/tools/RTG/
• bcbio.variation
  – http://bcbio.wordpress.com/2013/05/06/framework-for-evaluating-variant-detection-methods-comparison-of-aligners-and-callers/
• SMaSH
  – http://smash.cs.berkeley.edu/
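To make concrete what these tools compute, here is a deliberately naive sketch that matches variants on an exact (chrom, pos, ref, alt) key and counts TP, FP, and FN. It is not how any of the listed tools work: dedicated comparators also have to reconcile different representations of the same variant, compare genotypes, and restrict the comparison to high-confidence regions. The function names and the tiny inline VCF records are invented for illustration.

```python
import io

def read_variants(handle):
    """Return a set of (chrom, pos, ref, alt) keys from VCF text lines."""
    variants = set()
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        # Multi-allelic records contribute one key per ALT allele.
        for allele in alt.split(","):
            variants.add((chrom, int(pos), ref, allele))
    return variants

def compare(truth, calls):
    """Exact-match comparison; returns TP, FP, FN counts."""
    return {"TP": len(truth & calls), "FP": len(calls - truth), "FN": len(truth - calls)}

truth_vcf = io.StringIO(
    "##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\n1\t100\t.\tA\tG\n1\t200\t.\tC\tT\n"
)
calls_vcf = io.StringIO("#CHROM\tPOS\tID\tREF\tALT\n1\t100\t.\tA\tG\n1\t300\t.\tG\tGA\n")
print(compare(read_variants(truth_vcf), read_variants(calls_vcf)))
```

Exact-key matching tends to overcount both FPs and FNs for indels and complex variants, which is one reason to prefer a dedicated comparison tool for real validation work.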
![Page 38: Tools for Using NIST Reference Materials](https://reader035.vdocuments.site/reader035/viewer/2022062216/558a13fcd8b42ab1588b468f/html5/thumbnails/38.jpg)
How Can I Get Involved?
• Use our integrated SNP/indel genotypes for NA12878 and give us feedback
  – Cells and DNA currently available from Coriell
  – NIST RM available late 2014
• Sequencing/analyzing the new Genome in a Bottle samples
• Help with Structural Variant calls
• Help with analyzing data from long-read technologies
• Attend our biannual workshops (January in CA, August in MD)
• Help develop methods to measure performance using our well-characterized genomes
http://genomeinabottle.org
Email: Justin Zook - [email protected]; Marc Salit – [email protected]
Slides on slideshare at: http://www.slideshare.net/GenomeInABottle