Massively Parallel Sequencing - integrating the Ion PGM™ sequencer into your forensic laboratory.
Dr SallyAnn Harbison
March 2015
It's not all about the science; it's also about the people.
What is it that we are replacing?
National laboratory with Research and Development function, the sole provider to NZ police
National DNA Profile Databank
4 STR multiplexes: Globalfiler® kits, SGM Plus® kits, PowerplexY® kits, Minifiler™ kits
Continuous models for DNA interpretation
34 cycle SGM Plus® kit Low Copy Number testing
mRNA profiling for body fluid identification
Laser Micro-dissection including XY FISH.
Where are we now?
I think we are all agreed that:
Our existing platforms are very sensitive
They are well characterised
We know how they work and what to try when they don't
We have methods to interpret the data
We’re all (more or less) doing the same thing
Turnaround times for samples and cases can be quick (1-2 days)
Evidence is (generally) accepted in court.
What are we looking for?
A robust, well characterised technology, supported by the forensic community
That is compatible with the NZ DNA Profile Databank
That approaches the sensitivity of existing STRs
That delivers more information per sample
Bio-informatic pipelines to interpret the data
With a reasonably fast sample throughput time (less than 5 days) and automation
That is cost neutral.
Our project:
A project with three objectives:
• Beginning the transformation of (routine) STR profiling to genomic technologies
• Adding additional SNP markers (e.g. ancestry informative, hair, skin and eye colour, mtDNA, serology) in a single workflow
• Non-human DNA testing: species identification from difficult and mixed samples, expanding our range of services.
Evaluate MPS technology's ability to sequence forensic markers
• Is it possible to sequence samples already amplified with Identifiler® and PPY® kits?
• Design and test a custom panel of markers including most of the existing forensic markers.
Sequencing Identifiler® + PPY® products
Extract and quantify DNA from samples
Amplify STRs with Identifiler® and PPY® kits [5x duplicates plus two mixtures]
Amplicons purified and quantified [Agencourt® AMPure XP® and Agilent 2100 Bioanalyzer]
Ion PGM™ workflow
Add Ion PGM™ adaptors and barcodes [KAPA© Ion Torrent DNA library prep kit]
Amplicons purified and checked [Agencourt® AMPure® XP and Agilent 2100 Bioanalyzer]
Quantify (Ion Library Quantification kit) and pool libraries (26 pM)
Template preparation and enrichment [Ion PGM™ OT2 400bp kit]
Sequence on Ion PGM™ 316 chip
Bioinformatic Workflow
Initial QC was carried out on the Torrent Server
Quality assessment using FastQC and Solexa QA
Alignment programs Bowtie2 and STRait Razor v1.2
FastQ quality trimmer: Phred score of 25 and minimum length of 40 bases
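The trimming settings above (Phred score of 25, minimum length of 40 bases) can be illustrated with a minimal Python sketch. It assumes Phred+33 encoded quality strings and trims low-quality bases from the 3' end only; the function name and the example read are hypothetical, and the actual FastQ quality trimmer may differ in detail.

```python
# Minimal sketch of 3'-end quality trimming (assumes Phred+33 encoding).
MIN_QUALITY = 25   # Phred score threshold from the slide
MIN_LENGTH = 40    # minimum read length in bases, from the slide


def trim_read(sequence, quality, min_quality=MIN_QUALITY, min_length=MIN_LENGTH):
    """Trim bases below min_quality from the 3' end of a read.

    Returns (sequence, quality) for the trimmed read, or None when the
    trimmed read is shorter than min_length and should be discarded.
    """
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality must be the same length")
    end = len(sequence)
    # Walk back from the 3' end while the base quality is below threshold.
    while end > 0 and ord(quality[end - 1]) - 33 < min_quality:
        end -= 1
    if end < min_length:
        return None
    return sequence[:end], quality[:end]


# Hypothetical read: 45 good bases (Q40, 'I') followed by 5 poor ones (Q2, '#').
seq = "ACGT" * 12 + "AC"
qual = "I" * 45 + "#" * 5
trimmed = trim_read(seq, qual)
print(len(trimmed[0]))  # length of the read after trimming
```

Reads that fall below the minimum length after trimming are dropped rather than kept as short fragments, which is one reason read length matters for STR loci.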
Sequence quality and alignment
Whilst 76% of the reads aligned with the STR sequences using Bowtie2, only one third of these spanned the entire repeat region and could be aligned with STRait Razor.
Improving the quality and length of the sequence reads to maximise allele calling is a key area of development
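Why a read has to span the entire repeat region before it can be typed can be shown with a small Python sketch of flank-based extraction. This illustrates the general idea only and is not STRait Razor's implementation; the flanking sequences and reads are invented placeholders, not real locus flanks.

```python
def extract_repeat_region(read, left_flank, right_flank):
    """Return the sequence between the two flanks, or None when the read
    does not contain both flanks (i.e. it does not span the repeat)."""
    start = read.find(left_flank)
    if start == -1:
        return None
    start += len(left_flank)
    end = read.find(right_flank, start)
    if end == -1:
        return None
    return read[start:end]


# Placeholder flanks and reads, for illustration only.
LEFT = "GGCATAC"
RIGHT = "CAGGTTA"
full_read = "AA" + LEFT + "TTTC" * 5 + RIGHT + "GG"
short_read = "AA" + LEFT + "TTTC" * 3   # read ends inside the repeat

print(extract_repeat_region(full_read, LEFT, RIGHT))   # repeat region recovered
print(extract_repeat_region(short_read, LEFT, RIGHT))  # None: cannot be typed
```

A read that stops inside the repeat still aligns to the locus, but it carries no information about the allele length, so it is lost at the allele-calling stage.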
Sample 1a. FastQC quality score across all nucleotides of sequencing reads.
Establishing some starting guidelines for calling alleles
In each sample:
• More than 50 reads must be aligned to a given locus (minimum coverage threshold)
• More than 10 reads, or a defined % of the total reads for the locus equivalent to 10 reads, are required for an allele to be called
• CE-based stutter values
• Heterozygote locus balance of 50%.
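The starting guidelines above can be sketched as a simple per-locus rule in Python. This is an illustration under stated assumptions, not the laboratory's pipeline: the heterozygote balance is interpreted here as the second allele needing at least 50% of the reads of the first, stutter filtering is left out, and the read counts in the example are hypothetical.

```python
MIN_LOCUS_READS = 50     # a locus needs more than this many aligned reads
MIN_ALLELE_READS = 10    # an allele needs more than this many reads
MIN_HET_BALANCE = 0.5    # assumed: minor/major read ratio for a heterozygote


def call_locus(allele_reads):
    """allele_reads maps allele name -> read count for one locus.

    Returns the called alleles ordered by read count, or None when the
    locus fails the coverage threshold or no allele has enough reads.
    Stutter filtering is deliberately omitted from this sketch.
    """
    total = sum(allele_reads.values())
    if total <= MIN_LOCUS_READS:
        return None
    passing = {a: n for a, n in allele_reads.items() if n > MIN_ALLELE_READS}
    if not passing:
        return None
    ranked = sorted(passing.items(), key=lambda item: item[1], reverse=True)
    called = [ranked[0][0]]
    # Accept a second allele only if it is balanced against the first.
    if len(ranked) > 1 and ranked[1][1] / ranked[0][1] >= MIN_HET_BALANCE:
        called.append(ranked[1][0])
    return called


# Hypothetical read counts for three loci.
print(call_locus({"12": 300, "14": 210, "13": 8}))  # balanced heterozygote
print(call_locus({"12": 400, "14": 100}))           # second allele too weak
print(call_locus({"9": 30, "10": 15}))              # locus below coverage
```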
Calling Genotypes- single source samples
Some Observations
FGA     Reads   Call       Length     Sequence
19.1    24
19.2    89
19.3    1250    FGA:19.3   79 bases   TTTCTTTCTTTCTTTTTCTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTCCTTCCTTCC
20      1189    FGA:20     80 bases   TTTCTTTCTTTCTTTTTTCTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTCCTTCCTTCC
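The two FGA sequences listed above can be compared directly. The short Python sketch below checks the reported lengths and tests whether the two sequences differ by a single inserted base; the helper function is hypothetical, and the sequences are copied from the slide.

```python
# Sequences copied from the slide, each split as it appeared there.
FGA_19_3 = (
    "TTTCTTTCTTTCTTTTTCTCTTTCTTTCTTTCTTTCTTTCTTTCT"
    "TTCTTTCTTTCTTTCTTTCTTTCTCCTTCCTTCC"
)
FGA_20 = (
    "TTTCTTTCTTTCTTTTTTCTCTTTCTTTCTTTCTTTCTTTCTTTC"
    "TTTCTTTCTTTCTTTCTTTCTTTCTCCTTCCTTCC"
)


def single_base_insertion(shorter, longer):
    """Return the first 0-based position in `longer` whose deletion gives
    `shorter`, or None if the two do not differ by one inserted base."""
    if len(longer) != len(shorter) + 1:
        return None
    for i in range(len(longer)):
        if longer[:i] + longer[i + 1:] == shorter:
            return i
    return None


print(len(FGA_19_3), len(FGA_20))
print(single_base_insertion(FGA_19_3, FGA_20))
```

The position returned falls at the start of the run of T bases after the third TTTC unit, so the 19.3 and 20 calls differ only in the length of that run.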
In all, 7 novel sequence variations found:
• Four at D2S1338
• 1 each at D8S1179, D3S1358, D19S433 and D21S11
and 35 examples of allelic discrimination in our 5 people.
Uneven representation of reads across loci compared to the electropherograms
Female/female mixture, 2:1

Allele | Number of reads aligned | Predicted major genotype (MPS) | Predicted minor genotype (MPS) | RFU | Major genotype predicted from RFU | Minor genotype predicted from RFU | Known major genotype | Known minor genotype
FGA 19.3/20 | 184 | 0,0 | 0,0 | 720 | 22,27 | 20,20 | 22,27 | 20,20
FGA 21.3/22 | 227 | | | 925 | | | |
FGA 26.3/27 | 17 | not detected | | 923 | | | |
Total | 574 | | | | | | |
A custom panel of markers including many existing forensic markers
DNA multiplexed in one Ion Ampliseq library preparation for Ion PGM™ sequencing, from each of 8 people.
Our panel amplifies 280 targets in a single reaction
• Amplicons range from 100-300 bp in length
• 67 STRs, including all the autosomal and Y STRs contained in the commercial kits, some X STRs and rapidly mutating Y STRs.
• 211 SNPs including: individual identification SNPs, AIMs, phenotype (hair, eye, skin), ABO.
• And amelogenin
Primer Design and Sample Preparation
STR amplicons ≤ 275 bp
SNP amplicons > 100 bp
Primer panel design for candidate STRs and SNPs using the Ampliseq™ Designer software
https://amplicon.com
DNA from 8 unrelated donors processed in duplicate
Ion PGM™ workflow
Amplify targets, add adaptors and barcodes [Ion Ampliseq™ Library Preparation Kit]
Amplicons purified [AMPure XP x2]
Quality check [2100 Bioanalyzer]
Quantify (Ion Library Quantification kit) and pool libraries (100 pM)
Template preparation and enrichment [Ion PGM™ OT2 400bp kit]
Sequence on Ion PGM™ 318 v2 chip
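Pooling libraries at a fixed concentration, as in the workflow above, comes down to a dilution calculation (C1 x V1 = C2 x V2). The Python sketch below is a minimal illustration: the 100 pM target is taken from the slide, while the library concentrations, the 2 µL starting volume and the function name are hypothetical.

```python
TARGET_PM = 100.0  # pooling concentration from the slide (pM)


def diluent_volume(library_pm, library_volume_ul, target_pm=TARGET_PM):
    """Volume of diluent (uL) to add so the library reaches target_pm.

    Uses C1 * V1 = C2 * V2. Raises ValueError if the library is already
    more dilute than the target concentration.
    """
    if library_pm < target_pm:
        raise ValueError("library is below the target concentration")
    final_volume = library_pm * library_volume_ul / target_pm
    return final_volume - library_volume_ul


# Hypothetical qPCR results (pM) for three barcoded libraries.
libraries = {"barcode_01": 450.0, "barcode_02": 1200.0, "barcode_03": 100.0}
for name, conc in libraries.items():
    print(name, round(diluent_volume(conc, 2.0), 2))
```

Once each library is at the same concentration, combining equal volumes gives an approximately equimolar pool, which is one way to limit uneven read representation between samples.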
Bioinformatic Workflow
Adaptor removal, barcode sorting and initial QC on the Torrent Server
Quality assessment using FastQC and Solexa QA
Alignment programs Bowtie2 and STRait Razor v1.2
FastQ quality trimmer: Phred score of 25 and minimum length of 40 bases
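Barcode sorting is done on the Torrent Server in this workflow, but the idea can be sketched in a few lines of Python. The sketch assumes the barcode sits at the very start of each read and must match exactly; the barcode sequences, sample names and reads are invented placeholders.

```python
from collections import defaultdict


def sort_by_barcode(reads, barcodes):
    """Group reads by the barcode found at the start of each read.

    `barcodes` maps barcode sequence -> sample name. The barcode is
    removed from each assigned read; unmatched reads go to "unassigned".
    """
    bins = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                bins[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched this read.
            bins["unassigned"].append(read)
    return dict(bins)


# Placeholder barcodes and reads, for illustration only.
BARCODES = {"ACGTAC": "sample_A", "TGCATG": "sample_B"}
reads = ["ACGTACTTTCTTTC", "TGCATGGGATAGAT", "ACGTACGGCC", "NNNNNNTTTC"]
binned = sort_by_barcode(reads, BARCODES)
for sample, seqs in binned.items():
    print(sample, seqs)
```

Because every sample and every control consumes one barcode, the number of barcodes available per run limits how many samples can be multiplexed on a chip.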
Sequence quality and alignment
76% of the reads aligned with either an STR or SNP sequence using Bowtie2
But only a small proportion (10%) of these spanned an STR repeat region and could be aligned with STRait Razor,
whereas 80% of these aligned with the SNP sequences
On average 92% of SNPs were detected, ranging from most of them (206) to some of them (121).
[Figure panels: in amplicon order in the custom primer panel (CPP); in amplicon order in the Identifiler multiplex]
• DYS438 + 5 SNPs were not detected in any sample
• DYS392 and Penta E amplified off the repeat target. The detected sequence aligns close to the repeat but does not include the repeat
• DYS389 I, DYS389 II and DYS448: the sequenced amplicon ends in the flanking region and could not be recognized by STRait Razor
• DYS385a, DYS385b and DYS458: the sequenced amplicon did not contain the full repeat region
Some more details
Quality management, guidelines and standards
Personnel- qualifications, training and competence
Accommodation and environmental conditions
Selection and validation of methods
Control of data
Quality Control procedures
Interpretation and reporting
Guidelines and Standards
Guidelines for laboratory accreditation of MPS (NGS): NATA Technical Note 37, updated October 2014
Ethical and legal issues
Wet lab matters
Bioinformatics
Reporting
Infrastructure
These guidelines can be accessed from the following links:
http://www.nata.com.au
and
http://www.rcpa.edu.au/Library/College-Policies/Guidelines/Implementation-of-Massively-Parallel-Sequencing
Personnel- qualifications, training and competence
People and relationships we will need:
Different skill sets are needed: re-training and/or recruitment of scientists
Access to, or employment of, a bioinformatician or two
Access to, or employment of, a statistician or two
Sequencing specialists to run the sequencers and maintain currency with technology
Good relationships with suppliers
Training and awareness programmes for:
Managers, customers, the public?
Accommodation and environmental conditions
Lab design - work areas
MPS workflow: DNA Extraction and Quantitation, Amplicon PCR, Library Preparation, Template Preparation, Sequence, Analyse
MPS: Pre PCR, PCR of targets, Library PCR, Emulsion PCR step and particle recovery, Sequencing, Bioinformatics
CE: Pre PCR, Multiplex amplification, Capillary Electrophoresis, Profile Analysis
Selection and validation of methods
Selection of methods
Performance of the sequencing chemistry- a balance between many factors
A commercially provided kit of DNA markers (e.g. HID Ion AmpliSeq™ Identity Panel) or a custom-designed primer set
Select a subset of these DNA markers (e.g. HID-Ion Ampliseq™ Ancestry Panel) or go for broke on every sample
DNA Extraction and Quantitation
Library Preparation
Template Preparation
Sequence
Analyse
Steps which are likely to introduce bias
PCR of amplicons
Barcoding/adapting: ligating adapters to PCR products, unless already included at the 5' end of your PCR primers
Library amplification (if used): 5-10 PCR cycles of adapted and barcoded amplicons
Ion Library Quantitation Kit (qPCR)
Amplification of the fragments on each ISP bead
Control of data
The first problem in sequencing data analysis
Can / should you keep the raw data from the instrument?
If so, how long for and how?
Do you keep the data after reads of lower quality than your QC settings are trimmed?
If so, how long for and how?
Do you keep the aligned data only and discard the rest?
If so, how long for and how?
Do you keep only the allele calls and sequence?
What about incidental findings?
[Diagram labels: X millions; target seq; adaptor seq; adaptor seq; B/C seq]
Lots of different options
On platform, for example the Torrent Server
Open-source, command-line-driven programs, such as:
SAMtools, Burrows-Wheeler Aligner (BWA), Genome Analysis Tool Kit (GATK), Integrative Genomics Viewer (IGV), Bowtie, Galaxy and GenomeMapper
STR alignment
STRait Razor [Warshauer et al., FSI: Gen 7 (2013) 409–417]
Torrent Suite v4.0.2 with the HID_STR_Genotyper (v2.0), FastQC (v3.4.1.1) and coverage Analysis (v4.0-r77897) plugins (Thermo Fisher) [S.L. Fordyce et al., FSI: Gen 14 (2015) 132–140]
Quality Control procedures
Controls you might consider
Extraction
Amplicon PCR
Library PCR
Template PCR
Sequencing
For each stage a positive and negative control?
= 6-8 controls
= 6-8 barcodes used
Extraction +ve and –ve
Amplicon/Library +ve and –ve
Template +ve and –ve
Appropriate (positive) controls might be:
• NIST standards
• In house Laboratory controls
Measuring the quality and quantity of "DNA" during the process
Quantity
Qubit® 2.0 or 3.0 (Life Technologies)
Bioanalyzer (Agilent)
Quantitative PCR- initial sample and library quantification
Quality
Bioanalyzer
[Figure: Identifiler and PPY amplified products prior to library prep]
Interpretation and reporting
Establishing some starting guidelines for calling alleles
In each sample:
• More than 50 reads must be aligned to a given locus (minimum coverage threshold)
• More than 10 reads, or a defined % of the total reads for the locus equivalent to 10 reads, are required for an allele to be called
• CE-based stutter values
• Heterozygote locus balance of 50%.
Future Directions
Where to from here....
To implement a specialist service: 2 years, plus or minus
To implement a routine, all case-sample service: 3-5 years
Current areas of development focus
Adjusting the testing method to maximise sensitivity for forensic samples
Ensuring reproducibility and reliability
Behaviour of DNA sequences in mixed DNA samples
Software for compatibility with DNA Profiling Databank
Ethical concerns and policy issues
Collection of population data for the New Zealand populations not represented elsewhere
Staff training and development
Current areas of research focus
mtDNA sequencing:
whole mt genome sequencing
extended control region sequencing
Identical Twins
Microbial community sequencing for forensic samples, e.g. saliva, faeces, skin and vaginal samples
Species identification of non-human sample mixtures, Traditional East Asian Medicines and food forensics.
In conclusion:
MPS is recognized as a suitable technology for forensic science and is already widely used for clinical applications
MPS is within the capability of most forensic laboratories, given adequate training and resources
Community solutions provide a good entry point for laboratories without R and D options
Custom solutions can provide a more in depth assessment and are also well supported
Acknowledgements
Ryan England – for the laboratory work and most of the data analysis
New Zealand Genomics Limited – initial MPS sequencing services
Ministry of Business, Innovation and Employment and ESR for research funding
Life Technologies, Melbourne, Australia for assistance with the custom primer panel design.
Life Technologies and its affiliates are not endorsing, recommending, or promoting any use or application of Life Technologies products presented by third parties during this seminar. Information and materials presented or provided by third parties are provided as-is and without warranty of any kind, including regarding intellectual property rights and reported results. Parties presenting images, text and material represent they have the rights to do so.
All products referenced in the presentation are, unless stated otherwise on the individual product labelling, For Research, Forensic or Paternity Use Only. Not for use in diagnostic procedures.
When used for purposes other than Human Identification the instruments and modules of the cited software are for Research Use Only. Not for use in diagnostic procedures.
AGENCOURT AMPURE XP is a registered trademark of Agencourt Bioscience and is for laboratory use only.
Speaker was provided travel and hotel support by Thermo Fisher Scientific for this presentation, but no remuneration