optimusprime:,multiplex,primer,design,tool, for,hi7plex ... · peak%at%target% melting%...
TRANSCRIPT
Peak%at%target%Melting%
Temperature,60%oC
Narrow%temperature%distribution%of%
designed%primers
IntroductionThe$advent$of$Massively$Parallel$Sequencing$(MPS)$has$dramatically$reduced$the$cost$and$increased$the$throughput$of$DNA$sequencing.$A number$of$methods$have$been$developed$that$target$specified$genomic$regions$for$MPS.$However,$these,$variously,$are$compromised$by$issues$of$relative$expense,$accuracy, requirement$for$specialist$equipment$and$the$cumbersome$nature$of$protocols.$In$this$work$we$present$HiGPlex,$a$novel$MPS$platform,$and$multiplex$Polymerase$Chain$Reaction$(PCR)$primer$design$software$to$meet$the$challenges$the$platform$presents.
OptimusPrime:,Multiplex,Primer,Design,Tool,for,Hi7Plex,Sequencing
Edmund&Lau1,&Daniel&Park1,2 ,&Bernard&Pope1,3,*www.hiplex.org
What%is%HiAPlex?
HiGPlex is a Massively Parallel Sequencing (MPS) platform using a highly multiplexed PCRGbased approach for targeted MPS that employs tagged geneGspecificGprimers (GSP) to seedand universal bridge primers to drive the majority of amplification, minimising bias caused bydiffering geneGspecific primer efficiencies. Compared to traditional sequencing methods, HiGPlex has the following advantages:
A%Computational%Solution
Challenges%in%MultiplexAPCR
A%recursive%problem“The$best$choice$for$the$next$tile$is$the$choice$that$maximises$the$score$
given$the$choices$made$so$far.”
A problem with SIZE – the need for an efficient primer design tool!The number of candidate solutions easily exceeds the millions and grows exponentiallywith region size. This means a brute force computational search will not be able to findoptimal solution in reasonable time.
Search
Post7processing,&,Output
The$primer$design$problem$is$framed$as$a$computational$optimisation$problem.$We$employ$a$searchGandGscore$technique$to$explore$the$solution$space$and$output$the$best$solution.$
Dynamic%Programming%(DP)By reformulating the problem as arecursive multistage decision problem,we are able to use dynamicprogramming to explore theexponentially large search space inlinear time. With this formulation, theoutput solution is provably optimalwith respect to a given tileGbasedscoring.
A primer with Melting Temperature(Tm) close to the specified target Tmwill score higher.
We use the NearestGNeighbour DNAthermodynamic model to accuratelypredict Tm given a primersequence.
The Burrow Wheeler Alignment(BWA) algorithm is employed topredict potential binding sites for agiven primer sequence.This allows elimination of primerswith the potential to bind offGtarget.
Several different measures are implementedto avoid a chosen primer forming undesirablesecondary structures.These scores ensure sufficient complexity ofthe chosen sequence and avoid homologoussubsequences.
While exploring the solution space, the algorithm will be presentedwith individual primers and asked to evaluate its suitability as part ofthe final solution. This is done by assigning several different scores tocriteria relevant to the success of a reaction. These scores are thencombined via a sum with user configured weightings to obtained agoodness measure for any primer candidate.
Time%taken%for%computation%proportional%to%length%of%region
Melting%Temperature
Specificity
Score
Secondary%Structure
Entropy
%GC
Runs Hairpins
Simple operating protocolLibrary preparation for HiGPlex only requiresa single multiplex PCR using equipmentreadily available in standard labs. HiGPlexrequires no automation, specialist machineryor expensive reagents for streamlined highGthroughput screening.
Supports the specification of small and tightAranged insert sizeThe size selection step allows the target insertsize to be defined within a small and tight range.This increases uniformity of amplificationefficiency, provides a filter for offGtarget artefactsand opens the platform to the use of degradedDNAs, such as those derived from FFPEGtumourspecimens.
Highly AccurateHiGPlex’s PCR reaction is compatible with the highestfidelity thermostable DNA polymerases available. Asmall insert size allows the readGpair to overlapcompletely allowing the application of readGpairGoverlapGconsiderate variant calling software (e.g.UNDRGROVER) to perform variant calling with highstatistical confidence, sensitivity and accuracy.
CostAeffective%reagentsNo specialist reagents – simply unmodifiedoligonucleotides and DNA polymeraseyielding a highly costGeffective system.
What%can%go%wrong%in%multiplexAPCR?Primer$selfGinteraction$
(hairpin)PrimerGprimer$interaction(primerGdimer)
Inefficient$binding(low$melting$temperature)
What%do%we%expect?The$ideal$experimental$outcome$
of$multiplexGPCR$will$be$a$uniform$amplification$of$all$
specified$regions$with$no$other$undesired$products.$
OffGtarget$binding(low$specificity)
Every$pair$of$forward$and$reverse$gene$specific$primers$defines$an$amplicon,$or$a$“tile”.$Allowed$tile$sizes$are$
specified$by$the$experimenter.
Formulation%of%the%multiplex%primer%design%problem%The$solution$space$consists$of$all$possible$tiling$patterns.$A$candidate$solution$is$a$tiling$that$covers$the$region.$A$specified$amount$of$tile$
overlap$is$allowed.
All$candidate$solutions$are$compared.$The$tiling$that$maximises$the$chance$of$a$
successful$reaction$is$output$as$a$solution.$
Specified&Region
Amplicon&(Tile)
Primer Dimer: A Hard ProblemDetection of potential primerGdimer conflicts requires allGtoGallcomparison of primers in the candidate solution. Incorporating thiscomputation into the optimisation is known to be a computationallyhard problem. Thus, we postGprocess a given solution, looking forpotential conflicts and redesign the offending primers. The finalsolution is output in tabular form and is readily processed andvisualised by standard bioinformatics tools.AllAtoAall%comparison
ReferencesNguyenGDumont,$T.$et$al.$Anal.$Biochem.$442,$127–129$(2013).$NguyenGDumont,$T.$et$al.$Biotechniques 55,$69–74$(2013).$$NguyenGDumont,$T.$et$al.$Biotechniques 58,$33–36$(2015).$$NguyenGDumont,$T.$et$al.$BMC$Med.$Genomics$6,$48$(2013).$$NguyenGDumont,$T.$et$al.$Anal.$Biochem.$470,$48–51$(2015).$$Pope,$B.J.$et$al.$Source$Code$Biol.$Med.$9,$3$(2014).$$Park,$D.J.$et$al.$BMC$Bioinformatics$17,$1–7$(2016).
AcknowledgementThis project was supported by VLSCI and The Department of Pathology at The University of Melbourne as part of the UndergraduateResearch Opportunity Program (UROP) Biomedical Research Victoria. Further support was provided via Cancer Council Victoria grantGinGaid#1066612 and NHMRC project grant #1108179.
1.
2.
3.
While HiGPlex is tolerant to reagent design defects that would preclude the use ofconventional multiplexGPCR, it is not immune to very strong inhibitory effects.
A%test%run%– score%weightings%are%configurable%
Affiliations1$Victorian$Life$Science$and$Computation$Initiative,$The$University$of$Melbourne2$Department$of$Pathology,$The$University$of$Melbourne3$Department$of$Computing$and$Information$Systems,$The$University$of$Melbourne*&Corresponding&author&([email protected])