Source file: Darwin-WGA_ISMB2018.pdf

Darwin-WGA: A Co-processor Provides Increased Sensitivity in Whole Genome Alignments with High Speedup

Yatish Turakhia¹, Sneha D. Goenka, Gill Bejerano¹, William J. Dally¹,²

¹ Stanford University, ² NVIDIA Research

Results

Ordered local alignments in both species are chained together using AXTCHAIN [3].

Hardware Architecture (Steps 2 and 3)

• PEs compute the cell scores and the global maximum in 1 cycle in a systolic fashion (diagonal wavefront), within the region where score > (max - y_drop)
• Direction pointers are stored sequentially in one BRAM; start/stop position pointers are stored in another BRAM
• This novel architecture, which takes Y-drop into account, uses small constant memory to align arbitrarily long sequences
• Banded Smith-Waterman does not use traceback memory
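The Y-drop condition in the first bullet can be illustrated in software. The following is a minimal sketch, not the hardware design: it fills the matrix row by row instead of along a diagonal wavefront, uses a linear gap penalty for brevity, and keeps no traceback pointers. The function name `ydrop_extend` and all scoring values are assumptions chosen for illustration.

```python
NEG_INF = float("-inf")


def ydrop_extend(ref, qry, match=1, mismatch=-1, gap=-1, y_drop=3):
    """Extend an alignment from the start of both sequences.

    A cell is kept only if its score is >= (best score so far - y_drop);
    pruned cells are never stored, so each row holds only the live region.
    Returns (best_score, (query_index, reference_index)), 1-based.
    """
    best, best_pos = 0, (0, 0)
    prev = {}
    for j in range(len(ref) + 1):          # row 0: leading gaps only
        if j * gap < best - y_drop:
            break
        prev[j] = j * gap
    for i in range(1, len(qry) + 1):
        curr = {}
        prev_hi = max(prev)
        for j in range(min(prev), len(ref) + 1):
            diag = NEG_INF
            if j > 0 and (j - 1) in prev:
                sub = match if ref[j - 1] == qry[i - 1] else mismatch
                diag = prev[j - 1] + sub
            up = prev.get(j, NEG_INF) + gap
            left = curr.get(j - 1, NEG_INF) + gap
            score = max(diag, up, left)
            if score >= best - y_drop:
                curr[j] = score
                if score > best:
                    best, best_pos = score, (i, j)
            elif j > prev_hi:
                break                      # nothing further right can be live
        if not curr:
            break                          # every cell pruned: extension ends
        prev = curr
    return best, best_pos
```

For two sequences that share a prefix and then diverge, the extension stops shortly after the shared prefix, so the number of stored cells depends on `y_drop` rather than on the sequence length.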

Darwin-WGA Algorithm Overview

Step 1 – Seeding and D-SOFT

[Figure: diagram with axes labeled Reference and Query]

• The reference bin and the query chunk together determine a diagonal band
• Each seed hit falls in a single diagonal band
• At most 1 seed hit per diagonal band is extended

Step 2 – Gapped extension filtering

• The seed is extended using banded Smith-Waterman
• If the maximum score is greater than a threshold, the maximum-score position becomes the anchor, i.e. the starting point for alignment extension using GACT-X (Step 3)

Step 3 – GACT-X extension

• The anchor is extended in both the left and the right directions
• Extension proceeds in a tiled fashion (like GACT)
• Traceback pointers are used or discarded depending on the overlap region
• Each new tile starts from the position where the previous traceback completed or from the overlap boundary
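The Step 1 rule of extending at most one seed hit per diagonal band can be sketched as below. The poster does not give the exact mapping from reference bin and query chunk to a band, so this sketch assumes a band is identified by the query chunk index together with the binned diagonal offset (reference position minus query position). The name `select_seed_hits` and the sizes `bin_size` and `chunk_size` are hypothetical.

```python
def select_seed_hits(hits, bin_size=64, chunk_size=128):
    """Keep the first seed hit seen in each diagonal band.

    hits: iterable of (ref_pos, qry_pos) pairs.
    Band key (an assumption for this sketch):
        (query chunk index, binned diagonal offset)
    """
    chosen = {}
    for ref_pos, qry_pos in hits:
        band = (qry_pos // chunk_size, (ref_pos - qry_pos) // bin_size)
        # setdefault leaves an already filled band untouched
        chosen.setdefault(band, (ref_pos, qry_pos))
    return list(chosen.values())


hits = [(100, 10), (105, 15), (300, 20), (100, 200)]
selected = select_seed_hits(hits)
```

Here the hits (100, 10) and (105, 15) lie on the same diagonal within the same query chunk, so only the first of them is passed on to gapped filtering.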
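The Step 2 filter can be sketched as a banded Smith-Waterman that keeps only two rows of scores and no traceback pointers, then reports the maximum-score position as the anchor when the score exceeds a threshold. This is a simplified illustration with a linear gap penalty; `banded_sw`, `find_anchor`, the band width and the scoring values are assumptions, not parameters from the poster.

```python
def banded_sw(ref, qry, band=4, match=1, mismatch=-1, gap=-1):
    """Local alignment score restricted to cells with |i - j| <= band.

    Only the previous and current rows are stored, so no traceback
    memory is needed. Returns (best_score, (query_end, reference_end)).
    """
    best, best_pos = 0, (0, 0)
    prev = [0] * (len(ref) + 1)
    for i in range(1, len(qry) + 1):
        curr = [0] * (len(ref) + 1)      # cells outside the band stay 0
        lo = max(1, i - band)
        hi = min(len(ref), i + band)
        for j in range(lo, hi + 1):
            sub = match if ref[j - 1] == qry[i - 1] else mismatch
            score = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            curr[j] = score
            if score > best:
                best, best_pos = score, (i, j)
        prev = curr
    return best, best_pos


def find_anchor(ref, qry, threshold, **kwargs):
    """Return the anchor position if the seed passes the filter, else None."""
    score, pos = banded_sw(ref, qry, **kwargs)
    return pos if score > threshold else None
```

In this sketch `ref` and `qry` stand for the windows around a seed hit; a returned position would be handed to the Step 3 extension, and `None` means the seed hit is dropped.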
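The tile hand-off in Step 3 can be sketched as follows. The sketch assumes, as in GACT, that traceback pointers are committed until the path has consumed (tile size minus overlap) bases of either sequence and that pointers inside the overlap region are discarded; the poster does not spell out these details for GACT-X. The function `commit_tile` and the operation letters are hypothetical.

```python
def commit_tile(ops, tile_size, overlap):
    """Decide which traceback pointers of one tile are kept.

    ops: traceback operations in the order the traceback visits them:
         'M' consumes one base of both sequences,
         'D' consumes one reference base, 'I' one query base.
    Returns (kept_ops, ref_consumed, qry_consumed); the next tile
    starts at the position reached after the kept operations.
    """
    limit = tile_size - overlap
    ref_used = qry_used = 0
    kept = []
    for op in ops:
        if ref_used >= limit or qry_used >= limit:
            break                      # overlap boundary reached: discard the rest
        kept.append(op)
        if op in ("M", "D"):
            ref_used += 1
        if op in ("M", "I"):
            qry_used += 1
    return kept, ref_used, qry_used
```

With `tile_size=8` and `overlap=3`, a traceback of `MMMDMMIMMM` keeps `MMMDM` and the next tile starts 5 reference bases and 4 query bases further along. A traceback that ends before the boundary is kept whole, which corresponds to starting the next tile where the previous traceback completed.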

genomic segment in the common ancestor without rearrangement

Whole Genome Alignment (WGA)

WGA is the computational process of finding pairwise local alignments across the entire genomes of two or more species. It helps:

1. infer the evolutionary relationship between species
2. identify and predict functional elements, such as genes and regulatory sequences

Motivation for Acceleration

• Several projects aim to sequence and characterize the whole of life: Earth BioGenome, Earth Microbiome, B10K, G10K
• Thousands of species will be sequenced over the next decade, which implies millions of pairwise WGAs
• The computational cost is overwhelming: a single human-mouse WGA requires ~1,000 CPU hours

Motivation for Gapped filtering

• Current state-of-the-art software whole genome aligners do not capture several biologically meaningful, high-scoring alignment regions [1]
• All software aligners are based on the popular seed-and-extend paradigm (motivated by BLAST):
  1. Seeding – find a pattern of matches between sequences
  2. Ungapped filtering
  3. Gapped extension through dynamic programming
• Distantly related species have more frequent indels, which increase the need for a gapped extension step; that step slows down current software WGA by 100x
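The seeding step of the seed-and-extend paradigm can be sketched with a spaced seed, where a pattern of 1s and 0s marks the positions that must match and the positions that are ignored. The pattern `1101` below is a toy value chosen for readability, not the seed used by any of the aligners named here, and `seed_key` and `find_seed_hits` are hypothetical helper names.

```python
from collections import defaultdict


def seed_key(seq, pos, pattern):
    """Concatenate the bases at the '1' positions of the pattern."""
    return "".join(seq[pos + k] for k, bit in enumerate(pattern) if bit == "1")


def find_seed_hits(ref, qry, pattern="1101"):
    """Return (ref_pos, qry_pos) pairs whose windows agree on all '1' positions."""
    span = len(pattern)
    index = defaultdict(list)
    for r in range(len(ref) - span + 1):        # index every reference window
        index[seed_key(ref, r, pattern)].append(r)
    hits = []
    for q in range(len(qry) - span + 1):        # look up every query window
        for r in index.get(seed_key(qry, q, pattern), []):
            hits.append((r, q))
    return hits


hits = find_seed_hits("ACGTAC", "ACTT")
```

The windows `ACGT` and `ACTT` differ only at the third position, which the pattern ignores, so they produce a seed hit at (0, 0). The filtering and gapped extension steps then decide whether the hit grows into an alignment.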

Motivation for New Algorithms

• Whole genome alignments contain several long gaps; current constant-memory hardware acceleration algorithms such as GACT [2] require very large on-chip memory (~4-10 MB) for optimal results
• Co-design the hardware with the algorithm to improve sensitivity at high speed with a small, constant on-chip memory requirement (1 MB)

Architectural Setup

• FPGA architecture: Xilinx Virtex UltraScale+
• Banded Smith-Waterman: 16 PEs/array, 48 arrays
• GACT-X: 16 PEs/array, 1 MB traceback memory per array, 8 arrays
• D-SOFT is implemented in software
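The totals implied by this setup follow from simple multiplication. The sketch below only restates the figures listed above in a small configuration object; the class `ArrayConfig` and its field names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayConfig:
    name: str
    pes_per_array: int
    arrays: int
    traceback_mb_per_array: int = 0   # 0 means no traceback memory

    @property
    def total_pes(self):
        return self.pes_per_array * self.arrays

    @property
    def total_traceback_mb(self):
        return self.traceback_mb_per_array * self.arrays


BSW = ArrayConfig("Banded Smith-Waterman", pes_per_array=16, arrays=48)
GACTX = ArrayConfig("GACT-X", pes_per_array=16, arrays=8, traceback_mb_per_array=1)
```

Under these figures the filtering stage has 768 PEs and the extension stage has 128 PEs with 8 MB of traceback memory in total.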

Performance

• 20X speedup compared to single-threaded LASTZ
• 2000X speedup compared to gapped filtering software

[1] Frith et al. "Improved search heuristics find 20 000 new alignments between human and mouse genomes" NAR '14
[2] Turakhia et al. "Darwin: A Genomics Co-processor Provides up to 15000X Acceleration on Long Read Assembly" ASPLOS '18
[3] Kent et al. "Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes" PNAS '03

Sensitivity Analysis

[Figure: % increase in top-5 chain scores (axis range 0 to 9) for dm6-droSim1, dm6-droYak2, dm6-dp4 and ce11-cb4. Series: LASTZ with 1 transition, Darwin-WGA without transition, Darwin-WGA with 1 transition. Baseline: LASTZ without transition.]

Darwin-WGA provides increased sensitivity, especially for distantly related species.

Darwin-WGA discovers alignments in difficult-to-align exons.

False Positive Rate Analysis

• To measure the false positive rate, we simulated a random genome with the same chromosome sizes and nucleotide composition as cb4, then aligned and chained it to the ce11 genome
• The false positive rate is measured as G_f/G_t, where G_f is the number of matching base pairs in the chains generated on the simulated random genome and G_t is the equivalent for the cb4 genome
• We find G_f = 0 bp and G_t = 140 Mbp, implying a negligibly small false positive rate
• With the gapped threshold lowered to 3000 (same as LASTZ), the false positive rate rises to 0.1%
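The measurement above can be sketched in two small functions. How the random genome was actually generated is not stated, so this sketch assumes one simple option: shuffling the bases of each chromosome, which preserves both chromosome size and nucleotide composition exactly. The names `shuffled_genome` and `false_positive_rate` are hypothetical.

```python
import random


def shuffled_genome(chromosomes, seed=0):
    """Return a random genome with the same chromosome sizes and base counts.

    chromosomes: dict mapping chromosome name to its sequence string.
    Shuffling is an assumption of this sketch, not the poster's stated method.
    """
    rng = random.Random(seed)
    randomized = {}
    for name, seq in chromosomes.items():
        bases = list(seq)
        rng.shuffle(bases)
        randomized[name] = "".join(bases)
    return randomized


def false_positive_rate(g_f, g_t):
    """G_f / G_t: matching base pairs on the random genome over the real one."""
    return g_f / g_t


toy = {"chrI": "ACGTACGTAA", "chrII": "GGCCA"}
randomized = shuffled_genome(toy)
rate = false_positive_rate(0, 140_000_000)
```

With the reported values G_f = 0 bp and G_t = 140 Mbp the ratio is exactly 0.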