Using Network Processors in Genomics
Herbert Bos* †
Kaiming Huang*
{herbertb,khuang}@liacs.nl
* Leiden Universiteit, Netherlands
† Vrije Universiteit, Netherlands
http://www.liacs.nl/~herbertb/projects/biocomp/
H. Bos – Leiden University 13/02/2004 1
Case study: BLAST
● search nucleotide/protein database for query
● BLAST discovers similarity rather than exact match
● two main phases:
1. scoring (registering where query and DNADB match)
2. alignment (dynamic programming)
● only the first phase on NPUs
Window matching
● naïve approach: roughly W*N*M comparisons
● does not scale
● string search algorithms: Aho-Corasick
– all windows matched at the same time
– shifting genome one nucleotide at a time
– matching algorithm transformed into a DFA
● DFA may be quite large
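The W*N*M estimate for the naïve approach can be checked with a brute-force scan. The sketch below is illustrative only: it assumes W is the window length, N the genome length and M the number of windows (the slide does not define the symbols), and `naive_scan` is a hypothetical helper, not part of IXPBlast.

```python
def naive_scan(genome, windows):
    """Brute-force window matching that also counts character comparisons."""
    hits = []
    comparisons = 0
    for pos in range(len(genome)):
        for window in windows:
            if pos + len(window) > len(genome):
                continue  # window would run past the end of the genome
            matched = True
            for k, ch in enumerate(window):
                comparisons += 1
                if genome[pos + k] != ch:
                    matched = False  # no early exit, so the count is the worst case
            if matched:
                hits.append((pos, window))
    return hits, comparisons


windows = ["acg", "cgc", "gcc", "ccg", "cga"]
hits, comparisons = naive_scan("tacgcga", windows)
print(hits)         # (start position, window) pairs
print(comparisons)  # W * (N - W + 1) * M without early exit
```

With a 1.4 GB genome and over a thousand windows the product grows far too quickly, which is the motivation for matching all windows in a single pass.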
Aho-Corasick
● Alphabet: acgt
● Window size: 3
● Query: acgccga
● Windows: {acg,cgc,gcc,ccg,cga}
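The window set follows mechanically from the query and the window size. A minimal sketch, with `query_windows` as a hypothetical helper name:

```python
def query_windows(query, w):
    """Return the distinct length-w substrings of query, in order of first occurrence."""
    seen = set()
    result = []
    for i in range(len(query) - w + 1):
        window = query[i:i + w]
        if window not in seen:
            seen.add(window)
            result.append(window)
    return result


windows = query_windows("acgccga", 3)
print(windows)
```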
[Figure: Aho-Corasick state machine for these windows, states 0 to 12]

Failure function:

| s | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| f(s) | 0 | 4 | 5 | 0 | 7 | 8 | 0 | 4 | 10 | 4 | 5 | 1 |
Output states:

| s | 3 | 6 | 9 | 11 | 12 |
|---|---|---|---|---|---|
| window | acg | cgc | gcc | ccg | cga |
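The failure function and output states on these slides can be reproduced with the textbook Aho-Corasick construction. The sketch below is a generic version, not the IXPBlast code; `build_automaton` is a hypothetical name. States are numbered in insertion order, which happens to coincide with the numbering used on the slides when the windows are inserted in the order listed.

```python
from collections import deque


def build_automaton(windows):
    """Build goto, failure and output tables for a set of patterns."""
    goto = [{}]        # goto[s][symbol] -> next state; state 0 is the root
    output = [set()]   # output[s] -> patterns recognised on entering s
    for window in windows:
        s = 0
        for ch in window:
            if ch not in goto[s]:
                goto.append({})
                output.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        output[s].add(window)

    fail = [0] * len(goto)
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        r = queue.popleft()
        for ch, s in goto[r].items():
            queue.append(s)
            f = fail[r]
            while f and ch not in goto[f]:
                f = fail[f]  # back off to shorter suffixes
            fail[s] = goto[f].get(ch, 0)
            output[s] |= output[fail[s]]  # inherit matches that end here
    return goto, fail, output


goto, fail, output = build_automaton(["acg", "cgc", "gcc", "ccg", "cga"])
print(fail[1:])
print({s: sorted(o) for s, o in enumerate(output) if o})
```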
Input: tacgcga
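To illustrate how the matcher becomes a DFA, the sketch below hard-codes the trie edges implied by the windows together with the failure function from the slides, expands them into a full transition table, and runs it over tacgcga. It assumes tacgcga is the input being scanned; the names `GOTO`, `FAIL`, `OUTPUT`, `step` and `run` are illustrative, not taken from IXPBlast.

```python
ALPHABET = "acgt"

# Trie edges for {acg, cgc, gcc, ccg, cga}, using the slides' state numbers.
GOTO = {
    (0, "a"): 1, (1, "c"): 2, (2, "g"): 3,
    (0, "c"): 4, (4, "g"): 5, (5, "c"): 6,
    (0, "g"): 7, (7, "c"): 8, (8, "c"): 9,
    (4, "c"): 10, (10, "g"): 11, (5, "a"): 12,
}
FAIL = [0, 0, 4, 5, 0, 7, 8, 0, 4, 10, 4, 5, 1]  # index 0 is the root
# All windows have the same length, so only the deepest states report a match.
OUTPUT = {3: "acg", 6: "cgc", 9: "gcc", 11: "ccg", 12: "cga"}


def step(state, ch):
    """One Aho-Corasick move: follow failure links until an edge exists."""
    while state != 0 and (state, ch) not in GOTO:
        state = FAIL[state]
    return GOTO.get((state, ch), 0)


# Precompute every move: one row per state, one column per nucleotide.
DFA = [[step(s, ch) for ch in ALPHABET] for s in range(len(FAIL))]


def run(text):
    column = {ch: i for i, ch in enumerate(ALPHABET)}
    state, trace, hits = 0, [], []
    for pos, ch in enumerate(text):
        state = DFA[state][column[ch]]  # a single table lookup per nucleotide
        trace.append(state)
        if state in OUTPUT:
            window = OUTPUT[state]
            hits.append((pos - len(window) + 1, window))
    return trace, hits


trace, hits = run("tacgcga")
print(trace)
print(hits)
```

The table has one entry per (state, nucleotide) pair, 13 x 4 here, so it grows with the number of windows; this is the sense in which the DFA may become quite large.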
IXPBlast Architecture
[Figure: control processor (Pentium) connected over the PCI bus to the NPU (IXP1200); the IXP1200 comprises a StrongARM core, six microengines (ME), DRAM, SRAM, scratch memory and Gbps ports]
IXPBlast: packet handling
● packets read and processed in batches of 100,000
● “spilling” must be taken into account
● currently no feedback
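One way to picture the spilling issue, assuming it refers to windows that straddle a packet boundary (the slide does not spell this out): a scanner that treats each packet in isolation would miss such matches, so some context has to carry over. The sketch below uses a simple overlap buffer of the last W-1 nucleotides rather than a carried DFA state; `scan_stream` is a hypothetical helper, not the IXPBlast implementation.

```python
def scan_stream(packets, windows, w):
    """Match length-w windows over a genome that arrives split into packets."""
    wanted = set(windows)
    hits = []
    tail = ""      # last w-1 nucleotides seen so far
    consumed = 0   # nucleotides received before the current packet
    for payload in packets:
        data = tail + payload
        base = consumed - len(tail)  # genome offset of data[0]
        for i in range(len(data) - w + 1):
            if data[i:i + w] in wanted:
                hits.append((base + i, data[i:i + w]))
        consumed += len(payload)
        tail = data[-(w - 1):] if w > 1 else ""
    return hits


windows = ["acg", "cgc", "gcc", "ccg", "cga"]
split = scan_stream(["tac", "gcg", "a"], windows, 3)
whole = scan_stream(["tacgcga"], windows, 3)
print(split)
print(whole)
```

Because the carried tail is shorter than a window, no match is reported twice, and the split run finds the same hits as the unsplit one.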
Results
● 232 MHz IXP1200 ~ 1.8 GHz Pentium-4
● 1611 nucleotide query (MyD88)
● 1.4 GB genome (Zebrafish)
– IXP1200: 90 sec with DFA
– IXP1200: 129 sec with “trie”
– P4: 132 sec with “trie”
● number of matches: 524856
| Query size (nucleotides) | DNADB size | Implementation | Performance |
|---|---|---|---|
| 1611 | 1.4 GB | P4 | 132 sec |
| 1611 | 1.4 GB | IXP1200 | 129 sec |
| 1611 | 1.4 GB | IXP1200 DFA | 90 sec |
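The reported timings can be turned into rough throughput and speedup figures. This is back-of-the-envelope arithmetic on the numbers above, assuming 1.4 GB means 1.4 x 10^9 bytes; the derived values are not measurements from the talk.

```python
GENOME_BYTES = 1.4e9  # assumption: decimal gigabytes

runs = {"P4 trie": 132, "IXP1200 trie": 129, "IXP1200 DFA": 90}  # seconds

# Sustained scan rate in MB/s for each implementation.
throughput = {name: GENOME_BYTES / secs / 1e6 for name, secs in runs.items()}

# How much faster the DFA on the IXP1200 is than the trie on the Pentium-4.
speedup = runs["P4 trie"] / runs["IXP1200 DFA"]

# Clock-rate gap between the two processors (1.8 GHz vs 232 MHz).
clock_ratio = 1800 / 232

for name, rate in throughput.items():
    print(f"{name}: {rate:.1f} MB/s")
print(f"speedup: {speedup:.2f}x, clock ratio: {clock_ratio:.2f}x")
```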
Conclusions
● NPUs are useful in other application domains
● Newer hardware is expected to perform much better
● “Throughput processors”
● Adapting our current approach to use BLAST tricks/heuristics
Network processors
● geared for high throughput
● used exclusively in network systems
● example: intrusion detection
● similar to looking for genes in genomes
● differences
[Figure: Radisys IXP1200 board]
Application domain: “Genomics”
● example: search genome for occurrence of “patterns”
● similar problems as IDS (intrusion detection systems): poor performance on GPP (general-purpose processors), cannot exploit parallelism
– throughput-driven
– how about FPGAs?
– how about clusters?
● NPU
– easier to program than FPGAs
– cheaper than cluster computing
– “on the desktop”: IP never leaves the room