Virdi Sabegh Singh (Advisor Dr. Robert A. Walker) Computer Science Department Kent State University Solving the Longest Common Subsequence (LCS) problem using the Associative ASC Processors with Reconfigurable 2D Mesh


Post on 31-May-2020


Page 1: Solving the Longest Common Subsequence (LCS) problem …svirdi/data/Thesis.pdfParallel Counterpart Serial LCS algorithm runs in O(nm) time, where n is the length of the text string,

Virdi Sabegh Singh (Advisor: Dr. Robert A. Walker)
Computer Science Department

Kent State University

Solving the Longest Common Subsequence (LCS) problem using the Associative ASC Processors with Reconfigurable 2D Mesh


Presentation Outline

String matching and its variations
Motivation of LCS
Role of LCS in Molecular Biology
Overview of LCS
Discussion on Folklore algorithm
Parallel Algorithms for LCS
Discussion on ASC processor
Brief introduction on Coterie Network


Presentation Outline (continued)

Reconfigurable Network in the ASC Processor
Modifying the Network for LCS Algorithm
Longest Common Subsequence on Reconfigurable 2D Mesh: Exact match
Longest Common Subsequence on Reconfigurable 2D Mesh: Approximate match
Summary and Future work


String Matching

Fundamental operation in computing
Compares characters, words, etc. to determine their similarity
Of particular interest in bioinformatics, for example when searching genetic databases
Strings are enormous, so efficient string processing is a requirement


String Matching Variations

Is exact match the only solution?
What if the pattern does not occur in the text?
Find the longest subsequence that occurs both in the pattern and in the text
Longest Common Subsequence, Longest Common Substring, Sequence Alignment and Edit Distance are all variations of the string matching (SM) problem


Sequence alignment

Procedure for comparing two or more sequences
Searches for a series of individual characters or character patterns that appear in the same order in the sequences

LCS

Finds a common string for both sequences, preserving symbol order

Sequence alignment vs. LCS

GGHSRLILSQLGEEG.RLLAIDRDPQAIAVAKT....IDDPRFSII
|||::::| : |::| ||:::||||:|:|||:: ::| |::::
GGHAERFL.E.GLPGLRLIGLDRDPTALDVARSRLVRFAD.RLTLV


Motivation of LCS

Molecular Biology
File comparison
Screen redisplay
Cheater finder
Plagiarism detection
Codes and Error Control
Spell checking
Human speech
Gas Chromatography
Bird song analysis
Data compression
Speech recognition


Role of LCS in Molecular biology

DNA sequences (genes) are represented by the four letters A, C, G, T, corresponding to the four submolecules forming DNA
When biologists find a new sequence, they typically want to know what other sequences it is most similar to
One way of computing how similar (homologous) two sequences are is to find the length of their longest common subsequence


Role of LCS in Molecular biology

This is a simplification, since in the biological situation one would typically take into account not only the length of the LCS but also, for example, how gaps occur when the LCS is embedded in the two original sequences
An obvious measure for the closeness of two strings is the maximum number of identical symbols (preserving symbol order)
This, by definition, is the longest common subsequence of the strings


Longest Common Subsequences

Formally, we compare two strings, X[1..m] and Y[1..n], which are elements of the set Σ*; here Σ denotes the input alphabet containing σ symbols
The LCS of strings X and Y, lcs(X,Y), is a common subsequence of maximal length
Special case of the edit distance problem

The distance between X and Y is defined as the minimal number of elementary operations needed to transform the source string X into the target string Y
In practical applications, operations are restricted to insertions, deletions and substitutions
Each operation is assigned an application-dependent cost
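The link between LCS and edit distance can be checked numerically. The sketch below assumes a cost model that the slides do not spell out: only insertions and deletions are allowed, each at unit cost. Under that assumption the distance equals m + n - 2·|lcs(X,Y)|. The function name `lcs_and_indel` is illustrative only.

```python
from functools import lru_cache


def lcs_and_indel(x, y):
    """Return (LCS length, insert/delete-only edit distance) for x and y."""

    @lru_cache(maxsize=None)
    def lcs(i, j):
        # Length of the LCS of the suffixes x[i:] and y[j:].
        if i == len(x) or j == len(y):
            return 0
        if x[i] == y[j]:
            return 1 + lcs(i + 1, j + 1)
        return max(lcs(i + 1, j), lcs(i, j + 1))

    @lru_cache(maxsize=None)
    def indel(i, j):
        # Unit-cost insertions and deletions only (assumed cost model).
        if i == len(x):
            return len(y) - j
        if j == len(y):
            return len(x) - i
        if x[i] == y[j]:
            return indel(i + 1, j + 1)
        return 1 + min(indel(i + 1, j), indel(i, j + 1))

    return lcs(0, 0), indel(0, 0)


length, distance = lcs_and_indel("AGACTGAGGTA", "ACCAGG")
print(length, distance)
```

With substitutions or non-unit costs the simple identity no longer holds, which is why LCS is described as a special case rather than the general problem.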


Longest Common Subsequences

LCS(X,Y) is typically solved with the dynamic programming technique, by filling an m x n table
Table elements act as vertices in a graph, and the simple dependencies between the table values define the edges
The task is to find the longest path between the vertices in the upper left and lower right corners of the table
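The table dependencies mentioned above are usually written as the following recurrence. This is the standard textbook form, not copied from the slides; the symbol d(i,j) is borrowed from the later Folklore slide and stands for the LCS length of X[1..i] and Y[1..j].

```latex
d(i,j) =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0,\\
d(i-1,\,j-1) + 1 & \text{if } X[i] = Y[j],\\
\max\bigl(d(i-1,\,j),\; d(i,\,j-1)\bigr) & \text{otherwise.}
\end{cases}
```

Each entry depends only on its upper, left and upper-left neighbors, which gives the edges of the graph, and d(m,n) is the length of lcs(X,Y).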


Folklore Algorithm

Foundation of most of the LCS algorithms
Given two strings, find the LCS common to both strings
Example:

String 1: AGACTGAGGTA
String 2: ACTGAG

List of possible alignments:

AGACTGAGGTA
--ACTGAG---
--ACTGA-G--
A--CTGA-G--
A--CTGAG---

The time complexity of this algorithm is clearly O(nm)
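A minimal sketch of the table-filling procedure, run on the two example strings. It is the standard dynamic programming formulation rather than code from the thesis, and `lcs_table` is an illustrative name.

```python
def lcs_table(x, y):
    """Fill the (len(x)+1) x (len(y)+1) LCS table; row 0 and column 0 stay 0."""
    m, n = len(x), len(y)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                d[i][j] = d[i - 1][j - 1] + 1  # match: extend the diagonal
            else:
                d[i][j] = max(d[i - 1][j], d[i][j - 1])
    return d


table = lcs_table("ACTGAG", "AGACTGAGGTA")
for row in table:
    print(" ".join(str(v) for v in row))
```

The two nested loops visit every table cell once, which is where the O(nm) bound comes from. The bottom-right entry is the LCS length.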


Folklore Algorithm

Complexity does not depend on the sequences u and v themselves but only on their lengths
By carefully choosing the order in which the d(i,j)'s are computed, one can execute the above algorithm in O(n+m) space
The bottleneck in efficient parallelization of the LCS problem is calculating the values of the diagonal elements, as shown
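One common way to reach linear space is to compute the table row by row and keep only the previous row. The sketch below takes that route; whether the author had this ordering in mind is an assumption. It returns only the length, not the subsequence itself.

```python
def lcs_length_two_rows(u, v):
    """LCS length using two rows of the table instead of the full table."""
    if len(v) > len(u):
        u, v = v, u  # keep the shorter string along the row to save space
    prev = [0] * (len(v) + 1)
    for cu in u:
        cur = [0]
        for j, cv in enumerate(v, 1):
            if cu == cv:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


print(lcs_length_two_rows("AGACTGAGGTA", "ACTGAG"))
```

The working storage is two rows of length min(n,m)+1, which is within the O(n+m) bound stated above.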


Folklore Algorithm

As seen, the value of {i,j} depends upon the previous element {i-1,j-1} when a match is found
We may have more than one LCS for the same problem
In order to find the best LCS, we associate some parameter with each candidate
The Smith-Waterman algorithm uses the same dynamic programming concept as the Folklore algorithm, but scores matches, mismatches and gaps to select an optimal local alignment


Folklore Algorithm

      A  G  A  C  T  G  A  G  G  T  A
   0  0  0  0  0  0  0  0  0  0  0  0
A  0  1  1  1  1  1  1  1  1  1  1  1
C  0  1  1  1  2  2  2  2  2  2  2  2
T  0  1  1  1  2  3  3  3  3  3  3  3
G  0  1  2  2  2  3  4  4  4  4  4  4
A  0  1  2  3  3  3  4  5  5  5  5  5
G  0  1  2  3  3  3  4  5  6  6  6  6


Parallel Counterpart

The serial LCS algorithm runs in O(nm) time, where n is the length of the text string and m is the length of the pattern string
Efficient parallel algorithms exist to solve this computationally intensive task

Some algorithms run in O(max{n,m}) time using O(min{n,m}) processors
Others run in O(log n) time using O(mn/log n) processors
There are constant-time algorithms for the LCS problem using the DP approach, under some assumptions
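A bound of the first kind can be illustrated with a wavefront schedule. All cells on one anti-diagonal of the DP table are independent, so one parallel step can fill an entire anti-diagonal. The sketch below only simulates this serially and counts the steps. The slides do not say which published algorithm they refer to, so this is an illustration of the idea rather than that algorithm.

```python
def lcs_length_wavefront(x, y):
    """Fill the LCS table one anti-diagonal per (simulated) parallel step."""
    m, n = len(x), len(y)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    steps = 0
    for k in range(2, m + n + 1):  # cells with i + j == k
        lo, hi = max(1, k - n), min(m, k - 1)
        # The cells below do not depend on each other, only on the two
        # previous anti-diagonals, so hardware could update them together.
        for i in range(lo, hi + 1):
            j = k - i
            if x[i - 1] == y[j - 1]:
                d[i][j] = d[i - 1][j - 1] + 1
            else:
                d[i][j] = max(d[i - 1][j], d[i][j - 1])
        steps += 1
    return d[m][n], steps


print(lcs_length_wavefront("ACTGAG", "AGACTGAGGTA"))
```

An anti-diagonal never holds more than min(n,m) cells, and there are n+m-1 of them, so the step count is at most twice max{n,m}, which is O(max{n,m}).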


Computation Model

Various network models have been used to solve the LCS problem
PRAM model, Suffix Tree, 2D-Mesh Network, Mesh with Reconfigurable Buses, Mesh with Multiple Buses, etc.
Algorithms that run in constant time assume that most of the operations are done in constant time
In the parallel version, one of the important tasks is to distribute the data in an efficient and easy manner


The ASC Processor

A scalable design implemented on a million-gate Altera FPGA
SIMD-like architecture
Searches data by content instead of by address
8-bit Instruction Stream (IS) control unit with 8-bit instruction and data addresses and 32-bit instructions


The ASC Architecture

[Figure: block diagram of the ASC architecture. Labels: Control Unit, Instruction Bus, Data Bus, Common Registers, Responder Resolution Unit, PE Array (four "PE and Memory" cells), Network, memory and supporting circuitry, From Control Unit]


The ASC Architecture

Each PE listens to the IS through the broadcast and reduction network
PEs can communicate among themselves using the PE Network
A PE may either execute or ignore the microcode instruction broadcast by the IS, under the control of the Mask Stack


The ASC Features

Associative Search
  Each PE can search its local memory for a key under the control of the IS

Responder Resolution
  A special circuit signals if ‘at least one’ record was found

Masked Operation
  Local Mask Stacks can turn the execution of instructions from the IS on or off
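The three features can be mimicked in software to show how they fit together. The following is a toy model, not the ASC instruction set: the class `PE` and the functions `associative_search`, `any_responder`, `masked_update` and `end_search` are hypothetical names, and the sequential loops stand in for what the hardware does in all PEs at once.

```python
from dataclasses import dataclass, field


@dataclass
class PE:
    memory: dict
    mask_stack: list = field(default_factory=lambda: [True])

    @property
    def active(self):
        # A PE executes broadcast instructions only while its top mask is set.
        return self.mask_stack[-1]


def associative_search(pes, key, value):
    """Every PE compares its own memory with the broadcast key and value."""
    for pe in pes:
        pe.mask_stack.append(pe.active and pe.memory.get(key) == value)


def any_responder(pes):
    """Responder resolution: was 'at least one' record found?"""
    return any(pe.active for pe in pes)


def masked_update(pes, key, value):
    """Broadcast a write; PEs whose mask is off ignore it."""
    for pe in pes:
        if pe.active:
            pe.memory[key] = value


def end_search(pes):
    """Restore the previous mask in every PE."""
    for pe in pes:
        pe.mask_stack.pop()


pes = [PE({"base": ch}) for ch in "ACGTA"]
associative_search(pes, "base", "A")
if any_responder(pes):
    masked_update(pes, "hit", 1)
end_search(pes)
print([pe.memory.get("hit", 0) for pe in pes])
```

Keeping the masks on a stack lets searches nest, since each search narrows the set of active PEs and `end_search` restores the previous set.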


Communication between PEs

In a 2D mesh network, communication between the PEs themselves takes place in two different ways:
  By using the nearest-neighbor mesh interconnection network
  By using a powerful variation on the nearest-neighbor mesh called the “Coterie network”, developed in response to the requirement for nonlocal communication

Processors in a group share common properties and purpose; we call the group a coterie, hence the name coterie network


Coteries [Weems & Herbordt]

“A small often selected group of persons who associate with one another frequently”

Features:
  Related to other reconfigurable broadcast networks
  Describable using hypergraphs
  Dynamic in nature

Advantages:
  Propagation of information quickly over long distances at electrical speed
  Support of one-to-many communication within a coterie, and reconfigurability of the coterie


Coterie Network

Provides a method of performing operations on regions of an image in parallel
Used extensively for matrix arithmetic, FFT, convex hull computation, simulating a pyramid processor, general permutation routing and parallel prefix
Note that the coterie network is separate from the nearest-neighbor mesh, which we refer to as the SEWN network
The coterie network results in a new mode of parallelism that falls between SIMD and MIMD


PEs form Coteries

5 x 5 coterie network with switches shown in “arbitrary” settings. Shaded areas denote coteries (the sets of PEs sharing the same circuit)


Coterie’s Physical Structure

In the physical implementation, each PE controls a set of switches
  Four of these switches control access in the different directions (N, S, E, W)
  Two switches, H and V, are used to emulate horizontal and vertical buses
  The two switches NE and NW are used to create eight-way connected regions

[Figure: Coterie structure, showing the switches of one PE; labels as extracted: N, S, E, W, H, V, NW, NE, WS, ES]


Coterie Network

The isolated groups of processors, called coteries, have access only to the multicast within their own coterie
When the switches are set, the connected processors form a coterie
The coterie network switches are set by loading the corresponding bits of the mesh control register in each PE
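How switch settings partition the mesh into coteries can be sketched with a union-find pass. The model below is an assumption made for illustration: only the N, S, E and W switches are considered, and two adjacent PEs are treated as sharing a circuit when both facing switches are closed. The real switch semantics of the hardware may differ, and the function name `coteries` is hypothetical.

```python
def coteries(closed):
    """Group PEs into coteries.

    closed[r][c] is the set of directions ("N", "S", "E", "W") whose
    switch is closed in PE (r, c).
    """
    rows, cols = len(closed), len(closed[0])
    parent = {(r, c): (r, c) for r in range(rows) for c in range(cols)}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for r in range(rows):
        for c in range(cols):
            east_ok = c + 1 < cols and "E" in closed[r][c] and "W" in closed[r][c + 1]
            south_ok = r + 1 < rows and "S" in closed[r][c] and "N" in closed[r + 1][c]
            if east_ok:
                parent[find((r, c))] = find((r, c + 1))
            if south_ok:
                parent[find((r, c))] = find((r + 1, c))

    groups = {}
    for pe in parent:
        groups.setdefault(find(pe), []).append(pe)
    return sorted(groups.values())


# Every PE closes N and S: one coterie per column.
settings = [[{"N", "S"} for _ in range(3)] for _ in range(2)]
print(coteries(settings))
```

Loading a different bit pattern into each PE changes the partition, which is the software analogue of rewriting the mesh control register.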


Basic Coterie structure algorithms

The complexity is assumed to be O(1) unless otherwise stated

Transfer of data between two adjacent coteries
Symmetry breaking between a pair of nodes in a coterie
Exchange of information between two nodes within a coterie


Reconfigurable Network in the ASC Processor

Scalable design with reconfigurable network
Can be used as a dedicated ASIC or co-processor
Implemented on Altera APEX20KC1000: single CPU, 50 pipelined PEs and a linear PE interconnection network
Key to reconfigurability is the Data Switch inside each PE

[Figure: Data Switch with N, E, S and W ports]


Reconfigurable Network in the ASC Processor

Linear network: a PE communicates both ways
2D Reconfigurable Network: a PE communicates with all of its neighbors (N-E-S-W)
The Data Switch has a bypass mode that allows PE communication to skip non-responders, so as to support associative computing


Modifying the Network for LCS Algorithm

The Coterie Network is one of the most powerful networks
But we don’t need its full feature set for the LCS Algorithm
Augmented ASC with a new 2D Mesh, with row and column broadcast buses
Modified the linear network into a 2D Mesh
Added features inspired by the Coterie network
A PE can now communicate with any of its four neighbors
Bypass mode augmented to support H and V bypass as well


LCS Algorithm on Reconfigurable 2D Mesh

We assume that initially all the internal switches of the PEs are open
Each PE has a Match Register “M” and a Length Register “L”, both initially holding the value 0
Let the Text string T=T(1)T(2)…T(n) be fed into the first row of the Reconfigurable 2D Mesh
PE(0,j) stores T(j), where 1<=j<=n, as shown
This step takes unit time


LCS Algorithm on Reconfigurable 2D Mesh

A G A C T G A C T G A


LCS Algorithm on Reconfigurable 2D Mesh

Broadcast each character of the text string along its column, using the column broadcast bus
In the case of the Coterie network:
  Form coteries along the columns
  Perform the multicast operation in all coteries
This step takes unit time


LCS Algorithm on Reconfigurable 2D Mesh

A G A C T G A C T G A

A G A C T G A C T G A

A G A C T G A C T G A

A G A C T G A C T G A

A G A C T G A C T G A

A G A C T G A C T G A


LCS Algorithm on Reconfigurable 2D Mesh

Let the Pattern string P=P(1)P(2)…P(m) be fed into the first column of the Reconfigurable 2D Mesh
PE(i,0) stores P(i), where 1<=i<=m, as shown
This step takes unit time


LCS Algorithm on Reconfigurable 2D Mesh

A

C

T

G

A

C


PEs form Coteries

Broadcast each character of the Pattern string along its row, using the row broadcast bus
In the case of the Coterie network:
  Form coteries along the rows
  Perform the multicast operation in all coteries
This step takes unit time


LCS Algorithm on Reconfigurable 2D Mesh

A A A A A A A A A A A
C C C C C C C C C C C
T T T T T T T T T T T
G G G G G G G G G G G
A A A A A A A A A A A
C C C C C C C C C C C


LCS Algorithm on Reconfigurable 2D Mesh

After this step, each PE with index [i,j] holds P[i] and T[j]
Each PE now compares the contents held in its internal registers
It sets its Match register M to 1 if they are equal, else to 0
This step takes unit time
The next figure shows the values after this operation
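The load, broadcast and compare steps described so far can be simulated in a few lines. This sketch models the mesh as nested lists and uses 0-based indices; the function name `match_registers` is illustrative, and the list comprehensions stand in for operations that the mesh performs in all PEs at once.

```python
def match_registers(pattern, text):
    """Return M, where M[i][j] is 1 when pattern[i] == text[j], else 0."""
    rows, cols = len(pattern), len(text)
    # Column broadcast: every PE in column j receives text[j].
    t = [[text[j] for j in range(cols)] for _ in range(rows)]
    # Row broadcast: every PE in row i receives pattern[i].
    p = [[pattern[i] for _ in range(cols)] for i in range(rows)]
    # Local compare: each PE looks only at its own two characters.
    return [[int(p[i][j] == t[i][j]) for j in range(cols)] for i in range(rows)]


for row in match_registers("ACTGAC", "AGACTGACTGA"):
    print(" ".join(str(bit) for bit in row))
```

On the mesh each of the three stages is a single unit-time step, because no PE needs anything other than the two broadcast values it already holds.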


LCS Algorithm on Reconfigurable 2D Mesh

   A  G  A  C  T  G  A  C  T  G  A
A  1  0  1  0  0  0  1  0  0  0  1
C  0  0  0  1  0  0  0  1  0  0  0
T  0  0  0  0  1  0  0  0  1  0  0
G  0  1  0  0  0  1  0  0  0  1  0
A  1  0  1  0  0  0  1  0  0  0  1
C  0  0  0  1  0  0  0  1  0  0  0


Parallel VLDC SM Algorithm on MCCRB Network

A parallel SM algorithm with VLDC was proposed by K.L. Chung in 1995
It uses the Mesh-Connected Computer with Reconfigurable Buses (MCCRB) system
Runs in O(1) time
For a pattern of size m and a text of size n, it uses O(nm) PEs


LCS Algorithm on Reconfigurable 2D Mesh

Now, except for the PEs with index [0,j], where 0<=j<=n, every PE having value 0 in its Match register M closes its N-E switch
PEs with value 1 in their Match register M close the W-S switch, as shown
Both steps take unit time


LCS Algorithm on Reconfigurable 2D Mesh

[Figure: Match register values for Text A G A C T G A C T G A and Pattern A C T G A C, with the N-E switch closed in PEs holding 0 and the W-S switch closed in PEs holding 1]


LCS Algorithm on Reconfigurable 2D Mesh

Sequential Version:

Each PE at the beginning (bottom) of an LCS sends a token to its West neighbor
A PE receiving a token adds 1 to the token if its Match Register “M” contains 1, stores the token value in its Length Register “L”, and passes the token on if its W-S bypass switch is set
Perform the MAX operation on the entire network
The PE with the largest value in its Length Register “L” is the start of the LCS
The complexity is the length of the LCS found
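The token pass can be simulated as below. The sketch rests on one reading of the slides and their figures, which is an assumption: with the switches set as described, a token travels from the bottom of a run of matches diagonally up and to the left, gaining 1 at every matching PE. The function name `token_lengths` and the 0-based indices are illustrative.

```python
def token_lengths(match):
    """Simulate token passing; return (Length registers, best value, its PE)."""
    rows, cols = len(match), len(match[0])
    length = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            has_match_below = (
                i + 1 < rows and j + 1 < cols and match[i + 1][j + 1] == 1
            )
            if match[i][j] == 1 and not has_match_below:
                # (i, j) is the bottom of a run: inject a token here.
                token, r, c = 0, i, j
                while r >= 0 and c >= 0 and match[r][c] == 1:
                    token += 1
                    length[r][c] = token  # store the token in "L"
                    r, c = r - 1, c - 1
    best = max(max(row) for row in length)  # the MAX operation
    start = next(
        (r, c) for r in range(rows) for c in range(cols) if length[r][c] == best
    )
    return length, best, start


text, pattern = "AGACTGACTGA", "ACTGAC"
match = [[int(p == t) for t in text] for p in pattern]
length, best, start = token_lengths(match)
print(best, start)
```

The inner `while` loop runs once per PE on the run, which mirrors the stated complexity: the time taken is the length of the match found.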


LCS Algorithm on Reconfigurable 2D Mesh

   A  G  A  C  T  G  A  C  T  G  A
A  1  0  6  0  0  0  5  0  0  0  1
C  0  0  0  5  0  0  0  4  0  0  0
T  0  0  0  0  4  0  0  0  3  0  0
G  0  3  0  0  0  3  0  0  0  2  0
A  1  0  2  0  0  0  2  0  0  0  1
C  0  0  0  1  0  0  0  1  0  0  0


LCS Algorithm on Reconfigurable 2D Mesh

Parallel Version:

Each PE at the beginning (bottom) sends its [row, column] ID to its West neighbor
A PE receiving an ID passes it on
Or, if it is the end of an LCS, it subtracts its own ID from the received ID
It stores the value in its Length Register “L”
Perform the MAX operation on the network
The PE having the largest value in its Length Register “L” is the start of the LCS
Complexity: constant time
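The ID-based variant can be sketched as follows. Two points are assumptions rather than statements from the slides: the run length is taken as the row difference between the bottom and top IDs plus one, and IDs are 1-based as in the figure. The `while` loop only simulates the bus; on the mesh the ID would travel along the closed switches in one step. The function name `run_lengths_by_id` is illustrative.

```python
def run_lengths_by_id(match):
    """Map the 1-based ID of the top PE of each run to the run length."""
    rows, cols = len(match), len(match[0])

    def is_match(r, c):
        return 0 <= r < rows and 0 <= c < cols and match[r][c] == 1

    lengths = {}
    for i in range(rows):
        for j in range(cols):
            if is_match(i, j) and not is_match(i + 1, j + 1):
                # (i, j) is a bottom PE: its ID is carried to the top of the run.
                r, c = i, j
                while is_match(r - 1, c - 1):
                    r, c = r - 1, c - 1
                # The top PE subtracts its own row from the received row.
                lengths[(r + 1, c + 1)] = (i - r) + 1
    return lengths


text, pattern = "AGACTGACTGA", "ACTGAC"
match = [[int(p == t) for t in text] for p in pattern]
lengths = run_lengths_by_id(match)
print(max(lengths.items(), key=lambda item: item[1]))
```

Because the length comes from one subtraction instead of a count accumulated hop by hop, the number of steps no longer grows with the length of the match.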


LCS Algorithm on Reconfigurable 2D Mesh

   A    G    A    C    T    G    A    C    T    G     A
A  1,1  1,2  1,3  1,4  1,5  1,6  1,7  1,8  1,9  1,10  1,11
C  2,1  0    0    2,4  0    0    0    2,8  0    0     0
T  3,1  0    0    0    3,5  0    0    0    3,9  0     0
G  4,1  4,2  0    0    0    4,6  0    0    0    4,10  0
A  5,1  0    5,3  0    0    0    5,7  0    0    0     5,11
C  6,1  0    0    6,4  0    0    0    6,8  0    0     0


LCS Algorithm on Reconfigurable 2D Mesh

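[Figure: Length register values after the parallel step for Text A G A C T G A C T G A and Pattern A C T G A C; the values recoverable from the slide are 1, 6, 5 and 3]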


LCS Algorithm on Reconfigurable 2D Mesh

Exact match implemented on an Altera APEX1000KC FPGA
Sufficient to hold the 6 x 11 array of PEs used in the example
Ran at a clock speed of 37 MHz, with respect to the number of PEs
Larger networks can be easily supported, due to ASC scalability


LCS Algorithm on Reconfigurable 2D Mesh

The algorithm described above solves the LCS problem for exact match
It doesn’t address approximate match
The next example demonstrates this problem

For the strings:
Text: AGACTGAGGTA
Pattern: ACCAGG
LCS: ACAGG
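The gap between the two cases can be checked on this example. The sketch below compares the longest run of consecutive matches (what a diagonal-only search finds) with the full LCS recovered by a standard traceback. Both functions are generic textbook routines with illustrative names, not the mesh algorithm.

```python
def longest_common_substring_length(x, y):
    """Longest run of consecutive matching characters (diagonal runs only)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for cx in x:
        cur = [0]
        for j, cy in enumerate(y, 1):
            cur.append(prev[j - 1] + 1 if cx == cy else 0)
        best = max(best, max(cur))
        prev = cur
    return best


def lcs_string(x, y):
    """Recover one longest common subsequence by tracing back through the table."""
    m, n = len(x), len(y)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                d[i][j] = d[i - 1][j - 1] + 1
            else:
                d[i][j] = max(d[i - 1][j], d[i][j - 1])
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])
            i, j = i - 1, j - 1
        elif d[i - 1][j] >= d[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


pattern, text = "ACCAGG", "AGACTGAGGTA"
print(longest_common_substring_length(pattern, text))
print(lcs_string(pattern, text))
```

The subsequence has to skip characters in both strings, so it cannot be read off a single diagonal of the match matrix.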


LCS Algorithm on Reconfigurable 2D Mesh

   A  G  A  C  T  G  A  G  G  T  A
A  1  0  1  0  0  0  1  0  0  0  1
C  0  0  0  1  0  0  0  0  0  0  0
C  0  0  0  1  0  0  0  0  0  0  0
A  1  0  1  0  0  0  1  0  0  0  1
G  0  1  0  0  0  1  0  1  1  0  0
G  0  1  0  0  0  1  0  1  1  0  0


LCS Algorithm on Reconfigurable 2D Mesh

   A  G  A  C  T  G  A  G  G  T  A
G  0  1  0  0  0  1  0  1  1  0  0
A  1  0  1  0  0  0  1  0  0  0  1
C  0  0  0  1  0  0  0  0  0  0  0
A  1  0  1  0  0  0  1  0  0  0  1
G  0  1  0  0  0  1  0  1  1  0  0
G  0  1  0  0  0  1  0  1  1  0  0


LCS Algorithm on Reconfigurable 2D Mesh

Inject a token from the bottom row
The token reaches a gap, enters the South port of some PE, and stops at that PE, whose W-S switch is not set
Close the W-S bypass switch of that PE, and bypass vertically (N-S) all PEs above the PE identified in the above step


LCS Algorithm on Reconfigurable 2D Mesh

Inject a token from the top row
The token reaches a gap, enters the West port of some PE, and stops at that PE, whose W-S switch is not set
Close the W-S bypass switch of that PE, and bypass horizontally (W-S) all PEs to the right of the PE identified in the above step
Bypass the W-S switch of all those PEs where there is a crossover of the H and V switches

Page 68

LCS Algorithm on Reconfigurable 2D Mesh

Inject a token from the bottom row.
A PE receiving a token adds 1 to it if its Match register "M" contains 1, and passes it on if its W-S bypass switch is set; at the end of an LCS, the PE stores the value in its Length register "L".
The PE with the largest value in its "L" register is the start of the LCS.
Increment "L" by 1 if the "M" register has value 1.
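As a serial point of comparison for the token-counting steps above, the following is a minimal sketch, not the mesh algorithm itself. It assumes that the "M" register of a PE holds 1 exactly when its row character equals its column character, and it uses the standard O(nm) dynamic-programming recurrence to compute the LCS length for the two string pairs that label the figures on these slides. The function names `match_bits` and `lcs_length` are illustrative only.

```python
def match_bits(text: str, pattern: str) -> list[list[int]]:
    """Rows follow the pattern, columns follow the text; 1 marks equal characters.

    Assumption: this is what the per-PE Match register "M" holds.
    """
    return [[1 if p == t else 0 for t in text] for p in pattern]


def lcs_length(text: str, pattern: str) -> int:
    """Serial O(n*m) dynamic-programming LCS length, kept to one rolling row."""
    prev = [0] * (len(text) + 1)
    for p in pattern:
        cur = [0]
        for j, t in enumerate(text, start=1):
            if p == t:
                # Characters match: extend the LCS of both shorter prefixes.
                cur.append(prev[j - 1] + 1)
            else:
                # No match: carry over the better of the two neighbours.
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    text = "AGACTGAGGTA"
    for pattern in ("ACCAGG", "GACAGG"):
        for row in match_bits(text, pattern):
            print("".join(map(str, row)))
        print(pattern, "LCS length:", lcs_length(text, pattern))
```

A serial reference like this is mainly useful for checking the length reported by a hardware or simulated mesh run against a known-good value.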

Page 69

LCS Algorithm on Reconfigurable 2D Mesh

When the H or V switch is set, the token bypasses the PE and the "L" value remains unchanged.
We pass on only those tokens whose value in the "M" (Match) register is maximum and whose value in the "L" (Length) register is minimum.
If both tokens have the same "M" value, block the token with the larger "L" value.
If both the "L" and "M" values are the same, select either one.
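The selection rule for two competing tokens can be sketched as a small comparison function. This is an illustration under assumptions, not the hardware logic: the `Token` type and `select_token` name are hypothetical, each token is assumed to carry an "M" value and an "L" value, and "M" is assumed to be compared before "L".

```python
from typing import NamedTuple


class Token(NamedTuple):
    m: int  # value associated with the "M" (Match) register
    l: int  # value associated with the "L" (Length) register


def select_token(a: Token, b: Token) -> Token:
    """Return the token that is passed on; the other one is blocked."""
    if a.m != b.m:
        # Different M values: the token with the larger M is passed on.
        return a if a.m > b.m else b
    if a.l != b.l:
        # Same M: block the token with the larger L, keep the smaller one.
        return a if a.l < b.l else b
    # Same M and same L: either token may be chosen; this sketch keeps the first.
    return a


if __name__ == "__main__":
    print(select_token(Token(m=3, l=5), Token(m=2, l=1)))
    print(select_token(Token(m=2, l=7), Token(m=2, l=4)))
```

Keeping the first token on a full tie is an arbitrary choice made only so that the sketch is deterministic.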

Page 70

LCS Algorithm on Reconfigurable 2D Mesh

[Figure: reconfigurable 2D mesh shown as a grid of 0/1 values. Column labels: A G A C T G A G G T A. Row labels: A C C A G G.]

Page 71

LCS Algorithm on Reconfigurable 2D Mesh

[Figure: reconfigurable 2D mesh shown as a grid of 0/1 values. Column labels: A G A C T G A G G T A. Row labels: G A C A G G.]

Page 72

Presentation Outline

Reconfigurable Network in the ASC Processor
Modifying the Network for LCS Algorithm
Longest Common Subsequence on Reconfigurable 2D Mesh: Exact match
Longest Common Subsequence on Reconfigurable 2D Mesh: Approximate match
Summary and Future Work

Page 73

Summary and Future Work
Summary:

In this presentation, we have described a new parallel algorithm on specialized hardware
Inspired by certain features of the Coterie Network
Modified the ASC processor to add a reconfigurable 2D mesh
Exact match implemented on an Altera FPGA
Constant-time algorithm for exact match
Running time of the approximate-match algorithm depends on the diameter of the network

Page 74

Summary and Future Work
Future Work:

Optimize the algorithm for approximate match
Incorporate additional parameters to find the best LCS instead of the longest one
Incorporate different weighting schemes
Conserve memory by using an encoding scheme:

Use two bits to represent the four bases of DNA
Using this idea, we save 75% of space/memory
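The two-bit idea can be illustrated with a short sketch. It assumes the baseline is one 8-bit character per base, so that packing four bases into each byte gives the 75% saving; the particular bit assignment (A=00, C=01, G=10, T=11) and the names `pack` and `unpack` are arbitrary choices made for this example.

```python
# Hypothetical 2-bit codes for the four DNA bases; any fixed assignment works.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASE = "ACGT"  # inverse mapping: index = 2-bit code


def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte (low bits first)."""
    out = bytearray((len(seq) + 3) // 4)
    for i, ch in enumerate(seq):
        out[i // 4] |= CODE[ch] << (2 * (i % 4))
    return bytes(out)


def unpack(data: bytes, n: int) -> str:
    """Recover the first n bases from packed data."""
    return "".join(
        BASE[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(n)
    )


if __name__ == "__main__":
    seq = "AGACTGAGGTA"
    packed = pack(seq)
    print(len(seq), "characters ->", len(packed), "bytes")
    print(unpack(packed, len(seq)) == seq)
```

The sequence length has to be stored separately, because the last byte may be only partly used; the saving is exactly 75% only when the length is a multiple of four.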

Page 75

Acknowledgements

Professor Walker
Committee members for their time
ASC/MASC Group for their useful comments
Professor Helen Piontkivska from the Biology Department
Professor Charles Weems and Martin Herbordt
Hong Wang for implementing the exact match algorithm on FPGA

Page 76

THANK YOU

Page 77

Questions?