Space Efficient Alignment Algorithms and Affine Gap Penalties Dr. Nancy Warter-Perez

Post on 19-Dec-2015




Outline
- Algorithm complexity
- Complexity of dynamic programming alignment algorithms
- Memory efficient algorithms
  - Hirschberg’s Divide and Conquer algorithm
- Affine gap penalty


Algorithm Complexity
- Indicates the space and time (computational) efficiency of a program
  - Space complexity refers to how much memory is required to execute the algorithm
  - Time complexity refers to how long it will take to execute (compute) the algorithm
- Generally written in Big-O notation
  - O represents the complexity (order)
  - n represents the size of the data set
- Examples
  - O(n) – “order n”, linear complexity
  - O(n^2) – “order n squared”, quadratic complexity
- Constants and lower orders are ignored
  - O(2n) = O(n) and O(n^2 + n + 1) = O(n^2)


Complexity of Dynamic Programming Algorithms for Global/Local Alignment
- Time complexity – O(m*n)
  - For each cell in the score matrix, perform 3 operations: compute the Up, Left, and Diagonal scores
  - O(3*m*n) = O(m*n)
- Space complexity – O(m*n)
  - Size of scoring matrix = m*n
  - Size of trace back matrix = m*n
  - O(2*m*n) = O(m*n)
- Where m and n are the lengths of the sequences being aligned. Since m ≈ n, O(n^2) – quadratic complexity!
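As a rough illustration of where the O(m*n) cost comes from, the sketch below fills a full global-alignment score matrix and counts the cells it computes. The function name and the scoring values (match = 1, mismatch = -1, gap = -2) are assumptions chosen for the example, not values taken from the slides.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Fill the full (len(a)+1) x (len(b)+1) score matrix.

    Returns the optimal global score and the number of interior cells
    computed, which is exactly len(a) * len(b).
    """
    m, n = len(a), len(b)
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):          # first column: a[:i] against gaps
        score[i][0] = i * gap
    for j in range(1, n + 1):          # first row: b[:j] against gaps
        score[0][j] = j * gap

    cells = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # The three operations per cell: Diagonal, Up, and Left.
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            score[i][j] = max(diag, up, left)
            cells += 1
    return score[m][n], cells
```

Doubling both sequence lengths quadruples the cell count, which is the quadratic growth the slide refers to.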


Memory Requirements
- For a sequence of 200-500 amino acids or nucleotides, O(n^2) = 500^2 = 250,000 matrix cells
- If each score is stored as a 32-bit value (4 bytes), the scoring matrix requires 1,000,000 bytes!
- If each trace back symbol is stored as a character (8-bit value), the trace back matrix requires 250,000 bytes
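The arithmetic above can be checked with a few lines of Python. The helper name `matrix_bytes` is made up for this example, and the calculation ignores any per-object overhead a real program would add.

```python
def matrix_bytes(rows, cols, bytes_per_cell):
    """Raw storage for a rows x cols matrix, ignoring language overhead."""
    return rows * cols * bytes_per_cell

n = 500                                   # sequence length used on the slide
score_bytes = matrix_bytes(n, n, 4)       # 32-bit scores
traceback_bytes = matrix_bytes(n, n, 1)   # 8-bit trace back symbols

print(score_bytes, traceback_bytes)
```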


Simple Improvement for Scoring Matrix
- In reality, the space complexity of the scoring matrix is only linear, because each row depends only on the row before it: O(2*min(m,n)) = O(min(m,n))
- O(min(m,n)) ≈ O(n) for sequences of comparable lengths
  - 2,000 bytes per 500-score row (instead of 1 million for the full matrix)
- But the trace back still has quadratic space complexity
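A minimal sketch of the two-row idea, assuming the same illustrative scoring values as before (match = 1, mismatch = -1, gap = -2). It returns only the optimal score; no alignment can be recovered from it, which is the limitation the last bullet points out.

```python
def global_score_linear_space(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score using two rows of min(m, n) + 1 cells."""
    # The score is symmetric in the two sequences, so put the shorter one
    # along the row to keep each row as small as possible.
    if len(b) > len(a):
        a, b = b, a

    prev = [j * gap for j in range(len(b) + 1)]   # row 0
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr                               # the older row is discarded
    return prev[len(b)]
```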


Hirschberg’s “Divide and Conquer” Space Efficient Algorithm
- Compute the score matrix (s) between the source (0,0) and (n, m/2). Save the m/2 column of s.
- Compute the reverse score matrix (s_reverse) between the sink (n,m) and (0, m/2). Save the m/2 column of s_reverse.
- Find the middle vertex (i, m/2), where i maximizes s(i, m/2) + s_reverse(n-i, m/2) over 0 ≤ i ≤ n.
- Recursively partition the problem into 2 subproblems.

[Figure: alignment grid from Source (0,0) to Sink (n,m), with the middle vertex marked at row i of column m/2 and further middle vertices marked in the successive panels.]
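The middle-vertex step can be sketched as follows. Rows index the first sequence `v` (length n) and columns index the second sequence `w` (length m), matching the (row, column) coordinates on the slide. The function names and scoring values are assumptions made for this example.

```python
def last_column(v, w, match=1, mismatch=-1, gap=-2):
    """Scores of aligning each prefix v[:i] with all of w, in O(len(v)) space."""
    col = [i * gap for i in range(len(v) + 1)]        # column 0
    for j in range(1, len(w) + 1):
        new = [j * gap] + [0] * len(v)
        for i in range(1, len(v) + 1):
            diag = col[i - 1] + (match if v[i - 1] == w[j - 1] else mismatch)
            new[i] = max(diag, col[i] + gap, new[i - 1] + gap)
        col = new                                     # keep one column only
    return col


def middle_vertex(v, w, match=1, mismatch=-1, gap=-2):
    """Return (i, m // 2, score) for the row i where an optimal path crosses the middle column."""
    n, mid = len(v), len(w) // 2
    # s(i, m/2): forward scores from the source to column m/2.
    forward = last_column(v, w[:mid], match, mismatch, gap)
    # s_reverse(k, m/2): scores from the sink back to column m/2,
    # computed on the reversed suffixes.
    backward = last_column(v[::-1], w[mid:][::-1], match, mismatch, gap)
    totals = [forward[i] + backward[n - i] for i in range(n + 1)]
    best = max(range(n + 1), key=totals.__getitem__)
    return best, mid, totals[best]
```

The returned score equals the optimal global alignment score of `v` and `w`, since an optimal path must pass through the middle column somewhere.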


Pseudo Code of Space-Efficient Alignment Algorithm

Path(source, sink)
    If source and sink are in consecutive columns
        output the longest path from the source to the sink
    Else
        middle ← middle vertex between source and sink
        Path(source, middle)
        Path(middle, sink)
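One way the pseudo code above might look in Python is sketched below. Instead of passing source and sink vertices, this version passes the two substrings that lie between them, and it returns the alignment as two gapped strings. The names, the scoring values, and the assumption that match >= mismatch in the base case are all choices made for this example.

```python
def last_column(v, w, match=1, mismatch=-1, gap=-2):
    """Scores of aligning each prefix v[:i] with all of w, in O(len(v)) space."""
    col = [i * gap for i in range(len(v) + 1)]
    for j in range(1, len(w) + 1):
        new = [j * gap] + [0] * len(v)
        for i in range(1, len(v) + 1):
            diag = col[i - 1] + (match if v[i - 1] == w[j - 1] else mismatch)
            new[i] = max(diag, col[i] + gap, new[i - 1] + gap)
        col = new
    return col


def path(v, w, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment of v and w, returned as two gapped strings."""
    if not v:
        return "-" * len(w), w
    if not w:
        return v, "-" * len(v)
    if len(w) == 1:
        # Source and sink are in consecutive columns: place the single
        # character of w directly (assumes match >= mismatch).
        i = max(v.find(w), 0)
        pair = match if v[i] == w else mismatch
        if pair >= 2 * gap:
            return v, "-" * i + w + "-" * (len(v) - i - 1)
        return v + "-", "-" * len(v) + w

    # Find the middle vertex (i, mid) from the two saved columns.
    n, mid = len(v), len(w) // 2
    forward = last_column(v, w[:mid], match, mismatch, gap)
    backward = last_column(v[::-1], w[mid:][::-1], match, mismatch, gap)
    i = max(range(n + 1), key=lambda k: forward[k] + backward[n - k])

    # Path(source, middle) followed by Path(middle, sink).
    left = path(v[:i], w[:mid], match, mismatch, gap)
    right = path(v[i:], w[mid:], match, mismatch, gap)
    return left[0] + right[0], left[1] + right[1]
```

When several alignments tie for the best score, which one is returned depends on how ties are broken in `max`, so the output may differ from another tool's while having the same score.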


Complexity of Space-Efficient Alignment Algorithm
- Time complexity
  - Equal to the sum of the areas of the rectangles
  - Area + ½ Area + ¼ Area + … ≤ 2*Area, where Area = n*m
  - O(2*n*m) = O(n*m)
  - Quadratic time/computation complexity (same as before)
- Space complexity
  - Need to save a column of s and s_reverse for each computation (but can discard after computing middle)
  - O(min(n,m)) – if m < n, switch the sequences (or save a row of s and s_reverse instead)
  - Linear space complexity!!

Reference: http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Dynamic/Hirsch/
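The factor of 2 in the time bound follows from a geometric series. Splitting at a middle vertex (i, m/2) leaves two rectangles of area i*(m/2) and (n-i)*(m/2), which together cover half of the original n*m area, so each recursion level costs half as much as the one before. Writing K for the recursion depth (a symbol introduced here, not on the slide):

```latex
\[
  nm + \tfrac{1}{2}nm + \tfrac{1}{4}nm + \dots + \tfrac{1}{2^{K}}nm
  \;=\; nm \sum_{k=0}^{K} \left(\tfrac{1}{2}\right)^{k}
  \;=\; nm \left(2 - 2^{-K}\right)
  \;\le\; 2nm
\]
```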


Gap Penalties
Gap penalties account for the introduction of a gap - on the evolutionary model, an insertion or deletion mutation - in both nucleotide and protein sequences, and therefore the penalty values should be proportional to the expected rate of such mutations.

http://en.wikipedia.org/wiki/Sequence_alignment#Assessment_of_significance
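For comparison, the sketch below contrasts a linear gap penalty with an affine one. It assumes one common convention, in which the first position of a gap costs an opening penalty and each further position costs a smaller extension penalty. Other tools define the terms slightly differently, and the numeric values are placeholders rather than values from the slides.

```python
def linear_gap_penalty(length, gap=2):
    """Every gap position costs the same amount."""
    return gap * length


def affine_gap_penalty(length, gap_open=10, gap_extend=1):
    """Opening a gap is expensive; extending it is cheap.

    Assumed convention: penalty = gap_open + gap_extend * (length - 1).
    """
    if length <= 0:
        return 0
    return gap_open + gap_extend * (length - 1)


for k in (1, 2, 5):
    print(k, linear_gap_penalty(k), affine_gap_penalty(k))
```

Under the affine scheme, one gap of length 5 is penalized less than five separate gaps of length 1, which reflects the idea that a single insertion or deletion event can span several residues.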


Source: http://www.apl.jhu.edu/~przytyck/Lect03_2005.pdf


Project Verification - Use EMBOSS Pairwise Alignment Tool http://www.ebi.ac.uk/Tools/emboss/align/index.html


Project Verification – LALIGN http://www.ch.embnet.org/software/LALIGN_form.html


Workshop
- Work on Sequence Alignment project
- Email me a progress report by 6 p.m. on Tuesday, July 3rd
  - Specify the implementation status for each module
  - List each function within a module and specify its status
    - Date written
    - Date testing completed
    - Author
  - Include functions in the list that are not completed (i.e., not written yet or not fully tested). For these cases, write TBD (to be determined) in the respective date field.
  - Only one report per group, but cc your partner on your e-mail!