Input Sensitive Algorithms for Multiple Sequence Alignment Pankaj Agarwal @Duke Yonatan Bilu @Hebrew University Rachel Kolodny @Stanford

Upload: spence

Post on 12-Jan-2016



TRANSCRIPT

Page 1: Input Sensitive Algorithms for  Multiple Sequence Alignment

Input Sensitive Algorithms for Multiple Sequence Alignment

Pankaj Agarwal @Duke
Yonatan Bilu @Hebrew University
Rachel Kolodny @Stanford

Page 2: Input Sensitive Algorithms for  Multiple Sequence Alignment

Multiple Sequence Alignment

• Quantifies similarities among [DNA, Protein] sequences

• Detects highly conserved motifs & remote homologues
  – Evolutionary insights
  – Transfer of annotation
  – Representation of protein families

Page 3: Input Sensitive Algorithms for  Multiple Sequence Alignment

Multiple Sequence Alignment

• Input: k sequences

• Output: optimal alignment
  – Gap infused sequences (-), one per row.
  – Restrictions on the column pattern

(1) GARFIELD MET NERMAL
(2) ODIE AND HIS ASSOCIATE NERMAL MET GARFIELD AND HIS ASSOCIATE
(3) GARFIELD AND HIS ASSOCIATE NERMAL

----GARFIELD MET----------------- NERMAL ------------------------------
ODIE------------AND HIS ASSOCIATE NERMAL MET GARFIELD AND HIS ASSOCIATE
----GARFIELD ---AND HIS ASSOCIATE NERMAL ------------------------------
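The output format described on this slide can be checked mechanically. The sketch below is a minimal illustration, not the authors' code: the function name `is_valid_alignment` is hypothetical, and the rule that no column may consist of gaps only is an assumed reading of the "column pattern" restriction.

```python
GAP = "-"

def is_valid_alignment(sequences, rows):
    """Check that `rows` is a gapped version of `sequences`, one row each."""
    if len(rows) != len(sequences):
        return False
    # All rows must have the same width.
    if len({len(r) for r in rows}) > 1:
        return False
    # Removing the gaps must give back each input sequence unchanged.
    if any(r.replace(GAP, "") != s for r, s in zip(rows, sequences)):
        return False
    # Assumed column restriction: no column made of gaps only.
    return all(any(ch != GAP for ch in col) for col in zip(*rows))

# Example: a pairwise alignment of two of the Garfield sequences.
seqs = ["GARFIELDMETNERMAL", "GARFIELDANDHISASSOCIATENERMAL"]
rows = ["GARFIELDMET" + GAP * 15 + "NERMAL",
        "GARFIELD" + GAP * 3 + "ANDHISASSOCIATENERMAL"]
print(is_valid_alignment(seqs, rows))
```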

Page 4: Input Sensitive Algorithms for  Multiple Sequence Alignment

Multiple Sequence Alignment

• Input: k sequences

• Output: optimal alignment
  – Minimal width
  – Score function
    • Columns summation
    • e.g. sum of pairs

(1) GARFIELD MET NERMAL
(2) ODIE AND HIS ASSOCIATE NERMAL MET GARFIELD AND HIS ASSOCIATE
(3) GARFIELD AND HIS ASSOCIATE NERMAL

----GARFIELD MET----------------- NERMAL ------------------------------
ODIE------------AND HIS ASSOCIATE NERMAL MET GARFIELD AND HIS ASSOCIATE
----GARFIELD ---AND HIS ASSOCIATE NERMAL ------------------------------
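A sum-of-pairs score adds, column by column, the score of every pair of rows in that column. The sketch below shows one way to compute it; the weights (`match=1`, `mismatch=-1`, `gap=-2`) and the function names are illustrative assumptions, not values taken from the slides.

```python
from itertools import combinations

GAP = "-"

def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score one pair of symbols from the same column (assumed weights)."""
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return gap
    return match if a == b else mismatch

def sum_of_pairs(rows, **weights):
    """Sum, over all columns, the scores of all pairs of rows."""
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("all rows must have the same width")
    total = 0
    for column in zip(*rows):
        total += sum(pair_score(a, b, **weights)
                     for a, b in combinations(column, 2))
    return total

# Column 1 (A, A, A) scores 3; columns 2 and 3 each score -3.
print(sum_of_pairs(["AC-", "A-G", "ACG"]))  # -3
```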

Page 5: Input Sensitive Algorithms for  Multiple Sequence Alignment

DP solves MSA
– Build a score matrix
  • k-dimensional hypercube
– An alignment is a path
– Time: (num of nodes) × (num neighbors per node)

GARFIELDANDHISASSOCIATENERMAL
GARFIELDMETNERMAL

GARFIELDMET---------------NERMAL
GARFIELD---ANDHISASSOCIATENERMAL
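The slide's formula did not survive extraction. In the textbook version of this DP, k sequences of length n give on the order of n^k nodes, and each node has at most 2^k − 1 neighbors (one per non-empty subset of sequences that advance). The sketch below is a plain instance of that textbook DP, using a hypothetical unit-cost sum-of-pairs function; it is exponential in k and only meant for tiny inputs.

```python
from functools import lru_cache
from itertools import combinations, product

GAP = "-"

def pair_cost(a, b):
    """Hypothetical unit costs: 0 for equal symbols (incl. gap/gap), else 1."""
    return 0 if a == b else 1

def msa_cost(seqs):
    """Minimum sum-of-pairs cost over all alignments of `seqs`."""
    k = len(seqs)
    # A move advances a non-empty subset of the sequences: 2**k - 1 neighbors.
    moves = [m for m in product((0, 1), repeat=k) if any(m)]

    @lru_cache(maxsize=None)
    def best(pos):
        # pos[i] = number of characters of seqs[i] already aligned.
        if all(p == len(s) for p, s in zip(pos, seqs)):
            return 0
        result = float("inf")
        for move in moves:
            nxt = tuple(p + d for p, d in zip(pos, move))
            if any(n > len(s) for n, s in zip(nxt, seqs)):
                continue  # this move would step outside the hypercube
            column = [s[p] if d else GAP for s, p, d in zip(seqs, pos, move)]
            cost = sum(pair_cost(a, b) for a, b in combinations(column, 2))
            result = min(result, cost + best(nxt))
        return result

    return best((0,) * k)

print(msa_cost(["AB", "B", "AB"]))  # 2
```

Each path from the origin to the far corner of the hypercube corresponds to one alignment, which is why the cost of the cheapest path is the optimal score.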

Page 6: Input Sensitive Algorithms for  Multiple Sequence Alignment

Previous Work

MSA Heuristics
• [Carrillo, Lipman 88]
• MACAW [Schuler, Altschul, Lipman 91]
• ClustalW [Thompson et al. 94]
• DIAlign [Werner, Morgenstern, Dress 96]
• T-Coffee [Notredame et al. 00]
• POA [Lee et al. 02]
• …

MSA Complexity Analysis
• Optimizing over the space of all possible inputs is NP-hard [Jiang, Wang 94]
• NP-hard for SP [Just 01]
• NP-hard for SP that is a metric [Bonizzoni, Della Vedova 01]

Faster pairwise SA
• Assuming many common subsequences [Wilbur, Lipman 83]
• Convex/concave score functions [Eppstein et al. 92]
• Exploiting compressibility of sequences [Landau, Crochemore, Ziv-Ukelson 02]
• …

• Review: Biological Sequence Analysis [Durbin et al.]

Page 7: Input Sensitive Algorithms for  Multiple Sequence Alignment

Pairwise Restriction

• The “true” information: the aligned subsequences and their relative positioning
• Study pairwise alignment first and restrict the alignment
  – Time:
• Focus efforts on “true” tradeoffs

GARFIELDMETNERMAL
GARFIELDANDHISASSOCIATENERMAL
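To make "aligned subsequences and their relative positioning" concrete, the sketch below extracts the common blocks of the two sequences above. It uses Python's `difflib.SequenceMatcher` as a stand-in for a scored pairwise alignment, which is a substitution and not the method of the talk; `matched_segments` and the `min_len` cutoff are hypothetical.

```python
from difflib import SequenceMatcher

def matched_segments(a, b, min_len=3):
    """Return (start_a, start_b, length) for common blocks of a and b.

    difflib's matching stands in for a scored pairwise alignment;
    min_len is an assumed cutoff that drops incidental short matches.
    """
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [(m.a, m.b, m.size)
            for m in matcher.get_matching_blocks()
            if m.size >= min_len]

blocks = matched_segments("GARFIELDMETNERMAL",
                          "GARFIELDANDHISASSOCIATENERMAL")
for start_a, start_b, size in blocks:
    print(start_a, start_b, size)
```

The blocks come back in increasing order in both sequences, so they record both which pieces align and how those pieces are positioned relative to each other.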

Page 8: Input Sensitive Algorithms for  Multiple Sequence Alignment

Segments Matching Graph (SMG)

• Sequences are partitioned into segments

GARFIELD ANDHISASSOCIATE NERMAL
GARFIELD MET NERMAL
ODIE ANDHISASSOCIATE NERMAL MET GARFIELD ANDHISASSOCIATE

Nodes: the segments

Edges:
• self edges
• between two equal-length segments of different sequences
• have scores

Defines allowed paths and their score
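One possible in-memory form of such a graph is sketched below. Everything here is an assumption made for illustration: nodes are `(sequence index, segment index)` pairs, edges live in a dictionary keyed by node pairs, self edges get score 0, and the caller supplies a `score` function that returns `None` when two segments should not be connected.

```python
from itertools import combinations

def build_smg(segmented, score):
    """Build nodes and scored edges from pre-segmented sequences."""
    nodes = [(s, i) for s, segs in enumerate(segmented)
             for i in range(len(segs))]
    edges = {(node, node): 0 for node in nodes}  # self edges, assumed score 0
    for (s, i), (t, j) in combinations(nodes, 2):
        a, b = segmented[s][i], segmented[t][j]
        # Only equal-length segments of different sequences may be linked.
        if s != t and len(a) == len(b):
            value = score(a, b)
            if value is not None:
                edges[((s, i), (t, j))] = value
    return nodes, edges

def identity_score(a, b):
    """Hypothetical score: segment length for identical text, else no edge."""
    return len(a) if a == b else None

segmented = [
    ["GARFIELD", "ANDHISASSOCIATE", "NERMAL"],
    ["GARFIELD", "MET", "NERMAL"],
    ["ODIE", "ANDHISASSOCIATE", "NERMAL", "MET", "GARFIELD", "ANDHISASSOCIATE"],
]
nodes, edges = build_smg(segmented, identity_score)
print(len(nodes), len(edges))  # 12 21
```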

Page 9: Input Sensitive Algorithms for  Multiple Sequence Alignment

[Figure: sequences GARFIELDANDHISASSOCIATENERMAL and ODIEANDHISASSOCIATENERMALMETGARFIELDANDHISASSOCIATE, partitioned into the segments GARFIELD ANDHISASSOCIATE NERMAL and ODIE ANDHISASSOCIATE NERMAL MET GARFIELD ANDHISASSOCIATE]

Page 10: Input Sensitive Algorithms for  Multiple Sequence Alignment

[Figure: sequences GARFIELDANDHISASSOCIATENERMAL and ODIEANDHISASSOCIATENERMALMETGARFIELDANDHISASSOCIATE, partitioned into the segments GARFIELD ANDHISASSOCIATE NERMAL and ODIE ANDHISASSOCIATE NERMAL MET GARFIELD ANDHISASSOCIATE]

Page 11: Input Sensitive Algorithms for  Multiple Sequence Alignment

[Figure: sequences GARFIELDANDHISASSOCIATENERMAL and ODIEANDHISASSOCIATENERMALMETGARFIELDANDHISASSOCIATE, partitioned into the segments GARFIELD ANDHISASSOCIATE NERMAL and ODIE ANDHISASSOCIATE NERMAL MET GARFIELD ANDHISASSOCIATE]

Extreme paths:

Page 12: Input Sensitive Algorithms for  Multiple Sequence Alignment

[Figure: sequences GARFIELDANDHISASSOCIATENERMAL and ODIEANDHISASSOCIATENERMALMETGARFIELDANDHISASSOCIATE, partitioned into the segments GARFIELD ANDHISASSOCIATE NERMAL and ODIE ANDHISASSOCIATE NERMAL MET GARFIELD ANDHISASSOCIATE]

Extreme paths:

Page 13: Input Sensitive Algorithms for  Multiple Sequence Alignment

All paths

Extreme paths

Optimal paths

Lemma: there is an optimal path that is extreme

Page 14: Input Sensitive Algorithms for  Multiple Sequence Alignment

GARFIELDANDHISASSOCIATENERMAL

ODIEANDHISASSOCIATENERMALMETGARFIELDANDHISASSOCIATE

Improved algorithm: DP on the segments

Page 15: Input Sensitive Algorithms for  Multiple Sequence Alignment

Transitive PR-MSA

More restrictions:
• Transitivity
• Scoring function is shortest path

Faster algorithms

DNA sequences

*no scores in SMG, only matches

Page 16: Input Sensitive Algorithms for  Multiple Sequence Alignment

Maximal Directions

• Transitivity implies that for any point in the hypercube, the directions are partitioned into cliques
  – Defines maximal directions

• The shortest path can be taken over maximal directions.

• Pushes down the work per node
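A simplified reading of this idea, at the level of single characters rather than segments, is sketched below. It assumes the match-only DNA setting, where "matches" is plain equality and therefore transitive: at a given point, the sequences whose next characters agree form a clique, and the maximal direction for that clique advances all of them together. The function name and this character-level view are assumptions for illustration.

```python
from collections import defaultdict

def maximal_directions(seqs, pos):
    """Group sequences by next character; one 0/1 direction per group."""
    groups = defaultdict(list)
    for i, (s, p) in enumerate(zip(seqs, pos)):
        if p < len(s):  # sequences already consumed cannot advance
            groups[s[p]].append(i)
    k = len(seqs)
    return [tuple(1 if i in members else 0 for i in range(k))
            for members in groups.values()]

seqs = ["ACGT", "AGGT", "CCGT"]
print(maximal_directions(seqs, (0, 0, 0)))  # [(1, 1, 0), (0, 0, 1)]
print(maximal_directions(seqs, (1, 1, 1)))  # [(1, 0, 1), (0, 1, 0)]
```

For three sequences the full DP considers 2^3 − 1 = 7 directions per node, while each point above has only two maximal directions, which is the sense in which the work per node goes down.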

Page 17: Input Sensitive Algorithms for  Multiple Sequence Alignment

Obvious Directions

[Figure, shown twice: the segments of the three sequences]

GARFIELD ANDHISASSOCIATE NERMAL
GARFIELD MET NERMAL
ODIE ANDHISASSOCIATE NERMAL MET GARFIELD ANDHISASSOCIATE

Obvious:

Non-Obvious:

?

Page 18: Input Sensitive Algorithms for  Multiple Sequence Alignment

Obvious Directions

• Lemma: an optimal path is found even when making obvious decisions
• Not all nodes are relevant
• Work for every node increases to

Page 19: Input Sensitive Algorithms for  Multiple Sequence Alignment

Special Vertices

(0,0)

Straight junction

Corner junction

Page 20: Input Sensitive Algorithms for  Multiple Sequence Alignment

Thank you